• 'diff' behaviour when comparing FIFOs

    From Martijn Dekker@21:1/5 to All on Thu Dec 5 12:29:21 2019
    XPost: comp.unix.shell

    The following should be POSIXly correct and portable, right?

    $ mkfifo f1 f2
    $ (ls >f1 & ls -a >f2 & diff f1 f2)

    Expected output: difference between 'ls' and 'ls -a' output.
    (The (subshell) is just for suppressing job control noise.)

    This works on every unixy system, it seems, *except* Solaris, which
    quietly detects no difference. It fails at least on Solaris 10.1 through
    11.4, as well as OpenIndiana.
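
    A self-contained way to try the test above, as a minimal sketch. The
    function name fifo_diff_demo and the scratch-directory layout are made up
    for illustration, and mktemp is assumed to be available. With a diff that
    reads FIFOs, it should list the entries that only 'ls -a' shows and exit
    with status 1; on the affected Solaris versions it would presumably print
    nothing.

```shell
# Hypothetical helper: run the two-FIFO comparison in a scratch directory.
fifo_diff_demo() (
    dir=$(mktemp -d) || exit 2          # assumption: mktemp(1) exists
    cd "$dir" || exit 2
    mkdir work || exit 2
    : >work/visible                     # listed by both 'ls' and 'ls -a'
    : >work/.hidden                     # listed by 'ls -a' only
    mkfifo f1 f2 || exit 2
    (cd work && ls) >f1 &               # writer 1 blocks until diff opens f1
    (cd work && ls -a) >f2 &            # writer 2 blocks until diff opens f2
    diff f1 f2
    status=$?
    wait                                # reap both background writers
    cd / && rm -rf "$dir"
    exit "$status"
)
```

    Listing a separate 'work' directory keeps f1 and f2 themselves out of the
    compared output.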

    So I'd like to establish whether this is a legit difference or Solaris
    'diff' is broken for not reading from FIFOs. In the POSIX spec, I can't
    find anything that says it should be refusing to do this.
    https://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html

    There is some language about not comparing FIFOs and other special
    files, but as far as I can tell, that only applies when encountering
    those while recursively comparing directories, and not when directly
    specifying two special files as arguments.

    Am I reading that correctly? Is Solaris 'diff' broken?

    There are a few things that strengthen my suspicion that it may be...

    * bash/ksh/zsh process substitution works fine with Solaris 'diff', like
    'diff <(ls) <(ls -a)'. It uses character special files in /dev/fd/
    instead of FIFOs. Why would Solaris 'diff' read those, but not FIFOs?

    So, on bash/ksh/zsh, this slightly bizarre workaround forces Solaris
    'diff' to compare two FIFO output streams: diff <(cat f1) <(cat f2)

    * Solaris also comes with GNU diff as gdiff. That one has no problem
    with FIFOs; (ls >f1 & ls -a >f2 & gdiff f1 f2) works correctly.

    * Solaris 'comm' and 'cmp' read FIFOs just fine and work identically to
    the GNU versions, 'gcomm' and 'gcmp'.

    So, if there is a legit problem, it seems to be located specifically in
    the Solaris version of 'diff' and not in the kernel or anything.
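
    For shells without process substitution, a comparable workaround is to
    copy each FIFO into a regular file before comparing. This is only a
    sketch: diff_spooled is a hypothetical helper name, mktemp is assumed to
    exist, and whether this sidesteps the Solaris behaviour has not been
    verified here.

```shell
# diff_spooled A B: copy two inputs (which may be FIFOs) to regular,
# seekable temporary files, then compare the copies.
diff_spooled() (
    tmp=$(mktemp -d) || exit 2          # assumption: mktemp(1) exists
    cat "$1" >"$tmp/a" &                # drain both inputs concurrently so
    cat "$2" >"$tmp/b"                  # neither writer is kept waiting
    wait
    diff "$tmp/a" "$tmp/b"
    status=$?
    rm -rf "$tmp"
    exit "$status"                      # pass diff's exit status through
)
```

    Usage would then be: (ls >f1 & ls -a >f2 & diff_spooled f1 f2)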

    - M.

    --
    / modernish -- harness the shell \
    https://github.com/modernish/modernish

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Joerg.Schilling@fokus.fraunhofer.de@21:1/5 to martijn@inlv.demon.nl on Thu Dec 5 12:36:18 2019
    XPost: comp.unix.shell

    In article <h4s80iF60a0U1@mid.individual.net>,
    Martijn Dekker <martijn@inlv.demon.nl> wrote:
    The following should be POSIXly correct and portable, right?

    $ mkfifo f1 f2
    $ (ls >f1 & ls -a >f2 & diff f1 f2)

    Expected output: difference between 'ls' and 'ls -a' output.
    (The (subshell) is just for suppressing job control noise.)

    This works on every unixy system, it seems, *except* Solaris, which
    quietly detects no difference. It fails at least on Solaris 10.1 through
    11.4, as well as OpenIndiana.

    You are mistaken, it works on every UNIX the way it does on Solaris...
    _except_ when the platform used "gdiff".

    This includes e.g. older versions of FreeBSD, but not any newer *BSD, since
    *BSD has tried to avoid non-free licenses for a while.

    I would say, there is still a bug in the UNIX diff, but this differs from what you believe:

    diff should probably print that there was a seek error....

    because you cannot seek a pipe.

    The reason why your proposal diff <(cat f1) <(cat f2) works is because the
    output of cat is directed to a /tmp/ file that is opened and passed as
    /dev/fd/# to diff.

    --
    EMail:joerg@schily.net (home) Jörg Schilling D-13353 Berlin
    joerg.schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
    URL: http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/

  • From Martijn Dekker@21:1/5 to All on Thu Dec 5 16:45:25 2019
    XPost: comp.unix.shell

    Op 05-12-19 om 13:36 schreef Joerg.Schilling@fokus.fraunhofer.de:
    In article <h4s80iF60a0U1@mid.individual.net>,
    Martijn Dekker <martijn@inlv.demon.nl> wrote:
    The following should be POSIXly correct and portable, right?

    $ mkfifo f1 f2
    $ (ls >f1 & ls -a >f2 & diff f1 f2)

    Expected output: difference between 'ls' and 'ls -a' output.
    (The (subshell) is just for suppressing job control noise.)

    This works on every unixy system, it seems, *except* Solaris, which
    quietly detects no difference. It fails at least on Solaris 10.1 through
    11.4, as well as OpenIndiana.

    You are mistaken, it works on every UNIX the way it does on Solaris... _except_ when the platform used "gdiff".

    This includes e.g. older versions of FreeBSD, but not any newer *BSD, since
    *BSD has tried to avoid non-free licenses for a while.

    You should really check before making authoritative statements,
    especially if they're so easy to prove wrong.

    It *does* work on the current FreeBSD release, with FreeBSD diff. Try
    it. I tried it, before posting.

    It also works on the current NetBSD and OpenBSD releases. None of them
    use GNU diff.

    I didn't bother with testing older Free/Net/OpenBSD releases but I would
    be surprised if it ever did *not* work.

    I even confirmed that it works on Interix, and it is *old* and doesn't
    use GNU diff either.

    I would say, there is still a bug in the UNIX diff, but this differs from what
    you believe:

    diff should probably print that there was a seek error....

    because you cannot seek a pipe.

    Of course, you cannot. However, it is easy to show that diff works fine
    with a pipe on Solaris, as long as that pipe is not a FIFO:

    On Solaris 11.4:

    $ type diff
    diff is /usr/bin/diff
    $ ls > tmpfile
    $ ls -a | diff tmpfile /dev/stdin
    0a1,18
    .
    ..
    (etc)
    $ mkfifo F1 F2
    $ ls >F1 & ls -a >F2 &
    $ diff /dev/fd/4 /dev/fd/5 4<F1 5<F2
    .
    ..
    (etc)

    When /dev/stdin or /dev/fd/* are connected to pipes, you can't seek them
    any more than you can directly seek a FIFO. Yet, the above works.

    And there is no reason why it shouldn't work. Diff is a linear
    operation, so seeking is not necessary. At most it's an optimisation,
    and optimisations can be disabled.

    Clearly, then, Solaris diff does not, and does not need to, require the
    ability to seek. It's got some very specific problem with being given a
    FIFO directly as an argument, that's all.
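
    A quick way to see what kind of file a program is handed, as a sketch:
    stdin_kind is a hypothetical helper, and it assumes /dev/stdin exists.
    Where /dev/stdin resolves to the underlying object (Linux does this), a
    pipeline should report 'pipe'. Where the /dev/fd entries are character
    special files, as noted earlier in this thread for Solaris, the same check
    would be expected to report 'other', which may be related to why a FIFO
    named directly is treated differently.

```shell
# Hypothetical helper: classify whatever standard input is connected to.
stdin_kind() {
    if [ -p /dev/stdin ]; then
        echo pipe                       # FIFO or anonymous pipe
    elif [ -f /dev/stdin ]; then
        echo regular                    # regular, seekable file
    else
        echo other                      # e.g. character special file
    fi
}
```

    For example, 'ls | stdin_kind' checks the pipe case and
    'stdin_kind <tmpfile' checks the regular-file case.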

    I don't know whether that strange behaviour is POSIX compliant or not (I
    now strongly suspect it's not, though) but it certainly looks like a
    bug. There is zero reason why that shouldn't work.

    The reason why your proposal diff <(cat f1) <(cat f2) works is because the
    output of cat is directed to a /tmp/ file that is opened and passed as
    /dev/fd/# to diff.

    This statement cannot possibly be right. Not only would it defeat the
    purpose of process substitution to use temporary files, but it would be
    impossible.

    By definition, process substitution is processed asynchronously. The
    commands invoked are background jobs. Temporary files do not provide the
    required blocking mechanism. You need a pipe (FIFO or otherwise).

    - M.

    --
    / modernish -- harness the shell \
    https://github.com/modernish/modernish

  • From Joerg.Schilling@fokus.fraunhofer.de@21:1/5 to martijn@inlv.demon.nl on Fri Dec 6 10:50:39 2019
    XPost: comp.unix.shell

    In article <h4tr8uFgb6uU1@mid.individual.net>,
    Martijn Dekker <martijn@inlv.demon.nl> wrote:

    That is rich, coming from someone who just claimed with a straight face
    and a wagging finger that only the evil GNU diff is capable of dealing
    with non-seekable input.

    It is good practice to first read all postings before answering. If you had
    done that, you would have seen that I corrected my mistake several hours
    before your posting.

    So why didn't you just also admit your mistake?

    --
    EMail:joerg@schily.net (home) Jörg Schilling D-13353 Berlin
    joerg.schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
    URL: http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/

  • From Joerg.Schilling@fokus.fraunhofer.de@21:1/5 to martijn@inlv.demon.nl on Fri Dec 6 11:07:24 2019
    XPost: comp.unix.shell

    In article <h4sn0lF93bhU1@mid.individual.net>,
    Martijn Dekker <martijn@inlv.demon.nl> wrote:

    When /dev/stdin or /dev/fd/* are connected to pipes, you can't seek them
    any more than you can directly seek a FIFO. Yet, the above works.

    And there is no reason why it shouldn't work. Diff is a linear
    operation, so seeking is not necessary. At most it's an optimisation,
    and optimisations can be disabled.

    This is a clear misinterpretation; please rethink your statement.

    The important feature of a modern diff implementation is the best-match
    algorithm that finds the end of an insertion or deletion.

    The UNIX diff sources written by Douglas McIlroy in 1974 implement this by
    scanning the files and computing hashes for every line, then seeking/reading
    to find the correct location for a resync.

    The GNU diff implementation more or less does the same, but malloc()s both
    files completely into memory at the beginning. For this reason, a "seek" is
    just a pointer repositioning in this implementation. The task is still not
    a linear task.
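
    The "hash for every line" step can be sketched in shell as follows. This
    is only an illustration: line_hashes is a hypothetical helper, and cksum
    stands in for whatever hash function a real diff uses. Equal lines get
    equal hashes, so candidate matches can be found by comparing numbers, and
    the file is read again afterwards to confirm them and to resync.

```shell
# Hypothetical helper: print one checksum per line of the named file.
line_hashes() {
    while IFS= read -r line || [ -n "$line" ]; do
        # cksum prints "CRC size" for standard input; keep only the CRC.
        printf '%s\n' "$line" | cksum | cut -d' ' -f1
    done <"$1"
}
```

    Running it on both inputs and comparing the two hash lists shows which
    line numbers are candidates for a match.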


    Clearly, then, Solaris diff does not, and does not need to, require the
    ability to seek. It's got some very specific problem with being given a
    FIFO directly as an argument, that's all.

    I don't know whether that strange behaviour is POSIX compliant or not (I
    now strongly suspect it's not, though) but it certainly looks like a
    bug. There is zero reason why that shouldn't work.

    Given that it is problematic to malloc() huge files into memory, your
    interpretation is not useful.

    Solaris diff is POSIX compliant; you just detected a corner-case problem
    that no longer exists, since it was fixed yesterday...

    --
    EMail:joerg@schily.net (home) Jörg Schilling D-13353 Berlin
    joerg.schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
    URL: http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/

  • From Martijn Dekker@21:1/5 to All on Sat Dec 7 06:34:03 2019
    XPost: comp.unix.shell

    Op 06-12-19 om 11:50 schreef Joerg.Schilling@fokus.fraunhofer.de:
    It is good practice to first read all postings before answering.

    Yet again, you should really try to take your own advice before dishing
    it out to others, because...

    So why didn't you just also admit your mistake?

    ...I did exactly that, on 6 Dec at 02:31:05 (CET), well before you
    posted on 6 Dec at 11:50:39 (CET).

    - M.

    --
    / modernish -- harness the shell \
    https://github.com/modernish/modernish
