I thought I found a bug

(os2museum.com)

118 points | by MBCook 1 day ago

5 comments

  • anonymousiam 1 day ago
    While reading this, I realized that I have a copy of the elusive usr/group Standard that the author mentions. I just pulled it off an image of my early DOS hard drive (before I migrated to Linux in 1991). I should probably post it somewhere.

        # ls -altr
        total 540
        -rw-r--r-- 1 root root  22606 Apr 12 1990 NOTES
        -rw-r--r-- 1 root root 172645 Apr 12 1990 LIB
        -rw-r--r-- 1 root root 102349 Apr 12 1990 APP
        -rw-r--r-- 1 root root 224037 Apr 12 1990 C

    • codetrotter 23 hours ago
      > I should probably post it somewhere.

      Upload it to the Internet Archive! :D

      https://help.archive.org/help/uploading-a-basic-guide/

      • anonymousiam 5 hours ago
        Thanks. It looks like the document itself is still copyrighted. The files can be uploaded, so I will.
    • xelxebar 21 hours ago
      > before I migrated to Linux in 1991

      What a mic drop. That must have been a fun ride from then until now! Would love to hear some of your battle stories.

      • jsjohnst 8 hours ago
        My first Linux machine was in 1993, what would you like to know? Pre-1.0 kernels were an adventure, that’s for sure.
        • icedchai 6 hours ago
          My first Linux box ran 0.99.10. I was running SLS, which installed off of a dozen or so floppies. I eventually moved to Slackware a year or so later.
          • anonymousiam 5 hours ago
            SLS was also my first distro. I also played with Yggdrasil Linux (bootable CD) for a while, because at the time, nobody could afford a hard drive with as much capacity as a CD-ROM.

            Those early Linux distros borrowed a lot from SunOS (Solaris 1), so it was easy to adapt between work/home.

          • xenophonf 5 hours ago
            I remember downloading and installing one of the MCC Interim releases in 1993? 1994? before switching to Slackware. Early *BSD and Linux were certainly an adventure back then. I don't miss it.
  • cryptonector 19 hours ago
    In stdio implementations that don't support free intermixing of reads and writes, the issue typically is that there is only one buffer for both reading and writing. You have to reset the buffer in order to switch from reading to writing or vice versa; otherwise you are left with a dirty, non-empty buffer that no longer corresponds to the operation you are about to perform. The functions `fflush()`, `fseek()`, `rewind()`, and `fsetpos()` happen to clear the buffer, which is why you have to use them before switching from reading to writing or vice versa!

    Without an indicator in `struct FILE` of whether the last operation was a read or a write, the stdio implementation has no way to detect the problem and correct the situation by automatically flushing and resetting the buffer, say. An alternative would be to have two buffers, naturally. But you can see how a pre-update version could be trivially made to support update modes without adding a second buffer or automatic buffer flushing. And that's almost certainly what happened when update mode was added. My guess is someone got bitten by that and then the maintainer decided to just document the problem rather than fix it, probably because by then fixing the problem was hard.
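
    A minimal sketch of the rule in C, for illustration only. The helper name `replace_second_byte` and the choice of `"r+b"` mode are my assumptions, not anything from the article: it reads one byte, does a zero-offset `fseek()` to reset the stream, and only then writes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read the first byte of `path`, then overwrite the
 * second byte with `replacement`. Returns 0 on success, -1 on failure. */
int replace_second_byte(const char *path, char replacement)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r+b");   /* update mode: reads and writes allowed */
    if (f == NULL)
        return -1;

    int first = fgetc(f);           /* input operation */
    if (first == EOF) {
        fclose(f);
        return -1;
    }

    /* Switching from input to output: the C standard requires an
     * intervening file positioning call unless the read hit end-of-file.
     * A zero-offset seek keeps the position where it is. */
    if (fseek(f, 0L, SEEK_CUR) != 0) {
        fclose(f);
        return -1;
    }

    if (fputc(replacement, f) == EOF) {   /* output operation */
        fclose(f);
        return -1;
    }

    return fclose(f) == 0 ? 0 : -1;
}
```

    On an implementation with a single shared buffer, leaving out the `fseek()` call is what leaves the stale read buffer in place. On other libcs the same program may appear to work either way, which is part of why the mistake is easy to miss.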

    • fweimer 13 hours ago
      Historically, before mandatory locking, getc and putc were implemented as macros, and an extra check for stream state likely mattered from a performance perspective.

      To avoid the extra check, you don't actually need two buffers, just separate buffer pointers for reading and writing. (This is probably how most libcs implement this today.) I suppose memory was really scarce back then.

      • cryptonector 6 hours ago
        Separate non-overlapping pointers into one buffer are not that different from two buffers, notionally, but yeah.
        • fweimer 6 hours ago
          The idea is that for the non-active mode, the current/end pointers are equal, signifying that the buffer is exhausted. This forces entering the slow path, where the mode can be switched.

          I don't think an implementation with two active, non-empty buffers is all that useful because you can't tell which buffer's progress should be used for the file pointer adjustment in ftell.
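
          A rough sketch of that scheme, using a toy stream over an in-memory array rather than any real libc's layout. All names (`toy_stream`, `toy_getc`, `toy_putc`, and so on) are made up for illustration. The inactive mode keeps its current and end pointers equal, so its fast path always falls through to the slow path, where the mode switch happens.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOY_BUFSIZ = 8, TOY_FILE_CAP = 64, TOY_EOF = -1 };

/* Toy stream over an in-memory "file". Every name here is hypothetical. */
struct toy_stream {
    unsigned char file[TOY_FILE_CAP]; /* stands in for the on-disk file */
    size_t file_len;                  /* bytes currently in the file */
    size_t base;                      /* file offset that buf[0] maps to */
    unsigned char buf[TOY_BUFSIZ];    /* the single shared buffer */
    unsigned char *rcur, *rend;       /* read window; equal when inactive */
    unsigned char *wcur, *wend;       /* write window; equal when inactive */
};

void toy_open(struct toy_stream *s, const char *initial)
{
    size_t n = strlen(initial);
    assert(n <= TOY_FILE_CAP);
    memcpy(s->file, initial, n);
    s->file_len = n;
    s->base = 0;
    s->rcur = s->rend = s->wcur = s->wend = s->buf; /* both windows closed */
}

/* Write pending output to the "file" and close the write window. */
int toy_flush(struct toy_stream *s)
{
    size_t pending = (size_t)(s->wcur - s->buf);
    if (s->base + pending > TOY_FILE_CAP)
        return TOY_EOF;
    memcpy(s->file + s->base, s->buf, pending);
    s->base += pending;
    if (s->base > s->file_len)
        s->file_len = s->base;
    s->wcur = s->wend = s->buf;
    return 0;
}

/* Discard buffered input but keep the logical position; close the read window. */
static void toy_drop_input(struct toy_stream *s)
{
    s->base += (size_t)(s->rcur - s->buf);
    s->rcur = s->rend = s->buf;
}

int toy_getc(struct toy_stream *s)
{
    if (s->rcur < s->rend)            /* fast path: no mode check */
        return *s->rcur++;
    /* Slow path: leave whichever mode was active, then refill. */
    toy_drop_input(s);
    if (toy_flush(s) != 0)
        return TOY_EOF;
    size_t n = s->file_len - s->base;
    if (n > TOY_BUFSIZ)
        n = TOY_BUFSIZ;
    if (n == 0)
        return TOY_EOF;
    memcpy(s->buf, s->file + s->base, n);
    s->rcur = s->buf;
    s->rend = s->buf + n;
    return *s->rcur++;
}

int toy_putc(struct toy_stream *s, int c)
{
    if (s->wcur >= s->wend) {         /* slow path: switch mode or flush */
        toy_drop_input(s);
        if (toy_flush(s) != 0)
            return TOY_EOF;
        s->wcur = s->buf;
        s->wend = s->buf + TOY_BUFSIZ;
    }
    *s->wcur = (unsigned char)c;      /* fast path ends up here directly */
    return *s->wcur++;
}

/* Only one window is ever non-empty, so the position is unambiguous. */
size_t toy_tell(const struct toy_stream *s)
{
    return s->base + (size_t)(s->rcur - s->buf) + (size_t)(s->wcur - s->buf);
}
```

          Calling `toy_getc()`, then `toy_putc()`, then `toy_getc()` on a stream opened over "ABCD" switches modes twice without any explicit flush from the caller.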

          • cryptonector 6 hours ago
            I get that. One buffer that can be maximized by the path that most needs it (read or write). I'm just saying that notionally it's two independent buffers, which removes the need to force a buffer flush on every mode change.

            > I don't think an implementation with two active, non-empty buffers is all that useful because you can't tell which buffer's progress should be used for the file pointer adjustment in ftell.

            Oh interesting. The other problem is that two buffers reduce memory utilization.

  • raldi 20 hours ago
    I'm having trouble following whether the problem occurs with any append or only when it's two consecutive commands like this.
  • userbinator 1 day ago
    I was a bit disappointed that the article didn't go into the system calls themselves, since AFAIK those have always supported interleaved reads and writes with no problems even on early Unices. E.g. POSIX has this:

    https://pubs.opengroup.org/onlinepubs/9699919799/functions/w...

    > After a write() to a regular file has successfully returned:

    > Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.
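
    For comparison, a small sketch at the system call level, assuming a POSIX system. The function name `rw_interleave_demo` and the scratch-file handling are hypothetical. It does a `read()` directly followed by a `write()` on the same descriptor, with nothing in between.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical demo: create `path`, interleave write/read/write on one
 * descriptor, and copy the final 4-byte file contents into `out`.
 * Returns 0 on success, -1 on failure. The scratch file is removed. */
int rw_interleave_demo(const char *path, char out[5])
{
    assert(path != NULL && out != NULL);
    memset(out, 0, 5);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    char b = 0;
    int ok = write(fd, "AB", 2) == 2            /* offset is now 2 */
          && lseek(fd, 1, SEEK_SET) == 1        /* pick the byte to read */
          && read(fd, &b, 1) == 1 && b == 'B'   /* sees the data just written */
          && write(fd, "CD", 2) == 2            /* write directly after a read */
          && lseek(fd, 0, SEEK_SET) == 0
          && read(fd, out, 4) == 4;             /* whole file: "ABCD" */

    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

    The `lseek()` calls here only choose which bytes to touch. Unlike the `fseek()` that stdio requires, they are not needed to make a read-then-write sequence valid.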

    • PeterWhittaker 22 hours ago
      Perhaps because the article is specifically about the buffered f*() calls in stdio, and not the system calls?

      Though, as I offer that thought, the divergence between C and the system calls is definitely curious.

    • mystified5016 21 hours ago
      I get a real kick out of the different ways people pluralize Unixen. Unices is a good one.
  • purplesyringa 23 hours ago
    I must be missing something.

    The article lists three libcs (Open Watcom, Microsoft Visual C++ 6.0, IBM C/C++ 3.6 for Windows) from the good old times. Does the emulator link to Open Watcom, i.e., does it emulate DOS on machines about as old as DOS itself? What's the point here?

    • justin_ 18 hours ago
      I believe it is a bug in the emulator's implementation of COMMAND.COM. Often, these DOS "emulators" re-implement the standard commands of DOS, including the shell[1]. This is in addition to emulating weird 16-bit environment stuff and the BIOS.

      The bug can pop up in any C program using stdio that assumes it's fine to do `fread` followed immediately by `fwrite`. The spec forbids this unless a file-positioning call intervenes or the read hit end-of-file. To make matters more confusing, this behavior does _not_ seem to occur with modern libc implementations. Or at least, it works on my machine. I bet modern implementations are able to be more sane about managing different buffers for reading and writing.

      The original COMMAND.COM from MS-DOS probably did not have this problem, since at least in some versions it was written in assembly[2]. Even for a shell written in C, the fix is pretty easy: seek the file before switching between reading/writing.

      The title of this post is confusing, since it clearly _is_ a bug somewhere. But I think the author was excited about possibly finding a bug in libc:

      > Sitting down with a debugger, I could just see how the C run-time library (Open Watcom) could be fixed to avoid this problem.

      [1] Here's DOSBox, for example: https://github.com/dosbox-staging/dosbox-staging/blob/main/s...

      [2] MS-DOS 4.0: https://github.com/microsoft/MS-DOS/tree/main/v4.0/src/CMD/C...

      • rep_lodsb 14 hours ago
        The article is very vague about which emulator and COMMAND.COM it is about, and if they're integrated with each other. Can't be DOSBox, since it handles it correctly:

            C:\> echo AB> foo.txt
            C:\> echo CD>> foo.txt
            C:\> type foo.txt
            AB
            CD
        
        (Note that echo adds a newline, same as on real DOS, or even UNIX without "-n". This other shell doesn't for some reason.)

        The "real" COMMAND.COM, and all other essential parts of MS-/PC-/DR-DOS, have always been written in asm, where none of this libc nonsense matters.

        Also it annoys me greatly when people talk about "the C Library" as if it exists in some Platonic realm, and is essential to all software ever written.

    • AntiRush 23 hours ago
      The article is about compiling and running a program inside the emulator. When the unexpected behavior occurred, the author assumed it was a bug in the emulator.
      • purplesyringa 20 hours ago
        So if it's not a bug in the emulator, then it's a bug in COMMAND.COM? I don't think that's the case, surely it couldn't have been missed by Microsoft at the time. The article goes on to talk about fread/fwrite calls, but COMMAND.COM was written in assembly, I'm pretty sure it didn't link to any libc, and certainly not to Open Watcom -- why would MS use it instead of their own library?
        • grodriguez100 19 hours ago
          It is not a bug. The article explains that this is the expected behaviour.
          • purplesyringa 18 hours ago
            What is expected behavior? Surely `echo AB> foo.txt; echo CD>> foo.txt` producing `ABBC` is either a bug in COMMAND.COM, the emulator, or something else? That can't be correct.
    • stevage 14 hours ago
      There's a lot of weird missing details.