Eighteen Years of ABI Stability

(daniel.haxx.se)

117 points | by TangerineDream 6 hours ago

9 comments

  • nuancebydefault 4 hours ago
    I'm a bit confused by the usage of ABI here. I thought compatibility between apps and libs is on the API level, while the ABI sits between the machine (CPU instructions) and a low-level (curl?) lib?
    • rwmj 2 hours ago
      API would be the source level. ABI would be that you can mix the binary program and the binary libcurl library. (Curl is attempting to preserve both)
    • throw_a_grenade 3 hours ago
      Nope: API compatibility is about when you write the code, and ABI compatibility is about what happens to already-compiled code, but still between app and library. Say you compiled your app against the libcurl.so.4 file from curl 7.88 as packaged by Debian. What happens when, far in the future, you move your executable file, without recompiling, to a system which has curl 12345.67, but where the file is still libcurl.so.4? It should work fine; that is, the library should export the same symbols (maybe with some more added), with more or less the same functionality. Possibly implemented differently, maybe accepting more flags, but the stuff that was there will still be there.

      To parent's downvoters: would you kindly cut him some slack? It's OK to ask if you don't know. https://xkcd.com/1053/

      • DanielHB 2 hours ago
        A small example of how the two are not the same: in a C library, if you move the position of a field in a struct, it can break ABI compatibility (meaning you need to recompile your software) but not API compatibility (meaning you don't need to update your code).
        • thrtythreeforty 2 hours ago
          And the opposite is true too: you can break API (by renaming a field) while preserving the ABI (since it didn't move/change type).
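
          A minimal sketch of that direction (hypothetical struct, not from the article): the layout is bit-for-bit identical in both versions, so already-compiled callers keep working, but source that uses the old field name no longer compiles.

            /* v1 */
            typedef struct {
              char name[50];
              int age;
            } Person;

            /* v2: field renamed; same offset and type, so the ABI is intact,
               but source that says p.name no longer compiles (API break) */
            typedef struct {
              char full_name[50];
              int age;
            } Person;
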
  • tialaramex 4 hours ago
    Because this library is so very widely used, in practice it is subject to Hyrum's-law ABI breaks far beyond what is characterised in the SONAME. At the extreme, Hyrum's law gets you "spacebar heating"; this is not a demand that nothing must ever change, but an acknowledgement of our reality.

    Basically, even though Daniel might say "I didn't change the ABI", if your code worked before and now it doesn't, as far as you're concerned that's an ABI break. This particularly shows up with changed defaults, and with the removal of stuff that was "unused" except that you relied on it, so now your code doesn't work. Daniel brings up NPN because that seems easy for the public Internet, but there have been other examples where a default changed and, well, too bad: you were relying on something, and now it's changed, but you should have just known to set what you wanted and then you'd have been fine.

  • Semaphor 4 hours ago
    > “third party” transfers over FTP,

    Ohh, that takes me back. That feature was used heavily in the FXP warez scene (the one the proper warez people looked down on): you'd find vulnerable FTP servers to gain access to, and the best ones would support this. That way you could quickly spread releases over multiple mirrors without being slowed down by your home internet.

    • throw_a_grenade 3 hours ago
      Those were fun times, but torrents fit this use case now. You seed a chunk a single time and it magically gets distributed, like you said, without being slowed down by your home internet...

      That's progress, I believe.

  • exabrial 2 hours ago
    We have JavaScript code that won't compile after 3 months. A pretty sad state of affairs these days.
    • robviren 1 hour ago
      "Compile" is rather generous. The massive teetering dependency tree certainly is prone to breaking, but I am not sure I can draw much of a parallel between the API stability of JS and the ABI stability of libcurl. If anything, I can still properly render some pretty dang old websites just fine, and for the most part they will work, mainly because of their light use of JavaScript and probably no external dependencies. Your mileage will vary, though, if you go back to the browser-specific APIs from when all that browser-war nonsense was going on. I love to bash on JS as much as the next guy, but a large portion of the blame actually lies with the modern use of it, not with any inherent quality of the language breaking compatibility over time.
  • blenderob 2 hours ago
    What are some things you could do in a C project to cause ABI breakage?

    I ask this because I'd like to know what practices I might want to avoid to guarantee that there is no ABI breakage in my C project.

    • DanielHB 2 hours ago
      If you move fields around in a struct that is passed between a library and a library consumer, it will cause ABI breakage:

        typedef struct {
          char name[50];
          int age;
        } Person;
      
      vs

        typedef struct {
          int age;
          char name[50];
        } Person;
      
      
      Basically, anything that moves bytes around in memory for data structures that are passed around. And most API breakage is also ABI breakage.
    • CJefferson 2 hours ago
      One big thing to watch out for is structs. You can’t add extra members later.

      If you have a struct which might grow, don't actually make it part of the ABI: don't give users any way to find its size, and write functions to create, destroy and query it.
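
      A minimal sketch of that pattern (hypothetical names; libcurl's own CURL handle from curl_easy_init() works this way): the public header only forward-declares the struct, so its size and layout never become part of the ABI.

        /* widget.h -- public header: the layout is never exposed */
        typedef struct widget widget;   /* opaque: callers can't take sizeof */

        widget *widget_create(void);
        void    widget_destroy(widget *w);
        int     widget_age(const widget *w);

        /* widget.c -- private definition: fields can be added in any release */
        struct widget {
          char name[50];
          int  age;
        };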

      • masfuerte 1 hour ago
        The solution in old Win32 APIs was to have a length field as the first member of the struct. The client sets this to sizeof(the_struct). As long as structs are only ever extended the library knows exactly which version it is dealing with from the length.

        This got a bit messy because Windows also included compatibility hacks for clients that didn't set the length correctly.
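
        A minimal sketch of that convention (hypothetical names; a real Win32 example is OPENFILENAME's lStructSize): the library compares the caller-supplied size against the size it knows before touching newer members.

          typedef struct {
            unsigned size;        /* caller sets this to sizeof(Options)
                                     as defined by its own header version */
            int      retries;     /* v1 member */
            int      timeout_ms;  /* added in v2 */
          } Options;

          int configure(const Options *opt) {
            int timeout_ms = 1000;             /* default for v1 callers */
            if (opt->size >= sizeof(Options))  /* caller was built against v2+ */
              timeout_ms = opt->timeout_ms;
            /* ... use opt->retries and timeout_ms ... */
            return timeout_ms;
          }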

      • blenderob 2 hours ago
        > If you have a struct which might grow, don’t actually make it part of the ABI, don’t give users any way to find it’s size, and write functions to create, destroy and query it.

        Thanks! This is very insightful. What is the solution to this? If I cannot expose structs that might grow, what do I expose then?

        Or is the solution something like I can expose the structs that I need to expose but if I need to ever extend them in future, then I create a new struct for it?

        • GolDDranks 2 hours ago
          You expose the structs as opaque types behind a pointer, and expose functions that act on that pointer.
        • acuozzo 1 hour ago
          > What is a solution to this? If I cannot expose structs that might grow what do I expose then?

          Option 1: If allocating from the heap or somewhere otherwise fixed in place, then return a pointer-to-void (void *) and cast back to pointer-to-your-struct when the user gives it back to you.

          Option 2: If allocating from a pool, just return the index.
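
          A minimal sketch of both options (hypothetical names): either way, the struct layout stays private to the library.

            #include <stdlib.h>

            struct session { int retries; };  /* private: free to change later */

            /* Option 1: heap allocation handed out as void * */
            void *session_open(void) { return calloc(1, sizeof(struct session)); }
            void session_set_retries(void *h, int n) { ((struct session *)h)->retries = n; }

            /* Option 2: pool allocation handed out as a plain index */
            static struct session pool[64];
            int  pool_open(void);  /* returns an index, or -1 if the pool is full */
            void pool_set_retries(int h, int n) { pool[h].retries = n; }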

        • doctorpangloss 1 hour ago
          You reserve fields in them for future use.
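
          For example (a hypothetical layout): a later version can start using the spare bytes without changing the struct's size or the offsets of existing members, as long as callers are required to zero them.

            typedef struct {
              char name[50];
              int  age;
              unsigned char reserved[64];  /* must be zeroed today; a future
                                              version may assign it meaning */
            } Person;
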
    • rkahbv 2 hours ago
      If you alter signatures or data structures in the published header file, there is breakage.

      If you add new signatures or data structures, software compiled against the previous version should still work with the new version.
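
      For instance (a hypothetical function): changing an existing signature breaks both API and ABI, while adding a new symbol next to it breaks neither.

        /* v1 */
        int fetch(const char *url);

        /* v2, breaking: old binaries and old source both fail */
        int fetch(const char *url, int timeout_ms);

        /* v2, compatible: add a new symbol instead */
        int fetch_ex(const char *url, int timeout_ms);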

      In my opinion, the whole issue is more important on Windows than on Linux. On Linux you can just recompile the application against the new library, or keep both the old and the new soversion around.

      Some Linux distributions go into major contortions to make ABI stability work, and compiled applications that are supposed to work with newer distros still crash. It is a waste of resources.

    • throw_a_grenade 21 minutes ago
      The next biggest humanity-wide problem in this area will be 32/64-bit time, I believe. We store timestamps as seconds since 1970-01-01 00:00 UTC, and it turns out someone thought 32 bits would be enough for everybody, so the counter will wrap some time in 2038, which is less than 14 years in the future. How do you fix the fact that, on 32-bit systems (in the x86 world that means i386/i686, which is less and less common, but the world is not only x86), time-related functions return 32-bit-wide values? You either return wrong values or you break the ABI.

      Debian chose to do both: https://wiki.debian.org/ReleaseGoals/64bit-time . Wherever they could, they recompiled, changing package names from libsomething to libsomethingt64; where they couldn't recompile, the app still "works" (does not segfault) but links with a 32-bit-time library that just gets wrong values. Other distros had a flag day: they essentially recompiled everything and didn't bother with non-packaged stuff compiled against the old 32-bit libs, thus breaking ABI.
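
      A quick way to see which side of the fence a given 32-bit build is on (glibc has supported the _TIME_BITS=64 switch since 2.34; flipping it is exactly the ABI break described above for any time_t that crosses a library boundary):

        #include <stdio.h>
        #include <time.h>

        int main(void) {
          /* prints 4 on legacy 32-bit builds; 8 when built with e.g.
             gcc -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 on 32-bit glibc */
          printf("sizeof(time_t) = %zu\n", sizeof(time_t));
          return 0;
        }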

  • peterkelly 6 hours ago
    I wish developers of JavaScript frameworks had this level of commitment to stability.
    • kragen 5 hours ago
      I agree; it's admirable.

      I wish the Python core developers had even the level of commitment to stability that developers of JavaScript frameworks do. Instead they intentionally break API compatibility every single release, I suppose because they assume that only worthless ideas are ever expressed in the form of Python programs.

      • JimDabell 4 hours ago
        I’ve been writing Python for 25 years and can’t remember projects I work on ever having compatibility broken from one version to the next (with the exception of 2 to 3, of course). Occasionally it happens to a dependency, but it always seems to be something like Cython and not actual Python code.

        But then, I normally try to stay on the leading edge. I think it's more difficult if you leave it 2+ years between updates and ignore deprecation warnings. But with a year between minor releases, that leaves almost a two-year window for moving off deprecated things.

        I think that’s reasonable. I don’t experience the pain you describe, and I don’t get the impression that the Python project treats Python programs as “worthless”. The people working on Python are Python users too, why would they make their own lives difficult?

        • kragen 3 hours ago
          When you bring up "ignoring deprecation warnings", you're implicitly confirming what I said, despite explicitly denying it. Perhaps this is due to a misunderstanding, so I will clarify what I was saying.

          Nobody has to worry about ignoring deprecation warnings in libcurl, or for that matter in C, in English, in Unicode, or in linear algebra. There's no point at which your linear algebra theorems stop working because the AMS has deprecated some postulates. Euclid's theorems still work just as well today as they did 2000 years ago. Better, in fact, because we now know of new things they apply to that Euclid couldn't have imagined. You can still read Mark Twain, Shakespeare, or even Cicero without having to "maintain" them first, though admittedly you have to be careful about interpreting them with the right version of language.

          That's what it means for intellectual work to have lasting value: each generation can build on the work of previous generations rather than having to redo it.

          Last night I watched a Primitive Technology video in which he explains why he wants to roof his new construction with fired clay tiles rather than palm-leaf thatch: in the rainy season, the leaves rot, and then the rain destroys his walls, so the construction only lasts a couple of years without maintenance.

          Today I opened up a program I had written in Python not 2000 years ago, not 200 years ago, not even 20 years ago, but only 11 years ago, and not touched since then. I had to fix a bunch of errors the Python maintainers intentionally introduced into my program in the 2-to-3 transition. Moreover, the "fixed" version is less correct than the version I used 11 years ago, because previously it correctly handled filename command-line arguments even if they weren't UTF-8. Now it won't, and there's evidently no way to fix it.

          I wish I had written it in Golang or JS. Although it wasn't the case when I started writing Python last millennium, a Python program today is a palm-leaf-thatched rainforest mud hut—intentionally so. Instead, like Euclid, I want to build my programs of something more lasting than mere masonry.

          I'm not claiming that you should do the same thing. A palm-leaf-thatched roof is easier to build and useful for many purposes. But it is no substitute for something more lasting.

          Today's Python is fine for keeping a service running as long as you have a staff of Python programmers. As a medium of expression of ideas, however, it's like writing in the sand at low tide.

          • LegionMammal978 1 hour ago
            > Moreover, the "fixed" version is less correct than the version I used 11 years ago, because previously it correctly handled filename command-line arguments even if they weren't UTF-8. Now it won't, and there's evidently no way to fix it.

            Isn't fixing this the whole point of Python's "surrogateescape" handling? Certainly, if I put the filename straight from sys.argv into open(), Python will pass it through just fine:

              $ printf 'Hello, world!' > $'\xFF.txt'
              $ python3 -c 'import sys; print(open(sys.argv[1]).read())' $'\xFF.txt'
              Hello, world!
            
            Though I suppose it could still be problematic for logging filenames or otherwise displaying them.
          • JimDabell 2 hours ago
            > Nobody has to worry about ignoring deprecation warnings in libcurl, or for that matter in C, in English, in Unicode, or in linear algebra. There's no point at which your linear algebra theorems stop working because the AMS has deprecated some postulates. Euclid's theorems still work just as well today as they did 2000 years ago. Better, in fact, because we now know of new things they apply to that Euclid couldn't have imagined. You can still read Mark Twain, Shakespeare, or even Cicero without having to "maintain" them first, though admittedly you have to be careful about interpreting them with the right version of language.

            I mean, that last part really unravels your point. Linguistic meanings definitely drift significantly over time in ways that are vitally important, and there are no deprecation warnings about them.

            Take the second amendment to the USA constitution, for example. It seems very obviously scoped to “well-regulated militias”, but there are no end to the number of gun ownership proponents who will insist that this isn’t what was meant when it was written, and that the commas don’t introduce a dependent clause like they do today.

            Take the Ten Commandments in the Bible. It seems very obvious that they prohibit killing people, but there are no end to the number of death penalty proponents who are Christian who will insist that what it really prohibits is murder, of which state killings are out of scope, and that “thou shalt not kill” isn’t really what was meant when it was written.

            These are very clearly meaningful semantic changes. Compatibility was definitely broken.

            If “you have to be careful about interpreting them with the right version of the language”, then how is that any different to saying “well just use the right version of the Python interpreter”?

            > Today I opened up a program I had written in Python not 2000 years ago, not 200 years ago, not even 20 years ago, but only 11 years ago, and not touched since then. I had to fix a bunch of errors the Python maintainers intentionally introduced into my program in the 2-to-3 transition.

            In your own words: You have to be careful about interpreting it with the right version of the language. Just use a Python 2 interpreter if that is your attitude.

            I don’t believe software is something that you can write once and assume it will work in perpetuity with zero maintenance. Go doesn’t work that way, JavaScript doesn’t work that way, and Curl – the subject of this article – doesn’t work that way. They might’ve released v7.16.0 eighteen years ago, but they still needed to release new versions over and over and over again since then.

            There is no software in the world that does not require maintenance – even TeX received an update a few years ago. Wanting to avoid maintenance altogether is not achievable, and in fact is harmful. This is like sysadmins who are proud of long uptimes. It just proves they haven’t installed any security patches. Regularly maintaining software is a requirement for it to be healthy. Write-once-maintain-never is unhealthy and should not be a goal.

  • jedisct1 53 minutes ago
    Take note, Rust.
  • ForHackernews 3 hours ago
    > Before this release, you could use curl to do “third party” transfers over FTP, and in this release you could no longer do that. That is a feature when the client (curl) connects to server A and instructs that server to communicate with server B and do file transfers among themselves, without sending data to and from the client.

    That sounds like a super useful feature that would be great if more FTP servers supported it. I guess FTP itself is a dying protocol these days, but it's extremely simple and does what it says on the tin.

    • raxxorraxor 2 hours ago
      Sadly, FTP never seemed to get a smooth TLS transition. There is FTPS (and, separately, SFTP), but I have rarely seen it in use.

      I think it will survive as a protocol, as a fallback mechanism. Ironically, I have used FTP on a smartphone here and there because smartphone OSes are abysmally useless. Don't get me started on your awesome proprietary sync app; I don't do trashy, and they all are.

      Otherwise I do everything today through scp and http, which is technically less optimal; it just happens to be widely available. FTP would theoretically provide a cleaner way to do transfers and permission management.

    • crabbone 3 hours ago
      FTP is still the best way to transfer files between a phone and a PC.

      Well, Android anyway. I don't know how things work in the Apple world. It's bizarre that whatever the "official" method of file transfer is, it's so bad. Also, managing files on Android is, on its own, very bad. FTP allows connecting a decent file manager to the phone and doing the management externally.

  • formerly_proven 6 hours ago
    libcurl is part of the macOS API and a de-facto standard on any Linux box, and it is commonly available on BSD boxen as well. Microsoft has been shipping curl.exe for a while too, though not the library.

    If Microsoft also shipped the library in System32, we would have a truly cross-platform, stable, OS-provided and OS-patched high-level network protocol client.

    (So that probably won't happen)

    • actionfromafar 4 hours ago
      Hm, since a DLL is a PE file and an EXE is a PE file, could one load the EXE as a DLL instead?

      Edit: I had a recollection that I'd seen something like this before, and this might be it: https://www.codeproject.com/articles/1045674/load-exe-as-dll...

      • Pathogen-David 3 hours ago
        I just checked and the curl.exe on my system does not export any symbols, so not in this case.

        It is possible to do that in the general sense though.
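
        For the curious, a sketch of how one might check that from code, using the real Win32 calls LoadLibraryA and GetProcAddress (note that loading an EXE this way can fail outright if the image can't be mapped like a DLL):

          #include <windows.h>
          #include <stdio.h>

          int main(void) {
            HMODULE mod = LoadLibraryA("C:\\Windows\\System32\\curl.exe");
            if (!mod) { puts("could not map the image"); return 1; }
            /* curl_version() is exported by libcurl.dll, not by curl.exe,
               so on a stock system this is expected to print NULL */
            FARPROC fn = GetProcAddress(mod, "curl_version");
            printf("curl_version export: %p\n", (void *)fn);
            FreeLibrary(mod);
            return 0;
          }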

    • hypeatei 4 hours ago
      > Microsoft has been shipping curl.exe for a while as well, though not the library.

      I'm not sure that's accurate. Why do they include a default alias in PowerShell for `curl` that points to the `Invoke-WebRequest` cmdlet, then?

      I've always installed curl myself and removed the alias on Windows. Maybe I've never noticed the default one because of that.

      • easton 4 hours ago
        As of one of the later releases of Windows 10, "curl.exe" is in System32 (or somewhere else on PATH), but if you type "curl" in PowerShell you get the alias. You need to type "curl.exe" to get the binary.

        Guessing this is for backwards compatibility with scripts written for the days when it was just PowerShell lying to you.

        https://curl.se/windows/microsoft.html

    • pjmlp 4 hours ago
      There are more OSes out there.

      I never needed curl on Windows, because on OSes that provide a full-stack developer experience, such things are part of the platform SDK and of rich language runtimes.

      It is only an issue with C and C++, and their reliance on POSIX to complement the slim standard library, effectively making UNIX their "runtime" for all practical purposes.

      • fallingsquirrel 4 hours ago
        You missed the point. There are more OSes out there. If you had to relearn a brand new "full stack experience" for every OS just to ship a cross-platform app, you wouldn't have any time left for business logic. libcurl is great and it runs everywhere.

        And now for a personal opinion: I'll take libcurl over .NET's IHttpClientFactory any day.

        • pjmlp 3 hours ago
          I have been programming since the glory days of 8-bit home computers (try shipping cross-platform applications across Z80 and 6502), and we still had plenty of time left to do lots of stuff.

          Additionally, writing little wrappers around OS APIs is something every C programmer has known how to do since K&R C went outside UNIX V6, which again is why POSIX became a thing.

        • neonsunset 3 hours ago
          What? You don't need to use it explicitly most of the time.

          Just `new HttpClient` and cache/dispose it. Or have DI inject it for you. It will do the right thing without further input.

          The reason this factory exists is to pool the underlying HttpMessageHandlers, which hold an internal connection pool for efficient connection/stream reuse (in the case of HTTP/2 or QUIC) in large applications that use DI.