1. hand-written minimal ELF headers, with enough asm to do `_exit(main(argc, argv))`: https://github.com/DavidBuchanan314/kurl/blob/main/golfed/el... (currently only implemented for aarch64)
2. "Linux Syscall Support" library for conveniently making raw syscalls from C: https://chromium.googlesource.com/linux-syscall-support/
3. To avoid custom linker scripts (which I hate with a passion), I embed my hand-crafted ELF within a regular ELF, and slice it out at the end (using a python script). The "container" ELF is a regular full-fat ELF, potentially including working debug symbols, but the inner ELF has none of the cruft.
Using this technique, I wrote a barely-functional TLS 1.3 client that fits in ~3.5KB (see the rest of the repo from the first link).
The main goal of BGGP5 is to download the file at https://binary.golf/5/5 and display its contents, using less than 4KB of code (stored in whatever format you like).
Tiny disclaimer: As part of the BGGP staff team I knew about the theme in advance, and I absolutely could not resist getting started a few days early. This entry is more about being cool than being competitive, so I hope you can forgive me!"
"A valid submission will:
Be 4096 bytes or less
Download the text file at https://binary.golf/5/5
Display the file's contents in some way
Example Entry:
#!/bin/sh
cat 5 "
Are we excluding the size of sh, wget and cat?
What is the size of busybox with ssl_client as the only applet and wolfssl as the TLS library?
Yes. It's not very interesting, but you can do that.
> What is size of busybox with ssl_client as the only applet and wolfssl as the TLS library
Larger than 4096 bytes.
lol why? i mean the syntax sucks but this seems like howling into the wind...
[1] https://github.com/torvalds/linux/tree/master/tools/include/...
[2] https://github.com/boricj/ghidra-delinker-extension/tree/mas...
Isn't a freestanding environment one without an OS? The author in the article explicitly codes against Linux syscalls and is creating an ELF file (so a hosted executable).
I've written an article about this idea:
https://www.matheusmoreira.com/articles/linux-system-calls
You can get incredibly far with just this. I wrote a freestanding lisp interpreter with nothing but Linux system calls. It turned into a little framework for freestanding Linux programs. It's been incredibly fun.
Freestanding C is a much better language. A lot of legacy nonsense is in the standard library. The Linux system call interface is really nice to work with. Calling write is not that hard. It's the printf style string building and formatting that I sometimes miss.
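To give an idea of what replaces printf in that setting, here is a minimal sketch of a decimal formatter that depends on nothing from libc. The name `format_u64` is made up for illustration and is not taken from the lisp interpreter or the article; the resulting bytes would then be handed to a raw write system call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the decimal digits of `value` into `out`
 * (which must hold at least 20 bytes) and returns the number of bytes
 * written. No terminating NUL is added, since write(2) takes a length. */
static size_t format_u64(char *out, uint64_t value)
{
    char tmp[20];               /* UINT64_MAX has 20 decimal digits */
    size_t n = 0;

    /* Produce digits least significant first. */
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    /* Reverse them into the caller's buffer. */
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];

    return n;
}
```

It is nowhere near printf, but helpers of this size cover a surprising amount of day-to-day output.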
Linux kernel is known to be able to run binaries compiled in the 90s. Breaking user space makes Linus yell at people until the breakage gets reverted. A platform that stable is worth building on top of. Updating executables is a lot of work, sometimes it's straight up impossible.
Not on FreeBSD, NetBSD, OpenBSD or Solaris.
The article you linked says this but it's not true:
> Sometimes it's not even possible to use system calls at all. OpenBSD has implemented system call origin verification, a security mechanism that only allows system calls originating from the system's libc. So not only is the kernel ABI unstable, normal programs are not even allowed to interface with the kernel at all.
You can still make system calls from normal programs, you just need to list the addresses of system call instructions in an ELF section named openbsd.syscalls.
Can you cite any sources? I wasn't able to find any documentation that corroborates what you said when I wrote the article. The few texts I found actually suggested otherwise. Maybe things have changed since then?
> You can still make system calls from normal programs, you just need to list the addresses of system call instructions in an ELF section named openbsd.syscalls.
I see. So they have added a mechanism to list the sections allowed to perform system calls. That's news to me. Do they guarantee the system call numbers will remain stable though? That older system calls will remain available?
For one, the FreeBSD kernel specifically has a compatibility layer for Linux binaries to use their familiar syscalls [0]. For its ordinary syscalls, it also has a policy not to break binary compatibility without good reason [1]. Most other OSes just don't maintain quite the level of 'indefinite stability' that the Linux kernel does across different versions. And even Linux doesn't implement older versions of syscalls when the kernel is ported to new architectures, so eventually you have to rotate your implementation regardless, if you want people to run your code on new systems.
> The few texts I found actually suggested otherwise.
People often say "X is impossible" when the truth is "X is tricky and full of caveats, and I don't want to think about it, so stop asking". (Or if the devs themselves are saying it, it might be "I want to look like I'm 'tough on crime' toward users of undocumented behavior", as if that could stop Hyrum's law from running its course.) In this case, it's generally "If you do it on an OS other than Linux, you can run into big compatibility issues," not "It's impossible on OSes other than Linux."
As for compatibility issues, you're running into that the moment you do undocumented fun stuff like omitting ELF sections or overlapping headers, which future Linux versions could start rejecting on the basis of "no one needs to do that legitimately". So I wouldn't start drawing the line on syscall number compatibility.
[0] https://docs.freebsd.org/en/books/handbook/linuxemu/
[1] https://wiki.freebsd.org/AddingSyscalls#Backward_compatibily
I believe this strengthens my argument. Linux kernel-userspace interface is so stable other projects are implementing it. I remember Justine Tunney mentioning this before, the idea that the x86_64 Linux system call ABI is turning into some kind of lingua franca of systems programming.
> x86-64 Linux ABI Makes a Pretty Good Lingua Franca
Would be interesting if people started targeting Linux because of this, banking on the fact that other systems will just implement Linux. Even Windows has Linux built into it these days.
> For its ordinary syscalls, it also has a policy not to break binary compatibility without good reason.
Thank you for the source. I don't think that's a particularly strong guarantee. It's certainly stronger than OpenBSD's at least.
> Most other OSes just don't maintain quite the level of 'indefinite stability' that the Linux kernel does across different versions
Yeah. I think this is something that makes Linux unique.
> And even Linux doesn't implement older versions of syscalls when the kernel is ported to new architectures, so eventually you have to rotate your implementation regardless, if you want people to run your code on new systems.
That's true. Only new architectures are affected though. The old ones have all the old system calls, many with multiple versions, all supported. Porting to a new architecture doesn't invalidate the stability of existing ones.
> People often say "X is impossible" when the truth is "X is tricky and full of caveats, and I don't want to think about it, so stop asking".
> Or if the devs themselves are saying it, it might be "I want to look like I'm 'tough on crime' toward users of undocumented behavior"
I get what you're saying. I truly apologize if I came across that way. I did not mean to say that.
I got interested in this low level direct system call stuff because I literally got sick of reading "but you, mere mortal, are not meant to access these raw system interfaces, that's for us, you are meant to call the little library function we made for you" in the Linux and libc manuals. Last thing I want is to end up doing the same to others.
By "can't do this" I meant to say the developers maintaining the system don't want you bypassing their system libraries and won't take responsibility for it if you do so. If the program breaks because the kernel interfaces changed, they'll tell us it's our own fault and refuse to fix it.
Linux takes the opposite approach: breaking user space makes Linus Torvalds yell at people until the breakage is reverted. I'm enthusiastic about it because it's the only system where this is supported.
> As for compatibility issues, you're running into that the moment you start doing undocumented fun stuff like omitting ELF sections or overlapping headers
I agree. Should be fine as long as the ELF specification is respected. It's okay though, ELF is flexible enough that even in 2024 it's possible to invent some new fun stuff.
https://www.matheusmoreira.com/articles/self-contained-lone-...
Embedding arbitrary files into an existing ELF and patching it so that Linux automatically maps it in before the program even runs. Since Linux gives processes a pointer to the program headers, the file is in memory and reachable without issuing a single system call.
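A rough sketch of the "pointer to the program headers" part, assuming a 64-bit Linux target. For brevity it uses libc's `getauxval` instead of walking the auxiliary vector on the initial stack as a freestanding program would, and it only counts `PT_LOAD` segments; `count_load_segments` is a hypothetical name, and the linked article's actual lookup may differ.

```c
#include <elf.h>
#include <stddef.h>
#include <sys/auxv.h>

/* Hypothetical helper: counts PT_LOAD segments by reading the program
 * header table the kernel already mapped into this process. */
static size_t count_load_segments(void)
{
    /* AT_PHDR: address of the program headers, AT_PHENT: size of one
     * entry, AT_PHNUM: number of entries. */
    const unsigned char *base = (const unsigned char *)getauxval(AT_PHDR);
    size_t entsize = getauxval(AT_PHENT);
    size_t count = getauxval(AT_PHNUM);
    size_t loads = 0;

    if (base == NULL || entsize < sizeof(Elf64_Phdr))
        return 0;

    for (size_t i = 0; i < count; i++) {
        const Elf64_Phdr *ph = (const Elf64_Phdr *)(base + i * entsize);
        if (ph->p_type == PT_LOAD)
            loads++;
    }
    return loads;
}
```

The same loop could look for whichever segment describes the embedded file and compute its address from `p_vaddr`.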
Personal experience.
> Do they guarantee the system call numbers will remain stable though?
No. Doesn't mean you can't make system calls from outside the libc though.
The problem is the system's developers don't want us bypassing those libraries. We can do it but things can and probably will break in the future when they change things. It's not supported.
I'm pretty sure that MVS syscalls (that is, the numbers you use with the SVC opcode) have remained backward-compatible at least as far back as MVS 3.8 in the 1970s and those binaries making those "raw" syscalls will still work on the latest z/OS releases.
There are a _lot_ more operating systems than Linux, Windows, and the BSDs... making a statement that the Linux kernel is the only kernel to do something a certain way is a risky proposition :-)
The Linux promise:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
I've always been all about the hidden fun stuff. The magical little programs that somehow configure audio cards. The ALSA mixer tool for example does it via special ioctls. I was reading its source code not too long ago. The manuals said those definitions were for the curious and that those ioctls were private, as though it was the library author's exclusive privilege to use those things. I seriously hate it when they say that. When they imply I'm some mere mortal who's better off using the libraries that were gifted to us by the gods of programming.
Good or bad, quite a bit of hubris is involved. Takes a certain audacity to think I can make a better wheel than people who are probably much smarter than I am. Sometimes I start projects just to prove to myself that I'm not clinically insane for thinking a better way is possible. Sometimes it works, sometimes it doesn't. Someone once called an idea I had schizophrenic. I'll never forget that day.
This Linux system call stuff started after I read an LWN article about glibc and Linux specific system call support, getrandom to be specific. Took glibc years to add support. I started a liblinux project because of that article. The idea was to get rid of libc and talk to Linux directly. In order to accomplish that, I was forced to learn a lot of compiler, linker and executable stuff. The musl libc source code taught me a lot.
It seems like the C library is doing a huge amount of stuff but it turns out you don't actually need most of it. Linux just puts your binary in memory and jumps into some address specified in the ELF header. Normally this is when the C library or dynamic linker takes over in order to prepare to call main(). Turns out I can just replace all that with some simple code that calls a function and then exits the process when it returns. It just works. I won't have init/fini section processing but I can live with that, that's harmful stuff that shouldn't even have been invented to begin with.
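The "address specified in the ELF header" is the `e_entry` field. A small sketch for peeking at it, assuming a 64-bit ELF file; this is an ordinary hosted inspection helper, `read_elf_entry` is a hypothetical name, and it is not part of liblinux.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the ELF header of `path` and stores its
 * e_entry field in `*entry`. Returns 0 on success, -1 on failure. */
static int read_elf_entry(const char *path, unsigned long long *entry)
{
    Elf64_Ehdr hdr;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    size_t got = fread(&hdr, 1, sizeof hdr, f);
    fclose(f);
    if (got != sizeof hdr)
        return -1;

    /* Check the "\177ELF" magic and that this is a 64-bit file. */
    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;

    *entry = hdr.e_entry;
    return 0;
}
```

For a position independent executable the value is relative to wherever the kernel ends up loading the image, so it will not match the runtime address directly.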
https://www.muppetlabs.com/~breadbox/software/tiny/
They sparked my interest in ELF and freestanding programs.
It's a webserver written in x86 assembler, which makes raw syscalls. It has no functions, and unmaps the stack so it uses only one 4KB page of memory at runtime.
I wrote this page for my own compiler that I'm working on, but I think it would be a good complement to this article. Note that the page is not that great on mobile, the extra real estate on desktop really helps.
A printf-hello-world is about 1 KiB. A write-hello-world (syscalls only) is less than 200 bytes. Assembly programming skills not needed to use it.
It reminds me of a funny little bug in ARM Linux, fixed by adding volatile to an asm statement: https://lore.kernel.org/lkml/92a00580828a1bdf96e7e36545f6d22...
Adding an output for the %rax result would prevent the call from being omitted without volatile (assuming it is actually consumed by something), but it could still be reordered, right? I suppose with general syscalls that might be okay, but certainly not with sys_exit().
They also need memory clobbers, but I don't think memory clobbers would necessarily prevent reordering? In the case of the ARM bug though, it did: https://lore.kernel.org/lkml/Zqa4SAyPKPuaXdgg@mozart.vkv.me/
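For reference, a sketch of a three-argument wrapper with both `volatile` and a "memory" clobber, covering x86_64 and aarch64 only. The names `raw_syscall3` and `raw_write` are made up here and this is not the code from the linked patches. It does not settle the reordering question: `volatile` keeps the statement from being dropped and the clobber makes the compiler assume memory is read and written, but how far either constrains movement relative to other code is exactly what is being debated.

```c
#include <stddef.h>

/* Hypothetical wrapper: issues a Linux system call with three arguments
 * and returns the raw kernel result (negative errno on failure). */
static long raw_syscall3(long nr, long a, long b, long c)
{
#if defined(__x86_64__)
    long ret;
    /* The syscall instruction overwrites rcx and r11, so both are
     * listed as clobbers next to "memory". */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a), "S"(b), "d"(c)
                      : "rcx", "r11", "memory");
    return ret;
#elif defined(__aarch64__)
    register long x8 __asm__("x8") = nr;
    register long x0 __asm__("x0") = a;
    register long x1 __asm__("x1") = b;
    register long x2 __asm__("x2") = c;
    __asm__ volatile ("svc #0"
                      : "+r"(x0)
                      : "r"(x8), "r"(x1), "r"(x2)
                      : "memory");
    return x0;
#else
#error "this sketch only covers x86_64 and aarch64"
#endif
}

/* write(2) is system call 1 on x86_64 and 64 on aarch64. */
static long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__x86_64__)
    const long nr_write = 1;
#else
    const long nr_write = 64;
#endif
    return raw_syscall3(nr_write, fd, (long)buf, (long)len);
}
```

Because the result is a real output operand, a caller that checks the return value also gives the compiler a data dependency on the call, which is the point made above about %rax.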
For me it would be sort of like writing programs in C versus higher level languages: much more tedious, will take longer and require better planning/upfront design, but doable.
With practice you learn some tricks that can seem clever to anyone not writing a lot of asm. It’s “just” a very low level language IMO.
That'd be a good introduction to assembly for someone who already knows C well.
2. Why is it that exiting at the end of main() requires a system call? Wouldn't a `ret` instruction go "back" to someplace where the OS itself will do cleanup work?
Usually that's done by the C runtime library, but there isn't one there since this is a freestanding environment. Had the program not exited through a syscall (or entered an infinite loop), it would most likely crash after veering off the main() function.
The only way for execution to cross the barrier between "user space" and "kernel space" is through a system call or an interrupt (we won't speak of call gates). Even if the OS had put an address on the stack, so that the "ret" would go there after returning from main(), the code there would still need to do a system call to go back to the OS.
While nowadays Linux has a shared page of code mapped on every process (the vDSO), that wasn't the case in the past; all code on the "user space" side had to come from either the executable itself, or a library it loaded. Given that, it's natural that it was left to the executable to call the "exit" system call at the end.
A return instruction from main hands things back to libc which does some cleanup and then makes this same syscall.
I did design my own runtime binary executable/dynamic library format which I do embed in an ELF capsule to be loaded by legacy systems. The thing I need to port though is the core user level drivers: vulkan/drm & alsa-lib. The main issue would be the alsa-lib since some part of its API still "requires" a C runtime (you have to call free() on some returned data).
The issue with this "format": it is so simple that I wonder whether each piece of software exposing a "dynamic library/user level system interface" would be better off designing its own minimal, giga-simple "dynamic library" format, tailored to its semantics.
Dunno yet.
On modern hardware architectures, you load position independent memory segments (code and data). You only need to know the alignment requirement and you are good to go.
Basically, a magic with the alignment, then a table of offsets or re-entrant code (possible on modern hardware architectures which support try-lock hardware semantics) right after the "header". I chose to use the re-entrant code guarded with a hardware try-lock mechanism, because it is more generic and will be cleaner in the long run than a table of offsets.
Bending the product of code generators (assemblers) into some runtime format was a good idea until most hardware architectures support a hardware try-lock mechanism, then it became really nasty legacy.
If anything, PE piggybacks on top of COFF which is a complete mess of a file format. I'm currently writing a standalone library for reading and writing toolchain file formats [2] (to replace some messy bespoke code in my Ghidra extension) and this under-specified, fragmented into multiple dialects, weirdly contorted relic is a pain to deal with.
COFF was a stepping stone from a.out to ELF that should've lasted only a couple of years on Unix systems and somehow it managed to metastasize at a crucial point in time inside multiple software ecosystems, most notably Windows and indirectly .NET and UEFI through PE. Frankly, I'd ask instead why couldn't PE and COFF have lost.
[1] https://nathanotterness.com/2021/10/tiny_elf_modernized.html