The unspoken responsibility



We have an embarrassing problem.

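There is something very important in the four essential software freedoms, which are defined as: the freedom to run the program as you wish, for any purpose (freedom 0); the freedom to study how the program works and change it (freedom 1); the freedom to redistribute copies (freedom 2); and the freedom to distribute copies of your modified versions (freedom 3).

Most people seem to skip what that means and what it implies. So I think it is best if I just remind everyone: access to the source code is a precondition for freedoms 1 and 3.

But there was a quiet part left unspoken.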

Having source code doesn't mean anything if you can't build it.

Even worse than that, even if you can build it, you can't know whether the functionality of the code you wrote corresponds to the functionality of the binary you produced. Ken Thompson himself demonstrated this in his ACM Turing Award lecture, "Reflections on Trusting Trust".

Let us just be honest with ourselves and each other for a minute and realize that bootstrapping from an auditable root is a seemingly impossible task, one that government agencies have worked on for decades, with billions of dollars spent, without success.

We lost that connection and abandoned our responsibilities for far too long. We need to (re)build those bootstrap chains back to the Free Software Foundation we depend upon. We need to have bootstrap chains for every language that exists, and that is a huge job.

We literally had to make code changes just to build tools such as GNU Bison from source code, because otherwise it was impossible to build them without depending on "pregenerated files", a term which sounds harmless but is as bad as distributing binary blobs. It took a brilliant developer months to solve Guile's psyntax-pp bootstrap problem, only for the next Guile release to break it.

What should have taken us a weekend hackathon took 6 years and more than a dozen brilliant programmers, and the problem is only getting worse, faster and faster.

So we bootstrapped GCC from a 357-byte bootstrap binary.

Now I can hear you saying: but you still depend on a POSIX kernel, so you really didn't resolve all of your responsibility.

You are absolutely right. We haven't entirely solved that yet and I have much more work ahead of me, looking like a fool doing kernel work, but...

we figured out how to do BIOS calls from protected mode, so we actually are not that far from our short-term goals, thanks to GNU Guix and the amazing folks in the Reproducible Builds community. But preserving these bootstrap chains will require our community to first take up its responsibilities and make sure people can actually build the source code without pregenerated files.


Well I guess I should explain this work we have done.

hex0

We needed a language that could be audited, understood and implemented in a few hundred bytes. Unfortunately, we found no language that could honestly be considered source, because all of them ended up being like brainfuck. So I guess I just have to do something stupid.

So we took hex and added line comment support, but that wasn't good enough, so we added even more line comment support. It probably will never be considered a "programming language" nor "source code", but it kinda works, in the making-art-out-of-individually-colored-grains-of-sand way.

Oh, and if you don't trust me or my binaries, great. Here is the grand total of functionality you would need to implement to replace our bootstrap root:

sed 's/[;#].*$//g' hex0_x86.hex0 | xxd -r -p > hex0
chmod +x hex0
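For anyone who would rather read that pipeline as a program, here is a minimal Python sketch of the same idea: drop everything from `;` or `#` to the end of the line, then turn the remaining hex digits into bytes. The function name `hex0_to_bytes` is made up for illustration and this is not the actual hex0 implementation; unlike `xxd`, it raises an error on stray non-hex characters.

```python
import re


def hex0_to_bytes(source: str) -> bytes:
    """Decode hex0-style text: hex digit pairs plus ';' and '#' line comments."""
    # Same pattern as the sed command: strip from ';' or '#' to end of line.
    without_comments = re.sub(r"[;#].*$", "", source, flags=re.MULTILINE)
    # Remove all whitespace so only the hex digits remain.
    digits = "".join(without_comments.split())
    # bytes.fromhex raises ValueError on odd-length or non-hex input.
    return bytes.fromhex(digits)


example = """
7F 45 4C 46   ; the ELF magic number
# a whole-line comment
01
"""
binary = hex0_to_bytes(example)  # five bytes: 7F 45 4C 46 01
```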

But guess what, making art out of individual grains of colored sand is actually a popular hobby, so let's just use it to build a better language.

hex1

So I don't know about you, but having to manually count every byte to figure out addresses and relative offsets is a huge pain (especially the relative offsets for jumps, so let's fix those first).

So we made a big table of single character names and added single character labels.

We only really need to support relative offsets of a single size so that is fine.

So hex1 is just hex0 with single-character label names and a single relative offset size. SOOOOOOOOOO much better and less tedious, but definitely not good enough for our girl.
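To make the label idea concrete, here is a rough two-pass sketch in Python. The syntax is an assumption for illustration: `:x` defines label `x`, `%x` emits a 4-byte little-endian offset to `x` measured from the end of that offset field (the x86 jump convention), and tokens must be separated by whitespace. The real hex1 may differ in details, and `assemble_hex1` is a hypothetical name.

```python
def assemble_hex1(source: str) -> bytes:
    """Sketch of a hex1-like assembler: hex bytes, labels, relative offsets."""
    tokens = []
    for line in source.splitlines():
        for marker in "#;":                 # strip line comments, as in hex0
            line = line.split(marker, 1)[0]
        tokens.extend(line.split())

    # Pass 1: walk the tokens and record the address of every label.
    labels = {}
    address = 0
    for tok in tokens:
        if tok.startswith(":"):
            labels[tok[1:]] = address
        elif tok.startswith("%"):
            address += 4                    # assumed single offset size: 4 bytes
        else:
            address += len(bytes.fromhex(tok))

    # Pass 2: emit bytes, replacing each %x with target minus next address.
    out = bytearray()
    for tok in tokens:
        if tok.startswith(":"):
            continue
        if tok.startswith("%"):
            displacement = labels[tok[1:]] - (len(out) + 4)
            out += displacement.to_bytes(4, "little", signed=True)
        else:
            out += bytes.fromhex(tok)
    return bytes(out)


# A nop followed by a jump back to it; the offset works out to -6.
loop = assemble_hex1(":a 90 E9 %a")  # bytes: 90 E9 FA FF FF FF
```

Two passes are needed because a jump can refer to a label that is only defined later in the file.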

hex2

Well, she would probably want different immediate offset sizes, long label name support and to never have to manually calculate absolute addresses again. And I can probably support objects without too much pain. OK, this isn't so bad. Really tedious, but nothing objectively error-prone anymore. Still not good enough for her, though, as this is effectively just a linker language designed for humans to write and read.
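A sketch of what those additions buy you, again in Python and again with assumed syntax: `:name` defines a long label, `!name` is a 1-byte relative offset, `%name` a 4-byte relative offset and `&name` a 4-byte absolute address. The `base_address` parameter and the name `assemble_hex2` are hypothetical, and the real hex2 supports more than is shown here.

```python
POINTER_SIZES = {"!": 1, "%": 4, "&": 4}   # assumed sigils and byte widths


def assemble_hex2(source: str, base_address: int = 0) -> bytes:
    """Sketch of a hex2-like linker: long labels, several pointer kinds."""
    tokens = []
    for line in source.splitlines():
        for marker in "#;":                 # strip line comments
            line = line.split(marker, 1)[0]
        tokens.extend(line.split())

    # Pass 1: assign an absolute address to every label.
    labels = {}
    address = base_address
    for tok in tokens:
        if tok[0] == ":":
            labels[tok[1:]] = address
        elif tok[0] in POINTER_SIZES:
            address += POINTER_SIZES[tok[0]]
        else:
            address += len(bytes.fromhex(tok))

    # Pass 2: emit bytes and resolve each pointer.
    out = bytearray()
    for tok in tokens:
        kind, name = tok[0], tok[1:]
        if kind == ":":
            continue
        if kind == "&":                     # absolute address of the label
            out += labels[name].to_bytes(4, "little")
        elif kind in POINTER_SIZES:         # relative to the end of the field
            size = POINTER_SIZES[kind]
            here = base_address + len(out)
            out += (labels[name] - (here + size)).to_bytes(size, "little", signed=True)
        else:
            out += bytes.fromhex(tok)
    return bytes(out)


program = assemble_hex2(":loop 90 EB !loop B8 &loop", base_address=0x1000)
```

Nobody has to compute the `0x1000` by hand anymore; moving the code only means changing the base address.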

M0

You know what? She wants human names, immediate values and not having to manually convert strings to hex blocks. It can't be that bad to implement a simple assembly language, right? Well, that was easy: just DEFINE your assembly instruction and use it. But she deserves a real language.
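The DEFINE trick is really just text substitution, which a few lines of Python can illustrate. This sketch assumes one `DEFINE name hex` per line and double-quoted strings without embedded quotes; it skips immediate values and comments entirely, and `expand_m0` is a made-up name rather than the real M0 code. Anything it does not recognize (labels, pointers) is passed through untouched for the next stage.

```python
import re


def expand_m0(source: str) -> str:
    """Sketch of M0-style expansion: DEFINE macros and quoted strings to hex."""
    macros = {}
    output = []
    for line in source.splitlines():
        # A token is either a double-quoted string or a run of non-whitespace.
        tokens = re.findall(r'"[^"]*"|\S+', line)
        if len(tokens) == 3 and tokens[0] == "DEFINE":
            macros[tokens[1]] = tokens[2]   # remember: name -> hex text
            continue
        for tok in tokens:
            if len(tok) >= 2 and tok.startswith('"') and tok.endswith('"'):
                # Convert the string contents into a block of hex digits.
                output.append(tok[1:-1].encode("ascii").hex().upper())
            else:
                # Known names are replaced; everything else passes through.
                output.append(macros.get(tok, tok))
    return " ".join(output)


listing = """DEFINE NOP 90
DEFINE RET C3
:start
NOP
"Hi"
RET
"""
expanded = expand_m0(listing)  # ":start 90 4869 C3"
```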

cc_x86

Well FORTH is a bad bootstrapping language and LISP is just a bad idea to implement in assembly.

I know, let's use the absolute best bootstrapping language after assembly.

C [M2-Planet subset]



It doesn't have fancy preprocessor support, macros or anything like that, but you know what it does have?

Hint: something super important for bootstrapping.




You can write it in less than 24 hours in assembly.



M2-Planet

Well, she would probably want C macros and those C features that were missing, but hey, we can just write that in our C subset, and now that is the start of fulfilling our responsibilities. We now have a rather useful C library and a bunch of useful little tools.

We have got a shell (kaem), cp, mkdir, chmod, sha256sum, untar and ungz; we can really build something cool!!!

GNU MES

GNU MES is an amazing program: a Scheme interpreter written in the C subset we support, and a C compiler that runs on that Scheme interpreter. Not to mention that it is capable of building TCC and the rest.

We can just download the tarballs and build it all the way.

This is not going to be easy, as the last 6 years of work have shown us. But this is meaningful work that needs to be done, and if we don't do it now, it is less likely that anyone else will do it in the future. Build the bootstrap chain for the language(s) you like, for the GNU community you love.

-Jeremiah