A colleague once told me that the difference between software engineering and civil engineering is that they build the same bridge repeatedly, while we never build the same thing twice.
I'm not sure why people seem to be under the impression that writing a compiler means that the language the compiler is implemented in should have "low level" features. A compiler is just a text -> text translation tool: if you can leverage other tools such as an assembler, it never needs to touch machine-level instructions. E.g., Pascal compilers have traditionally been written in Pascal, hardly a language which conjures up a "low level" image. Even when an assembler isn't available, all your implementation language needs to support, in terms of "low level" features, is writing bytes to a file.
But manipulating instruction encodings, file formats, and such can be tedious if your language doesn't have the right capabilities, though it's not impossible.
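To make the "text -> text" point concrete, here is a minimal sketch of a compiler in that sense: it takes source text for a toy language (chains of integer additions) and emits assembly text for a hypothetical stack machine. The language, the mnemonics, and the function names are all illustrative, not from any real toolchain; a real assembler would take over from here.

```python
def compile_expr(src: str) -> str:
    """Translate toy source like "2+3+4" into stack-machine assembly text."""
    terms = [int(t) for t in src.split("+")]
    lines = [f"push {terms[0]}"]          # first operand
    for t in terms[1:]:
        lines.append(f"push {t}")         # next operand
        lines.append("add")               # fold it into the running sum
    return "\n".join(lines)               # text in, text out: that's the whole job
```

Nothing here is lower-level than string handling, which is the point being made above.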
> I'm not sure why people seem to be under the impression that writing a compiler means that the language the compiler is implemented in should have "low level" features.
Performance.
You definitely can write a compiler in a high-level language and given the choice I certainly prefer to on my hobby projects. Having a garbage collector makes so many compiler algorithms and data structures easier.
But I also accept that that choice means there's an upper limit to how fast my compiler will be. If you're writing a compiler that will be used to (at least aspirationally) compile huge programs, then performance really matters. Users hate waiting on the compiler.
When you want to squeeze every ounce of speed you can get out of the hardware, a low-level language that gives you explicit control over things like memory layout matters a lot.
> But I also accept that that choice means there's an upper limit to how fast my compiler will be.
Don't buy it.
A decent OCaml version of a C or Zig compiler would almost certainly not be 10x slower. And it would be significantly easier to parallelize without introducing bugs so it might even be quite a bit faster on big codebases.
Actually designing your programming language to be processed quickly (can definitively figure things out with local parsing, minimizing the number of files that need to be touched, etc.) is WAY more important than the low-level implementation for overall compilation speed.
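One hedged sketch of what "local parsing" can buy you: if the language requires all imports at the top of a file (as Go does), a build tool can discover a module's dependencies by scanning only the first few lines, never parsing function bodies. The function below is illustrative, not any real build tool's implementation.

```python
def scan_imports(source: str) -> list[str]:
    """Collect dependencies from a Go-like file by reading only its header."""
    deps = []
    for line in source.splitlines():
        line = line.strip()
        if line.startswith("import "):
            deps.append(line[len("import "):].strip('"'))
        elif line and not line.startswith("//"):
            break  # first real code line: stop, no full parse needed
    return deps
```

Because the scan stops at the first non-import line, dependency analysis over a huge codebase touches only file headers, which is exactly the kind of design decision that dwarfs implementation-language speed.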
And I suspect that the author would have gotten a lot further had he been using a GC language and not had to deal with all the low-level issues and debugging.
I like Zig, and I use it a lot. But it is NOT my general purpose language. I'm definitely going to reach for Python first unless I absolutely know that I'm going to be doing systems programming. Python (or anything garbage collected with solid libraries) simply is way more productive on short time scales for small codebases.
> I suspect that the author would have gotten a lot further had he been using a GC language and not had to deal with all the low-level issues
Agreed: many people are using languages that don't fit what they are actually trying to do. In many cases, using a GC language would be far more productive for them. Though I do think we should distinguish between compiled and interpreted GC languages, as often there is a significant gap in performance, that can be wanted or appreciated.
> Though I do think we should distinguish between compiled and interpreted GC languages, as often there is a significant gap in performance, that can be wanted or appreciated.
Sure, that is tautologically true.
However, I maintain that the original author would have gotten much further even with a pathologically slow Python implementation. In particular, low-level chores like linking will have full-fledged libraries you can hand the task off to. You can then come back and do it yourself later.
For me, reaching a point that helps reinforce my motivation is BY FAR the most relevant consideration for projects. Given the original article, it seems like I'm not alone.
I promise that he knows a thing or two about compilers and performance!
For what it's worth, I agree with him. A recent example is the porting of the TypeScript compiler to Go: it hasn't been fully released yet, but people are already going wild for its performance improvement over the original in-TS compiler.
Of course, it took them over a decade to reach the point where a port was necessary - so it's up to you to decide when that decision makes sense for your language.
I think once you get the design of the IR right and implement it relatively efficiently, an optimizing compiler is going to be complicated enough that tweaking the heck out of low-level data structures won't help much. (For a baseline compiler, maybe...but).
E.g. when I ported C1 from C++ to Java for Maxine, straightforward choices of modeling the IR the same and basic optimizations allowed me to make it even faster than C1. C1X was a basic SSA+CFG design with a linear scan allocator. Nothing fancy.
The Virgil compiler is written in Virgil. It's a very similar SSA+CFG design. It compiles plenty fast without a lot of low-level tricks. Though, truth be told, I went overboard optimizing[1] the x86 backend and it's significantly faster (maybe 2x) than the nicer, prettier x86-64 backend. I've introduced a bunch of fancy representation optimizations for Virgil since then, but they don't really close the gap.
[1] It's sad that even in the 2020s the best way to make something fast is to give up on abstractions and use integers and custom encodings into integers for everything. Trying to fix that though!
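A hedged sketch of the "custom encodings into integers" trick mentioned above: pack an IR node's opcode and two operand indices into a single integer instead of allocating an object per node, so a whole function body fits in one flat array of ints. The field widths and names here are illustrative, not taken from Virgil or any real compiler.

```python
OP_BITS, ARG_BITS = 8, 28  # illustrative widths: 8-bit opcode, two 28-bit operand indices

def pack_node(opcode: int, left: int, right: int) -> int:
    """Encode one IR node as a single integer."""
    return (opcode << (2 * ARG_BITS)) | (left << ARG_BITS) | right

def unpack_node(node: int) -> tuple[int, int, int]:
    """Recover (opcode, left, right) from the packed form."""
    mask = (1 << ARG_BITS) - 1
    return node >> (2 * ARG_BITS), (node >> ARG_BITS) & mask, node & mask
```

The win is cache density and zero per-node allocation; the cost is exactly the loss of abstraction the footnote laments.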
Does “low level” translate to performance? Is Rust a “low level” language?
Take C#. You can write a compiler in it that is very fast. It gives you explicit control over memory layout of data structures and of course total control over what you wrote to disk. It is certainly not “low level”.
> It gives you explicit control over memory layout of data structures
Some with structs, yes. But overall it doesn't give you much control over where things up in memory once references get involved compared to C, C++, or Rust.
> Having a garbage collector makes so many compiler algorithms and data structures easier.
Does it really?
Compilers tend to be programs that just append a bunch of data to lists, hashmaps, queues, and trees, process it, then shut down.
So you can just make append-only data structures and not care too much about freeing stuff.
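A minimal sketch of that append-only pattern: nodes live in one growing pool and refer to each other by index rather than by pointer, so nothing is freed individually and dropping the pool at exit frees the whole tree at once. The class and names are illustrative, not from any particular compiler.

```python
class NodePool:
    """Append-only AST storage; an index into `nodes` is a node id."""

    def __init__(self):
        self.nodes = []  # never mutated in place, never freed piecemeal

    def add(self, kind, *children):
        self.nodes.append((kind, children))
        return len(self.nodes) - 1  # hand out an index, not a pointer

# Build the tree for "num + num":
pool = NodePool()
a = pool.add("num")
b = pool.add("num")
root = pool.add("add", a, b)
```

In C the same idea is an arena: bump-allocate nodes, process, free the arena in one call, which is why batch compilers rarely feel the lack of a GC.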
I never worry about memory management when I write compilers in C.
> Compilers tend to be programs that just append a bunch of data to lists, hashmaps, queues, and trees, process it, then shut down.
This is true if you're writing a batch mode compiler. But if you're writing a compiler that is integrated into an IDE where it is continuously updating the semantic state based on user edits, there is a whole lot more mutability going on and no clear time when you can free everything.
> Pascal compilers have traditionally been written in Pascal, hardly a language which conjures up a "low level" image.
It may be the case that it doesn't conjure up such an image, but Pascal is approximately on the same rung as Zig or D—lower level than Go, higher level than assembly. If folks have a different impression, the problem is just that: their impression.
Pascal, as defined by Wirth, had no "low level" features: no control over memory allocation beyond the language-provided new/dispose, no bit operators, clunky fixed-size strings, no access to system calls, no access to assembly, not even hex or octal constants, all features which a language allowing "low level" access is expected to have (e.g. Ada, Modula-2/3, Oberon, all Pascal-derived languages). Things like conformant array parameters showed up much later in the ISO version but were not widely adopted. No modules either, but that is not a low-level feature. Turbo Pascal attempted to fix all this on the PC later on and was deservedly well loved. Still, Wirth successfully wrote Pascal compilers in Pascal without, obviously, having a Pascal compiler available at first. [Link](https://en.wikipedia.org/wiki/Pascal_(programming_language)#...)
Many people seem to forget that Pascal evolved, and Wirth was very involved in that evolution. Wirth (as a consultant) helped create Object Pascal (from Apple), which then influenced Turbo Pascal and Delphi. Modula and Oberon are often referred to as influences on earlier or certain versions of Turbo Pascal (versions 4 to 5.5) as well.
It's because a compiler is supposed to translate high-level to low-level; you already have a lower-level language to write it in, not a higher-level one. Writing a C compiler in a higher-level language than C feels like going backwards.
> E.g., Pascal compilers have traditionally been written in Pascal, hardly a language which conjures up a "low level" image.
How could the first Pascal compiler be compiled if it was written in Pascal, but a Pascal compiler didn't yet exist?
Most people don't realize how bad geolocated data is for a free society. I can buy data from a broker, geo-fence your house address, and then see all the places you went and who you associate with, and identify those associates by tracking them to their addresses. All of this happens with "anonymized" device identifiers. It is the wet dream of a company such as Palantir and of every government that desires absolute control over its population.
The high-level-language -> assembly analogy for using LLMs seems apt, but I would argue it is only a weak one. The reason is that, previously, both the high-level language and the assembly language had well-defined semantics and the transform was deterministic, whereas now you are using English or another human language, with ambiguities and no well-defined semantics. Mathematical symbolism was invented in the first place because human language lacked the required unambiguous precision, and if we hit hurdles with LLMs, we may need to reinvent it once more.
I wonder whether the reconstruction techniques have been verified by a double-blind experiment: reconstruct the face of a Homo sapiens from a skull for which a photograph is known. Otherwise, you're left wondering how much of it is just artistry and how much is solid, verified technique.
Thanks, but this seems to be optimized for the smallest number of gates, so it applies to simple microcontrollers and FPGAs, and with limited precision. I was interested in the actual state of the art used in modern CPUs and GPUs.
I've had the good fortune to attend two of his lectures in person. Each time, he effortlessly derived provably correct code from the conditions of the problem and made it seem all too easy. 10 minutes after leaving the lecture, my thought was "Wait, how did he do it again?".
Is it that they're privacy-obsessed, or rather that most people have a passion for self-destruction and exhibition?
If you think about it, the "dork" position was the most normal one; it's the status quo. The people wanting to record in locker rooms and whatnot are not the status quo. They win because most people are short-sighted, or even secretly love hurting themselves.
People don't care about privacy as long as a faceless corporation is doing the spying. People very much care if it has a plausible path to embarrassing or creepy situations involving actual people in their lives. The chilling effect of ubiquitous phone cameras is well documented now, and this would amp it up a hundredfold. Many cool clubs already put stickers over phone cameras.
> People don't care about privacy as long as a faceless corporation is doing the spying.
This isn't true. Most everyone hates the fact that they are being surveilled, but it is pervasive and people can only deal with so many complications in life.
Avoiding surveillance is not a decision or action, it is 1000 decisions and actions. Endless decisions and actions.
In my experience most people don't care at all. Even if you tell them about these topics, they find it weird, and tinfoil-hat adjacent. "If you have nothing to hide..." and "why would anyone care about my data in particular?"