GraalVM native-image is not a listed option, but I regularly use it to produce binaries that are competitive with (often better than) Go in terms of size and (start time) perf.
A Clojure tool that parses, traverses and processes JSON (caro, see below) worked out to 3.2MB. I'm sure C and Rust can do better, but it's not bad compared to a JVM, and it's an order of magnitude better than the best option in the article. It's also small enough that while comparisons like "that's three floppies!" are interesting historical perspective, you can't reasonably complain about having a 3 MB binary in ~/.local/bin.
It mostly works out of the box. My biggest frustration is that you can't easily link in dynamic libraries, which makes it a little annoying to do e.g. EC TLS with a single binary. (Go would have a similar problem but sidestepped it by reimplementing most of it natively.) Second biggest frustration: it is not a fast compiler. You're definitely doing development on the JVM (I use Graal as my default JVM now) and a "production build" afterwards with native-image.
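For the record, the build step is just pointing native-image at an uberjar. A rough sketch (the jar and output names here are made up, and exact flag spellings vary a bit between GraalVM versions):

```shell
# Build the uberjar first (e.g. `lein uberjar` for Clojure), then
# compile it ahead-of-time to a standalone binary.
# --no-fallback makes the build fail outright instead of silently
# emitting a "fallback image" that still needs a JVM to run.
native-image -jar target/app-standalone.jar --no-fallback -o app

# The result is a self-contained executable; no JVM needed at runtime.
./app
```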
>I regularly use it to produce binaries that are competitive with (often better than) Go in terms of size and (start time) perf.
How do they compare to Go in terms of speed? Another comment also said Graal native images still suffer on throughput vs using the JVM, due to the lack of profile-guided optimizations.
For a simple project requiring java.net.http, using these steps produced a 23MB jlink image
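For reference, the jlink invocation for that kind of build looks roughly like this (paths and the module list are illustrative; `jdeps --print-module-deps` computes the real list for your jar):

```shell
# Let jdeps report which platform modules the jar actually uses:
jdeps --print-module-deps target/app.jar

# Then build a trimmed runtime image containing only those modules:
jlink --add-modules java.base,java.net.http \
      --strip-debug --no-header-files --no-man-pages --compress=2 \
      --output build/runtime

# Run the app with the custom runtime instead of a full JDK:
build/runtime/bin/java -jar target/app.jar
```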
23MB may seem tiny today, but depending on what that "simple project" does, in absolute terms it's still twenty-three million bytes. For comparison, a full installation of Windows 3.11 is roughly that size too, not compressed. As the saying goes, "there's still plenty of room at the bottom."
If you start talking to most programmers about efficiency, you’ll almost immediately be shut down with “premature optimization is the root of all evil” (no matter how non-premature the optimization might be). There’s a pervasive belief among programmers that hardware is so fast that, as long as the software works, it doesn’t matter how efficiently it uses memory, or CPU, or disk space, or the network… yet the #1 complaint I hear from software _users_ is “damn, this thing is slow”. I yearn for a day when we start behaving like real professional engineers and strive to make the most efficient use of resources.
In fairness, I feel like we have closer to a Gustafson's law [0] situation; individual tasks haven't gotten much faster, but it feels like we do more than we did in the 90s. The problem space has gotten so much larger than it used to be, and at some level we got to "fast enough" for things like desktop applications.
Not to mention, programs are so much more stable nowadays; I can't remember the last time my computer just locked up for no apparent reason, and I am typically pretty happy that most programs don't leak memory all over the place anymore.
Are a lot of programs bloated now? Sure, but I think I'd rather have that than having my program crash all the time because someone who didn't really know what they're doing cleared a pointer incorrectly.
It's not necessarily connected, but often connected. GCs introduce some level of bloat, but they also give stronger guarantees of memory safety. There are a lot more programmers out there now than there were 20 years ago, and with that we're going to inherently have more bad programmers writing bad programs. If they aren't good programmers, I don't trust them to handle memory correctly.
I know languages like Rust and Swift have ways of safely dealing with memory without a GC, and I know Go has a non-blocking GC now, but remember that all of these things are relatively recent.
> I know languages like Rust and Swift have ways of safely dealing with memory without a GC
Those are very different things, to my knowledge. Rust is "no-GC" in the vein of C and C++ (but the borrow checker makes the memory management easier to deal with, in that you generally don't have to manually free stuff). Swift is "memory managed" in that it uses reference counting to automatically free values that are no longer referenced. In my eyes, that's pretty clearly a type of garbage collection, even if the term has become specific enough that it's less often applied to that case.
The Swift ARC is not a "full" GC; it does have a runtime cost, but a lot of the work is done at compile time, and there's no need for a background thread like Java's GC.
That said, technically you are correct, it is a GC, just one with a lot lower overhead than something like Java/C#/JavaScript/Python.
That quote about premature optimization is from 1974. Back then, manually tweaking code to produce the "right" machine instructions really mattered. But it was time-consuming and hard, and hence should be applied only to the most time-sensitive part of a program.
Today, we get that kind of optimization, for the whole program, simply by adding -O2 to our build options, and waiting slightly longer for the compiler to complete.
Engineers using that quote are an important reason for the failure of software projects. Not only should we make efficient use of resources, but performance of software must be taken into account right from the beginning of any software project. Performance requirements must be clear from the start (how many concurrent users, how many requests per second, etc., times 2 (or 20) for safety) and when a design is made, performance requirements for each individual component should be set (and measured after implementation).
> #1 complaint I hear from software _users_ is “damn, this thing is slow”.
I strangely hear that quite often in morning traffic too ;). When will these real professional engineers learn?!
> make the most efficient use of resources
Can you define this? Everything can be done more efficiently.
A real professional engineer won't do premature optimization. You won't see bridges using some amazing new carbon material; it's not only too costly for the budget, it's also not needed (the bridge does everything it has to do per the requirements).
A real professional engineer will do the most they can with the resources they have, which is pretty much what we do in software engineering.
Sure, the number one requirement is to get the job done. But, nobody that takes pride in their work is going to be satisfied with fat, slow code. Efficient code is a thing of beauty that takes time to craft. Unfortunately, our desire to make beautiful things gets crushed under the constant race to production of the next feature.
IME the discussion is almost immediately rekindled if you have actual perf numbers to back it up, or a specific 10 second fix to turn O(scary) code into O(1) code.
Over the years I have wasted a lot of time reviewing or discussing "optimizations" that didn't help - or even hurt - performance. Micro-optimizing some once-off init function instead of tackling major per-frame overhead and stuttering. Creating a nice lean core site and then burdening it with a billion third party scripts requested by marketing.
30MB matters for a website, or keyboard firmware.
On the other hand, my gamedev workloads often crash on 32GB machines. I measure useful SSD sizes in TB. In that context, 30MB is line noise. A rounding error. You could spend a lifetime optimizing more important performance issues, and still never turn your attention to that 30MB as something useful to spend time reducing further.
I like faster and smaller, but it sounds like we are striving to make the most efficient use of resources, including developer time. "professional engineers" don't have to worry so much about this because the costs of their projects aren't dominated by their own salaries as ours are.
I yearn for a day when a manager will approve a month-long rewrite of the app fronted in something sensible just to save 200MB of RAM. "The developer who wrote the MVP had 2 web dev courses and a week to write it from scratch, now all of you with your CS degrees want to spend a month making it smaller?"
The fact of the matter is that quality simply does not matter in today's world. Not in software, not in hardware, not in food, clothing, cars, or basically any other area. It's all about minimizing costs.
I am always surprised by devs that have a strong sense for O notation and how it affects code complexity, but have no clue about the bloat in tools or libraries they throw on the pile. So version 1.0 comes around and they immediately need to refactor to get down from a 15 second page load (or from a huge file/data stream).
Fine tuning code is one thing, but installing a bloated package for one tiny feature should be well-considered at the time of implementation because every addition adds to the tech debt bill.
The quote was about not sweating about the most efficient solution first, and instead focusing on getting it correct, because you can easily fall into the trap of optimising for the wrong thing before profiling and making bad trade-offs on complexity and maintainability.
Nothing about the original quote was related to Turing's law.
I agree completely, except for the "fun" part --- what developer enjoys waiting for gargantuan projects to compile, slow tools to respond, or debugging through a byzantine maze of dependencies and indirection bloat?
Yes, I remember when people suddenly started writing win32 programs in assembly, as a backlash against the bloat of MFC etc. (and maybe because there was a bunch of non-PC assembly programmers switching?)
One interesting aspect of this is that we're going a bit back to Lisp/Smalltalk images in the mainstream, bundling runtime and code. (With few of the former's benefits, of course.)
It has been done with scripting languages before, of course (Python, Perl, Tcl's Starkits), but with Java and Electron it's bound to have a wider audience.
Maybe because we can't complain about size/complexity issues anymore, now that something even more complex is common (code+runtime+wholefrigginoperatingsystem).
I've found a combination of Spring Boot executable jar + a small jlink built JRE runtime packaged next to the executable jar works fairly well.
Our zipped distribution is 50MB. Uncompressed, the executable jar is 26MB and the runtime is 75MB.
I'd love for it to be smaller but I can't complain. This also makes solving for Linux, macOS, and Windows pretty trivial. They each get their own zip with packaged runtime.
Interesting to see this as I didn't know it was so simple to compile the JDK these days. But curious how this compares to Quarkus (https://quarkus.io/) which is built upon GraalVM?
I keep a close eye on quarkus.io. I love what they're doing for the Java ecosystem. It's interesting to see other frameworks follow suit (Micronaut, etc)
Agreed... I almost looked it up before I hit "reply". I didn't start seeing the container native/GraalVM stuff in Micronaut's tweets until more recently, but that doesn't mean anything. Good info to know. Thanks!
People tested on one JVM version and ran production on another, and found that this gives unpredictable results, so they moved to pinning the two versions together. And then the next natural step is distributing the runtime and the app together.
Note this was not possible in the past as the java license prohibited redistribution.
It's an interesting turnaround. Once upon a time all the thinking was that you should pile as much into shared DLLs and common runtimes as you can, because those things will almost certainly be kept in the cache and will be much faster to launch / lower overhead.
Now not only is it becoming "best practice" to bundle your whole runtime with your app, people are bundling the whole operating system via Docker.
I'm curious if people have compared actually using a single JVM that is kept hot in OS cache vs separate minimal packaging for every app. It might favor single packaging if you only run 1 java app, but if you ran 10 different ones the benefits of shared code might win.
Yes and no. Graal Native Image doesn't work for Spring, since some of the JVM functionality w.r.t. class loading doesn't work yet (I think). Also, native images have lower throughput due to the lack of profile-driven code optimizations.
The Spring team and Graal folks are working on full support. I believe the main angle is to ensure the annotation-based model will work smoothly[0]. Another is to introduce a fully-functional model (Spring Fu/Kofu [1]).
Bear in mind, a lot of what Spring -- Spring Boot in particular -- gets charged with is beyond its control. It's an entrypoint to the vast Java ecosystem, much of which is also incompatible with Graal native images at the moment.
I get Docker and I use it, but it does bother me that one bundles the JDK along with the docker image. Such a waste of space. I like this idea, I hope it gets more traction.
In terms of deployment of Java apps, we've found that Nomad works really well, as it doesn't need a full Docker image (or equivalent) the way Docker, Kubernetes, etc. do. In Nomad you just make sure the worker node has the JDK, and then you specify the .jar file in the job.
I'm not sure this is the issue you are referencing, but you can use multi-stage Docker image builds to avoid bundling the JDK in the final Docker image:
FROM openjdk:jdk AS build
COPY . .
# assumes Maven is available in the build image; otherwise use the
# Maven wrapper (./mvnw package) or a maven base image
RUN mvn package
FROM openjdk:jre
COPY --from=build ./somepath/app.jar .
CMD ["java", "-jar", "app.jar"]
The first stage (defined by the first FROM) is only used to build the ".jar", the second stage will only contain the JRE and the ".jar" copied from the first stage.
Anything after Java 9 would have to use the jlink method, as with the new modules system, there is no longer a distinction between the JDK and JRE.
The problem is most 3rd-party libraries aren't ready for the modules switchover, so for anyone using those libraries, you are stuck with either having a 300-400MB docker image or using jlink.
There are “jre” images for java 10 plus that exclude the java compiler and other unnecessary JDK components, you don’t need to use modules to use this.
I don't see what the issue is with space/size when having the JDK in a Docker image layer vs installing it through ansible/chef later (or whenever it is you're doing that). Docker images are layered, and including a pinned version of the JVM that's baked into a core Docker stack for your org, one that's been security hardened, tested, etc., seems better than taking the time to install a JVM every time during deployment of the worker node.
EDIT: checking out Nomad now, since we have used other Hashicorp prods in the past...
The underlying file system might be able to dedupe the data. I remember playing around with Linux containers and ZFS, where the base image was a snapshot, so only the changes to the base image take up disk space. With modern solid state disks able to read several GB per second, 1TB capacities, and Gbit Internet, I do not worry much if the app itself is 300MB.

If you skip the GUI the app should be fairly small, even when statically linked. Then I use Linux namespaces for isolation, so there's no need to ship the entire OS with the app (which is a bit absurd imho; I wonder how many containers run an old OS and risk a container escape that ends with root on the host, or an unnecessary extra virtualization layer killing performance).
That's not the whole truth, though. If you don't tweak chroot_env in the client settings you get a whole copy of /lib, /lib64, /sbin, /usr and some other things in each allocation. So less network traffic, but uses more real estate.
Examples:
https://github.com/latacora/wernicke/releases
https://github.com/latacora/recidiffist-cli/releases
https://github.com/latacora/caro/releases