Really Small Java Apps (nagro.us)
194 points by goranmoomin on Nov 5, 2019 | 64 comments


GraalVM native-image is not a listed option, but I regularly use it to produce binaries that are competitive with (often better than) Go in terms of size and (start time) perf.

A Clojure tool that parses, traverses and processes JSON (caro, see below) worked out to 3.2MB. I'm sure C and Rust can do better, but it's not bad compared to a JVM, and it's an order of magnitude better than the best option in the article. It's also small enough that while comparisons like "that's three floppies!" are interesting historical perspective, you can't reasonably complain about having a 3 MB binary in ~/.local/bin.

Examples:

https://github.com/latacora/wernicke/releases
https://github.com/latacora/recidiffist-cli/releases
https://github.com/latacora/caro/releases

It mostly works out of the box. My biggest frustration is that you can't easily link in dynamic libraries, which makes it a little annoying to do e.g. EC TLS with a single binary. (Go would have a similar problem but sidestepped it by reimplementing most of it natively.) Second biggest frustration: it is not a fast compiler. You're definitely doing development on the JVM (I use Graal as my default JVM now) and a "production build" afterwards with native-image.
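For reference, the "production build" step is just one extra command on top of a normal JVM build; a sketch, assuming an uberjar named app.jar with a Main-Class manifest entry:

  # slow, but only needed for release builds; day-to-day dev stays on the JVM
  native-image --no-fallback -jar app.jar app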


>I regularly use it to produce binaries that are competitive with (often better than) Go in terms of size and (start time) perf.

How do they compare to Go in terms of speed? Another comment said Graal native images still suffer on throughput versus the JVM, which benefits from profile-guided optimization.


It's still optimised native code - it's not like they're running in an interpreter.

The real issue is the GC is not as efficient.


I thought this article would be about something like https://en.wikipedia.org/wiki/Java_4K_Game_Programming_Conte... but it's more appropriately "Really Small JVM".

> For a simple project requiring java.net.http, using these steps produced a 23MB jlink image

23MB may seem tiny today, but depending on what that "simple project" does, in absolute terms it's still twenty-three million bytes. For comparison, a full installation of Windows 3.11 is roughly that size too, not compressed. As the saying goes, "there's still plenty of room at the bottom."
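For reference, the jlink invocation behind a number like that looks roughly like this (a sketch, not the article's exact steps):

  jlink --add-modules java.net.http \
        --strip-debug --no-header-files --no-man-pages --compress=2 \
        --output runtime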


Things That Turbo Pascal is Smaller Than https://prog21.dadgum.com/116.html


The Wikipedia page for Turbo Pascal, at 43,234 bytes, is bigger than Turbo Pascal.


This reminds me how small programs written in Red/Rebol/Lisp are in general: https://www.red-lang.org/


It is remarkable how bloated modern software is. My first hard drive, back in 1988, was 30 megabytes.


If you start talking to most programmers about efficiency, you’ll almost immediately be shut down with “premature optimization is the root of all evil” (no matter how non-premature the optimization might be). There’s a pervasive belief among programmers that hardware is so fast that, as long as the software works, it doesn’t matter how efficiently it uses memory, or CPU, or disk space, or the network… yet the #1 complaint I hear from software _users_ is “damn, this thing is slow”. I yearn for a day when we start behaving like real professional engineers and strive to make the most efficient use of resources.


In fairness, I feel like we're closer to a Gustafson's law [0] situation: individual tasks haven't gotten much faster, but it feels like we do more than we did in the 90s. The problem space has gotten so much larger than it used to be, and at some level we got to "fast enough" for things like desktop applications.

Not to mention, programs are so much more stable nowadays; I can't remember the last time my computer just locked up for no apparent reason, and I am typically pretty happy that most programs don't leak memory all over the place anymore.

Are a lot of programs bloated now? Sure, but I think I'd rather have that than having my program crash all the time because someone who didn't really know what they're doing cleared a pointer incorrectly.

[0] https://en.wikipedia.org/wiki/Gustafson%27s_law


You'd rather have bloated programs than some random, not-necessarily-connected bad thing?


It's not necessarily connected, but often connected. GCs introduce some level of bloat, but they also give stronger guarantees of memory safety. There are a lot more programmers out there now than there were 20 years ago, and with that we're going to inherently have more bad programmers writing bad programs. If they aren't good programmers, I don't trust them to handle memory correctly.

I know languages like Rust and Swift have ways of safely dealing with memory without a GC, and I know Go has a non-blocking GC now, but remember that all of these things are relatively recent.


> I know languages like Rust and Swift have ways of safely dealing with memory without a GC

Those are very different things, to my knowledge. Rust is "no-GC" in the vein of C and C++ (but the borrow checker makes the memory management portion easier to deal with, in that you generally don't have to manually free stuff). Swift is "memory managed" in that it uses reference counts to automatically track variables and free ones that are no longer referenced. In my eyes, that's pretty clearly a type of garbage collection, even if the term has become specific enough that it's less often applied to that case.


Swift's ARC is not a "full" GC; it does have a runtime cost, but a lot of the work is done at compile time, and there's no need for a background process like Java's GC.

That said, technically you are correct, it is a GC, just one with a lot lower overhead than something like Java/C#/JavaScript/Python.


That quote about premature optimization is from 1974. Back then, manually tweaking code to produce the "right" machine instructions really mattered. But it was time-consuming and hard, and hence should be applied only to the most time-sensitive part of a program.

Today, we get that kind of optimization, for the whole program, simply by adding -O2 to our build options, and waiting slightly longer for the compiler to complete.
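A minimal illustration (compiler and file names arbitrary):

  # whole-program optimization from a single flag, vs. 1974's hand-tuning
  cc -O2 -o app main.c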

Engineers using that quote are an important reason for the failure of software projects. Not only should we make efficient use of resources, but performance of software must be taken into account right from the beginning of any software project. Performance requirements must be clear from the start (how many concurrent users, how many requests per second, etc., times 2 (or 20) for safety) and when a design is made, performance requirements for each individual component should be set (and measured after implementation).


> #1 complaint I hear from software _users_ is “damn, this thing is slow”.

I strangely hear that quite often in morning traffic too ;). When will these real professional engineers learn?!?!?!

> make the most efficient use of resources

Can you define this? Everything can be done more efficiently.

A real professional engineer won't do premature optimization. You won't see bridges using some amazing new carbon material; it's not only too costly for the budget, it's also not needed (the bridge does everything it has to do per the requirements).

A real professional engineer will do the most they can with the resources they have, which is pretty much what we do in software engineering.


Sure, the number one requirement is to get the job done. But, nobody that takes pride in their work is going to be satisfied with fat, slow code. Efficient code is a thing of beauty that takes time to craft. Unfortunately, our desire to make beautiful things gets crushed under the constant race to production of the next feature.


IME the discussion is almost immediately rekindled if you have actual perf numbers to back it up, or a specific 10 second fix to turn O(scary) code into O(1) code.

Over the years I have wasted a lot of time reviewing or discussing "optimizations" that didn't help - or even hurt - performance. Micro-optimizing some once-off init function instead of tackling major per-frame overhead and stuttering. Creating a nice lean core site and then burdening it with a billion third party scripts requested by marketing.

30MB matters for a website, or keyboard firmware.

On the other hand, my gamedev workloads often crash on 32GB machines. I measure useful SSD sizes in TB. In that context, 30MB is line noise. A rounding error. You could spend a lifetime optimizing more important performance issues, and still never turn your attention to that 30MB as something useful to spend time reducing further.


I like faster and smaller, but it sounds like we are striving to make the most efficient use of resources, including developer time. "professional engineers" don't have to worry so much about this because the costs of their projects aren't dominated by their own salaries as ours are.


I yearn for a day when a manager will approve a month-long rewrite of the app fronted in something sensible just to save 200MB of RAM. "The developer who wrote the MVP had 2 web dev courses and a week to write it from scratch, now all of you with your CS degrees want to spend a month making it smaller?"

The fact of the matter is that quality simply does not matter in today's world. Not in software, not in hardware, not in food, clothing, cars, or basically any other area. It's all about minimizing costs.


I am always surprised by devs that have a strong sense for O notation and how it affects code complexity, but have no clue about the bloat in tools or libraries they throw on the pile. So version 1.0 comes around and they immediately need to refactor to get down from a 15 second page load (or from a huge file/data stream).

Fine tuning code is one thing, but installing a bloated package for one tiny feature should be well-considered at the time of implementation because every addition adds to the tech debt bill.


The quote was about not sweating about the most efficient solution first, and instead focusing on getting it correct, because you can easily fall into the trap of optimising for the wrong thing before profiling and making bad trade-offs on complexity and maintainability.

Nothing about the original quote was related to Turing's law.


And yet it's remarkable how many users shop for features, not performance.

Try to make software good enough to sell, and that is quickly the obvious conclusion.


The modern developer cares for only their own time, their own money, and their own fun. User concerns aren't even on the list.


I agree completely, except for the "fun" part --- what developer enjoys waiting for gargantuan projects to compile, slow tools to respond, or debugging through a byzantine maze of dependencies and indirection bloat?

...unless it's this sort of fun, perhaps? https://xkcd.com/303/


To be fair, Windows at that time was a pretty simple 16-bit operating system, and didn't even support networking until you added Winsock or WFW 3.11.

It didn't really support much of anything until you started adding lots of other DLLs and libraries.


Well, simple or not - the first UNIX needed 24 kilobytes of RAM. (Not sure about the disk space - probably 10x that if you count all the utilities.)


Well, MVS supported thousands of users and only got 24 MB as well.


Yes, I remember when people suddenly started writing win32 programs in assembly, as a backlash against the bloat of MFC etc. (and maybe because there was a bunch of non-PC assembly programmers switching?)

One interesting aspect of this is that we're going a bit back to Lisp/Smalltalk images in the mainstream, bundling runtime and code. (With few of the former's benefits, of course.) It has been done with scripting apps before, of course (Python, Perl, Tcl's Starkits), but with Java and Electron it's bound to have a wider audience.

Maybe because we can't complain about size/complexity issues anymore, now that something even more complex is common (code+runtime+wholefrigginoperatingsystem).


23 megabytes does not seem, by any metric, “tiny” to anyone that I work with.


I thought it would be about SIM cards and other smart cards.


That'd be 16 floppy disks, or roughly a quarter of the hard disk space on my first PC clone. Or nearly ten Dooms!


I've found a combination of Spring Boot executable jar + a small jlink built JRE runtime packaged next to the executable jar works fairly well.

Our zipped distribution is 50 MB. Uncompressed, the executable is 26 MB and the runtime is 75 MB.

I'd love for it to be smaller but I can't complain. This also makes solving for Linux, macOS, and Windows pretty trivial. They each get their own zip with packaged runtime.
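Illustratively, each zip unpacks to something like this (layout hypothetical, not our exact tree):

  myapp/
    app.jar       # Spring Boot executable jar (~26 MB)
    runtime/      # jlink-built JRE (~75 MB); launch via runtime/bin/java -jar app.jar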


This seems to be about the sweet spot that I keep finding, although I normally work in clojure.


Interesting to see this as I didn't know it was so simple to compile the JDK these days. But curious how this compares to Quarkus (https://quarkus.io/) which is built upon GraalVM?


I keep a close eye on quarkus.io. I love what they're doing for the Java ecosystem. It's interesting to see other frameworks follow suit (Micronaut, etc)


Micronaut predates Quarkus by almost a year, but they both seem to be great frameworks.


Agreed... I almost looked it up before I hit "reply". I didn't start seeing the container native/GraalVM stuff in Micronaut's tweets until more recently, but that doesn't mean anything. Good info to know. Thanks!


"Since bundling apps and runtime is the new best-practice"

When did this become the new best-practice in Java land?


After OpenJDK 8 the separate JRE was deprecated.

The recommendation now is for app developers to include their own JRE with the app, just like how each Electron app includes its own copy of Chromium.


Since people tested on one JVM version and ran production on another. Finding that this gives unpredictable results, people moved to locking the versions together. And then the next natural step is distributing the runtime and the app together.

Note this was not possible in the past as the java license prohibited redistribution.


Since mid 2017 when the JRE was discontinued. How else are you shipping Java apps? With the JDK? That’s for developers.


since the author internalized the AWS marketing message.


I mean AWS has driven a lot of developer practices in the last 10 years. Not a terrible bandwagon to jump on.


It's an interesting turnaround. Once upon a time all the thinking was that you should pile as much into shared DLLs and common runtimes as you can, because those things will almost certainly be kept in the cache and will be much faster to launch / lower overhead.

Now not only is it becoming "best practice" to bundle your whole runtime with your app, people are bundling the whole operating system via Docker.

I'm curious if people have compared actually using a single JVM that is kept hot in OS cache vs separate minimal packaging for every app. It might favor single packaging if you only run 1 java app, but if you ran 10 different ones the benefits of shared code might win.


With a JVM kept hot in OS cache, how many different JVM versions would you need to support apps made with different JVM versions?


Ideally one, maybe two? An LTS and a current version?


Is GraalVM a good option for this?


Yes and no. Graal Native Image doesn't work for Spring, since some of the JVM functionality w.r.t. class loading doesn't work yet (I think). Also, native images have lower throughput due to the lack of profile-guided code optimization.


The Spring team and Graal folks are working on full support. I believe the main angle is to ensure the annotation-based model will work smoothly[0]. Another is to introduce a fully-functional model (Spring Fu/Kofu [1]).

Bear in mind, a lot of what Spring -- Spring Boot in particular -- gets charged with is beyond its control. It's an entrypoint to the vast Java ecosystem, much of which is also incompatible with Graal native images at the moment.

[0] https://github.com/spring-projects-experimental/spring-graal...

[1] https://github.com/spring-projects-experimental/spring-fu


More info on Spring support for GraalVM native images: https://github.com/spring-projects/spring-framework/wiki/Gra...


I get Docker and I use it, but it does bother me that one bundles the JDK along with the docker image. Such a waste of space. I like this idea, I hope it gets more traction.

In terms of deployment of Java apps, we've found that Nomad works really well, as it doesn't need a full Docker image (or equivalent) the way Docker, Kubernetes, etc. do. In Nomad you just make sure the worker node has the JDK, and then you specify the .jar file in the job.


I'm not sure this is the issue you are referencing, but you can use multi-stage Docker image builds to avoid bundling the JDK in the final Docker image:

  # Build stage: JDK plus Maven, used only to compile the jar.
  # (The stock openjdk:jdk image has no Maven, so use a Maven image here.)
  FROM maven:3-jdk-11 AS build
  WORKDIR /build
  COPY . .
  RUN mvn package

  # Runtime stage: JRE only; everything from the build stage is discarded.
  FROM openjdk:11-jre
  COPY --from=build /build/target/*.jar ./app.jar

  CMD ["java", "-jar", "app.jar"]
The first stage (defined by the first FROM) is only used to build the ".jar"; the second stage will only contain the JRE and the ".jar" copied from the first stage.
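Building it then yields a runtime-only image (the tag name here is just for illustration):

  docker build -t myapp .
  docker run --rm myapp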


Anything after Java 9 would have to use the jlink method, as with the new module system there is no longer a distinction between the JDK and JRE.

The problem is most 3rd-party libraries aren't ready for the modules switchover, so for anyone using those libraries, you are stuck with either a 300-400 MB docker image or using jlink.
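That said, jlink can often still serve non-modular classpath apps; a sketch, assuming JDK 11+ and an uberjar named app.jar:

  # ask jdeps which platform modules the jar actually touches
  jdeps --print-module-deps app.jar   # e.g. java.base,java.net.http
  # build a trimmed runtime containing only those modules
  jlink --add-modules $(jdeps --print-module-deps app.jar) --output runtime
  runtime/bin/java -jar app.jar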


There are "jre" images for Java 10+ that exclude the java compiler and other unnecessary JDK components; you don't need to use modules to use these.


Sorry, I didn't intend to write "image", but rather container. I can no longer edit my comment.


I don't see what the issue is with space/size in having the JDK in a docker image layer vs installing it through ansible/chef later (or however you're doing it). Docker images are layered, and including a pinned version of the JVM, baked into a core Docker stack for your org that's been security hardened, tested, etc., seems better than taking the time to install a JVM every time during deployment of the worker node.

EDIT: checking out Nomad now, since we have used other Hashicorp prods in the past...


The underlying file system might be able to dedupe the data. I remember playing around with Linux containers and ZFS, where the base image was a snapshot, so only the changes to the base image take up disk space. With modern solid state disks able to read several GB per second and with 1 TB capacity, plus Gbit Internet, I do not worry much if the app itself is 300 MB big.

If you skip the GUI the app should be fairly small, even when statically linked. Then I use Linux namespaces for isolation, so there's no need to ship the entire OS with the app (which is a bit absurd imho). I wonder how many containers run an old OS and risk a container escape that would end up with root on the host, or an unnecessary extra virtualization layer killing the performance.
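For what it's worth, the snapshot/clone mechanics look roughly like this (pool and dataset names hypothetical):

  zfs snapshot tank/base@golden         # freeze the base image
  zfs clone tank/base@golden tank/ct1   # clone shares all blocks with the snapshot
  zfs clone tank/base@golden tank/ct2   # only divergent writes consume new space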


Docker images are layered, so this isn't a problem in most cases.


Cloud Native Buildpacks take advantage of this to isolate a JRE or JDK to its own layer for fast rebasing and high reuse[0].

[0] https://github.com/cloudfoundry/java-cnb


Yeah I meant the container i.e. the end result, not the actual image. I'm aware of layered images.


That's not the whole truth, though. If you don't tweak chroot_env in the client settings you get a whole copy of /lib, /lib64, /sbin, /usr and some other things in each allocation. So less network traffic, but uses more real estate.


> bundles the JDK

Not being pedantic, honestly - do you mean the JDK (which includes the compiler) or the JRE (just the VM)?


Hmmm... makes me wonder if this could help Bazel slim down even more (although its size is very acceptable right now).



