Not an easy read for C++ fans like me, especially since it is coming from Bartosz Milewski.
I did try out several other languages and I keep coming back to C++11 for anything that requires scalability and raw performance, like APIs. The same basic server that gets 400 req/sec when implemented in C# ASP.NET achieves 7900 req/sec using C++.
So far I could not find a programming language that does not have similar (or worse) scissors. It's more like "pick your poison" type of choice.
After I learned Scala, my C++ code started to look like functional programming. According to Bartosz that's a good thing, and I did not have to dive into Haskell (yet). :)
With C#, did you try using manual memory management? I wrote a rather high-performance (50K requests/sec, 550K ops/sec) daemon in F#. The big trick was to use my own memory management where it made sense. I had a ~1GB managed heap, and 12+GB unmanaged.
For instance, you can stack-allocate many objects in C# (strings and arrays, among others), if you're willing to give up the safety (and if C++ is an option, then you are willing). You can manually heap-allocate managed objects, too, although it gets tricky if they are nested objects. After all, the JIT is just taking pointers to objects and doing stuff with them - it doesn't care where the memory came from (just remember the GC won't scan your manually allocated stuff).
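A minimal sketch of the stackalloc side of this (it only covers raw buffers of primitives, not real managed strings or arrays, and needs the /unsafe compiler switch):

    using System;

    class StackAllocDemo
    {
        static unsafe void Main()
        {
            // stackalloc gives a raw buffer on the stack: no GC involvement,
            // reclaimed automatically when the method returns.
            const int len = 16;
            char* buffer = stackalloc char[len];
            for (int i = 0; i < len; i++)
                buffer[i] = (char)('a' + i);

            // Cross back into managed land only at the boundary where a string is needed.
            Console.WriteLine(new string(buffer, 0, len));
        }
    }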
The CLR (unlike the JVM) has native capabilities built right into it. People should take more advantage of such things instead of only trying fully safe C# and then deciding to dump it all for no safety.
To avoid the costs of serialization and deserialization on a per-request basis, I built a little object system that stored all its instance data in a byte array. All internal references were offsets from the start of the array, but what made it fast was that I could read and write in it using raw pointers.
I then recycled the byte arrays to avoid GC pressure, as many were just over the large object heap threshold and thus only collectable with gen2 scans.
I had a version working with manually allocated buffers, but it wasn't any faster - GC overhead was only 2% or so.
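A rough sketch of the shape of that offset-based blob (hypothetical field layout, not the actual code; needs /unsafe):

    using System;

    static class BlobDemo
    {
        // Hypothetical record layout inside a recycled byte[]: "references"
        // are offsets from the start of the buffer, accessed via raw pointers.
        const int IdOffset = 0;       // int32
        const int ScoreOffset = 4;    // int64

        static unsafe void Main()
        {
            byte[] blob = new byte[64];     // recycled between requests to avoid GC pressure

            fixed (byte* p = blob)          // pin once, then use plain pointer arithmetic
            {
                *(int*)(p + IdOffset) = 42;
                *(long*)(p + ScoreOffset) = 123456789L;

                Console.WriteLine(*(int*)(p + IdOffset));
                Console.WriteLine(*(long*)(p + ScoreOffset));
            }
        }
    }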
Go-lang is roughly 10 times faster than ASP.NET (implemented as recommended by Microsoft). Skipping the ORM doubles the speed, but then you lose most of the components, so I'm not sure that's a realistic optimization.
In case anyone is curious what is going on with those ASP.NET TechEmpower benchmarks:
The TechEmpower benchmarks are all about how much overhead you can eliminate. I profiled the CPU-bound ASP.NET TechEmpower benchmarks and found that most of the time was spent in IIS<->ASP.NET overhead. After I removed IIS and ASP.NET by just using the .NET HttpListener class [0], the result (on my machine) gives Go a run for its money. Hopefully these results will show up in the next Round that TechEmpower runs.
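Roughly, the bare HttpListener loop looks like this (a minimal synchronous sketch, not the actual benchmark code):

    using System;
    using System.Net;
    using System.Text;

    class MinimalServer
    {
        static void Main()
        {
            var listener = new HttpListener();
            listener.Prefixes.Add("http://+:8080/");   // needs a URL ACL (or admin) on Windows
            listener.Start();

            byte[] body = Encoding.UTF8.GetBytes("{\"message\":\"Hello, World!\"}");

            while (true)
            {
                // Blocking accept; real benchmark code would use the async APIs / thread pool.
                HttpListenerContext ctx = listener.GetContext();
                ctx.Response.ContentType = "application/json";
                ctx.Response.ContentLength64 = body.Length;
                ctx.Response.OutputStream.Write(body, 0, body.Length);
                ctx.Response.OutputStream.Close();
            }
        }
    }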
I profiled the .NET TechEmpower tests that access PostgreSQL and MySQL and found that the database drivers had a lot of unnecessary overhead, for example the Oracle .NET MySQL provider does a synchronous ping to the MySQL Server for every connection, even if it's pooled. Plus, it sends an extra 'use database;' command. [1] The PostgreSQL provider also sends extra synchronous commands for every connection. [2]
So C++ is bad because certain functions should be avoided and because it has no garbage collector. (With the C++11 STL there's normally no need for manual memory management or naked pointers.)
C# is good, you just have to avoid the garbage collector and 90% of the standard library. Roll your own web server, write raw SQL commands, do manual memory management, use undocumented calls, emit bytecode, and there you have it: almost 40% of the C++ performance.
Am I the only one who thinks this does not make any sense?
No, I'm not implying any of that. I'm just saying that an important step to getting good performance is profiling to determine the cause of poor performance, then deciding whether it makes sense to try to improve performance in the areas you found, then executing your decision.
Imagine the alternative if I didn't profile: I'd just declare that C++/whatever is way faster than ASP.NET, and I'd switch to C++ and open myself up to a whole world of debugging memory corruption issues and whatnot. Instead, because I profiled, I found the areas that could be improved, I wrote the HttpListener code, and I can stay on the .NET platform for all of its other benefits. By following this process I have more options than just "C# blows, I gotta throw it out the window for C++".
In reality, I have probably written more useful, shipped, production C/C++ than C#, so I'm also a C++ advocate, but hey, when it makes sense, and for the right reasons, you know?
I don't understand how you got there from the parent post. The github links look like idiomatic C# to me, aside from avoiding IIS. IIS has nothing to do with C# vs C++ vs anything else.
The application we built did not access the database per request. That was the point of the blob of data that was not serialized and deserialized; it was a disconnected data set.
IIS was used purely as a driver for an IHttpHandler implementation. We didn't use any of ASP.NET apart from implementing that interface. We implemented our own AJAX component system before ASP.NET had AJAX controls.
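For anyone not familiar with that layer: hooking in at that level just means implementing IHttpHandler and registering it in web.config; a skeleton (illustrative only, not our actual code) looks like:

    using System.Web;

    // The only piece of ASP.NET the app touched: a raw handler - no pages, no controls.
    public class AppHandler : IHttpHandler
    {
        public bool IsReusable
        {
            get { return true; }   // one instance can serve many requests
        }

        public void ProcessRequest(HttpContext context)
        {
            // Dispatch into your own component system here.
            context.Response.ContentType = "text/plain";
            context.Response.Write("handled");
        }
    }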
It was fairly quick on 2003 hardware. Further, the blob of data started out with an embedded URL that referenced a private metadata server, with the version of the app included in the URL. The behaviour for the objects in the little heap was driven from code loaded (normally, cached) via that URL. This meant that you could post-mortem a session by loading up the heap almost like you load up a core dump; you could save the heap along with the request if an error occurred, and have everything you needed for a full repro.
The fact that the URL referenced the code for the behaviour also meant we could roll out updates without ever restarting anything, and existing sessions would see the old behaviour until they were finished, but new sessions would roll over immediately to the new code.
For the niche (data entry, specifically for the insurance industry), I have yet to see a better system than the one we built; it was declarative and typed, and if it compiled, you could be pretty sure it worked, and it still had a built-in API for testing. Loading up the aforementioned core dump even gave you a REPL. Rails is an enormous pile of work and a rat's nest of polyglot spaghetti by comparison.
To take a specific example, the application code had a definition for the UI for any given page, and knew what controls were on it, and knew their master / detail relationships. The controls had bindings that were evaluated in the context of the object model stored in the little heap, written in a little expression language called Gravity (so called because it sucked in other responsibilities). So when the end-user loaded up a page, the declaration of the UI and its bindings were sufficient to infer what data to send down to the client; and when e.g. a button was clicked, and the model changed, we could calculate minimal updates to send back to the client. Because the framework knew so much about the app declaratively, you had to do very little work to implement things.
Yes, as everyone says: measure measure measure! Sometimes, despite the CLR generating what looked like terrible x64, it went faster than my alternatives. Performance can be counter-intuitive sometimes, esp. in a somewhat complicated environment like the CLR.
Although, it's safe to say if you have 20GB heaps and are doing Gen2s often, then you might wanna investigate another approach ;)
How are CLR ninja skills any better than C++ ninja skills? Both might kill average developers.
The C# implementation mentioned above came from .NET experts and an army of consultants from Microsoft. The final advice was "yes, C# is easier to develop in, but that comes with a price - just pay the price and drop in some additional servers".
The crucial difference is that C# is memory-safe by default, and for most code, this is fine. As another commenter said, manual memory management is available as an escape hatch from the garbage collector if you need it.
To extend the Edward Scissorhands analogy, with C#, you can use scissors when you need them, but with C++, your hands are scissors, so you'd better be careful all the time.
Well first off, if you have something that suits bytes well (for instance, I have compressed indexes that are base128 delta encoded values, so a byte* is good enough for that) - just use the normal malloc routines and call it a day.
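In C# terms that means Marshal.AllocHGlobal/FreeHGlobal; a sketch (the base-128 encoding is simplified, just to show the byte* usage over an unmanaged buffer):

    using System;
    using System.Runtime.InteropServices;

    class UnmanagedBuffer
    {
        static unsafe void Main()
        {
            int size = 1 << 20;                        // 1 MB the GC never sees
            IntPtr mem = Marshal.AllocHGlobal(size);
            try
            {
                byte* p = (byte*)mem;

                // Write one value as base-128 (varint-style) bytes.
                uint value = 300;
                int i = 0;
                while (value >= 0x80)
                {
                    p[i++] = (byte)(value | 0x80);
                    value >>= 7;
                }
                p[i++] = (byte)value;

                Console.WriteLine("encoded in {0} bytes", i);
            }
            finally
            {
                Marshal.FreeHGlobal(mem);              // your responsibility now, not the GC's
            }
        }
    }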
For allocating managed objects on unmanaged heaps, you just need to read a bit about CLR internals. The object pointer (the value of s in 'var s = "hello"') points to the start of the object data (IIRC). There are a few words in front of that that provide the object type, as well as the sync block (kitchen sync). Arrays have their own little layout, with a type indicator as well as the length.
You can just manually copy those by pinning an arbitrary object and grabbing a few bytes before it. Then you can write those values to arbitrary memory locations, and use that memory address as the value to put back into a managed object reference.
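To make the "grab a few bytes before it" part concrete, here's a sketch of peeking at an array's header after pinning it. Everything below is undocumented, runtime- and bitness-specific (64-bit CLR assumed), so treat it as exploration rather than an API - and it only shows the peeking half, not writing the header into unmanaged memory and casting the address back to a reference:

    using System;
    using System.Runtime.InteropServices;

    class HeaderPeek
    {
        static unsafe void Main()
        {
            byte[] arr = new byte[123];

            // Pin the array so the GC can't move it while we poke around.
            GCHandle handle = GCHandle.Alloc(arr, GCHandleType.Pinned);
            try
            {
                byte* data = (byte*)handle.AddrOfPinnedObject();   // points at arr[0]

                // On the 64-bit CLR the length field and the method table pointer
                // typically sit just before the element data (undocumented layout).
                long lengthWord = *(long*)(data - sizeof(long));
                IntPtr methodTable = *(IntPtr*)(data - 2 * sizeof(long));

                Console.WriteLine("length word:  {0}", lengthWord);           // expect 123
                Console.WriteLine("method table: 0x{0:X}", (long)methodTable);
            }
            finally
            {
                handle.Free();
            }
        }
    }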
For APIs that need strings, arrays, or other simple heap-allocated objects, but don't hold on to them after returning (like, say, String.Split, which takes an array of splitter values), you might get some large wins by stack-allocating them.
In the parts of my app that were GC-managed, the optimization effort mainly went into reducing the number of objects allocated - at high request rates, even a few measly objects here and there can easily add GC pressure and suck up CPU. (Although, I will say the CLR GC does pretty well with short-lived objects.)
You can also use the freedom of memory access to do things like mutate a string, or to change the type of an array (say, from a byte[] to a long[]).
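The string mutation is just a fixed char* into the string's character data (a sketch; it breaks interning and immutability assumptions, so strictly an "I know what I'm doing" move):

    using System;

    class MutateString
    {
        static unsafe void Main()
        {
            // Use a runtime-constructed string so we don't stomp on an interned literal.
            string s = new string('a', 5);

            fixed (char* p = s)      // fixed on a string yields a char* to its first character
            {
                p[0] = 'A';          // mutate in place; the "immutable" string now reads "Aaaaa"
            }

            Console.WriteLine(s);    // Aaaaa
        }
    }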
Another common trick is allocating large arrays of structs, then passing ints that "point" to the struct in question. This (a) locks the structs in memory and avoids GC pressure if the entire array is over 85K (it goes on the Large Object Heap), and (b) means you're only passing a single int param instead of a struct, which can be slow for larger structs.
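A sketch of the pattern (the Order struct is just a made-up payload):

    using System;

    // Hypothetical payload; in a real system this is whatever your hot data is.
    struct Order
    {
        public long Id;
        public double Price;
        public int Quantity;
    }

    class OrderPool
    {
        // One big allocation; over ~85K it lands on the Large Object Heap,
        // so it isn't compacted/moved and adds almost nothing to GC work.
        static readonly Order[] Orders = new Order[100000];

        // Pass this around instead of the struct itself: a 4-byte "pointer".
        static double TotalFor(int handle)
        {
            return Orders[handle].Price * Orders[handle].Quantity;
        }

        static void Main()
        {
            Orders[7] = new Order { Id = 7, Price = 19.99, Quantity = 3 };
            Console.WriteLine(TotalFor(7));
        }
    }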
I went C++ -> Python -> Scala. Still do Python. Don't do C++ anymore, for the exact same reasons he mentions. The guy is a C++ hero and to hear him say those things means a lot. I encourage you to keep playing around with Scala. Have fun with the Actor model. Enjoy GC.
It's easier to start from a GC'ed language, and use escape hatches when you need them, than to start without GC and then realize that your program has become a mess because you don't have it.
Many GC'ed languages have such escape hatches. Those that don't, well, don't start writing performance-critical code in them would be my recommendation.
And also, do not miss his point about reference counting just being a form of GC. "Manual memory management" isn't really a thing that exists, there's a whole suite of memory management techniques and alternatives, and it's easy with something like "C++ with ref counting" to actually be experiencing the worst of all worlds (all the work of "manual" memory management with all the disadvantages of GC) if you don't deeply understand what's really going on with your memory.
Funny you mention the "worst of all worlds scenario", because that's exactly what I was thinking about your advice to start with a GC'd language and then escape for the performance-critical parts (which usually means huge heaps, btw). That plan gives you all the flexibility, predictability and reliable latency of managed VM systems, combined with all the safety guarantees of a manually memory-managed system.
Read my whole first sentence again. Carefully. I think it does not say what you think it says.
Furthermore, I still don't think you got the point that there is no royal road to perfect memory management. You can't just say the words "manual memory management" and pretend you've solved the problem, or indeed, that you've said anything at all. Many of the same "manual memory management" techniques that you might use are still readily available in "garbage collected" languages. It's not a boolean, it's a rich space. And thus, I stand by my first sentence.
Learning how to manage memory is a tiny fraction of the work involved in writing complex applications, and it's done upfront. Not having to worry about it would only give a proportionally tiny increase in 'enjoyment'.
Indeed, this is exactly the niche that Rust is targeting: C++ performance without C++'s pitfalls. But it'll likely still be quite a while before it's battle-tested enough for everyday use.
The 7900 req/sec is on my laptop, first try, without optimization - and we will need a lot more.
I know Disqus can serve up to 1M/server, and they are patching the Linux kernel to try to reach 2M. (Not exactly apples to apples since I use SQL transactions, but in the ballpark.)
The .NET version never went over 440 requests/sec on the server, not even with help from Microsoft. That means either C++, or using 2000 servers instead of the current 400 servers for every major deployment (hospital).
You're seriously serving a billion requests a day? And you're writing new software which is expected to go from zero to a billion requests, full stop? And you're not expecting anything to change between beginning development and production? And, once the software is in production, it's not going to change at all?
I seriously doubt that all of those things are true - in which case you'd probably be better off using a different language. In my experience, it's much, much cheaper to run 2000 servers ($10m capital cost? something like that?) than double the number of software engineers you've got ($Xm a year, continuously?), and doubling the number of software engineers isn't a linear scaling.
It's 2000 servers per deployment with C#, or 400 with C++.
And yes, it is currently used by the largest cancer center in the USA (MD Anderson), the DoD, the VA, Kaiser, etc. to serve MRI, CT, X-ray and similar images for diagnostics (PACS).
Furthermore, how optimized was the C#/.NET version? C# will even let you drop into C++ if you want... I have a feeling the C# version may not have been tuned effectively.
Yeah, I'd guess perhaps it was part of the framework. The CLR's unfortunate use of UCS-2 (2 bytes per char) can really hurt in a string-processing system. You can easily go from C# to native and back, so who knows. Also, "help from Microsoft" can range widely.
It's still my favorite language. I do respect his thoughts, but when I have systems programming to do, I'll take C++ over anything else any day, warts and all.
We've come pretty far if it can be said without blinking that 8000 API calls per second is fast, and that you can only get there by using a language like C++, in 2013...
From what I've read it's mostly used in safety-critical applications. The kind where if something is amiss, your plane crashes. It's niche, but it is actually in use. It's not a toy or academic-only programming language, like you seem to be alluding to.