a) Show me the code to start with, don't send me digging for it. A SimpleController example as the very first thing (something like the sketch below) would help give a feel for the project, and would make me more likely to consider it.
b) If there's an easier way (like the drogon_ctl utility at the start of Quickstart [0]), show that first, and the more detailed way second.
Other than that, it looks great. I've used libmicrohttpd a few times, so a bit less of an overhead always looks great.
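For reference, the kind of thing I'd want to see at the very top of the README (a rough sketch pieced together from the wiki [0]; exact macro names and signatures may differ):

```cpp
#include <drogon/HttpAppFramework.h>
#include <drogon/HttpSimpleController.h>
#include <functional>

using namespace drogon;

// A minimal SimpleController: registers itself on /hello and answers every
// request with a fixed body.
class HelloController : public HttpSimpleController<HelloController>
{
  public:
    void asyncHandleHttpRequest(
        const HttpRequestPtr &req,
        std::function<void(const HttpResponsePtr &)> &&callback) override
    {
        auto resp = HttpResponse::newHttpResponse();
        resp->setBody("Hello, Drogon!");
        callback(resp);
    }

    PATH_LIST_BEGIN
    PATH_ADD("/hello", Get);   // route this controller at GET /hello
    PATH_LIST_END
};

int main()
{
    app().addListener("0.0.0.0", 8080).run();
    return 0;
}
```

Something of that shape, right at the top, tells me almost everything I need to know about the programming model.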
This is cool, and I like it. Very Haskell like, which is a compliment in my book.
But one thing that surprises me is that folks are essentially sleeping on HTTP/2. HTTP/2 is just a hell of a lot better in most every dimension. It's better for handshake latency, it's better for bandwidth in most cases, it's better for eliminating excess SSL overhead and also, it's kinda easier to write client libraries for, because it's so much simpler (although the parallel and concurrent nature of connections will challenge a lot of programmers).
It's not bad to see a new contender in this space, but it's surprising that it isn't http/2 first. Is there a good reason for this? It's busted through 90% support on caniuse, so it's hard to make an argument that adoption holds it back.
The reason is that HTTP/2 will be short-lived. Most of its potential (multiplexing, mainly) is wasted because it still runs over TCP, where one lost packet stalls every stream on the connection. HTTP/3 will correct that with QUIC. I think we can expect HTTP/3 to really see widespread adoption.
HTTP/1.1 will remain ubiquitous though, because there are a gazillion boxes out there that only speak it.
To the extent that HTTP/3 is faster/cheaper/better than its predecessors, it should allow ads to be served more cheaply. Combine that with support in the dominant browser (and why wouldn't they, after all they did basically invent it and they have a vested interest in serving ads efficiently) and I'd say HTTP/3 has a pretty good shot at success.
> To the extent that HTTP/3 is faster/cheaper/better
This is a rather questionable assumption. For almost everyone in the world it's not faster/cheaper/better than HTTP/1.1, definitely not enough of a win to even bother with it. So the only way it can get anywhere is if Google abuses its position and forces everyone to adopt it. Which they probably won't do for such a silly thing; they get more out of coming up with more useless, mediocre "faster/cheaper/better" protocols HTTP/[4567], because that pressures the competition to waste resources on them instead of on something that could compete with Google.
One way to get around this is to use HAProxy as a middleman: it handles the concurrent HTTP/2 connections on the frontend and then connects to HTTP/1.1 webservers on localhost on the backend, so you don't have to pay the big TCP connection latency.
Right, and this is "better" but if you start handling significant volume you're going to want HTTP/2 behind your proxy. Envoy does this, and it's a basically free way to just get more out of your hardware and network.
Not necessarily. Say your backend server is multi-threaded and assigns each connection to its own thread for DoS safety. With HTTP/2 you make one connection to the backend server, so even though the requests are multiplexed, they’re still being served serially by one thread+socket. Even if you demux and hand them to thread handlers, you have to remux and are still limited to the single socket.
Now if HAProxy makes multiple connections to the backend, each one gets served by its own thread+socket, and that's going to load much faster because, at the very least, it's going to get more attention from the OS. Furthermore, if you use the keepAlive header, and set long timeouts, then you don’t even have to pay the connection penalty. So essentially you’ve shifted thread management to HAProxy by virtue of its parallel connections which keeps the webserver code pretty simple. And in a C++ program simplicity is key to correctness.
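To make the threading model concrete, the backend I'm describing is roughly this (minimal POSIX sketch: blocking sockets, no real HTTP parsing, no error handling):

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <thread>

// One thread per accepted connection. With HTTP/1.1 keep-alive the thread
// keeps answering requests on its socket until the client goes away.
static void serveConnection(int clientFd)
{
    char buf[4096];
    while (read(clientFd, buf, sizeof(buf)) > 0)   // "parse" a request
    {
        const char reply[] =
            "HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: keep-alive\r\n\r\nok";
        write(clientFd, reply, sizeof(reply) - 1);  // canned response
    }
    close(clientFd);
}

int main()
{
    int listenFd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(listenFd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr));
    listen(listenFd, SOMAXCONN);
    for (;;)
    {
        int clientFd = accept(listenFd, nullptr, nullptr);
        std::thread(serveConnection, clientFd).detach();  // its own thread+socket
    }
}
```

Each HAProxy backend connection lands on exactly one of these threads, which is the whole point.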
I'm not sure I agree with most of this post. Firstly, a userland socket mux/demux implementation isn't such an insurmountable challenge. It's essentially the core of a good http/2 implementation. If you've got a good HTTP/2 implementation, you've necessarily got a good mux/demux solution.
> Now if HAProxy makes multiple connections to the backend, each one gets served by its own thread+socket, and that's going to load much faster because, at the very least, it's going to get more attention from the OS.
I'm not sure what you're basing this on; what is the technical definition of "more attention from the OS"? If anything, funneling things down to a single process over one connection will improve latency. It can help minimize memory copies once you have volume, because you'll get more than one frame per read (and you certainly aren't serving an L>4 protocol straight out of your NIC). Most importantly, it removes connection establishment and teardown costs. Those aren't free.
Now, if you really are loading backends so heavily that responsiveness is a problem, you'll need to appeal to whatever load balancing solution you have on hand. But most folks agree that this is faster.
> Furthermore, if you use the keepAlive header, and set long timeouts, then you don’t even have to pay the connection penalty.
But you still have head-of-line blocking, so you're still spamming N connections per client to get that concurrency factor. Connections aren't free: they have state and take up system resources. You'll serve more clients with one backend if you take up fewer resources per connection.
> So essentially you’ve shifted thread management to HAProxy by virtue of its parallel connections which keeps the webserver code pretty simple.
I don't think this is true at all. If you want to serve connections in parallel or access resources relevant to your service in parallel, you're going to need to do that. You can of course choose to NOT do this and spam a billion processes rather than use concurrency.
> And in a C++ program simplicity is key to correctness
I don't think this is unique to C++, but I'm also not sure it's really relevant here. From an application backend's perspective, it's very much the same model. They need a model that supports re-entrant request serving.
First, let me give some background. In the beginning there was HTTP/1.0, where you send a request, receive a reply, and then terminate the connection. The TCP handshake needed three packets before any data flowed; add one packet for the request and one for the reply, and roughly 60% of the exchange (3 packets out of 5) was merely getting ready to talk.
HTTP/1.1 brought pipelining: with the keep-alive header you can send req1, req2, ..., reqN and then expect to receive reply1, reply2, ..., replyN. The replies are expected in the order they are requested.
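For concreteness, pipelining over a single socket looks roughly like this from the client side (blocking POSIX sketch, no error handling, assumes a server listening on localhost:8080):

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    connect(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr));

    // Two requests written back to back on the same connection: pipelining.
    const char reqs[] =
        "GET /a HTTP/1.1\r\nHost: localhost\r\nConnection: keep-alive\r\n\r\n"
        "GET /b HTTP/1.1\r\nHost: localhost\r\nConnection: keep-alive\r\n\r\n";
    write(fd, reqs, sizeof(reqs) - 1);

    // The replies arrive strictly in request order: all of /a, then all of /b.
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, static_cast<size_t>(n), stdout);
    close(fd);
}
```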
HTTP/2 adds a bunch of things. For one, the requests are in a binary framing format instead of text, in order to achieve better compression. Another thing is that it allows multiplexing. This is different from pipelining because now you can receive replies out of order, which keeps small files from getting stalled behind large ones.
However, HAProxy will demux the HTTP/2 requests and spread them over multiple parallel connections to the backend server, each of which supports pipelining so it doesn't close immediately. Each request goes to a connection that's free (i.e. doesn't already have a pipelined request in progress), so this is effectively the same as HTTP/2 multiplexing, since HAProxy will send the replies back to the client in the order they are received from the backend (which isn't necessarily the requested order, i.e. multiplexed).
The benefit here is that
1) If you have multiple webservers, they don't each have to deal with muxing/demuxing streams and converting the binary framing to plain HTTP (it might be possible to skip the binary translation, but the code will be ugly).
2) If you have parallel connections each using thread pools to handle each request stream, then you could start running into thread contention problems.
3) You protect your C++ webservice with a battle tested service like HAProxy
> HTTP/1.1 brought pipelining: with the keep-alive header you can send req1, req2, ..., reqN and then expect to receive reply1, reply2, ..., replyN. The replies are expected in the order they are requested.
Remember though that because of the model, you cannot possibly serve these in parallel. You must serve them serially to be on spec.
> Each request goes to a connection that's free (i.e. doesn't already have a pipelined request in progress), so this is effectively the same as HTTP/2 multiplexing
It's not the same at all though, is it? HTTP/2 doesn't wait for each request to return. You could easily do exactly that same process with HTTP/2, and by decoupling the notion of "utilization" from "that connection is busy", you can actually balance to servers based on more sophisticated metrics.
> 2) If you have parallel connections each using thread pools to handle each request stream, then you could start running into thread contention problems
You already have these problems with resource management for application services.
> You protect your C++ webservice with a battle tested service like HAProxy
Why does the http/2 architecture not get a load balancer but the HTTP 1.1 architecture does?
In a perfect world, I would recommend http2 all the way. But since an HTTP1.1 implementation is much easier to dissect I think it's a better choice for someone writing their own web service.
In response to your first 2 questions:
In front of HAProxy is HTTP/2, and on the backend are, say, 32 long-lived HTTP/1.1 connections to your web server, each handled on its own thread. HAProxy demuxes the HTTP/2 request and then parallelizes the parts over the 32 connections. Then whichever order they come back from the backend is the order in which they are returned to the sender. HAProxy also knows that a connection that just returned is now available to immediately work on a new request. It is not necessary to wait for a request to finish before scheduling the next request on a connection, by virtue of pipelining. If you round-robin through your connections and reuse available connections before pipelining onto busy ones, head-of-line blocking will be fairly rare.
In response to question 3:
Now it's important to remember that thread performance is directly related to the number of physical cores. If the system has 4 cores, then running the web server with 4 threads will perform better than running with 40 threads, due to the overhead of context switching. If you have 4 HTTP/2 connections using a shared thread pool to handle the multiplexed requests, there will be a performance hit as each connection waits for a thread to become available, since another connection might be using multiple threads simultaneously. In the solution I propose, since HTTP/1.1 requests are already sequential in nature, each connection can be given its own thread and run as fast as it can. You can't do that with HTTP/2, because then you would lose all ability to parallelize over a single connection. Really what it comes down to is scheduling tasks for CPU cores, and a thread that has constant work in its queue will work faster than one that continuously returns a value and waits to be given another request to process. However, HTTP/2 does have the advantage of scheduling at the thread level rather than the connection level. I suspect this results in better performance under very high loads, and when there is a significant size/time-complexity differential between assets that causes frequent head-of-line blocking.
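Put differently, the sizing rule I'm arguing for is just this (illustrative sketch; the per-connection serving loop is whatever your HTTP/1.1 handler is):

```cpp
#include <algorithm>
#include <cstdio>
#include <thread>
#include <vector>

int main()
{
    // Pin the number of backend connections (and therefore threads) to the
    // core count instead of oversubscribing with dozens of threads.
    const unsigned nWorkers = std::max(1u, std::thread::hardware_concurrency());

    std::vector<std::thread> workers;
    workers.reserve(nWorkers);
    for (unsigned i = 0; i < nWorkers; ++i)
    {
        workers.emplace_back([i] {
            // In the real server this thread would own one long-lived
            // keep-alive connection and answer its pipelined requests strictly
            // in order, never sharing the socket with another thread.
            std::printf("worker %u bound to one keep-alive connection\n", i);
        });
    }
    for (auto &w : workers)
        w.join();
    return 0;
}
```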
In response to question 4:
I figured that if you already implemented HTTP2 and you were that concerned with speed that you wouldn't want to introduce a middleman, but HAProxy's overhead is pretty low.
I understand what you are saying. But why do you need to use http 1.1 behind the proxy for this setup? Why can't you use http 2 both behind and in front of the reverse proxy, but still demux into multiple connections at the proxy? Just because http 2 supports multiplexing, doesn't mean that you need to use it.
You definitely can, I'm just arguing that the complexity involved to implement the feature is not worth the real world performance gain. Furthermore, by adding that complexity you make it that much harder to make sure the program is free of undefined behavior.
Fair point, although this doesn't need to be implemented in the application code itself. In reality it should be implemented by an HTTP library like this, and any general purpose HTTP library ought to implement HTTP 2 anyway. So it could end up being that by consistently using HTTP 2 across the whole stack, you end up with less complexity. Plus you get the potential performance gains, even if small.
One possibility is that HTTP/2 is much more work to implement, and since it isn't /absolutely/ required for base "serve a thing" functionality, it may be left to be implemented later.
I'm not sure that "HTTP/3 is about to overtake HTTP/2" is a very fair statement given how many years it took load balancers to start supporting HTTP/2.
HTTP/3 (QUIC) is genuinely exciting and I'm very eager to see it. But given it's even more different from HTTP/2 than HTTP/2 is from HTTP/1.1, it's probably going to be another 3-4 years before we see good open source load balancer performance in front of it.
It conforms better to the async model that JavaScript loves. For everyone outside the single largest deployment in history, it's simpler because it's a smaller spec.
I see your point, though I'd very much prefer it if we could split the stack into application server > http server > SSL-terminating reverse proxy by default. That includes putting HTTP/2 handling on the proxy's task list. It would split concerns far more elegantly than having to replicate logic between http and https modules.
By that argument, they should get cracking on HTTP/3 already. :)
Even though HTTP/2 is usable, adoption is still low (last I checked it was around 25%), and any server that supports 2 needs to support 1 for backwards compatibility. It's also more complicated to implement, so I can see why you'd want to start with HTTP/1.
HTTP/2 has over 90% support on caniuse, which is a decent approximation of what most folks selling products here will see. The browsers that don't support it aren't the sort we really care that much about anyway, unless we're building a bank or something, in which case suffering is your mandate anyways.
As for client libraries for APIs, it's a non-issue. There's plenty of flexibility on backends.
I'm not saying you'd skip "HTTP/1" but I certainly wouldn't put a lot of effort into it.
You lose out bigtime between your revproxy and your backends. You might as well abandon it entirely and go with grpc streaming or thrift.
Revproxies are essentially awful, ugly lynchpins in your distributed architecture. You want to be as low as you can afford to be on utilization for them, because if they have a bad day you have no product.
I didn't say you wouldn't have a proxy. I said you want to have as few as you possibly can for volume and redundancy, because they're a centralized point of failure in your architecture and they have to scale over your entire infrastructure.
I am certainly NOT advocating for their lack. I'm not even sure how that could realistically work.
If you really want a security story, you shouldn't be looking into a C++ server. That's a very false sense of security, there is no way to guarantee memory safety.
Absolutely not true for C++. The language was invented to guarantee memory safety, like Rust was. (The fact that people ignore the safety features when coding is irrelevant; people liberally throw 'unsafes' around in Rust and Haskell too.)
C++ wasn't invented to be safe in the same way as Rust or Ada, whose authors saw state-of-the-art safety as a critical tenet of the language. Stroustrup was, I believe, mostly interested in making C more suitable for large software engineering projects, and more interested in better abstractions, which could lead to better safety.
The fact that people can ignore the safety features is the point. Unsafe is a contract, and it only opens up a small number of extra "features". It forms the axioms of a proof that your code correctly uses memory. C++ is essentially just patched with safer abstractions, some of which are major improvements, but there is no proof, no rigorous check for safety until runtime, from ASan or a fuzzer or other tools that aren't part of core C++.
The biggest issue with the safety of C++, however, is the size of the language. It has become so big and so complicated, and the rules that govern things we take for granted like function name lookups are horrendously unintuitive and lead to unintended consequences.
The language was invented so that Bjarne, after his experience of having to rewrite Simula into BCPL, never had to go through such a low-level language again in his life.
He states this in plenty of interviews, and in some of his books.
Something as simple as incrementing a signed integer x leads to undefined behavior (aka the standard says the program can delete your hard drive) if x is too big; that's not a language that guarantees memory safety.
According to the spec, it can lead to all the same problems that any UB leads to, including all the problems that any memory unsafety leads to. The code is allowed by the spec to cause memory unsafety.
But to list some direct memory unsafety possibilities: indexing off the end of an array, indexing off the end of a vector, dereferencing a bad pointer, dereferencing a null pointer, dereferencing a bad iterator, double delete.
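To make a few of those concrete (deliberately broken snippets: every marked line compiles cleanly yet is undefined behavior):

```cpp
#include <climits>
#include <vector>

int main()
{
    int x = INT_MAX;
    int y = x + 1;                 // signed overflow: undefined behavior

    std::vector<int> v{1, 2, 3};
    int a = v[10];                 // out-of-bounds read through operator[]: UB

    int *p = new int(42);
    delete p;
    int b = *p;                    // use after free: UB

    int *q = new int(7);
    delete q;
    delete q;                      // double delete: UB

    return y + a + b;
}
```

None of these require anything exotic, and none of them are rejected by the compiler.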
> it crashes in debug mode, and wraps (two's complement) in release mode.
Well, that's indeed a nice feature in debug mode. Mostly wraparound is undesired, and wraparound bugs are pretty common. Shame the performance cost is too much to do the same in release.
Whenever wraparound semantics are actually wanted, one can use an appropriate type.
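In C++ terms, "an appropriate type" usually means an unsigned integer, since unsigned wraparound is defined behavior while signed overflow is not (Rust's Wrapping types play the same role there):

```cpp
#include <cstdint>
#include <iostream>

int main()
{
    std::uint32_t u = UINT32_MAX;
    u += 1;                   // well-defined: wraps to 0 (arithmetic mod 2^32)
    std::cout << u << '\n';   // prints 0

    // Doing the same with a plain signed int would be undefined behavior,
    // which is exactly why compilers can't give you cheap checks for it.
    return 0;
}
```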
I have a lot of love for C++, but even if the user of this framework knows about security, I would really recommend not using it.
The risk of security issues in such a framework, since it's written in C++, might be quite high. I'm not deep into security, so don't take my word for it, but I'd be very cautious.
Generally, I guess higher performance means lower security.
I like it, but the API for defining HTTP endpoints seems messy compared to, say, Flask or ExpressJS. Is there any particular reason it is done this way vs something syntactically cleaner, like:
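(something along these lines; the registration call here is hypothetical, just to show the shape I mean)

```cpp
#include <drogon/drogon.h>
#include <functional>

using namespace drogon;

int main()
{
    // Hypothetical Express/Flask-style registration; "handle" is NOT Drogon's
    // actual API, it's just the kind of inline lambda interface I'm picturing.
    app().handle("/hello", [](const HttpRequestPtr &req,
                              std::function<void(const HttpResponsePtr &)> &&callback) {
        auto resp = HttpResponse::newHttpResponse();
        resp->setBody("Hello!");
        callback(resp);
    });

    app().addListener("0.0.0.0", 8080).run();
    return 0;
}
```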
The method in the readme file is for decoupling. Imagine a scenario with dozens of path handlers; isn't it better to distribute them across their respective classes?
Of course, for very simple logic, the interface you are talking about is really more intuitive; I will consider adding such an interface. Thank you for your comment.
> Use a NIO network lib based on epoll (kqueue under MacOS/FreeBSD) to provide high-concurrency, high-performance network IO
What is "NIO" here? I am familiar with NIO as New Input-Output from Java, which was the extension of the standard library to cover asynchronous I/O. Is the author using "NIO" to mean asynchronous IO? Their use of '"JSP-like" CSP file' elsewhere hints at a Java background.
Good question!
If the type of the callback's parameter is an rvalue reference, users must move resp into the callback. Note that resp is a shared_ptr, which means the callback can reuse the buffer or do anything else to the resp object, and copying the smart pointer does not cost much.
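Here is a standalone illustration of the difference (toy types, not the framework's real ones):

```cpp
#include <functional>
#include <iostream>
#include <memory>
#include <string>
#include <utility>

struct Response { std::string body; };
using ResponsePtr = std::shared_ptr<Response>;

// Callback parameter is an rvalue reference: the caller must move the pointer
// in and gives up its own handle.
void sendByMove(std::function<void(ResponsePtr &&)> callback)
{
    auto resp = std::make_shared<Response>(Response{"hello"});
    callback(std::move(resp));   // resp is now empty; don't touch it afterwards
}

// Callback parameter is a const lvalue reference: copying the shared_ptr is
// just a reference-count bump, and the caller still holds a valid handle.
void sendByCopy(std::function<void(const ResponsePtr &)> callback)
{
    auto resp = std::make_shared<Response>(Response{"hello"});
    callback(resp);              // cheap copy; resp is still usable here
}

int main()
{
    sendByMove([](ResponsePtr &&r) { std::cout << r->body << '\n'; });
    sendByCopy([](const ResponsePtr &r) { std::cout << r->body << '\n'; });
    return 0;
}
```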
Why not just wrap libevent or some other battle tested network poller?
Also, this would be super cool if it were in Rust. As it stands writing web services in C++ is basically programming malpractice at this point. Would love to see the author contributing to tokio or hyper and helping us get rid of all this legacy inheritancey impl junk.
I am unapologetic about the truth. The calculus engineers employ that leads them to trade the safety of user-facing software for the familiarity of C++ has to stop.
I don't claim that Rust is bulletproof. You still have to verify the axiomatic unsafe sections of your code. There can still be bugs in Rust itself, or logical errors in your programs. What I do claim is Rust is a giant leap forward over C++, which is an inadequate tool to write software end users will use. As society continues to rely more and more on software, as it becomes the thing that we stake our lives on, we have a moral responsibility to modernize our tools.
Rust is designed with the hindsight of C++ and many other languages, and it frankly sweeps the floor with C++ in every way other than compile times and certain efficiencies, which will inevitably improve. The only legitimate reasons to use C++ are to access legacy C++ libraries that Rust has no equivalent for yet, and to maintain existing C++ projects.
That is not the only legitimate reason, at all; adoption of new technology takes time for good reasons. Rust is still pretty new, a lot of things are still only in nightly, and while impressive the ecosystem has a long way to go in terms of robust libraries.
Open source software is only one facet; there are plenty of companies with millions of lines of C++ internally, some of which making up parts of their infrastructure.
I'm being intentionally very non-exhaustive for two reasons:
1, I don't feel I should have to explain this. It's self-explanatory that these two extremely complicated things can't be compared like integers.
2, because it's very much subjective. If you are starting a completely greenfield project, Rust might be a good idea. I am a huge fan of what Rust has to offer. But it has its kludges and kinks to work out, whereas companies have had decades to work out and improve their usage of C++. In particular, linting and coding standards for modern C++ projects generally go very far in improving memory safety even if they don't provide strong guarantees like Rust can. With enforced coding standards you can virtually eliminate risks like buffer overflows and discourage risks like data races. In addition to the ecosystem of libraries, there's a lot of excellent tooling for C++, for static analysis and debugging.
I fear now that I have elaborated the first instinct you will have is to poke holes in individual points. Please don't. All I'm trying to illustrate is that switching to Rust costs time+effort, and it is seriously not better than C++ in every way, in the same way that Go isn't.
There are also frameworks for C++ like Qt that mitigate data races by using an internal event loop and function scheduler to give concurrency guarantees.
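For example, with Qt's queued connections (Qt 5.10+ assumed here), a worker thread never touches shared state directly; it posts work to the receiver's event loop, which serializes it with everything else running in that thread:

```cpp
#include <QCoreApplication>
#include <QDebug>
#include <QMetaObject>
#include <QObject>
#include <thread>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);
    QObject receiver;  // has thread affinity to the main thread

    // Qt::QueuedConnection guarantees the lambda runs in the receiver's
    // thread, inside its event loop, so no locking is needed here.
    std::thread worker([&receiver]() {
        QMetaObject::invokeMethod(&receiver, []() {
            qDebug() << "executed in the main thread's event loop";
            QCoreApplication::quit();
        }, Qt::QueuedConnection);
    });

    int rc = app.exec();   // the posted lambda fires inside this event loop
    worker.join();
    return rc;
}
```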
Then we generally agree. Like I said, the one use case for C++ is if you're already using it, or require a library that is only available in C++.
My general point is that people should make a commitment to upgrade, and set a course in that direction. The opposite approach is the wait-and-see mentality, which leaves companies waiting in perpetuity because they collectively never adopt better languages, thanks to the sunk cost fallacy.
It's exactly the big companies that need to commit, because they have the resources to invest in sanding down the kludges and kinks and making Rust really better than C++ in every way, and it's in these early stages of Rust's development where senior C++ talent could really help set the direction of Rust based on what we've learned from C++.
C++ is updated every 3 years as an ISO standard. Rust is updated at the whim of the developers. C++ templates blow Rust's generics out of the water in terms of writing smaller code.
Thanks for correcting my ignorance! Definitely a stupid statement on my part. The process looks pretty solid. What is confusing is who the Rust Team is and how the project is sponsored. Is it all volunteer like C++ or is there some sponsorship involved?
The team is largely volunteers, though several companies sponsor people as well. Some people who are sponsored are only part time, or only on the bits relevant to that company. Mozilla sponsors the most full-time people at the moment. We’re working on getting more full-time people, there are some challenges there.
How does Rust sweep the floor in game engines like CryEngine or Unreal, GUI frameworks like WPF, Qt, MFC, IDE tooling like Visual Studio, database drivers like OCI and ADO, CUDA, Metal, DirectX...?
As a language, Rust sweeps the floor with C++. You're mentioning a bunch of pieces of software, most of which are already written in C++, so you're asking me a kind of apples-to-oranges question.
Language coherency without business critical libraries, tooling and support for existing use cases, doesn't help much regarding language uptake.
I like Rust, yet I would have to be very masochistic to drop the C++ tooling support for writing native libraries that plug into Java or .NET applications, and drag our customers and team along, just in the name of coherency.
I have seen too many coherent languages fail to survive the adoption curve because their communities failed to focus on the essentials needed to actually adopt them.
Despite the ISO process being insanely difficult, I consider it, and the people that make it happen, C++'s greatest strength. Will the creators of Rust and Go be as dedicated as Bjarne has been with C++? Are they prepared to support and improve it for the rest of their lives? Oh wait, Graydon is working on Swift now, sorry Rust people. Rob Pike also has a history of giving up and starting fresh, though he has been pretty dedicated to Go.
In other words, I'm confident if I write my code in C++ today, it will work in the next 30 years, while I can't say the same for Rust or Go.
Just an update. Rust's governance model looks pretty solid and I'm confident it could survive. What is unclear to me is the sponsorship and who the Rust Team is. What is clear is that C++'s process has proven it works over long periods. I think Rust's process has many elements of that.
It would be much better to use asp.net core over Rust or Drogon. Safe, fast, easy to develop for, easy to hire engineers for, better for actually building a product quickly with fewer bugs.
I'm comparing using Rust versus C++ to build something like Drogon. We're talking about tools for building high-performance infrastructure, which is completely different from using that infrastructure from an application developer's perspective.
Nim is playing much the same tune as D, which ultimately failed to gain the traction needed to be a real competitor. It's already ruined by inheritance and the desire to please everyone with its multi-paradigm, take-it-or-leave-it GC, compile-to-everything approach.
Literally anything would be better than C++ at this point, though. I'd take D or Nim over C++ in a heartbeat; C++ is drowning under the weight of its own specification, coupled with a desire to keep programmers in the dark ages of memory safety by convention and the C preprocessor. It's also impossible to find a C++ code base that doesn't have some sort of safety issue, even with smart pointers and RAII. Fuzzing, testing, and trying to patch things up simply isn't enough.
The software landscape has changed; there are now more reasons than ever to choose a language built around safety and efficiency that can run everywhere.
It's possible, and actually not difficult, to write memory-safe C++ code. In the last several years the C++ projects I worked on have had close to zero memory issues (I cannot recall any). I think with the advancements of C++14/17, memory issues are a largely solved problem.
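A small sketch of the idioms that make the difference: RAII ownership, no naked new/delete, and bounds-checked access instead of raw indexing:

```cpp
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

struct Session
{
    std::string user;
};

int main()
{
    // Ownership is explicit and released automatically: no delete, no leak,
    // no double free.
    auto session = std::make_unique<Session>();
    session->user = "alice";

    // Containers instead of raw arrays; .at() throws instead of silently
    // corrupting memory on a bad index.
    std::vector<int> scores{10, 20, 30};
    try
    {
        std::cout << scores.at(7) << '\n';
    }
    catch (const std::out_of_range &e)
    {
        std::cout << "caught: " << e.what() << '\n';
    }

    // Shared ownership only where it is genuinely shared; the control block
    // keeps the object alive for every holder.
    auto shared = std::make_shared<Session>(*session);
    std::cout << shared->user << '\n';
    return 0;
}
```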
[0] https://github.com/an-tao/drogon/wiki/quick-start