the vast majority of apache's lines of code are in bundled optional modules, not the core.
I am not sure where ohloh is getting its numbers, but here are the per-directory stats from sloccount for the recent httpd 2.3.5-alpha:
 SLOC  Directory  SLOC-by-Language (Sorted)
95386  modules    ansic=95095,lex=191,yacc=100
27977  server     ansic=27977
11362  build      sh=9980,awk=657,perl=433,pascal=292
 7248  support    ansic=7076,perl=92,sh=80
 3250  include    ansic=3250
 1622  os         ansic=1622
 1094  test       ansic=843,sh=230,perl=21
  159  top_dir    sh=159
If you were to strip out everything 'not needed' for a basic server, I expect you would end up 'running' about 40k LOC, which I think is in line with others like varnish/lighttpd/etc.
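To put numbers on that split, here is a small Python sketch that totals the sloccount rows above. Treating server/, include/ and os/ as "the core" is an assumption made for illustration, not an official httpd boundary.

```python
# SLOC per top-level directory of httpd 2.3.5-alpha, copied from the
# sloccount output above.
SLOC_BY_DIR = {
    "modules": 95386,
    "server": 27977,
    "build": 11362,
    "support": 7248,
    "include": 3250,
    "os": 1622,
    "test": 1094,
    "top_dir": 159,
}

# Assumption for illustration only: call server/, include/ and os/ the "core".
CORE_DIRS = ("server", "include", "os")


def summarize(sloc_by_dir, core_dirs):
    """Return (total SLOC, core SLOC, modules' share of the total)."""
    total = sum(sloc_by_dir.values())
    core = sum(sloc_by_dir[d] for d in core_dirs)
    modules_share = sloc_by_dir["modules"] / total
    return total, core, modules_share


if __name__ == "__main__":
    total, core, share = summarize(SLOC_BY_DIR, CORE_DIRS)
    print(f"total={total} core={core} modules={share:.1%}")
```

Run as a script it prints total=148098 core=32849 modules=64.4%: the modules dominate, and the assumed core is about 33k lines before adding back whichever modules a basic server actually loads. The ~148k total is also roughly half of the 294,002 that ohloh reports for apache.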
apache's problem isn't bloat, it's architecture and bad sysadmins. it was designed, starting in 1994, for prefork-based systems, and sysadmins have been configuring it (or not configuring it) ever since. Yes, it will use lots of RAM when you embed Python and PHP in it, so don't do that.
mini-rant over, and a disclaimer: i am an httpd committer.
Does it really use any more RAM with Python or PHP embedded than Apache plus out-of-process Python or PHP would together? Seems to me that the main inefficiency is that the memory overhead for the scripting language is incurred whether or not there is active execution. So that RAM is still tied up when sending a static file to a client, or when sending the buffered result of a dynamic page. Seems like having an event-driven piece that took over from each prefork process the job of returning the result, plus any logging that happens after the result has been sent, would be a relatively small amendment to the architecture with a big payoff.
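One way to check that intuition is a back-of-envelope model. The Python sketch below compares total resident memory for prefork children that each carry an embedded interpreter against small children plus a separate interpreter pool sized to the number of scripts actually running. Every size in it is a hypothetical placeholder, not a measurement of Apache, mod_php or mod_python, and it ignores copy-on-write sharing between forked children.

```python
from dataclasses import dataclass


@dataclass
class MemoryModel:
    # All sizes in MiB; the values used below are hypothetical placeholders.
    httpd_child: float    # resident size of a bare prefork child
    interpreter: float    # extra resident size of an embedded PHP/Python runtime
    connections: int      # concurrent connections, i.e. busy prefork children
    active_scripts: int   # connections actually executing script code right now

    def embedded(self) -> float:
        # Every child carries the interpreter, even while it is only
        # sending a static file or a buffered page to a slow client.
        return self.connections * (self.httpd_child + self.interpreter)

    def out_of_process(self) -> float:
        # Children stay small; a separate pool sized to the number of
        # concurrently executing scripts holds the interpreters. Each app
        # worker is approximated as one interpreter's worth of memory.
        return (self.connections * self.httpd_child
                + self.active_scripts * self.interpreter)


if __name__ == "__main__":
    m = MemoryModel(httpd_child=5, interpreter=25, connections=200, active_scripts=10)
    print("embedded:", m.embedded(), "MiB; out of process:", m.out_of_process(), "MiB")
```

With these made-up numbers (200 connections, 10 of them executing scripts) the embedded setup comes to 6,000 MiB versus 1,250 MiB out of process. When every connection is executing a script the two figures are equal, which matches the point above: the waste is idle interpreters, not a higher per-request cost.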
I know people put nginx or similar in front of Apache as a reverse proxy for the same reasons.
I haven't tried the other MPMs extensively, but it seems like the default should be "don't use a ton of RAM, and don't fail if there are more than, like, 10 concurrent connections", especially these days, when people are AJAX-ing and Comet-ing to their hearts' content.
I guess one of the issues is that if you're using it with PHP/Python/etc, those expect to be in their own thread/process, so a more scalable async approach in Apache isn't always possible.
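A rough asyncio sketch of the split discussed above: a small, bounded pool does the heavyweight rendering, and the event loop takes over the cheap but slow part of trickling each response out. render_page, send_to_client and RenderStats are hypothetical stand-ins, and threads are used only to keep the example self-contained; with PHP or Python handlers that expect their own process, the pool would more realistically be separate processes (e.g. FastCGI workers behind nginx).

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RenderStats:
    """Tracks how many heavyweight renders run at the same time."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.active = 0
        self.peak = 0


def render_page(request_id: int, stats: RenderStats) -> bytes:
    # Hypothetical stand-in for a PHP/Python handler.
    with stats.lock:
        stats.active += 1
        stats.peak = max(stats.peak, stats.active)
    try:
        time.sleep(0.002)  # pretend to do CPU/DB work
        return f"<html>page {request_id}</html>".encode()
    finally:
        with stats.lock:
            stats.active -= 1


async def send_to_client(body: bytes, chunk: int = 8) -> int:
    # Simulate a slow client; each await hands control back to the loop,
    # so a waiting client costs a coroutine, not a whole worker process.
    sent = 0
    for i in range(0, len(body), chunk):
        await asyncio.sleep(0.001)
        sent += len(body[i:i + chunk])
    return sent


async def handle(request_id: int, pool: ThreadPoolExecutor, stats: RenderStats) -> int:
    loop = asyncio.get_running_loop()
    body = await loop.run_in_executor(pool, render_page, request_id, stats)
    # The heavyweight worker is already free again; only this coroutine
    # holds the buffered response while it is sent.
    return await send_to_client(body)


async def serve(n_clients: int, n_workers: int, stats: RenderStats) -> list:
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return await asyncio.gather(*(handle(i, pool, stats) for i in range(n_clients)))


if __name__ == "__main__":
    stats = RenderStats()
    sent = asyncio.run(serve(n_clients=50, n_workers=2, stats=stats))
    print(len(sent), "responses sent; peak concurrent renders:", stats.peak)
```

With 50 clients and 2 workers, the number of concurrent renders never exceeds 2, while all 50 responses are handled on the event loop; that is roughly the property an event-driven handoff would be after.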
The line count wasn't intended as a brag. Yahoo! worked on cleaning up the source code for nine months in preparation for the open source release, and in the process reduced its size from 700k to 300k lines.
Yeah, and we've just begun cleanup :). There's still a bunch that can go, there's plenty of stuff that's no longer necessary or supported, and over time, I'd expect that to go away entirely.
Also, the long-term goal is to remove more from the TS "core" and move it out into plugins, so the core will be sleeker. Of course, we'll also add new features, but hopefully many of those can be implemented as plugins as well.
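As a generic illustration of the "small core, features as plugins" idea, here is a tiny hook registry in Python. This is not Traffic Server's plugin API; the Core class, the hook names and the plugin functions are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Core:
    """A deliberately tiny, hypothetical 'core' that only dispatches hooks."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Handler]] = defaultdict(list)

    def add_hook(self, name: str, handler: Handler) -> None:
        self._hooks[name].append(handler)

    def run(self, name: str, txn: dict) -> dict:
        # Handlers run in registration order and may modify the transaction.
        for handler in self._hooks.get(name, []):
            handler(txn)
        return txn


# Hypothetical features that could live outside the core as plugins.
def drop_query_string(txn: dict) -> None:
    txn["url"] = txn["url"].split("?", 1)[0]


def add_via_header(txn: dict) -> None:
    txn.setdefault("response_headers", {})["Via"] = "1.1 example-proxy"


if __name__ == "__main__":
    core = Core()
    core.add_hook("read_request", drop_query_string)
    core.add_hook("send_response", add_via_header)
    txn = {"url": "/img/logo.png?utm_source=x"}
    core.run("read_request", txn)
    core.run("send_response", txn)
    print(txn)
```

The core only dispatches; adding or dropping a feature means registering or not registering a handler, without touching the core itself.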
It's multithreaded, so it can take advantage of more CPU cores, unlike Squid.
It has total disk cache persistence, so it won't lose its cache if the server or Traffic Server itself crashes, unlike Varnish.
If you've got the cache spread over multiple spindles and one of them dies while in production, it'll just keep on going, skipping that spindle. Squid does this; Varnish does not.
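One way a cache can keep serving after losing a spindle is to pick each object's disk with rendezvous (highest-random-weight) hashing over the disks still marked healthy. This is a generic technique shown for illustration, not necessarily how Traffic Server or Squid assign objects to disks; pick_disk and the device names are hypothetical.

```python
import hashlib
from typing import Iterable, Set


def _score(key: str, disk: str) -> int:
    # Stable pseudo-random weight for a (disk, key) pair.
    digest = hashlib.sha1(f"{disk}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_disk(key: str, disks: Iterable[str], failed: Set[str]) -> str:
    healthy = [d for d in disks if d not in failed]
    if not healthy:
        raise RuntimeError("no healthy cache disks left")
    # Rendezvous hashing: the healthy disk with the highest score wins, so
    # marking one disk failed only relocates the objects it was holding.
    return max(healthy, key=lambda d: _score(key, d))


if __name__ == "__main__":
    disks = ["/dev/sdb", "/dev/sdc", "/dev/sdd"]  # hypothetical spindles
    print(pick_disk("/images/logo.png", disks, failed=set()))
    print(pick_disk("/images/logo.png", disks, failed={"/dev/sdc"}))
```

Only objects whose top-scoring disk was the failed one move elsewhere; everything else keeps its placement, so the rest of the cache stays warm.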
It does run on Linux. Actually, IIRC, Linux is what it was originally built on.
I have used TrafficServer (at Yahoo) and I'm willing to bet that it will overtake both squid and varnish as time goes on.
Definitely worth looking at; I will spend my weekend on it, but so far I'm not sure if (and where) this thing can find its own niche in popular deployment architectures.
What makes you think it doesn't run on Linux? From the article it sounds like it only runs on Linux since they say "contributors outside of Yahoo! can choose to build the code on more modern, non-Linux OSes as they see fit."
Lines of code, according to ohloh.com:
nginx      75,214
lighttpd   84,852
apache    294,002
squid     283,417