Open-Sourcing Traffic Server: 700k lines of code, 9 months (yahoo.net)
37 points by peter123 on Feb 19, 2010 | 16 comments


LOC is not something you should brag about. That said, it looks like the open sourced version (300K) is about on par with Squid and Apache:

nginx 75,214

lighttpd 84,852

apache 294,002

squid 283,417

(according to ohloh.com)


It's worrying just how bloated and large many simple web servers get. Those are some seriously large LOC counts.


the vast majority of apache's lines are in bundled optional modules, not the core.

I am not sure where ohloh is getting its numbers, but here are the directory stats from sloccount, for the recent httpd 2.3.5-alpha:

  SLOC    Directory       SLOC-by-Language (Sorted)
  95386   modules         ansic=95095,lex=191,yacc=100
  27977   server          ansic=27977
  11362   build           sh=9980,awk=657,perl=433,pascal=292
  7248    support         ansic=7076,perl=92,sh=80
  3250    include         ansic=3250
  1622    os              ansic=1622
  1094    test            ansic=843,sh=230,perl=21
  159     top_dir         sh=159
If you were to strip out everything 'not needed' for a basic server, I expect you would end up 'running' about 40k LOC, which I think is in line with others like varnish/lighttpd/etc.
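
The arithmetic behind that estimate can be checked with a quick sketch. The per-directory numbers are copied from the table above; treating server, include and os as the "core" is an assumption made here for illustration, not something the tool reports.

```python
# SLOC per directory, copied from the table above.
sloc = {
    "modules": 95386,
    "server": 27977,
    "build": 11362,
    "support": 7248,
    "include": 3250,
    "os": 1622,
    "test": 1094,
    "top_dir": 159,
}

# Assumption: these directories approximate the "core" of a basic server.
CORE_DIRS = ("server", "include", "os")

total = sum(sloc.values())
core = sum(sloc[d] for d in CORE_DIRS)
modules_share = sloc["modules"] / total

print(f"total SLOC:    {total}")
print(f"core SLOC:     {core}")
print(f"modules share: {modules_share:.1%}")
```

Under that assumption the total is 148,098, modules account for roughly 64% of it, and the assumed core comes to about 33k, the same ballpark as the 40k guess.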

apache's problem isn't bloat, it's architecture and bad sysadmins. it was designed from 1994 for prefork-based systems, and sysadmins have been configuring it since then, or not configuring it. Yes, it will use lots of RAM when you embed Python and PHP in it, so don't do that.

mini-rant over, and a disclaimer: i am an httpd committer.


Does it really use any more RAM with Python or PHP embedded than Apache + out-of-process Python or PHP would together? Seems to me that the main inefficiency is that the memory overhead for the scripting language is incurred whether or not there is active execution. So that RAM is still tied up when sending a static file to a client, or when sending the buffered result of a static page. Seems like having an event-driven piece that took over from each prefork process for returning the result, and for any logging that happens after the result has been sent, would be a relatively small amendment to the architecture with a big payoff.

I know people are reverse proxying to Apache with nginx or similar for similar reasons.
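
The RAM argument above can be put into rough numbers. This is only a back-of-the-envelope sketch: every figure in it (worker counts, per-process sizes, the size of the front end) is a made-up assumption for illustration, not a measurement of Apache, nginx, PHP or Python, and both function names are hypothetical.

```python
def prefork_memory_mb(workers, base_mb, interpreter_mb):
    """RAM used when every prefork worker embeds the interpreter."""
    return workers * (base_mb + interpreter_mb)


def split_memory_mb(front_mb, app_workers, base_mb, interpreter_mb):
    """RAM used when an event-driven front end holds the client connections
    and only a small pool of workers carries the interpreter."""
    return front_mb + app_workers * (base_mb + interpreter_mb)


# Hypothetical numbers: 200 concurrent clients, 5 MB per bare worker,
# 25 MB of interpreter overhead, a 20 MB front end, 20 script workers.
embedded = prefork_memory_mb(workers=200, base_mb=5, interpreter_mb=25)
proxied = split_memory_mb(front_mb=20, app_workers=20, base_mb=5, interpreter_mb=25)

print(f"one interpreter per connection: {embedded} MB")
print(f"event-driven front + small pool: {proxied} MB")
```

The point is only the shape of the comparison: with one process per connection the interpreter overhead scales with the number of clients, while behind a front end it scales with the number of requests actually executing script code.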


I haven't tried the other MPM plugins extensively, but it seems like the default should be "Don't use a ton of RAM, don't fail if there's more than like 10 concurrent connections". Especially these days when people are ajax and cometting to their hearts' content.

I guess one of the issues is that if you're using it with PHP/Python/etc, those expect to be in their own thread/process, so using more of a scalable async approach in apache isn't always possible.

I agree @ architecture + misconfiguration.


the line count wasn't intended as a brag. Yahoo! worked on cleaning up the source code for 9 mo in prep for open source release and in the process reduced the size from 700k to 300k


Yeah, and we've just begun cleanup :). There's still a bunch that can go, there's plenty of stuff that's no longer necessary or supported, and over time, I'd expect that to go away entirely.

Also, the long term goal is to remove more from the TS "core", and move that out to plugins, so the core will be sleeker. Of course, we'll also add on new features, but hopefully many of those can be implemented as plugins as well.


It's multithreaded, so it can take advantage of more CPU cores, unlike squid.

It has total disk cache persistence, so it won't lose any cache if the server crashes or traffic server itself crashes, unlike varnish.

If you've got cache spread over multiple spindles and one of the spindles dies while in production, it'll just keep on going, skipping that spindle. Squid does this, varnish does not.
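
To make the "keep going, skip the dead spindle" behaviour concrete, here is a toy sketch. It is not how Traffic Server or Squid actually place objects on disk; the class, its methods and the hash-modulo placement are all hypothetical and only illustrate the idea of routing around a failed disk.

```python
import hashlib


class MultiDiskCache:
    """Toy cache index spread across several disks (spindles)."""

    def __init__(self, disks):
        self.disks = list(disks)
        self.failed = set()

    def mark_failed(self, disk):
        """Record that a disk has died; it is skipped from now on."""
        self.failed.add(disk)

    def disk_for(self, key):
        """Pick a healthy disk for a cache key, ignoring failed ones."""
        healthy = [d for d in self.disks if d not in self.failed]
        if not healthy:
            raise RuntimeError("no healthy disks left")
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(healthy)
        return healthy[index]


cache = MultiDiskCache(["disk0", "disk1", "disk2"])
cache.mark_failed("disk1")
print(cache.disk_for("http://example.com/logo.png"))  # never disk1
```

Note that plain modulo placement also moves some keys between the surviving disks after a failure; a real cache would want something smarter, but the service keeps answering either way.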

It does run on linux. Actually it's what it was built on, originally, IIRC.

I have used TrafficServer (at Yahoo) and I'm willing to bet that it will overtake both squid and varnish as time goes on.


Kind of hard to find information out there... The OP link and links off that page don't lead anywhere useful. Googled around and came up with this:

http://cwiki.apache.org/confluence/display/TS/Traffic+Server

Which also yields:

http://wiki.apache.org/incubator/TrafficServerProposal

http://incubator.apache.org/projects/trafficserver.html

The Apache "champion" is Doug Cutting, of Hadoop fame, so it seems to have street cred.


Will be interesting to see how this compares to the current king-of-caches, Varnish.

The plugin architecture looks like it'll make it pretty flexible, but it'll have to be pretty special to beat Varnish in terms of performance.


(see my post above) :)


Definitely worth looking at - will spend my weekend on that, but so far not sure if (and where) this thing can find its own niche in popular deployment architectures.


If you do, come join #traffic-server on freenode, the "installation" is still a little rough around the edges.


Any takes on how this compares to, for instance, Squid?


They mentioned everything apart from the fact that it's a web server. Also, have I got this wrong or did they just hint that it doesn't run on linux?


What makes you think it doesn't run on Linux? From the article it sounds like it only runs on Linux since they say "contributors outside of Yahoo! can choose to build the code on more modern, non-Linux OSes as they see fit."



