I posted this comment:
"It's obvious that you're new to Node.js.
First of all, you should be aware that Async.js is a mere flow-control library. It does not offload work to separate threads, and neither is it able to parallelize work. Internally, it mostly does bean-counting (but very helpfully so).
As you can see in the source https://github.com/caolan/async/blob/master/lib/async.js#L35... async.sortBy simply uses Array.sort for the actual sorting. The only reason you'd want to use async.sortBy is if the sort keys are not known beforehand (and need to be loaded through IO, asynchronously). This is clearly exemplified in the documentation. https://github.com/caolan/async#sortBy
The implication of this is that your call to async.sortBy can be replaced by a call to Array.sort. This will remove two unnecessary runs of async.map, which currently inflict a potentially huge performance penalty.
You do need to pass Array.sort a comparator function, otherwise it will sort lexicographically (see https://developer.mozilla.org/en-US/docs/JavaScript/Referenc... ). That said, I'm not sure what the actual contents of your input file are. In your .Net example, you do not seem to bother to convert the array of strings to an array of ints (or floats). I think that .Net sort will sort an array of strings lexicographically as well. Furthermore, in the node.js example, you seem to be content with returning the resulting median as an int, not as a float. Do the input "decimals" in the input file represent ints or floats? Do they all have exactly the same number of decimal places? Are both the Node.js and .Net algorithms doing the same thing? I think not.
Finally, we get to Array.sort. Array.sort blocks. Depending on the multi-threading efficiency of the underlying algorithm of Array.sort (which I don't have insight into), the code may not be able to use all available system resources. Keep in mind that Node.js is single-threaded. I practically don't know anything about .Net, but I assume it will magically start new processes and/or threads if the runtime deems this beneficial. For Node.js, you may want to try the Cluster API, http://nodejs.org/api/cluster.html . You could try seeing if performance increases by adding one or more extra server processes.
I can't comment about the quality of the .net code since I don't have any experience with it.
I think it would be fair (and very educative to others) if you'd rerun the benchmarks with
1. Async.sortBy replaced with array.sort
2. with both .Net and Node.js algorithms fully doing the same thing (i.e. let them both sort either floats, ints, or strings), and
3. at least one extra server process for Node.
I think it would be most interesting if you made the changes step by step and ran the benchmarks at each step.
My guess is that step 1 would give the biggest difference.
Depending on how you decide to resolve the differences in the two algorithms, performance of your .Net code may be slightly affected. It could potentially be sped up in fact, if somehow it's able to sort ints (or floats) faster than strings. The actual job of sorting probably overshadows it all though."
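To make the lexicographic-versus-numeric point from the comment concrete, here is a minimal sketch. The sample values are made up for illustration (the benchmark's real input file is not shown in this thread), and the plain synchronous `Array.prototype.sort` stands in for the `async.sortBy` call being discussed.

```javascript
// Sample values are an assumption for illustration only.
const lines = ['10.5', '9.25', '100', '2'];

// Without a comparator, elements are compared as strings,
// so '100' sorts before '2'.
const lexicographic = [...lines].sort();

// With a numeric comparator the order follows the values,
// at the cost of re-parsing both operands on every comparison.
const numeric = [...lines].sort((a, b) => parseFloat(a) - parseFloat(b));

console.log(lexicographic); // [ '10.5', '100', '2', '9.25' ]
console.log(numeric);       // [ '2', '9.25', '10.5', '100' ]
```

Both calls are synchronous and block the event loop for the duration of the sort, which is exactly the behaviour the comment describes.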
It's kind of funny that he talks about it being fast due to non-blocking IO:
> One of the key reasons most argue is that node.js is fast, scalable because of forced non-blocking IO, and it’s efficient use of a single threaded model.
...then goes on and sets up a benchmark which is more dependent on CPU than IO. Not to mention that, as pointed out here, the benchmark itself is flawed.
In my experience Node is faster for most web apps that select a row from the db, read network requests to do aggregation, or update a column in a db. Anything that does heavy CPU computation generally uses a native hook or a different tech altogether.
"...then goes on and sets up a benchmark which is more dependent on CPU than IO. Not to mention that, as pointed out here, the benchmark itself is flawed."
Thank you. It amazes me how some people can write code in a framework like Node without even understanding the event-driven paradigm's strengths and weaknesses. I was sitting there reading it, and then he states he is using a file sort??? WTF? As if anyone would use Node for that purpose.
I ran it too and came to the same conclusion: the problem with the original comparison is that CPU usage of the node version never gets over 40% on my laptop, while the .NET version uses 100%. Using Array.sort() makes little difference. Adding threads to the node version does.
I also think adding threads to the .NET version won't help it much; it already uses 100% CPU.
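For reference, the "extra server process" suggestion from the original comment could look roughly like the sketch below, using Node's built-in `cluster` module. Note that `cluster` forks processes rather than adding threads, so it may not be what was measured above. `startCluster`, its parameters, and the port in the usage comment are hypothetical names chosen for illustration; nothing here is taken from the benchmarked code.

```javascript
const cluster = require('cluster');
const http = require('http');

// Hypothetical helper: in the primary process, fork `workerCount` workers;
// in each worker, start an HTTP server on `port` that runs `onRequest`.
function startCluster(workerCount, port, onRequest) {
  // `isPrimary` is the newer name; older Node versions only have `isMaster`.
  const isPrimary =
    cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

  if (isPrimary) {
    for (let i = 0; i < workerCount; i++) {
      cluster.fork();
    }
    cluster.on('exit', (worker) => {
      console.log(`worker ${worker.process.pid} exited`);
    });
    return null;
  }

  // Workers share the listening port; connections are distributed among them.
  return http.createServer(onRequest).listen(port);
}

// Usage (not executed here; the port number is an arbitrary assumption):
// startCluster(require('os').cpus().length, 8080, (req, res) => res.end('ok'));
```

Each worker still blocks while it sorts, but other workers can keep serving requests, so a CPU-bound benchmark should be able to use more than one core.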
> I can't comment about the quality of the .net code since I don't have any experience with it.
I'm by no means a .NET guru, but reading his code a couple of non-optimal points jumped out at me:
- after parsing the file into an Array, he needlessly converts it into a List (while calling his variable 'array'). How bad this is, is hard to say -- if the .NET compiler is sufficiently smart, it could optimize this to just a wrap operation, since the default List implementation uses an Array internally. That seems unlikely to me though. You'd have to check the generated code and benchmark.
- by sorting with the default string comparator, he's doing a culture-aware Unicode sort. I.e., the values are being sorted into "alphabetical" order, for however Unicode defines "alphabetical" for his current culture setting. A lot of people seem to feel it's obviously faster to compare the strings than the parsed floats. I don't think that's at all obvious.
Thinking this through some more, using sort with a comparator function would be unnecessarily slow. Given the objective of the algorithm (return a median), it's much better to convert the array of strings to an array of floats (or ints, whatever he wants) first.
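A rough sketch of that "convert first, then sort" idea follows. `medianOfLines` is a hypothetical helper, it assumes every line is a valid number, and it assumes the median of an even-length list is the mean of the two middle values (the thread does not say which definition the benchmark uses).

```javascript
// Hypothetical helper: parse each line once, sort the numbers, return the median.
function medianOfLines(lines) {
  // One parse per line instead of two parses per comparison.
  const values = Float64Array.from(lines, Number);
  if (values.length === 0) return NaN;

  // Typed arrays sort numerically by default, so no comparator is needed.
  values.sort();

  const mid = values.length >> 1;
  return values.length % 2 === 1
    ? values[mid]
    : (values[mid - 1] + values[mid]) / 2; // assumption: mean of the two middle values
}

console.log(medianOfLines(['10.5', '9.25', '100', '2'])); // 9.875
console.log(medianOfLines(['3', '1', '2']));              // 2
```

Whether this is actually faster than a comparator-based sort on strings would need to be measured on the real input.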
meryn, this is an opportunity for you to properly design a benchmark and write your own blog post.
EDIT: I meant to imply that most people who write benchmarks usually aren't experts in every language/framework in the comparison, so it would be nice to see someone who is competent in both .NET and in nodejs put together a benchmark.
If I were ever to do such a thing, it would be more of a lesson in how to use node.js properly (even for non-typical tasks like sorting) than a performance comparison between node.js and .Net. Even then, I don't think it would be very interesting for people who like to read about node.js. It would be non-news to them, and for an atypical use-case.
I actually have no interest at all in getting a .net stack running on my mac.
The misleading information is what bothers me, so I hope to see it corrected. It's especially harmful because his blog will most likely be read by people who naturally favor .net. It's not even a tech blog per se. Then objective information is all the more important. (EDIT: Actually it does seem to be a tech blog, which surprises me a bit, given the superficiality of his analysis)
You can see at the end of his posts how he diverts from the main subject and goes on about the (I'm sure) wonderful technologies and possibilities that the .net stack offers.
What do you guys think of this?