> The Taco Bell answer? xargs and wget. In the rare case that you saturate the network connection, add some split and rsync. A "distributed crawler" is really only like 10 lines of shell script.
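For context, the kind of pipeline the parent describes might look roughly like this — `urls.txt`, the output directory, and the parallelism settings are all assumptions, not anything from the original comment:

```shell
# A minimal sketch of the "10 lines of shell" crawler.
# xargs fans the URL list out to 8 concurrent wget processes:
xargs -n 1 -P 8 wget -q -P pages/ < urls.txt

# If one machine saturates its link, shard the list (GNU split,
# round-robin lines into 4 files) and rsync the shards to other boxes:
split -n r/4 urls.txt shard.
```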
As someone who has had to clean up the messes of people who started with this and built many-hundred-line dense bash scripts... please do not do this.
> I made most of a SOAP server using static files and Apache's mod_rewrite. I could have done the whole thing Taco Bell style if I had only manned up and broken out sed, but I pussied out and wrote some Python.
I feel sad for whoever inherited this person's systems.
"Write code as if whoever inherits it is a psychopath with an axe who knows where you live" is something I heard pretty early on in life and it's been pretty useful.
Most experienced programmers know a little bash and enough UNIX commands to get by. This is enough to write a script that handles the happy path, but not enough to handle all error conditions correctly. There are all sorts of tricks you need to know that are commonly skipped. (Forgetting to use -print0 for example, and that's an easy one.) The resulting script is probably okay if you run it interactively and check the output but will blow up or silently do the wrong thing for unexpected input in production. To properly review a bash script for errors you need to be an expert.
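To make the -print0 point concrete, here is a small made-up demo (the directory and filename are hypothetical):

```shell
# A filename with a space is enough to break the naive pipeline.
mkdir -p demo
touch "demo/a file.txt"

# Broken: plain find | xargs splits on whitespace, so rm is handed
# the two non-existent names "demo/a" and "file.txt":
#   find demo -name '*.txt' | xargs rm

# Correct: NUL-delimited names pass through intact:
find demo -name '*.txt' -print0 | xargs -0 rm
```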
By contrast, Go programmers with a few months of experience typically know all of Go.
The older tool is not necessarily better if it has lots of obscure sharp edges that most people don't learn.
This is a simple UNIX pipeline, not a multi-hundred-line spaghetti of Korn, C, Bash, or even Zsh shell scripts.
No builtins were used in the example, just core utilities deployed the way they were designed.
Reinventing the wheel is completely bogus, doubly so when you ultimately make calls to those same utilities anyway, as is common when 'admins-cum-programmers' start getting their hands dirty with Python.
I am generally in favor of the idea of the OP, but the core utilities do differ across environments and this will bite you sooner or later.
I was bitten recently by some pretty boring search-and-replace functionality differing between sed on OSX and on Debian. I would have had to pass a different argument to sed based on the version of sed (so I switched to Perl for the task). But this is an insidious category of bug where you don't discover it until you try to run the script in another environment, and then you're potentially stuck debugging the script from the top down.
> By contrast, Go programmers with a few months of experience typically know all of Go.
Not really. Just to give a few examples of things that new Go programmers don't know: what the limits of JSON serialization are, how introspection works, the functionality and limits of the "virtual inheritance", how the GC handles things like goroutines... There might be a selection bias, though.
It's also expensive in resources and, by extension, energy. Unless your data centre is powered by renewables, inefficient code can become an ethical issue.