> when I was learning Go, I read a guide that told you to fire off a goroutine to walk a tree and send the values back to the main goroutine via a channel.
Okay, I gotta ask - what exactly is wrong with this approach? Unless you're starting only a single goroutine[1], this seems to me like a reasonable approach.
Think about recursively finding all files in a directory that match a particular filter, and then performing some action on the matches. It's better to start a goroutine that sends each match to the caller via a channel, so that the caller can process each file as it is found while the searcher is still finding more matches.
The alternatives are:
1. No async searching, the tree-walker simply collects all the results into a list, and returns the one big list when it is done, at which point the caller will start processing the list.
2. Depending on which language you are using, maybe have actual coroutines, so that the caller can re-call the callee continuously until it gets no result, while the callee can call `yield(result)` for each result.
Both of those seem like poor choices in Go.
[1] And even then, there are some cases where you'd actually want the tree-walking to be asynchronous, so starting a single goroutine so you can do other stuff while walking the tree is a reasonable approach.
Before Go had iterators, you either had callbacks or channels to decompose work.
If you have a lot of files on a local ssd, and you're doing nothing interesting with the tree entries, it's a lot of work for no payoff. You're better off just passing a callback function.
If you're walking an NFS directory hierarchy and the computation on each entry is substantial then there's value in it because you can run computations while waiting on the potentially slow network to return results.
In the case of the callback, it is a janky interface because you would need to partially apply the function you want to do the work or pass a method on a custom struct that holds state you're trying to accumulate.
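To make the callback point concrete, here is a rough sketch using an in-memory tree. `Node`, `stats`, and `summarize` are all made-up names for illustration; the point is only that the accumulated state has to live in a struct whose method is passed as the callback, or in variables captured by a closure.

```go
package main

// Node is a hypothetical binary tree node used only for illustration.
type Node struct {
	Left, Right *Node
	Value       int
}

// Walk visits every value in order and hands it to the callback.
func (n *Node) Walk(visit func(int)) {
	if n == nil {
		return
	}
	n.Left.Walk(visit)
	visit(n.Value)
	n.Right.Walk(visit)
}

// stats holds the state the callback needs to accumulate.
type stats struct {
	count, sum int
}

// add has the callback's signature, so the method value s.add can be passed to Walk.
func (s *stats) add(v int) {
	s.count++
	s.sum += v
}

// summarize shows both styles: a method value on a state struct, and a
// closure that captures a local slice.
func summarize(root *Node) (s stats, vals []int) {
	root.Walk(s.add) // method value bound to &s
	root.Walk(func(v int) {
		vals = append(vals, v)
	})
	return s, vals
}
```

Either way the caller's logic ends up inside a function handed to the walker, which is the jank being described.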
Now that iterators are becoming a part of the language ecosystem, one can use an iterator to decompose the walking and the computation without the jank of a partially applied callback and without the overhead of a channel.
Assuming the latest Go (1.23 or later, which added range-over-func iterators), I would write an iterator and use goroutines internally.
The caller would do:
for f := range asyncDirIter(dir) {
}
Better than exposing a channel.
But my first question would be: is it really necessary? Are you really scanning such large directories to make async dir traversal beneficial?
I actually did that once, but that was for a program that scanned the whole drive. I wouldn't do it for scanning a local drive with 100 files.
Finally, you re-defined "traversing a tree" into "traversing a filesystem".
I assume that the post you're responding to was talking about traversing a tree structure in memory. In that context using goroutines is an overkill. Harder to implement, harder to use and slower.
> In that context using goroutines is an overkill. Harder to implement, harder to use and slower.
I agree with this, but my assumption was very different to yours: that the tree was sufficiently large and/or the processing was sufficiently long to make the caller wait unreasonably long while walking the tree.
For scanning < 1000 files on the local filesystem, I'd probably just scan it and return a list populated by a predicate function.
For even 20 files on a network filesystem, I'd make it async.