> when I was learning Go, I read a guide that told you to fire off a goroutine t...

fnord123 · 2024-08-29T09:11:46

> what exactly is wrong with this approach?

Before Go had iterators, you either had callbacks or channels to decompose work.

If you have a lot of files on a local SSD and you're doing nothing interesting with the tree entries, the goroutine-and-channel machinery is a lot of work for no payoff. You're better off just passing a callback function.

If you're walking an NFS directory hierarchy and the computation on each entry is substantial then there's value in it because you can run computations while waiting on the potentially slow network to return results.

The callback approach makes for a janky interface: you need to either partially apply the function that does the work (for example, by wrapping it in a closure) or pass a method on a custom struct that holds the state you're trying to accumulate.

Now that iterators are becoming a part of the language ecosystem, one can use an iterator to decompose the walking and the computation without the jank of a partially applied callback and without the overhead of a channel.

kjksf · 2024-08-29T09:23:36

Assuming the latest Go (1.23), I would write an iterator and use goroutines internally.

The caller would do:

    for f := range asyncDirIter(dir) {
        // work with f
    }

That's better than exposing a channel to the caller.

But my first question would be: is it really necessary? Are you really scanning directories large enough to make async traversal beneficial?

I actually did that once, but that was for a program that scanned the whole drive. I wouldn't do it for scanning a local drive with 100 files.

Finally, you re-defined "traversing a tree" into "traversing a filesystem".

I assume that the post you're responding to was talking about traversing a tree structure in memory. In that context using goroutines is an overkill. Harder to implement, harder to use and slower.

lelanthran · 2024-08-29T09:44:36

> In that context using goroutines is an overkill. Harder to implement, harder to use and slower.

I agree with this, but my assumption was very different to yours: that the tree was sufficiently large and/or the processing was sufficiently long to make the caller wait unreasonably long while walking the tree.

For scanning fewer than 1000 files on the local filesystem, I'd probably just scan them and return a list of the entries that match a predicate function.

But even for 20 files on a network filesystem, I'd make it async.