As a scientist who ends up working closely with actual professional software engineers... lots of the stuff they do looks like this to me, and I can't for the life of me make sense of why you'd do it.
I have seen a single line of code passed through 4 "interface functions" before it is called that call each other sequentially, and are of course in separate files in separate folders.
It makes reading the code to figure out what it does exhausting, and a few levels in you start to wonder if you're even looking at the right area, and if it will ever get to the part where it actually computes something.
This is actually really bad practice and a very “over eager junior engineer” way of writing software. You're not off base at all that it seems excessive and confusing. It's the kind of thing that seems technically complex and maybe even “elegant” when you first write the “interesting” code in isolation, but it becomes a technical nightmare once real software has to grow around and with it. You're right on point to worry about the understandability and debuggability problems this introduces.
I spent the better part of two years unfucking some Go software that (among other things) misused channels. The problem with channels is that you rarely actually need them, but can use them for a lot of different things without too much initial difficulty.
I think a good litmus test for proper use of channels: you should answer no to “could this be done with a direct function call instead?” and “could I use a wait group or mutex instead?”, and - zooming out a bit to the decisions that led you to consider channels in the first place - yes to “am I really benefitting from concurrency/parallelism enough to justify the technical complexity of debugging concurrent code?”
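To make the first two questions concrete, here's a minimal made-up sketch of the kind of channel use that fails them - a goroutine plus a channel standing in for an ordinary function call:

package example

// Fails the litmus test: a goroutine and a channel doing the work of a
// plain function call; the caller blocks on the receive immediately anyway.
func sumOverChannel(xs []int) int {
    ch := make(chan int)
    go func() {
        total := 0
        for _, x := range xs {
            total += x
        }
        ch <- total
    }()
    return <-ch
}

// Passes the test: a direct call, nothing to schedule or debug.
func sum(xs []int) int {
    total := 0
    for _, x := range xs {
        total += x
    }
    return total
}

Both return the same value; the first just adds scheduling overhead and a harder-to-read stack trace.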
I saw some code in a job I was just starting where they had added several abstractions that I found...confusing.
After taking an extra long time to understand what the code actually did, I realized that some junior engineer had been using some design pattern they didn't really understand, and that added zero actual value to the routine.
After deleting all of that code and refactoring it to use completely different abstractions, everything was suddenly much easier to read and to extend.
Design is a hard skill to learn, and junior developers profoundly haven't learned that skill yet. But that's what we need to teach them as senior engineers, right?
Not that I could teach the author of the code I changed, since I think it was written by an intern that no longer worked for the company. But you do what you can.
> I realized that some junior engineer had been using some design pattern they didn't really understand, and that added zero actual value to the routine.
£3.50p says it was the Generic Repository pattern implemented over Entity Framework dbContext, right?
--------
Speaking of design-patterns, I subscribe to the opinion that _Design-patterns are idioms to work around missing features in your programming language_, which explains why Java has no end of them, and why us jaded folk find happiness in modern languages that adopt more multi-paradigm and FP (the post-Java cool-kids' club: Kotlin, Rust, Swift, TypeScript, (can C# join?)) - so my hope is that eventually we'll have a cohort of fresh-faced CS grads entering industry who only know of Facades/Decorator/Adapter as something a language designer does to spite their users because any reasonable compiler should handle interface-mapping for you - and the Visitor-pattern as a great way to get RSI.
I can't even remember the details. It's like trying to remember nonsense sentences; they don't stick because they don't really make sense.
To the best I can remember, it was something like the use of an adapter pattern in a class that was never going to have more than one implementation? And it was buried a couple layers deep for no particularly good reason. Or something.
And yes, modern languages like the ones you list make many of the original GoF Design Patterns either absolutely trivial (reducing them to idioms rather than patterns) or completely obsolete.
FP has design patterns too, just different ones, and they don't all have tidy names.
Also some GoF design patterns map pretty closely to FP equivalents... pattern-matching on ADTs + traverse/fold + ReaderT ends up looking a lot like the visitor pattern.
There is also the fact that it's much easier to write something when you know where you are going. When you start, you often just make lots of things overly general, intending to improve them later on.
As someone in leadership, my ‘strong opinion held loosely’ on this, is that there’s absolutely no way to meaningfully build this skill in people, in a theoretical setting.
You can, at best, make them aware that there is such a thing as “too much”, and “the right tool for the job”, and keep reminding them.
But nothing, nothing, comes remotely close to the real-world experience of needing to work with over-engineered spaghetti, and getting frustrated by it. Especially if it’s code that you wrote 6 months prior.
Juniors will always do this. It'll always be the senior's job to…let it happen, so the junior learns, but to still reduce the blast radius to a manageable amount and, at the right moment, nudge the junior toward seeing the error of their ways.
> This is actually really bad practice and a very “over eager junior engineer” way of writing software.
To recycle a brief analysis [0] of my own youthful mistakes:
> I used to think I could make a wonderful work of art which everyone will appreciate for the ages, crafted so that every contingency is planned for, every need met... But nobody predicts future needs that well. Someday whatever I make is going to be That Stupid Thing to somebody, and they're going to be justified demolishing the whole mess, no matter how proud I may feel about it now.
> So instead, put effort into making it easy to remove. This often ends up reducing coupling, but--crucially--it's not the same as some enthusiastic young developer trying to decouple all the things through a meta-configurable framework. Sometimes a tight coupling is better when it's easier to reason about. [...]
when I was learning Go, I read a guide that told you to fire off a goroutine to walk a tree and send the values back to the main goroutine via a channel. I think about that "just an example" guide a lot when I see bad channel code.
For me the biggest red flag is somebody using a channel as part of an exported library function signature, either as a param or a return value. Almost never the right call.
I've used that pattern to write tools to e.g. re-encrypt all however many millions of objects in an S3 bucket, and examine 400m files for jars that are or contain the vulnerable log4j code. I had a large machine near the bucket/NFS filer in question, and wanted to use all the CPUs. It worked well for that purpose. The API is: you provide callbacks for each depth of the tree, and each callback is given an array of channels and some current object to examine; your CB would figure out if that object (could be an S3 path, object, version, directory, file, jar inside a jar, whatever) met the criteria for whatever action was at hand, or if it generated more objects for the tree. I was able to do stuff in like 8 hours when AWS support was promising 10 days. And deleted the bad log4j jar a few times a day while we tracked down the repos/code still putting it back on the NFS filer.
The library is called "go-treewalk" :) The data of course never ends up back in main; it's for doing things or maybe printing out data, not doing more calculation across the tree.
> when I was learning Go, I read a guide that told you to fire off a goroutine to walk a tree and send the values back to the main goroutine via a channel.
Okay, I gotta ask - what exactly is wrong with this approach? Unless you're starting only a single goroutine[1], this seems to me like a reasonable approach.
Think about recursively finding all files in a directory that match a particular filter, and then performing some action on the matches. It's better to start a goroutine that sends each match to the caller via a channel, so that as each file is found the caller can process it while the searcher is still finding more matches (rough sketch at the end of this comment).
The alternatives are:
1. No async searching, the tree-walker simply collects all the results into a list, and returns the one big list when it is done, at which point the caller will start processing the list.
2. Depending on which language you are using, maybe have actual coroutines, so that the caller can re-call the callee continuously until it gets no result, while the callee can call `yield(result)` for each result.
Both of those seem like poor choices in Go.
[1] And even then, there are some cases where you'd actually want the tree-walking to be asynchronous, so starting a single goroutine so you can do other stuff while walking the tree is a reasonable approach.
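A rough sketch of that shape in Go (walkAndSend and the match predicate are made-up names; error handling trimmed):

package example

import (
    "io/fs"
    "path/filepath"
)

// walkAndSend streams matching paths to the caller as they are found,
// so the caller can start processing before the walk finishes.
func walkAndSend(root string, match func(string) bool) <-chan string {
    out := make(chan string)
    go func() {
        defer close(out)
        _ = filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
            if err == nil && !d.IsDir() && match(path) {
                out <- path
            }
            return nil
        })
    }()
    return out
}

// Caller side:
//   for p := range walkAndSend("/data", func(p string) bool { return filepath.Ext(p) == ".log" }) {
//       process(p)
//   }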
Before Go had iterators, you either had callbacks or channels to decompose work.
If you have a lot of files on a local ssd, and you're doing nothing interesting with the tree entries, it's a lot of work for no payoff. You're better off just passing a callback function.
If you're walking an NFS directory hierarchy and the computation on each entry is substantial then there's value in it because you can run computations while waiting on the potentially slow network to return results.
In the case of the callback, it is a janky interface because you would need to partially apply the function you want to do the work or pass a method on a custom struct that holds state you're trying to accumulate.
Now that iterators are becoming a part of the language ecosystem, one can use an iterator to decompose the walking and the computation without the jank of a partially applied callback and without the overhead of a channel.
Assuming the latest Go (1.23, with range-over-func iterators) I would write an iterator and use goroutines internally.
The caller would do:
for f := range asyncDirIter(dir) {
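    // use f here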
}
Better than exposing a channel.
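A minimal sketch of what that asyncDirIter might look like (assuming Go 1.23 range-over-func and the iter package; error handling and cancellation kept deliberately thin):

package example

import (
    "io/fs"
    "iter"
    "path/filepath"
)

// asyncDirIter walks dir in a background goroutine and yields paths to the
// caller's range loop; the channel stays an internal detail of the iterator.
func asyncDirIter(dir string) iter.Seq[string] {
    return func(yield func(string) bool) {
        ch := make(chan string)
        go func() {
            defer close(ch)
            _ = filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
                if err == nil && !d.IsDir() {
                    ch <- path
                }
                return nil
            })
        }()
        for p := range ch {
            if !yield(p) {
                // caller broke out early; drain so the walker can finish
                // (real code would cancel the walk instead)
                for range ch {
                }
                return
            }
        }
    }
}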
But my first question would be: is it really necessary? Are you really scanning such large directories to make async dir traversal beneficial?
I actually did that once, but that was for a program that scanned the whole drive. I wouldn't do it for scanning a local drive with 100 files.
Finally, you redefined "traversing a tree" as "traversing a filesystem".
I assume that the post you're responding to was talking about traversing a tree structure in memory. In that context using goroutines is overkill. Harder to implement, harder to use and slower.
> In that context using goroutines is overkill. Harder to implement, harder to use and slower.
I agree with this, but my assumption was very different to yours: that the tree was sufficiently large and/or the processing was sufficiently long to make the caller wait unreasonably long while walking the tree.
For scanning < 1000 files on the local filesystem, I'd probably just scan it and return a list populated by a predicate function.
For even 20 files on a network filesystem, I'd make it async.
The only time I've seen it work with channels in the API is when it's something you'd realistically want to be async (say, some sort of heavy computation, network request, etc). The kind of thing that would probably already be a future/promise/etc in other languages.
And it doesn't really color the function because you can trivially make it sync again.
> And it doesn't really color the function because you can trivially make it sync again.
Yes, but this goes both ways: You can trivially make the sync function async (assuming it's documented as safe for concurrent use).
So I would argue that the sync API design is simpler and more natural. Callers can easily set up their own goroutine and channels around the function call if they need or want that. But if they don't need or want that, everything is simpler and they don't even need to think about channels.
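For example, given a hypothetical synchronous findMatches, the caller can bolt concurrency on top without the library ever exposing a channel:

package example

// findMatches is a made-up synchronous API returning results directly.
func findMatches(dir string) []string {
    return []string{dir + "/a.txt", dir + "/b.txt"} // stand-in result
}

// The caller adds the goroutine and channel only if it actually wants them.
func findMatchesAsync(dir string) <-chan string {
    out := make(chan string)
    go func() {
        defer close(out)
        for _, m := range findMatches(dir) {
            out <- m
        }
    }()
    return out
}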
I get your point, but a wait group or a mutex can be removed in favor of a clean usage of channels if the proper concerns are isolated at first. And I would personally much rather reason about channels than mutexes and wait groups. Wait groups and mutexes are just begging for deadlocks and race conditions, where a proper channel, used correctly, eliminates both of those by design.
> Wait groups and mutexes are just begging for deadlocks and race conditions, where a proper channel, used correctly, eliminates both of those by design.
By that same logic, if you just use wait groups and mutexes correctly, you should also not worry about deadlocks and race conditions. It's also quite trivial to introduce a deadlock with a channel.
Regardless, channels are basically a more expressive/flexible type than mutexes, waitgroups, and function calling, but in the same family as all of them. You can implement any of those rather trivially with a channel, but there are things you can do with a channel that are quite complex or impossible to implement using those. Such a flexible tool allows you to start doing things that are "easy" to implement yet poor design decisions. For example, instead of direct function calling you can now start passing data over channels, which "works" just as well except it incurs some scheduling overhead (not always a concern depending on how perf sensitive you are), makes debugging and interpreting stack traces more difficult (increasingly so as the logic on both sides of the channel increases over time), and allows the software to start evolving in an unintended way (specifically into an overly complex Actor model with tons of message-passing that is impossible to untangle, rather than a directed tree of direct function calling). Or you have a hard time understanding the state and properties of a piece of data throughout the program lifetime because it doesn't "belong" anywhere in particular.
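To illustrate the "channels can trivially implement the others" point, here's a sketch (not a recommendation) of a mutex built from a one-slot buffered channel:

package example

// chanMutex is a mutex built from a buffered channel of capacity 1.
type chanMutex chan struct{}

func newChanMutex() chanMutex { return make(chanMutex, 1) }

// Lock blocks while another goroutine holds the slot.
func (m chanMutex) Lock() { m <- struct{}{} }

// Unlock frees the slot for the next Lock.
func (m chanMutex) Unlock() { <-m }

// The direct equivalent is just:
//   var mu sync.Mutex
//   mu.Lock(); defer mu.Unlock()

The channel version works, but it's exactly the kind of flexibility that invites the drift described above.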
---
Something I thought about recently: perhaps the biggest balancing act in software is between specificity and expressiveness. To solve a problem you might be able to find something that is perfectly tailored to your needs, where you just click something or run something and your problem is solved, but it's highly likely that it will only solve that specific problem and not other ones. Alternatively, a lot of software (like Jira, SAP, many enterprise systems) is highly configurable and expressive but requires a lot of effort to set up and may not be particularly good at solving your specific task. At its most extreme, you could technically say a computer containing only a brainfuck compiler and a basic text editor can solve any problem solvable by a computer, because it's a programmable Turing machine.
This extends even into the weeds of programming, especially when you're working on software with other people or over long periods of time, where you might struggle to enforce or maintain your particular mental model for how the software should work. When faced with implementing something with an expressive approach vs a specific one, you want to be expressive enough to be able to modify the code later to do things you plan to do or think you have a high probability of doing, but you want to be specific enough that the purpose and function of something (be it a library, class, binary, or entire distributed software system) is clear - if it isn't clear, the people using it will struggle with it or avoid it, and the people working on it will start taking it in a direction you didn't intend.
Channels are the type of thing that are expressive enough to be broadly applicable, but are easily misinterpreted (you might be using them to implement parallelism, but your coworker Bob might think you're using them because you want to design your software under a message-passing actor model) and easily misused. They also make it very, very easy to "code yourself into a corner" by introducing inscrutable logical/data paths that can't be untangled. You might be able to use them safely in lieu of a mutex but it only takes one Bob to start taking them in the direction of unmaintainability. And sometimes you might be that Bob without knowing it. That's why I think it's best to avoid them unless your other options are even worse.
Using channels where mutexes would suffice has by far been the main cause of bad concurrent code I've encountered.
Using more than 2 'semantic' channels plus one ctx.Done() channel? There's probably a bug. So far that has been well over 50% accurate, across dozens of libraries.
When they're used like this, chans often break into non-blocking algorithm details, because they don't ensure mutual exclusion. And non-blocking algorithms are freakin hard - few things are guaranteed without great care.
> By that same logic, if you just use wait groups and mutexes correctly, you should also not worry about deadlocks and race conditions.
I agree with most everything else you said - especially about software being a trade off between specificity and expressiveness - but I can’t agree with this.
The problem being that a mutex can hide things that a channel can’t. Channels will always give you what you expect, but that is not the case for mutexes or wait groups or error groups or whatever.
Honestly, the older I get, the more I understand that joke about “you must be this tall to write concurrent programs” and the mark is at the ceiling.
Wait groups are preferred to channels for the purposes they serve. Mostly waiting for goroutines to finish. You can use a channel but wait groups are much cleaner.
Mutexes for shared memory are less preferred than channels. There are always exceptions.
But yeah, if all you have is a hammer then everything looks like a nail. Go has mutexes and wait groups and channels and all of these have their right place and use case. If you're using mutexes to effectively re-implement what channels support then you're doing it wrong. If you're using channels for something that can be a function call then you're also doing it wrong. Software is hard.
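A small sketch of the wait-group preference (process is a made-up worker function), next to the channel-counting version it replaces:

package example

import "sync"

func process(item string) { _ = item } // stand-in for real work

// Cleaner: wait for all goroutines with a WaitGroup.
func runAll(items []string) {
    var wg sync.WaitGroup
    for _, it := range items {
        wg.Add(1)
        go func(it string) {
            defer wg.Done()
            process(it)
        }(it)
    }
    wg.Wait()
}

// Works, but noisier: counting done signals on a channel.
func runAllWithChannel(items []string) {
    done := make(chan struct{})
    for _, it := range items {
        go func(it string) {
            process(it)
            done <- struct{}{}
        }(it)
    }
    for range items {
        <-done
    }
}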
In the multi-year series of blaming (us) juniors for every ill in the programming world, they now also get blamed for over-architecting.
I took the opportunity to share that quote with some others on our project, because this is a pattern we recognize from our boss and a few of the productive consultants in the past. Not juniors, but people who have/had the weight to set the code-style tone of the project, which has led to hours of all of us scratching our heads when having to read the code.
Back at uni, we had a 200-level ‘software engineering’ unit, largely introducing everyone to a variety of ‘patterns’. Reading the Gang of Four book, blah blah blah. You get the idea.
Our final assignment for this unit was to build a piece of software, following some provided specification, and to write some supplementary document justifying the patterns that we used.
A mature-aged student who had a little bit of industry experience under his belt didn't use a single pattern we learned about the entire semester. His code was much simpler as a result. He put in less effort, even taking into account his prior experience. His justifying documentation simply said something to the effect of “when considering the overall complexity of this problem, and the circumstances under which this software is being written, I don't see any net benefit to using any of the patterns we learned about”.
He got full marks. Not in a “I tricked the lecturer!” way. I was, and still am, a massive fan of the academic that ran the unit. The feedback the student received was very much “you are 100% correct, at the end of the day, I couldn’t come up with an assignment that didn’t involve an unreasonable amount of work and ALSO enough complexity to ever truly justify doing any of the stuff I’ve taught you”.
All these years later, I still tell this story to my team. I think it’s such a compelling illustration of “everything in moderation”, and it’s fun enough to stick with people.
Jokes aside, this is a really interesting example. In a job interview I was once asked whether I ever regretted something I did, and I couldn't quite word it on the spot, but my first project definitely included extra complexity just so that it "looked good", and in the end it would have been more reliable had I kept it simple.
Sometimes there's a scary lack of understanding and competency where you'd expect to find it.
As an undergrad, I once spent about half an hour pair programming with a computer science PhD - it was enlightening.
He didn't have the slightest understanding of software - calling me out for things like not checking that the size of a (standard library) data structure wasn't negative.
But other times these things are done for a reason; sometimes it's actually sane and sometimes it's just a way to deal with the lunacy of a codebase forged by the madmen who came before you.
I often wonder if, unbeknownst to me, I am writing similarly over-complicated software. People seem to be unable to tell, so I doubt I can either. It makes me second-guess my code a lot.
Is there any reliable/objective/quantitative way to evaluate such a thing? The repo is a great example of what not to do, but it's so extreme it's hardly useful in practice (nor should it care to be, as a joke).
I think it's circumstantial - do your abstractions make it easier or harder to implement and integrate the sort of features that the application generally requires?
There's sometimes significant use in having some powerful abstractions in place to allow for code reuse and application customisation - but all too often people build fortresses of functionality with no idea how or if it's ever going to be used.
Foresight is useful here; if you can look at new features and break them up into feature specific business logic and generalisable application logic, then similar features can be cleanly integrated with less work in the future.
Sometimes however, the level of customisability far exceeds what is strictly necessary; the complexities involved no longer help, but rather hinder - not only understanding, but feature implementation and integration as well.
IMO the question hinges on stakeholder-dynamics and future-predictions of where the software will go, and both of those inputs are highly subjective so the result will also be subjective.
A simple example would be the difference between code consumed by an in-house application managed by a single team, versus code in an open-source utility library used by 50+ companies.
In the first case, extra layers of indirection tend to have bad cost/benefit tradeoffs, particularly if your programming stack makes automatic refactoring fast and safe.
In the second case, layers of indirection are basically required for backwards-compatibility, because you can't just "fix the caller's code" or have a hard-break as conditions change.
> I often wonder if, unbeknownst to me, I am writing similarly over-complicated software.
If you want to know, and you have a project that you can do this with - pick a reasonably complex project, back it up, don't touch it for a year. Can you work out what the hell is going on? If yes - you're probably doing ok.
If you have abstractions that you weren't forced to make after exhausting every other option, then you can improve your code. If you begrudgingly add abstractions only when you can no longer convince yourself that there must be a way to avoid them, then you're likely doing well.
The coder spectrum, scientists on one end, software engineers on the other. Only balance can save us.
I have read code used in research papers. The theoretical math usually goes beyond my comprehension, so I always dive into the code to better understand the logic, only to find... it's way worse... unintelligible.
At the end of the day we are used to what we do, and anything different will be foreign to us.
I agree- I'd like to think I'm somewhere in the middle despite being a scientist. I try to write the best code I can, and keep up on best practices, but try to keep things simple.
A lot of the scientific code out there in research papers is so bad that, when you look at it, you realize the whole paper is actually B.S. What the paper claims was never even implemented; they just did the most kludgy and quick thing possible to produce the plots in the paper. The whole paper will often hinge on the idea that they did something that generalizes and are showing specific examples, when actually they just skipped to producing those specific examples - cherry-picked, no doubt, from other ones that they couldn't get to work - and did nothing else. As a reviewer, if I notice this and really tear into the authors, it will usually get published and not fixed anyway.
This happens because you get inexperienced new people doing the actual coding work without any proper mentors or instruction, and then they're under enormous pressure for results from a PI that doesn't understand or care about coding at all. It makes the whole thing a house of cards.
Trying to do it 'properly' as a scientist is an uphill battle, because funders, collaborators, etc. expect the quick (and fake) stuff other people seem to be doing. The realities of developing good reusable software and maintaining it long term are not possible to fund through scientific grants.
People writing usable code in academia are doing it for free on the weekends.
A lot of the code written in math, physics or data analysis settings is really written for the authors alone to understand. They benefit from tens of pages of documentation (papers) plus decades of previous experience from the readers. None of which commercial software systems have.
> I can't for the life of me make sense of why you'd do it.
Over-engineering is a common cause: simple solutions can be deceptively difficult to find. That being said, additional indirection layers are usually justified by the overall architecture, and — assuming they're reasonable — can't always be appreciated locally.
« I'm always delighted by the light touch and stillness of early programming languages. Not much text; a lot gets done. Old programs read like quiet conversations between a well-spoken research worker and a well-studied mechanical colleague, not as a debate with a compiler. Who'd have guessed sophistication bought such noise? » (Dick Gabriel)
Had a math teacher full of pithy, paradoxical, sardonic and witty quotes. He told us that if we only take away one thing from his class, it's that you can never go wrong attributing an unknown quote to Mark Twain or Benjamin Franklin.
On the contrary, and I do agree that software engineers take the abstraction too far when they don’t know better, I don’t hold the code produced by people who aren’t software engineers by profession in particularly high esteem either.
You’re looking at two extremes: the codebase that is spread out too much with too much abstraction, and the codebase with zero abstraction that is basically a means to an end. In both cases they are difficult to work with.
I’ve certainly dealt with enough python, JS and PHP scripts that are basically written with the mindset of ‘fuck this, just give me what I want’, whereas people working in the code day to day need the abstractions to facilitate collaboration and resilience.
> You’re looking at two extremes: the codebase that is spread out too much with too much abstraction, and the codebase with zero abstraction that is basically a means to an end. In both cases they are difficult to work with.
Yeah, neither's great. If given a choice though, I'm absolutely going to take the latter. Yeah, changing something cross-cutting is going to be rough, but my need to do that is usually orders of magnitude less than my need to change specifics.
On a long enough timeline, both will bite me, but the former is much more likely to bite me today.
Agree with this. Abstraction and design patterns when used in a well-thought out manner should make large or complex codebases easier to work with.
And like you, have experienced code bases that tried to throw every design pattern in the book at you, even for a relatively simple application, and made it a pain to work with.
But have also seen them used carefully, in a standard company-wide way that made all the code easier to understand - worked on a high-volume website with a large codebase, where they had micro-services that all used a common 3-tier architecture, security services, tooling... Really well thought out, and you could work on any one of their ~100 microservices and already have a good understanding of its design, how to build and debug it, how its security worked, its caching...
Yeah, agreed, it's how these techniques are used that determines whether they are useful or just add complexity.
> I have seen a single line of code passed through 4 "interface functions"
I once had to deal with an HTTP handler that called `Validate` on interface A, which called `Validate` on interface B, which called `Validate` on interface C, which called `Validate` on interface D, which finally did the actual work. There was a lot of profanity that month.
This can happen when some of the interface operations have added logic, but others (like Validate here) don't, so just get delegated as-is.
One typical example is a tower of wrapped streams, where each layer applies additional transformations to the streamed data, but the Close operation is just passed across all layers to the lowest one (which closes the underlying file or whatever).
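A sketch of that tower in Go terms (assuming io.ReadCloser wrappers; the type and transformation are invented for illustration):

package example

import "io"

// upperReader is one layer of a wrapped-stream tower: Read applies a
// transformation, while Close is just delegated to the layer below.
type upperReader struct {
    inner io.ReadCloser
}

func (r *upperReader) Read(p []byte) (int, error) {
    n, err := r.inner.Read(p)
    for i := 0; i < n; i++ {
        if 'a' <= p[i] && p[i] <= 'z' {
            p[i] -= 'a' - 'A' // the "added logic" of this layer
        }
    }
    return n, err
}

// Close adds nothing of its own; it passes straight through, which is how
// you end up with four Close (or Validate) calls stacked on each other.
func (r *upperReader) Close() error { return r.inner.Close() }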
I mean, to a point that makes sense; you've got your base data types like, idk, a bank account number which can be validated, which is inside a bank account which can be validated, which is in a customer which can be validated, etc etc. Visitor pattern style, I believe?
That would make sense but this was one piece of data being validated against a lookup based off that data. The previous devs just had a, uh, Unique™ style of development. I swear they must have been on some kind of "editor tab count" bonus scheme.
Pretty common, for example when using databases as a mostly dumb store with all the logic in application code, and then a second application (or big refactor) appears and they introduce a subtle bug that results in an invalid `INSERT` (or whatever), and the database happily accepts it instead of rejecting it.
This is actually my preferred approach. If you want to put a 4gb base64 as your phone number, go right on ahead; best believe I will truncate it to a sensible length before I store it, but sure. Who am I to question your reality.
Sadly, people abuse shit like that to pass messages (like naming Spotify playlists with messages to loved/friends/colleagues while in jail) and maybe we have to assert a tiny bit of sanity on the world.
I think something people forget is that computer programming is a craft which must be honed.
A lot of people are introduced to it because computers are such an important part of every discipline, but unfortunately the wealth of mistakes in many a code base come from those who, quite honestly, simply lack experience.
In the author's case, they don't explain the simple understanding that every pointer is simply adding a dimension to the data.
int *data; // one dimensional vector
int **data; // two dimensional matrix
int ***data; // three dimensional matrix
Which is one way to interpret things. The problem is that when folks learn computer programming using Python (or another high-level language), it's like using power tools bought from the hardware store compared to some ancient Japanese woodworking technique. The latter takes time to understand and perfect.
“every pointer is simply adding a dimension to the data.”
No, it’s not. The concept of having a pointer to a pointer has nothing to do with the concept of dimensionality.
You must be thinking of lists of lists (of lists…), which can be implemented using pointers. The dimensionality, however, comes from the structure of the list, not from the pointers.
Right. -ish. A two dimensional array can be modeled as an array of pointers to one dimensional arrays, or as one pointer to a two dimensional array. Both have use cases. It's probably a rite of passage for someone new to the C language to understand that difference (I learnt C as a teenager and it took me some time, months, to comprehend all the different permutations of pointers and square brackets).
Ignoring inexperience/incompetence as a reason (which, admittedly, is a likely root cause), domain fuzziness is often a good explanation here. If you aren't extremely familiar with a domain and don't know the shape of the solution you need a priori, all those levels of indirection allow you to keep lots of work "online" while replacing, refactoring, or experimenting with a particular layer. The intent should be to "find" the right shape with all the indirection in place and then rewrite with a single correct shape without all the indirection. Of course, the rewrite never actually happens =)
Contrary to the "over-engineering" claims, I'll put this explanation up for consideration: it's a result of fighting the system without understanding the details. Over-engineering absolutely exists and can look just like this, but I think it's mostly a lack of thought instead of too much bad thinking.
You see the same thing with e.g. Java programmers adding `try { } catch (Exception e) { log(e) }` until it shuts up about checked exceptions (and not realizing how many other things they also caught, like thread interrupts).
It's a common result of "I don't get it but it tells me it's wrong, so I'll change random things until it works". Getting engs in this state to realize that they're wasting far more time and energy not-knowing something than it would take to learn it in depth has, so far, been my most successful route in dragging people into the light.
(Not surprisingly, this is one of my biggest worries about LLM-heavy programmers. LLMs can be useful when you know what you're doing, but I keep seeing them stand in the way of learning if someone isn't already motivated to do so, because you can keep not-understanding for longer. That's a blessing for non-programmers and a curse for any programmer who has to work with them.)
How would you describe a path to learn these kinds of things? (Even just dropping a link would be appreciated.)
Indeed, typical education is about algos and programming paradigms (like procedural, functional, OO, etc.) and contexts (systems, native apps, web, data), but I don't remember/understand much about what you describe (though I definitely faced it on toy projects and reacted like the "junior way" you describe). Heck, we even did some deep stuff like language grammars / compiler design / that thing with Petri nets, but it's a lot less practical and actionable, I find.
Frankly: a comprehensive book about the language / subject is generally the best source. Fixing those foundational knowledge gaps takes time, because it's often not clear to anyone exactly what the gaps are - better to be exhaustive and fix it for real rather than thinking the "ah hah!" moment they just had was the only issue.
Not because I think ink on paper is superior somehow, but because books go in depth in ways that blog posts almost never do - if they did, they'd be as large as a book, and nobody reads or writes those. Narrow, highly technical ones exist and are fantastic, but they largely assume foundational knowledge, they don't generally teach it.
---
Learners are stuck in a weird place with programming. At the extreme beginning there's an unbelievable amount of high-quality information, guided lessons, etc, it's one of the best subjects to self-learn on period. I love it.
Experts also have a lot of excellent material, because there are a lot of highly technical blogs about almost anything under the sun, and many of them stay relevant for years if not decades. Programmers are extremely open about sharing their knowledge at the fringes, and the whole ecosystem puts in a lot of effort to make it discoverable.
The middle ground though, where you know how to put words in a text file and have it run, but don't know how to go beyond that, is... pretty much just "get some experience". Write more code, read more code, do more coding at work. It's highly unstructured and highly varied because you've left the well-trodden beginning and have not yet found your niche (nor do you have the knowledge needed to even find your niche).
It's this middle-ground where I see a lot of people get stuck and churn out, or just haphazardly struggle forever, especially if they lack a solid foundation to build on, because every new thing they learn doesn't quite fit with anything else and they just memorize patterns rather than thinking. Which I do not claim is the wrong choice: if that's all you need, then that's likely (by far) the best effort/reward payoff, and I think that's where the vast majority of people can stop and be happy.
But if you want to go further, it's soul-draining and often looks like there's no escape from the chaotic drudgery. Making completely sure the basics are in place and that you think about everything added on top of that is the only real way I've seen people make progress. Whether that's through a mentor, or a book, or just brute-forcing it by hand on your own doesn't seem to matter at all, you just have to find one that works for you. The good news though is that after you've got "I can put words in a text file and it runs" figured out, it goes a lot faster than it does when you're starting in the beginning. And a lot of what you've already learned will be reinforced or slightly corrected in ways that often make immediate sense, because you have a lot of context for how it has failed or worked in the past.
Historically I would've recommended O'Reilly, but they've been absolutely trashing their brand by publishing everything under the sun without decent editing (bad English, outrageously clear flaws in code, entire missing paragraphs, you name it - editing matters). Manning has some real gems too, but I've flipped through enough mediocre ones that I can't make any broad claims (beyond "buyer beware", but it's worth checking anyway).
Which is not particularly useful advice, I know. It's hard to judge quality before you know what quality looks like, at which point you're probably done with the book and maybe much further.
So concretely I can really only recommend:
0) Hit a bookstore or library, browse through the books a bit.
1) Don't buy any giant books (unless you like them). They usually waste absurd amounts of space on stuff that won't remain true in the long run (e.g. individual libraries), and the sheer size means people tend to churn out and not read enough of it to get much of a benefit. Larger also often means worse editing, more mistakes per page, etc because it costs more to check more :\
If you want a physical reference for stuff the book covers, then it can be worth it, but otherwise no. Reference-like material is generally available online, but more up to date.
2) Actually try to build things the book guides you through. Then change it. Break it, debug it, fix it, etc. Make 100% sure what you wrote and why it broke makes sense, not that the book has convinced you that what they wrote is reasonable, which are very different things. The latter is just a sign of good writing, and is useless otherwise.
3) The more-detailed first-party docs in ~all popular languages are at least as good as most books (and sometimes much better), are more likely to be up to date, and are absolutely worth reading. Read the language spec, read the technical blog articles and guides, they're generally truly excellent. Even if you don't understand it all yet, it's exposure to quality code and patterns and concerns you haven't seen, from what are almost always literal experts + edited by literal experts.
This is a popular pattern in apps or other "frameworky" code, especially in C++.
I can think of at least two open source C++ apps I used where every time I checked in the IRC channel, the main dev wasn't talking about new app features, but about how to adopt even newer cooler C++ abstractions in the existing code. With the general result that if you want to know how some algorithm backing a feature works, you can't find it.
There's an element of what you might call "taste" in choosing abstractions in software.
Like all matters of taste, there are at least two things to keep in mind:
(1) You can't develop good taste until you have some experience. It's hard to learn software abstractions, and we want engineers to learn about and practice them. Mistakes are crucial to learning, so we should expect some number of abstraction mistakes from even the smartest junior engineers.
(2) Just because something is ugly to a non-expert doesn't mean it's necessarily bad. Bebop, for example, has less mass appeal than bubblegum pop. One of the things that makes bebop impressive to musicians, is the amount of technical and musical skill it takes to play it. But if you're not a musician those virtues may be lost on you whereas the frenetic noise is very apparent.
One of the things Google does better than other huge tech companies (IMO) is demonstrate good taste for abstractions. Those abstractions are often not obvious.
[Obviously the bebop comparison breaks down, and showing off technical skills isn't a virtue in software. But there are other virtues that are more apparent to experts, such as maintainability, library reviews, integration with existing tooling or practices etc.]
A crucial thing to remember in a shared code base is that taste is ultimately subjective, and to let it go when people do something different from you so long as the code is understandable and not objectively incorrect. Remember you're making something functional at the end of the day, not ASCII art.
Yes absolutely. I think harmonizing with things around you is part of good taste. It's one of the things that separates for example a professionally designed interior from a college dorm room filled with an eclectic array of things the occupants like.
As a software engineer, this is something I get onto with my team. There is a such thing as too much abstraction and indirection. Abstraction should serve a purpose; don’t create an interface until you have more than one concrete implementation (or plan to within a PR or two). Premature abstraction is a type of premature optimization, just in code structure instead of execution.
You deal with getting a disparate bunch of people to persuade a mostly documented and mostly functional (as purchased) collection of IT systems to do largely what is required according to an almost complete specification (which changes on a daily basis). All of that is exhaustively and nearly documented correctly.
There's the weird bits where 2=3 but you don't talk about that too often. James was really clever but a bit strange even by your terms and left a lot of stuff that we can get away with describing as legacy. We sometimes have to explain ourselves to management why the system buys a bunch of flowers every Hallowe'en (piss off Google and your wiggly red line - it's hallowed evening and I know how to abbreviate that phrase correctly) and ships them to a graveyard plot in NOLA. We generally blame James but no-one really knows what on earth is going on. We wrote an automated call closer for that one with a random dialogue.
"There is a such thing as too much abstraction and indirection"
Yes there is. You got two words out of sequence!
"Premature abstraction is a type of premature optimization, just in code structure instead of execution."
I will try to follow your suggestion but it sounds like advice to a teenage boy.
English is a programming language too. You can make people do things by using it. How you deploy it is up to you. I try to come across as a complete wanker on internet forums and I'm sure I have been successful here.
> I have seen a single line of code passed through 4 "interface functions" before it is called that call each other sequentially, and are of course in separate files in separate folders.
Procrustination. n. the act of writing an infinite regression of trivial nested 'helper' functions because you're not sure how to actually attack the problem you're trying to solve.
I worked with a guy that did this.
int CalcNumber(int input) { int val = GetNumberFromInput(input); return val; }
int GetNumberFromInput(int num) { int calcedNumber = 0; for (int i=0; i<1; i++) calcedNumber += NumberPart(i, num); return calcedNumber; }
int NumberPart(int part, int seed) { return actuallyGenerateNumberPart(part, seed); }
int actuallyGenerateNumberPart(int part, int seed) {
    // todo
    return 0;
}
This is the classic over-abstraction problem: everything goes behind an interface so that you can change things at some point down the line if you ever need to, while staying totally opaque to any consuming code.
A lot of languages force you to start with this from day one, unless you want to go refactor everything to use an interface later on, so people just do it even when there will literally never be a reason to (and for testability, sometimes).
The cool thing about Go is the interface system is inverted kind of like duck-typing, so if you write purely idiomatic Go, then the thing receiving an arg in a function call is the one specifying the interface it must meet, rather than the implementing code having to declare every interface that some implementation meets.
People screw this up a lot though, especially if they came from Java/C# backgrounds.
Basically, if the Jump function takes an Animal interface, the package that defines the Jump function is the one that defines the Animal interface (and its Jump method). So if you provide it with any entity that has this method, it will just work. You don't have to define the Animal interface in your Cat package. But if your cat does not Jump, it won't be accepted. Of course you can also pass in a box that Jumps.
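A tiny sketch of that consumer-side interface idea (all names invented):

package example

// Jumper is declared by the code that needs jumping (the consumer),
// not by the types that implement it.
type Jumper interface {
    Jump() int // jump height in cm
}

// Launch only cares that its argument can Jump.
func Launch(j Jumper) int { return j.Jump() * 2 }

// Cat never mentions Jumper; having the method is enough.
type Cat struct{}

func (Cat) Jump() int { return 150 }

// A Box that Jumps is accepted just the same: Launch(Box{}) works,
// but a type without a Jump method won't compile as an argument to Launch.
type Box struct{}

func (Box) Jump() int { return 30 }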
Most C# education will teach you to always make an interface for everything for some reason. Even in academia they’ll teach CS students to do this and well… it means there is an entire industry of people who think that over-engineering everything with needless abstractions is best practice.
It is what it is though. At least it’s fairly contained within the C# community in my part of the world.
Isn't that "for some reason" in C# being it's the standard way of doing dependency injection and being able to unit test/mock objects?
I've found it easier to work in C# codebases that just drank the Microsoft Kool-Aid with "Clean architecture" instead of Frankenstein-esque C# projects that decidedly could do it better or didn't care or know better.
Abstraction/design patterns can be abused, but in C#, "too many interfaces" doesn't seem that problematic.
I agree with you on this, my issue is mainly when they bring this thinking with them into other languages. I can easily avoid working with C# (I spent a decade working with it and I’d prefer to never work with it again), but it’s just such a pain in the ass to onboard developers coming from that world.
It may be the same for Java as GP mentioned it along with C#, but they tend to stay within their own little domain in my part of the world. By contrast C# is mostly used by mid-sized stagnant to failing companies which means C# developers job hop a lot. There is also a lot of them because mid-sized companies that end up failing love the shit out of C# for some reason and there are soooo many of those around here. Basically we have to un-learn almost everything a new hire knows about development or they’ve solely worked with C#.
> I've found it easier to work in C# codebases that just drank the Microsoft Kool-Aid with "Clean architecture" instead of Frankenstein-esque C# projects that decidedly could do it better or didn't care or know better.
I agree, for the most part. There's a little bit of a balance: if you just drink the kool-aid for top level stuff, but resist the urge to enter interface inception all the way down, you can get a decent balance.
e.g. on modern .NET Core, literally nothing is stopping you from registering factory functions for concrete types, without an interface, using the out-of-the-box dependency injection setup. You keep the most important part, inversion of control: `services.AddTransient<MyConcreteClass>(provider => new MyConcreteClass(blah, blah, blah));`
It happens so that we don't have "a single 15,000 line file that had been worked on for a decade". We don't have the luxury of asking the GitHub team and John Carmack to fix our code when we are forced to show it to the stakeholders.
I mean we are in “midwit meme” territory here. 4 levels of indirection look fine to an idiot. A “midwit” hates them and cleans them up. But a very seasoned engineer, thinking about the entire system, will happily have 4 layers of indirection.
Like anyone using a HashSet in Rust is already doing four layers of indirection: Rust's HashSet is actually wrapping a hashbrown::HashSet. That set is wrapping a HashTable, and the HashTable is wrapping an inner Table.
If you’re having trouble comprehending such code then a good IDE (or vim) that can navigate the code on a keypress should help.
Most of the midwit memes I see on programming are :
- beginner "lets write simple, to the point code"
- midwit: noo let's have many layers of abstraction just in case
- jedi: let's write simple, to the point code
It’s called Clean Architecture, Clean Code or SOLID and it’s extremely stupid. It’s widely used because the man behind it, and a lot of other grifters, are extremely good at selling their bullshit. You also have crazy things like the Agile Manifesto to thank “Uncle Bob” for.
What is the most hilarious, however, is that these things are sold by people who sometimes haven’t coded professionally since 15-20 years before Python was even invented.
Anyway, if you want to fuck with them, ask them how they avoid L1/L2/L3 cache misses with all that code separation. They obviously don't, but you're very likely to get a puzzled look, as nobody ever taught them how a computer actually works.
> Anyway, if you want to fuck with them, ask them how they avoid L1/L2/L3 cache misses with all that code separation. They obviously don't, but you're very likely to get a puzzled look, as nobody ever taught them how a computer actually works.
It hardly even matters now because each major function will have to wait on the scheduler queue until the cluster manages to assign it a container, then incur a dozen kinds of network slowness spinning up and initializing, then cold-start an interpreter, just to check a value in a painfully-slowly serialized-then-deserialized structure that was passed in as its arguments + context, only to decide based on that check it doesn’t need to do anything after all and shut down.
So why would you want to add to that? A loop in which you change a few attributes on a thousand entities will run 20 times slower when you cause cache misses, even worse if your cloud provider isn't using fast RAM. Then add to that the exponential slowness as your vtable class hierarchy grows and you're adding a load of poor performance to your already poor performance.
Which might make sense if spreading your code out over 20 files in 5 projects gave you something in return. But I'd argue that it didn't just cause your CPU, but also your brain, to have memory issues while working on the code.
Ugh, dealing with stuff like this right now for project config. Why can't I just use yaml files in the build to generate environment files based on region/app and then read those values in the code? Instead, it's that plus a few layers of interfaces grouping multiple levels of config items.
> I have seen a single line of code passed through 4 "interface functions" before it is called that call each other sequentially, and are of course in separate files in separate folders.
Trigger warning next time. I was trying to eat lunch
This perfectly summarizes a lot of production code I've seen. We once replaced like 20 Java files with like a single page of easy to understand and "to the point" code.
Sometimes when you see things split up to what seems like an excessive degree it's because it can make the code more testable. It's not always the case that you split up a function for the purposes of "re-use" but that you might start by writing a thing that takes input X and transforms it to output Y, and write a test for lots of different inputs and outputs.
Then you write a function that gets some value that one might feed into this function from source Z, and separately test that function.
Then you create another function that reacts to a user event, and calls both functions.
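In miniature, that decomposition might look like this (names are hypothetical):

package example

// Pure transform from input X to output Y: easy to table-test.
func transform(x int) int { return x * x }

// Separately testable fetch from "source Z" (stubbed here).
func fetchValue() int { return 7 }

// The event handler just wires the two together.
func onUserEvent() int { return transform(fetchValue()) }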
And if a codebase is doing reasonably complicated things sometimes this can result in having to track through many different function calls until you find the thing you're actually looking for.
Sometimes it's also auto-generated by an IDE or a framework as boilerplate or scaffolding code, and sometimes it's split up in a seemingly irrational way in one part of the code because you are actually using the same function you wrote somewhere else, so in your case of "4 interface functions" what you might find is that there is a function somewhere that accepts TypeA, and that TypeA is a specialisation of TypeB, and that TypeB implements InterfaceD.
Or maybe you're just reading some shitty code.
Either way lots of stuff that can look kind of dumb turns out to not be as dumb as you thought. Also lots of stuff that looks smart sometimes turns out to be super dumb.
I spent several years as a front-end contractor. I saw a lot of the same thing on the front-end, especially with JS. Even when I saw stuff like inserting a single blank space on a line with JS, or similar things where it's just a mess of calls back and forth before something actually gets executed, I asked a senior dev WTF was going on and why this stuff got written like this.
His answer was as simple as it was dumbfounding. He said, "Its contractors. They get paid by the hour. Make sense now?" So basically someone was writing a ton of code just to take up time during the day so they could charge the hours back to the company and prove they were working their 40 hours. They DGAF about the code they were writing, they were more concerned with getting paid.
Completely maddening. My senior dev at the time said they won't even spend time refactoring any of it because it would waste too much time. He said they just made sure they have an FTE write the code next time.
It was my "welcome to the wonderful world contracting" wakeup call.
Many software engineering adjacent courses, starting with AP Computer Science A, are heavy on the Java-style OOP. And you're never designing an actually complex system, just using all the tools to "properly" abstract things in a program that does very little. It's the right idea if applied right, but they don't get a sense of the scale.
The first place this bites a new SWE in the rear, the database. "Let's abstract this away in case we ever want to switch databases."
I enjoy reading about new languages on HN. It’s weird because the grammar is weird. It’s like seeing start var ! oper addition sub var !! mltplctn var % close close or something and eventually you realize it means a + b*c. Why invent a new language? Why not just write list<int> if you want a list of integers or write Channel<string> or one of the idioms that everyone knows? I don’t know, but it can be fun to play with communication and meaning and grammar, and maybe even more fun if you get to be part of a group that gets paid for it.