Hacker News: eugenhotaj's comments

This is because everyone is training with synchronous SGD. All GPUs need to synchronize on every gradient step, so tail latency will kill you.
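A toy simulation of the tail-latency point, assuming a made-up model where each worker's step normally takes 1 time unit but occasionally takes 5 (a "straggler"). With a synchronous barrier the step costs the max over all workers, so with many workers almost every step pays the straggler price. The function names and numbers are hypothetical, chosen only for illustration.

```python
import random


def simulate_training_time(num_workers, num_steps, straggler_prob=0.01,
                           base_time=1.0, straggler_time=5.0, seed=0):
    """Toy model (hypothetical): each worker's step takes base_time, or
    straggler_time with probability straggler_prob.
    Returns (sync_total, async_total) wall-clock time for num_steps steps."""
    rng = random.Random(seed)
    sync_total = 0.0
    per_worker_total = [0.0] * num_workers
    for _ in range(num_steps):
        times = [straggler_time if rng.random() < straggler_prob else base_time
                 for _ in range(num_workers)]
        # Synchronous SGD: everyone waits at the barrier for the slowest worker.
        sync_total += max(times)
        # Idealized async: each worker keeps going at its own pace.
        for i, t in enumerate(times):
            per_worker_total[i] += t
    async_total = sum(per_worker_total) / num_workers
    return sync_total, async_total


def prob_step_hits_straggler(num_workers, straggler_prob):
    # Chance that at least one of the workers is slow on a given step.
    return 1.0 - (1.0 - straggler_prob) ** num_workers


sync_t, async_t = simulate_training_time(num_workers=1000, num_steps=100)
print(f"sync: {sync_t:.0f}  async: {async_t:.0f}")
```

With 1000 workers and a 1% straggler rate, `prob_step_hits_straggler(1000, 0.01)` is about 0.99996, so nearly every synchronous step takes the straggler's time even though any single worker is slow only 1% of the time.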


I’ve worked at companies with async training. Async training does help with fault tolerance, and it can also improve training throughput by being less reliant on the slowest machine. It does add meaningful training noise, though: when we ran experiments against sync training, we got much more stable results with sync training, and some of our less stable models would sometimes have loss explosions/divergence with async training but be fine with sync training.
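One way to see why staleness can destabilize training is a toy model (an illustration, not the setup from the experiments above): gradient descent on f(x) = x²/2, where each update uses the parameter value from `staleness` steps ago, roughly what happens when async workers push gradients computed on old weights. A learning rate that converges with fresh gradients can diverge once the gradients are stale. `delayed_gd` is a hypothetical helper.

```python
def delayed_gd(lr, staleness, num_steps, x0=1.0):
    """Gradient descent on f(x) = x**2 / 2 (gradient = x), where each update
    uses the parameter from `staleness` steps ago (toy stand-in for async SGD)."""
    history = [x0] * (staleness + 1)  # assume the run starts at rest at x0
    for _ in range(num_steps):
        stale_x = history[-(staleness + 1)]  # gradient computed on old weights
        history.append(history[-1] - lr * stale_x)
    return history[-1]


print(delayed_gd(lr=0.9, staleness=0, num_steps=100))  # fresh gradients: converges
print(delayed_gd(lr=0.9, staleness=4, num_steps=500))  # stale gradients: blows up
print(delayed_gd(lr=0.05, staleness=4, num_steps=2000))  # smaller lr: converges again
```

In this toy, the update x ← x − lr·x₍ₜ₋τ₎ is only stable for lr below a threshold that shrinks as the delay τ grows, which matches the intuition that async runs often need smaller learning rates or more careful tuning to avoid divergence.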

Even with async training, what I generally see is the dataset just sharded statically: if a worker goes down, its shard of data may be lost/skipped, rather than some smarter dynamic file assignment that accounts for workers going down. Even basic things, like resuming a failed job from the last checkpoint with the same dataset state partway through a large epoch, are messy when major libraries like TensorFlow lack a good dataset checkpointing mechanism.
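A minimal sketch of what "dataset state" in a checkpoint could mean, assuming shards are plain in-memory lists standing in for files (the class and method names are hypothetical, loosely modeled on the `state_dict`/`load_state_dict` convention). The reader records which shard it is on and how far into it, so a restarted job can resume at the exact record instead of rereading or skipping data.

```python
from dataclasses import dataclass, asdict


@dataclass
class ReaderState:
    shard_idx: int = 0  # which shard we are currently reading
    offset: int = 0     # next record index within that shard


class ResumableShardedReader:
    """Hypothetical iterator over sharded records whose position can be
    saved alongside the model checkpoint and restored after a failure."""

    def __init__(self, shards, state=None):
        self.shards = shards  # stand-in for a list of files
        self.state = state if state is not None else ReaderState()

    def __iter__(self):
        return self

    def __next__(self):
        while self.state.shard_idx < len(self.shards):
            shard = self.shards[self.state.shard_idx]
            if self.state.offset < len(shard):
                record = shard[self.state.offset]
                self.state.offset += 1
                return record
            # Current shard exhausted: move on to the next one.
            self.state.shard_idx += 1
            self.state.offset = 0
        raise StopIteration

    def state_dict(self):
        # Plain dict, so it can be serialized (e.g. JSON) with the checkpoint.
        return asdict(self.state)

    def load_state_dict(self, d):
        self.state = ReaderState(**d)
```

Usage: save `reader.state_dict()` with each model checkpoint; after a crash, build a new reader over the same shards and call `load_state_dict` before continuing. This sketch ignores shuffling; a shuffled pipeline would also need to checkpoint its RNG seed or shuffle buffer.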


This is pretty cool. I had the same idea but in zig: https://github.com/EugenHotaj/zig_gpt2

Not fully finished yet: I haven't gotten around to implementing BPE encoding/decoding, and only some ops use BLAS.
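For reference, the core of BPE is small. Below is a simplified, character-level sketch (not GPT-2's actual tokenizer, which works on bytes, uses a regex pre-tokenizer, and ships a fixed merge table): training repeatedly merges the most frequent adjacent pair, and encoding replays the learned merges in order. All function names are hypothetical.

```python
from collections import Counter


def merge(tokens, pair, new_token):
    # Replace every non-overlapping occurrence of `pair` with `new_token`.
    out, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        counts = Counter(zip(tokens, tokens[1:]))
        if not counts:
            break
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        if counts[pair] < 2:
            break  # nothing left worth merging
        tokens = merge(tokens, pair, pair[0] + pair[1])
        merges.append(pair)
    return merges


def encode(text, merges):
    tokens = list(text)
    for pair in merges:  # replay merges in the order they were learned
        tokens = merge(tokens, pair, pair[0] + pair[1])
    return tokens


def decode(tokens):
    return "".join(tokens)


merges = train_bpe("aaabdaaabac", num_merges=3)
print(encode("aaabdaaabac", merges))
```

Because each token here is just the concatenation of its parts, decoding is a join; a real implementation would map tokens to integer ids and bytes.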


In my experience processes like these rarely work out as intended and usually add layers of bureaucracy for marginal benefit. It’s usually senior engineers or middle managers looking for “org wide impact” so they can get promoted.


At [current job] it's run by a group of architects that aren't responsible for delivering anything. Which works about as well as you might imagine. Had one legit tell me something like "that's for you to figure out" when I pointed out some implementation challenges.

Of course now that the project is behind there's no architect in sight to help us out.


My current role is an architect position where we are hands-on; I took it after turning down two that sounded like what you described. To me it's a red flag if an architect team is separate from the dev team. At a minimum I want to be embedded in the dev team during projects. (At one of the places where I walked away from the "opportunity," it turned out the architects rarely even talked directly to the dev team.)


I recently joined a group of architects that aren't responsible for delivering anything, and you are not totally wrong. But it isn't the architect's role to involve themselves in the weeds of the implementation; that's your job. They should be helping if the project is falling behind, though. Then again, if the project is falling behind, then that is probably more indicative of a poorly-run product team and/or a bad SDLC process.


    Then again, if the project is falling behind, 
    then that is probably more indicative of a 
    poorly-run product team and/or a bad SDLC process.
That's a worrisomely glib take on the impact that architecture can have on the project especially since in practice "architecture" includes the choice of language, platform, and tooling for a project.

Architecture choices directly or indirectly affect every line of code, every minute, every meeting, everything. As you said there are plenty of other places where things can go bad. But it's galling to hear an architect say, "Probably not the architecture, must be something else."

    a group of architects that aren't responsible for
    delivering anything
And here's the root of the problem. The craft of delivering software is, well, delivering.

The code itself is often somewhat trivial. Effective delivery is all about working around quirks and potholes and traps in the toolchain and architecture inefficiencies.

As we all know software moves fast, and once you've divorced yourself from the reality of actually delivering anything at all, your knowledge quickly decays. You know abstract concepts and how to draw the rectangles on the whiteboard, but before very long you don't know how anything at all actually happens and soon you are unqualified to tell anybody else how to do it.

There can be exceptions to this rule, like projects with an extraordinarily stable toolchain and environment. If you've been writing COBOL at a bank for 40 years you can probably step back into an advisory role and benefit the company by getting yourself out of the trenches and guiding others instead.


If architects are responsible for shipping features, they won't have time to be platform-wide architects. If you are shipping product-level features, you are on a scrum team and if you are on a scrum team, your other platform-wide duties are not your priority. At some point, companies realize they need people focused on the full picture, and they end up calling them architects.


It's certainly true: architects' primary focus needs to be that architecture/platform work.

Architecture isn't something you can just do on the side when you've got some free time, while still crushing out feature requests and bugfixes 40-50 hours/week.

But.

Architects can't be so detached from shipping code that they have no idea how things are actually made and shipped.

Otherwise you get situations like aloof architecture astronauts casually dreaming up complex distributed or microservice architectures, which look great on the whiteboard, but entail an enormous shift in thinking, tooling, and processes. As well as lots of overhead in general compared to a monolithic architecture. Sometimes that is the right move, but it is a costly one and not a shift to make lightly.

Had another "architect" 20 years ago dream up what was probably the world's dumbest architecture. He demanded that all application layer code be stripped out. We were going to build an entire interactive website with nothing but XML and XSLT. Where conditional logic was required, we could use XSLT's conditional <xsl:if> elements. He tried to have us implement a social networking site with that. Had he tried to ship a single bit of code he would have realized it was insane.

Also recently had architecture astronauts push an entirely new primary language on the team. The old one was deprecated. This change was largely championed by people who had never written a line of code in either language with no thought to the tooling changes, process changes, or the overhead of throwing away 10+ years of company-wide language expertise in Language X or the ramp-up time and bruises that would be required to get good at Language Y.

After all, what did they care? They're architects.

One way to avoid/mitigate this is to have architects build and ship proofs-of-concept and reference implementations using the proposed shiny new architecture bits before the rest of the team is expected to use and master them. Learn as many of the speed bumps and potholes as possible. They should continue to do this and work closely with scrum teams as those teams build and ship. (That may sound obvious, but lol @ this industry)


With examples like that, I can't blame you at all for being skeptical. Architects should definitely deliver POCs and/or reference projects for totally new stuff.


Yeah. In those negative examples, clearly the root of the problem was "architects making bad calls" and not necessarily a fundamental argument against the existence of architects.

But, when the role is poorly constructed, it's a nearly guaranteed mess.

One could say the opposite extreme is just as bad of course. Without architects, you have a bunch of unherded cats. That's also true.


Must the "platform" be so different from a "product"? I think a platform team should think of their output as a product, and so should the rest of the company. Have a PM, scrum or whatever, normal engineering titles, all the usual trappings.

And then you conveniently avoid the architect title!


I just hope one day I am good enough at programming to draw some boxes, tell someone else to figure it out, and then blame them when it doesn't work out


It's the job of all engineers involved to utterly understand one another. That goes for junior engineers cranking out unit tests as well as for architecture astronauts.

If someone here is a junior and gets "figure it out" from some strategic-level engineer, ask professionally for clarification or for a team-up until you understand. Involve management if the other engineer is uncooperative or fails to help. It's their job to explain slightly more than it's your job to understand.

If all else fails, find an organization that expects the architecture astronauts to be explainers in chief as well.


If the architect is planting weeds, the architect should figure out how to kill the weeds. Ignorance of programming is no excuse.


My point was that the architect shouldn't plant the weeds in the first place.


This post would sound so dumb if it didn’t come from the almighty pg.


The issue is not that ChatGPT will kill things off, the issue is that ChatGPT 4.0 will kill things off. If you don’t think that’s a real possibility, you’re sleeping.


Some of these are so odd. When Kylie Jenner launched her company she could have sold celery and still made $1B+. It wasn’t because of the effectiveness of her small team.


She also certainly had an army of contractors and laborers overseas who designed and manufactured her cosmetics.


Most likely her cosmetics are fully designed, manufactured and distributed by a well-established company such as Procter & Gamble.


Still stands. She built a "billion dollar" audience with a small team.


My experience is exactly the opposite in almost all cases. Most software is much more complicated than it needs to be. Reads like the author is just butthurt at feedback they received about their work.


Except when it isn't. I have been burned many times on that one: trying to "simplify" code, only to end up making it even more complex once all the edge cases were taken into account.

You can often simplify code at the surface level, such as removing dead code, but structural complexity is really hard to get rid of, often impossible.


I’ve likely had it all my life, but really started noticing about two years ago during the pandemic. Now I can’t unhear it. Went to a doctor a couple of times but nothing they tried really helped. Thankfully mine is not too bad and I’m mostly able to ignore it.


The network communication overhead would be way too high to make this useful. At least for current methods of training large models.
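A back-of-envelope version of the overhead argument, under stated assumptions: a hypothetical 7B-parameter model with fp16 gradients (2 bytes each), 64 workers, and a plain ring all-reduce, where each worker sends and receives about 2(N−1)/N times the gradient payload per step. Latency, compression, and compute/communication overlap are ignored, so treat the numbers as rough orders of magnitude.

```python
def ring_allreduce_seconds(num_params, bytes_per_param, bandwidth_gbit_s, num_workers):
    """Approximate per-step gradient sync time for a ring all-reduce,
    bandwidth term only (hypothetical helper; ignores latency and overlap)."""
    payload_bytes = num_params * bytes_per_param
    # Each worker sends (and receives) 2 * (N - 1) / N of the payload.
    traffic_bytes = 2 * (num_workers - 1) / num_workers * payload_bytes
    return traffic_bytes * 8 / (bandwidth_gbit_s * 1e9)


# Home internet (~1 Gbit/s) vs. a datacenter interconnect (~400 Gbit/s).
print(ring_allreduce_seconds(7e9, 2, 1.0, 64))    # ~220 s per step
print(ring_allreduce_seconds(7e9, 2, 400.0, 64))  # ~0.55 s per step
```

Under these assumptions, every optimizer step over a 1 Gbit/s link spends minutes just moving gradients, versus well under a second on datacenter links, which is why synchronous data-parallel training over the public internet is hard to make worthwhile.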


Source: trust me bro

