Rendering taking a few hours means humans are building it, at least partially.


False. This is just gaussian splats being queued up on a server somewhere.


You could be correct, but it would be a real indictment of their rendering farm I think.


It's not "rendering" with gaussian splats. It's more "training" (or "fitting"). And not knowing how much the usage vs compute ratio is, I would hesitate to comment.

But knowing a little bit about gaussian splatting, I can't think of what manual steps requiring human assistance would even be likely to be necessary.


Without knowing the specifics of their pipeline I would also hesitate to comment further.


Well, we know it's a gaussian splat, we know what the inputs are (RGB and 6-DoF pose), and we know how gaussian splat training is usually done...


Sure, in production envs I have seen humans being used in three places:

1. pose data calibration
2. cleaning up covariances (reducing blobbiness)
3. adding metadata for app usage

But, to your point, it's hard to say which of these, if any, apply without more info. I would be very, very impressed if there were no humans and it's 'just' a training-time issue, though!
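For anyone wondering why it's "training" rather than "rendering", here's a deliberately toy sketch of the fitting loop (my own illustration, not their pipeline; real 3DGS optimizes 3D covariances against posed RGB images, but the shape of the loop is the same):

    # Toy 2D "gaussian splat" fitting: optimize splat parameters so a rendered
    # image matches a target image. Purely illustrative; a random image stands
    # in for the real posed RGB frames.
    import torch

    H, W, N = 64, 64, 200                        # image size, number of splats
    target = torch.rand(H, W, 3)                 # stand-in for a captured frame

    means   = torch.rand(N, 2, requires_grad=True)   # splat centers (normalized x, y)
    log_sig = torch.zeros(N, requires_grad=True)     # isotropic scale per splat
    colors  = torch.rand(N, 3, requires_grad=True)
    opacity = torch.zeros(N, requires_grad=True)     # pre-sigmoid opacity

    ys, xs = torch.meshgrid(torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij")
    pix = torch.stack([xs, ys], dim=-1).reshape(-1, 2)   # (H*W, 2) pixel coords

    opt = torch.optim.Adam([means, log_sig, colors, opacity], lr=1e-2)
    for step in range(500):
        d2 = ((pix[:, None, :] - means[None, :, :]) ** 2).sum(-1)        # (H*W, N)
        w = torch.sigmoid(opacity) * torch.exp(-d2 / (2 * torch.exp(log_sig) ** 2))
        img = (w @ torch.sigmoid(colors)) / (w.sum(-1, keepdim=True) + 1e-6)
        loss = ((img.reshape(H, W, 3) - target) ** 2).mean()             # photometric loss
        opt.zero_grad()
        loss.backward()
        opt.step()

The hours go into gradient descent on a photometric loss like this, which is compute-bound rather than human-bound, so the open question is what sits around the optimization, not the optimization itself.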


You all need to stop being so pessimistic. This is a great idea.

Want PBS to stick around? Make it so anybody who's sticking to ChatGPT gets great answers from PBS, and every time ChatGPT scrapes it, PBS gets money.

Is it extremely difficult? Obviously. Will it work? Probably not, very few things do. Is it a great thing that some folks are doing it and trying to make it work so that we can have a functional media ecosystem in a post-social-media age? Absolutely.


A fun project for somebody who has more time than I do would be to see if they can get it working with the new Mojo stuff from yesterday for Apple. I don't know if the functionality would be fully baked enough yet to actually do the port successfully, but it would be an interesting try.


New Mojo stuff from Apple?



Nah. There's huge alpha here, as one might say. I feel like this comment could age even more poorly than the infamous Dropbox comment.

Even with JAX, PyTorch, HF Transformers, whatever you want to throw at it, the DX for cross-platform GPU programming that meets the requirements of large language models specifically is extremely bad.
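As a small, hedged illustration of the pain (plain PyTorch, nothing to do with Mojo's actual API): even the highest-level "portable" path has you hand-dispatching on backend, and everything below this level, like custom kernels or fused attention, forks per vendor entirely.

    # Backend selection is still manual even in the friendliest framework;
    # anything lower-level than this (kernels, memory layout, fused ops)
    # has to be written and tuned separately per vendor.
    import torch

    def pick_device() -> torch.device:
        if torch.cuda.is_available():            # NVIDIA (ROCm builds also report "cuda")
            return torch.device("cuda")
        if torch.backends.mps.is_available():    # Apple silicon
            return torch.device("mps")
        return torch.device("cpu")               # fallback

    device = pick_device()
    x = torch.randn(4, 4, device=device)
    print(device, (x @ x.T).sum().item())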

I think this may end up being the most important thing that Lattner has worked on in his life. (And yes, I am aware of his other projects!)


Comments like this view the ML ecosystem in a vacuum. New ML models are almost never written—all LLMs for example are basically GPT-2 with extremely marginal differences—and the algorithms themselves are the least of the problem in the field. The 30% improvements you get from kernels and compiler tricks are absolutely peanuts compared to the 500%+ improvements you get from upgrading hardware, adding load balancing and routing, KV and prefix caching, optimized collective ops etc. On top of that, the difficulty even just migrating Torch to the C++11 ABI to access fp8 optimizations is nigh insurmountable in large companies.
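To make the serving-side point concrete, here is a toy sketch of the KV caching idea (my own illustration, not any particular library's API): during decoding you store keys and values for tokens you've already seen instead of recomputing them at every step.

    # Toy single-head attention decode loop with a KV cache: each step computes
    # K/V only for the newest token and reuses everything already cached.
    import torch

    d = 16
    Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
    k_cache, v_cache = [], []

    def decode_step(x_t: torch.Tensor) -> torch.Tensor:
        """x_t: (1, d) embedding of the newest token."""
        q = x_t @ Wq
        k_cache.append(x_t @ Wk)        # new token's key/value only
        v_cache.append(x_t @ Wv)
        K = torch.cat(k_cache)          # (t, d), reused across steps
        V = torch.cat(v_cache)
        attn = torch.softmax(q @ K.T / d ** 0.5, dim=-1)
        return attn @ V                 # (1, d)

    for t in range(5):
        out = decode_step(torch.randn(1, d))
    print(out.shape)                    # torch.Size([1, 16])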

I say the ship sailed in 2012 because that was around when it was decided to build Tensorflow around legacy data infrastructure at Google rather than developing something new, and the rest of the industry was hamstrung by that decision (along with the baffling declarative syntax of Tensorflow, and the requirement to use Blaze to build it precluding meaningful development outside of Google).

The industry was so desperate to get away from it that they collectively decided that downloading a single giant library with every model definition under the sun baked into it was the de facto solution to loading Torch models for serving, and today I would bet you that easily 90% of deep learning models in production revolve around either TensorRT, or a model being plucked from Huggingface’s giant library.

The decision to halfass machine learning was made a long time ago. A tool like Mojo might work at a place like Apple that works in a vacuum (and is lightyears behind the curve in ML as a result), but it just doesn’t work on Earth.

If there’s anyone that can do it, it’s Lattner, but I don’t think it can be done, because there’s no appetite for it nor is the talent out there. It’s enough of a struggle to get big boy ML engineers at Mag 7 companies to even use Python instead of letting Copilot write them a 500 line bash script. The quality of slop in libraries like sglang and verl is a testament to the futility of trying to reintroduce high quality software back into deep learning.


Thank you for the kind words! Are you saying that AI model innovation stopped at GPT-2 and everyone has performance and gpu utilization figured out?

Are you talking about NVIDIA Hopper or any of the rest of the accelerators people care about these days? :). We're talking about a lot more performance and TCO at stake than with traditional CPU compilers.


I’m saying actual algorithmic (as in not data) model innovation has never been a significant part of the revenue generation in the field. You get your random forest, or ResNet, or BERT, or MaskRCNN, or GPT-2-with-One-Weird-Trick, and then you spend four hours trying to figure out how to preprocess your data.

On the flipside, far from figuring out GPU efficiency, most people with huge jobs are network bottlenecked. And that’s where the problem arises: solutions for collective comms optimization tend to explode in complexity because, among other reasons, you now have to package entire orchestrators in your library somehow, which may fight with the orchestrators that actually launch the job.

Doing my best to keep it concise, but Hopper is a good case study. I want to use Megatron! Suddenly you need FP8, which means the CXX11 ABI, which means recompiling Torch along with all those nifty toys like flash attention, flashinfer, vllm, whatever. Ray, jsonschema, Kafka and a dozen other things also need to match the same glibc and libstdc++ versions. So using that as an example, suddenly my company needs C++ CICD pipelines, dependency management, etc. when we didn't before. And I just spent three commas on these GPUs. And most likely, I haven't made a dime on my LLMs, or autonomous vehicles, or weird cyborg slavebots.
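For anyone who hasn't lived this, the ABI-matching dance looks roughly like the sketch below (the source file names are hypothetical; the flag and the PyTorch helpers are real): every C++/CUDA extension has to be compiled against the same libstdc++ ABI as the Torch wheel, or imports fail with unresolved symbols.

    # Hedged sketch: build a (hypothetical) fused op against whatever ABI the
    # installed Torch wheel was compiled with, so the extension actually loads.
    import torch
    from torch.utils.cpp_extension import load

    abi = 1 if torch.compiled_with_cxx11_abi() else 0   # match the installed wheel

    fused_op = load(
        name="fused_op",
        sources=["fused_op.cpp", "fused_op_kernel.cu"],  # hypothetical sources
        extra_cflags=[f"-D_GLIBCXX_USE_CXX11_ABI={abi}"],
        extra_cuda_cflags=[f"-D_GLIBCXX_USE_CXX11_ABI={abi}"],
        verbose=True,
    )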

So what all that boils down to is just that there’s a ton of inertia against moving to something new and better. And in this field in particular, it’s a very ugly, half-assed, messy inertia. It’s one thing to replace well-designed, well-maintained Java infra with Golang or something, but it’s quite another to try to replace some pile of shit deep learning library that your customers had to build a pile of shit on top of just to make it work, and all the while fifty college kids are working 16 hours a day to add even more in the next dev release, which will of course be wholly backwards and forwards incompatible.

But I really hope I’m wrong :)


Lattner's comment aside (which I'm fanboying a little bit at), I do tend to agree with your pessimism/realism for what it's worth. It's gonna be a long long time before that whole mess you're describing is sorted out, but I'm confident that over the next decade we will do it. There's just too much money to be made by fixing it at this point.

I don't think it's gonna happen instantly, but it will happen, and Mojo/Modular are really the only language platform I see taking a coherent approach to it right now.


I tend to agree with you, but I hoped the field would start collectively figuring out how to be big boys with CICD and dependency management back in 2017; I thought Google's awkward source release of BERT was going to be the low point, and we'd switch to Torch and be saved. Instead, it's gotten so much worse. And the kind of work that the Python core team has been putting into package and dependency management is nothing short of heroic, and it still falls short because PyTorch extends the Python runtime itself, torch.compile now intercepts Py_FrameEval, and NVIDIA is releasing Python CUDA bindings.
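To illustrate the "extends the Python runtime itself" point: torch.compile hooks CPython's frame-evaluation API (PEP 523) to trace bytecode at runtime, which is invisible to any packaging tool. A trivial example:

    # torch.compile intercepts frame evaluation to capture and compile the graph;
    # from the packaging side it is still "just" an installed wheel.
    import torch

    @torch.compile
    def f(x):
        return torch.sin(x) ** 2 + torch.cos(x) ** 2

    print(f(torch.randn(8)))   # first call triggers tracing and compilation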

It’s just such a massive, uphill, ugly moving target to try to run down. And I sit here thinking the same as many of these comments: on the one hand, I can’t imagine we’re still using Python 3 in 2035? 2050?? But on the other hand, I can’t envision a path out of the mess while everyone is making money from it, or at least keeps pretending they’ll start to soon.


And comments like this forget that there is more to AI and ML than just LLMs or even NNs.


I'm not the original commenter, but that makes a lot of sense! I had assumed there was a huge overlap, personally.


I think it's pretty common for folks to enter the software field without a CS degree, start building apps, and see big O notation without understanding what it is. These people have jobs and deadlines; they want to ship features that make people's lives easier. I'd bet many of those people don't care so much about calculus, but a quick intro to what all this big O nonsense is about could help them.
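For instance, the kind of quick intro I have in mind fits in a few lines: two ways of checking membership in a sorted list give the same answer, but the amount of work grows very differently with n, which is exactly what big O notation captures (a toy sketch):

    # Linear scan is O(n): worst case it touches every element.
    # Binary search is O(log n): it halves the search space each step.
    from bisect import bisect_left

    def contains_linear(sorted_items, target):
        for item in sorted_items:
            if item == target:
                return True
        return False

    def contains_binary(sorted_items, target):
        i = bisect_left(sorted_items, target)
        return i < len(sorted_items) and sorted_items[i] == target

    data = list(range(1_000_000))
    print(contains_linear(data, 999_999), contains_binary(data, 999_999))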


Good thing they didn't nuke the data centers after all!


It's life changing for most people in the SF bay area too!


YES! Finally they can upgrade from a studio to a one bedroom!

(I am mostly joking but also crying inside because Seattle prices are also insane)


After taxes, that’s a down payment on a house, not a lavish one, in the SF Bay Area.


I think it's a perfectly valid take coming from some intersection of an engineering mindset and FOSS culture. And, the comparison you bring up is a bit of a category error.

We know how James Webb works, and it was developed by an international consortium of researchers. It's one of our most trusted international institutions, and very verifiable.

We do not know how Genie works, it is unverifiable to non-Google researchers, and there are not enough technical details to move external teams forward much. Worst case, this page could be a total fabrication intended to derail competition by lying about what Google is _actually_ spending their time on.

We really don't know.

I don't say this to defend the other comment and say you're wrong, because I empathize with both points. But I do think that treating Google with total credulity would be a mistake, and the James Webb comparison is a disservice to the JW team.


I think it's an interesting correspondence; there are some general design principles about creating good auditory user interfaces somewhere in here. I would be interested if someone smarter than me can tell me what those principles are.


I suspect that there's some marketing component at play here. People who do not own but observe devices making seemingly unnecessary noises might perceive these devices as premium. Think about the various beeps that occur when locking a car and arming the alarm, the startup sound that infotainment systems in some EVs play, the twinkle twinkle little star of a fancy rice cooker.


Me neither. I have heard stories of it happening, but I've never personally seen one live. It's really a tooling issue. I think the causal story is super important and will only become more so in the future, but it would be basically impossible to implement and maintain longer-term with today's software.

Kind of like flow-based programming. I don't think there's any fundamental reason why it can't work; it just hasn't yet.

