Hacker News | cpldcpu's comments

Great to see this!

Worth mentioning that Huggingface already offers a similar service. And they are also European:

https://huggingface.co/docs/inference-providers/index

https://huggingface.co/inference/models


Huggingface is about as European as Google, IBM, or Visa: they all have some offices in Europe.

Huggingface isn't European just because they have offices there.

Nice, but I couldn't find a pay-as-you-go plan.

They only offer pay-as-you-go. The $9/month plan includes $2 in credits, and beyond that it's PAYG without markup.

https://huggingface.co/docs/inference-providers/pricing

It's well buried though. Does not seem to be a focus of theirs.


I had to give it a try.

Claude, the ol' cheater, recognized what the file was, downloaded the PSID from the web, found a WASM SID player and built a website around it:

https://claude.ai/public/artifacts/df6cdcae-08dc-452b-ba19-f...

https://claude.ai/share/4dd36c16-bc62-445a-b423-ad4637f06432

GPT-5.5 built a lot of Python scripts to extract the music data. The Strudel implementation failed, but I then asked it to build a website:

https://ubiquitous-vacherin-8e7993.netlify.app/

This is a translation of the music into JavaScript, based on the assembler source.

Really impressive on both counts. Some iterations were required for both.


Yes, marks of AI all over the place. Also the SVGs.

>No solution written, 100% score.

It's weird. It turns out that the hardest problem for LLMs to really tackle is long-form text.


Maybe in one shot.

In theory I would expect them to be able to ingest the corpus of the New Yorker and turn it into a template with sub-templates, and then be able to rehydrate those templates.

The harder part seems to be synthesizing new connections from two adjacent ideas. They like to take x and y and create x+y instead of x+y+z.


Most of the good major models are already very capable of changing their writing style.

Just give them the right writing prompt. "You are a writer for the Economist, you need to write in the house style, following the house style rules, writing for print, with no emoji .." etc etc.

The large models have already ingested plenty of New Yorker, NYT, The Times, FT, The Economist etc articles, you just need to get them away from their system prompt quirks.


I think that should be true, but it doesn't hold up in practice.

I work with a good editor from a respected political outlet. I've tried hard to get current models to match his style: filling the context with previous stories, classic style guides and endless references to Strunk & White. The LLM always ends up writing something filtered through tropes, so I inevitably have to edit quite heavily, before my editor takes another pass.

It feels like LLMs have a layperson's view of writing and editing. They believe it's about tweaking sentence structure or switching in a synonym, rather than thinking hard about what you want to say, and what is worth saying.

I also don't think LLMs' writing capabilities have improved much over the last year or so, whereas coding has come on leaps and bounds. Given that good writing is a matter of taste which is beyond the direct expertise of most AI researchers (unlike coding), I doubt they'll improve much in the near future.


You're ignoring what I said. They work better when you turn it into a two-step process. Step 1: create a template. Step 2: execute the template.

>The large models have already ingested plenty of New Yorker, NYT, The Times, FT, The Economist etc articles

And that ends up diluting them. Going back and doing another pass on only a subset would give them a stronger voice. At some threshold, ingesting more material just averages everything out and regresses to the mean instead of adding information. It's a giant table of word associations; it can regress.


No, the failure is the human-written prompt.


You know, after a while this excuse is not valid anymore.


If they're that hard to prompt maybe it's easier just to write the blog posts yourself.


Someone here mentioned a while ago that the labs deliberately haven't tried to train these characteristics out of their models, because leaving them in makes it easier to identify, and therefore exclude, LLM-generated text from their training corpus.


But it's odd that these characteristics are the same across models from different labs. I find it hard to believe that researchers across competing companies are coordinating on something like that.


Love it! Great idea for the dataset.


+1


They mentioned that they are using strong quantization (IIRC 3-bit) and that the model was degraded by that. Also, they don't have to use transistors to store the bits.


I think they are talking about the transistors that apply the weights to the inputs.


gpt-oss is FP4. They're saying they'll next try a mid-size one, I'm guessing gpt-oss-20b, and then a large one, I'm guessing gpt-oss-120b, since their hardware is FP4-friendly.


I wonder how well this works with MoE architectures?

For dense LLMs, like llama-3.1-8B, you profit a lot from having all the weights available close to the actual multiply-accumulate hardware.

With MoE, it is more like a memory lookup. Instead of a 1:1 pairing of MACs to stored weights, you are suddenly forced to have a large memory block next to a small MAC block. And once this mismatch becomes large enough, there is a huge gain from using a highly optimized memory process for the memory instead of mask ROM.

At that point we are back to a chiplet approach...
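For a rough feel of the mismatch, a back-of-the-envelope sketch (the llama-3.1-8B figure is real; the MoE numbers are made up purely to illustrate the ratio):

    # How many of the stored weights each token actually exercises.
    # Dense model: every stored weight is multiplied for every token.
    dense_params = 8e9            # llama-3.1-8B, roughly

    # Hypothetical MoE: big total parameter count, small active slice per token.
    moe_total_params  = 120e9     # illustrative, not a real model
    moe_active_params = 5e9       # weights actually used per token

    dense_util = dense_params / dense_params           # 1.0: MACs sit next to weights they always need
    moe_util   = moe_active_params / moe_total_params  # ~0.04: most stored weights idle on any given token

    print(f"dense weight utilization per token: {dense_util:.2f}")
    print(f"MoE   weight utilization per token: {moe_util:.2f}")
    # The lower this ratio, the more the chip looks like "big memory + small MAC",
    # which is where a dedicated memory process (or chiplet) starts to beat mask ROM.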


For comparison, I wanted to write about how Google handles MoE architectures with TPUv4.

They use Optical Circuit Switches, operating via MEMS mirrors, to create highly reconfigurable, high-bandwidth 3D torus topologies. The OCS fabric allows 4,096 chips to be connected in a single pod, with the ability to dynamically rewire the cluster to match the communication patterns of specific MoE models.

The 3D torus connects 64-chip cubes with 6 neighbors each. TPUv4 also contains 2 SparseCores, which specialize in handling high-bandwidth, non-contiguous memory accesses.

Of course this is a datacenter-level system, not something on a chip for your PC, but I just want to convey the scale here.

*ed: SpareCores to SparseCores
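Rough arithmetic on that scale, using the numbers above (the 4x4x4 cube shape is from Google's public TPUv4 material; treat the framing as my reading):

    # TPUv4 pod topology, back of the envelope.
    chips_per_cube = 4 * 4 * 4                      # a 4x4x4 block = 64 chips, locally meshed
    pod_chips      = 4096                           # one pod
    cubes_per_pod  = pod_chips // chips_per_cube    # 64 cubes per pod

    ocs_neighbors_per_cube = 6                      # each cube face goes out through the OCS fabric
    print(f"{cubes_per_pod} cubes of {chips_per_cube} chips, "
          f"each with {ocs_neighbors_per_cube} OCS-switched neighbors")
    # The MEMS-mirror OCS layer rewires which cube faces connect, so the same pod
    # can be carved into different torus shapes to match a given MoE's communication pattern.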


If each of the expert models were etched in silicon, it would still give a massive speed boost, wouldn't it?

I feel printing the ASIC is the main blocker here.


It could simply be bit-serial. With 4-bit weights you only need four serial addition steps, which is not an issue if the weights are stored nearby in a ROM.
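A toy sketch of what I mean, assuming unsigned 4-bit weights (real hardware would handle sign and run many of these units in parallel):

    def bit_serial_mac(inputs, weights4):
        # Multiply-accumulate where each 4-bit weight is consumed one bit at a time:
        # four shift-and-add steps per weight instead of a full parallel multiplier.
        acc = 0
        for x, w in zip(inputs, weights4):
            for bit in range(4):              # 4-bit weight -> 4 serial steps
                if (w >> bit) & 1:
                    acc += x << bit           # add the input shifted by the bit position
        return acc

    # The weights would come straight out of the nearby ROM in the scheme discussed above.
    print(bit_serial_mac([3, 5], [0b0110, 0b0011]))   # 3*6 + 5*3 = 33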


It may be worth pointing out that this is not the first residual-connection innovation to reach production.

Gemma 3n is also using a low-rank projection of the residual stream called LAuReL. Google did not publicize this much; I noticed it when poking around in the model file.

https://arxiv.org/pdf/2411.07501v3

https://old.reddit.com/r/LocalLLaMA/comments/1kuy45r/gemma_3...

Seems to be what they call LAuReL-LR in the paper, with D=2048 and R=64
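For anyone curious, here is my rough reading of the LAuReL-LR update as a PyTorch-style sketch. The exact parameterization and scalar placement may differ from the paper, and D=2048 / R=64 are just the values I saw in the Gemma 3n weights, so treat the details as assumptions:

    import torch
    import torch.nn as nn

    class LaurelLRResidual(nn.Module):
        # Sketch of a LAuReL-LR style residual: x_{l+1} = alpha * f(x_l) + x_l + B(A(x_l)),
        # i.e. the identity skip is augmented by a learned low-rank map (rank R << D).
        def __init__(self, f, d_model=2048, rank=64):
            super().__init__()
            self.f = f                                     # the usual sub-block (attention or MLP)
            self.alpha = nn.Parameter(torch.ones(()))      # learned scale on the block output
            self.A = nn.Linear(d_model, rank, bias=False)  # D -> R
            self.B = nn.Linear(rank, d_model, bias=False)  # R -> D

        def forward(self, x):
            return self.alpha * self.f(x) + x + self.B(self.A(x))

    # Usage sketch with a stand-in block:
    layer = LaurelLRResidual(nn.Linear(2048, 2048))
    y = layer(torch.randn(1, 2048))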


This is a fantastic catch. I hadn't realized Gemma 3n was already shipping with a variant of this in production.

It feels like we are entering the era of residual stream engineering. For a long time, the standard x + F(x) additive backbone was treated as untouchable. Now, between mHC (weighted scaling) and LAuReL (low-rank projections), labs are finally finding stable ways to make that signal path more dynamic.

I'm curious if the Low-Rank constraint in LAuReL acts as a natural stabilizer against the gradient explosion I saw with unconstrained hyper-connections.

Thanks for the paper link, definitely reading that tonight.


Thanks! Would be quite interesting to see how this fares compared to mHC.

I noted that LAuReL is cited in the mHC paper, but they refer to it as "expanding the width of the residual stream", which is rather odd.


All of them use ASML lithography, including CXMT.

They are, of course, a bit slower in EUV adoption. But it's already there:

https://www.tomshardware.com/pc-components/dram/micron-sampl...

https://www.techinsights.com/blog/samsung-d1z-lpddr5-dram-eu...

