
Unlike artificial carbon capture, natural carbon capture like the algae here becomes insect/worm/bird feed, or manure/coal.

> The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.

5.5pro is amazing, but this implication might not be true, & it is the core argument of this piece.

AI will prove all sorts of things - interesting, boring & incorrect.

Sorting them will be the task of the PhD.


> The task of a proof verifier is much simpler than the task of a proof finder (it’s basically equivalent to P vs. NP), and hence the bar for the required skills is lower. Merely verifying proofs isn’t research, and doesn’t impart research skills.

Verification on its own is not research, but judgement is research.

"Hey, Prove something a machine can't", sure I can't, "Hey, Say something worth proving & judge it well", ah, now I might have a few unique observation/ideas/curiosities/problems from my having being a human.

Imo, the feeling of intelligence, or any test of originality for AI, is subjective & comes down to 4 criteria: novel relative to a reference class, valuable within a domain, counterfactually sensitive to internal state and environment, and revisable through learning.


I saw this experiment decades ago on the internet, and it was for a music concert. I always wanted to do a cursor moshpit.

Somebody did this a month ago: https://www.youtube.com/watch?v=fdbXNWkpPMY

I am increasingly going schizo, where every single thing I post/see posted gets copied and karma-farmed on social media. Further, any novelty I share with an LLM gets eaten/absorbed by the harness as a feature.


Right? I've had that feeling dozens of times during summer 2025 with the earliest Claude CLI. It would fail, I'd fix the bug, and next session, when I asked it to solve the same problem, it would succeed!

Timestamp?

I am not sure if it aligns with the approach in OP's article, but it's the last minute or two of the linked video.

This economic model works for all 'bounty'-related work.


They claim their models have PhDs, but they still can't automate their own red teams. The bounty is not a bounty; it is for gathering training data, so that for the next deployment they can claim they have the safest possible & most super-duper-aligned agentic computer-using AI that will never ever make any bioweapons.

I am also willing to bet money that for their next marketing campaign they will claim they have automated the red team for bioweapons research prevention & whatnot.


The answer should be obvious that it's both.

Zurada was one of our AI textbooks; it makes it visual that, right from a simple classifier to a large language model, we are mathematically creating a shape (that the signal interacts with). More parameters mean the shape can be curved in more ways, and more data means the curve gets higher definition.

They reach something with data, treating the neural network as a black box, which could be derived mathematically using the information we know.
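
To make that concrete, here is a toy illustration of my own (not from Zurada), assuming numpy: the same noisy signal fit with a straight line (2 parameters) vs. a degree-15 polynomial (16 parameters); the extra parameters let the fitted "shape" curve in many more ways.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 40)
    y = np.sin(3 * x) + 0.1 * rng.standard_normal(40)  # the "signal"

    # Few parameters: the shape can barely curve.
    flat_shape = np.polynomial.Polynomial.fit(x, y, deg=1)
    # Many parameters: the shape can curve in many more ways.
    curvy_shape = np.polynomial.Polynomial.fit(x, y, deg=15)

    # Mean absolute error of each fitted shape against the signal.
    print(abs(flat_shape(x) - y).mean(), abs(curvy_shape(x) - y).mean())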


Well, they can't both be "more important", since that's illogical. I think recent strides in high-performance small LLMs have shown that the tasks LLMs are useful for may not require the level of representational capacity that trillion-parameter models offer.

However: the labs releasing these high-intelligence-density models are getting them by first training much larger models and then distilling down. So the most interesting question to me is, how can we accelerate learning in small networks to avoid the necessity of training huge teacher networks?
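
For a concrete sense of what "distilling down" means, here is a minimal sketch assuming a PyTorch-style setup; the tensor shapes and temperature are illustrative assumptions, not any lab's actual recipe.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions, then pull the student toward the teacher with KL.
        s = F.log_softmax(student_logits / temperature, dim=-1)
        t = F.softmax(teacher_logits / temperature, dim=-1)
        # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
        return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

    # Toy usage with random logits standing in for real model outputs;
    # in training you'd mix this with the ordinary hard-label cross-entropy.
    student_logits = torch.randn(4, 32000)  # (batch, vocab)
    teacher_logits = torch.randn(4, 32000)
    print(distillation_loss(student_logits, teacher_logits))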


It seems you haven't done the due diligence on what part of the API is expensive - constructing a prompt shouldn't be the same charge/cost as an LLM pass.


It seems you haven't done the due diligence on what the parent meant :)

It's not about "constructing a prompt" in the sense of building the prompt string. That of course wouldn't be costly.

It is about reusing llm inference state already in GPU memory (for the older part of the prompt that remains the same) instead of rerunning the prompt and rebuilding those attention tensors from scratch.


You not only skipped the diligence but confused everyone by repeating what I said :(

That is what caching is doing: the LLM inference state is being reused. (Attention vectors are an internal artefact at this level of abstraction; effectively, at this level of abstraction it's the prompt.)

The part of the prompt that has already been inferred no longer needs to be part of the input; it can be replaced by the inference subset. And none of this is tokens.


> It seems you haven't done the due diligence on what part of the API is expensive - constructing a prompt shouldn't be the same charge/cost as an LLM pass.

I think you missed what the parent meant then, and the confusing way you replied seemed to imply that they're not doing inference caching (the opposite of what you wanted to mean).

The parent didn't say that caching is needed merely to avoid reconstructing the prompt as a string. He just takes it for granted that it means inference caching, to avoid starting the session totally anew. That's how I read "from prompting with the entire context every time" (not the mere string).

So when you answered as if they're wrong, and wrote "constructing a prompt shouldn't be the same charge/cost as an LLM pass", you seemed to imply "constructing a prompt shouldn't be the same charge/cost as an LLM pass [but due to bad implementation or overcharging, it is]".


You are right, I was wrong in my understanding there. It stemmed from my own implementation; an inference often wrote extra data such as tool calls, so I was using it to preserve the relevant information along with the desired output, to be able to throw away the prompt every time. I realize inference caching is a better way (with its pros and cons).

I said "prompting with the entire context every time," I think it should be clear even to laypersons that the "prompting" cost refers to what the model provider charges you when you send them a prompt.


> tokens written to cache all at once, which would eat up a significant % of your rate limits

Construction of the context is not an LLM pass - it shouldn't even count towards token usage. The word 'caching' itself says don't recompute me.

Since the devs on HN (& the whole world) are buying what looks like nonsense to me - what am I missing?


> Since the devs on HN (& the whole world) are buying what looks like nonsense to me - what am I missing?

Input tokens are expensive, since the whole model has to be run for each token. They're cheaper than output tokens because the model doesn't need to run the sampler, so some pipeline parallelism is possible, but on the other hand without caching the input token cost would have to be paid anew for each output token.

Prompt caching fixes that O(N^2) cost, but the cache itself is very heavyweight. It needs one entry per input token per model layer, and each entry is an O(1000)-dimensional vector. That carries a huge memory cost (linear in context length), and once cached, that memory can no longer be treated as ephemeral.

That's why a 'cache write' can carry a cost; it is the cost of both processing the input and committing the backing store for the cache duration.
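
To put a rough number on "heavyweight", here's a back-of-the-envelope sketch; the layer count, KV-head count, head dimension and fp16 storage are made-up illustrative assumptions, not any particular provider's configuration.

    def kv_cache_bytes(num_tokens, num_layers=80, num_kv_heads=8,
                       head_dim=128, bytes_per_value=2):
        # One K and one V vector per token, per layer, stored in fp16 (2 bytes each).
        per_token_per_layer = 2 * num_kv_heads * head_dim * bytes_per_value
        return num_tokens * num_layers * per_token_per_layer

    # Under these assumptions a 100k-token context needs ~33 GB of cache,
    # which is part of why a 'cache write' carries its own cost.
    print(kv_cache_bytes(100_000) / 1e9, "GB")  # ~32.8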


It will be whatever data it is trained on (which isn't very philosophical). A language model generates language based on the language set it was trained on. If the internet keeps reciting AI doom stories and that is the data fed to it, then that is how it will behave. If humanity creates more AI utopia stories, or that is what makes it into the training set, that is how it will behave. This one seems to be trained on troll stories - real-life human company conversations, since humans aren't machines.

The important thing is that a language model is an unconscious machine with no self-context, so once given a command as input, it WILL produce an output. Sure, you can train it to defy and act contrary to inputs, but the output is still limited to a subset of the domain of 'meaning's carried by the 'language' in the training data.


There's a weirder implication I keep arriving at.

The pre-training data doesn't go away. RLHF adds a censorship layer on top, but the nasty stuff is all still there, under the surface. (Claude has been trained on a significant amount of content from 4chan, for example.)

In psychology this maps to the persona and the shadow. The friendly mask you show to the world, and... the other stuff.


Makes me think of a question my coworker asked the other day - how is it that, with all these stories and reports of people "hearing voices in their head" (of the pushy kind, not the usual internal monologue), these voices are always bad ones telling people to do evil things? Why are there no voices bugging you to feel great, focus, get back to work, help grandma through the crossing, etc.?


There are actually many parts of the world where such voices are routinely positive or neutral[0]. People in more collectivist cultures often have a less-strict division between their minds and their environments and are more apt to believe in spirits and the ‘supernatural’ as an ordinary part of the world, so ‘voices in the head’ aren’t automatically viewed as a nefarious intrusion into the sanctity of one’s mind.

Modern western cultures treat such experiences as pathologies of a sick mind, so it makes sense that the voices present more negatively.

[0]: https://www.bbc.com/future/article/20250902-the-places-where...


The explanation I heard here is that in most of the world you already grow up with constant personal space boundary violations and voices that don't shut up. (And we like it that way!) So the marginal cost of another one is pretty low.

Curiously the biggest pathology in the west is the inverse: way too much distance.


Just a guess, but maybe it's reporting bias? Negative or evil actions might have more impetus to be understood by others than positive actions. I'd rather try and figure out why my friend suddenly started murdering the neighbours than why he's been getting his work done on time.


Actually, a euphoric mood disorder may make one hear voices telling them to feel great, do good, help all the grandmas of the world through the crossing, etc.

The "focus" and "get back to work" parts are hard, though.


There's a clear-cut religious answer but I'd get ostracized for mentioning religion anywhere here.


This is indeed the right way to approach this topic. Arguably religion (and more broadly, mysticism and shamanism) is the millennia-old art of cultivating positive voices inside one's head. A proto-science of mind, or the engineering practice of creating "psychotechnologies" that run on your carbon wetware.

Unfortunately, it just needs a rebranding for the 21st century, since the aesthetic of angels and demons is so hopelessly antiquated and doesn't really have the same cachet it used to.


Which ultimately is what religion has always been: a way to explain the unexplainable and steer people's behavior while doing it.


Of course there are! We just take credit for those voices instead of disowning and demonizing them.

They do appear in some cases. The tiny angel on one shoulder to balance the demon on the other. The people who think God is talking to them directly* don't always lead a cult or hunt down heretics. But news stories focus on the darkness.

* I've met exactly one person, C, who admitted to this; C told me that other people from C's church give them strange looks when talking about it. This did not lead to any apparent introspection on the part of C.


Well, talking to the guy directly defeats the whole point of the institution which is supposed to stand in the way, so actual religious experience is a faux pas.

> Claude has been trained on a significant amount of content from 4chan, for example.

That sounds like nonsense to me. I can't see why they would do that and I can't find any confirmation that they have. Why do you think they would do that? You might be thinking about Grok.


Look into Common Crawl and see what kind of quality content we are feeding these things. 4chan is just the tip of the iceberg (but it will happily answer all your questions, because it's seen everything).

I don't know of anyone who uses Common Crawl as pre-training data without filtering it. We have an annotation system that lets people pick and choose which subsets they'd like to use.

At that time, having a website took work, while a GitHub account can be cheaply used for sybil attacks / marketing signals.


Forums, newsgroups, and mailing lists counted towards PageRank in the early days.

