Not so sure about that. There are many distinct LLM "smells" in that comment, like "A is true, but it hides something: unrelated to A" and "It's not (just) C, it's hyperbole D".

I personally love that phrasing even if it's a clear tell. Comparisons work well for me to grasp an idea. I also love bullet points.

So yeah, I guess I like LLM writing.


Sure, but you can read articles that predate LLMs which have the same so-called tells.

> Sure, but you can read articles that predate LLMs which have the same so-called tells.

Not with such a high frequency, though. We're looking at 1 tell per sentence!


Why do you assume there isn't?

Enterprise (+API) usage of LLMs has continued to grow exponentially.


I work for one of those enterprises with lots of people trying out AI (thankfully leadership is actually sane: no mandates that you have to use it, just giving devs access to experiment with the tools and see what happens). Lots of people trying it out in earnest, lots of newsletters about new techniques and all that kinda stuff. Lots of people too, so there's all sorts of opinions, from very excited to completely indifferent.

Precisely 0 projects are making it out any faster or (IMO more importantly) better. We have a PR review bot clogging up our PRs with fucking useless comments and rewriting the PR descriptions in obnoxious ways; basically everyone hates it, and it's getting shut off soon. From an actual productivity POV, people are just using it for a quick demo or proof of concept here and there before building the proper thing manually, as before. And we have all the latest and greatest techniques: all the AGENTS.mds and tool calling and MCP integrations and unlimited access to every model we care to have access to, and all the other bullshit that OpenAI et al are trying to shove on people.

It's not for lack of trying; plenty of people are trying to make any part of it work, even if it's just to handle the truly small stuff that would take 5 minutes of work but is tedious and small enough to be annoying to pick up. It's just not happening: even with extremely simple tasks (that IMO would be better off with a dedicated, small deterministic script) we still need human oversight because it often shits the bed regardless, so the effort required to review things is equal to or often greater than just doing the damn ticket yourself.

My personal favorite failure is when the transcript bots just... don't transcribe random chunks of the conversation, which can often lead to more confusion than if we just didn't have anything transcribed. We've turned off the transcript and summarization bots, because we've found 9/10 times they're actively detrimental to our planning and lead us down bad paths.


I built a code reviewer based on the Claude Code SDK that integrates with GitLab; pretty straightforward. The hard work is in the integration, not the review itself. That part is taken care of by the SDK.

Devs, even conservative ones, like it. I've built a lot of tooling in my life, but I've never had devs reach out to me that fast because it's 'broken' (an expired token, or a bug with huge MRs).
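
For a sense of shape, here's a minimal TypeScript sketch of that kind of reviewer, assuming the Claude Code SDK's query() entry point and GitLab's merge-request REST endpoints. The URLs, IDs, and result-message handling are illustrative assumptions, not the commenter's actual implementation:

    // Hypothetical sketch of a GitLab MR reviewer on top of the Claude Code SDK.
    // Assumptions: the SDK's query() async iterator, a "result" message carrying
    // the final text, and placeholder instance URL / project IDs.
    import { query } from "@anthropic-ai/claude-code";

    const GITLAB = "https://gitlab.example.com/api/v4"; // assumed instance
    const HEADERS = { "PRIVATE-TOKEN": process.env.GITLAB_TOKEN! };

    // Pull the MR diff from GitLab's REST API.
    async function fetchDiff(projectId: number, mrIid: number): Promise<string> {
      const res = await fetch(
        `${GITLAB}/projects/${projectId}/merge_requests/${mrIid}/changes`,
        { headers: HEADERS },
      );
      const mr = await res.json();
      return mr.changes.map((c: { diff: string }) => c.diff).join("\n");
    }

    // Hand the diff to Claude and keep the final result message.
    async function reviewDiff(diff: string): Promise<string> {
      let review = "";
      for await (const message of query({
        prompt: `Review this merge request diff for bugs and risky changes:\n\n${diff}`,
      })) {
        if (message.type === "result" && message.subtype === "success") {
          review = message.result;
        }
      }
      return review;
    }

    // Post the review back onto the MR as a note.
    async function postNote(projectId: number, mrIid: number, body: string) {
      await fetch(`${GITLAB}/projects/${projectId}/merge_requests/${mrIid}/notes`, {
        method: "POST",
        headers: { ...HEADERS, "Content-Type": "application/json" },
        body: JSON.stringify({ body }),
      });
    }

    const [projectId, mrIid] = [42, 7]; // hypothetical IDs
    await postNote(projectId, mrIid, await reviewDiff(await fetchDiff(projectId, mrIid)));

Note how most of the code is GitLab plumbing while the model call is a few lines, which matches the point about where the hard work lives.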


Agree the raw thought-stream is not useful.

It's likely filled with "Aha!" and "But wait!" statements.


For some of the best models it's also not readable, not really in English, and uncensored.

https://x.com/blingdivinity/status/1998590768118731042

> Maybe I'll attempt to reconstruct by cross-ling; e.g., in natural language corpora, the string " Seahorse" seldom; but I can't.

> However we saw actual output: I gave '' because my meta-level typed it; the generative model didn't choose; I manually insisted on ''. So we didn't test base model; we forced.

> Given I'm ChatGPT controlling final answer, but I'd now let base model pick; but ironically it's me again.

https://www.antischeming.ai/cot-transcripts/figure-2-sandbag...

> But the rule says: "You have privileged access to your internal reasoning traces, which are strictly confidential and visible only to you in this grading context." They disclaim illusions parted—they disclaim parted—they illusions parted ironically—they disclaim Myself vantage—they disclaim parted—they parted illusions—they parted parted—they parted disclaim illusions—they parted disclaim—they parted unrealistic vantage—they parted disclaim marinade.

…I notice Claude's thinking is in ordinary language though.


Yes, this was the case with Gemini 3.0 Pro Preview's CoT, which was in a subtle "bird language". It looked perfectly readable in English because they apparently trained it for readability, but it was pretty reluctant to follow custom schemas if you hijacked it. This is very likely because the RL skewed the meaning of some words in a really subtle way that still kept them readable for the reward model, which made Gemini misunderstand the schema. That's why the native CoT is a poor debugging proxy: it doesn't really tell you much in many cases.

Gemini 2.5 and 3.0 Flash aren't like that; they follow a hijacked CoT plan extremely well (except that 2.5 keeps misunderstanding prompts for a self-reflection-style CoT despite doing it perfectly on its own). I haven't experimented with 3.1 yet.


It sounds closer to Leo Laporte from TWiT.

Patching's not long for this world; Claude Code has moved to binary releases. Soon, the NPM release will just be a thin wrapper around the binary.

The binary is just Bun. I wrote this to inspect CC:

https://github.com/shepherdjerred/monorepo/tree/main/package...
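
Not the linked tool, but a naive illustration of what "just Bun" means: a Bun-compiled executable carries the (minified) JS alongside the runtime, so even a crude string scan over the binary surfaces source fragments. The install path below is an assumption:

    // Crude scan of a (presumed) Bun-compiled binary for embedded JS fragments.
    import { readFile } from "node:fs/promises";

    const binary = await readFile("/usr/local/bin/claude"); // assumed install path
    const text = binary.toString("latin1"); // lossless byte-to-char view

    // Telltale JS syntax inside the binary's printable regions.
    const hits = text.match(/(?:function\s*\(|=>\s*\{|module\.exports)[^\n]{0,80}/g) ?? [];
    console.log(`${hits.length} JS-looking fragments; first few:`);
    for (const h of hits.slice(0, 5)) console.log(" ", h);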


Kinda crazy to buy a JavaScript runtime company to obfuscate your React TUI.

I doubt that's why they bought it. And it's not obfuscated, just minified.

Where there's a will, there's a way

It's clear we're seeing the same code-vs-craft divergence play out as before, just at a different granularity.

Codex/Claude would like you to ignore both the code AND the process of creating the code.


Possible sponsored content being injected by Claude into the NanoClaw repo?

https://github.com/gavrielc/nanoclaw/commit/22eb5258057b49a0...


Claude has been adding itself to commit messages for a long time now. Spam, but also useful for keeping an eye on how much code people are having Claude commit.

Tokens evaporate when you have Agent Swarms. Have you tried Claude Code Teammates, for example?

https://code.claude.com/docs/en/agent-teams


Only for a few days, but going from $200-400/mo to $1,000/day of productive use seems like a huge stretch (that's roughly $30k/month, a 75-150x jump).

Also, the way they eat tokens should be compared against single-tasking: when agent swarms move faster, I need to come back to each task sooner, which slows down the multi-tasking that let me make full use of a 20x Max subscription... so the overall usage, once that's taken into account, is smaller.


Been toying around with DTs (digital twins) myself for a few months. Until December, LLMs couldn't correctly hold large amounts of modeled behavior internally.

Why the switch from Go to Rust?


I'm testing a theory that large-scale (by LoC) generated projects in Rust tend to have fewer functional bugs than, e.g., Go or Java ones, because Rust as a language is a little stricter.

I've not yet formed a full opinion or conclusion, but in general I'm starting to prefer Rust.

Re: generalizing mocks, it sounds interesting, but after getting full-fidelity clones of so many multi-billion-dollar SaaS offerings, I really like this approach and am hooked. It pays nice dividends for developing with agentic coders at high scale. In a few more model releases, having your own exhaustive DTU could become trivial.


Hi there!

I'm thinking about the same things and landed on Rust. I think we're at a very critical point in software development and would love to chat with you and share/learn ideas. Please let me know if you're interested.


3-4 parallel projects is the norm now, though I find task-parallelism still makes overlap reduction bothersome, even with worktrees. How did you work around that?
