
sophiabits · 2026-06-11T15:28:16 1781191696

> You can't fire Claude if it fucks up

What's the difference between "firing" Claude vs moving to a model from a different provider? The latter seems very analogous to firing an employee for performance and backfilling with someone new.

Re the rest, it's just not my experience that models become incapable of making good decisions in cases where input token count > the context window, but ymmv based on domain.

A very extreme example of this: a couple of years ago, when GPT-4 was state of the art and the 32K context variant was gated to design partners, I worked at an EdTech company in the college admissions space that wanted to produce quarterly reports on student progress for parents. That involved crunching a LOT of data (multiple hours of meeting transcripts per week, very detailed notes about student activities, and each student's general profile, since UK and US admissions function very differently!)

It was a difficult problem, but we _did_ manage to produce these reports 4K output tokens at a time at a level of quality that exceeded what humans could do internally, and models+the surrounding tooling have only gotten better since then.

logicchains · 2026-06-11T16:41:01 1781196061

>What's the difference between "firing" Claude vs moving to a model from a different provider? The latter seems very analogous to firing an employee for performance and backfilling with someone new.

A human may learn and improve to avoid being fired, while Claude is incapable of that.

>Re the rest, it's just not my experience that models become incapable of making good decisions in cases where input token count > the context window, but ymmv based on domain.

If they've been trained a lot on your domain (maths, coding) then they can make good decisions. But I've just started using Mythos and even it makes some awful decisions in domains it's not trained on. Of course the majority of decisions are good, but it only takes a couple bad ones to sink a project.

sophiabits · 2026-06-03T00:59:15 1780448355

TIL you can get RSS feeds for YouTube channels. Thanks for this!

phyzix5761 · 2026-06-03T01:11:43 1780449103

You're welcome. You can also turn any subreddit into a RSS feed: https://arkvis.com/blog/2026-04-16_interesting-finds.html
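For anyone who wants to script this, here is a minimal sketch of building the feed URLs. The URL patterns are the ones I understand YouTube and Reddit to expose today, so treat them as assumptions and check them in a feed reader first; the function names are made up for illustration.

```python
from urllib.parse import urlencode


def youtube_feed_url(channel_id: str) -> str:
    """Build the Atom feed URL for a YouTube channel (assumed URL pattern)."""
    return "https://www.youtube.com/feeds/videos.xml?" + urlencode(
        {"channel_id": channel_id}
    )


def subreddit_feed_url(name: str) -> str:
    """Build the RSS feed URL for a subreddit (assumed URL pattern)."""
    return f"https://www.reddit.com/r/{name}/.rss"
```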

sophiabits · 2026-05-15T00:44:17 1778805857

> Selfishly, I don't want my screen scraped because it feels like an invasion of my privacy, [...] But zooming out, I don't want to live in a world where humans—employees or otherwise—are exploited for their training data.

I wonder whether this Meta employee felt the same way about privacy a month ago, before they were personally impacted by this new initiative.

sophiabits · 2026-05-12T01:59:09 1778551149

I do not envy the position the npm team are in. They removed the ability to unpublish packages as a response to the left-pad incident[1] because it wasn't desirable for individual developers to break downstream dependencies by pulling their package maliciously.

Of course the side effect is that now it's much harder to pull packages for legitimate reasons :/

[1] https://en.wikipedia.org/wiki/Npm_left-pad_incident

superfrank · 2026-05-12T04:34:42 1778560482

Maybe the next step is to give publishers a way to quarantine versions, with a warning that stops the install but that users can override if they choose to?

Give a publisher a way to tag a version as malicious and then in those hours between the exploit being noticed and the package being removed anyone who tries to install gets a message about that version being quarantined and asking whether they want to proceed.

It's not a perfect solution, but I think it's better than just waiting for NPM to take action without opening the door up to another left pad situation.

thayne · 2026-05-12T04:41:42 1778560902

I think cargo's yank is a good balance. It makes it difficult to pull the yanked version in as a dependency, but doesn't break existing usages, as long as the version is in the lockfile. And I think even then it gives you a warning that you are using a yanked package.
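Roughly, the behaviour described above could be modelled like this. It is a toy sketch of yank semantics, not cargo's actual resolver: the `resolve` function, the tuple version format, and the warn flag are all assumptions for illustration.

```python
def resolve(available, yanked, locked=None):
    """Pick a version, returning (version, should_warn).

    A version pinned in the lockfile is honoured even if it has been yanked
    (with a warning); fresh resolution skips yanked versions entirely.
    """
    if locked is not None and locked in available:
        return locked, locked in yanked
    candidates = [v for v in available if v not in yanked]
    if not candidates:
        raise LookupError("no non-yanked version available")
    return max(candidates), False


versions = [(1, 0, 0), (1, 0, 1)]
yanked = {(1, 0, 1)}

fresh = resolve(versions, yanked)                     # no lockfile
pinned = resolve(versions, yanked, locked=(1, 0, 1))  # lockfile pins the yanked version
```

Here `fresh` falls back to `(1, 0, 0)`, while `pinned` keeps `(1, 0, 1)` and flags it for a warning, so existing builds keep working.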

zarzavat · 2026-05-12T03:00:01 1778554801

The obvious solution is that unpublish should be available within a time window after a new version is published and then unavailable after that.

beart · 2026-05-12T03:07:00 1778555220

There is a time window - https://docs.npmjs.com/policies/unpublish

zarzavat · 2026-05-12T03:15:44 1778555744

Yes but they didn't do it properly. They only allow unpublishing if there are no dependants, which means it can't be used to pull a package version for security reasons.

It should be that within the first X hours you can pull a version regardless of dependants, after that you should need approval.
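As a minimal sketch of that policy (the 72-hour window is an assumption, since X is left unspecified above, and `unpublish_decision` is a hypothetical name, not anything npm exposes):

```python
from datetime import datetime, timedelta, timezone

# Assumption: the comment leaves "X hours" open; 72 is only a placeholder.
UNPUBLISH_WINDOW = timedelta(hours=72)


def unpublish_decision(published_at, now, window=UNPUBLISH_WINDOW):
    """Allow self-service unpublish inside the window, regardless of dependants.

    Outside the window the request is routed to manual approval instead.
    """
    if now - published_at <= window:
        return "allow"
    return "needs_approval"


published = datetime(2026, 5, 12, 0, 0, tzinfo=timezone.utc)
early = unpublish_decision(published, published + timedelta(hours=1))
late = unpublish_decision(published, published + timedelta(hours=100))
```

Note that the dependant count deliberately does not appear in the check; that is the difference from the current policy.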

antihero · 2026-05-12T12:15:12 1778588112

I would prefer my builds to break than the ecosystem to be compromised.

That said, once unpublished the version should be permanently unavailable to prevent publishing over known good versions.

KajMagnus · 2026-05-13T04:47:09 1778647629

If a package developer maliciously breaks everyone's builds,

isn't that pretty great?

Because now you have learnt that you can't trust them

ummonk · 2026-05-12T02:34:43 1778553283

I mean they brought that incident on themselves...

shimman · 2026-05-12T16:37:50 1778603870

Yeah, all the left-pad incident showed was that NPM cares more about their corporate users than open source developers.

sophiabits · 2026-05-03T21:48:49 1777844929

A browser is OOM more expensive to run than a terminal app, regardless of what you're running inside said browser

vineyardmike · 2026-05-04T04:18:25 1777868305

I've literally never met anyone in real life who used a computer that didn't already have a browser running 24/7

eVeechu7 · 2026-05-03T23:19:18 1777850358

Is that because they are much more likely to pay the ultimate price at the hands of the OOM killer?

sophiabits · 2026-04-15T16:28:54 1776270534

Purchasing power is probably a better metric in a vacuum, but it's hard to analyze accurately

For example, the comment you're citing is claiming that because minimum wage has increased only 3x over the same period of time in which inflation has eroded the relative value of a dollar by 6x, that wages overall have increased at half the rate of inflation. But minimum wage is a measurement of a minimum, while inflation is a measurement of /average/ price increase so they can't be compared 1:1 in this way.
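A toy illustration with made-up numbers (not real wage data) shows why the two figures can diverge: the minimum of a distribution can grow 3x while its mean grows 6x.

```python
from statistics import mean

# Hypothetical hourly wages for four workers, purely illustrative.
wages_then = [1.0, 2.0, 3.0, 4.0]
wages_now = [3.0, 12.0, 20.0, 25.0]

# Growth of the lowest wage versus growth of the average wage.
min_growth = min(wages_now) / min(wages_then)
mean_growth = mean(wages_now) / mean(wages_then)

print(f"minimum grew {min_growth:.1f}x, mean grew {mean_growth:.1f}x")
```

In this contrived case the average wage keeps pace with a 6x rise in prices even though the minimum only tripled, which is why the minimum alone says little about wages overall.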

The housing argument also seems odd. In New Zealand (where I'm from -- I'm not familiar with the US' housing market, so the commenter could be right about that geo!) house prices have increased by far more than 20x since the 70s, but the houses available are of substantially higher quality due to improved regulations (e.g. all newer homes are subject to healthy homes rules which mandate insulation) so just comparing inflation-adjusted home prices vs income doesn't tell the full story

(Aside from that, a whole heap of items like food, electronics, transportation are all both far cheaper AND higher quality today than in the 70s)

hnlmorg · 2026-04-15T17:02:27 1776272547

“Higher quality” isn’t an objective measurement though. And it certainly doesn’t matter if the end result is that people cannot afford to buy it.

What I’d be interested to understand is whether changes to materials (be that buildings or home appliances) have caused an increase in the cost to manufacture.

I’d wager most things have gotten cheaper to produce these days because the same improvements in technology that can be integrated into the product also apply to the technology used to reduce the cost to manufacture. Plus if wages are below inflation then any labour costs would have declined (relatively speaking) in that time too.

queenkjuul · 2026-04-15T17:25:00 1776273900

Modern US houses are made of the cheapest, shittiest, flimsiest materials money can buy. I go out of my way not to live in US housing less than 50 years old.

sophiabits · 2026-03-01T19:27:56 1772393276

> the MCP server is automatically launched when the Agent loads that skill

The main problem with this approach at the moment is it busts your prompt cache, because LLMs expect all tool definitions to be defined at the beginning of the context window. Input tokens are the main driver of inference costs and a lot of use cases aren't economical without prompt caching.

Hopefully in future LLMs are trained so you can add tool definitions anywhere in the context window. Lots of use cases benefit from this, e.g. in ecommerce there's really no point providing a "clear cart" tool to the LLM upfront, it'd be nice if you could dynamically provide it after item(s) are first added.
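To make the cache-busting point concrete, here is a toy model of prefix caching. It assumes the cache only covers the longest shared prefix between consecutive requests and that tool definitions are serialized ahead of the conversation; real providers are more nuanced, and every name below is hypothetical.

```python
def cached_prefix_len(prev, new):
    """Count the leading blocks that two requests have in common."""
    n = 0
    for a, b in zip(prev, new):
        if a != b:
            break
        n += 1
    return n


tools = ["tool:search", "tool:add_to_cart"]
history = ["system", "user:hi", "assistant:hello", "user:add socks"]
prev_request = tools + history

# Appending a message keeps the whole previous request as a reusable prefix.
appended = prev_request + ["user:thanks"]

# Adding a tool mid-conversation shifts everything after the tool list.
with_new_tool = tools + ["tool:clear_cart"] + history + ["user:thanks"]

reused_after_append = cached_prefix_len(prev_request, appended)
reused_after_new_tool = cached_prefix_len(prev_request, with_new_tool)
```

In this model appending a message reuses all 6 previous blocks, while introducing the "clear cart" tool reuses only the 2 original tool definitions, so the whole conversation history gets billed as fresh input.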

goranmoomin · 2026-03-01T19:34:28 1772393668

> The main problem with this approach at the moment is it busts your prompt cache, because LLMs expect all tool definitions to be defined at the beginning of the context window.

TBH I'm not really sure how it works in Amp (I never actually inspected how it alters the prompts that are sent to Anthropic), but does it really matter for the LLMs to have the tool definitions at the beginning of the context window in contrast to the bottom before my next new prompt?

I mean, skills also work the same way, right? (it gets appended at the bottom, when the LLM triggers the skill) Why not MCP tooling definitions? (They're basically the same thing, no?)

sophiabits · 2025-09-30T05:57:13 1759211833

Comments are often the best tool for explaining why a bit of code is formulated how it is, or explaining why a more obvious alternate implementation is a dead end.

An example of this: assume you live in a world where the formula for the circumference of a circle has not been derived. You end up deriving the formula yourself and write a function which returns 2 * pi * radius. This is as simple as it gets, not hacky at all, and you would /definitely/ want to include a comment explaining how you arrived at your weird and arbitrary-looking "3.1415" constant.
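Something like the following, where the comment does the explaining. The derivation story in the comment is invented for the sake of the hypothetical.

```python
def circumference(radius: float) -> float:
    """Return the circumference of a circle with the given radius."""
    # Why 3.1415? (Hypothetical derivation for this example.) Measuring many
    # circles showed that circumference / diameter is always the same value,
    # roughly 3.1415, no matter how large the circle is. The diameter is
    # 2 * radius, hence the formula below.
    return 2 * 3.1415 * radius
```

Without the comment, a reviewer has no way to tell whether 3.1415 is a measured constant, a tuning parameter, or a typo.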

sophiabits · on Aug 6, 2024

I’ve especially noticed this with gpt-4o-mini [1], and it’s a big problem. My particular use case involves keeping a running summary of a conversation between a user and the LLM, and 4o-mini has a really bad tendency of inventing details in order to hit the desired summary word limit. I didn’t see this with 4o or earlier models

Fwiw my subjective experience has been that non-technical stakeholders tend to be more impressed with / agreeable to longer AI outputs, regardless of underlying quality. I have lost count of the number of times I’ve been asked to make outputs longer. Maybe this is just OpenAI responding to what users want?

[1] https://sophiabits.com/blog/new-llms-arent-always-better#exa...

atlex2 · on Aug 14, 2024

Did you try giving the model an "out"?

> You may output only up to 500 words, if the best summary is less than 500 words, that's totally fine. If details are unclear, do not fill-in gaps, do leave them out of the summary instead.

sophiabits · on Aug 5, 2024

I wanted to document a particular genAI antipattern which I've seen a few times now.

LLMs are theoretically pretty fungible, because you send English and get English back--but in practice you still need to do some amount of technical due diligence before swapping models. These things are benchmarked on tasks which rarely resemble your specific use case. Blindly swap models at your own risk!

Something that has become very clear since the advent of GPT-3.5 is that LLMs are far from magic, and using them does not remove the need for good engineering fundamentals. It's important to have a solid eval suite so you can quickly benchmark your system against different LLMs, because the APIs we're all building on are constant moving targets.
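A minimal sketch of what such an eval suite can look like. The two "models" are stubs standing in for real API clients, and the cases and names are hypothetical; a real suite would have many more cases and richer checks.

```python
from typing import Callable

# An eval case pairs a prompt with a pass/fail check on the model's output.
EvalCase = tuple[str, Callable[[str], bool]]


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose check passes for this model."""
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


cases: list[EvalCase] = [
    ("Summarise in at most 5 words: the cat sat", lambda out: len(out.split()) <= 5),
    ("Summarise in at most 5 words: the cat sat", lambda out: "cat" in out.lower()),
]


# Stub models: one stays concise, one pads its output with invented detail.
def concise_model(prompt: str) -> str:
    return "Cat sat."


def padding_model(prompt: str) -> str:
    return "The cat sat on a mat that was never mentioned"


scores = {
    "concise": run_evals(concise_model, cases),
    "padding": run_evals(padding_model, cases),
}
```

With the same cases run against every candidate, a model swap becomes a score comparison (1.0 versus 0.5 here) rather than a leap of faith.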