20 hours is low in this category. The Sony XM6s are 30h, the Bose QCs are ~24h, and Sennheisers can do 40-50h. All of those figures are with ANC on; the numbers are slightly higher with ANC off.
Does that include R&D? Google is an AI _provider_, which is a considerably different spend profile from companies that are consumers. I would expect Google to be investing considerable resources to keep up with Anthropic and OpenAI.
I'd argue not: with tool calls, the model has available to it at all times a description of what each tool can be used for. There's plenty of intermediate but still important information that could be compacted away, and unless there's a logical reason to go looking for it, the model doesn't know what it doesn't know.
If your test can deterministically result in a race condition 100% of the time, is that a race condition? Assuming that we're talking about a unit test here, and not a race condition detector (which is not foolproof).
> Assuming that we're talking about a unit test here
I think the categorisation of tests is sometimes counterproductive and moves the discussion away from what's important: What groups of tests do I need in order to be confident that my code works in the real world?
I want to be confident that my code doesn't have race conditions in it. This isn't easy to do, but it's something I want. If that's the case then your unit test might pass sometimes and fail sometimes, but your CI run should always be red because the race test (however it works) is failing.
This also hints at a limitation of unit tests, and why we shouldn't be over-reliant on them: often unit tests won't show a race. In my experience, it's two independent modules interacting that causes the race. The same can be true of a memory bug caused by a mismatch over who passes ownership and who should be freeing, or any of the other issues caused by interactions between modules.
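As a sketch of the kind of cross-module race being described (all names here are hypothetical), consider two callers that each do a read-modify-write against a shared counter. Modelling each logical increment as a generator, with a yield at the point where a real thread could be preempted, makes the unlucky interleaving reproducible by hand:

```python
class Counter:
    """Hypothetical shared module with a non-atomic read-modify-write."""
    def __init__(self):
        self.value = 0

def increment_steps(counter):
    """One logical increment, split at the point a thread could be preempted."""
    v = counter.value        # step 1: read
    yield                    # preemption point
    counter.value = v + 1    # step 2: write back

counter = Counter()
a = increment_steps(counter)
b = increment_steps(counter)

# Force the unlucky interleaving: both callers read before either writes.
next(a)
next(b)
for task in (a, b):
    try:
        next(task)           # each writes back its stale value + 1
    except StopIteration:
        pass

print(counter.value)         # prints 1, not 2: one increment was lost
```

A unit test that exercises either caller alone would never see this; the failure only exists in the interaction.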
> I think the categorisation of tests is sometimes counterproductive
"Unit test" refers to documentation for software-based systems that has automatic verification. It's used to differentiate that kind of testing from, say, what you wrote in school with a pencil. It is true that the categorization is technically unnecessary here due to the established context, but counterproductive is a stretch. It would be useful in another context, like, say: "We did testing in CS class". "We did unit testing in CS class" would help clarify that you aren't referring to exams.
Yeah, Kent Beck argues that "unit test" carries a bit more nuance: that it is a test that operates in isolation. However, who the hell is purposefully writing tests that are not isolated? In reality, that's a distinction without a difference. It is safe to ignore the old man yelling at clouds.
But a race detector isn't rooted in providing verifiable documentation. It only observes. That is what the parent was trying to separate.
> I want to be confident that my code doesn't have race conditions in it.
Then what you really WANT is something like TLA+. Testing is often much more pragmatic, but pragmatism ultimately means giving up what you want.
> often unit tests won't show a race.
That entirely depends on what behaviour your test is trying to document and validate. A test validating properties unrelated to race conditions often won't consistently show a race, but that isn't its intent, so there would be no expectation of it validating something unrelated. A test that is validating that there isn't a race condition will show the race if there is one.
You can use deterministic simulation testing to reproduce a real-world race condition 100% of the time while under test.
But that's not the kind of test that will expose a race condition 1% of the time. The kinds of tests that are inadvertently finding race conditions 1% of the time are focused on other concerns.
So it is still not a case of a flaky test, but maybe a case of a missing test.
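A minimal sketch of what deterministic simulation testing means here (all names hypothetical): instead of real threads, tasks are generators and a seeded scheduler decides which one runs next. The seed fully determines the interleaving, so any seed that exposes the race reproduces it 100% of the time:

```python
import random

def run_interleaved(tasks, seed):
    """Deterministic scheduler: the seed fully determines the interleaving."""
    rng = random.Random(seed)
    pending = list(tasks)
    while pending:
        task = rng.choice(pending)
        try:
            next(task)
        except StopIteration:
            pending.remove(task)

class Account:
    def __init__(self, balance):
        self.balance = balance

def deposit(account, amount):
    v = account.balance              # read
    yield                            # scheduler may switch tasks here
    account.balance = v + amount     # write back (possibly stale)

def exhibits_lost_update(seed):
    acct = Account(0)
    run_interleaved([deposit(acct, 10), deposit(acct, 10)], seed)
    return acct.balance != 20        # True when an update was lost

# Search for a seed that triggers the race; that seed then replays it forever.
bad = next(s for s in range(1000) if exhibits_lost_update(s))
assert exhibits_lost_update(bad)     # same seed, same interleaving, same failure
```

This is what makes the bug a missing test rather than a flaky one: once a failing seed is found, it goes into the suite and fails deterministically until the race is fixed.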
Because the Tools model allows for finer grained security controls than just bash and pipe. Do you really want Claude doing `find | exec` instead of calling an API that’s designed to prevent damage?
not for every user or use case. when developing of course i run claude --do-whatever-u-want; but in a production system or a shared agent use case, im giving the agent least privilege necessary. being able to spawn POSIX processes is not necessary to analyze OpenTelemetry metric anomalies.
yeah, I would rather it did that. You run Claude in a sandbox that restricts visibility to only the files it should know about in the first place. Currently I use a mix of bwrap and syd for filtering.
Wow, it is really awful. This is such a pointless misstep given that Standard Notes has been around for years, was not vibe coded, and is not an AI app, but this landing page makes me immediately assume it's slop.
Yes - Anthropic _does_ incur business risk if their products are misused and this becomes a scandal. Legally the government may be in the clear to use the product, but that doesn’t mean Anthropic’s business is protected. Moral concerns aside, it’s their prerogative to decide not to take on a customer that may misuse their product in a way that might incur reputational harm.
Or it was their prerogative, until the Trump administration. Now even private companies must bend the knee.
Either Anthropic is seen as the clear leader (it certainly is for coding agents) or this is a political stunt to stamp out any opposition to the administration. Or both.
100%. A lot of these AI-anxiety-driven odes to the loss of craft have me wondering whether anyone cares about the value being provided to the user (or the business), which is the part that is actually your job.
Elegant, well-written and technically sound projects will continue to exist, but I’ve seen too many “well crafted” implementations of such technically vexing features as “fetching data and returning it” that were so overengineered that it should have been considered theft of company money.
"I’ve seen too many «well crafted» implementations of such technically vexing features as «fetching data and returning it» that were so overengineered that it should have been considered theft of company money."
This judgement has merit. However, over the years I have come to see that over-engineering tendency as a manifestation of exploratory spirit in one's craft. This is how Unix came to be created at Bell Labs. To their managers, Ken Thompson and Dennis Ritchie worked on programs like the "ed" editor, so they did care about "value being provided to the user (or the business)". What was later officially named Unix was not pitched as an operating system; it was framed mostly as a needed way to organize the growing set of utilities, mentioned among other things almost as a footnote. The over-engineered bits of a given project (and the experience gained building them) may become useful for something else. People tend to do this kind of thing. But should they be blamed for it, considering that employers themselves dangle the enticing promise of growth and of developing new technologies as part of the recruitment game?
I’m saying it’s silly hyperbole to make the leap to implying that only people in other countries have easy access to information.
These absurd claims always turn into a game of motte and bailey when they’re called out, with retreats to safer claims. I’m talking about the original claim, that “people in other countries” have easy access to this information which we, in the US, see everywhere all the time right now (except TikTok apparently).