nullbyte · 2026-04-23T18:09:09 1776967749

82.7% on Terminal Bench is crazy

toephu2 · 2026-04-23T19:33:49 1776972829

Is it? There are 5 other models near 80%, and it was achieved in March... which in AI-world seems like a century ago.

https://www.tbench.ai/leaderboard/terminal-bench/2.0

ejpir · 2026-04-23T20:58:49 1776977929

Those are not verified. I've tried forgecode, and I cannot believe they didn't do something to influence the benchmarks.

GodelNumbering · 2026-04-23T21:07:41 1776978461

Yup, they were found to be sneaking the answer key using agents.md

https://debugml.github.io/cheating-agents/#sneaking-the-answ...

nullbyte · 2026-04-21T16:41:17 1776789677

I guess it depends if you are working on something important to national security. Especially corporate codebases, etc.

nullbyte · 2026-04-21T16:35:01 1776789301

Does that include OpenCode? That's what I care about most and it's the primary reason I've been sticking with OAI the past few months.

nullbyte · 2026-04-20T19:37:23 1776713843

Anthropic doesn't, but Google and OAI both release open source models. Just not 1T parameter ones.

osiris970 · 2026-04-20T19:41:33 1776714093

Exactly, they release cool consumer stuff, but they aren't releasing anything close to the performance of the best open-weight Chinese models. They basically compete in the "fun running at home doing basic stuff" scene. (Except gpt-oss-120b by OpenAI, but it's been ages since then.)

Zetaphor · 2026-04-22T02:45:46 1776825946

That sentence is giving OpenAI way more credit than they are due.

They released a single open model after being goaded by the community, because everyone except "Open"AI was multiple generations into open releases.

We haven't heard a word since. I wouldn't be surprised if it takes them another 6 years to release their next one.

nullbyte · 2026-04-02T22:11:44 1775167904

Last paragraph made me chuckle

nullbyte · 2026-03-31T03:46:21 1774928781

npm security team has removed the offending package: https://github.com/axios/axios/issues/10604#issuecomment-415...

New installs should be safe now.

nullbyte · 2026-03-24T17:38:02 1774373882

What a brilliant idea! Is this all done locally? That's incredible.

apwheele · 2026-03-24T17:44:08 1774374248

While the vector store is local, it is sending the data to Gemini's API for embedding. (Which, if you're using a paid API key, is probably fine for most use cases: no long-term retention, training, etc.)

jakejmnz · 2026-03-26T14:54:21 1774536861

Works completely locally with a decent model: https://github.com/jakejimenez/sentinelsearch

jakejmnz · 2026-03-26T14:54:00 1774536840

Made a proof of concept; it honestly worked fairly well: https://github.com/jakejimenez/sentinelsearch

nullbyte · 2026-03-24T17:18:59 1774372739

I am curious how the TPS compares vs default OS virtual memory paging

nullbyte · 2026-03-06T23:26:17 1772839577

I always enjoy reading Anthropic's blog posts; they often have great articles.

nullbyte · 2026-03-03T18:35:03 1772562903

They did something to Settings after macOS Monterey that made it very slow. I miss the snappiness of the old app!

dilap · 2026-03-03T19:00:00 1772564400

I don't know for a fact, but I'd bet a few digits of cold hard cash that a SwiftUI rewrite is to blame. (Anyone in the know want to chime in?)

And yeah, it's terrible. Apple doesn't make good apps anymore.

(This is part of why I think Electron does so well -- it's not as good as a really good native app [e.g. Sublime Text], but it's way better than the sort of default whatever you'll get doing native. You get a lot of niceness that's built into the web stack.)

poszlem · 2026-03-03T19:00:07 1772564407

Well, perhaps it has something to do with the fact that they started using webviews for stuff like system UI: https://blog.jim-nielsen.com/2022/inspecting-web-views-in-ma...