pscanf's comments | Hacker News

I only use GitHub (and actions) for personal open-source projects, so I can't really complain because I'm getting everything for free¹. But even for those projects I recently had to (partially) switch actions to a paid solution² because GitHub's runners were randomly getting stuck for no discernible reason.

¹ Glossing over the "what they're getting in return" part.

² https://www.warpbuild.com/


I'm building an app that is, in a way, a modern take on Lotus Notes (https://github.com/superegodev/superego), and I couldn't feel this more:

> It is hard, today, to explain exactly what Lotus Notes was.

Whenever I try to explain what it does to a non-tech person, I'm met with confused looks that make me quickly give up and mumble something like "It's for techies and data nerds". I think to myself "they're not my target audience".

But I actually would like them to be, at some point. In the 90s "the generality and depth of its capabilities meant that it was also just plain hard to use", but now LLMs lower the barrier to entry a lot, so I think there can be a renaissance of such malleable¹ platforms.

Of course, the user still needs to "know what they need" and see software as something that can be configured and shaped to their needs which, with "digital literacy" decreasing, might be a bigger obstacle than I think.

¹ https://www.inkandswitch.com/malleable-software


If you squint, Notion and Coda are childish versions of Lotus Notes.

One noted science fiction author, C.J. Cherryh, notes, “It is perfectly okay to write garbage --- as long as you edit brilliantly.”[1] For a while I've been wondering if this adage applies to vibe-coding, and your methodology seems a reasonable approach to get the benefits of it, to shield against the detriments, and to ensure that a human developer understands the code before committing.

1 - https://www.goodreads.com/quotes/398754-it-is-perfectly-okay...


> your methodology would seem to be a reasonable approach/response to get the benefits of this and to shield against the detriments

If you're referring to the sandboxing / isolation of each app, I agree. Plus, the user can change the app quite easily, so when they spot a bug, they can tell the agent to fix it (and cross their fingers!).

> ensure that a human developer understands the code before committing

Just to clarify: there's no human developer oversight for Superego's apps, though. At least not for the ones users create themselves. Obviously the user will check that the app they just made works, but they might not spot subtle bugs. I employ some strategies to _decrease the likelihood of bugs_ (I wrote a bit about it here: https://pscanf.com/s/351/, if you're interested), but of course only formal verification would ensure there aren't any.


I was referring more to your commentary/explanation about it not being a Vibe-coded app.

Yeah, I can see that one is on their own recognizance when letting an LLM run unsupervised.


Ah yeah, I understand now. And I also agree with the quote then! (Though it does change the nature of the job, and it's not terribly enjoyable...)

Hopefully you can find some way to keep the enjoyment in this.

Thanks for sharing. The demo linked below looked pretty cool, I think this might be a nice complement to Glamorous Toolkit in some of my personal and work flows.

Just watched your demo here:

https://youtu.be/vB3xo2qn_g4?si=y2udkdfezSR9ktUO

Pretty cool!


I like that you can generate new programs from within the system.

That's something I miss with Notion. I basically want a Notion but extensible and malleable like Emacs.


Yes! That's more or less the angle I'm going for. I mean, I don't aim just yet for Emacs-levels of malleability, but at least for something where you can create some useful day to day personal tools.

Is there a story behind the old guy in the logo?

It must be a nod to Freud (i.e. id, ego, and super ego)

Correct. Admittedly, graphic design is not even my passion, so there's probably lots of room for improvement. But at this point I've grown accustomed to the friendly face. :D

Many people seem to associate "ego" with a negative connotation.

The name gives a weird vibe. But, it's free and it's your project so, whatever. ¯\_(ツ)_/¯


Yeah, I agree, though it wants to be slightly provocative as well: it's all about you, your data, your software, your rights.

Ah... Ok, that makes sense.

Very cool project!

Question regarding the pluggable JS engine: I have an Electron app where I'm currently using QuickJS to run LLM-generated code. Would Edge.js be able (theoretically) to use Electron's V8 to get a "sandboxed within Electron" execution environment?


Yes, this should be fully possible.

We actually believe Edge.js will be a great fit for LLM-generated code.


Naively, based on their install.sh script, you'd pick the correct edge.js executable and shell out to that. I'm sure there's some more integral means, but if you want a quick test, that should be easy to set up.

I quite like the GPT models when chatting with them (in fact, they're probably my favorites), but for agentic work I only had bad experiences with them.

They're incredibly slow (via official API or OpenRouter), but most of all they seem not to understand the instructions that I give them. I'm sure I'm _holding them wrong_, in the sense that I'm not tailoring my prompt for them, but most other models don't have problems with the exact same prompt.

Does anybody else have a similar experience?


These little 5.4 ones are relatively low latency and fast, which is what I need for voice applications. But they can't quite follow instructions well enough for my task.

That's really the story of my life. Trying to find a smart model with low latency.

Qwen 3.5 9b is almost smart enough and I assume I can run it on a 5090 with very low latency. Almost. So I am thinking I will fine tune it for my application a little.


I've had quite the opposite experience, but mainly doing agentic coding and little chat.

Codex is an ice man. Every other model will have a thinking output that is meaningful and significant, walking through its assumptions. Codex outputs only a very basic idea of what it's thinking about, and doesn't verbalize the problem or its constraints at all.

Codex also is by far the most sycophantic model. I am a capable coder, have my charms, but every single direction change I suggest, codex is all: "that's a great idea, and we should totally go that [very different] direction", try as I might to get it to act like more of a peer.

Opus I think does a better job of working with me to figure out what to build, and of understanding the problem. But I find it still has a propensity for making somewhat weird suggestions. I can watch it talk itself into some weird ideas, which at least I can stop and alter! But I find it's less reliable at kicking out good technical work.

Codex is plenty fast in ChatGPT+. Speed is not the issue. I'm also used to GLM speeds. Having parallel work open, keeping an eye on multiple terminals is just a fact of life now; work needs to optimize itself (organizationally) for parallel workflows if it wants agentic productivity from us.

I have enormous respect for Codex, and think it has (by a significant measure) the best ability to code. In some ways I think part of the reason it's so good is that it's not trying to convey complex dimensional exploration as an understandable human thought sequence. But I resent how you just have to let it work before you have a chance to talk with it and intervene. Even when discussing, it is extremely terse, and I find I have to ask it again and again to expand.

The one caveat I'll add: I've been dabbling elsewhere, but mainly I use OpenCode, and its prompt is pretty extensive and may be part of why Codex feels like an ice man to me. https://github.com/anomalyco/opencode/blob/dev/packages/open...


“every single direction change I suggest, codex is all: "that's a great idea, and we should totally go that [very different] direction", try as I might to get it to act like more of a peer.”

That’s not the model, that’s a personality setting you can change in the codex config file.

Set it to Pragmatic, and ask it (not command it) about your new direction in planning mode.

It will tell you if your idea is not good for the given project. It’s an excellent peer.


> I've had such the opposite experience

Yeah, I've actually heard many other people swear by the GPTs / Codex. I wonder what factors make one "click" with a model and not with another.

> Codex is an ice man.

That might be because OpenAI hides the actual reasoning traces, showing just a summary (if I understood correctly).


OpenClaw guy (he's Austrian, it's relevant) much prefers Codex over Claude and articulated it as being due to Claude's output feeling very "American" and Codex's output feeling very "German", and I personally really agree with the sentiment.

As an American, Claude feels much more natural to me, with the same overly-optimistic "move fast, break things" ethos that permeates our culture. It takes bigger swings (and misses) at harder-to-quantify concepts than Codex, cuts corners (not intentionally, but it feels like a human who's just moving too fast to see the forest for the trees in the moment), etc. Codex on the other hand feels more grounded, more prone to trying to aggregate blind spots, edge cases, and cover the request more thoroughly than Claude. It's far more pedantic and efficient, almost humorless. The dude also claimed that most of the Codex team is European while Claude team is American, and suggested that as an influence on why this might be.

Anyways, I've found that if I force Claude and Codex to talk to each other, I can get way better results and consistency by using Claude to generate fairly good plans from my detailed requests that it passes to Codex for review and amendment, Claude incorporates the feedback and implements the code, then Codex reviews the commit and patches anything Claude misses. Best of both worlds. YMMV


Oh, interesting perspective. I'm Italian, but from an Alpine valley not far from Austria, so I don't know what I should prefer. :D

But joking aside, putting it like that I'd think I'd prefer the German/Codex way of doing things, yet I'm in camp Claude. But I've always worked better with teammates that balance my fastidiousness, so maybe that's my answer.


Claude Code now hides thinking as well unless you turn on an undocumented setting:

https://github.com/anthropics/claude-code/issues/31326#issue...

https://x.com/nummanali/status/2032451025500528687


Opinions are my own.

For agentic work, both Gemini 3.1 and Opus 4.6 passed the bar for me. I do prefer Opus because my SIs are tuned for that, and I don't want to rewrite them.

But ChatGPT models don't pass the bar. They seem to be trained to be conversational and role-playing. They "act" like agents, but fail to keep the context to really complete the task. It's a bit tiring to always have to double-check their work / results.


I find both Opus 4.6 and GPT-5.4 have weaknesses but tend to support each other. Someone described it to me jokingly as "Claude has ADHD and Codex is autistic." Claude is great at doing something until it gets done and will run for hours on a task without feedback, Codex is often the opposite: it will ask for feedback often and sometimes just stop in the middle of a task saying it's done with step 1 of 5. On the other hand, Codex is a diligent reviewer and will find even subtle bugs that Claude created in its big long-running "until its done" work mode.

Seems like the diagnoses are backwards, in this case. Claude usually stays on task no matter what, but lately Opus 4.6 is showing signs of overuse. I never used to get overload/internal server error messages, but I've seen about a half-dozen of them today alone. And it has been prone to blowing off subtasks that I'd have expected it to resolve.

Yea absolutely. I am using GPT 5.2 / 5.2 Codex with OpenCode and it just doesn't get what I am doing, or loses context. Claude on the other hand (via GitHub Copilot) has no problem, and also discovers the repository on its own in new sessions, while I need to basically spoonfeed GPT. I also agree on the speed. Earlier today I tasked GPT 5.2 Codex with a small refactor of a task in our codebase with reasoning set to high, and it took 20 minutes to move around 20 files.

I don't know any reason to use 5.2, when 5.3 is quite a bit faster.

If using OpenAI models, use the Codex desktop app, it runs circles around OpenCode.

Can you educate me as to what makes Codex app superior using the same GPT model in both? Thx in advance!

Usually it's the prompts, or the model is tuned to the specific first-party tools. Sometimes that gives an edge over the generic tools, unfortunately.

It's the harness.

Same, and I can't put my finger on the "why" either. Plus I keep hitting guard rails for the strangest reasons, like telling codex "Add code signing to this build pipeline, use the pipeline at ~/myotherproject as reference" and codex tells me "You should not copy other people's code signing keys, I can't help you with this"

Are you requesting reasoning via the param? That was a mistake I was making. However, with the highest reasoning level I would frequently encounter cybersecurity violations when using an agent that self-modifies.

I prefer Claude models or open models as well for this reason, except that the Codex subscription gets you pretty hefty token space.


Yes, I think? But I was talking more specifically about using the models via API in agents I develop, not for agentic coding. Though, thinking about it, I also don't click with the GPT models when I use them for coding (using Codex). They just seem "off" compared to Claude.

I am also talking about agents I'm developing. They just happen to be self-modifying but they're not for agentic coding. You have to explicitly send the reasoning effort parameter. If you set effort to None (default for gpt-5.4) you get very low intelligence.

Ah OK sorry, I misinterpreted. But yes, I double checked one case and I am indeed setting the parameter explicitly (defaulting to medium effort). But no luck. It feels like the model ignores what I'm telling it.

For example, I pass it a list of database collections and tools to search through them, ask a question that can very obviously be answered with them, and it responds with "I can’t tell yet from your current records" (just tested with GPT 5.4-mini).

But I've prodded it a bit more now, and maybe the model doesn't want to answer unless it can be very very confident of the answer it produces. So it's sort of a "soft refusal".


I like GPT models in Codex, for a fully vibecoded experience (I don't look at code) for my side-projects. In there, they really get the job done: you plan, they say what they'll do, and it shows up done. It's rare I need to push back and point out bugs. I really can't fault them for this very specific use-case.

For anything else, I can't stand them, and it genuinely feels like I am interacting with different models outside of codex:

- They act like terribly arrogant agents. It's just in the way they talk: self-assured, assertive. They don't say they think something, they say it is so. They don't really propose something, they say they're going to do it because it's right.

- If you counter them, their thinking traces are filled with what is virtually identical to: "I must control myself and speak plainly, this human is out of his fucking mind"

- They are slow. Measurably slow. Sonnet is so much faster. With Sonnet models, I can read every token as it comes, but it takes some focusing. With GPT, I can read the whole trace in real-time without any effort. It genuinely gives off this "dumb machine that can't follow me" vibe.

- Paradoxically, even though they are so full of themselves, they insist upon checking things which are obvious. They will say "The fix is to move this bit of code over there [it isn't]" and then immediately start looking at sort of random files to check...what exactly?

- I feel they make perhaps as many mistakes as Sonnet, but they are much less predictable mistakes. The kind that leaves me baffled. This doesn't have to be bad for code quality: Sonnet makes mistakes which _might_ at points even be _harder_ to catch, so might be easier to let slip by. Yet, it just imprints this feeling of distrust in the model which is counter-productive to make me want to come back to it

I didn't compare either with Gemini because Gemini is a joke that "does", and never says what it is "doing", except when it does so by leaving thinking traces in the middle of python code comments. Love my codebase to have "But wait, ..." in the middle of it. A useless model.

I've recently started saying this:

- Anthropic models feel like someone of that level of intelligence thinking through problems and solving them. Sonnet is not Opus -- it is sonnet-level intelligence, and shows it. It approaches problems from a sensible, reasonably predictable way.

- Gemini models feel like a cover for a bunch of inferior developers all cluelessly throwing shit at the wall and seeing what sticks -- yet, ultimately, they only show the final decision. Almost like you're paying a fraudulent agency that doesn't reveal its methods. The thinking is nonsensical and all over the place, and it does eventually achieve some of its goals, but you can't understand what little it shows other than "Running command X" and "Doing Y".

On a final note: when building agentic applications, I used to prefer GPT (a year ago), but I can't stand it now. Robotic, mechanic, constantly mis-using tools. I reach for Sonnet/Opus if I want competence and adherence to prompt, coupled with an impeccable use of tools. I reach for Gemini (mostly flash models) if I want an acceptable experience at a fraction of the price and latency.


> They act like terribly arrogant agents

Oh I feel that. I sometimes ask ChatGPT for "a review, pull no punches" of something I'm writing, and my god, the answers really get on my nerves! (They do make some useful points sometimes, though.)

> On a final note: when building agentic applications, I used to prefer GPT (a year ago), but I can't stand it now. Robotic, mechanic, constantly mis-using tools. I reach for Sonnet/Opus if I want competence and adherence to prompt, coupled with an impeccable use of tools. I reach for Gemini (mostly flash models) if I want an acceptable experience at a fraction of the price and latency.

Yeah, this has been almost exactly my experience as well.


A bit off topic, but reading your post I suddenly realized that if I read it three years ago I’d assume you’re either insane or joking. The world moved fast looking back.

> cyber security violation

Would you mind expanding on this? Do you mean in the resulting code? Or a security problem on your local machine?

I naively use models via our Copilot subscription for small coding tasks, but haven't gone too deep. So this kind of threat model is new to me.


No, I mean literal API response. They think I'm using it to hack. See related Github issue: https://github.com/anomalyco/opencode/issues/15776

I don't use OpenCode, but it looks like it also triggered a similar refusal. My message was similar but different.


Ahhh okay, I see. Thanks!

I ran 5.4 Pro on some data analytics (admittedly it was 300+ pages). It took forever. Ran the same on Sonnet 4.6, night and day difference. I understand it's like using a V8 engine for a V4 task, but I was curious. These new models look promising though. I'd rather use something like a Haiku most of the time over the best rated. I'm not a rocket scientist or solving the mysteries of the universe. They seem to do a great job 80% of the time.

As a user, I'm of course very excited about v7. As a developer of an app that _integrates_ TypeScript, however, I feel a bit uneasy seeing the API feature still marked as "not ready" on the roadmap.

On the other hand, I can understand leaving it as the last thing to do, after the foundation has set. Also, the TypeScript team has really done an amazing job all these years with backward compatibility, so that is extremely reassuring.

Maybe my uneasiness is just impatience to get to use the sped-up version in my app as well! :)


FWIW I tried it with the VSCode preview in two monorepos we have, one frontend and one backend, with GraphQL servers and complex types: absolutely no issues (except one breaking change in tsconfig with baseUrl being removed) and fast compile times.


Same experience here with a pnpm workspace monorepo. The baseUrl removal was the only real friction — we were using it as a path alias root, had to move everything to subpath imports.

The moduleResolution: node deprecation is the one I'd flag for anyone not paying attention yet. Switching to nodenext forced us to add .js extensions to all relative imports, which was a bigger migration than expected.

Compilation speed improvement is real though. Noticeably faster on incremental builds.
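For reference, the shape of that migration (illustrative fragment, assuming a typical Node-targeting setup; adjust to your own config):

```jsonc
// tsconfig.json (fragment)
{
  "compilerOptions": {
    "module": "nodenext",
    "moduleResolution": "nodenext"
  }
}
```

Under nodenext, relative imports need explicit extensions, and the extension refers to the emitted file: `import { x } from "./util.js"` even though the source file is `util.ts`.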


As a developer that integrates TypeScript into client-side web pages and writes lots of custom plugins, I'm extremely nervous about the port to Go.

I think the last estimate I saw was that the Go port compiled to WASM was going to be about triple the size of the minified JS bundle.


Very cool!

I'm particularly impressed by the bookmark "trick" to install it on a page. Despite having spent 15 years developing for the browser, I had somehow missed that feature of the bookmarks bar. But awesome UX for people to try out the tool. Congrats!


Thanks!

Bookmarklets are such an underrated feature. It's super convenient to inject and test scripts on any page. Seemed like the perfect low-friction entry point for people to try it out.

Spent some time on that UX because the concept is a bit hard to explain. Glad it worked!
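For anyone curious about the mechanism: a bookmarklet is just a bookmark whose URL is a `javascript:` snippet, so "installing" a tool means dragging a link to the bookmarks bar. A sketch of how such a link's href can be built (the script URL is a placeholder, not this project's actual one):

```javascript
// Build a bookmarklet href that injects a <script> into the current page.
function makeBookmarklet(scriptUrl) {
  const body =
    `var s=document.createElement('script');` +
    `s.src=${JSON.stringify(scriptUrl)};` + // safely quote the URL
    `document.body.appendChild(s);`;
  // Wrap in an IIFE so the snippet doesn't leak variables or a return value.
  return "javascript:(function(){" + body + "})();";
}

// e.g. <a href={makeBookmarklet("https://example.com/tool.js")}>Install</a>
```

Clicking the bookmark then runs the snippet in the context of whatever page is open, which is what makes it such a low-friction entry point.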


I was also spammed (twice) by voice.ai.

You mention GDPR, which also "applies" to me, though I wonder if what they're doing is actually illegal. I mean, after all, I'm putting my email on GitHub precisely to give people a way to contact me.

Of course, I do that naïvely, assuming good faith, not expecting _companies_ to use it to spam me. So what they're doing is definitely, at the very least, in poor taste.


> I'm putting my email on GitHub precisely to give people a way to contact me.

They’re not only looking at the public email in your profile, they’re also looking at your committer email (git config user.email). You could argue that you’re not putting that out for people to contact you.

(I’ve used that trick a couple times to reach out to people, too, but never mass emailing.)


Is there any company that will take my money to solve GDPR issues? And by solve I mean sue the spammers? For the last few years I've seen them "try" to look legit by claiming the addresses are managed by some Hungarian/Spanish shell company, hoping no one will be able to afford pursuing infractions across borders.


There's probably a law against it, but I've always thought a law firm could make decent money taking cases like this in bulk for free, on the condition that they get to keep all the compensation, while the "client" still gets the satisfaction of punishing the offending party.


In the U.S., only Attorneys General can go after violators of the CAN-SPAM Act.

It needs to be modified to allow a private right of action, like how individuals can go after telemarketers.


That’s pretty much class action lawsuits!


This is hard, because private right of action in Europe is often very limited, and the damages are low.

The US basically has a "private police force" for certain laws, notably the ADA. Many people are against this; I personally think it's a great idea and something countries should be doing a lot more of.


> Is there any company that will take my money to solve GDPR issues? And by solve I mean sue the spammers?

A lawyer


They spammed me as well.


How do you sync the data out of Garmin? Something like https://github.com/matin/garth, or syncing directly from the watch?


> when dealing with a lot of unknowns it's better to allow divergence and exploration

I completely agree, though I'm personally sitting out all of these protocols/frameworks/libraries. In 6 months time half of them will have been abandoned, and the other half will have morphed into something very different and incompatible.

For the time being, I just build things from scratch, which–as others have noted¹–is actually not that difficult, gives you an understanding of what goes on under the hood, and doesn't tie you to someone else's innovation pace (whether it's higher or lower).

¹ https://fly.io/blog/everyone-write-an-agent/
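The "from scratch" core really is small. A minimal sketch of an agent loop (the message and tool-call shapes are assumptions; `callModel` stands in for whatever LLM API you use):

```javascript
// Minimal agent loop: feed tool results back to the model until it
// stops asking for tools or hits the iteration cap.
function runAgent(callModel, tools, userMessage) {
  const messages = [{ role: "user", content: userMessage }];
  for (let step = 0; step < 10; step++) { // hard cap on iterations
    const reply = callModel(messages);
    messages.push(reply);
    if (!reply.toolCall) return reply.content; // model is done
    // Execute the requested tool and append its result to the transcript.
    const result = tools[reply.toolCall.name](reply.toolCall.args);
    messages.push({ role: "tool", content: String(result) });
  }
  throw new Error("agent did not finish");
}
```

Everything else the frameworks add (retries, streaming, schemas) layers onto this loop, which is why owning it yourself keeps you independent of their churn.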


I recently heard that when automobiles were new, the USA quickly ended up with 80 competing manufacturing brands. Within a couple of decades, the market figured out what customers actually wanted and which styles and features mattered, and the competitive ecosystem consolidated to 5 brands.

The same happened with GPUs in the 90s. When Jensen formed Nvidia there were 70 other companies selling Graphics Cards that you could put in a PCI slot. Now there are 2.


Hey German. Congrats on shipping the project!

Reading the "Why another CRDT / OT library?" I like that you seem to have taken a "Pareto approach": going for a simpler solution, even if not theoretically perfect. In the past few months I've been building a local-first app, and I've been a bit overwhelmed by the complexity of CRDTs.

The goal that I have with my app is to allow syncing between devices via Dropbox / Google Drive / iCloud or any another file-syncing server that the user is already using. I don't want to host a sync server for my users, and I don't want my users to need to self-host either.

Do you think it would be possible to use Dropbox as the sync "transport" for DocNode documents? I'm thinking: since a server is needed, one device could be designated as the server, and the others as clients. (Assuming a trusted environment with no rogue clients.)


Thanks! The answer depends on what you want:

1. Do you care about resolving concurrent conflicts? That is, if two users modify the same document simultaneously (or while one is offline), is it acceptable if one of their changes is lost? If that’s not a problem, then using Dropbox or something similar is totally fine, and you don't need a CRDT or OT like DocNode at all. Technologies like Dropbox aren't designed for conflict resolution and can't be integrated with CRDT or OT protocols.

2. If you do want to resolve conflicts, you have two options. (a) Use a CRDT, which doesn’t require a server. One downside is that clients must be connected simultaneously to synchronize changes. Personally, I don’t think most people want software that behaves like that, and that’s one of the reasons I didn’t focus on building a CRDT. If you’re going to end up needing a server anyway, what’s the point? (b) Use a server, either hosted by you or by your users. The good news is that it’s extremely simple. With DocNode Sync, you can literally set it up with one line of code on the server.

That doesn't apply if you're using a CRDT with "a server as an always-present client". But in that scenario, DocNode will be more efficient.


Thanks for the detailed answer!

I think I do want to resolve conflicts. My use case is a personal database, which simplifies things a bit: sync is between the devices of a single person, so concurrent offline changes are unlikely.

What I have in mind is a setup like the one from this experiment: https://tonsky.me/blog/crdt-filesync/. I don't know if it's at all possible in my use case though, or–in case it is possible–whether it ends up being practical. As you said, the resulting user experience might be so strange that it's not something users want.

Anyway, thanks again for the info and good luck with DocNode. :)


That's an interesting idea.

A "bring your own cloud" provider could be used for DocNode Sync. For example, Dropbox. Dropbox doesn't have the ability to resolve conflicts, but it can be used to deterministically store the order of operations. Then, clients would reconcile and combine the operations into a single state file. Authentication might be a bit tricky, since permissions would have to reside elsewhere. But I think it's doable.

I'll consider it!
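A sketch of that reconciliation step (the file layout and operation shape are assumptions): each device appends operations only to its own log file in the synced folder, so Dropbox never sees concurrent writes to one file, and every client merges all logs into the same deterministic order:

```javascript
// Merge per-device append-only op logs into one deterministic order.
// logs: { deviceId: [{ ts, seq, op }, ...] }
function mergeOpLogs(logs) {
  return Object.entries(logs)
    .flatMap(([device, ops]) => ops.map((o) => ({ device, ...o })))
    .sort(
      (p, q) =>
        p.ts - q.ts ||                      // wall-clock order first
        p.device.localeCompare(q.device) || // deterministic tie-break
        p.seq - q.seq                       // per-device sequence
    );
}
```

Because every client sorts the same way, they all converge on the same operation order without any live coordination; the "server" role reduces to whoever applies the merged log to produce a state snapshot.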

