btown's comments | Hacker News

Something I've recently come to appreciate: Claude, with the context of your codebase and your ORM models and how they connect to your frontend, given read-only access to production databases (perhaps proxied to anonymize client data), and with the ability to drive production sites via Chrome MCP, can be a monster at answering operational questions.

Say you need to present a new statistic to a prospective partner, or an enterprise client has an operational issue that needs to be escalated. Sales/account management pings people, and pretty soon there's a web of connections that range between email, ticketing systems, Slack, and Claude Code sessions. Someone being brought in needs to be brought up to speed on that entire web. It's a highly focused conversation with human and AI participants, one that (because human counterparties need to weigh in) by definition must happen in parallel with other work.

So many companies would benefit from a Hub that speaks agentic workflows, and streams progress token by token, fluently.

Could Anthropic excel at building a backend for this? Absolutely.

Could they excel at building a frontend that takes the world by storm the way Slack did, with its radical simplicity? Unfortunately I'm not as confident here. Consider that their VS Code plugin lags their terminal TUI so massively that it still is impossible to rename sessions [0], much less use things like remote-control functionality.

Show me that they can treat native-feeling multi-platform UI with as much care as they do their agentic loops, and I'll show you a company that could change every business forever.

[0] https://github.com/anthropics/claude-code/issues/24472


I'm discovering new possibilities all the time with how Claude can work on a new type of task in our codebase and business more broadly. While a lot of this can be brought to the team by saying "encapsulate what you just did into a skill," sometimes it's as much about knowing what kinds of prompts to use to guide it as well.

Showing a colleague that flow, and the sequence of not just prompts but the types of Claude outputs to expect, all leading to Claude doing something that would have taken us a half day of work? As a linear video, rather than just a dump or screenshot of a final page? That could help to diffuse best practices rapidly.

OP - you might want to look at the kind of data model Loom used for this problem for videos in general, in terms of workspaces and permissions. Could make a startup out of this!

(Also as a smaller note - you might want to skip over long runs and generations by default, rather than forcing someone into 5x mode! A user of this would want to see messages, to and from Claude, at a standardized rate - not necessarily a sped up version of clock time.)


That’s a really interesting way to frame it — showing the flow of prompts and responses rather than just the final result.

I’ve mostly been using it for demos and sharing sessions with teammates, but the training / best-practices angle is a great point.

On navigation: you can already step through turns with the arrow keys or jump around the timeline, so you don’t have to sit through long generations. But I agree that smarter defaults (skipping or collapsing long runs) could make it smoother.

And the Loom comparison is interesting — I hadn’t thought about the workspace/permission side yet since this started as a small CLI tool for sharing sessions, but that’s a good direction to think about.


> Showing a colleague that flow, and the sequence of not just prompts but the types of Claude outputs to expect, all leading to Claude doing something that would have taken us a half day of work? As a linear video, rather than just a dump or screenshot of a final page? That could help to diffuse best practices rapidly.

Would this not be visible in a text dump without taking half a day to watch? What's the benefit of the realtime experience here, and who's the beneficiary?

Granted, I have friends who don't read but prefer visual stimulation. I don't think the overlap with people comfortable with code is very large at all.


In all seriousness, I’m unsure that official job numbers (even if they weren’t intentionally distorted, which is a big if these days) have caught up with the gig/creator economy. If a person making ends meet with food delivery and a few dollars of ad revenue is classified as “self-employed,” is that the same level of stability and ability to keep up with cost-of-living increases (which may outpace traditional inflation) vs. self-employed freelancers with clients? Which isn’t to cast shade on those paths, but it’s meaningful to the metrics we choose to follow.

Yes, they have. The BLS actually tracks a number of different "unemployment" numbers, whose definitions you can see here [0].

The "official" unemployment number, the one now reported as 4.4%, basically only counts people who are actively looking for work and can't find it, as a percent of the civilian labor force.

The number you are trying to capture is what the BLS calls "U-6". That number is defined as:

> total unemployed, plus all marginally attached workers, plus total employed part time for economic reasons, as a percent of the civilian labor force plus all marginally attached workers.

In other words, anyone that would like more work but can't get it. I encourage you to read the entire definition and footnotes at the link I shared. It's very interesting!

Right now U-6 is at 8%. During the 2007 recession it peaked at about 17%. [1]
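For concreteness, the quoted definition boils down to a single ratio. A toy calculation with made-up round numbers (not actual BLS figures; see the definitions page for the real series):

```python
# Hypothetical illustration of the U-6 formula. All inputs in millions
# of people; the numbers below are invented for the example.
def u6_rate(unemployed, marginally_attached, part_time_economic, labor_force):
    """U-6: (total unemployed + marginally attached + involuntary
    part-time) as a percent of (civilian labor force + marginally
    attached), per the BLS definition quoted above."""
    numerator = unemployed + marginally_attached + part_time_economic
    denominator = labor_force + marginally_attached
    return 100.0 * numerator / denominator

print(round(u6_rate(7.0, 1.5, 4.5, 168.0), 1))  # → 7.7
```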

[0]: https://www.bls.gov/lau/stalt.htm

[1]: https://fred.stlouisfed.org/series/U6RATE


Thanks for bringing this up, and you're right that this is closer. I still think it's imperfect, because a gig economy worker who works 35+ hours per week would be considered "employed full time" (footnotes, https://www.bls.gov/cps/cpsaat36.htm) and as far as I know would not be included in the U-6.

I don't have a more recent statistic, but in 2018 half of Uber rides were provided by drivers working 35+ hours per week: https://www.epi.org/publication/uber-and-the-labor-market-ub...

So while I was perhaps too harsh on the work of the BLS, I do think that newer metrics are warranted.


> The second time the same (or similar) input is used these states are already created and it is linear.

Does this imply that the DFA for a regex, as an internal cache, is mutable and persisted between inputs? Could this lead to subtle denial-of-service attacks, where inputs are chosen by an attacker to steadily increase the cached complexity - are there eviction techniques to guard against this? And how might this work in a multi-threaded environment?


Yes, most (I think all) lazy DFA engines have a mutable DFA behind a lock internally that grows during matching.

Multithreading is generally a non-issue: you just wrap the function that creates the state behind a lock/mutex, which is usually the default.

The subtle denial-of-service part is interesting; I hadn't thought of it before. Yes, this is possible. For security-critical uses I would compile the full DFA ahead of time: the memory cost may be painful, but this completely removes the chance of anything going wrong.

There are valid arguments to switch from a DFA to an NFA with large state spaces, but RE# intentionally does not switch to an NFA and instead capitalizes on reducing the DFA memory costs (e.g. minterm compression in the post, algebraic simplifications in the paper).

The problem with going from a DFA to an NFA for large state spaces is that match-time performance falls off a cliff, something like going from 1 GB/s to 1 KB/s, as we also show in the benchmarks in the paper.

As for eviction techniques, I have not researched this; the simplest thing is to completely reset the instance and rebuild past a certain size, but there is likely a better way.


> Multithreading is generally a non-issue: you just wrap the function that creates the state behind a lock/mutex, which is usually the default.

But you also have to lock when reading the state, not just when writing/creating it. Wouldn’t that cause lock contention with sufficiently concurrent use?


No, we do not lock when reading the state; we only lock the creation side, and the transition-table reference stays valid during matching even if it is outdated.

Only when a nonexistent state is encountered during matching does it enter the locked region.


Ah, I see, so it’s basically the Racy Single-Check Idiom.
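A minimal Python sketch of this arrangement, purely as an illustration (hypothetical code, not how RE# or any production engine is implemented): matching reads the transition table without a lock, a missed transition is built under the lock with a re-check (the racy single-check), and a crude size-capped reset stands in for eviction. CPython's GIL makes the unlocked dict reads safe here; a real engine would use atomics or an epoch scheme instead.

```python
import threading

class LazyDFA:
    """Toy lazy DFA matching a single literal pattern."""

    def __init__(self, pattern, max_transitions=1 << 16):
        self.pattern = pattern
        self.transitions = {}            # (state, char) -> next state
        self.lock = threading.Lock()     # guards creation only
        self.max_transitions = max_transitions

    def _compute_next(self, state, ch):
        # Classic KMP-style transition: length of the longest prefix of
        # the pattern that is a suffix of the matched text plus ch.
        s = self.pattern[:state] + ch
        for k in range(min(len(self.pattern), len(s)), 0, -1):
            if s.endswith(self.pattern[:k]):
                return k
        return 0

    def _create_transition(self, state, ch):
        with self.lock:
            key = (state, ch)
            if key in self.transitions:       # racy single-check:
                return self.transitions[key]  # re-test under the lock
            if len(self.transitions) >= self.max_transitions:
                self.transitions = {}         # crude eviction: full reset
            nxt = self._compute_next(state, ch)
            self.transitions[key] = nxt
            return nxt

    def search(self, text):
        state = 0
        for ch in text:
            nxt = self.transitions.get((state, ch))  # lock-free read
            if nxt is None:                          # miss: build lazily
                nxt = self._create_transition(state, ch)
            state = nxt
            if state == len(self.pattern):
                return True
        return False
```

A second pass over the same (or similar) input finds all its transitions already cached and never takes the lock, which is exactly the "second time it is linear" behavior quoted upthread.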

> Readers are simply more willing to tolerate a lightspeed jump from belief X to belief Y if the writer himself (a) seems taken aback by it and (b) acts as if they had no say in the matter - as though the situation simply unfolded that way.

This reminds me of p-hacking in academia: https://pmc.ncbi.nlm.nih.gov/articles/PMC4359000/ is a decent overview.

And, to a certain extent, the manipulation of "league tables" in finance: https://mergersandinquisitions.com/investment-banking-league... / https://www.wsj.com/articles/SB117616199089164489

All these allow a presenter to frame a discovery or result as "surprising" and "novel" - even if, from the very start, the rhetorical goal was to take a pre-ordained desire to publish along certain lines, and tweak things to present it as if it was a happenstance discovery, washing the presenter's hands of that intentionality.

One of the things I worry about, especially as education shifts more and more towards AI, is that we lose the critical thinking skill of: "here are a set of facts that are true, but there can still be bias in the process by which those facts are selected, thus one must look beyond the facts presented."

And in theory, AI could help us to do this with every fact we consume! But it's steered (quite intentionally) towards giving simple answers, even when reality isn't simple, and the underlying goal of those presenting the facts that entered one's corpus is as important as those facts' existence.


This is also just the direction that AI is taking us, even for people who wouldn't describe themselves as traditional developers.

Setting aside on-device LLMs, one needs RAM and disk space just for the multiple isolated VMs (Claude Cowork and the like) that will increasingly become part of people's everyday lives.

And when it's easier than ever to create an Electron app, everything's going to have an Electron app, with all the RAM/disk overhead that entails. And of course, nobody's asking their agents "optimize the resource usage of the app I made last week" - they're moving on to the next feature or project.

I suppose the demoscene will always be there, for those of us who increasingly need a refuge from ram-flation.


Some press coverage (though I highly recommend just reading the paper linked as the OP, it’s quite approachable to skim without prior knowledge, and you get to see how they turn the Star Trek replicator problem into “just” a loss optimization problem with projectors and spinning mirrors!):

https://aminsightasia.com/education/tsinghua-dish-3d-printin...

And as others have noted, it's worth bearing in mind that most images here are less than a centimeter in scale; the scale bar is a millimeter. Super impressive stuff.


Minor correction: confusingly, the scale bars actually vary not just from figure to figure but from image to image within a single figure, as noted in the captions. It's a rather odd choice IMO.

There's also a reasonable alignment between Tailwind's original goal (if not an explicit one) of minimizing characters typed, and a goal held by subscription-model coding agents to minimize the number of generated tokens to reach a working solution.

But as much as this makes sense, I miss the days of meaningful class names and standalone (S)CSS. Done well, with BEM and the like, it creates a semantically meaningful "plugin infrastructure" on the frontend, where you write simple CSS scripts to play with tweaks, and those overrides can eventually become code, without needing to target "the second x within the third y of the z."

Not to mention that components become more easily scriptable as well. A component running on a production website becomes hackable in the same vein of why this is called Hacker News. And in trying to minimize tokens on greenfield code generation, we've lost that hackability, in a real way.

I'd recommend: tell your AGENTS.md to include meaningful classnames, even if not relevant to styling, in generated code. If you have a configurability system that lets you plug in CSS overrides or custom scripts, make the data from those configurations searchable by the LLM as well. Now you have all the tools you need to make your site deeply customizable, particularly when delivering private-labeled solutions to partners. It's far easier to build this in early, when you have context on the business meaning of every div, rather than later on. Somewhere, a GPU may sigh at generating a few extra tokens, but it's worthwhile.


Art and engineering are both constrained optimization problems - at their core, both involve transforming a loosely defined aesthetic desire into a repeatable methodology!

And if we can call ourselves software engineers, where our day-to-day (mostly) involves less calculus and more creative interpretation of loose ideas, in the context of a corpus of historical texts that we literally call "libraries" - are we not artists and art historians?

We're far closer to Jimi than Roger, in many ways. Pots and kettles :)


We should not call ourselves engineers - it's a massive insult to actual professional engineers.

Speak for yourself, some of us value and incorporate both science and methodology into our craft, and adhere to a system of ethics.

That's great, but it doesn't make you or any of us engineers.

Just because I drive my car with immense focus, make precision shifts, and hit the apex of all of my turns when getting onto and off of the freeway doesn't make me a race car driver.

Engineers don't just feel good vibes about science and mix it into their work. It is the core of their work.

Simply having a methodology absolutely is not sufficient for being an engineer.

And great, you have an arbitrary system of ethics, like everyone does I imagine. But no one holds you to these ethics.


Care to share your operational definition of the word "engineer"?


The saving grace of Claude Code skills is that when writing them yourself, you can give them frontmatter like "use when mentioning X" that makes them become relevant for very specific "shibboleths" - which you can then use when prompting.

Are we at an ideal balance where Claude Code is pulling things in proactively enough... without bringing in irrelevant skills just because the "vibes" might match in frontmatter? Arguably not. But it's still a powerful system.


For manual prompting, I use a "macro"-like system where I can just add `[@mymacro]` in the prompt itself and Claude will know to `./lookup.sh mymacro` to load its definition. Can easily chain multiple together. `[@code-review:3][@pycode]` -> 3x parallel code review, initialize subagents with python-code-guide.md or something. ...Also wrote a parser so it gets reminded by additionalContext in hooks.

Interestingly, I've seen Claude do `./lookup.sh relevant-macro` without any prompting from me. Probably due to it being mentioned in the compaction summary.
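For illustration, the `[@name:count]` tokens described above could be recognized with a few lines of Python (hypothetical syntax modeled on this comment; the actual `lookup.sh` and hook wiring aren't shown):

```python
import re

# Matches tokens like "[@code-review:3]" or "[@pycode]"; the ":count"
# suffix is optional and defaults to 1. Names may contain word
# characters and hyphens.
MACRO_RE = re.compile(r"\[@([\w-]+)(?::(\d+))?\]")

def parse_macros(prompt):
    """Return a list of (macro_name, repeat_count) pairs found in a prompt."""
    return [(name, int(count) if count else 1)
            for name, count in MACRO_RE.findall(prompt)]

print(parse_macros("[@code-review:3][@pycode] please review"))
# → [('code-review', 3), ('pycode', 1)]
```

A hook could then run a lookup step once per parsed pair to expand each macro's definition into the context.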


Compaction includes all user prompts from the most recent session verbatim, so that's likely what's happening!

Fun fact, it can miss! I've seen it miss almost half my messages, including some which were actually important, haha.
