Hacker News | 100ms's comments

Tinfoil looks super interesting! Do you have load balancers in front of the trusted compute stack? I looked at a design like this in a different space, and the options for ensuring privacy in a traditional "best practice" architecture seemed very limited.

Yes we do, but the load balancer also runs inside the enclave and is attested: https://github.com/tinfoilsh/confidential-model-router

In turn, that attests the model enclaves; see, for instance, https://github.com/tinfoilsh/confidential-deepseek-v4-pro. The model repo/release that the model router attests is included in the attestation config, which creates a chain of trust.

Also see https://docs.tinfoil.sh/verification/attestation-architectur...
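Schematically, the chain works something like this (an illustrative Python sketch with hypothetical helper names, not our actual verifier code):

    from dataclasses import dataclass

    @dataclass
    class Attestation:
        measurement: str  # hash of the code/image running in the enclave
        config: dict      # enclave config bound into the attestation report

    def verify_hw_signature(att: Attestation) -> bool:
        # Stand-in for CPU-vendor attestation verification
        # (signature checks against the hardware root of trust).
        return True  # placeholder

    def verify_chain(router_att, expected_router_measurement, model_atts):
        # 1. The client checks the load balancer/router enclave directly.
        if not verify_hw_signature(router_att):
            return False
        if router_att.measurement != expected_router_measurement:
            return False
        # 2. The router's attested config pins the model repos/releases it
        #    will route to, so trust extends transitively to model enclaves.
        pinned = set(router_att.config["model_measurements"])
        return all(verify_hw_signature(a) and a.measurement in pinned
                   for a in model_atts)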


By the time the dust settles, I wouldn't be surprised if personal interactive usage couldn't be had for under $200 a month. I can't reconcile my modelling of the serving costs of these things with any public reporting, even the more bearish examples.

It comes down to what you mean by interactive usage. Most chat and, say, openclaw usage is already within self-host range, so there's no need to spend $200 a month on that.

High-end SOTA coding is harder, but even there I suspect a mix of usage-based strong models and self-hosted small ones is viable if necessary.


We pay per token at my company. It's not hard to spend $100 on one morning's coding session, so thousands per month per programmer. The company finds it valuable enough to pay for, but if I ever paid for this out of my own pocket I'd look into DeepSeek et al.
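Back-of-the-envelope (my own assumed figures, not company numbers):

    # Rough monthly per-programmer spend, assuming a heavy session most days.
    cost_per_morning = 100     # USD
    workdays_per_month = 21
    print(cost_per_morning * workdays_per_month)  # ~2100 USD/month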

Not a lot of people have that kind of budget, and I'm not sure how many people with that type of cash are also interested in spending it on AI.

Of course, this is fine for people in the Bay Area earning hundreds of thousands of dollars a year. But then your client base becomes so reduced that it's hard to justify the valuations these companies have.

These AI companies aren't hyped so much because they will offer a luxury product; they're valued because they're supposed to "change the world", which a luxury product does not do.


I dislike negative comments, but I'm really curious: I can see the how, but I'm absolutely clueless about the why. Running a block device over a high-latency WAN link seems like a terrible idea; what's the use case?

https://scsipub.com/blog/an-esp32-as-a-network-attached-usb-...

Apparently, exposing small USB sticks to industrial equipment that uses them for loading/saving configs and screenshots, and being able to 'network' them with shared iSCSI drives.

"The scope writes screen_001.png to “USB”; the file appears in a directory on my desktop, in the iSCSI overlay. Combined with a dropbox-style sync I no longer need to walk over and pull the stick out."

Quite brilliant and clever, if you ask me.
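You could even script that last hop yourself. A minimal sketch of the dropbox-style sync half, assuming the iSCSI-backed image is already logged in and mounted at a made-up /mnt/scope-usb:

    import shutil
    import time
    from pathlib import Path

    # Poll the mounted iSCSI overlay for new scope screenshots and copy
    # them to a local folder. Both paths are assumptions for illustration.
    SRC = Path("/mnt/scope-usb")
    DST = Path.home() / "scope-shots"
    DST.mkdir(exist_ok=True)

    seen = set()
    while True:
        for f in SRC.glob("screen_*.png"):
            if f.name not in seen:
                shutil.copy2(f, DST / f.name)
                seen.add(f.name)
        time.sleep(5)  # crude polling is fine for a scope writing PNGs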

I'm wondering now about using an ESP32 stick and an iSCSI image of Windows install media - that could make for some fun in-house computer imaging setups.


That was indeed one of the main drivers for it! ESP32 (especially with 2.4GHz WiFi latencies) is not super well suited for OS installs, but... many UEFI firmwares (and some network drivers!) will let you boot iSCSI directly.

The other one is the Raspberry Pi{3,4,5} iSCSI shim linked there as well. I have a bunch of them doing CI/CD kinds of work for paying clients, and I wanted them to boot from the network, not from microSD.

Both of these projects could've benefited from a public demo iSCSI endpoint; we have http://example.com and whateveryouwant@mailinator.com, so why not iSCSI?


Ah, yeah, drat. I forgot entirely about the moonshot of streaming several GB through the ESP... I was just thinking of an easier solution that avoids UEFI networking: wireless devices, tablets, odd things like that ;)

Then again this might still be useful yet: a small 64MB thumb drive with an autounattend.xml streamed to it is an equally powerful tool for some Windows shenanigans.
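For instance, something like this could stamp a skeleton answer file onto the mounted image (the XML below is a bare skeleton, not a working answer file; real ones need the per-pass components filled in, and the mount path is made up):

    from pathlib import Path

    # Windows Setup looks for autounattend.xml at the root of removable media.
    SKELETON = """<?xml version="1.0" encoding="utf-8"?>
    <unattend xmlns="urn:schemas-microsoft-com:unattend">
      <settings pass="windowsPE">
        <!-- Microsoft-Windows-Setup: disk config, image selection -->
      </settings>
      <settings pass="oobeSystem">
        <!-- Microsoft-Windows-Shell-Setup: accounts, OOBE skips -->
      </settings>
    </unattend>
    """

    root = Path("/mnt/esp32-usb")  # hypothetical mount of the iSCSI image
    (root / "autounattend.xml").write_text(SKELETON, encoding="utf-8")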


The Pi4 shim actually exposes a USB device as well. This works way, way better (IMHO mostly because wired networking beats wireless on latency, the ESP32's feeble CPU aside).

I don’t have a use case, but I was thinking the same thing. But then I realized that the WAN speeds available now are equal to or faster than the LAN speeds I had when I had reason to use iSCSI. And things worked out decently well then, so I can see this being useful.

Eh, the main thing you would feel with this is latency, not bandwidth. Even on a 10 Mbps LAN, you would be able to open a file pretty quick, but over the internet latency is going to be > 100 ms in almost every case. That's a lot more painful.
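Rough numbers to put that in perspective (illustrative assumptions: 4 KiB reads at queue depth 1, so each read pays one full round trip):

    # Effective throughput of sequential 4 KiB reads at queue depth 1:
    # each read costs one RTT plus transfer time. Figures are assumptions.
    block = 4096 * 8  # bits per read

    for name, rtt, bw in [("10 Mbps LAN, 1 ms RTT", 0.001, 10e6),
                          ("100 Mbps WAN, 100 ms RTT", 0.100, 100e6)]:
        t = rtt + block / bw
        print(f"{name}: {block / t / 1e6:.2f} Mbit/s effective")

    # LAN: ~7.7 Mbit/s; WAN: ~0.33 Mbit/s. The fat pipe loses badly at QD1.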

Correct. Well, almost correct. We'll see how much uptake this service gets (if any), and we can probably place it really close to the edge; for now it's on an Oregon server only.

That said, this isn't too far from mechanical HDD latencies of the /real/ SCSI drives.


I've answered elsewhere down the thread about the use case that inspired it.

Since I built it, I've started seeing it as a hammer for many nail-like problems. I think that will die down over time;

but... I have my ESP32 "pendrive" that's net-synced. I have used it to install an OS through the UEFI built-in initiator. I have added iSCSI targets to my Windows laptop (and VMs); while you need to deal with disconnects and reconnects, it actually works well enough.

It is a terrible idea that doesn't sound as terrible for odd use cases. But yes, the ESP32 over 2.4GHz WiFi over 3G internet is slow as molasses (20-30 kB/s); when the alternative is 0, or walking over there with a laptop, it works OK.


> Full stop.

Why do people leave in such obvious sloppification and still expect to have readers left?


Third line into the article: "But there's one result in the benchmarks I keep coming back to."

I hear this sort of thing all the time now on YouTube from media/news personalities:

“And that’s the part nobody seems to be talking about.”

"And here's what keeps me up at night."

“This is where the story gets complicated.”

“Here’s the piece that doesn’t quite fit.”

“And this is where the usual explanation starts to break down.”

“Here’s what I can’t stop thinking about.”

“The part that should worry us is not the obvious one.”

“And that’s where the real problem begins.”

“But the more interesting question is the one no one is asking.”

“And this is where things stop being simple.”

It doesn't really worry me, but I think it's interesting that LLM-speak sounds so distinctive, and how willing these media personalities are to be so obvious in reading out on TV what the LLM spat out.

I've never studied in depth what LLMs say, but it is interesting that my brain recognises the speech pattern so easily.


I think this kind of language predates widespread LLM use, and has been picked up from that kind of writing. It's a "and here's where it gets interesting" pattern that people like Malcolm Gladwell and Freakonomics have used, even if the same thing could be said in a way that makes it sound much less intriguing.

There's even a word for it: “cliché”

How banal

10 EASY WAYS TO SPOT A LLM~ THE 10TH ONE WILL SURPRISE YOU!

The language of drama and import without meaningful substance. Words statistically likely to be used in a segue, regardless of the preceding or subsequent point. Particularly effective when it seems like you’re getting let in on a secret. Really fatiguing to read

A writing teacher once excoriated me for saying that something was important. “Don’t tell me it’s important, show me, and let me decide, and if you do your job I’ll agree”

I don’t know how a completion can tell when it needs to do this. Mostly so far it doesn’t seem capable


Maybe the solution is to cull the bad, cliché writing from the training data.

You can just instruct the LLM not to write like an LLM.

Isn't this the format of "hook-driven media": a constant stream of "second-act pivots", where some new twist is added to a story to re-engage the reader and keep them reading?

BuzzFeed and Upworthy etc. pioneered this for web 'news stories'; then it got used on LinkedIn, Twitter, and everywhere views are more important than the content.


Ugh, you're making me remember the last time I listened to NPR. It's so bad.

I listen to NPR daily and I don't think I've ever heard any of them use that phrasing.

I notice this very often in LinkedIn posts, and it's annoying, but I had not realized it was LLM-speak? Isn't it possible that people write like this naturally?

I think LLMs have that sort of "summarise, wrap it in a bow, give a little dramatic punch as a preview of the next few points" style.

Guys, LLMs are built on all these social cues, which were developed pre-model. There's at least 10 years of pre-LLM gibberish.

This is to say: marketers and spammers repeat the same things over and over, and these models are built by coalescing that repetition into their basis.

So yeah, of course people talked like this before, but it was always in some known context, like LinkedIn or a spam website.


Sure, but RLHF ended up emphasizing this to a level beyond normal human writing.

Arguably it's exactly because it was used naturally so often that the LLMs parrot it so frequently.

Yes. Some people are very trigger happy in attributing human slop to LLMs.

I listened to a lot of NPR podcasts before LLMs were around, and most of them are full of these kinds of filler phrases.

Nate B Jones's videos... the YouTube channel "AI News and Strategy Daily" uses all of these. Every video.

The general concept of a hook with delayed payoff is far from new, and generally one of the better ways of keeping attention.

It's also exactly the MrBeast playbook, and it got him the largest channel on YouTube.

Any system attempting to capture human attention will use these techniques, nothing LLM-specific here at all.


Apparently John Oliver was an LLM before they were even invented.

So are we saying it's fine that the article is written by an LLM as long as it doesn't have the tell-tale signs of LLMs?

It's more about curating the things you're publishing. Why would I bother reading what you couldn't bother to read?

They could easily have read it and thought: that communicates the information it needs to.

No point creating busywork for yourself just shuffling words around when the information is there, no?

I guess it depends on what you want out of the article. Substance, or style?


> They could easily have read it and thought: that communicates the information it needs to.

If they aren't self-aware enough or smart enough to determine that what they wrote is indistinguishable from text generation, how probable is it that they have something of value to add to any thought?


I don't really see a reason to complain about tool use, so long as the result is cohesive and accurate, which ultimately means a human has at least read their own output before publishing. It's a bit like receiving a supposedly personal letter that starts "Dear [INSERT_FIRST_NAME_FIELD],": are you really going to read such a thing?

An article without telltale signs of an LLM is indistinguishable from an article written by a human, so yes.

My opinion is that literature and art will continue pushing the envelope in the places they always pushed the envelope. LLMs will not change this, humans love making art, and they love doing it in new ways.

Corporate announcements were never the places that literature and art were pushing the envelope. They were slop before, and they're slop now.


Are you referring to the literal use of the expression "full stop"? I don't see it anymore in the article, maybe they edited it out?

These seem amazing for hobbyists, but that TDP, given the perf, might be an issue when deploying a lot of them.

Its performance is pretty unbalanced. If you're using it for the couple of things that it's good at, the TDP is competitive.

That's not an ideal tone for here. From my perspective, the most incredible thing is the concentration of IO. I might like, at some point, for elements of my computer usage to remain private; it would be nice if that ability were preserved. That's a bit hard to accomplish when 1 out of every 4 bits processed globally runs through the same network.

It's literally a distinct model with a different optimisation goal compared to normal chat. There's a ton of public information around how they work and how they're trained

Unironically, the best way to implement that browser feature you're looking for is probably also AI. Which tells a meta-story: AI isn't just a new feature, it's also a new medium. It can be used to turn cave-speak into works of literature just as easily as it can turn voluminous spew into one-liners (Ed Zitron just popped into mind for some reason). You can't ignore it once it exists, but it sounds like the problem you have genuinely can be solved by it, and I expect over the next decade we'll see a lot more of exactly that.

Here's to reading HN projected through the lens of manga comic strips sometime after we solve the GPU shortage...


He goes way beyond saying it's a test; he's legitimising the change in the follow-up rationale.


I'm excited for Taalas, but the worry with that suggestion is that it would blow out energy per net unit of work, which kills a lot of Taalas's buzz. Still, it's inevitable: if you make something an order of magnitude faster, folk will just come along and feed it an order of magnitude more work. I hope the middle ground with Taalas is a cottage industry of LLM hosts with small-to-mid-sized budgets hosting last-gen models for quite cheap. Although if they're packed to max utilisation with all the new workloads they enable, latency might not be much better than what we already have today.

