Tinfoil looks super interesting! Do you have load balancers in front of the trusted compute stack? Looked at a design like this in a different space and the options for ensuring privacy in a traditional "best practice" architecture seemed very limited
In turn, that attests the model enclaves; for instance, see https://github.com/tinfoilsh/confidential-deepseek-v4-pro. The model repo/release that the model router attests is included in the attestation config, which creates a chain of trust.
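For the shape of the idea, here's a minimal sketch of that chain-of-trust pattern. All the names, fields, and the hash scheme here are illustrative assumptions, not Tinfoil's actual attestation format: the point is only that the router's attested config pins a digest of the model release, so verifying the router transitively verifies the model.

```python
import hashlib
import json

def digest(obj) -> str:
    """Stable digest of a JSON-serialisable object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

# The router's attested config pins the exact model release it will talk to.
model_release = {"repo": "example/model", "tag": "v1", "weights_sha256": "abc..."}
router_config = {"model_release_digest": digest(model_release)}

def verify_model(attested_config, presented_release) -> bool:
    # A client that has verified the router's attestation (which covers
    # router_config) can check the model enclave against the pinned digest.
    return attested_config["model_release_digest"] == digest(presented_release)

print(verify_model(router_config, model_release))  # True
```

A tampered release (different tag or weights hash) would produce a different digest and fail the check, which is what makes the pinning meaningful.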
By the time the dust settles, I wouldn't be surprised if personal interactive usage couldn't be had for under $200 a month. I can't reconcile my modelling of the serving costs of these things with any public reporting, even the more bearish examples.
Comes down to what you mean by interactive usage. Most chat and, say, openclaw usage is already within self-host range, so there's no need to spend $200 a month on that.
High-end SOTA coding is harder, but even there I suspect a mix of usage-based strong models and self-hosted small models is viable if necessary.
We pay per token in our company. It is not hard to spend $100 on one morning's coding session. So thousands per month per programmer. The company finds it valuable enough to pay for, but if I ever paid these bills from my own pocket I'd look into DeepSeek et al.
Not a lot of people have this budget, and I'm not sure how many people with that type of cash are also interested in paying it for AI.
Of course, this is fine for people in the Bay Area earning hundreds of thousands of dollars a year. But then your client base becomes so small that it's hard to justify the valuations these companies have.
These AI companies are not hyped so much because they will offer a luxury product, they're valued because they're supposed to "change the world" which luxury does not do.
I dislike negative comments, but I'm really curious - I can see the how, but I'm absolutely clueless about the why. Running a block device over a high-latency WAN link seems like a terrible idea; what's the use case?
Apparently, exposing small USB sticks to industrial equipment that uses them for loading/saving configs and screenshots, and being able to 'network' them with shared iSCSI drives.
"The scope writes screen_001.png to “USB”; the file appears in a directory on my desktop, in the iSCSI overlay. Combined with a dropbox-style sync I no longer need to walk over and pull the stick out."
Quite brilliant and clever, if you ask me.
I'm wondering now about using an ESP32 stick and an iSCSI image of Windows install media - that could make for some fun in-house computer imaging setups.
That was indeed one of the main drivers for it! ESP32 (especially with 2.4GHz WiFi latencies) is not super well suited for OS installs, but... many UEFI firmwares (and some network drivers!) will let you boot iSCSI directly.
The other one is the Raspberry Pi{3,4,5} iSCSI shim linked there as well - I have a bunch of them doing CI/CD kinds of work for a bunch of paying clients, and I wanted these to boot from the network, not from microSD.
Both of these projects could've benefited from a public demo iSCSI endpoint. We have http://example.com and whateveryouwant@mailinator.com - why not iSCSI?
Ah, yeah, drat. I forgot entirely about the moonshot of streaming several GB through the ESP... I was just thinking of an easier solution that avoids UEFI networking - wireless devices, tablets, odd things like that ;)
Then again this might still be useful yet - a small 64MB thumb drive with an autounattend.xml streamed to it is also an equally powerful tool for some Windows shenanigans.
The Pi4 shim actually exposes a USB device as well. This works way, way better (IMHO mostly because wired networking beats wireless on latency, the ESP32's feeble CPU aside).
I don’t have a use case, but I was thinking the same thing. But then I realized that the WAN speeds available now are equal to or faster than the LAN speeds I had when I had reason to use iSCSI. And things worked out decently well then, so I can see this being useful.
Eh, the main thing you would feel with this is latency, not bandwidth. Even on a 10 Mbps LAN, you would be able to open a file pretty quick, but over the internet latency is going to be > 100 ms in almost every case. That's a lot more painful.
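A rough sketch of why latency rather than bandwidth dominates. The read counts and sizes below are illustrative assumptions, but the structure is real: opening a file over a block device means a chain of dependent reads (partition table, filesystem metadata, directory, extents), each of which must complete before the next can be issued.

```python
def open_time_s(dependent_reads: int, rtt_ms: float,
                read_kb: int, bandwidth_mbps: float) -> float:
    """Time to complete a chain of dependent block reads."""
    transfer_s = dependent_reads * read_kb * 8 / (bandwidth_mbps * 1000)
    latency_s = dependent_reads * rtt_ms / 1000
    return latency_s + transfer_s

# 32 dependent 4 KiB reads on a slow 10 Mbps LAN (~0.5 ms RTT)...
lan = open_time_s(32, 0.5, 4, 10)
# ...versus a much faster 100 Mbps internet link with 100 ms RTT.
wan = open_time_s(32, 100.0, 4, 100)

print(f"LAN: {lan:.3f}s, WAN: {wan:.3f}s")
```

With these numbers the 10x-faster WAN link is still roughly 30x slower to open the file, because the round trips serialize while the payload is tiny.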
Correct. Well, almost correct. We'll see how much uptake this service gets (if any), and we can probably place it really close to the edge - for now it's on an Oregon server only.
That said, this isn't too far from mechanical HDD latencies of the /real/ SCSI drives.
I've answered, a bit further down the tree, about the inspirational use case for it.
Since I built it, I've started seeing it as a hammer for many nail-like problems - I think that will die down over time.
But... I have my ESP32 "pendrive" that's net-synced. I have used it to install an OS through the UEFI built-in initiator. I have added iSCSI targets to my Windows laptop (and VMs) - while you need to deal with disconnects and reconnects, it actually works well enough.
It is a terrible idea that doesn't sound as terrible for odd use cases. But yes, the ESP32 over 2.4 GHz over 3G internet is slow as molasses (20-30 kB/s) - but when the alternative is 0... or walking over there with a laptop, it works OK.
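To put that quoted throughput in perspective, here's trivial arithmetic. The screenshot size is an assumption for illustration:

```python
# At the quoted ~25 kB/s (ESP32 over 2.4 GHz WiFi over 3G), syncing
# a hypothetical 500 kB scope screenshot takes:
size_kb = 500
rate_kb_per_s = 25
sync_time_s = size_kb / rate_kb_per_s
print(f"{sync_time_s:.0f} s")  # 20 s
```

Slow, but as the comment says: 20 seconds beats walking over with a laptop.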
Third line into the article: "But there's one result in the benchmarks I keep coming back to."
I hear this sort of thing all the time now on YouTube from media/news personalities:
“And that’s the part nobody seems to be talking about.”
"And here's what keeps me up at night."
“This is where the story gets complicated.”
“Here’s the piece that doesn’t quite fit.”
“And this is where the usual explanation starts to break down.”
“Here’s what I can’t stop thinking about.”
“The part that should worry us is not the obvious one.”
“And that’s where the real problem begins.”
“But the more interesting question is the one no one is asking.”
“And this is where things stop being simple.”
It doesn't really worry me, but I think it's interesting that LLM speak sounds so distinctive, and how willing these media personalities are to so obviously read out on TV what the LLM spat out.
I've never studied what LLMs say in depth, so it is interesting that my brain recognises the speech pattern so easily.
I think this kind of language predates widespread LLM use, and has been picked up from that kind of writing. It's a "and here's where it gets interesting" pattern that people like Malcolm Gladwell and Freakonomics have used, even if the same thing could be said in a way that makes it sound much less intriguing.
The language of drama and import without meaningful substance. Words statistically likely to be used in a segue, regardless of the preceding or subsequent point. Particularly effective when it seems like you’re getting let in on a secret. Really fatiguing to read
A writing teacher once excoriated me for saying that something was important. “Don’t tell me it’s important, show me, and let me decide, and if you do your job I’ll agree”
I don’t know how a completion can tell when it needs to do this. Mostly so far it doesn’t seem capable
Isn't this the format of "hook-driven media": a constant stream of "second-act pivots", where some new twist is added to a story to re-engage the reader and keep them reading?
BuzzFeed and Upworthy etc pioneered this for web 'news stories', then it got used in linkedin, twitter, and everywhere where views are more important than the content.
I notice this very often in LinkedIn posts, and it's annoying, but I hadn't realized it was LLM-speak. Isn't it possible that people write like this naturally?
> They could easily have read it, and thought, that communicates the information that it needs to.
If they aren't self-aware enough or smart enough to determine that what they wrote is indistinguishable from text generation, how probable is it that they have something of value to add to any thought?
I don't really see a reason to complain about tool use, so long as the result is cohesive and accurate - which ultimately means a human has at least read their own output before publishing. It's a bit like receiving a supposedly personal letter that starts "Dear [INSERT_FIRST_NAME_FIELD]," - are you really going to read such a thing?
My opinion is that literature and art will continue pushing the envelope in the places they always pushed the envelope. LLMs will not change this, humans love making art, and they love doing it in new ways.
Corporate announcements were never the places that literature and art were pushing the envelope. They were slop before, and they're slop now.
That's not an ideal tone for here. From my perspective, the most incredible thing is the concentration of IO. I might like, at some point, for elements of my computer usage to remain private; it would be nice if that ability were preserved. A bit hard to accomplish when 1 out of every 4 bits processed globally runs through the same network.
It's literally a distinct model with a different optimisation goal compared to normal chat. There's a ton of public information around how they work and how they're trained.
Unironically, the best way to implement that browser feature you're looking for is probably also AI. Which tells a meta-story: AI isn't just a new feature, it's also a new medium. It can be used to turn cave speak into works of literature just as easily as it can turn voluminous spew into one-liners (Ed Zitron just popped into mind for some reason). You can't ignore it once it exists, but it sounds like the problem you have genuinely can be solved by it, and I expect over the next decade we'll see a lot more of exactly that.
Here's to reading HN projected through the lens of manga comic strips sometime after we solve the GPU shortage..
I'm excited for Taalas, but the worry with that suggestion is that it would blow out energy per net unit of work, which kills a lot of Taalas' buzz. Still, it's inevitable: if you make something an order of magnitude faster, folks will just come along and feed it an order of magnitude more work. I hope the middle ground with Taalas is a cottage industry of LLM hosts with small-to-mid-sized budgets hosting last-gen models for quite cheap. Although if they're packed to max utilisation with all the new workloads they enable, latency might not be much better than what we already have today.