Would love to understand how you compare to other providers like Modal, Daytona, Blaxel, E2B and Vercel. I think most other agent builders will have the same question. Can you provide a feature/performance comparison matrix to make this easier?
I'm working on an article deep-diving into the differences between all of us. I think the goal of Freestyle is to be the most powerful and most EC2-like of the bunch.
I haven't played around with Blaxel personally yet.
E2B and Vercel are both great hardware-virtualized "sandboxes".
Freestyle VMs were built based on feedback from our users that things they expected to be able to do on existing sandboxes didn't work. A good example: Freestyle is the only provider of the above (I haven't tested Blaxel) that gives users access to the boot disk, or the ability to reboot a VM.
Fly.io Sprites is the most similar to us of the bunch. They do hardware virtualization as well, have comparable start times, and are full Linux. What we call snapshots they call checkpoints.
The big pros of Sprites over us are their advanced networking stack and the Fly.io ecosystem. The big cons are that Sprites are incredibly bare-bones: they don't have any templating utilities. I've also heard that Sprites sometimes become unavailable for extended periods of time.
The big pros of Freestyle over Sprites are fork, advanced templating, and, IMO, a better debugging experience because of our structure.
Thanks for the thoughtful response. I'm predominantly a self-hoster, but I think your product makes a lot of sense for a wide variety of users and businesses. I'm excited to try out Freestyle!
Freestyle and other providers will likely provide a better debugging experience, but that's something you can probably get past for a lot of workloads.
The time to think about Freestyle (or any provider) is when load spikes and you need to create hundreds of VMs in short bursts, or when you're looking for some of the more complex feature sets a given provider has built out (forks, GPUs, network boundaries, etc.).
I also highly recommend running anything you self-host outside of your normal VPC. Sandboxes are the biggest possible attack surface, and it's a feature of ours that we're not in your cloud: if we mess up security, your app is still fine.
Obviously your service/approach is different from exe; it looks more like Sprites but, like you said, more targeted/opinionated toward AI coding/sandboxing tasks. Interesting space for sure!
I built yoloAI, a single Go binary that runs anywhere on macOS or Linux, sandboxing your agents in disposable containers or VMs, nested or not.
Your agent never has access to your secrets or even your workdir (only a copy, and only what you specify), and you pull the changes back with a diff/apply workflow, reviewing any changes before they land. You also control network access.
Still WIP, but the core works — three rootfs tiers (minimal Ubuntu, headless Chromium with CDP, Docker-in-VM), OCI image support (pull any Docker image), automatic thermal management (idle VMs pause then snapshot to disk, wake transparently on next API call), per-user bridge networking with L2 isolation, named checkpoints, persistent volumes, and preview URLs with auto-wake.
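The idle lifecycle described above (idle VMs pause, then snapshot to disk, then wake transparently on the next call) can be sketched roughly like this. All names and thresholds here are illustrative, not yoloAI's actual API:

```python
# Hypothetical sketch of idle "thermal management": a running VM that sits
# idle is first paused (cheap to resume), later snapshotted to disk (frees
# RAM), and transparently woken by the next API call.
import time

PAUSE_AFTER = 30       # seconds idle before pausing (illustrative value)
SNAPSHOT_AFTER = 300   # seconds idle before snapshotting to disk

class VM:
    def __init__(self):
        self.state = "running"
        self.last_activity = time.monotonic()

    def tick(self):
        """Called periodically by a supervisor loop."""
        idle = time.monotonic() - self.last_activity
        if self.state == "running" and idle > PAUSE_AFTER:
            self.state = "paused"          # freeze vCPUs, keep RAM resident
        elif self.state == "paused" and idle > SNAPSHOT_AFTER:
            self.state = "snapshotted"     # write memory + disk state out

    def handle_request(self, req):
        if self.state != "running":
            self.state = "running"         # transparent wake on next call
        self.last_activity = time.monotonic()
        return f"handled {req}"
```

The key design point is that the caller never sees the paused/snapshotted states; waking is folded into request handling.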
Fair warning: the website is too technical and the docs are mostly AI-generated; both are being actively reworked. But I've been running it daily on a Hetzner server for my AI agents' browser automation and deploy previews.
I'd love any feedback if you want to go ahead and try it yourself.
We do auto-suspend depending on your configured timeout. We'll pause your VM, and when you come back the processes will be in the exact same state as when you left.
But your pricing page suggests that that is not available without a subscription: in the on-demand pricing section "persistent Snapshots" and "Persistent VM's" have an 'x'.
We do not allow long term persistence for the free tier.
This is purely a defense mechanism: I don't want to guarantee storing the data of an entire VM forever for non-paying users. We have persistence options for them, like Sticky persistence, but those don't come with the reliability of long-term persistent storage.
> static analysis tools that produce flowcharts and diagrams like this have existed since antiquity, and I'm not seeing any new real innovation other than "letting the LLM produce it".
An inherent limitation of static-analysis-only visualization tools is the lack of flexibility/judgment about what should and should not be surfaced in the final visualization.
The produced visualizations look like machine code themselves. The advantage of having LLMs produce code visualizations is the judgment/common sense about the resolution things should be presented at, so they are intuitive and useful.
Although I haven't personally experienced the feeling of "produced visualizations looking like machine code", I can appreciate the argument you're making wrt judgment-based resolution scaling.
Vector embedding is not an invention of the last decade. Featurization in ML goes back to the 60s; even deep learning-based featurization is decades old at a minimum. Like everything else in ML, this became much more useful with data and compute scale.
Gary Marcus has been taking victory laps on this since mid-2023; nothing to see here. It's patently obvious to all that there will be additional innovations on top of LLMs, such as test-time compute, which are nonetheless structured around LLMs and complementary to them.
Very cool and interesting project. Ideas like this are a threat to traditionally conceived project management platforms like Linear; that said, Linear and others (Monday, ClickUp, etc.) are pushing aggressively into UX built for human/AI collaboration. I guess the question is how quickly they can execute, and how many novel features are required to properly bring AI into the human project workspace.
Cheers! Smaller teams, more infrastructure, more testing, tasks requiring review in minutes not days - the features are just totally different for the new world than what legacy PM tools are optimised for, and who they have to continue to serve.
This does not take into account the fact that experienced developers working with AI have shifted into roles of management and triage, working on several tasks simultaneously.
It would be interesting (and in fact necessary to draw conclusions from this study) to see the aggregate number of tasks completed per developer with AI augmentation. That is, if time per task has gone up by 20% but we clear 2x as many tasks, that's a pretty important caveat to the results published here.
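To make the arithmetic concrete with made-up numbers: per-task latency can rise while overall throughput still improves, because a developer in a triage role runs tasks concurrently.

```python
# Illustrative numbers only: per-task time grows 20%, but the developer
# supervises two tasks at once, so weekly throughput still goes up.
baseline_task_hours = 10.0
ai_task_hours = baseline_task_hours * 1.2   # 20% slower per individual task
concurrent_tasks = 2                        # tasks triaged in parallel

week_hours = 40.0
baseline_throughput = week_hours / baseline_task_hours           # 4 tasks/week
ai_throughput = concurrent_tasks * week_hours / ai_task_hours    # ~6.7 tasks/week

print(baseline_throughput, ai_throughput)
```

So a 20% per-task slowdown coexists with a ~67% throughput gain under these assumptions, which is exactly the caveat above.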
It's used in multiple similar publications, including "Guiding Language Models of Code with Global Context using Monitors" (https://arxiv.org/abs/2306.10763), which uses static analysis beyond the type system to filter out, e.g., invalid variable names, invalid control flow, etc.
Yes, this work is super cool too! Note that LSPs cannot guarantee resolving the necessary types that we use to ensure the prefix property, which we leverage to avoid backtracking and generation loops.
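A toy illustration of why the prefix property removes the need for backtracking: if the checker only accepts prefixes that can still be extended to a valid program, the decoder can filter candidate tokens greedily at each step and never has to undo a choice. The checker below is a deliberately trivial stand-in (balanced parentheses), not the actual type-based analysis:

```python
# Toy prefix-property checker: a prefix is acceptable iff it can still be
# extended to valid code. Here "valid" just means parentheses never close
# more than they open; the real system uses type information instead.
def is_valid_prefix(code: str) -> bool:
    depth = 0
    for ch in code:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # unrecoverable: no extension can fix this
                return False
    return True

def constrained_step(prefix: str, candidates: list[str]) -> list[str]:
    """Keep only candidate tokens whose addition preserves the prefix property."""
    return [tok for tok in candidates if is_valid_prefix(prefix + tok)]

print(constrained_step("f(x", [")", "(", ")))"]))
```

Because every surviving token leaves the prefix extendable, generation proceeds strictly forward; an LSP that sometimes fails to resolve types cannot give this guarantee.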
If you are looking for an alternative that can also chat with you in Slack, create PRs, edit/create/search tickets and Linear, search the web and more, check out codegen.com