Would love to understand how you compare to other providers like Modal, Daytona, Blaxel, E2B and Vercel. I think most other agent builders will have the same question. Can you provide a feature/performance comparison matrix to make this easier?
I'm working on an article deep-diving into the differences between all of us. I think the goal of Freestyle is to be the most powerful and most EC2-like of the bunch.
I haven't played around with Blaxel personally yet.
E2B and Vercel are both great hardware-virtualized "sandboxes".
Freestyle VMs were built based on feedback from our users that things they expected to be able to do on existing sandboxes didn't work. A good example: Freestyle is the only provider of the above (I haven't tested Blaxel) that gives users access to the boot disk, or the ability to reboot a VM.
Fly.io Sprites is the most similar to us of the bunch. They do hardware virtualization as well, have comparable start times, and are full Linux. What we call snapshots they call checkpoints.
The big pros of Sprites over us are their advanced networking stack and the Fly.io ecosystem. The big cons are that Sprites are incredibly bare-bones: they don't have any templating utilities. I've also heard that Sprites sometimes become unavailable for extended periods of time.
The big pros of Freestyle over Sprites are fork, advanced templating, and, IMO, a better debugging experience because of our structure.
Thanks for the thoughtful response. I'm predominantly a self-hoster, but I think your product makes a lot of sense for a wide variety of users and businesses. I'm excited to try out Freestyle!
Freestyle and other providers will likely provide a better debugging experience, but that's something you can probably get past for a lot of workloads.
The time to think about Freestyle (or any provider) is when load spikes and you need to create hundreds of VMs in short bursts, or when you're looking for some of the more complex feature sets a given provider has built out (forks, GPUs, network boundaries, etc.).
I also highly recommend running anything you self-host outside of your normal VPC. Sandboxes are the biggest possible attack surface, and it's a feature of ours that we're not in your cloud: if we mess up security, your app is still fine.
Obviously your service/approach is different from exe; it looks more like Sprites but, like you said, more targeted/opinionated toward AI coding/sandboxing tasks. Interesting space for sure!
I built yoloAI, a single Go binary that runs anywhere on macOS or Linux, sandboxing your agents in disposable containers or VMs, nested or not.
Your agent never has access to your secrets or even your workdir (only a copy, and only what you specify), and you pull the changes back with a diff/apply workflow, reviewing any changes before they land. You also control network access.
Still WIP, but the core works — three rootfs tiers (minimal Ubuntu, headless Chromium with CDP, Docker-in-VM), OCI image support (pull any Docker image), automatic thermal management (idle VMs pause then snapshot to disk, wake transparently on next API call), per-user bridge networking with L2 isolation, named checkpoints, persistent volumes, and preview URLs with auto-wake.
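The idle lifecycle described above (idle VMs pause, then snapshot to disk, then wake transparently on the next call) can be sketched roughly like this. All names and thresholds here are illustrative, not yoloAI's actual API:

```python
# Hypothetical sketch of idle "thermal management": a running VM that sits
# idle is first paused (cheap to resume), later snapshotted to disk (frees
# RAM), and transparently woken by the next API call.
import time

PAUSE_AFTER = 30       # seconds idle before pausing (illustrative value)
SNAPSHOT_AFTER = 300   # seconds idle before snapshotting to disk

class VM:
    def __init__(self):
        self.state = "running"
        self.last_activity = time.monotonic()

    def tick(self):
        """Called periodically by a supervisor loop."""
        idle = time.monotonic() - self.last_activity
        if self.state == "running" and idle > PAUSE_AFTER:
            self.state = "paused"          # freeze vCPUs, keep RAM resident
        elif self.state == "paused" and idle > SNAPSHOT_AFTER:
            self.state = "snapshotted"     # write memory + disk state out

    def handle_request(self, req):
        if self.state != "running":
            self.state = "running"         # transparent wake on next call
        self.last_activity = time.monotonic()
        return f"handled {req}"
```

The key design point is that the caller never sees the paused/snapshotted states; waking is folded into request handling.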
Fair warning: the website is too technical and the docs are mostly AI-generated; both are being actively reworked. But I've been running it daily on a Hetzner server for my AI agents' browser automation and deploy previews.
I'd love any feedback if you want to go ahead and try it yourself.
We do auto-suspend depending on your configured timeout. We'll pause your VM, and when you come back the processes will be in the exact same state as when you left.
But your pricing page suggests that that is not available without a subscription: in the on-demand pricing section "persistent Snapshots" and "Persistent VM's" have an 'x'.
We do not allow long term persistence for the free tier.
This is purely a defense mechanism: I don't want to guarantee storing the data of an entire VM forever for non-paying users. We have persistence options for them, like Sticky persistence, but those don't come with the reliability of long-term persistent storage.
> static analysis tools that produce flowcharts and diagrams like this have existed since antiquity, and I'm not seeing any new real innovation other than "letting the LLM produce it".
An inherent limitation of static-analysis-only visualization tools is the lack of flexibility/judgment about what should and should not be surfaced in the final visualization.
The produced visualizations look like machine code themselves. The advantage of having LLMs produce code visualizations is the judgment/common sense about the resolution things should be presented at, so they are intuitive and useful.
Although I haven't personally experienced the feeling of "produced visualizations looking like machine code", I can appreciate the argument you're making wrt judgment-based resolution scaling.
Vector embedding is not an invention of the last decade. Featurization in ML goes back to the 60s; even deep learning-based featurization is decades old at a minimum. Like everything else in ML, this became much more useful with data and compute scale.
Gary Marcus has been taking victory laps on this since mid-2023; nothing to see here. It's patently obvious to all that there will be additional innovations on top of LLMs, such as test-time compute, which are nonetheless structured around LLMs and complementary to them.
Very cool and interesting project. Ideas like this are a threat to traditionally conceived project management platforms like Linear; that said, Linear and others (Monday, ClickUp, etc.) are pushing aggressively into UX built for human/AI collaboration. I guess the question is how quickly they can execute, and how many novel features are required to properly bring AI into the human project workspace.
Cheers! Smaller teams, more infrastructure, more testing, tasks requiring review in minutes not days - the features are just totally different for the new world than what legacy PM tools are optimised for, and who they have to continue to serve.
This does not take into account the fact that experienced developers working with AI have shifted into roles of management and triage, working on several tasks simultaneously.
It would be interesting (and in fact necessary to draw conclusions from this study) to see the aggregate number of tasks completed per developer with AI augmentation. That is, if time per task has gone up by 20% but we clear 2x as many tasks, that's a pretty important caveat to the results published here.
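To make the arithmetic concrete with made-up numbers: per-task latency can rise while overall throughput still improves, because a developer in a triage role runs tasks concurrently.

```python
# Illustrative numbers only: per-task time grows 20%, but the developer
# supervises two tasks at once, so weekly throughput still goes up.
baseline_task_hours = 10.0
ai_task_hours = baseline_task_hours * 1.2   # 20% slower per individual task
concurrent_tasks = 2                        # tasks triaged in parallel

week_hours = 40.0
baseline_throughput = week_hours / baseline_task_hours           # 4 tasks/week
ai_throughput = concurrent_tasks * week_hours / ai_task_hours    # ~6.7 tasks/week

print(baseline_throughput, ai_throughput)
```

So a 20% per-task slowdown coexists with a ~67% throughput gain under these assumptions, which is exactly the caveat above.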
It's used in multiple similar publications, including "Guiding Language Models of Code with Global Context using Monitors" (https://arxiv.org/abs/2306.10763), which uses static analysis beyond the type system to filter out, e.g., invalid variable names, invalid control flow, etc.
Yes, this work is super cool too! Note that LSPs cannot guarantee resolving the necessary types that we use to ensure the prefix property, which we leverage to avoid backtracking and generation loops.
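A toy illustration of why the prefix property removes the need for backtracking: if the checker only accepts prefixes that can still be extended to a valid program, the decoder can filter candidate tokens greedily at each step and never has to undo a choice. The checker below is a deliberately trivial stand-in (balanced parentheses), not the actual type-based analysis:

```python
# Toy prefix-property checker: a prefix is acceptable iff it can still be
# extended to valid code. Here "valid" just means parentheses never close
# more than they open; the real system uses type information instead.
def is_valid_prefix(code: str) -> bool:
    depth = 0
    for ch in code:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # unrecoverable: no extension can fix this
                return False
    return True

def constrained_step(prefix: str, candidates: list[str]) -> list[str]:
    """Keep only candidate tokens whose addition preserves the prefix property."""
    return [tok for tok in candidates if is_valid_prefix(prefix + tok)]

print(constrained_step("f(x", [")", "(", ")))"]))
```

Because every surviving token leaves the prefix extendable, generation proceeds strictly forward; an LSP that sometimes fails to resolve types cannot give this guarantee.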
If you are looking for an alternative that can also chat with you in Slack, create PRs, edit/create/search tickets and Linear, search the web and more, check out codegen.com