Why go this route? Python is more powerful than JS mostly because of third-party packages like pandas, which are explicitly not supported (C bindings; is this possible to fix?)...
At that point it might just be easier to convince the model to write JS directly.
I would love for the component model tooling to reach that level of maturity.
Since the runtime uses standard WASI and not Emscripten, we don't have that seamless dynamic linking yet. It will be interesting to see how the WASI path eventually converges with what Pyodide can do today regarding C extensions.
I understand your point. I added native Python support because C extensions will eventually become compatible. Also, we might see more libraries built with Rust extensions appearing, which will be much easier to port to Wasm.
Creator of Browser Use here. This is cool, a really innovative approach with ARIA roles. One idea we have been playing around with a lot is just giving the LLM raw HTML and a really good way to traverse it - no heuristics, just BS4. It seems to work well, but it is much more expensive than the current prod-ready [index]<div ... notation.
I actually tried raw HTML when I was exploring solutions. It worked for "one-off" tasks, but I ran into major issues with replayability on modern SPAs.
In React apps, the raw DOM structure and auto-generated IDs shift so frequently that a script generated from "Raw HTML" often breaks 10 minutes later. I found ARIA/semantics to be the only stable contract that persists across re-renders.
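To make the "stable contract" point concrete, here is a minimal sketch, not the actual implementation. It uses only the standard library's html.parser; the two render snippets, the RoleIndex class, and the find_by_role helper are all made up for illustration. The same button is rendered twice with different auto-generated IDs, classes, and nesting, and a lookup by role plus accessible name still matches both.

```python
from html.parser import HTMLParser

# Assumption: a tiny subset of implicit ARIA roles, enough for this example.
IMPLICIT_ROLES = {"button": "button", "a": "link"}
VOID_TAGS = {"input", "img", "br", "hr", "meta", "link"}


class RoleIndex(HTMLParser):
    """Collect (role, accessible name) pairs; ids and classes are ignored."""

    def __init__(self):
        super().__init__()
        self.found = []   # list of (role, name) tuples
        self._open = []   # stack of [tag, role, aria_label, text_parts]

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        role = a.get("role") or IMPLICIT_ROLES.get(tag)
        if tag in VOID_TAGS:
            # Void elements have no text content, so only aria-label can name them.
            if role:
                self.found.append((role, a.get("aria-label") or ""))
            return
        self._open.append([tag, role, a.get("aria-label"), []])

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for entry in self._open:
            entry[3].append(data)

    def handle_endtag(self, tag):
        if tag not in [entry[0] for entry in self._open]:
            return  # stray closing tag, ignore it
        while self._open:
            open_tag, role, label, parts = self._open.pop()
            if role:
                name = label or " ".join("".join(parts).split())
                self.found.append((role, name))
            if open_tag == tag:
                break


def find_by_role(html, role, name):
    """Return True if an element with this role and accessible name exists."""
    index = RoleIndex()
    index.feed(html)
    index.close()
    return (role, name) in index.found


# Hypothetical markup: same UI, two renders with shifted ids, classes, nesting.
RENDER_A = '<div id="r-1a2b"><button id=":r3:" class="css-9x">Save draft</button></div>'
RENDER_B = ('<section><div id="r-7f0c"><span>'
            '<button id=":r8:" class="css-k2">Save draft</button>'
            '</span></div></section>')

stable_in_a = find_by_role(RENDER_A, "button", "Save draft")
stable_in_b = find_by_role(RENDER_B, "button", "Save draft")
```

A selector like `#\:r3\:` or a positional path would only match the first render, while the role/name pair survives both.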
You mentioned the raw HTML approach is "expensive". Did you feed the full HTML into the context, or did you create a BS4 "tool" for the LLM to query the raw HTML dynamically?
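By a "tool" I mean something roughly like the sketch below: the page stays outside the context, and the model only gets back the few elements it asks for. This is an assumption about the shape of such a tool, not anyone's real API. The standard library's html.parser stands in for BS4 here so the snippet runs without dependencies, and query_html, its parameters, and the PAGE sample are hypothetical.

```python
from html.parser import HTMLParser


class _Matcher(HTMLParser):
    """Record opening tags that match an optional tag name and attribute filter."""

    def __init__(self, tag, attrs, limit):
        super().__init__()
        self.tag = tag
        self.attrs = attrs or {}
        self.limit = limit
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if len(self.hits) >= self.limit:
            return  # cap the result size so the tool output stays cheap
        if self.tag and tag != self.tag:
            return
        have = dict(attrs)
        if all(have.get(key) == value for key, value in self.attrs.items()):
            self.hits.append({"tag": tag, "attrs": have})


def query_html(html, tag=None, attrs=None, limit=5):
    """Hypothetical LLM tool: return at most `limit` matching elements."""
    matcher = _Matcher(tag, attrs, limit)
    matcher.feed(html)
    matcher.close()
    return matcher.hits


# Made-up page; in practice this would be the full document held outside the prompt.
PAGE = ('<form><input name="q" type="text">'
        '<button type="submit">Go</button>'
        '<a href="/help">Help</a></form>')

inputs = query_html(PAGE, tag="input")
submits = query_html(PAGE, attrs={"type": "submit"})
```

The cost difference comes from the model seeing only `inputs` or `submits` per call instead of the whole page on every step.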
Browser Use creator here; we are working on prototypes like this but always find ourselves stuck on the safety vs. freedom question. We are very well aware how easy it is to inject stuff into the browser and do something malicious, hence sandboxed browsers still seem like a very good idea. I guess in the long run we will not even need browsers, just an agent that does stuff in the background. Is there any good research on guardrails to prevent “go to my bank and send the money to a Nigerian prince” style prompts?
Less flippantly, that was sort of my thought. I’m probably a paranoid idiot, and I’m not really sure I can articulate this idea properly, but I can imagine a less concise but broader prompt, and an agent configured in a way that gives it privileges you don’t want it to have, or a path to escalate them. It’s not quite AGI, but it’s a virus on steroids, like a company or resource (think utilities) killer. I hope I’m just missing something, but these models seem pretty capable of wreaking all kinds of havoc if they just keep looping and have access nobody in their right mind wants them to have.
I really like both nodriver and pydoll. I am definitely keeping the option of switching to them open, but we just wanted to have full control for now and see how painful CDP-use is to maintain first and then reconsider.
I mean... Playwright was built and is maintained by Microsoft, so I don't think the VC money argument really makes sense here.
By the very nature of how Playwright is built, we can't contribute to it: it runs inside a JS subprocess and does not expose a bunch of CDP APIs that we NEED (for example, to make cross-origin iframes work).
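For readers unfamiliar with what talking to CDP directly looks like, here is a rough sketch of the message shape. It is an illustration, not the project's code, and it is my assumption that calls like these are the kind being referred to. Target.setAutoAttach and DOM.getDocument are real CDP methods; the cdp_command helper and the session ID are hypothetical, and nothing is actually sent over a WebSocket.

```python
import itertools
import json

# CDP commands are JSON messages with a client-chosen, increasing id.
_ids = itertools.count(1)


def cdp_command(method, params=None, session_id=None):
    """Hypothetical helper: serialize one CDP command as a JSON string."""
    message = {"id": next(_ids), "method": method, "params": params or {}}
    if session_id is not None:
        # In flattened mode, commands for a child target carry its sessionId.
        message["sessionId"] = session_id
    return json.dumps(message)


# Ask the browser to attach to child targets automatically. Cross-origin
# iframes can live in their own target, so they show up as separate sessions.
auto_attach = cdp_command(
    "Target.setAutoAttach",
    {"autoAttach": True, "waitForDebuggerOnStart": False, "flatten": True},
)

# Placeholder session id; a real one would arrive in a Target.attachedToTarget event.
iframe_dom = cdp_command(
    "DOM.getDocument",
    {"depth": -1, "pierce": True},
    session_id="HYPOTHETICAL-IFRAME-SESSION",
)
```

Owning this layer means each iframe session can be addressed explicitly, which is the control a higher-level wrapper may not pass through.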