Hacker News | gregpr07's comments

Love it! From first principles: this kinda answers the "do we really even need CDP" question I always have in my head while building Browser Use...


Totally, I feel that CDP was designed for a different category of automations.


Why go this route? Python is more powerful than JS mostly because of third-party plugins like pandas, which are explicitly not supported (C bindings; is this possible to fix?)...

At that point it might be just easier to convince the model to write JS directly


You can run libraries like Pandas in WebAssembly in Pyodide - in fact Pandas works already. Here's a demo I built with it a while ago: https://tools.simonwillison.net/pyodide-bar-chart

It's not too hard to compile a C extension for Python to WebAssembly and bundle that in a .so file in a wheel. I did an experiment with that the other day: https://github.com/simonw/tiny-haversine?tab=readme-ov-file#...


I would love for the component model tooling to reach that level of maturity.

Since the runtime uses standard WASI and not Emscripten, we don't have that seamless dynamic linking yet. It will be interesting to see how the WASI path eventually converges with what Pyodide can do today regarding C-extensions.


I understand your point. I added native Python support because C extensions will eventually become compatible. Also, we might see more libraries built with Rust extensions appearing, which will be much easier to port to Wasm.


Wow this is really cool


Creator of Browser Use here. This is cool, a really innovative approach with ARIA roles. One idea we have been playing around with a lot is just giving the LLM raw HTML and a really good way to traverse it - no heuristics, just BS4. Seems to work well, but it is much more expensive than the current production-ready [index]<div ... notation.


Thanks!

I actually tried raw HTML when I was exploring solutions. It worked for "one-off" tasks, but I ran into major issues with replayability on modern SPAs.

In React apps, the raw DOM structure and auto-generated IDs shift so frequently that a script generated from "Raw HTML" often breaks 10 minutes later. I found ARIA/semantics to be the only stable contract that persists across re-renders.
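To make that concrete, here is a minimal sketch using only Python's standard `html.parser`. The two renders, the `RoleCollector` class, and the tiny `IMPLICIT_ROLES` table are hypothetical, made up for illustration; a real implementation would read the browser's accessibility tree rather than parse markup. The point is just that the (role, name) pairs survive a re-render while the generated IDs and wrappers do not.

```python
from html.parser import HTMLParser

# Illustrative subset of implicit ARIA roles; not a complete mapping.
IMPLICIT_ROLES = {"button": "button", "nav": "navigation", "main": "main"}


class RoleCollector(HTMLParser):
    """Collects (role, aria-label) pairs and raw id attributes from markup."""

    def __init__(self):
        super().__init__()
        self.semantic = []  # (role, accessible name) pairs
        self.ids = []       # raw id attributes, typically auto-generated

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # An explicit role attribute wins over the tag's implicit role.
        role = attr.get("role") or IMPLICIT_ROLES.get(tag)
        name = attr.get("aria-label")
        if role and name:
            self.semantic.append((role, name))
        if "id" in attr:
            self.ids.append(attr["id"])


def snapshot(html):
    """Return (semantic pairs, ids) for one render of the page."""
    parser = RoleCollector()
    parser.feed(html)
    parser.close()
    return parser.semantic, parser.ids


# Two hypothetical renders of the same UI: wrappers and generated ids differ.
RENDER_A = '<div id=":r1:"><button id=":r2:" aria-label="Submit order">Go</button></div>'
RENDER_B = (
    '<section><div id=":r7:"><span>'
    '<button id=":r9:" aria-label="Submit order">Go</button>'
    '</span></div></section>'
)

sem_a, ids_a = snapshot(RENDER_A)
sem_b, ids_b = snapshot(RENDER_B)

print("semantic contract stable:", sem_a == sem_b)
print("generated ids stable:", ids_a == ids_b)
```

A script keyed on `("button", "Submit order")` would still find its target in the second render, while one keyed on `:r2:` would not.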

You mentioned the raw HTML approach is "expensive". Did you feed the full HTML into the context, or did you create a BS4 "tool" for the LLM to query the raw HTML dynamically?


Hey HN,

We heard a lot of complaints about Browser Use being slow. Over the last few weeks we focused heavily on improving speed while keeping the same accuracy.

Go try it out. It's really fun to see it glide across the web.


Congratulations! I was one of the people who complained. Not only was it slow, it also couldn't complete basic tasks like logging in to a site.

Has it been tested in the wild on speed and accuracy?


Browser Use creator here; we are working on prototypes like this but always find ourselves stuck on the safety vs. freedom question. We are well aware of how easy it is to inject stuff into the browser and do something malicious, hence sandboxed browsers still seem like a very good idea. I guess in the long run we will not even need browsers, just an agent that does stuff in the background. Is there any good research on guardrails to prevent “go to my bank and send the money to a Nigerian prince” style prompts?


AGI was just 1 bash for loop away all this time I guess. Insane project.


Less flippantly, that was sort of my thought. I’m probably a paranoid idiot, and I’m not really sure I can articulate this idea properly, but I can imagine a less concise but broader prompt and an agent configured so that it has privileges you don’t want it to have, or a path to escalate them. It’s not quite AGI, but it’s a virus on steroids: a company or resource (think utilities) killer. I hope I’m just missing something, but these models seem pretty capable of wreaking all kinds of havoc if they just keep looping and have access nobody in their right mind wants them to have.


Just need to add ID.md, EGO.md and SUPEREGO.md and we're done.


was deeply unsettling among other things


It is, isn't it mate? Shit, I stumbled upon Ralph back in February and it shook me to the core.


Not that I want to be shaken, but what is Ralph? A quick search showed me some marketing tools, but that can’t be what you are referring to, is it?


Ralph is a technique. The stupidest technique possible. Running an agent in a while true loop. https://ghuntley.com/ralph
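For anyone who wants to see the shape of it, here is a rough sketch. The linked post describes an unbounded loop; this version is deliberately capped at `max_iterations` so it terminates, and `run_agent`, `fake_agent`, and the prompt are hypothetical stand-ins rather than any real agent CLI or API.

```python
from typing import Callable, List


def ralph_loop(
    run_agent: Callable[[str], str],
    prompt: str,
    max_iterations: int = 5,
) -> List[str]:
    """Feed the same prompt to an agent over and over.

    The original idea is `while True`; the cap here is an added safety
    assumption so the sketch cannot run forever.
    """
    outputs = []
    for _ in range(max_iterations):
        # Each pass is independent: same prompt, fresh invocation.
        outputs.append(run_agent(prompt))
    return outputs


def fake_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real coding agent invocation."""
    return f"handled: {prompt}"


results = ralph_loop(fake_agent, "fix the next failing test", max_iterations=3)
print(len(results), "iterations completed")
```

In practice `run_agent` would shell out to whatever agent you use, and any state carried between iterations would live in the working directory rather than in the loop itself.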


Re side-note: if you know anyone who would be willing to interact, connect me :)


i was going to ask the same to you. :-)

i'm just stubborn enough to find out, though. and i still have a few contacts at the googleplex...


I really like both nodriver and pydoll. I am definitely keeping the option of switching to them open, but we just wanted to have full control for now and see how painful CDP-use is to maintain first and then reconsider.


I mean... Playwright was built and is maintained by Microsoft, so I don't think the VC money argument really makes sense here.

By the very nature of how Playwright is built we can't contribute to it - it runs inside a JS subprocess and does not expose a bunch of CDP APIs that we NEED (for example to make cross-origin iframes work).
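As a rough illustration of what talking to CDP directly looks like, the sketch below only builds the JSON messages that would go over the DevTools WebSocket; it does not open a connection. Picking `Target.setAutoAttach` and `Page.getFrameTree` as the calls relevant to cross-origin iframes is an assumption made for this example, not a statement of which APIs are actually needed here, and `cdp_command` and the session ID are hypothetical.

```python
import itertools
import json

# CDP commands carry a client-chosen numeric id used to match responses.
_ids = itertools.count(1)


def cdp_command(method, params=None, session_id=None):
    """Serialize one CDP command as the JSON text sent over the WebSocket."""
    message = {"id": next(_ids), "method": method, "params": params or {}}
    if session_id is not None:
        # In flattened mode, sessionId routes the command to a child target.
        message["sessionId"] = session_id
    return json.dumps(message)


# Ask the browser to attach to child targets automatically; out-of-process
# (cross-origin) iframes show up as separate targets.
auto_attach = cdp_command(
    "Target.setAutoAttach",
    {"autoAttach": True, "waitForDebuggerOnStart": False, "flatten": True},
)

# Query the frame tree inside one attached session (placeholder session id).
frame_tree = cdp_command("Page.getFrameTree", session_id="HYPOTHETICAL-SESSION-ID")

print(auto_attach)
print(frame_tree)
```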

