Hacker News | past | comments | ask | show | jobs | submit | leenify's comments

Since the content is rendered on the front end, how do you deal with tags where the server does not even have the information to decide how they should be parsed?

This seems rather ignorant and, in my experience, leads to security issues such as CVE-2023-38500 or CVE-2023-23627. This is not decidable on the server side, so you will always get cases like this wrong. For HTML, sanitization can only work properly on the client.


Thank you for the effort to bring this to life together with Freddy!


Are you certain that this is secure? What about parsing depth/DOM clobbering, etc?

See https://mizu.re/post/exploring-the-dompurify-library-bypasse... for an example of why this is really hard. Please do not roll your own sanitizers; DOMPurify has very good maintenance hygiene, and the maintainer is an expert. I have reported a bunch of issues in the past and never waited more than two hours for a response. He is also one of the leading authors of the specification behind `setHTML`.


My code accepts only a very limited subset of HTML tags and their respective attributes (<a>, <img>, <font>, <br>, <b>, <strong>, <i>, <em>, <del>, <s>, <u>, <p>, <hr>, <li>, <ul>, <ol>).

I could easily add more, like headings or tables; I just decided not to overwhelm the readers. All of the allowed elements and attributes here are harmless. When copying, I copy only the known-safe elements and attributes and forbid everything unknown: scripts, event handlers, style attributes, ids, even classes. I have fine-grained control over the allowed elements, attributes, and structure, which makes things much easier. For basic HTML content management this kind of filtering is fine, since DOMParser does the heavy lifting.
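A minimal sketch of this kind of allowlist filtering (not the actual code from the post — in a browser you would walk the tree returned by `new DOMParser().parseFromString(html, "text/html")`; here the same walk runs over a hypothetical plain-object node tree so the logic is visible on its own):

```javascript
// Hypothetical sketch of the described allowlist filtering. In a browser
// the input would be the tree from DOMParser; here a plain-object tree
// ({ tag, attrs, children } or { text }) keeps it runnable anywhere.

// Per-tag attribute allowlist: anything not listed is dropped.
const ALLOWED = {
  a: ["href", "title"],
  img: ["src", "alt", "width", "height"],
  b: [], strong: [], i: [], em: [], del: [], s: [], u: [],
  p: [], br: [], hr: [], li: [], ul: [], ol: [],
};

// Only http(s) and mailto URLs; blocks javascript: and data: schemes.
// (A real version would also need to handle relative URLs.)
const SAFE_URL = /^(?:https?:|mailto:)/i;

function sanitize(node) {
  if (node.text !== undefined) return { text: node.text }; // text nodes are kept
  const allowedAttrs = ALLOWED[node.tag];
  if (allowedAttrs === undefined) return null;             // unknown tag: drop subtree
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs ?? {})) {
    if (!allowedAttrs.includes(name)) continue;            // unknown attribute: drop
    if ((name === "href" || name === "src") && !SAFE_URL.test(value)) continue;
    attrs[name] = value;
  }
  const children = (node.children ?? []).map(sanitize).filter((c) => c !== null);
  return { tag: node.tag, attrs, children };
}
```

The important parts are that an unknown tag drops its whole subtree, unknown attributes are silently discarded, and URL-bearing attributes are checked against a scheme allowlist so a `javascript:` URL cannot slip through an otherwise harmless `<a href>`.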

Sure, DOMPurify is powerful and handles much more complex use cases (doesn't it also use DOMParser, though?), no doubt about that. But a basic CMS probably only has to handle basic HTML text elements. I guess inline SVG sanitization is more complicated (maybe just use an ordinary <img> instead?).

If you have an HTML example that will inject JS/CSS or cause any unexpected behavior in my code, please share it.


Their example only supports a very small subset of HTML, which makes the problem much easier.


> The real problems with Gecko is just that it’s harder to fork

That runs contrary to my experience. I'm a maintainer of a Firefox fork (with rather extensive changes to a lot of the internals), and it is pretty manageable to maintain. We keep it roughly up to date and add new features without financial backing or anyone working full-time on it.

If all you do is change the branding and apply some superficial patches, Chromium might be doable, but that is hardly a new browser. Everybody I know who forked Chromium (mostly research/security-testing people) gave up due to the constant churn.

For this reason, in my experience, Firefox forks are much easier to maintain once you start changing internal things. Firefox changes at a slower pace, which makes keeping up to date much more manageable, but that also has its drawbacks, as it does not support every crazy feature Google pushes out, e.g., WebUSB. And, for example, folks I know maintained a V8 fork that was shelved because the introduction of Torque (which has spotty public documentation, to be very kind) meant a complete rewrite.


I think you're conflating the engine, Gecko, with the browser, Firefox. The browser, Firefox, is easy to fork, judging by the multitude of soft forks à la LibreWolf, Zen, etc.

The engine, Gecko, however, is hard to fork, since it's tightly coupled to the browser itself.

I also think that when the parent mentioned "forking Gecko", it might be in the sense of extracting the engine and putting a new browser on top of it, just like other WebKit-based browsers, e.g., Orion and GNOME Web.


That's fair, but most of these browsers simply fork Chromium and make their changes there instead of genuinely building something new on top of Blink.


That's interesting. Thanks for sharing.

Why do you think that very few projects adopt Gecko then?


I'm personally somewhat uncertain about the future of Mozilla, and there are also compatibility issues and a lack of mindshare.

Also, I feel working with the Chromium codebase is easier if you only apply superficial changes, as the linked browser does; the patch files are all very simple. Chromium is also generally less crufty, simply due to being newer (Mozilla is working on cleaning up a lot of ancient stuff, which causes us a lot of pain but is probably great in the long term), which might make it easier to get started. That said, I always felt the most significant hurdle (assuming you know C++ and JavaScript well enough to patch a browser) is getting the thing to build, and Mozilla does reasonably well on that front: building Firefox always felt less annoying than building Chromium.
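For anyone wondering how such patch-based forks are wired up mechanically, here is a generic, self-contained illustration (the file names are made up — this is not Helium's or any real fork's layout): the fork keeps a pristine upstream tree plus a directory of unified diffs, and the build step applies them in order.

```shell
# Hypothetical illustration of a patch-based fork workflow; file names
# are invented. Upstream stays pristine; the fork's changes live as
# .patch files that are applied before building.
set -eu
mkdir -p upstream patches

# Stand-in for the upstream checkout.
printf 'kEnableTelemetry = true;\n' > upstream/config.cc

# The fork's change, stored as a unified diff.
cat > patches/disable-telemetry.patch <<'EOF'
--- a/config.cc
+++ b/config.cc
@@ -1 +1 @@
-kEnableTelemetry = true;
+kEnableTelemetry = false;
EOF

# "Build" step: re-apply every patch on top of a fresh upstream tree.
for p in patches/*.patch; do
  patch -p1 -d upstream < "$p"
done

cat upstream/config.cc   # now: kEnableTelemetry = false;
```

When upstream moves, the fork "rebases" by re-running the patches against the new tree and fixing whichever hunks no longer apply, which is exactly the churn mentioned upthread.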


How do those patch files work? I was trying to read the Helium codebase, but it is just full of .patch files.


Hmm, is that really true? I spoke at BH last year and was not required to submit a paper. And judging by the briefings link, there is no paper link, only slides and the tool.


That can work, but it can also bring quite a few issues. Mozilla effectively does this: its build process downloads the build toolchain, including a specific Clang version, during bootstrap, i.e., while setting up the build environment.

This is super nice in theory, but it gets murky if you veer off the "I'm building current mainline Firefox" path. For example, I'm a maintainer of a Firefox fork that often lags a few versions behind. It has substantial changes, and we are only two guys doing the major work, so keeping up with current changes is not feasible. However, this is a research/security-testing-focused project, so that is generally okay.

However, coming back to the build issue: apparently it's costly to host all those toolchain archives, so they frequently get deleted from the remote repository. That leads to the build only working on machines that downloaded the toolchain earlier (i.e., not a GitHub Actions runner, for example).

Given that there are many more downstream users, spread across effectively a ton of kernel versions, this quickly gets fairly expensive and takes a ton of effort unless you pin the toolchain to some old version and rarely change it.

So, as someone who wants to mess around with open-source projects, their supporting more than one specific compiler version is actually quite nice.


Conceptually, it's no different from any other build dependency. It is not expensive to host many versions; $1 is enough to store over 1000 compiler versions, which would be overkill for the needs of the kernel.

