
It would be a fun test to run. But I'm not encouraged by the fact that the existing brotli dictionary already contains a bunch of javascript specific stuff:

https://gist.github.com/klauspost/2900d5ba6f9b65d69c8e

brotli literally already has tokens for function/return/throw/indexOf(/.match/.length/etc.

Also, verifying after decompression is not without tradeoffs. On one hand we have folks like github who can't change the version of zlib because people rely on identical .tar.gz. https://hackernews.hn/item?id=34586917

On the other hand, there is a whole lot of iffy stuff you can do to make programs that decompress content use large amounts of resources (https://en.wikipedia.org/wiki/Zip_bomb), which makes "decompress this potentially untrusted file so that I can validate it's safe to use" hard.
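To make the resource problem concrete, here is a minimal sketch of one common mitigation: inflating with a hard output cap so a bomb fails early instead of eating memory. It uses Python's stdlib zlib; the helper name `bounded_decompress` and the limit values are made up for illustration, and a real validator would likely also want CPU and time limits.

```python
import zlib

def bounded_decompress(data: bytes, limit: int, chunk: int = 64 * 1024) -> bytes:
    """Inflate a zlib stream, raising ValueError if output exceeds `limit` bytes."""
    d = zlib.decompressobj()
    out = bytearray()
    buf = data
    while not d.eof:
        # Never request more than one byte past the remaining budget.
        room = limit - len(out) + 1
        piece = d.decompress(buf, min(chunk, room))
        out += piece
        if len(out) > limit:
            raise ValueError("decompressed size exceeds limit")
        # Input that was held back because of max_length must be fed in again.
        buf = d.unconsumed_tail
        if not piece and not buf:
            raise ValueError("truncated or incomplete stream")
    return bytes(out)

# About 10 MB of zeros compresses to a few KB: a tiny stand-in for a bomb.
bomb = zlib.compress(b"\0" * 10_000_000)
try:
    bounded_decompress(bomb, limit=1_000_000)
    rejected = False
except ValueError:
    rejected = True
```

The cap only bounds output size; it does not tell you whether the content is safe once decompressed.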



> brotli literally already has tokens for function/return/throw/indexOf(/.match/.length/etc.

Yeah, I see it already has a lot of JavaScript, HTML and CSS content. Interesting. I didn't realize it had an existing web-focused token library, and figured it was more like zstd, 7z and zlib, which I believe have none.

I would love to do the experiment if I had time. I wonder what is the laziest way to do it?
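One possibly lazy version, as a rough sketch: zlib supports a preset dictionary (`zdict` in Python's stdlib), so you can compare compressed sizes with and without one. This is a stand-in for the idea, not brotli's built-in dictionary or zstd's dictionary training, and the dictionary and sample below are made-up toy inputs.

```python
import zlib

def deflate_size(payload: bytes, zdict: bytes = b"") -> int:
    """Return the zlib-compressed size of payload, optionally primed with zdict."""
    if zdict:
        c = zlib.compressobj(level=9, zdict=zdict)
    else:
        c = zlib.compressobj(level=9)
    return len(c.compress(payload) + c.flush())

def roundtrip(payload: bytes, zdict: bytes) -> bytes:
    """Compress and decompress with the same preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict)
    blob = c.compress(payload) + c.flush()
    # The decompressor must be given the exact same dictionary bytes.
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()

# Hypothetical toy dictionary of common JavaScript tokens (an assumption).
JS_DICT = b"function return throw .indexOf( .match( .length"
sample = b"function f(a){if(a.length){return a.indexOf(1)}throw a.match(/x/)}"

plain = deflate_size(sample)
primed = deflate_size(sample, JS_DICT)
print(f"no dictionary: {plain} bytes, with dictionary: {primed} bytes")
```

For a real test you would swap in an actual corpus of JavaScript files and a dictionary built from a separate training set, since a toy sample like this overstates the benefit.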


I've done that experiment with zstd before.

https://github.com/facebook/zstd/blob/dev/programs/zstd.1.md...

Not sure about brotli though.


> On one hand we have folks like github who can't change the version of zlib because people rely on identical .tar.gz.

People rely on the same checksums for the same files. They’re perfectly fine with changing the method for new files.
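A small sketch of why the checksum is tied to the compressor and not just the content: the same bytes compressed with different settings decompress identically but hash differently. It uses plain zlib streams as a stand-in for .tar.gz, and the payload is an arbitrary example.

```python
import hashlib
import zlib

payload = b"example repository contents\n" * 1000

# Same input, two compression levels (a stand-in for two zlib versions/settings).
fast = zlib.compress(payload, 1)
best = zlib.compress(payload, 9)

same_content = zlib.decompress(fast) == zlib.decompress(best)
fast_sum = hashlib.sha256(fast).hexdigest()
best_sum = hashlib.sha256(best).hexdigest()

print("same content after decompression:", same_content)
print("same archive checksum:", fast_sum == best_sum)
```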


The files (repo tarballs) are generated on demand, there are no “new” or “old” files.

I guess you could base it on the commit time, but this is user supplied.



