Hacker News | rorosen's comments

If seeking to frame boundaries is good enough for random access, this can already be done with zeekstd. The CLI sets a custom window log (ZSTD_c_windowLog) on the compression context when creating binary patches[1]; I regularly use it with a window size above 1G.

[1] https://github.com/rorosen/zeekstd/blob/main/cli/src/compres...


Nice, thanks.


It depends on the frame size you choose. Every frame requires a few bytes of additional metadata; how much exactly depends on other compression settings (e.g. frame checksums, which are 4 bytes each, are only present if enabled). I just tested with a 1G file and compression level 3: zstd compresses it to 559M, zeekstd with a 2M frame size to 565M. If I increase the frame size to 4M, zeekstd yields 562M.
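
To put those numbers in relative terms, here is a quick back-of-the-envelope calculation. It only uses the sizes quoted above; the function name is made up for illustration.

```python
# Sizes in M as quoted in the comment above (1G input, compression level 3).
ZSTD_SIZE = 559  # plain zstd output


def overhead_percent(seekable_size, baseline=ZSTD_SIZE):
    """Relative size increase of the seekable output over plain zstd, in percent."""
    return (seekable_size - baseline) / baseline * 100


for frame_size, size in [("2M", 565), ("4M", 562)]:
    extra = size - ZSTD_SIZE
    print(f"{frame_size} frames: +{extra}M ({overhead_percent(size):.2f}%)")
```

So for this one test file the overhead is roughly 1% with 2M frames and about half of that with 4M frames. Other inputs and settings may behave differently.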

I will add a section to the readme, this is a good question that other people might have too!


Zeekstd will just error when the seek table is corrupted. Scanning for frame boundaries should also be possible, though it isn't very efficient. If you don't need the seek table, you can just write it to /dev/null or not write it at all when using the lib.


You can decompress a complete file with "zeekstd d seekable.zst".

Piping a seekable file for decompression via stdin isn't possible, unfortunately. Decompressing a seekable file requires reading the seek table first (which is usually at the end of the file) and then seeking to the desired frame position, so zeekstd needs to be able to seek within the file.

If you want to decompress the complete file, you can use the regular zstd tool: "cat seekable.zst | zstd -d"
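
To illustrate why a pipe doesn't work, here is a minimal sketch of reading the seek table footer. This is not zeekstd's code; it assumes the footer layout from the seekable format description in the zstd repository (number of frames, a descriptor byte, then the magic number 0x8F92EAB1, all little-endian), and the function name is hypothetical.

```python
import io
import struct

SEEKABLE_MAGIC = 0x8F92EAB1  # assumed: last 4 bytes of a seekable file (little-endian)
FOOTER_SIZE = 9              # number of frames (4) + descriptor (1) + magic (4)


def read_seek_table_footer(f):
    """Return (frame_count, has_checksums) parsed from the last 9 bytes of f.

    f has to support seek(), which is exactly what a pipe cannot do.
    """
    if not f.seekable():
        raise ValueError("input must be seekable to locate the seek table")
    f.seek(-FOOTER_SIZE, io.SEEK_END)
    num_frames, descriptor, magic = struct.unpack("<IBI", f.read(FOOTER_SIZE))
    if magic != SEEKABLE_MAGIC:
        raise ValueError("no seek table footer found")
    # Assumed: the highest descriptor bit says whether per-frame checksums are stored.
    return num_frames, bool(descriptor & 0x80)


# Synthetic example: some payload bytes followed by a footer for 3 frames.
demo = io.BytesIO(b"payload" * 4 + struct.pack("<IBI", 3, 0x80, SEEKABLE_MAGIC))
print(read_seek_table_footer(demo))
```

The very first step is a seek relative to the end of the input, which stdin from a pipe cannot provide. Plain zstd doesn't need the table at all, so it can simply stream through the frames.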


But you can probably tail -F quite well! Which is perfect for logs (e.g. give me the last day so I can grep through it).


From what I can see zstd-seekable is more closely aligned to the C functions in the zstd repo.

The decompress function in zstd-seekable starts decompression at the beginning of the frame to which the offset belongs and discards data until the offset is reached. It also just stops decompression at the specified offset. Zeekstd uses complete frames as the smallest possible decompression unit, as only the checksum data of a complete frame can be verified.
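
The difference can be sketched with a small model. This uses neither library; it only assumes a list of per-frame decompressed sizes as a seek table would provide them, and both function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def frames_for_range(frame_sizes, start, end):
    """Return (first, last) indices of the frames covering decompressed bytes [start, end)."""
    ends = list(accumulate(frame_sizes))  # decompressed end offset of each frame
    total = ends[-1] if ends else 0
    if not 0 <= start < end <= total:
        raise ValueError("range outside of decompressed data")
    first = bisect_right(ends, start)     # frame containing byte `start`
    last = bisect_right(ends, end - 1)    # frame containing byte `end - 1`
    return first, last


def discard_prefix(frame_sizes, first, start):
    """Bytes to throw away at the beginning of frame `first` to reach offset `start`."""
    return start - sum(frame_sizes[:first])


sizes = [100, 100, 50]                    # three frames, 250 bytes decompressed
first, last = frames_for_range(sizes, 150, 210)
print(first, last, discard_prefix(sizes, first, 150))
```

An offset-based approach as described for zstd-seekable would start at frame `first`, drop the `discard_prefix` bytes and stop at `end`. A frame-based approach would hand back frames `first` through `last` in full, so each frame's checksum can still be verified.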


Writing the seek table to an external file is also possible with zeekstd, although the initial spec of the seekable format doesn't allow this.


Yes, dictionaries should be totally possible. However, I've never tried them to be honest because I usually only compress big files. They can be set on the (de)compression contexts the same way as with regular zstd.


It's not standardized as far as I know.


Thanks! I was also surprised that there are very few tools to work with the seekable format. I could imagine that at least some people have a use-case for it.

Yes, the name is a combination of zstd and seek. Funnily enough, I wanted to name it just zeek first before I knew that it already exists, so I switched to zeekstd. You're not the first person asking me if there is any relation to zeek and I understand how that is misleading. In hindsight the name is a little unfortunate.


Zeek is well known in "security" spaces, but not as much in "developer" spaces. It did get me a bit excited to see Zeek here until I realized it was unrelated, though :)


