
Is there a list of web page archive formats I could look at? There are a few things I'd love to do where it would be very handy to have one file per page.


The main archive formats for web content are WARC, ZIM, Memento, and static HTML (e.g. from a tool like wget or SingleFile).

If you want one file per URL, I recommend SingleFile.
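
To give a rough idea of what "one file per page" means in practice: a single-file HTML archive embeds the page's resources in the HTML itself, typically as base64 data: URIs. Below is a minimal sketch of that idea for images only. It is not SingleFile's actual code; inline_images and the fetch callback are hypothetical names made up for illustration, and a real tool also handles CSS, fonts, scripts, frames and so on.

```python
import base64
import mimetypes
import re
from typing import Callable
from urllib.parse import urljoin

# Hypothetical helper for illustration; not part of SingleFile or any other tool.
# Matches the src attribute of <img> tags (quoted values only).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(.*?)\2', re.IGNORECASE | re.DOTALL)


def inline_images(html: str, base_url: str, fetch: Callable[[str], bytes]) -> str:
    """Return html with each <img src> replaced by a base64 data: URI.

    fetch is any callable that maps an absolute URL to the resource bytes
    (an assumption of this sketch, so it can run without network access).
    """
    def replace(match: re.Match) -> str:
        prefix, quote, src = match.groups()
        if src.startswith("data:"):
            return match.group(0)  # already inlined, leave untouched
        absolute = urljoin(base_url, src)  # resolve relative paths
        mime = mimetypes.guess_type(absolute)[0] or "application/octet-stream"
        payload = base64.b64encode(fetch(absolute)).decode("ascii")
        return f"{prefix}{quote}data:{mime};base64,{payload}{quote}"

    return IMG_SRC.sub(replace, html)
```

You would pass in the page's HTML, its URL, and a function that downloads bytes (for example a thin wrapper around urllib.request.urlopen), then write the returned string to a single .html file.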

Lots more info here if you want to compare different software options: https://github.com/ArchiveBox/ArchiveBox/wiki/Web-Archiving-...



