Hacker News | stzups's comments

>> it's significantly more difficult with HTML

Right Click > Save as

Try it with this page!


> Right Click > Save as

> Try it with this page!

Say hello to your new sidecar directory (or broken CSS/images/God knows what else)!
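For anyone wondering where the sidecar directory comes from: a saved page still points at separate stylesheet, script, and image files. Here is a minimal sketch that lists those references using only the Python standard library. The sample markup and the `ResourceCounter` name are made up for illustration, and it only looks at `link`, `script`, and `img` tags, so treat it as a rough count rather than a full audit.

```python
from html.parser import HTMLParser


class ResourceCounter(HTMLParser):
    """Collects (tag, url) pairs for files a page pulls in separately."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Scripts and images reference their file through "src".
        if tag in ("script", "img") and attributes.get("src"):
            self.resources.append((tag, attributes["src"]))
        # Stylesheets (and icons, etc.) reference theirs through "href".
        elif tag == "link" and attributes.get("href"):
            self.resources.append((tag, attributes["href"]))


# Hypothetical page, standing in for whatever "Save as" wrote to disk.
page = """<html><head>
<link rel="stylesheet" href="site.css">
<script src="app.js"></script>
</head><body><img src="logo.png"><p>Text</p></body></html>"""

counter = ResourceCounter()
counter.feed(page)
for tag, url in counter.resources:
    print(tag, url)
```

Every entry it prints is one more file that has to survive alongside the HTML for the saved copy to render as it did online.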

I tried to save an NY Times article, and it 1) needed JS to display anything, 2) even with the sidecar stuff was broken, 3) it was so plastered with ads and other junk I thought it was incomplete (it wasn't, I just had to scroll waaay down past something that looked like a footer and some voids after that).

If you save a PDF, you get that exact PDF on your hard drive, and when you open it (even in 10 years) it will look exactly the same as it did on the site.

With PDF, it's WYSIWYS: what you see is what you save.


This is of course the point of the article - that the web is a giant steaming pile of shit for the most part, plagued by JS and external resource requirements, all of which contribute to massive total page size.

I'll preface by saying I have some expertise in HTML, but none in PDF (the format).

Most commenters who suggest that HTML is still a better alternative than PDF (I agree) are assuming that, if this is an important issue to you, you would craft your page in a simpler style than most of what we see on the web, making Print to PDF or Save As... more viable.

  > PDFs and a PDF tool ecosystem  exist today. No need for another ghost town   GitHub   repo   with   a   promising   README   and   v0.1   in progress.
This is news to me. I'm not sure that I buy it. PDFs have always been a pain in the ass to work with in my opinion. Maybe there are tools, but in my experience they aren't very good.

In general, we know that HTML is going to be much more compact (and compressible!) than PDF and that's the biggest advantage I see on a web where bandwidth still matters. Another downside shows itself when you try to copy and paste the above quote: PDF formatting seems to be weird.
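A rough way to check the "compressible" half of that claim is to gzip some markup and compare sizes. This is only a sketch: the page below is a made-up, highly repetitive one (so it flatters HTML), `gzip_ratio` is a helper name invented here, and it says nothing about how a PDF of the same content would compare.

```python
import gzip


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(data)) / len(data)


# Hypothetical page: one paragraph repeated 200 times inside a bare skeleton.
paragraph = "<p>Plain markup repeats the same tags and words over and over.</p>\n"
html = (
    "<!DOCTYPE html><html><body>\n" + paragraph * 200 + "</body></html>\n"
).encode("utf-8")

ratio = gzip_ratio(html)
print(f"{len(html)} bytes raw, compressed to {ratio:.1%} of that")
```

Real pages are less repetitive than this, so expect a less dramatic ratio on actual content.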


> In general, we know that HTML is going to be much more compact (and compressible!) than PDF and that's the biggest advantage I see on a web where bandwidth still matters.

PDFs can be tiny if they do not embed fonts. Serving fonts is very much a complex technology in the HTML world.

Browsing the web is a pain in the ass if you don't use a browser compliant with up-to-date standards, but the whole "HTML can be lightweight" argument pretty much depends on avoiding much of today's standardisation. As an objection to the original argument, it is not comparing like with like.


> This is news to me. I'm not sure that I buy it. PDFs have always been a pain in the ass to work with in my opinion. Maybe there are tools, but in my experience they aren't very good.

> In general, we know that HTML is going to be much more compact (and compressible!) than PDF and that's the biggest advantage I see on a web where bandwidth still matters. Another downside shows itself when you try to copy and paste the above quote: PDF formatting seems to be weird.

PDF is a display format. I once worked on a project parallel to a guy who was parsing PDF to extract text content. IIRC, text in PDFs is stored in a way that works fine for printing/rendering but not so well for manipulation (e.g. it's a bunch of commands to render line Z at position X,Y with font W). Those commands don't have to be in reading order, nor do they carry the semantic meaning you can get from markup like HTML (e.g. a superscript can be nothing more than a separate line rendered in a smaller font).
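To make that concrete, here is a toy sketch of what an extractor has to do when it gets positioned runs in arbitrary order. It is not a PDF parser: `TextRun`, `reading_order`, and the sample coordinates are all invented for illustration, and it assumes a single column with y increasing up the page, as in PDF's default coordinate space.

```python
from typing import List, NamedTuple


class TextRun(NamedTuple):
    """One hypothetical 'draw this text at (x, y) with this font size' command."""
    x: float
    y: float
    font_size: float
    text: str


def reading_order(runs: List[TextRun], line_tolerance: float = 2.0) -> List[str]:
    """Group runs into lines by similar y, then order each line left to right."""
    # Higher y is nearer the top of the page, so sort by descending y.
    top_to_bottom = sorted(runs, key=lambda run: -run.y)
    lines: List[List[TextRun]] = []
    for run in top_to_bottom:
        # Same line if the baseline is within the tolerance of the line's first run.
        if lines and abs(lines[-1][0].y - run.y) <= line_tolerance:
            lines[-1].append(run)
        else:
            lines.append([run])
    return [
        " ".join(run.text for run in sorted(line, key=lambda run: run.x))
        for line in lines
    ]


# Runs deliberately listed out of reading order.
runs = [
    TextRun(72, 700, 12, "Second line of the page."),
    TextRun(150, 720, 12, "in reading order"),
    TextRun(72, 720, 12, "Runs need not be"),
]
print(reading_order(runs))
```

Even this guesswork breaks down quickly: a superscript raised by more than the tolerance comes out as its own "line", and multi-column layouts interleave. With HTML, the markup would have told you the structure directly.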

IMHO, PDF is actually less optimal than HTML for what this guy is advocating, except that it's precisely those limitations that have prevented PDF from becoming the mess that Web HTML has. Though that's probably in large part because the bloaters have been too distracted by the easier target that is HTML to bother.


Yeah, no. Try it with any other page, and see why nobody would be inclined to even try "Save As.." a web page anymore.


I actually did this pretty recently, in an attempt to get some magazine articles onto my Kobo e-book reader since Pocket couldn’t fetch the paywalled ones (I do pay).

I figured I could just save the page, automate a few edits to get around dynamic stuff, and then use it as, you know, an HTML document.

Even with a nice friendly mostly-text literary magazine, after about five hours I gave up and just copy-pasted the rendered text.


> >> it's significantly more difficult with HTML

> Right Click > Save as

> Try it with this page!

HN is not a good site to illustrate the unpleasantnesses of navigating the modern web. As you'd hope for a hacker news site, it is very friendly to this sort of thing. Most sites aren't.

