
Of course Acrobat Reader doesn't handle it well, since it's an inherent design flaw of the format, despite your trying to deny the obvious. Just tried it: same issue, a three-line paragraph is pasted as three separate lines.

> PDF file format (which supports semantic paragraph tags, for example).

These are called newlines, and they have pretty widespread support outside of some paper pockets of resistance! You only need other semantic tags because the format fails at the basics.



Example PDF? Because I tried it too and it worked. Does your PDF use tags?


Any PDF from a generic Google search?

Here is one from Adobe https://www.adobe.com/support/products/enterprise/knowledgec...

Or even better: their annual investor docs, which a team of professionals has spent time carefully preparing...

like this https://www.adobe.com/pdf-page.html?pdfTarget=aHR0cHM6Ly93d3...

(But don't look at the annual report. That marvel of a public disclosure document not only fails to copy and paste paragraphs, it also shows another nice niche use of PDF: you get garbage characters instead of text. Rather ironic.)

https://www.adobe.com/pdf-page.html?pdfTarget=aHR0cHM6Ly93d3...


I tried a few documents and got the same result (i.e. each line treated as a separate paragraph), but I found that the Fed FOMC meeting doc[1] actually worked properly, though only in Adobe Acrobat. It was still broken in pdf.js. So I guess the format itself technically supports it, but implementations rarely do it properly.

[1] https://www.federalreserve.gov/mediacenter/files/FOMCprescon...
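For what it's worth, when a PDF has no paragraph tags, a viewer can only guess paragraph boundaries from where the lines sit on the page. Here is a rough sketch of that kind of guess, assuming each extracted line comes with a vertical position. The `Line` type, `group_paragraphs`, and the `line_height` and `gap_factor` parameters are all made up for illustration; this is not how Acrobat or pdf.js actually do it.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    y: float  # hypothetical: distance from the top of the page, in points


def group_paragraphs(lines, line_height=12.0, gap_factor=1.5):
    """Join consecutive lines into paragraphs, splitting on large vertical gaps.

    Assumption: lines arrive in reading order, and a gap larger than
    line_height * gap_factor means a new paragraph starts.
    """
    paragraphs = []
    current = []
    prev_y = None
    for line in lines:
        if prev_y is not None and (line.y - prev_y) > line_height * gap_factor:
            paragraphs.append(" ".join(current))
            current = []
        current.append(line.text.strip())
        prev_y = line.y
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    page = [
        Line("A paragraph of", 100.0),
        Line("three lines that", 112.0),
        Line("should paste as one.", 124.0),
        Line("A second paragraph.", 148.0),
    ]
    for paragraph in group_paragraphs(page):
        print(paragraph)
```

A heuristic like this breaks on multi-column layouts, indented first lines, and unusual leading, which is probably part of why results differ so much between viewers.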


The first two work just fine in Adobe Acrobat Reader on iOS. The third is garbage, probably because the producer didn't include a ToUnicode map or equivalent.

The format supports a lot that is not commonly implemented by PDF readers (or PDF producers).
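To illustrate the ToUnicode point with a toy model: text in a PDF content stream is stored as character codes that select glyphs in a font, and a ToUnicode map is what tells an extractor which Unicode text each code stands for. The sketch below is a simplification under that assumption. `extract_text`, the codes, and the dictionary are hypothetical and do not parse a real PDF or CMap.

```python
def extract_text(codes, to_unicode=None):
    """Turn glyph codes into text.

    With a code-to-Unicode mapping (standing in for a ToUnicode map),
    each code is looked up. Without one, this toy extractor falls back to
    treating the raw code as a code point, which yields garbage.
    """
    if to_unicode is None:
        return "".join(chr(code) for code in codes)
    # U+FFFD marks codes the mapping does not cover.
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


# Hypothetical subset font: glyphs were numbered in order of first use.
codes = [1, 2, 3, 3, 4]
to_unicode = {1: "H", 2: "e", 3: "l", 4: "o"}

if __name__ == "__main__":
    print(repr(extract_text(codes, to_unicode)))  # mapped text
    print(repr(extract_text(codes)))              # raw codes, unreadable
```

The page can still render correctly in both cases, because rendering only needs the glyph shapes. Copy and paste is what depends on the mapping.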


How does this help me on Windows?

And a good format wouldn't require any ToUnicode maps for simple text in the first place.

And supporting a lot of features poorly, without common implementations, isn't a defence against the charge of high complexity and bad design, but a reinforcement of it.

(Also, no, the first document doesn't work on iOS: I select the title and two paragraphs, copy, paste, and get a single line instead of three, so it's a different manifestation of the same common PDF failure.)


“Here’s a nickel kid, get yourself a better OS”?

Still, the fact that some PDF processors can make this work shows that the format isn’t broken “by design”.



