Hacker News | gluejar's comments

I should hit Matt up for a donation.


later this year


Amazing!!! As ereaders get faster and with colour, this could make books from the Project even more attractive. I love the work of your team. Thank you.


A silly legal tribunal confused PG with pirate sites. We sent the tribunal a letter pointing out their error, but it was ignored. The block was served on local DNS providers, so many Italian users evade it by using DNS from Google or Cloudflare.


Only 20% of our books have original publication data in the db. We have a project to add another 40% or so from another database; let us know if you want to help.


Yes, I am willing to help. Please include me in your efforts. Thank you for this.


Only 20% of our books have original publication data in the db. We have a project to add another 40% or so from another database; let us know if you want to help.


I have the same problem on catholiclibrary.org, but insist on having something as the book date for every work. My solution is to temporarily default to the author dates until the book date can be refined. If there is no known author date I at least have a date range, hopefully to century or better.

Author dates are a much smaller data set, can generally be supplemented from public MARC records (VIAF, LoC, etc.; I don't do that, but it's an option), and at least provide basic filtering / sorting.
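A minimal sketch of that fallback order, for anyone who wants to try it. The `Work` record and its field names (`book_year`, `author_birth`, `author_death`, `range_start`, `range_end`) are hypothetical, made up for illustration; this is not the catholiclibrary.org code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Work:
    # Hypothetical record layout; every field name here is an assumption.
    title: str
    book_year: Optional[int] = None      # known publication year, if any
    author_birth: Optional[int] = None   # author dates used as a stand-in
    author_death: Optional[int] = None
    range_start: Optional[int] = None    # coarse range, e.g. a century
    range_end: Optional[int] = None


def sort_date(work: Work) -> Tuple[Optional[int], Optional[int], str]:
    """Return (earliest, latest, source) using the best date available."""
    # 1. Prefer the real book date when it is known.
    if work.book_year is not None:
        return (work.book_year, work.book_year, "book")
    # 2. Otherwise default to the author dates until the book date is refined.
    if work.author_birth is not None or work.author_death is not None:
        return (work.author_birth, work.author_death, "author")
    # 3. Last resort: a date range, hopefully to the century or better.
    return (work.range_start, work.range_end, "range")
```

Returning the source alongside the dates makes it easy to find the records that still rely on a placeholder and need refining later.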


We have a tarball of all text files; the link is posted somewhere here.


Traffic yesterday ~20% more than recent average.

4971601 sessions
177 robots
863462 robot files
3390115 user files
20.30% robot files (robots id'd based on requests/ip address)

5 Apache servers for static content, 1 CherryPy server for dynamic content hosted at iBiblio.
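For anyone checking the numbers: the 20.30% figure seems to be robot files as a share of all files served (robot files plus user files). That reading is an assumption, not something stated in the stats, but the arithmetic matches:

```python
# Figures copied from the stats above.
robot_files = 863_462
user_files = 3_390_115

# Assumption: "robot files %" means robot files / (robot files + user files).
total_files = robot_files + user_files
robot_share = robot_files / total_files

print(f"{total_files} files served, {robot_share:.2%} to robots")
```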


We're using git repos internally to keep history for each book. They existed on GitHub for a while, but our implementation was awkward and too big a project for the volunteer dev team. But it's likely that we'll evolve towards that.


Check out Distributed Proofreaders: https://pgdp.net


I didn't realize DP was still around. I used to do it quite a bit, 15 years ago, but OCR has improved considerably since then.


OCR has improved a lot since then, but OCR is just step 1 of reading in text. OCR engines still make a lot of errors (even now, especially on old worn-out paper pages), and even if they didn't, one has to format the book, deal with footnotes, sidenotes, illustrations, etc. DP is very active, and we will welcome you back with open arms :)


Wikipedians, please help update this article.


In what way? And from what sources? (Wikipedia as a tertiary source is supposed to be a summary of information present in reliable secondary sources — see for instance https://en.wikipedia.org/wiki/Wikipedia:Based_upon. So if the information on the Wikipedia article is incomplete or out of date, where is the correct information available?)


There's quite a lot of information here: https://www.gutenberg.org/about/ All our text is now utf-8. No Plucker! Almost every book is HTML(5).


Good question. Eric, any pointers?

