gluejar · 2026-05-16T19:21:48 1778959308

I should hit Matt up for a donation.

gluejar · 2026-05-16T19:19:14 1778959154

later this year

zahirbmirza · 2026-05-16T20:27:36 1778963256

Amazing!!! As e-readers get faster and gain colour, this could make the Project's books even more attractive. I love the work of your team. Thank you.

gluejar · 2026-05-16T19:15:58 1778958958

A silly legal tribunal confused PG with pirate sites. We sent the tribunal a letter pointing out their error, but it was ignored. The block was served on local DNS providers, so many Italian users evade it by using DNS from Google or Cloudflare.

gluejar · 2026-05-16T19:00:30 1778958030

Only 20% of our books have original publication data in the db. We have a project to add another 40% or so from another database; let us know if you want to help.

Guestmodinfo · 2026-05-16T19:33:42 1778960022

Yes, I am willing to help. Please include me in your efforts. Thank you for this.

gluejar · 2026-05-16T18:59:30 1778957970

Only 20% of our books have original publication data in the db. We have a project to add another 40% or so from another database; let us know if you want to help.

sgc · 2026-05-17T13:58:47 1779026327

I have the same problem on catholiclibrary.org, but I insist on having something as the book date for every work. My solution is to temporarily default to the author dates until the book date can be refined. If there is no known author date, I at least have a date range, hopefully to the century or better.

Author dates are a much smaller data set, can generally be supplemented from public MARC records (VIAF, LOC, etc.; I don't do that, but it's an option), and at least provide basic filtering and sorting.
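A minimal sketch of that fallback order, in Python. The `Work` record and its field names are hypothetical, made up for illustration; they are not the actual catholiclibrary.org schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Work:
    # All field names are hypothetical, chosen only for this sketch.
    book_year: Optional[int] = None
    author_birth: Optional[int] = None
    author_death: Optional[int] = None
    rough_range: Optional[Tuple[int, int]] = None  # e.g. a century


def date_range(work: Work) -> Optional[Tuple[int, int]]:
    """Return a (start, end) year range usable for filtering and sorting."""
    # 1. Prefer the refined book date when it exists.
    if work.book_year is not None:
        return (work.book_year, work.book_year)
    # 2. Otherwise fall back to the author's dates, using whichever are known.
    known = [y for y in (work.author_birth, work.author_death) if y is not None]
    if known:
        return (min(known), max(known))
    # 3. Last resort: a coarse range, or None if nothing is known.
    return work.rough_range


print(date_range(Work(author_birth=1225, author_death=1274)))
```

Returning a range in every case means the same sort key works whether the date is exact or approximate.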

gluejar · 2026-05-16T18:56:32 1778957792

We have a tarball of all text files; a link is posted somewhere in this thread.

gluejar · 2026-05-16T18:54:12 1778957652

Traffic yesterday was ~20% more than the recent average.
4,971,601 sessions
177 robots
863,462 robot files
3,390,115 user files
20.30% robot files (robots identified based on requests/IP address)
5 Apache servers for static content, 1 CherryPy server for dynamic content, hosted at iBiblio.
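For anyone checking the numbers: the robot percentage appears to be robot files divided by robot plus user files. That reading is an assumption, but it reproduces the quoted figure. A quick Python check:

```python
# File counts quoted in the comment above.
ROBOT_FILES = 863_462
USER_FILES = 3_390_115


def robot_share(robot_files: int, user_files: int) -> float:
    """Percentage of all served files that went to robots.

    Assumes the share is robot / (robot + user); the comment does not
    spell out the formula.
    """
    return 100 * robot_files / (robot_files + user_files)


print(f"{robot_share(ROBOT_FILES, USER_FILES):.2f}% robot files")
```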

gluejar · 2026-05-15T21:26:59 1778880419

We're using git repos internally to keep history for each book. They existed on GitHub for a while, but our implementation was awkward, and it was too big a project for the volunteer dev team. But it's likely that we'll evolve towards that.

gluejar · 2026-05-15T19:11:20 1778872280

Check out Distributed Proofreaders: https://pgdp.net

jfengel · 2026-05-15T21:27:50 1778880470

I didn't realize DP was still around. I used to do it quite a bit, 15 years ago, but OCR has improved considerably since then.

tangledhelix · 2026-05-16T15:45:00 1778946300

OCR has improved a lot since then, but OCR is just step 1 of reading in text. OCR engines still make a lot of errors (even now, especially on old, worn-out paper pages), and even if they didn't, one has to format the book, deal with footnotes, sidenotes, illustrations, etc. DP is very active, and we will welcome you back with open arms :)

gluejar · 2026-05-15T17:58:44 1778867924

Wikipedians, please help update this article.

svat · 2026-05-15T19:48:53 1778874533

In what way? And from what sources? (Wikipedia as a tertiary source is supposed to be a summary of information present in reliable secondary sources — see for instance https://en.wikipedia.org/wiki/Wikipedia:Based_upon. So if the information on the Wikipedia article is incomplete or out of date, where is the correct information available?)

gluejar · 2026-05-16T19:10:04 1778958604

There's quite a lot of information here: https://www.gutenberg.org/about/ All our text is now utf-8. No Plucker! Almost every book is HTML(5).

JSeiko · 2026-05-15T21:48:10 1778881690

Good question. Eric, any pointers?