2023/10/04

raffaele

@https://digipres.club/@raffaele

Anna's Archive scraped WorldCat
https://annas-blog.org/worldcat-scrape.html
"220GB compressed, 2.2TB uncompressed. 1.3 billion unique IDs (1,348,336,870), covered by 1.8 billion records (1,888,381,236), so 540 million duplicates (29%). 600 million are redirects or 404s, so 700 million unique actual records."

torrent here:
https://annas-archive.org/torrents#worldcat
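A quick check of the quoted figures, as a sketch: the duplicate count is total records minus unique IDs. The "29%" only works out with total records as the denominator; against unique IDs the same count is about 40%. The variable names are illustrative; the numbers are the ones in the quote.

```python
# Figures taken from the quoted Anna's Archive post
unique_ids = 1_348_336_870
records = 1_888_381_236

duplicates = records - unique_ids
share_of_records = duplicates / records        # the "29%" in the quote
share_of_unique_ids = duplicates / unique_ids  # same count, other denominator

print(f"{duplicates:,}")              # 540,044,366
print(f"{share_of_records:.1%}")      # 28.6%
print(f"{share_of_unique_ids:.1%}")   # 40.1%
```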

2023-10-04T05:50:47Z

raffaele

@https://digipres.club/@raffaele

@jorol yeah, very likely. This analysis is interesting; I'm afraid now I want to know the duplicate numbers in other catalogs (like the Italian one).
If I find space on disk I'll download their scrape. I wish they had used Parquet instead of a single zstd-compressed JSONL file.
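A minimal sketch of that duplicate count over a JSONL stream, using only the standard library. It assumes one JSON object per line and a hypothetical `id` field holding the record identifier; the real schema of the dump may differ. For the actual file, the decompressed lines could come from the third-party `zstandard` package (e.g. `io.TextIOWrapper(zstandard.ZstdDecompressor().stream_reader(fh), encoding="utf-8")`), which is not shown or tested here.

```python
import json
from typing import Iterable, Tuple


def count_duplicates(lines: Iterable[str], key: str = "id") -> Tuple[int, int, int]:
    """Return (total_records, unique_ids, duplicate_records) for JSONL lines.

    `key` is a hypothetical field name; check the dump's real schema.
    """
    seen = set()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        seen.add(record[key])
    return total, len(seen), total - len(seen)


sample = [
    '{"id": 1, "title": "A"}',
    '{"id": 2, "title": "B"}',
    '{"id": 1, "title": "A (again)"}',
]
print(count_duplicates(sample))  # (3, 2, 1)
```

Keeping every ID in an in-memory set is fine for a sample, but for over a billion IDs it would need far more RAM than a typical machine has.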

2023-10-04T09:14:21Z

raffaele

@https://digipres.club/@raffaele

read, read, read, read, read, read.

https://lucysullacultura.com/video/lucy-a-zonzo/cinema-o-letteratura-una-conversazione-con-werner-herzog/

2023-10-04T10:46:35Z
