• raffaele
    @https://digipres.club/@raffaele

    Anna's Archive scraped WorldCat
    https://annas-blog.org/worldcat-scrape.html
    "220GB compressed, 2.2TB uncompressed. 1.3 billion unique IDs (1,348,336,870), covered by 1.8 billion records (1,888,381,236), so 540 million duplicates (29%). 600 million are redirects or 404s, so 700 million unique actual records."
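The quoted figures can be cross-checked with a little arithmetic. This is only a sketch: the two totals are copied from the quote, and the variable names are made up for illustration.

```python
# Figures copied from the quoted blog post.
unique_ids = 1_348_336_870
total_records = 1_888_381_236

# Records beyond the first one seen for each ID count as duplicates.
duplicates = total_records - unique_ids
duplicate_share = duplicates / total_records

print(f"duplicates: {duplicates:,}")                   # duplicates: 540,044,366
print(f"share of all records: {duplicate_share:.1%}")  # share of all records: 28.6%
```

That matches the "540 million duplicates (29%)" in the quote once rounded.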

    torrent here:
    https://annas-archive.org/torrents#worldcat

    2023-10-04T05:50:47Z
  • raffaele
    @https://digipres.club/@raffaele

    @jorol yeah, very likely. The analysis is interesting; I'm afraid I now want to know the duplicate counts in other catalogs (like the Italian one).
    If I find the disk space I'll download their scrape. I wish they had used Parquet instead of a single zstd-compressed JSONL file.
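A rough sketch of how a single zstd-compressed JSONL file could be streamed and checked for duplicate IDs without unpacking it to disk. Assumptions: the third-party `zstandard` package is installed, and each record carries its identifier in a field called `id`. That field name is hypothetical, and the real layout of the scrape may differ.

```python
import io
import json
from collections import Counter


def iter_zst_lines(path):
    """Yield text lines from a zstd-compressed file, decompressing on the fly."""
    import zstandard  # third-party package (pip install zstandard)

    with open(path, "rb") as fh:
        reader = zstandard.ZstdDecompressor().stream_reader(fh)
        yield from io.TextIOWrapper(reader, encoding="utf-8")


def count_duplicates(lines, id_field="id"):
    """Return (total records, unique IDs, duplicate records) for JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        counts[json.loads(line)[id_field]] += 1  # id_field is an assumed name
    total = sum(counts.values())
    return total, len(counts), total - len(counts)


# Small in-memory stand-in for the real file.
sample = ['{"id": 1}', '{"id": 2}', '{"id": 1}', '{"id": 3}']
print(count_duplicates(sample))  # (4, 3, 1)
```

On the real file the call would look like `count_duplicates(iter_zst_lines(path))`. Holding more than a billion IDs in an in-memory `Counter` would likely exceed typical RAM, so a full run would need an on-disk or sorted approach instead.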

    2023-10-04T09:14:21Z
  • raffaele
    @https://digipres.club/@raffaele

    read, read, read, read, read, read.

    https://lucysullacultura.com/video/lucy-a-zonzo/cinema-o-letteratura-una-conversazione-con-werner-herzog/

    2023-10-04T10:46:35Z

...