Our Tools - Post45 Data Collective
We presented on our tool for enriching and clustering book data at Code4Lib today. Check it out, and let us know what you think!
data.post45.org/our-tools.html
Huge thanks to @thisismattmiller.com for leading development on this project.
#code4lib #c4l26
02.03.2026 21:58
The Roy Lichtenstein Catalogue Raisonné site got a serious terms and conditions page, complete with an auto-scroll button, before you can use it. Though at least it's online + free
26.02.2026 02:18
WeppySnap - Chrome Web Store
Capture a region of any browser tab as an animated WebP
I wrote a little Chrome extension to make animated WebP ("weppy") files from a region of a webpage: chromewebstore.google.com/detail/weppy...
I use it when writing documentation and I want to show a short animation (in a GitHub README, for example). Simpler than a WebM video and more modern than GIF.
25.02.2026 17:36
...state of the union? Look at this Bluesky quote post network explorer I just made. I added 6 networks so far:
thisismattmiller.github.io/bsky-quote-m...
25.02.2026 02:30
Title page of book:
FREAK TREES OF THE STATE OF NEW YORK The New York State College of Forestry Syracuse University NELSON C. BROWN Acting Dean 1930 Second Edition New York (State) College of Forestry, Syracuse.
Shows a strange tree:
PRIZE WINNERS First Prize. G. W. Gotham, 89 River Street, Cortland, N. Y. Two elms, the larger tree appears to have absorbed the growth of the smaller tree. Trunk of large tree is bigger above the graft. 3
Shows two strange trees:
PRIZE WINNERS Second Prize. C. B. Cox, Adams Center, N. Y. Elm, trunk runs along surface of earth in half circle 45 feet near Adams Center on North Harbor Road. Third Prize. A. Wilson Insley, 30 Eagle Street, Mt. Morris, N. Y. Elm, one mile south of Conesus Lake. 4
Shows two strange trees:
PRIZE WINNERS Fourth Prize. George J. Wiedmaier, 222 King Street, Dunkirk, N. Y. Maple, 14 inches in diameter arched 7 feet and anchored in birch tree near Arkwright, N. Y. Fifth Prize. H. L. Tayntor, McGraw, N. Y. Double beech, near Homer, N. Y. Graft 18 feet in length and 8 inches in diameter. 5
Building my HathiTrust 1930 public domain survey and coming across interesting volumes... like the tree shaming
"Freak trees of the State of New York."
babel.hathitrust.org/cgi/pt?id=co...
07.01.2026 02:13
Also a lot of photocopies of physical media
19.12.2025 22:58
Image from the Epstein files: yellow Post-it note on green background with black redaction bars
Grid paper with black redaction bars
Pink postit note with green background black redaction bars
Green postit note on white background black redaction bars
Some of these Epstein file redactions are very aesthetic. Reminds me of updates.timsherratt.org/2021/04/21/s...
19.12.2025 22:50
Theoretically yes, the HathiTrust builds a local database. But the tool would need to be updated to know how to work with it: a new service would need to be added, so it wouldn't work out of the box.
18.12.2025 20:35
Diagram illustrating the BookReconciler workflow. On the left, a book cover of The Book of Salt by Monique Truong appears alongside "Minimal Metadata," listing Author: Truong, Monique and Title: The Book of Salt. An arrow points to a box labeled "BookReconciler" with book and diamond icons. A downward arrow leads to "Enriched + Clustered Metadata," showing multiple editions of the book cover and expanded metadata, including several ISBNs, subject headings (e.g., Vietnamese–France fiction, women authors, household employees, gay men, cooking), and an author VIAF identifier.
Very happy to introduce a new tool, BookReconciler!
You can take spreadsheets with book data and add subject headings, descriptions, ISBNs, HathiTrust IDs, & more. You can also cluster editions & variations of the same "Work."
Led by @thisismattmiller.com and supported by @post45data.bsky.social.
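To make the idea of clustering editions under one "Work" concrete, here is a minimal sketch in Python. It is not BookReconciler's method, which the post does not describe; it assumes a naive approach of grouping rows by a normalized (author surname, main title) key. The function names, the `edition` field, and the third sample row are hypothetical, invented only for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def work_key(author, title):
    """Build a crude work-level key (assumption: surname + title before any subtitle)."""
    surname = author.split(",", 1)[0]
    main_title = title.split(":", 1)[0]
    return (normalize(surname), normalize(main_title))


def cluster_by_work(rows):
    """Group spreadsheet-like rows (dicts) that share the same work key."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[work_key(row["author"], row["title"])].append(row)
    return dict(clusters)


# Sample rows; the "edition" labels are placeholders, not real identifiers.
rows = [
    {"author": "Truong, Monique", "title": "The Book of Salt", "edition": "A"},
    {"author": "Truong, Monique T. D.", "title": "The book of salt: a novel", "edition": "B"},
    {"author": "Example, Author", "title": "A Different Book", "edition": "C"},
]

clusters = cluster_by_work(rows)
for key, members in clusters.items():
    print(key, [m["edition"] for m in members])
```

A key like this breaks down quickly on translations, retitled editions, and shared surnames, which is part of why the problem is hard.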
17.12.2025 21:37
BookReconciler: An Open-Source Tool for Metadata Enrichment and Work-Level Clustering
We present BookReconciler, an open-source tool for enhancing and clustering book data. BookReconciler allows users to take spreadsheets with minimal metadata, such as book title and author, and automa...
A hard problem with literary data is navigating btwn editions of books and the "work," or the theoretical text that unites all editions. I've been lucky to work with @thisismattmiller.com and @mellymeldubs.bsky.social, who built a tool to address this + do much more
arxiv.org/abs/2512.10165
12.12.2025 20:39
www.google.com/maps/@47.232...
10.12.2025 20:55
Example and analysis of how AI web scrapers are breaking small and medium cultural heritage sites.
01.12.2025 17:55
A screen shot of the viz showing clustered email graphed across time by contact
Blog: Visualizing 14,000 Released Epstein Emails.
I built a viz of the emails released as part of the 20K House Oversight Committee docs.
thisismattmiller.com/post/email-v...
- A clustered high level view of the emails by contact across time
- Zoom into individual emails and open the sources
25.11.2025 17:51
Thanks for checking it out!
12.11.2025 22:30
LCNAF & Trie
Storing +11M unique LCNAF names in 50MB Trie data structure
LCNAF & Trie - Storing +11M unique names in 50MB data structure in the browser
thisismattmiller.com/post/lcnaf-t...
- Optimizing LCNAF authorized headings into a trie data structure
- In browser MARC file name reconciliation + search tool
- OpenRefine / Command line tools for reconciliation
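For readers unfamiliar with tries, here is a minimal sketch in Python of why the structure suits name headings: names that share a prefix share nodes. This is not the implementation from the blog post, whose encoding is not described here; a plain dict-based trie like this one would likely need far more memory than 50MB for 11M names. The class names and sample strings are assumptions chosen for illustration.

```python
class TrieNode:
    """One character position; children are keyed by the next character."""
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}
        self.terminal = False  # True when a complete name ends at this node


class NameTrie:
    def __init__(self):
        self.root = TrieNode()
        self.node_count = 1  # counts the root

    def insert(self, name):
        node = self.root
        for ch in name:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = TrieNode()
                node.children[ch] = nxt
                self.node_count += 1
            node = nxt
        node.terminal = True

    def _walk(self, text):
        """Follow text from the root; return the final node or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, name):
        node = self._walk(name)
        return node is not None and node.terminal

    def with_prefix(self, prefix):
        """Return all stored names starting with prefix, sorted."""
        start = self._walk(prefix)
        if start is None:
            return []
        results, stack = [], [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.terminal:
                results.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(results)


# Sample heading-style strings for illustration only.
names = ["Woolf, Virginia, 1882-1941", "Woolf, Leonard, 1880-1969", "Wolfe, Tom"]
trie = NameTrie()
for n in names:
    trie.insert(n)

print(trie.contains("Woolf, Virginia, 1882-1941"))
print(trie.with_prefix("Woolf, "))
```

Because the two "Woolf, " headings share their first seven characters, the trie stores those nodes once, and the saving grows as more names with common surnames are added.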
12.11.2025 19:28
Giallo
Using a vision language model to analyze Italian Giallo films
Halloween blog post: Italian Giallo Horror Films
thisismattmiller.com/post/giallo/
- Using vision language model to analyze a 70 film corpus / 80,000 frames
- Build and plot "trope clusters" across movies
Probably the longest eye acting supercut you've seen: youtu.be/cGrmkOwut6k
31.10.2025 18:50
Shows a county map of the United States; the counties with school districts with banned books are highlighted red.
A screenshot of the banned book browser interface showing rows of book covers.
New Post: PEN America Banned Books 2025 dataset
thisismattmiller.com/post/book-ba...
Looking at school district book bans
- Interactive Map interface to the books banned in 2024-2025
- A faceted browse interface to the 3700 books
- Subject heading analysis
17.10.2025 19:38
LC & Flickr Commons
Library of Congress & Flickr Commons: Analysis of user interactions on 40,000 images.
New Blog Post.
Library of Congress & Flickr Commons: Analysis of user interactions on 40,000 images
thisismattmiller.com/post/lc-flic...
- Organizing 95K photo comments.
- Viewer to explore user georectified images
- Folksonomy tagging vs LCSH Vocabulary
- Placing into the Wiki* knowledge graph
07.10.2025 19:03
One output, 1 hour 40mins of Siskel and Ebert summaries:
www.youtube.com/watch?v=hFLM...
30.07.2025 20:10
Building datasets from video collections using local & cloud LLMs
Using Qwen2.5-VL, Gemini 2.5 and Whisper to build a Siskel and Ebert dataset
Trying out workflows that use multimodal LLMs for validating and QA.
In this blog I walk through a test using 1000 Siskel and Ebert videos to extract key video frames and other data.
thisismattmiller.com/post/buildin...
30.07.2025 19:43
A woodcut mashup image titled: maintenance
maintenance
26.07.2025 19:06
New dataset on bestsellers from 40+ countries, with consistent coverage for France, Germany, Spain, Italy, and the U.S.
Congrats to the authors @sdileonardi.bsky.social, @beccacohen.bsky.social, and @dan-sinnamon.bsky.social on this major contribution!
doi.org/10.18737/386...
29.07.2025 14:49
A screen capture from the Siskel and Ebert Show reviewing the movie Gremlins 2: The New Batch. Ebert gave it a thumbs down.
Siskel gave it a thumbs down.
Gremlins 2: The New Batch (1990)
Director: Joe Dante
Cast: Phoebe Cates-Kline, Sylvester Stallone, Hulk Hogan, Zach Galligan, Christopher Lee
Watch Review
wp / wd
24.07.2025 09:19
thisismattmiller.com/post/glitch/
New blog post about @glitch.com shutdown, how I migrated my apps, and how I used glitch for teaching and creative projects.
23.07.2025 19:28
The Library of Congress BIBFRAME Update is online today at 1PM EDT.
Talks about:
- Hubs (BF ontology)
- BF Cataloging at Penn Libraries
- BF Validation Tooling
listserv.loc.gov/cgi-bin/wa?A...
30.06.2025 13:56
Need a robots.txt directive indicating bulk download is available, not that they would abide by robots.txt
17.06.2025 22:02
Yeah, we have bots endlessly flooding id.loc.gov, stressing servers to the limit trying to scrape millions of HTML pages even though we offer pretty much all of it as bulk downloads: id.loc.gov/download/
17.06.2025 22:00
A collage of fern illustrations over a color gradient. Images from:
https://www.biodiversitylibrary.org/item/108032#page/168/mode/2up
https://www.biodiversitylibrary.org/item/107961#page/240/mode/2up
https://www.biodiversitylibrary.org/item/108042#page/116/mode/2up
https://www.biodiversitylibrary.org/item/108040#page/92/mode/2up
https://www.biodiversitylibrary.org/item/107961#page/260/mode/2up
https://www.biodiversitylibrary.org/item/108510#page/206/mode/2up
https://www.biodiversitylibrary.org/item/108787#page/118/mode/2up
ideal
16.05.2025 11:06
A new very chill bot, for these very un-chill times.
Posts FERNS from "Ferns: British and exotic..." by E. J. Lowe. 8 vols 1856-1860.
Makes a new collage every 8 hours.
15.05.2025 04:09
Interesting! Because they just terminated their grant for the Post45 Data Collective, which preserves and establishes access to collections of literary and cultural data!
01.05.2025 23:04