Matt Miller

@thisismattmiller.com

Libraries/Data -- thisismattmiller.com

420 Followers
252 Following
50 Posts
Joined 10.07.2023

Latest posts by Matt Miller @thisismattmiller.com

Our Tools – Post45 Data Collective

We presented on our tool for enriching and clustering book data at Code4Lib today. Check it out, and let us know what you think!

data.post45.org/our-tools.html

Huge thanks to @thisismattmiller.com for leading development on this project.

#code4lib #c4l26

02.03.2026 21:58 👍 9 🔁 8 💬 0 📌 0

Roy Lichtenstein Catalogue Raisonné site got a serious terms and conditions, complete with auto scroll button before you can use it. Though at least it's online + free

26.02.2026 02:18 👍 1 🔁 0 💬 0 📌 0
WeppySnap - Chrome Web Store
Capture a region of any browser tab as an animated WebP

I wrote a little Chrome extension to make animated WebP (“weppy”) files from a region of a webpage: chromewebstore.google.com/detail/weppy...
I use it when writing documentation and I want to show a short animation (in a GitHub README, for example). Simpler than a WebM video and more modern than GIF.

25.02.2026 17:36 👍 0 🔁 0 💬 0 📌 0

...state of the union? 👎, look at this bluesky quote post network explorer I just made. I added 6 networks so far:
thisismattmiller.github.io/bsky-quote-m...

25.02.2026 02:30 👍 0 🔁 0 💬 0 📌 0
Title page of book:
FREAK TREES OF THE STATE OF NEW YORK The New York State College of Forestry Syracuse University NELSON C. BROWN Acting Dean 1930 Second Edition New York (State) College of Forestry, Syracuse.

Shows a strange tree:
PRIZE WINNERS First Prize. G. W. Gotham, 89 River Street, Cortland, N. Y. Two elms, the larger tree appears to have absorbed the growth of the smaller tree. Trunk of large tree is bigger above the graft. 3

Shows two strange trees:
PRIZE WINNERS Second Prize. C. B. Cox, Adams Center, N. Y. Elm, trunk runs along surface of earth in half circle 45 feet near Adams Center on North Harbor Road. Third Prize. A. Wilson Insley, 30 Eagle Street, Mt. Morris, N. Y. Elm, one mile south of Conesus Lake. 4

Shows two strange trees:
PRIZE WINNERS Fourth Prize. George J. Wiedmaier, 222 King Street, Dunkirk, N. Y. Maple, 14 inches in diameter arched 7 feet and anchored in birch tree near Arkwright, N. Y. Fifth Prize. H. L. Tayntor, McGraw, N. Y. Double beech, near Homer, N. Y. Graft 18 feet in length and 8 inches in diameter. 5

Building my HathiTrust 1930 public domain survey and coming across interesting volumes... like the tree shaming
"Freak trees of the State of New York."
babel.hathitrust.org/cgi/pt?id=co...

07.01.2026 02:13 👍 4 🔁 3 💬 0 📌 1

Also a lot of photocopies of physical media

19.12.2025 22:58 👍 1 🔁 0 💬 0 📌 0
Image from Epstein files: yellow Post-it note on green background, black redaction bars

Grid paper with black redaction bars

Pink postit note with green background black redaction bars

Green postit note on white background black redaction bars

Some of these Epstein file redactions are very aesthetic. Reminds me of updates.timsherratt.org/2021/04/21/s...

19.12.2025 22:50 👍 0 🔁 0 💬 1 📌 0

Theoretically yes, the HathiTrust builds a local database. But the tool would need to be updated to know how to work with it: a new service would need to be added, so it wouldn't work out of the box.

18.12.2025 20:35 👍 1 🔁 0 💬 0 📌 0
Diagram illustrating the BookReconciler workflow. On the left, a book cover of The Book of Salt by Monique Truong appears alongside “Minimal Metadata,” listing Author: Truong, Monique and Title: The Book of Salt. An arrow points to a box labeled “BookReconciler” with book and diamond icons. A downward arrow leads to “Enriched + Clustered Metadata,” showing multiple editions of the book cover and expanded metadata, including several ISBNs, subject headings (e.g., Vietnamese–France fiction, women authors, household employees, gay men, cooking), and an author VIAF identifier.

Very happy to introduce a new tool, BookReconciler!

You can take spreadsheets with book data and add subject headings, descriptions, ISBNs, HathiTrust IDs, & more. You can also cluster editions & variations of the same "Work."

Led by @thisismattmiller.com and supported by @post45data.bsky.social.

17.12.2025 21:37 👍 123 🔁 56 💬 7 📌 1
BookReconciler: An Open-Source Tool for Metadata Enrichment and Work-Level Clustering
We present BookReconciler, an open-source tool for enhancing and clustering book data. BookReconciler allows users to take spreadsheets with minimal metadata, such as book title and author, and automa...

A hard problem with literary data is navigating between editions of books and the "work," or the theoretical text that unites all editions. I've been lucky to work with @thisismattmiller.com and @mellymeldubs.bsky.social, who built a tool to address this + do much more

arxiv.org/abs/2512.10165

12.12.2025 20:39 👍 64 🔁 22 💬 4 📌 1

www.google.com/maps/@47.232...

10.12.2025 20:55 👍 2 🔁 0 💬 0 📌 0

Example and analysis of how AI web scrapers are breaking small and medium cultural heritage sites.

01.12.2025 17:55 👍 0 🔁 0 💬 0 📌 0
A screen shot of the viz showing clustered email graphed across time by contact

Blog: Visualizing 14,000 Released Epstein Emails.

I built a viz of the emails released as part of the 20K House Oversight Committee docs.

thisismattmiller.com/post/email-v...

- A clustered high level view of the emails by contact across time
- Zoom into individual emails and open the sources

25.11.2025 17:51 👍 2 🔁 0 💬 0 📌 0

Thanks for checking it out!

12.11.2025 22:30 👍 0 🔁 0 💬 0 📌 0
LCNAF & Trie
Storing +11M unique LCNAF names in 50MB Trie data structure

LCNAF & Trie – Storing +11M unique names in 50MB data structure in the browser

thisismattmiller.com/post/lcnaf-t...

- Optimizing LCNAF authorized headings into a trie data structure
- In browser MARC file name reconciliation + search tool
- OpenRefine / Command line tools for reconciliation

12.11.2025 19:28 👍 6 🔁 4 💬 1 📌 0
Giallo
Using a vision language model to analyze Italian Giallo films

Halloween blog post: Italian Giallo Horror Films

thisismattmiller.com/post/giallo/

- Using vision language model to analyze a 70 film corpus (🧟) / 80,000 frames
- Build and plot “trope clusters” across movies

Probably the longest eye acting supercut you've seen: youtu.be/cGrmkOwut6k

31.10.2025 18:50 👍 4 🔁 3 💬 0 📌 1
Shows a county map of the United States; counties with school districts that banned books are highlighted in red.

A screenshot of the banned book browser interface showing rows of book covers.

New Post: PEN America Banned Books 2025 dataset
thisismattmiller.com/post/book-ba...

Looking at school district book bans

- Interactive Map interface to the books banned in 2024-2025
- A faceted browse interface to the 3700 books
- Subject heading analysis

17.10.2025 19:38 👍 1 🔁 0 💬 0 📌 0
LC & Flickr Commons
Library of Congress & Flickr Commons: Analysis of user interactions on 40,000 images.

New Blog Post.
Library of Congress & Flickr Commons: Analysis of user interactions on 40,000 images
thisismattmiller.com/post/lc-flic...
- Organizing 95K photo comments.
- Viewer to explore user georectified images
- Folksonomy tagging vs LCSH Vocabulary
- Placing into the Wiki* knowledge graph

07.10.2025 19:03 👍 9 🔁 5 💬 2 📌 0

One output, 1 hour 40mins of Siskel and Ebert summaries:
www.youtube.com/watch?v=hFLM...

30.07.2025 20:10 👍 1 🔁 0 💬 0 📌 0
Building datasets from video collections using local & cloud LLMs
Using Qwen2.5-VL, Gemini 2.5 and Whisper to build a Siskel and Ebert dataset

Trying out workflows that use multimodal LLMs for validating and QA.

In this blog I walk through a test using 1000 Siskel and Ebert videos to extract key video frames and other data.

thisismattmiller.com/post/buildin...

30.07.2025 19:43 👍 4 🔁 0 💬 0 📌 1
A woodcut mashup image titled: maintenance

maintenance

26.07.2025 19:06 👍 1 🔁 1 💬 1 📌 0

New dataset on bestsellers from 40+ countries, with consistent coverage for France, Germany, Spain, Italy, and the U.S.

Congrats to the authors @sdileonardi.bsky.social, @beccacohen.bsky.social, and @dan-sinnamon.bsky.social on this major contribution! 🎉

🔗: doi.org/10.18737/386...

29.07.2025 14:49 👍 40 🔁 23 💬 1 📌 9
A screen capture from the Siskel and Ebert Show reviewing the movie Gremlins 2: The New Batch. Ebert gave it a thumbs down. Siskel gave it a thumbs down.

Gremlins 2: The New Batch (1990)
Director: Joe Dante
Cast: Phoebe Cates-Kline, Sylvester Stallone, Hulk Hogan, Zach Galligan, Christopher Lee
Watch Review
wp / wd

24.07.2025 09:19 👍 0 🔁 1 💬 0 📌 1

thisismattmiller.com/post/glitch/

New blog post about @glitch.com shutdown, how I migrated my apps, and how I used glitch for teaching and creative projects.

23.07.2025 19:28 👍 1 🔁 0 💬 0 📌 0

The Library of Congress BIBFRAME Update is online today at 1PM EDT.
Talks about:
- Hubs (BF ontology)
- BF Cataloging at Penn Libraries
- BF Validation Tooling
listserv.loc.gov/cgi-bin/wa?A...

30.06.2025 13:56 👍 1 🔁 0 💬 0 📌 0

Need a robots.txt directive indicating bulk download is available, not that they would abide by robots.txt

17.06.2025 22:02 👍 1 🔁 0 💬 1 📌 0

Yeah, we have bots endlessly flooding id.loc.gov, stressing servers to the limit trying to scrape millions of HTML pages even though we offer pretty much all of it as bulk downloads: id.loc.gov/download/

17.06.2025 22:00 👍 11 🔁 5 💬 1 📌 0
A collage of fern illustrations over a color gradient. Images from:
https://www.biodiversitylibrary.org/item/108032#page/168/mode/2up 
https://www.biodiversitylibrary.org/item/107961#page/240/mode/2up 
https://www.biodiversitylibrary.org/item/108042#page/116/mode/2up 
https://www.biodiversitylibrary.org/item/108040#page/92/mode/2up 
https://www.biodiversitylibrary.org/item/107961#page/260/mode/2up 
https://www.biodiversitylibrary.org/item/108510#page/206/mode/2up 
https://www.biodiversitylibrary.org/item/108787#page/118/mode/2up

ideal

16.05.2025 11:06 👍 1 🔁 1 💬 0 📌 0

A new very chill bot, for these very un-chill times.
Posts FERNS from "Ferns: British and exotic..." by E. J. Lowe. 8 vols 1856-1860.
Makes a new collage every 8 hours.

15.05.2025 04:09 👍 1 🔁 0 💬 0 📌 0

Interesting! Because they just terminated their grant for the Post45 Data Collective, which preserves and establishes access to collections of literary and cultural data!

01.05.2025 23:04 👍 14 🔁 3 💬 0 📌 0