Jan 3, 2026 | Artificial Intelligence, Continuing Education and Career Growth, VintageReveries

Google Docs Archivist OCR: a free tool for turning vintage scans into clean, usable text

Tags: AI tools, automation tools, VintageReveries

I’ve been heads-down relaunching VintageReveries, but I also shipped a little gift to fellow archivists and curious tinkerers: Google Docs Archivist OCR — a simple Google Apps Script that connects Google Drive + Google Docs + Gemini to batch-OCR folders of images with formatting you can actually use.

Repo: https://github.com/HelloJessicaM/Google-Docs-Archivist-OCR

Why I built it

Between 2011 and 2014 I scanned a lot of magazines and ephemera. The OCR from that era… was not great. This month I revisited my archive with a fresh workflow and a new scanner (hi, IRIScan 7 Pro) and wanted something:

fast enough to process an entire folder,
smart enough to detect columns/tables, and
simple enough to run inside Google Docs where I already edit.

Gemini 2.5 Flash handles the heavy lifting; Apps Script glues it together.
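
To make that division of labor concrete, here is a rough sketch of the kind of request Apps Script could send to Gemini for a single page. It is an illustration, not the code from the repo: the helper names buildOcrRequest and extractText and the prompt wording are made up, and the endpoint and payload shape follow the public Gemini REST API, so check the current docs before relying on them.

```javascript
// Hypothetical helper: builds the URL and fetch options for one OCR request.
// The prompt text is an assumption, not the prompt used in the repo.
function buildOcrRequest(base64Jpeg, apiKey) {
  const model = "gemini-2.5-flash";
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    model + ":generateContent";
  const body = {
    contents: [{
      parts: [
        { text: "Transcribe this scanned page. Keep column order and output tables as pipe-formatted rows." },
        { inline_data: { mime_type: "image/jpeg", data: base64Jpeg } },
      ],
    }],
  };
  return {
    url: url,
    options: {
      method: "post",
      contentType: "application/json",
      headers: { "x-goog-api-key": apiKey },
      payload: JSON.stringify(body),
      muteHttpExceptions: true, // lets the caller inspect error responses
    },
  };
}

// Hypothetical helper: pulls the transcribed text out of a parsed response.
function extractText(responseJson) {
  const candidates = responseJson.candidates || [];
  if (candidates.length === 0) return "";
  const parts = (candidates[0].content && candidates[0].content.parts) || [];
  return parts.map((part) => part.text || "").join("");
}
```

Inside Apps Script, the result would be handed to UrlFetchApp.fetch(request.url, request.options), and the image data could come from Utilities.base64Encode(file.getBlob().getBytes()).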

What it does (in plain English)

Batch OCR: Point it at a Google Drive folder of JPEGs; it processes the set in one click.
Format aware: Detects columns and tables; outputs tables as pipe-formatted rows (| col1 | col2 |) that are easy to split into columns in Sheets or convert to CSV.
Hands-on inside Docs: Adds a custom menu (Archivist Tools) so you never leave the document.
Ordering fix: Latest update processes files strictly in page order (page1.jpg, page2.jpg, …) for predictable output.
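
Pipe-formatted rows are easy to post-process. As a minimal sketch, assuming each table row sits on its own line and starts and ends with a pipe, the hypothetical helper below (not part of the repo) turns those rows into CSV text and ignores everything else. Cells that contain a literal pipe character are not handled.

```javascript
// Quote a CSV field only when it contains a comma, a quote, or a newline.
function csvEscape(value) {
  return /[",\n]/.test(value) ? '"' + value.replace(/"/g, '""') + '"' : value;
}

// Convert lines like "| col1 | col2 |" into CSV rows; other lines are skipped.
function pipeRowsToCsv(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 1 && line.startsWith("|") && line.endsWith("|"))
    .map((line) =>
      line
        .slice(1, -1) // drop the outer pipes
        .split("|")
        .map((cell) => csvEscape(cell.trim()))
        .join(",")
    )
    .join("\n");
}

// Invented sample input, only to show the shape of the output.
const sample = '| Name | Note |\nsome prose\n| A | says "hi", twice |';
console.log(pipeRowsToCsv(sample));
```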

How to use it (5 minutes)

Open a new Google Doc → Extensions → Apps Script.
Paste the code from Code.gs in the repo.
Grab a free API key in Google AI Studio and paste it where the script says PASTE_YOUR_API_KEY_HERE.
In Apps Script, click Services (+) → add Drive API.
Save, return to the Doc, refresh.
Use the new menu 🌟 Archivist Tools → Run Image-to-Text OCR and paste a Drive folder link.
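
The last step takes a pasted folder link, so at some point that link has to become a folder ID. Here is a minimal sketch of that step, assuming the common link shapes (…/folders/ID and …?id=ID). The name extractFolderId is hypothetical, and the script in the repo may handle this differently.

```javascript
// Hypothetical helper: returns the Drive folder ID from a pasted link,
// or null when nothing that looks like an ID is found.
function extractFolderId(link) {
  const text = String(link).trim();
  const match =
    text.match(/\/folders\/([A-Za-z0-9_-]+)/) ||
    text.match(/[?&]id=([A-Za-z0-9_-]+)/);
  if (match) return match[1];
  // Fall back to treating the input as a bare ID if it looks like one.
  return /^[A-Za-z0-9_-]{10,}$/.test(text) ? text : null;
}

// The ID below is made up for illustration.
console.log(extractFolderId("https://drive.google.com/drive/folders/1AbC_dEf-123?usp=sharing"));
```

In Apps Script, the resulting ID could then be passed to DriveApp.getFolderById(id).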

Tip: Name your files sequentially if they aren’t already. The tool will follow page order.
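
Page order matters because a plain alphabetical sort puts page10.jpg before page2.jpg. One way to get numeric ordering is a numeric-aware collator, sketched below with a hypothetical sortByPageOrder helper. This is an illustration of the idea, not necessarily how the script does it.

```javascript
// Hypothetical helper: returns a new array of file names in page order,
// comparing runs of digits as numbers ("page2" sorts before "page10").
function sortByPageOrder(names) {
  const collator = new Intl.Collator("en", { numeric: true, sensitivity: "base" });
  return [...names].sort(collator.compare);
}

console.log(sortByPageOrder(["page10.jpg", "page2.jpg", "page1.jpg"]));
```

Zero-padded names (page001.jpg, page002.jpg, …) sidestep the problem entirely, since they sort correctly even under a plain alphabetical comparison.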

[Image: promotional graphic for Google Docs Archivist OCR, showing a scanner, Google Docs, and Gemini AI icons, with examples of scanned vintage documents and OCR results.]

Real-world use: 1920s magazines at scale

I stress-tested the script on two living projects:

1924 St. Louis Fashion Pageant (I ran so many iterations I got sick of looking at it).
See the set: https://vintagereveries.com/topics/1924-st-louis-fashion-pageant/
Character Reading (1924–25) — a delightfully odd periodical on phrenology, vocation advice, and early self-help.

The tool ripped through 100+ pages in an hour or two, producing clean text blocks and pipe-tables I could drop into Sheets. It’s night-and-day compared with my early-2010s OCR: fewer misread words, better column handling, and far less cleanup.

Who this helps

Archivists & librarians turning scans into searchable text.
Resellers & historians extracting ads, captions, and indexes for research.
Bloggers & editors who want ready-to-edit copy instead of raw images.

If you’ve got a box of scans and an afternoon, this will save you many hours.

Roadmap / ideas I’m exploring

Optional per-page Google Docs output (one Doc per image).
A “preserve layout” mode for magazine spreads.
Rate-limit helper & progress meter for very large folders.
One-click CSV export for detected tables.

If any of that would be useful, open an issue or star the repo so I know where to focus.

Practical notes

Privacy & cost: Every page is sent to the Gemini API for OCR, so keep that in mind for sensitive material. You can stay within Gemini’s free tier for moderate batches; large runs may require throttling.
Accuracy: Vintage typefaces, skewed scans, and low contrast will still need light human cleanup — but you’ll be editing instead of retyping.
File hygiene: Square up scans, crop borders, and keep a consistent DPI for best results.
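
On the throttling point, here is a minimal sketch of what pacing a large run could look like: pause between pages, and back off and retry when a call fails. Everything here is an assumption for illustration, including the helper name processWithThrottle, the default pause, and the retry count. The right values depend on the current free-tier limits.

```javascript
// Hypothetical helper: runs processOne on each item, pausing between items
// and retrying failed items with exponential backoff.
// `sleep` is injected so the same code works in Apps Script and elsewhere.
function processWithThrottle(items, processOne, options) {
  const { pauseMs = 4000, maxRetries = 3, sleep } = options; // defaults are guesses
  const results = [];
  items.forEach((item, index) => {
    let attempt = 0;
    for (;;) {
      try {
        results.push(processOne(item));
        break;
      } catch (err) {
        attempt += 1;
        if (attempt > maxRetries) throw err; // give up on this run
        sleep(pauseMs * 2 ** attempt); // wait longer after each failure
      }
    }
    if (index < items.length - 1) sleep(pauseMs); // steady pause between pages
  });
  return results;
}
```

In Apps Script, the sleep option could be (ms) => Utilities.sleep(ms). Keep in mind that a single script execution has its own time limit, so very large folders may still need to be split across runs.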

Try it, break it, tell me

Repo: https://github.com/HelloJessicaM/Google-Docs-Archivist-OCR
If you use it on your own collections, I’d love to see before/afters.
If you hit bugs or want features, open an issue.

This is a small tool, but it unlocks big momentum: instead of typing, I’m architecting. And that leaves me more time to list vintage, write, and keep building the AI-assisted systems that make my one-woman operation move.

Postscript: how this ties into my broader work

I’m actively building a fashion dictionary knowledge base for an AI listing assistant. Clean OCR is the foundation. The better my source text, the smarter my tooling gets. In the meantime, this script lets me revisit, and finally use, the scans I made a decade ago, with much cleaner results.
