I've been heads-down relaunching VintageReveries, but I also shipped a little gift to fellow archivists and curious tinkerers: Google Docs Archivist OCR, a simple Google Apps Script that connects Google Drive + Google Docs + Gemini to batch-OCR folders of images with formatting you can actually use.
Repo: https://github.com/HelloJessicaM/Google-Docs-Archivist-OCR
Why I built it
Between 2011 and 2014 I scanned a lot of magazines and ephemera. The OCR from that era... was not great. This month I revisited my archive with a fresh workflow and a new scanner (hi, IRIScan 7 Pro) and wanted something:
- fast enough to process an entire folder,
- smart enough to detect columns/tables, and
- simple enough to run inside Google Docs where I already edit.
Gemini 2.5 Flash handles the heavy lifting; Apps Script glues it together.
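For the curious, here is roughly what that glue can look like. This is a minimal sketch under assumptions, not the code from the repo: the function names buildOcrRequest and ocrFile and the prompt wording are hypothetical. The endpoint is the public Gemini generateContent REST endpoint, and the image travels as base64 inline data.

```javascript
// Illustrative sketch only. Function names and prompt text are assumptions,
// not taken from the repo's Code.gs.
const GEMINI_URL =
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent';

// Builds the URL and UrlFetchApp options for one OCR request.
function buildOcrRequest(apiKey, base64Image, mimeType) {
  const body = {
    contents: [{
      parts: [
        { text: 'Transcribe this scan. Keep column order and output tables as pipe-delimited rows.' },
        { inline_data: { mime_type: mimeType, data: base64Image } }
      ]
    }]
  };
  return {
    url: GEMINI_URL,
    options: {
      method: 'post',
      contentType: 'application/json',
      headers: { 'x-goog-api-key': apiKey },
      payload: JSON.stringify(body),
      muteHttpExceptions: true // inspect the status code ourselves
    }
  };
}

// Apps Script only: sends one Drive file to Gemini and returns the text.
function ocrFile(file, apiKey) {
  const blob = file.getBlob();
  const base64 = Utilities.base64Encode(blob.getBytes());
  const req = buildOcrRequest(apiKey, base64, blob.getContentType());
  const response = UrlFetchApp.fetch(req.url, req.options);
  if (response.getResponseCode() !== 200) {
    throw new Error('Gemini request failed: ' + response.getContentText());
  }
  const json = JSON.parse(response.getContentText());
  return json.candidates[0].content.parts[0].text;
}
```

Keeping the request builder separate from the fetch call means the payload can be checked without touching Drive or the network.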
What it does (in plain English)
- Batch OCR: Point it at a Google Drive folder of JPEGs; it processes the set in one click.
- Format aware: Detects columns and tables; outputs tables as pipe-formatted rows (| col1 | col2 |) so you can paste directly into Sheets/CSV.
- Hands-on inside Docs: Adds a custom menu (Archivist Tools) so you never leave the document.
- Ordering fix: The latest update processes files strictly in page order (page1.jpg, page2.jpg, ...) for predictable output.
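Page order matters because a plain alphabetical sort puts page10.jpg ahead of page2.jpg. One way to get numeric ordering is sketched below. The helper names are hypothetical and this is not necessarily how the repo does it; it assumes the page number is the last run of digits in each file name.

```javascript
// Hypothetical helpers: order file names by the last number they contain.
function pageNumber(name) {
  // Last run of digits in the name, e.g. "page10.jpg" -> 10.
  const match = name.match(/(\d+)(?!.*\d)/);
  return match ? parseInt(match[1], 10) : Number.POSITIVE_INFINITY;
}

function sortByPageOrder(names) {
  // Copy first so the caller's array is left untouched.
  return names.slice().sort(function (a, b) {
    const pa = pageNumber(a);
    const pb = pageNumber(b);
    if (pa !== pb) return pa < pb ? -1 : 1;
    return a.localeCompare(b); // tie-break on the name itself
  });
}

console.log(sortByPageOrder(['page10.jpg', 'page2.jpg', 'page1.jpg']));
// -> [ 'page1.jpg', 'page2.jpg', 'page10.jpg' ]
```

Names with no digits at all sort to the end rather than being dropped.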
How to use it (5 minutes)
- Open a new Google Doc, then go to Extensions > Apps Script.
- Paste the code from Code.gs in the repo.
- Grab a free API key in Google AI Studio and paste it where the script says PASTE_YOUR_API_KEY_HERE.
- In Apps Script, click Services (+) and add Drive API.
- Save, return to the Doc, refresh.
- Use the new menu Archivist Tools > Run Image-to-Text OCR and paste a Drive folder link.
Tip: Name your files sequentially if they aren't already. The tool will follow page order.
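If you want a feel for how the menu and the folder-link step might be wired up, here is a minimal sketch. It is an assumption-laden illustration, not the repo's code: the handler name runOcr and the helper extractFolderId are hypothetical, while onOpen, createMenu, prompt, and alert are standard Apps Script calls.

```javascript
// Apps Script runs onOpen automatically when the Doc is opened.
function onOpen() {
  DocumentApp.getUi()
    .createMenu('Archivist Tools')
    .addItem('Run Image-to-Text OCR', 'runOcr') // 'runOcr' is a hypothetical handler name
    .addToUi();
}

// Pulls the folder ID out of a pasted Drive link, or returns null.
function extractFolderId(link) {
  const text = String(link);
  const pathMatch = text.match(/\/folders\/([A-Za-z0-9_-]+)/);
  if (pathMatch) return pathMatch[1];
  const paramMatch = text.match(/[?&]id=([A-Za-z0-9_-]+)/);
  return paramMatch ? paramMatch[1] : null;
}

// Apps Script only: asks for the link and validates it before any OCR work.
function runOcr() {
  const ui = DocumentApp.getUi();
  const reply = ui.prompt('Paste a Drive folder link');
  const folderId = extractFolderId(reply.getResponseText());
  if (!folderId) {
    ui.alert('Could not find a folder ID in that link.');
    return;
  }
  // From here the folder ID would be handed to the OCR loop.
}
```

Validating the link up front gives a clear message for a bad paste instead of a Drive error halfway through a batch.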

Real-world use: 1920s magazines at scale
I stress-tested the script on two living projects:
- 1924 St. Louis Fashion Pageant (I ran so many iterations I got sick of looking at it).
  See the set: https://vintagereveries.com/topics/1924-st-louis-fashion-pageant/
- Character Reading (1924-25): a delightfully odd periodical on phrenology, vocation advice, and early self-help.
The tool ripped through 100+ pages in an hour or two, producing clean text blocks and pipe-tables I could drop into Sheets. It's night-and-day compared with my early-2010s OCR: fewer hallucinations, better column handling, and far less cleanup.
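If you would rather have true CSV than pipe rows, the conversion is small. The sketch below uses hypothetical helper names and is not part of the repo. It assumes each table row starts with a pipe, and it skips Markdown-style separator rows in case the model emits them.

```javascript
// Hypothetical helpers for turning pipe-formatted rows into CSV.
function pipeRowToCells(line) {
  return line.trim()
    .replace(/^\|/, '')   // drop the leading pipe
    .replace(/\|$/, '')   // drop the trailing pipe
    .split('|')
    .map(cell => cell.trim());
}

// Quote a cell only when it contains a comma, quote, or newline.
function csvEscape(cell) {
  return /[",\n]/.test(cell) ? '"' + cell.replace(/"/g, '""') + '"' : cell;
}

function pipeTableToCsv(text) {
  return text.split('\n')
    .filter(line => line.trim().startsWith('|'))
    // Skip rows like "| --- | --- |" (also drops rows with only empty cells).
    .filter(line => !/^\|[\s:|-]+\|?$/.test(line.trim()))
    .map(line => pipeRowToCells(line).map(csvEscape).join(','))
    .join('\n');
}

console.log(pipeTableToCsv('| Item | Price |\n| Hat, felt | $2.50 |'));
// Item,Price
// "Hat, felt",$2.50
```

Quoting matters for ad copy and captions, where commas inside a cell are common.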
Who this helps
- Archivists & librarians turning scans into searchable text.
- Resellers & historians extracting ads, captions, and indexes for research.
- Bloggers & editors who want ready-to-edit copy instead of raw images.
If you've got a box of scans and an afternoon, this will save you many hours.
Roadmap / ideas I'm exploring
- Optional per-page Google Docs output (one Doc per image).
- A "preserve layout" mode for magazine spreads.
- Rate-limit helper & progress meter for very large folders.
- One-click CSV export for detected tables.
If any of that would be useful, open an issue or star the repo so I know where to focus.
Practical notes
- Privacy & cost: You can stay within Gemini's free tier for moderate batches; large runs may require throttling.
- Accuracy: Vintage typefaces, skewed scans, and low contrast will still need light human cleanup, but you'll be editing instead of retyping.
- File hygiene: Square up scans, crop borders, and keep a consistent DPI for best results.
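On throttling: one common approach for large runs is to retry a failed request after a growing pause. The sketch below is a generic exponential backoff, not something the script currently does. The helper names and the 2-second base and 60-second cap are arbitrary assumptions, not documented Gemini limits.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs, maxMs) {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Calls fn and retries when it throws. `sleep` is passed in so that
// Apps Script can supply ms => Utilities.sleep(ms).
function withRetries(fn, maxAttempts, sleep) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return fn();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // out of attempts
      sleep(backoffDelayMs(attempt, 2000, 60000)); // assumed 2 s base, 60 s cap
    }
  }
}

// In Apps Script this could wrap a request, for example:
// withRetries(() => UrlFetchApp.fetch(url, options), 5, ms => Utilities.sleep(ms));
```

Apps Script also caps total execution time per run, so very large folders may still need to be split into batches.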
Try it, break it, tell me
- Repo: https://github.com/HelloJessicaM/Google-Docs-Archivist-OCR
- If you use it on your own collections, I'd love to see before/afters.
- If you hit bugs or want features, open an issue.
This is a small tool, but it unlocks big momentum: instead of typing, I'm architecting. And that leaves me more time to list vintage, write, and keep building the AI-assisted systems that make my one-woman operation move.
Postscript: how this ties into my broader work
I'm actively building a fashion dictionary knowledge base for an AI listing assistant. Clean OCR is the foundation. The better my source text, the smarter my tooling gets. In the meantime, this script lets me revisit, and finally use, the scans I made a decade ago, with much cleaner results.