
How to Convert a Scanned PDF to EPUB (Automatic OCR)

Scanned PDFs are different from regular PDFs. Instead of containing actual text, they contain images of pages — photos taken of a book, printout, or document. Converting a scanned PDF to EPUB requires OCR (Optical Character Recognition) to extract the text from those images first.

toolkit.bot runs OCR automatically. When you upload a scanned PDF, the converter detects image-only pages, runs Tesseract OCR on each one, and produces a reflowable EPUB with real, searchable text. No setup required.

How to convert a scanned PDF to EPUB

  1. Go to toolkit.bot/pdf2epub
  2. Upload your scanned PDF (drag and drop or click to browse)
  3. Wait 30–90 seconds — scanned files take longer because OCR runs on each page
  4. Download your EPUB file

No account or signup required. The free tier includes 5 conversions per month.

What counts as a "scanned" PDF?

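A scanned PDF is any PDF where the pages are stored as images rather than as machine-readable text. This typically happens when a book, printout, or document is photographed or scanned and the resulting page images are saved as a PDF.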

You can identify a scanned PDF because you can't select or copy the text — clicking on the page selects nothing.

What happens during OCR conversion

The conversion process for scanned PDFs has three stages:

  1. Detection: the converter checks each page. If a page contains only image data with no embedded text layer, it is flagged for OCR.
  2. OCR: Tesseract, a widely used open-source OCR engine whose development Google sponsored for many years, processes each flagged page. It identifies character regions, reconstructs words and lines, and produces a text representation of the page.
  3. EPUB assembly: the extracted text is cleaned, paragraphs are identified, headings are inferred from relative font size where possible, and the result is packaged into a reflowable EPUB 3 file.
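
The assembly stage can be illustrated with a minimal sketch that uses only Python's standard library. This is not toolkit.bot's actual code: the function names `split_paragraphs` and `build_epub`, the file layout under `OEBPS/`, and the placeholder identifier are assumptions made for illustration, and heading inference is left out. It shows the two parts of the stage: grouping OCR lines into paragraphs, then packaging XHTML into an EPUB 3 container, where the `mimetype` entry comes first and is stored uncompressed.

```python
import io
import re
import zipfile
from datetime import datetime, timezone
from html import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def split_paragraphs(ocr_text: str) -> list[str]:
    """Treat blank lines as paragraph breaks and re-join wrapped lines."""
    chunks = re.split(r"\n\s*\n", ocr_text)
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


def build_epub(title: str, paragraphs: list[str],
               book_id: str = "urn:uuid:00000000-0000-0000-0000-000000000000") -> bytes:
    """Package paragraphs into a single-chapter EPUB 3 file (hypothetical layout)."""
    safe_title = escape(title)
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    chapter = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"<head><title>{safe_title}</title></head>\n"
        f"<body><h1>{safe_title}</h1>\n{body}\n</body></html>\n"
    )
    nav = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml" '
        'xmlns:epub="http://www.idpf.org/2007/ops">\n'
        "<head><title>Contents</title></head>\n"
        '<body><nav epub:type="toc"><ol>'
        f'<li><a href="chapter1.xhtml">{safe_title}</a></li>'
        "</ol></nav></body></html>\n"
    )
    # EPUB 3 package metadata requires a last-modified timestamp in UTC.
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    opf = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<package xmlns="http://www.idpf.org/2007/opf" version="3.0" '
        'unique-identifier="book-id">\n'
        '  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">\n'
        f'    <dc:identifier id="book-id">{escape(book_id)}</dc:identifier>\n'
        f"    <dc:title>{safe_title}</dc:title>\n"
        "    <dc:language>en</dc:language>\n"
        f'    <meta property="dcterms:modified">{modified}</meta>\n'
        "  </metadata>\n"
        "  <manifest>\n"
        '    <item id="nav" href="nav.xhtml" '
        'media-type="application/xhtml+xml" properties="nav"/>\n'
        '    <item id="chapter1" href="chapter1.xhtml" '
        'media-type="application/xhtml+xml"/>\n'
        "  </manifest>\n"
        '  <spine><itemref idref="chapter1"/></spine>\n'
        "</package>\n"
    )

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        # The mimetype entry must be the first file and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, data in (
            ("META-INF/container.xml", CONTAINER_XML),
            ("OEBPS/content.opf", opf),
            ("OEBPS/nav.xhtml", nav),
            ("OEBPS/chapter1.xhtml", chapter),
        ):
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

A call such as `build_epub("My Book", split_paragraphs(ocr_text))` returns the bytes of an `.epub` file that can be written to disk. The output here has not been run through a validator such as EPUBCheck, so treat it as a starting point rather than a finished packager.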

What to expect from OCR output quality

OCR quality depends on the quality of the source scan:

Source scan quality                            Expected OCR result
High-resolution scan (300 dpi+), printed text  Excellent: near-perfect text extraction
Good scan (200 dpi), standard typeface         Good: occasional character errors (0 vs O, l vs 1)
Low-resolution scan or photo from a phone      Moderate: readable but with more errors
Skewed or angled scan                          Lower quality: deskewing is applied but isn't always perfect
Handwritten text                               Not supported: handwriting recognition requires a different engine
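
To judge which resolution band a given file falls into, the effective resolution of a scan can be estimated from the pixel width of the page image and the page width in the PDF, which is measured in points (72 points per inch by default). The sketch below is illustrative only: the function names and the cut-offs between bands are assumptions that read the 300 dpi and 200 dpi figures as thresholds, and it ignores skew and handwriting entirely.

```python
POINTS_PER_INCH = 72.0  # default PDF unit: 1 point = 1/72 inch


def effective_dpi(image_width_px: int, page_width_pt: float) -> float:
    """Pixels per inch of a page image stretched across a PDF page."""
    if page_width_pt <= 0:
        raise ValueError("page width must be positive")
    return image_width_px / (page_width_pt / POINTS_PER_INCH)


def expected_quality(dpi: float) -> str:
    """Map a resolution onto quality bands (thresholds are an assumption)."""
    if dpi >= 300:
        return "excellent"
    if dpi >= 200:
        return "good"
    return "moderate"


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(expected_quality(effective_dpi(2550, 612)))  # 2550 px / 8.5 in = 300 dpi
print(expected_quality(effective_dpi(1275, 612)))  # 1275 px / 8.5 in = 150 dpi
```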

Can other tools handle scanned PDFs?

Most free converters cannot.

toolkit.bot is one of the few free, browser-based options that handles scanned PDFs with automatic OCR.

Mixed PDFs (some scanned, some not)

Many PDFs contain a mix of pages — some with real text, some with scanned images. toolkit.bot handles mixed PDFs automatically: each page is processed individually. Text pages use direct extraction; image pages use OCR. The resulting EPUB has consistent text throughout.
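
The per-page routing described here can be sketched as follows. This is a hedged illustration, not the converter's real code: the `Page` type, `needs_ocr`, `extract_all`, and the `min_chars` threshold are hypothetical names, and the OCR engine is passed in as a plain function so the example stays dependency-free.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Page:
    embedded_text: str             # text layer read from the PDF, "" if none
    image: Optional[bytes] = None  # rendered page image, if available


def needs_ocr(page: Page, min_chars: int = 1) -> bool:
    """Flag a page for OCR when its text layer is empty or whitespace only."""
    return len(page.embedded_text.strip()) < min_chars


def extract_all(pages: list[Page], ocr: Callable[[bytes], str]) -> list[str]:
    """Return one text string per page, choosing the route page by page."""
    results = []
    for page in pages:
        if needs_ocr(page):
            # Image-only page: run OCR if there is an image to read.
            results.append(ocr(page.image) if page.image is not None else "")
        else:
            # Page with a real text layer: use it directly.
            results.append(page.embedded_text)
    return results
```

In practice the `ocr` argument could wrap a real engine, for example `pytesseract.image_to_string` applied to a decoded image. Raising `min_chars` is one way to also send pages with only a stray character or two of embedded text through OCR, though whether that is desirable depends on the documents involved.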

Does OCR make the file larger?

No. The EPUB file will typically be much smaller than the original scanned PDF. Scanned PDFs store high-resolution images for every page; the EPUB contains only the extracted text (plus any actual images that were in the original). A 20 MB scanned PDF often produces a 200–500 KB EPUB, roughly 40 to 100 times smaller.

Upload your scanned PDF — OCR runs automatically, no setup required.
