Optical character recognition

Tesseract OCR - Tesseract Open Source OCR Engine (main repo).

Tesseract.js - JavaScript library that gets words in almost any language out of images.

keras-ocr - Packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.

Awesome Scanning - Curated list of awesome projects to simplify and improve paper scanning.

Scale Document - Secure platform for document processing.

Easy OCR - Ready-to-use OCR with 40+ languages supported including Chinese, Japanese, Korean and Thai. (HN)

OCRmyPDF - Adds an OCR text layer to scanned PDF files, allowing them to be searched. (Docs)

FilingDB - Database of extracted and structured text from European company filings. Optimised for quant investors.

InvoiceNet - Deep neural network to extract intelligent information from invoice documents.

PaddleOCR - Rich, leading, and practical OCR tools that help users train better models and apply them in practice. (Web) (HN)

TextRecognitionDataGenerator - Synthetic data generator for text recognition.

Paperless - Index and archive all of your scanned paper documents.

macOCR - Get any text on your screen into your clipboard. (HN)

MMOCR - OpenMMLab Text Detection, Recognition and Understanding Toolbox.

Project Naptha - Highlight, copy and translate text from any image in the browser. (HN)

Extract Table - API for extracting a table from an image. (Code)

Amazon Textract - Easily extract printed text, handwriting, and data from any document. (Code Samples)

CalamariOCR - Line-based ATR engine based on OCRopy.

Paperless-NGX - Supercharged version of paperless: scan, index and archive all your physical documents. (HN)

scan2drive - Go program (with a web interface) for scanning, converting and uploading physical documents to Google Drive.

docTR - Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch.

RapidOCR - Cross-platform OCR library based on PaddleOCR & OnnxRuntime.

meme_finder - Find locally-saved memes via their meme text using OCR. Written in Rust.

Veryfi OCR API - OCR API for Real-Time Data Extraction from Receipts & Invoices. (Node SDK)

ocrit - Command-line utility for performing OCR using Apple's Vision framework.

Tesseract WASM - WebAssembly build of the Tesseract OCR engine for use in the browser and Node.

tinyocr - Tiny command line OCR utility for recent versions of macOS.

Donut - Document Understanding Transformer.

ocrpy - OCR, Archive, Index and Search: Implementation agnostic OCR framework.

Links