https://github.com/abbyy/jfk-ocr

Fully searchable JFK files powered by ABBYY Purpose-Built AI. Making history more accessible for AI research, full-text search, and Retrieval-Augmented Generation (RAG) applications. Open-source and ready for analysis.
image-processing image-to-text ocr ocr-recognition

Host: GitHub
URL: https://github.com/abbyy/jfk-ocr
Owner: abbyy
License: cc-by-4.0
Created: 2025-03-19T18:50:47.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-03-20T06:51:36.000Z (over 1 year ago)
Last Synced: 2025-06-17T18:08:46.104Z (about 1 year ago)
Topics: image-processing, image-to-text, ocr, ocr-recognition
Homepage: https://www.abbyy.com
Size: 365 KB
Stars: 7
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md

# The JFK Files: Fully OCR'd & Searchable

The **JFK files** are now part of the public domain, offering a trove of historical documents for researchers, journalists, and enthusiasts alike. 

However, the original collection is **unindexed, lacks a text layer, and is difficult to search**, which makes it challenging to analyze effectively, especially for AI-powered research.  

As a leader in **OCR (Optical Character Recognition) technology**, **ABBYY** is providing the JFK files as **fully searchable, structured PDFs**, freely available to the open-source community. By making these documents **machine-readable**, we aim to **unlock deeper insights, accelerate historical research, and enable advanced AI-driven analysis**.  

## What You Can Do with These Files  

With this dataset, you can, for example:  

- 🔍 **Perform Full-Text Search** – Instantly locate key events, names, and places across thousands of pages.  

- 🏗 **Build AI-Powered Research Tools** – Leverage **Retrieval-Augmented Generation (RAG)** to create **AI assistants** that can answer JFK-related questions.  

- 📊 **Run NLP & Machine Learning Analysis** – Detect patterns, extract key insights, and apply **entity recognition** to map relationships.  

- 📜 **Enhance Historical Investigations** – Cross-reference details, analyze declassified records, and uncover new connections.  
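
As a rough illustration of the full-text search idea, here is a minimal Python sketch of an inverted index over page text. It is not part of this repository: it assumes you have already extracted the text layer from the PDFs with a tool of your choice, and the file names, page numbers, and sample sentences below are invented placeholders rather than real records.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of (file, page) keys that contain it."""
    index = defaultdict(set)
    for key, text in pages.items():
        for token in tokenize(text):
            index[token].add(key)
    return index


def search(index, query):
    """Return the sorted keys of pages that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(token, set()) for token in tokens))
    return sorted(hits)


# Hypothetical placeholder text standing in for pages extracted from the PDFs.
pages = {
    ("doc-a.pdf", 1): "Memorandum regarding travel to Mexico City.",
    ("doc-a.pdf", 2): "Follow-up report on the Dallas field office.",
    ("doc-b.pdf", 1): "Cable from Mexico City station to headquarters.",
}

index = build_index(pages)
print(search(index, "mexico city"))
```

The query is treated as a simple AND over tokens, which keeps the sketch short. A real search or RAG pipeline would likely add ranking, phrase matching, and chunking on top of this.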

## About the Data  

These records are sourced from the **U.S. National Archives** and are part of the public domain:  

🔗 [JFK Records Collection (National Archives)](https://www.archives.gov/research/jfk/release-2025)  

⚠ **Disclaimer:** While these records are public domain, any copyrighted material within them remains the property of the respective copyright owner. These documents are provided **for private study, scholarship, or research purposes only** and are shared **as-is, without warranty** of any kind.  

## Brought to you by ABBYY

The JFK Files were made machine-readable using the Document AI API. Get access here: https://hubs.li/Q039Y11p0
