https://github.com/abbyy/jfk-ocr
Fully searchable JFK files powered by ABBYY Purpose-Built AI. Making history more accessible for AI research, full-text search, and Retrieval-Augmented Generation (RAG) applications. Open-source and ready for analysis.
image-processing image-to-text ocr ocr-recognition
- Host: GitHub
- URL: https://github.com/abbyy/jfk-ocr
- Owner: abbyy
- License: cc-by-4.0
- Created: 2025-03-19T18:50:47.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-20T06:51:36.000Z (over 1 year ago)
- Last Synced: 2025-06-17T18:08:46.104Z (about 1 year ago)
- Topics: image-processing, image-to-text, ocr, ocr-recognition
- Homepage: https://www.abbyy.com
- Size: 365 KB
- Stars: 7
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# The JFK Files: Fully OCR'd & Searchable
The **JFK files** are now part of the public domain, offering a trove of historical documents for researchers, journalists, and enthusiasts alike.
However, the vast collection remains **unindexed, lacks a text layer, and is difficult to search**, making it challenging to analyze effectively, especially for AI-powered research.
As a leader in **OCR (Optical Character Recognition) technology**, **ABBYY** is providing the JFK files as **fully searchable, structured PDFs**, freely available for the open-source community. By making these documents **machine-readable**, we aim to **unlock deeper insights, accelerate historical research, and enable advanced AI-driven analysis**.
## What You Can Do with These Files
With this dataset, you can, for example:
- **Perform Full-Text Search**: Instantly locate key events, names, and places across thousands of pages.
- **Build AI-Powered Research Tools**: Leverage **Retrieval-Augmented Generation (RAG)** to create **AI assistants** that can answer JFK-related questions.
- **Run NLP & Machine Learning Analysis**: Detect patterns, extract key insights, and apply **entity recognition** to map relationships.
- **Enhance Historical Investigations**: Cross-reference details, analyze declassified records, and uncover new connections.
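As a rough illustration of the full-text search use case, here is a minimal sketch of an inverted index built with the Python standard library only. It assumes you have already pulled the text layer out of the PDFs with a PDF library of your choice and hold it as `(doc_id, page_number, text)` tuples. The function names, file names, and sample sentences below are hypothetical placeholders, not content from the actual files or part of any ABBYY tooling.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def build_index(pages):
    """Map each token to the set of (doc_id, page_number) pairs containing it.

    `pages` is an iterable of (doc_id, page_number, text) tuples. How the
    text is extracted from the PDFs is left to the reader (assumption).
    """
    index = defaultdict(set)
    for doc_id, page_number, text in pages:
        for token in tokenize(text):
            index[token].add((doc_id, page_number))
    return index


def search(index, query):
    """Return sorted (doc_id, page_number) pairs that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)


# Made-up placeholder pages, standing in for text extracted from the PDFs.
sample_pages = [
    ("doc-a.pdf", 1, "Memorandum regarding travel to Mexico City."),
    ("doc-a.pdf", 2, "Follow-up cable from the Dallas field office."),
    ("doc-b.pdf", 1, "Report on Mexico City station activity."),
]

index = build_index(sample_pages)
print(search(index, "mexico city"))
# [('doc-a.pdf', 1), ('doc-b.pdf', 1)]
```

The same page-level lookup could serve as a simple retrieval step in a RAG pipeline, although a production setup would more likely use a dedicated search engine or vector store.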
## About the Data
These records are sourced from the **U.S. National Archives** and are part of the public domain:
[JFK Records Collection (National Archives)](https://www.archives.gov/research/jfk/release-2025)
**Disclaimer:** While these records are public domain, any copyrighted material within them remains the property of the respective copyright owner. These documents are provided **for private study, scholarship, or research purposes only** and are shared **as-is, without warranty** of any kind.
## Brought to you by ABBYY
The JFK Files were made machine-readable using the Document AI API. Get access here: https://hubs.li/Q039Y11p0