
https://github.com/abbyy/jfk-ocr

Fully searchable JFK files powered by ABBYY Purpose-Built AI. Making history more accessible for AI research, full-text search, and Retrieval-Augmented Generation (RAG) applications. Open-source and ready for analysis.

image-processing image-to-text ocr ocr-recognition


# The JFK Files: Fully OCR'd & Searchable
The **JFK files** are now part of the public domain, offering a trove of historical documents for researchers, journalists, and enthusiasts alike.
However, the vast collection is **unindexed, lacks a text layer, and is difficult to search**, which makes it hard to analyze effectively, especially for AI-powered research.
As a leader in **OCR (Optical Character Recognition) technology**, **ABBYY** is helping to open these records up for research. We are providing the JFK files as **fully searchable, structured PDFs**, freely available to the open-source community. By making these documents **machine-readable**, we aim to **unlock deeper insights, accelerate historical research, and enable advanced AI-driven analysis**.

## What You Can Do with These Files

With this dataset, you can, for example:

- πŸ” **Perform Full-Text Search** – Instantly locate key events, names, and places across thousands of pages.
- πŸ— **Build AI-Powered Research Tools** – Leverage **Retrieval-Augmented Generation (RAG)** to create **AI assistants** that can answer JFK-related questions.
- πŸ“Š **Run NLP & Machine Learning Analysis** – Detect patterns, extract key insights, and apply **entity recognition** to map relationships.
- πŸ“œ **Enhance Historical Investigations** – Cross-reference details, analyze declassified records, and uncover new connections.
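Full-text search and the retrieval step of a RAG pipeline both come down to ranking pages by how well they match a query. Below is a minimal sketch of that idea: a small in-memory inverted index with TF-IDF scoring in plain Python. It assumes you have already extracted each page's text from the PDFs' text layer with a PDF library of your choice. `PageIndex`, `tokenize`, the page IDs, and the sample strings are hypothetical illustrations and are not part of this repository or of ABBYY's tooling.

```python
import re
from collections import Counter, defaultdict
from math import log


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits (drops punctuation)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PageIndex:
    """Hypothetical in-memory inverted index over extracted page text."""

    def __init__(self) -> None:
        # term -> {page_id: number of times the term occurs on that page}
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        self.num_pages = 0

    def add_page(self, page_id: str, text: str) -> None:
        """Index one page; page_id is any label you choose, e.g. file name plus page number."""
        self.num_pages += 1
        for term, count in Counter(tokenize(text)).items():
            self.postings[term][page_id] = count

    def search(self, query: str, top_k: int = 5) -> list[tuple[str, float]]:
        """Return up to top_k (page_id, score) pairs, best match first."""
        scores: dict[str, float] = defaultdict(float)
        for term in set(tokenize(query)):
            pages = self.postings.get(term)
            if not pages:
                continue  # term appears on no indexed page
            # Rarer terms get a higher weight (smoothed inverse document frequency).
            idf = log(1 + self.num_pages / len(pages))
            for page_id, tf in pages.items():
                scores[page_id] += tf * idf
        # Sort by descending score; break ties by page_id for stable output.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:top_k]


if __name__ == "__main__":
    index = PageIndex()
    # Made-up placeholder snippets, not quotes from the collection.
    index.add_page("doc-a.pdf#p1", "Memo about travel plans to Mexico City")
    index.add_page("doc-b.pdf#p3", "Cable sent from the Mexico City station")
    index.add_page("doc-c.pdf#p2", "Budget figures for the fiscal year")
    # doc-a ranks first because it also matches "travel"; doc-c matches nothing.
    for page_id, score in index.search("Mexico City travel"):
        print(f"{page_id}\t{score:.3f}")
```

In a RAG application, the top-ranked pages would be handed to a language model as context for answering a question. TF-IDF is used here only because it fits in a few lines; a real deployment over thousands of pages would more likely rely on a dedicated search engine or embedding-based retrieval.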

## About the Data

These records are sourced from the **U.S. National Archives** and are part of the public domain:
🔗 [JFK Records Collection (National Archives)](https://www.archives.gov/research/jfk/release-2025)

⚠️ **Disclaimer:** While these records are public domain, any copyrighted material within them remains the property of its respective copyright owners. These documents are provided **for private study, scholarship, or research purposes only** and are shared **as-is, without warranty** of any kind.

## Brought to you by ABBYY
The JFK files were made machine-readable using ABBYY's Document AI API. Get access here: https://hubs.li/Q039Y11p0