{"id":28780710,"url":"https://github.com/abbyy/jfk-ocr","last_synced_at":"2026-02-03T15:01:58.800Z","repository":{"id":283349848,"uuid":"951485501","full_name":"abbyy/JFK-OCR","owner":"abbyy","description":"Fully searchable JFK files powered by ABBYY Purpose-Built AI. Making history more accessible for AI research, full-text search, and Retrieval-Augmented Generation (RAG) applications. Open-source and ready for analysis.","archived":false,"fork":false,"pushed_at":"2025-03-20T06:51:36.000Z","size":374,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-06-17T18:08:46.104Z","etag":null,"topics":["image-processing","image-to-text","ocr","ocr-recognition"],"latest_commit_sha":null,"homepage":"https://www.abbyy.com","language":null,"has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/abbyy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-19T18:50:47.000Z","updated_at":"2025-06-17T00:19:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"3a629a3c-e92c-4b9d-a000-627aea623eb9","html_url":"https://github.com/abbyy/JFK-OCR","commit_stats":null,"previous_names":["abbyy/jfk-ocr"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/abbyy/JFK-OCR","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abbyy%2FJFK-OCR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abbyy%2FJFK-OCR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abbyy%2
FJFK-OCR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abbyy%2FJFK-OCR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/abbyy","download_url":"https://codeload.github.com/abbyy/JFK-OCR/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abbyy%2FJFK-OCR/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29047794,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-03T14:55:20.264Z","status":"ssl_error","status_checked_at":"2026-02-03T14:55:19.725Z","response_time":96,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["image-processing","image-to-text","ocr","ocr-recognition"],"created_at":"2025-06-17T18:08:13.653Z","updated_at":"2026-02-03T15:01:58.791Z","avatar_url":"https://github.com/abbyy.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# The JFK Files: Fully OCR'd \u0026 Searchable \nThe **JFK files** are now part of the public domain, offering a trove of historical documents for researchers, journalists, and enthusiasts alike. \nHowever, the vast collection remains **unindexed, lacks a text layer, and is difficult to search**—making it challenging to analyze effectively, especially for AI-powered research.  
\nAs a leader in **OCR (Optical Character Recognition) technology**, **ABBYY** is facilitating research. We are providing the JFK files as **fully searchable, structured PDFs**, freely available for the open-source community. By making these documents **machine-readable**, we aim to **unlock deeper insights, accelerate historical research, and enable advanced AI-driven analysis**.  \n\n## What You Can Do with These Files  \n\nWith this dataset, you can, for example:  \n\n- 🔍 **Perform Full-Text Search** – Instantly locate key events, names, and places across thousands of pages.  \n- 🏗 **Build AI-Powered Research Tools** – Leverage **Retrieval-Augmented Generation (RAG)** to create **AI assistants** that can answer JFK-related questions.  \n- 📊 **Run NLP \u0026 Machine Learning Analysis** – Detect patterns, extract key insights, and apply **entity recognition** to map relationships.  \n- 📜 **Enhance Historical Investigations** – Cross-reference details, analyze declassified records, and uncover new connections.  \n\n## About the Data  \n\nThese records are sourced from the **U.S. National Archives** and are part of the public domain:  \n🔗 [JFK Records Collection (National Archives)](https://www.archives.gov/research/jfk/release-2025)  \n\n⚠ **Disclaimer:** While these records are public domain, any copyrighted material within them remains the property of the respective copyright owner. These documents are provided **for private study, scholarship, or research purposes only** and are shared **as-is, without warranty** of any kind.  \n\n\n## Brought to you by ABBYY\nThe JFK Files were made machine-readable using the Document AI API. Get access here: https://hubs.li/Q039Y11p0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabbyy%2Fjfk-ocr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabbyy%2Fjfk-ocr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabbyy%2Fjfk-ocr/lists"}