{"id":50977771,"url":"https://github.com/vsancnaj/document-extractor-llm","last_synced_at":"2026-06-19T10:02:33.154Z","repository":{"id":363840159,"uuid":"867245104","full_name":"vsancnaj/document-extractor-llm","owner":"vsancnaj","description":"A Streamlit app using Large Language Models (LLMs) for efficient document parsing and data extraction. Dockerized for easy deployment, leveraging OpenAI, Chroma, and RAG for advanced information retrieval.","archived":false,"fork":false,"pushed_at":"2026-06-10T14:37:46.000Z","size":40,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-06-10T16:21:33.271Z","etag":null,"topics":["llm","openai","python","rag","streamlit"],"latest_commit_sha":null,"homepage":"https://hub.docker.com/repository/docker/vsanchezn/streamlit-app/general","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vsancnaj.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-10-03T17:46:05.000Z","updated_at":"2026-06-10T14:45:04.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/vsancnaj/document-extractor-llm","commit_stats":null,"previous_names":["vsancnaj/document-extractor-llm"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/vsancnaj/document-extractor-llm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsancnaj%2Fdocument-ext
ractor-llm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsancnaj%2Fdocument-extractor-llm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsancnaj%2Fdocument-extractor-llm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsancnaj%2Fdocument-extractor-llm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vsancnaj","download_url":"https://codeload.github.com/vsancnaj/document-extractor-llm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsancnaj%2Fdocument-extractor-llm/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34526073,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-19T02:00:06.005Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm","openai","python","rag","streamlit"],"created_at":"2026-06-19T10:02:32.009Z","updated_at":"2026-06-19T10:02:33.147Z","avatar_url":"https://github.com/vsancnaj.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Document Extractor LLM\n\nThis project aims to extract relevant information from documents using Large Language Models (LLMs). 
The implementation uses LLMs to read, understand, and extract data from various documents, which makes it useful for data processing, automation, and information retrieval.\n\n## Demo Video\n\nA short demonstration of the Streamlit app:  \n\n\nhttps://github.com/user-attachments/assets/47506049-d56d-46b3-b3f9-efa65449c719\n\n\n## Project Overview\n\nThe main goal of this project is to build an LLM-based document extraction tool. Users provide documents, and the tool extracts the relevant information and presents it in a structured format. The project uses OpenAI language models, Chroma as the vector database, and Retrieval-Augmented Generation (RAG) to supply document context during extraction.\n\n## Features\n\n- **Document Extraction**: Extracts structured data from documents using Large Language Models.\n- **Streamlit Interface**: A user-friendly interface for extracting data from documents.\n- **Dockerized Application**: The application is containerized using Docker for easy deployment and usage.\n\n## Getting Started\n\n[Streamlit app image on Docker Hub](https://hub.docker.com/repository/docker/vsanchezn/streamlit-app/general)\n\n1. **Install Docker**: First, ensure Docker is installed on your computer. You can download and install it from the official [Docker website](https://www.docker.com/).\n\n2. **Pull the Docker Image**: Open your terminal and run this command to get the app:\n\n   ```bash\n   docker pull vsanchezn/streamlit-app\n   ```\n\n3. **Run the App**: Start the app by running:\n\n   ```bash\n   docker run -p 8501:8501 vsanchezn/streamlit-app\n   ```\n\n   This launches the app and publishes it on port `8501`.\n\n4. **Open the App**: Open your web browser and go to [http://localhost:8501](http://localhost:8501).\n\n5. 
**Stop the App**: To stop the app, press `Ctrl+C` in the terminal or, from another terminal, run:\n\n   ```bash\n   docker stop \u003ccontainer_id\u003e\n   ```\n\n   Run `docker ps` to find the container ID. If you pushed the image under your own Docker Hub account, replace `vsanchezn` in the commands above with your username.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvsancnaj%2Fdocument-extractor-llm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvsancnaj%2Fdocument-extractor-llm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvsancnaj%2Fdocument-extractor-llm/lists"}