{"id":31127192,"url":"https://github.com/floerianc/athena","last_synced_at":"2026-05-03T01:35:22.608Z","repository":{"id":308227431,"uuid":"1019786135","full_name":"Floerianc/Athena","owner":"Floerianc","description":"An RAG package fully written in Python.","archived":false,"fork":false,"pushed_at":"2025-08-18T17:10:03.000Z","size":940,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-18T18:36:55.714Z","etag":null,"topics":["ai","chatgpt","chatgpt-api","chromadb","openai","openai-api","package","processing","rag","rag-chatbot","vector-database"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Floerianc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-14T21:48:48.000Z","updated_at":"2025-08-18T17:10:06.000Z","dependencies_parsed_at":"2025-08-04T22:15:51.850Z","dependency_job_id":"6ba15557-c1db-4037-b86e-fb7739939b9d","html_url":"https://github.com/Floerianc/Athena","commit_stats":null,"previous_names":["floerianc/athena"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Floerianc/Athena","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Floerianc%2FAthena","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Floerianc%2FAthena/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Floerianc%2FAthena/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/Floerianc%2FAthena/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Floerianc","download_url":"https://codeload.github.com/Floerianc/Athena/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Floerianc%2FAthena/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275680444,"owners_count":25508570,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-17T02:00:09.119Z","response_time":84,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","chatgpt","chatgpt-api","chromadb","openai","openai-api","package","processing","rag","rag-chatbot","vector-database"],"created_at":"2025-09-17T23:02:53.325Z","updated_at":"2025-09-17T23:03:32.747Z","avatar_url":"https://github.com/Floerianc.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./assets/icon_edit.png\" alt=\"Athena Logo\" width=\"128\" /\u003e\n\u003c/p\u003e\n\n\u003cp style=\"font-size: 8pt\" align=\"center\"\u003e\n    Icon was generated with AI.\n    It's only a placeholder, will be replaced with real art\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003e\n    Athena\n\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003cstrong\u003e\n        Wisdom for your data.\n    \u003c/strong\u003e\u003cbr\u003e\n    A 
RAG-Model built from scratch in Python. Ingest large JSON, PDF, Markdown or TXT files and let Athena work its magic.\n\u003c/p\u003e\n\n---\n\n## 🚀 Features\n\n- **🔗 Vector‑based Database**  \n  - Sanitizes input and metadata to ensure correct upserting\n  - Custom embedding vector generation (not reliant on ChromaDB)\n  - Split or normalize text (by blank lines, newlines, or fixed‐size chunks)  \n  - Embed chunks into ChromaDB for ultra‑fast semantic lookup\n  - Careful deletion process\n- **🔍 Smart Search \u0026 Retrieval**  \n  - Highlight query terms in returned documents\n  - Filter results by tokens, metadata \u0026 distance thresholds  \n  - Cap output by tokens for cost control\n  - Multiple helper methods for developers\n- **🤖 AI Pipeline**  \n  - Convert `QueryResults` + user query into a single, structured prompt  \n  - Full support for JSON, plain‑text \u0026 Markdown outputs  \n  - Configurable max_tokens for both input \u0026 output\n  - Optional structured output with custom schema\n  - Improved text extraction from responses\n- **🧠 AI Memory**\n  - Dedicated memory component\n  - Shortens past prompts for efficient token usage\n  - Separate vector database to get relevant past queries/responses\n  - Offers fallbacks and other helper methods\n- **💻 CLI**\n  - Pretty CLI design\n  - Custom stylesheets (ColorProfiles), progress bars and progress messages\n  - Allows any supported input file type and an optional schema path\n- **⚙️ Extensive Processor**\n  - Normalize document lengths for uniform chunks\n  - Large parser for TXT, Markdown and PDF files:\n    - Parsing by newline\n    - Parsing by blank lines\n    - Parsing by chunks\n  - Serializer to convert internal objects to human-readable JSON/dicts\n  - Validator to validate input data for the parser\n- **📊 Benchmarking**  \n  - Automatically log system specs, input sizes, timings \u0026 memory\n  - CLI‑friendly display and extensive JSON export of every run\n- **🛜 Streamlit App**\n  - Full-fledged 
Streamlit app featuring four pages:\n    - Simple Chatbot with custom input file, schema and log view.\n    - Search engine implementation to visualize the main ChromaDB database\n    - Processor overview to see how the Processor processes the input data\n    - Config overview\n- **⚙️ Rich Configuration** via `Config` (models, parsing modes, memory limits, embedding, search engine configs...)\n\n---\n\n## 📦 Installation\n\n```bash\ngit clone https://github.com/floerianc/athena.git\ncd athena\npython3 -m venv .venv\nsource .venv/bin/activate\npip install -r requirements.txt\n```\n\n## 🔮 Roadmap\n\n| Priority      | Task                                                |\n| ------------- | --------------------------------------------------- |\n| **Very High** | • Improve stability of core components              |\n| **High**      | • Better error handling                             |\n|               | • Token calculation for input max_tokens            |\n| **Mid**       | • Code cleanup                                      |\n|               | • Create unit tests                                 |\n\n---\n\n## 📄 License\n\n[GPLv3](LICENSE) © 2025 Floerianc \u003c3\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffloerianc%2Fathena","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffloerianc%2Fathena","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffloerianc%2Fathena/lists"}