{"id":15914639,"url":"https://github.com/lebtoki/docuchat","last_synced_at":"2025-07-22T08:06:13.353Z","repository":{"id":244370284,"uuid":"815047241","full_name":"LebToki/DocuChat","owner":"LebToki","description":"DocuChat is a document-based chatbot that leverages advanced NLP models to provide intelligent responses based on the content of uploaded documents. This project consists of a PHP frontend and a Python backend.","archived":false,"fork":false,"pushed_at":"2025-06-30T10:16:52.000Z","size":569,"stargazers_count":1,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-30T11:22:12.207Z","etag":null,"topics":["agent","ai","bert-model","bootstrap5","bottraining","chat","chatbot","documentmanagement","documents","embeddings","fine-tuning","font-awesome","jquery","knowledgemanagement","machine-learning","nlp","php8","phyton","python-3","python3"],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LebToki.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-14T08:35:14.000Z","updated_at":"2025-06-30T10:16:18.000Z","dependencies_parsed_at":"2024-10-28T15:21:23.210Z","dependency_job_id":"401f5a64-4ff5-4f06-beac-04d858373756","html_url":"https://github.com/LebToki/DocuChat","commit_stats":null,"previous_names":["lebtoki/docuchat"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/LebToki/DocuChat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LebToki%2FDocuChat","tags_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LebToki%2FDocuChat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LebToki%2FDocuChat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LebToki%2FDocuChat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LebToki","download_url":"https://codeload.github.com/LebToki/DocuChat/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LebToki%2FDocuChat/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266455244,"owners_count":23931358,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-22T02:00:09.085Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","ai","bert-model","bootstrap5","bottraining","chat","chatbot","documentmanagement","documents","embeddings","fine-tuning","font-awesome","jquery","knowledgemanagement","machine-learning","nlp","php8","phyton","python-3","python3"],"created_at":"2024-10-06T17:04:51.518Z","updated_at":"2025-07-22T08:06:13.345Z","avatar_url":"https://github.com/LebToki.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DocuChat Model Fine-Tuning\r\n\r\n## Introducing DocuChat\r\n\r\nIn today's fast-paced world, managing and interacting with various documents can be a daunting task. 
Imagine having a tool that not only simplifies this process but also enhances it using the power of Artificial Intelligence. Enter DocuChat, an innovative solution designed to revolutionize the way we handle documents.\r\n\r\nDocuChat is a document-based chatbot that leverages advanced NLP models to provide intelligent responses based on the content of uploaded documents. The project consists of a PHP frontend and a Python backend.\r\n\r\n## What is DocuChat?\r\nDocuChat is a groundbreaking project that leverages the capabilities of Generative AI to analyze, interact with, and retrieve information from various types of documents, such as PDF, DOCX, PPTX, and XLSX files. Whether you're a hobbyist, a beginner, or a professional, DocuChat offers a seamless and efficient way to manage your documents.\r\n\r\n### For Hobbyists and Beginners\r\nAre you new to the world of Natural Language Processing (NLP) and AI?\r\n\r\nDocuChat is the perfect starting point for you. With a user-friendly interface and straightforward setup, you can quickly dive into the exciting world of AI and document management. Here's what you can expect:\r\n\r\n- **Easy Setup:** Follow our simple installation guide to get started.\r\n\r\n- **Interactive Learning:** Experiment with different document types and see how AI analyzes and retrieves information.\r\n\r\n- **Community Support:** Join our growing community of hobbyists and beginners to share experiences, ask questions, and learn together.\r\n\r\n### For Professionals\r\nIf you're a professional looking for a robust and reliable document management solution, DocuChat has you covered. With advanced features and customizable options, you can tailor the tool to meet your specific needs. 
Here's how DocuChat can benefit you:\r\n\r\n- **Efficiency:** Save time by letting AI handle the heavy lifting of document analysis and information retrieval.\r\n\r\n- **Accuracy:** Get precise, relevant information extraction with our fine-tuned models.\r\n\r\n- **Scalability:** Easily integrate DocuChat into your existing workflows and scale it as your document management needs grow.\r\n\r\n## Get Involved\r\nWe invite contributors and sponsors to join us in enhancing DocuChat. Your support and contributions can help us bring even more exciting features and improvements to the project. Whether you're a developer, a researcher, or a sponsor, there's a place for you in the DocuChat community.\r\n\r\n## Overview\r\nThe project includes scripts and configurations to:\r\n- Upload documents\r\n- Extract text from various document formats\r\n- Generate embeddings using a BERT-based model\r\n- Fine-tune the model on specific tasks\r\n\r\n## Project Structure\r\n\r\n**backend/**: Contains the backend code and scripts for the project.\r\n- **app.py**: Main Flask app for handling requests.\r\n- **download_models.py**: Script to download models.\r\n- **fine_tune_model.py**: Script to fine-tune the model.\r\n- **models/**: Directory to store models.\r\n  - **bert-base-multilingual-cased/**: Directory for the bert-base-multilingual-cased model.\r\n- **project_embeddings/**: Directory to store project embeddings.\r\n  - **YourProjectName/**: Embeddings for your own project are created here.\r\n- **results/**: Directory to store fine-tuning results.\r\n- **static/**: Directory for static files.\r\n  - **embeddings/**: Directory to store generated embeddings.\r\n    - **YourProjectName/**: Embeddings for your own project are created here.\r\n  - **uploads/**: Directory to store uploaded files.\r\n    - **YourProjectName/**: Files uploaded for your own project are stored here.\r\n- **templates/**: HTML templates.\r\n- **`__pycache__/`**: Python cache files.\r\n\r\n- **public/**: 
Frontend code and assets.\r\n  - **css/**: Custom styles.\r\n  - **img/**: Images for the frontend.\r\n    - **types/**: File type icons.\r\n  - **js/**: JavaScript files.\r\n  - **src/**: Source files for the frontend.\r\n    - **views/**: Views for the frontend.\r\n\r\n**vendor/**: Contains third-party libraries and frameworks.\r\n- **bootstrap/**: Bootstrap CSS and JS.\r\n- **font-awesome/**: FontAwesome CSS and JS.\r\n- **jquery/**: jQuery library.\r\n\r\n## Requirements\r\n\r\n### Backend (Python)\r\n- Python 3.11\r\n- Flask\r\n- Flask-CORS\r\n- pdfminer.six\r\n- python-docx\r\n- openpyxl\r\n- python-pptx\r\n- transformers\r\n- faiss (e.g., the `faiss-cpu` package on PyPI)\r\n- langdetect\r\n- torch\r\n\r\n### Frontend (PHP)\r\n- PHP 7.4+\r\n- Bootstrap\r\n- FontAwesome\r\n- jQuery\r\n\r\n## Installation\r\n\r\n1. Clone the repository:\r\n\r\n```bash\r\ngit clone https://github.com/LebToki/DocuChat.git\r\ncd DocuChat/backend\r\n```\r\n\r\n2. Set up a virtual environment and install the dependencies:\r\n\r\n```bash\r\npython -m venv .venv\r\nsource .venv/bin/activate # On Windows, use `.venv\\Scripts\\activate`\r\npip install -r requirements.txt\r\n```\r\n\r\n## Simplified Setup Steps\r\n\r\n### Backend\r\n\r\n1. 
Create a virtual environment and activate it:\r\n   ```sh\r\n   python -m venv .venv\r\n   source .venv/bin/activate  # On Windows, use `.venv\\Scripts\\activate`\r\n   ```\r\n\r\n2. Install the required Python packages:\r\n   ```sh\r\n   pip install -r requirements.txt\r\n   ```\r\n\r\n3. Download the necessary models:\r\n   ```sh\r\n   python backend/download_models.py\r\n   ```\r\n\r\n4. Run the Flask app:\r\n   ```sh\r\n   python backend/app.py\r\n   ```\r\n\r\n### Frontend\r\n\r\n1. Make sure you have a local PHP server set up (e.g., XAMPP or Laragon).\r\n2. Place the PHP files in your server's web root (for example, `htdocs` in XAMPP or `www` in Laragon).\r\n3. Open the project in your browser.\r\n\r\n## Usage\r\n- Upload documents through the frontend interface.\r\n- Ask questions related to the uploaded documents.\r\n- The backend processes the documents, generates embeddings, and returns relevant responses based on their content.\r\n\r\n## Fine-Tuning\r\nTo fine-tune the model, run the following script:\r\n```sh\r\npython backend/fine_tune_model.py\r\n```\r\n\r\n## Contributing\r\nContributions are welcome! Please submit a pull request or open an issue for any improvements or bugs.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flebtoki%2Fdocuchat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flebtoki%2Fdocuchat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flebtoki%2Fdocuchat/lists"}