{"id":19383959,"url":"https://github.com/saadkh1/docqa-textsummarization-app","last_synced_at":"2025-04-14T00:36:12.277Z","repository":{"id":207842391,"uuid":"685739046","full_name":"saadkh1/DocQA-TextSummarization-App","owner":"saadkh1","description":"A Streamlit app for document question answering and text summarization.","archived":false,"fork":false,"pushed_at":"2023-09-01T06:14:02.000Z","size":195,"stargazers_count":3,"open_issues_count":1,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-27T14:51:55.213Z","etag":null,"topics":["langchain","llama-2","llamacpp","pytesseract","question-answering","streamlit","summarization","whisper"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/saadkh1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-08-31T22:42:05.000Z","updated_at":"2024-12-15T11:58:06.000Z","dependencies_parsed_at":"2023-11-17T23:58:54.977Z","dependency_job_id":"ee87da13-0ce7-41e6-be2c-1a33daab55f3","html_url":"https://github.com/saadkh1/DocQA-TextSummarization-App","commit_stats":null,"previous_names":["saadkh1/docqa-textsummarization-app"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saadkh1%2FDocQA-TextSummarization-App","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saadkh1%2FDocQA-TextSummarization-App/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saadkh1%2FDocQA-TextSummarization-App/releases","manifests_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/saadkh1%2FDocQA-TextSummarization-App/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/saadkh1","download_url":"https://codeload.github.com/saadkh1/DocQA-TextSummarization-App/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248802892,"owners_count":21163939,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["langchain","llama-2","llamacpp","pytesseract","question-answering","streamlit","summarization","whisper"],"created_at":"2024-11-10T09:28:31.112Z","updated_at":"2025-04-14T00:36:12.255Z","avatar_url":"https://github.com/saadkh1.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Streamlit Document QA and Text Summarization App\r\n\r\nThis Streamlit application lets users perform Document Question-Answering (QA) and Text Summarization in English or German in a few steps.\r\n\r\n## How to Use\r\n\r\n### Language Selection\r\n\r\n1. Select your preferred language (English or German) from the sidebar.\r\n\r\n### Task Selection\r\n\r\n2. Choose the task you want to perform: Document QA or Text Summarization.\r\n\r\n### Document QA\r\n\r\n#### 1. Upload Documents\r\n\r\n- Click on the \"Upload Files\" section in the sidebar.\r\n- Upload documents in any of the supported formats: PDF, Markdown, plain text, or DOCX.\r\n\r\n#### 2. 
OCR for Images\r\n\r\n- Click on the \"OCR for Images\" section in the sidebar.\r\n- Upload images (PNG, JPG, JPEG) for optical character recognition (OCR).\r\n\r\n#### 3. Upload Audio Files and Transcribe\r\n\r\n- Click on the \"Upload Audio Files and Transcribe\" section in the sidebar.\r\n- Upload audio files (MP3, WAV) for automatic transcription.\r\n\r\n#### 4. Import HTML\r\n\r\n- Click on the \"Import HTML\" section in the sidebar.\r\n- Enter URLs to import HTML content from websites.\r\n\r\n#### 5. Transcribe YouTube Video\r\n\r\n- Click on the \"YouTube Video\" section in the sidebar.\r\n- Enter a YouTube video URL for transcription.\r\n\r\n#### 6. Create Vector Database\r\n\r\n- Click on the \"Create Vector Database\" section in the sidebar to create a database from uploaded documents.\r\n\r\n#### 7. Remove All Files\r\n\r\n- Click on the \"Remove Files\" section in the sidebar to remove all files in the data directory.\r\n\r\n#### Chat with Chatbot\r\n\r\n- Ask the chatbot questions and it answers based on the uploaded documents.\r\n\r\n### Text Summarization\r\n\r\n- In the Text Summarization task, enter text in the provided text area.\r\n- Click the \"Summarize\" button to generate a concise summary of the input text.\r\n\r\n### Document QA Example:\r\n\r\nHere's an example of the Document Question-Answering task in action:\r\n![Document QA Example](https://github.com/saadkh1/DocQA-TextSummarization-App/blob/main/images/qa.png)\r\n\r\n### Text Summarization Example:\r\n\r\nAnd here's an example of the Text Summarization task in action:\r\n![Text Summarization Example](https://github.com/saadkh1/DocQA-TextSummarization-App/blob/main/images/summarization.png)\r\n\r\n## Installation and Running Locally\r\n\r\nTo use this Streamlit application, follow these steps:\r\n\r\n1. 
**Clone the repository and navigate to the project directory:**\r\n\r\n   ```bash\r\n   git clone https://github.com/saadkh1/DocQA-TextSummarization-App.git\r\n   ```\r\n   ```bash\r\n   cd DocQA-TextSummarization-App\r\n   ```\r\n\r\n2. **Install the required packages from the requirements.txt file:**\r\n\r\n    ```bash\r\n    pip install -r requirements.txt\r\n    ```\r\n\r\n3. **Download the necessary language models and embeddings by running the models.sh script:**\r\n\r\n    ```bash\r\n    sh models.sh\r\n    ```\r\n\r\n4. **Run the Streamlit app:**\r\n\r\n    ```bash\r\n    streamlit run app.py\r\n    ```\r\n\r\n5. **Open this URL in your browser:** http://localhost:8501/\r\n\r\n\r\n## Using Docker\r\n\r\nAlternatively, you can use Docker to run the application in a container. Make sure you have Docker installed on your system. Follow these steps:\r\n\r\n1. **Clone the repository and navigate to the project directory:**\r\n\r\n   ```bash\r\n   git clone https://github.com/saadkh1/DocQA-TextSummarization-App.git\r\n   ```\r\n   ```bash\r\n   cd DocQA-TextSummarization-App\r\n   ```\r\n\r\n2. **Build the Docker image:**\r\n\r\n    ```bash\r\n    docker build -t qa-summrize-app:1.0 .\r\n    ```\r\n\r\n3. **Run the Docker container:**\r\n\r\n    ```bash\r\n    docker run -p 8501:8501 qa-summrize-app:1.0\r\n    ```\r\n\r\n4. **Open this URL in your browser:** http://localhost:8501/\r\n\r\n## Using Google Colab\r\n\r\nIf you prefer to use Google Colab, you can run the app using the provided app.ipynb notebook:\r\n\r\n1. **Open the app.ipynb notebook in Google Colab.**\r\n\r\n2. **Run all the cells in the notebook.**\r\n\r\nThe notebook will start the Streamlit app and expose it using ngrok. 
Follow the instructions in the notebook to access the app URL.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaadkh1%2Fdocqa-textsummarization-app","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaadkh1%2Fdocqa-textsummarization-app","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaadkh1%2Fdocqa-textsummarization-app/lists"}