{"id":25733677,"url":"https://github.com/tarunsingh2002/web-content-question-and-answer-tool","last_synced_at":"2026-04-19T04:35:18.542Z","repository":{"id":279490801,"uuid":"938946143","full_name":"TarunSingh2002/Web-Content-Question-And-Answer-Tool","owner":"TarunSingh2002","description":"A Python-based tool that answers questions using content from specified URLs. Combines web scraping, AI models (LLMs), and vector embeddings via Streamlit and LangChain/FAISS to deliver precise, context-aware responses.","archived":false,"fork":false,"pushed_at":"2025-02-25T20:21:46.000Z","size":14,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-25T21:25:48.664Z","etag":null,"topics":["beautifulsoup","embeddings","faiss-vector-database","gpt-4o","huggingface","huggingface-spaces","langchain","llm","python","streamlit","vector-database"],"latest_commit_sha":null,"homepage":"https://tarun-singh-web-content-question-and-answer-tool.hf.space/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TarunSingh2002.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-25T18:42:42.000Z","updated_at":"2025-02-25T20:54:01.000Z","dependencies_parsed_at":"2025-02-25T21:25:53.812Z","dependency_job_id":"1fe36454-c3e5-4ade-b78e-4ee885762d86","html_url":"https://github.com/TarunSingh2002/Web-Content-Question-And-Answer-Tool","commit_stats":null,"previous_names":["tarunsingh2002/web-content-question-and-answer-tool"],"tags_count":0,"template":false,"template_full_na
me":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TarunSingh2002%2FWeb-Content-Question-And-Answer-Tool","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TarunSingh2002%2FWeb-Content-Question-And-Answer-Tool/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TarunSingh2002%2FWeb-Content-Question-And-Answer-Tool/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TarunSingh2002%2FWeb-Content-Question-And-Answer-Tool/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TarunSingh2002","download_url":"https://codeload.github.com/TarunSingh2002/Web-Content-Question-And-Answer-Tool/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TarunSingh2002%2FWeb-Content-Question-And-Answer-Tool/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259114541,"owners_count":22807251,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup","embeddings","faiss-vector-database","gpt-4o","huggingface","huggingface-spaces","langchain","llm","python","streamlit","vector-database"],"created_at":"2025-02-26T04:22:25.287Z","updated_at":"2026-04-19T04:35:18.498Z","avatar_url":"https://github.com/TarunSingh2002.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Content Question And Answer Tool 🔍\n\n**[Check out the live project here!](https://tarun-singh-web-content-question-and-answer-tool.hf.space/)**\n\nThe Web Content 
Question And Answer Tool is a powerful Python-based application that leverages web scraping, vector embeddings, and large language models (LLMs) to answer questions based on content extracted from one or more URLs. With an intuitive Streamlit interface and the robust capabilities of LangChain and FAISS, this tool provides concise, context-aware responses to your queries by analyzing the text from any web page you specify.\n\n\u003ctable border=\"2\" style=\"width:100%; border-collapse: collapse;\"\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e\u003cimg src=\"https://github.com/user-attachments/assets/49784cb7-1d4d-4648-b168-53838202fecf\" alt=\"Project Image 1\" style=\"width:100%;\"\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n## Table of Contents\n\n1. [Features](#features)\n2. [Tech Stack](#tech-stack)\n3. [Project Structure](#project-structure)\n4. [Installation](#installation)\n5. [Usage](#usage)\n6. [Deployment](#deployment)\n7. [Configuration](#configuration)\n8. [License](#license)\n9. 
[Acknowledgments](#acknowledgments)\n\n## Features ✨\n\n- **Web Content Extraction**: Automatically scrapes and cleans content from multiple URLs.\n- **AI-Powered Q\u0026A**: Answers questions using the GPT-4o model with a RAG architecture.\n- **Document Processing**:\n  - Text splitting with overlap for context preservation.\n  - Vector embeddings using Hugging Face models.\n- **Efficient Retrieval**:\n  - FAISS vector store for fast similarity search.\n  - Context-aware answer generation.\n- **User-Friendly Interface**:\n  - Streamlit web interface.\n  - Real-time processing indicators.\n  - Error handling for invalid URLs.\n\n## Tech Stack 🛠️\n\n| Component               | Technology Used          |\n|-------------------------|--------------------------|\n| Frontend                | Streamlit                |\n| Language Model          | OpenAI GPT-4o            |\n| Embeddings              | HuggingFace Embeddings   |\n| Vector Store            | FAISS                    |\n| Web Scraping            | BeautifulSoup4           |\n| Text Processing         | LangChain                |\n| Deployment              | Hugging Face Spaces      |\n\n## Project Structure 📁\n\n```bash\n├── app.py                 \u003c- Main Streamlit application file.\n├── requirements.txt       \u003c- The requirements file for reproducing the environment.\n├── README.md              \u003c- The top-level README for developers using this project.\n├── .gitattributes         \u003c- Git attributes configuration.\n└── .gitignore             \u003c- Specifies which files Git should ignore.\n```\n\n## Installation 💻\n\nTo set up and run the Web Content Question And Answer Tool on your local machine, follow these steps:\n\n1. **Clone the repository:**\n    ```bash\n    git clone https://github.com/TarunSingh2002/Web-Content-Question-And-Answer-Tool\n    cd Web-Content-Question-And-Answer-Tool\n    ```\n\n2. 
**Set up a virtual environment:**\n    ```bash\n    python -m venv venv\n    source venv/bin/activate  # Linux/MacOS\n    venv\\Scripts\\activate     # Windows\n    ```\n\n3. **Install the required dependencies:**\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n4. **Set environment variables:**\n    ```bash\n    export OPENAI_API_KEY=\"your-openai-key\"\n    ```\n\n## Usage 🚀\n\n1. **Run the Streamlit application:**\n    ```bash\n    streamlit run app.py\n    ```\n\n2. **Open your browser and go to:**\n    ```\n    http://localhost:8501\n    ```\n\n3. **Interact with the Tool:**\n\n   - Input URLs: Enter one or more URLs (each on a new line) in the provided text area.\n   - Ask a Question: Type your query in the designated input field.\n   - Submit: Start the analysis by clicking the \"Get Answer\" button.\n   - View the Answer: The tool will display a succinct answer generated from the content of the provided URLs.\n    ```bash\n    URLs:\n    https://en.wikipedia.org/wiki/Large_language_model\n    https://www.geeksforgeeks.org/large-language-model-llm/\n\n    Question:\n    What are the main applications of LLMs?\n    ```\n\n\n## Deployment 🌐\n\n### Deploying to Hugging Face Spaces\n\n1. **Create new Space:**\n   - Select Streamlit template.\n   - Choose appropriate hardware (CPU Basic for testing).\n\n2. **Add secrets:**\n   - OPENAI → Your OpenAI API key\n\n3. 
**Deploy from repository:**\n   - Connect your Git repository.\n   - Push changes to trigger automatic deployment.\n\n## Configuration ⚙️\n\n### Environment Variables\n\n| Variable                | Description              | Required                 |\n|-------------------------|--------------------------|--------------------------|\n| OPENAI                  | OpenAI API key           | Yes                      |\n\n### Model Parameters\n```python\n    # LLM Settings\n    ChatOpenAI(\n        model='gpt-4o',\n        temperature=0.7,  # Sampling randomness (OpenAI accepts 0-2)\n        max_tokens=100    # Response length limit, in tokens\n    )\n\n    # Text Processing\n    CharacterTextSplitter(\n        chunk_size=500,    # Character limit per chunk\n        chunk_overlap=100  # Context preservation overlap, in characters\n    )\n```\n\n## License 📜\n```markdown\nMIT License\n\nCopyright (c) 2024 Tarun Singh\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n```\n\n## Acknowledgments 🙏\n\n- **OpenAI** for the GPT-4o language model\n- **Hugging Face** for embeddings and hosting\n- **LangChain** team for the RAG framework\n- **Streamlit** for the web interface framework\n- **Beautiful Soup** for web scraping capabilities\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftarunsingh2002%2Fweb-content-question-and-answer-tool","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftarunsingh2002%2Fweb-content-question-and-answer-tool","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftarunsingh2002%2Fweb-content-question-and-answer-tool/lists"}