{"id":26890992,"url":"https://github.com/nextgencodes/streamlit_llm_website_interaction","last_synced_at":"2025-03-31T22:06:39.855Z","repository":{"id":284165507,"uuid":"954040634","full_name":"nextgencodes/streamlit_llm_website_interaction","owner":"nextgencodes","description":"Ask any questions with google api ","archived":false,"fork":false,"pushed_at":"2025-03-24T14:29:57.000Z","size":4,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-24T14:30:48.104Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://website-interaction.streamlit.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nextgencodes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-24T13:26:09.000Z","updated_at":"2025-03-24T14:30:01.000Z","dependencies_parsed_at":"2025-03-24T14:41:26.890Z","dependency_job_id":null,"html_url":"https://github.com/nextgencodes/streamlit_llm_website_interaction","commit_stats":null,"previous_names":["nextgencodes/streamlit_llm_website_interaction"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nextgencodes%2Fstreamlit_llm_website_interaction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nextgencodes%2Fstreamlit_llm_website_interaction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nextgencodes%2Fstreamlit_llm_website_interaction/releases","manifests_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/nextgencodes%2Fstreamlit_llm_website_interaction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nextgencodes","download_url":"https://codeload.github.com/nextgencodes/streamlit_llm_website_interaction/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246547366,"owners_count":20794970,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-31T22:06:39.252Z","updated_at":"2025-03-31T22:06:39.839Z","avatar_url":"https://github.com/nextgencodes.png","language":"Python","readme":"# Web Content Q\u0026A Tool\n\n[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://website-interaction.streamlit.app/)\n\n**[Deployed Application Link: https://website-interaction.streamlit.app/](https://website-interaction.streamlit.app/)**\n\n## Overview\n\nThis is a simple web-based tool built with Streamlit, Langchain, and Google Gemini API that allows you to ask questions about the content of websites you provide.  The tool is designed to answer questions *solely* based on the information scraped from the URLs you input, without relying on general world knowledge.\n\n**Key Features:**\n\n*   **URL Input:**  Users can enter one or more website URLs in a text area.\n*   **Content Ingestion:**  The tool scrapes the text content from the provided URLs. 
It also supports ingesting content from `sitemap.xml` files for broader website coverage.\n*   **Question Answering:** Users can ask questions related to the ingested website content.\n*   **Accurate Answers:** Answers are generated using Google's Gemini Pro model and are grounded strictly in the scraped website content.\n*   **Simple UI:**  A user-friendly Streamlit interface with clear input fields and buttons.\n*   **Two Ingestion Modes:**\n    *   **Ingest URLs:** Processes content from the URLs directly entered by the user.\n    *   **Ingest all subdomains:**  Attempts to find and process content from the `sitemap.xml` of each provided URL, potentially covering more pages of the website.\n*   **Persistent Vector Store:** The ingested website content is vectorized and stored, allowing you to ask multiple questions without re-ingesting the URLs each time.\n\n## Evaluation Criteria\n\nThis project was built with the following evaluation criteria in mind:\n\n*   **Relevance \u0026 Accuracy of answers:**  Answers should be directly relevant to the ingested website content and factually accurate based on that content alone.\n*   **UI/UX:** The user interface should be straightforward, intuitive, and easy to use for anyone.\n*   **Implementation Clarity:** The codebase should be well-organized, commented, and maintainable for future modifications or understanding.\n\n## How to Run Locally\n\nFollow these steps to run the Web Content Q\u0026A Tool on your local machine:\n\n1.  **Prerequisites:**\n    *   **Python 3.8 or higher** must be installed on your system.\n    *   **Pip** (Python package installer) should be installed.\n\n2.  **Install Python Libraries:**\n    Open your terminal or command prompt and run the following command to install the necessary Python libraries:\n\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n3.  
**Get a Google AI Studio API Key:**\n    *   Go to [Google AI Studio](https://makersuite.google.com/app/apikey) and create a project.\n    *   Generate an API key for the Gemini API within your project.\n    *   **Important Security Note:**  For local testing, you will enter this API key directly into the application's text box. **This is NOT a secure method for production deployments.**\n\n4.  **Run the Streamlit App:**\n    Navigate to the directory where you saved the `app.py` file in your terminal. Run the Streamlit application using the command:\n\n    ```bash\n    streamlit run app.py\n    ```\n\n5.  **Access the App in Your Browser:**\n    Streamlit will provide a local URL in your terminal (usually `http://localhost:8501`). Open this URL in your web browser to access the Web Content Q\u0026A Tool.\n\n    *   **Enter your Google AI Studio API key** into the provided text box.\n    *   **Enter the website URLs** you want to query (one URL per line).\n    *   Click either **\"Ingest URLs\"** or **\"Ingest all subdomains\"** to process the website content.\n    *   **Ask your question** in the question input box.\n    *   Click **\"Ask Question\"** to get your answer.\n\n## Deployment (and Security Considerations)\n\n**Warning: Entering your API key directly into the code is highly insecure and is only recommended for local testing.**  **Do not use this method for production or publicly accessible deployments.**\n\nFor secure deployment, especially if you are using Streamlit Cloud or other platforms, you should use secure methods to manage your API keys, such as:\n\n*   **Streamlit Secrets (Recommended for Streamlit Cloud):**\n    1.  In your Streamlit Cloud app settings, define a secret named `GOOGLE_API_KEY` and paste your actual API key as the value.\n    2.  
In your `app.py` code, comment out the API key input text box, or replace it with the following line, to load the API key from secrets:\n\n        ```python\n        api_key = st.secrets[\"GOOGLE_API_KEY\"]\n        ```\n        Then revert the `initialize_llm`, `initialize_embeddings`, and `ingest_urls` functions back to using `GOOGLE_API_KEY` directly instead of passing it as an argument.\n    3.  Deploy your application to Streamlit Cloud.\n\n*   **Environment Variables (For other hosting platforms):** Configure your hosting environment to set an environment variable named `GOOGLE_API_KEY` with your API key value. Access it in your Python code using `os.environ.get(\"GOOGLE_API_KEY\")`.\n\n**Steps for Streamlit Cloud Deployment (using Secrets):**\n\n1.  **Push your code to a GitHub repository.**\n2.  **Sign up for Streamlit Cloud** at [streamlit.io/cloud](https://streamlit.io/cloud).\n3.  **Connect your GitHub repository to Streamlit Cloud.**\n4.  **Set up your API Key as a Secret in Streamlit Cloud:** In your Streamlit Cloud app's settings, add a secret named `GOOGLE_API_KEY` and paste your Gemini API key as the value.\n5.  **Revert your code/uncomment the st.secrets part to use `st.secrets[\"GOOGLE_API_KEY\"]`** for secure API key loading (as mentioned above).\n6.  **Deploy your app from your GitHub repository in Streamlit Cloud.**\n\n## Source Code\n\n[Link to your GitHub Repository will be here]\n\n## Note\n\nThis is a basic implementation of a Web Content Q\u0026A Tool and can be further enhanced. 
Potential future improvements could include:\n\n*   More robust error handling and user feedback.\n*   Improved UI/UX design.\n*   More advanced text processing and chunking strategies for better content ingestion.\n*   Exploration of different LangChain chain types and retrieval methods for optimized question answering.\n*   Support for different document loaders and file types.\n*   Support for the OpenAI API as an alternative to the Google Gemini API.\n\n---\n\nFeel free to contribute to this project or use it as a starting point for your own web content analysis tools!","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnextgencodes%2Fstreamlit_llm_website_interaction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnextgencodes%2Fstreamlit_llm_website_interaction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnextgencodes%2Fstreamlit_llm_website_interaction/lists"}