{"id":13455922,"url":"https://github.com/prvnsingh/LLM-WebToGraph","last_synced_at":"2025-03-24T09:31:09.052Z","repository":{"id":205334694,"uuid":"712270254","full_name":"prvnsingh/LLM-WebToGraph","owner":"prvnsingh","description":"It is project which uses transformer to scrape the web and LLM to retrieve the identity from the text and store it in neo4j.","archived":false,"fork":false,"pushed_at":"2024-02-27T16:19:23.000Z","size":5400,"stargazers_count":36,"open_issues_count":0,"forks_count":6,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-10-28T23:33:25.676Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/prvnsingh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-31T06:11:11.000Z","updated_at":"2024-09-29T21:52:24.000Z","dependencies_parsed_at":"2023-11-29T08:28:54.303Z","dependency_job_id":"3cbe40f6-69a7-4a01-a9d5-e9deaf117452","html_url":"https://github.com/prvnsingh/LLM-WebToGraph","commit_stats":null,"previous_names":["prvnsingh/llm-webtograph"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prvnsingh%2FLLM-WebToGraph","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prvnsingh%2FLLM-WebToGraph/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prvnsingh%2FLLM-WebToGraph/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prvnsingh%2FLLM-WebToGraph/
manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/prvnsingh","download_url":"https://codeload.github.com/prvnsingh/LLM-WebToGraph/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245243213,"owners_count":20583582,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T08:01:13.516Z","updated_at":"2025-03-24T09:31:07.742Z","avatar_url":"https://github.com/prvnsingh.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# LLM-WebToGraph\n\nLLM-WebToGraph uses Langchain and OpenAI's large language models (LLMs) to scrape data from various sources on the web and transform it into a structured knowledge graph. The graph is then loaded into a Neo4j Aura Database, providing an efficient way to store, query, and retrieve information using Cypher queries and LLMs. Together, Langchain, OpenAI LLMs, and Neo4j provide a solution for knowledge management and retrieval.\n\n## Architecture\n![design](https://github.com/prvnsingh/LLM-WebToGraph/blob/main/design.jpeg?raw=true)\n\n\n## Overview\n\nThe LLM-WebToGraph project combines several key components to achieve its goal:\n\n1. **Langchain:** A framework for building applications with language models, powering the core of the project.\n\n2. 
**OpenAI's Large Language Models (LLMs):** These models are used to extract and process data from various sources, converting unstructured data into structured knowledge.\n\n3. **Neo4j Aura Database:** The project stores the structured knowledge graph in a Neo4j Aura Database, allowing for efficient storage and retrieval.\n\n4. **FastAPI:** Exposes an API for interacting with the project and checking its health status.\n\n5. **Streamlit:** Provides a user-friendly interface to query and visualize the knowledge graph.\n\n## Features\n\n- Web scraping from various sources, such as web links and CSV files.\n- Data transformation and extraction using an OpenAI LLM (gpt-3.5-turbo).\n- Population of a structured knowledge graph in Neo4j Aura Database.\n- FastAPI-based health check API to monitor the application's status.\n- Streamlit web application for querying and visualizing the knowledge graph.\n\n## Getting Started\n1. Configure the data sources\n   - Update the CSV data files in the data directory.\n   - Update the HTML links in datasource.yml.\n2. Set up environment variables\n   - Add credentials, such as the OpenAI API key and the Neo4j database password, to the .env file, or set them as environment variables.\n\n3. Configure schema.yml for identities and relationships\n   - Modify schema.yml to specify the identities to be recognized.\n4. Run the Streamlit UI and the FastAPI app.\n   - Build the Docker image and run it with the env file:\n~~~sh\n   sudo docker run --env-file .env -p 8501:8501 -p 8000:8000 image_name\n~~~\nTo access the application, open:\n~~~html\nhttp://localhost:8501/\n~~~\n\nTo check the backend APIs, open the Swagger UI at:\n```html\nhttp://localhost:8000/docs\n```\n## Working directory\n![Directory Tree](https://github.com/prvnsingh/LLM-WebToGraph/blob/main/dirTree.jpg?raw=true)\n\n## Demo snapshot\n![Demo snapshot](https://github.com/prvnsingh/LLM-WebToGraph/blob/main/working.jpg?raw=true)\n\n## Contributing\n\nContributions to the LLM-WebToGraph project are welcome! 
If you'd like to contribute, please follow these guidelines:\n\n- Fork the repository.\n- Create a new branch for your feature or bug fix.\n- Make your changes and ensure tests pass.\n- Submit a pull request.\n\n## Future Scope\nIn the future, the project can be extended with a microservices architecture, including:\n\n- A separate data service responsible for ingesting data from S3.\n- Utilization of a Selenium bot to scrape the web and download CSV files.\n- Integration with more data sources for enhanced knowledge graph creation.\n\n## References\n- [Langchain Graph Transformer Documentation](https://python.langchain.com/docs/use_cases/graph/diffbot_graphtransformer)\n- [Langchain Cypher Query Documentation](https://python.langchain.com/docs/use_cases/graph/graph_cypher_qa)\n- [Blog Post: Constructing Knowledge Graphs from Text](https://blog.langchain.dev/constructing-knowledge-graphs-from-text-using-openai-functions/)\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n\n## Contact\n\nFor questions or support, feel free to contact us at [prvns1997@gmail.com](mailto:prvns1997@email.com).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprvnsingh%2FLLM-WebToGraph","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprvnsingh%2FLLM-WebToGraph","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprvnsingh%2FLLM-WebToGraph/lists"}