{"id":14959401,"url":"https://github.com/derwenai/erkg","last_synced_at":"2025-05-02T12:31:39.276Z","repository":{"id":231579326,"uuid":"774007477","full_name":"DerwenAI/ERKG","owner":"DerwenAI","description":"Demonstrate integration of Senzing and Neo4j to construct an Entity Resolved Knowledge Graph","archived":false,"fork":false,"pushed_at":"2024-08-14T02:17:33.000Z","size":14534,"stargazers_count":30,"open_issues_count":0,"forks_count":6,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-07T02:03:57.968Z","etag":null,"topics":["compliance","cypher","data-integration","entity-resolved-knowlege-graph","entity-resoultion","graph-analytics","graph-data-science","graph-database","graph-visualization","knowledge-graph","neo4j","open-data","record-linking","safegraph","senzing-community"],"latest_commit_sha":null,"homepage":"https://neo4j.com/developer-blog/entity-resolved-knowledge-graphs/","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DerwenAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-18T19:28:59.000Z","updated_at":"2025-04-06T00:17:49.000Z","dependencies_parsed_at":"2024-05-05T05:29:33.868Z","dependency_job_id":"aea9c907-1ae9-413d-9b14-c0d2f5a0c0f0","html_url":"https://github.com/DerwenAI/ERKG","commit_stats":{"total_commits":47,"total_committers":2,"mean_commits":23.5,"dds":0.08510638297872342,"last_synced_commit":"45b6036ebec557cac15e1d71d13b345836dba04a"},"previous_names":["derwenai/erkg"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/DerwenAI%2FERKG","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DerwenAI%2FERKG/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DerwenAI%2FERKG/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DerwenAI%2FERKG/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DerwenAI","download_url":"https://codeload.github.com/DerwenAI/ERKG/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252038179,"owners_count":21684640,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compliance","cypher","data-integration","entity-resolved-knowlege-graph","entity-resoultion","graph-analytics","graph-data-science","graph-database","graph-visualization","knowledge-graph","neo4j","open-data","record-linking","safegraph","senzing-community"],"created_at":"2024-09-24T13:19:37.106Z","updated_at":"2025-05-02T12:31:36.843Z","avatar_url":"https://github.com/DerwenAI.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Entity Resolved Knowledge Graphs\n\nThis hands-on tutorial in Python demonstrates integration of\n[Senzing](https://github.com/Senzing) and [Neo4j](https://github.com/neo4j)\nto construct an\n[_Entity Resolved Knowledge Graph_](https://senzing.com/entity-resolved-knowledge-graphs/):\n\n  1. Use three datasets describing businesses in Las Vegas: ~85K records, ~2% duplicates.\n  2. 
Run _entity resolution_ in Senzing to resolve duplicate business names and addresses.\n  3. Parse results to construct a _knowledge graph_ in Neo4j.\n  4. Analyze and visualize the _entity resolved knowledge graph_.\n\nWe'll walk through example code based on Neo4j Desktop and the\n[Graph Data Science](https://github.com/neo4j/graph-data-science-client)\n(GDS) library to run Cypher queries on the graph,\npreparing data for downstream analysis and visualizations with\n[Jupyter](https://jupyter.org/),\n[Pandas](https://pandas.pydata.org/),\n[Seaborn](https://seaborn.pydata.org/), and\n[PyVis](https://pyvis.readthedocs.io/en/latest/).\n\nThe code is simple to download and easy to follow, and is presented so\nyou can try it with your own data.\nOverall, this tutorial takes about 35 minutes to run.\n\n![Before and After](article/before_after.png)\n\nWhy?\nFor one example, the popular use of _retrieval augmented generation_ (RAG)\nto make AI applications more robust has boosted recent interest in KGs.\nWhen the entities, relations, and properties in a KG leverage your\ndomain-specific data to strengthen your AI app ... 
compliance issues\nand audits rush to the foreground.\n\nTL;DR: sense-making of the data coming from a connected world.\nDuring the transition from data integration to KG construction,\nyou need to make sure the entities in your graph get resolved correctly.\nOtherwise, your AI app downstream will struggle with the kinds of details\nthat make people concerned, very concerned, very quickly:\ne.g., billing, deliveries, voter registration, crucial medical details,\ncredit reporting, industrial safety, security, and so on.\n\nHighly recommended:\n  - [\"Entity Resolved Knowledge Graphs\"](https://senzing.com/entity-resolved-knowledge-graphs/)\n  - [\"Analytics on Entity Resolved Knowledge Graphs\"](https://youtu.be/ZgK5YHNixTM), Mel Richey (2023)\n\n\n## Prerequisites\n\nIn this tutorial we'll work in two environments.\nThe configuration and coding are at a level that should be comfortable\nfor most people working in data science.\nYou'll need to be familiar with how to:\n\n  - clone a public repo from GitHub\n  - launch a server in the cloud\n  - use Linux command lines\n  - write some code in Python\n\nTotal estimated project time: 35 minutes.\n\nCloud computing budget: running Senzing in this tutorial cost a total\nof $0.04 USD.\n\n\n## Set up local environment\n\nClone this repo, change into the `ERKG` directory, and set up\nyour local environment:\n\n```bash\ngit clone https://github.com/DerwenAI/ERKG.git\ncd ERKG\n\npython3.11 -m venv venv\nsource venv/bin/activate\n\npython3 -m pip install -U pip wheel setuptools\npython3 -m pip install -r requirements.txt\n```\n\nWe're using Python 3.11 here, although this code should run with most\nof the recent Python 3.x versions.\n\n\n## Run the tutorial notebooks\n\nFirst, launch Jupyter:\n\n```bash\n./venv/bin/jupyter lab\n```\n\nThen, based on the [tutorial](TBD), follow the steps shown in these notebooks:\n\n  1. [`examples/datasets.ipynb`](examples/datasets.ipynb)\n  2. 
[`examples/graph.ipynb`](examples/graph.ipynb)\n  3. [`examples/impact.ipynb`](examples/impact.ipynb)\n\nYou can view the results --\nan interactive visualization of the entity resolved knowledge graph --\nby loading [`examples/big_vegas.2.html`](examples/big_vegas.2.html)\nin a web browser.\nThe full HTML+JavaScript is large and may take several minutes to load.\n\n\n## Deleting data\n\nIf you need to clear the database and start over, run this in Neo4j Desktop:\n\n```cypher\nMATCH (n)\nCALL {\n  WITH n\n  DETACH DELETE n\n} IN TRANSACTIONS\n```\n\nSee: \u003chttps://neo4j.com/docs/cypher-manual/current/subqueries/subqueries-in-transactions/#delete-with-call-in-transactions\u003e\n\n\n## Kudos\n\nMany thanks to:\n[@akollegger](https://github.com/akollegger),\n[@brianmacy](https://github.com/brianmacy)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fderwenai%2Ferkg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fderwenai%2Ferkg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fderwenai%2Ferkg/lists"}