Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/derwenai/erkg
Demonstrate integration of Senzing and Neo4j to construct an Entity Resolved Knowledge Graph
- Host: GitHub
- URL: https://github.com/derwenai/erkg
- Owner: DerwenAI
- License: mit
- Created: 2024-03-18T19:28:59.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-08-14T02:17:33.000Z (5 months ago)
- Last Synced: 2024-11-12T17:03:01.764Z (2 months ago)
- Topics: compliance, cypher, data-integration, entity-resolved-knowlege-graph, entity-resoultion, graph-analytics, graph-data-science, graph-database, graph-visualization, knowledge-graph, neo4j, open-data, record-linking, safegraph, senzing-community
- Homepage: https://neo4j.com/developer-blog/entity-resolved-knowledge-graphs/
- Size: 13.9 MB
- Stars: 26
- Watchers: 3
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Entity Resolved Knowledge Graphs
This hands-on tutorial in Python demonstrates integration of
[Senzing](https://github.com/Senzing) and [Neo4j](https://github.com/neo4j)
to construct an
[_Entity Resolved Knowledge Graph_](https://senzing.com/entity-resolved-knowledge-graphs/):

1. Use three datasets describing businesses in Las Vegas: ~85K records, ~2% duplicates.
2. Run _entity resolution_ in Senzing to resolve duplicate business names and addresses.
3. Parse results to construct a _knowledge graph_ in Neo4j.
4. Analyze and visualize the _entity resolved knowledge graph_.

We'll walk through example code based on Neo4j Desktop and the
[Graph Data Science](https://github.com/neo4j/graph-data-science-client)
(GDS) library to run Cypher queries on the graph,
preparing data for downstream analysis and visualizations with
[Jupyter](https://jupyter.org/),
[Pandas](https://pandas.pydata.org/),
[Seaborn](https://seaborn.pydata.org/), and
[PyVis](https://pyvis.readthedocs.io/en/latest/).

The code is simple to download and easy to follow, and it is presented so
you can try it with your own data.
Overall, this tutorial takes about 35 minutes to run.

![Before and After](article/before_after.png)
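Step 3 in the list above (parsing entity resolution results into a knowledge graph) can be sketched in plain Python. This is a minimal sketch, not the code from the tutorial notebooks. It assumes a hypothetical, simplified JSON export in which each resolved entity lists the source records merged into it. The field names `entity_id`, `records`, `data_source`, and `record_id`, the `Entity`/`Record` labels, and the `RESOLVES_TO` relationship are all illustrative assumptions.

```python
import json
from collections import Counter

# Hypothetical Cypher: one row per (record, entity) pair, loaded in a single
# batch via UNWIND. Labels and relationship type are illustrative assumptions.
LOAD_ROWS = """
UNWIND $rows AS row
MERGE (e:Entity {id: row.entity_id})
MERGE (r:Record {data_source: row.data_source, record_id: row.record_id})
MERGE (r)-[:RESOLVES_TO]->(e)
"""


def to_rows(resolved: list[dict]) -> list[dict]:
    """Flatten resolved entities into one row per source record."""
    rows = []
    for entity in resolved:
        for rec in entity["records"]:
            rows.append({
                "entity_id": entity["entity_id"],
                "data_source": rec["data_source"],
                "record_id": rec["record_id"],
            })
    return rows


def duplicate_rate(rows: list[dict]) -> float:
    """Fraction of records that are redundant copies of an already-counted entity."""
    n_records = len(rows)
    n_entities = len({row["entity_id"] for row in rows})
    return (n_records - n_entities) / n_records if n_records else 0.0


def cluster_sizes(rows: list[dict]) -> Counter:
    """Histogram: how many entities were resolved from 1, 2, 3, ... records."""
    per_entity = Counter(row["entity_id"] for row in rows)
    return Counter(per_entity.values())


if __name__ == "__main__":
    # Toy input with made-up IDs: two listings of one business, plus a singleton.
    resolved = json.loads("""[
      {"entity_id": 1, "records": [
         {"data_source": "A", "record_id": "a-17"},
         {"data_source": "B", "record_id": "b-42"}]},
      {"entity_id": 2, "records": [
         {"data_source": "C", "record_id": "c-08"}]}
    ]""")
    rows = to_rows(resolved)
    print(len(rows), duplicate_rate(rows), dict(cluster_sizes(rows)))
```

With the official `neo4j` Python driver, the flattened rows could then be sent as a query parameter, e.g. `session.run(LOAD_ROWS, rows=rows)`. Batching with `UNWIND` avoids one round trip per record, and `MERGE` keeps reloads idempotent.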
## Why?
For example, the popular use of _retrieval augmented generation_ (RAG)
to make AI applications more robust has boosted recent interest in
knowledge graphs (KGs).
When the entities, relations, and properties in a KG leverage your
domain-specific data to strengthen your AI app, compliance issues
and audits rush to the foreground.

TL;DR: sense-making of the data coming from a connected world.
During the transition from data integration to KG construction,
you need to make sure the entities in your graph get resolved correctly.
Otherwise, your AI app downstream will struggle with the kinds of details
that make people get concerned, very concerned, very quickly:
such as billing, deliveries, voter registration, crucial medical details,
credit reporting, industrial safety, and security.

Highly recommended:
- ["Entity Resolved Knowledge Graphs"](https://senzing.com/entity-resolved-knowledge-graphs/)
- ["Analytics on Entity Resolved Knowledge Graphs"](https://youtu.be/ZgK5YHNixTM), Mel Richey (2023)

## Prerequisites
In this tutorial we'll work in two environments: a local machine and a server in the cloud.
The configuration and coding are at a level which should be comfortable
for most people working in data science.
You'll need to be familiar with how to:

- clone a public repo from GitHub
- launch a server in the cloud
- use Linux command lines
- write some code in Python

Total estimated project time: 35 minutes.
Cloud computing budget: running Senzing in this tutorial cost a total
of $0.04 USD.

## Set up local environment
Clone this repo, change into the `ERKG` directory, and set up
your local environment:

```bash
git clone https://github.com/DerwenAI/ERKG.git
cd ERKG

python3.11 -m venv venv
source venv/bin/activate

python3 -m pip install -U pip wheel setuptools
python3 -m pip install -r requirements.txt
```

We're using Python 3.11 here, although this code should run with most
of the recent Python 3.x versions.

## Run the tutorial notebooks
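Once the environment is set up, a quick check can confirm which interpreter will run the code. This is a minimal, stdlib-only sketch that is not part of the repo, and the `environment_report` name is hypothetical. It relies on the fact that an environment created with `python -m venv` sets `sys.prefix` to the venv directory, while `sys.base_prefix` still points at the base installation. You can run it with `python3` from the activated shell, or paste it into a notebook cell to see which environment the Jupyter kernel uses.

```python
import sys


def environment_report() -> dict:
    """Summarize the interpreter that is running this code."""
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        # Inside a `python -m venv` environment, sys.prefix is the venv
        # directory and differs from sys.base_prefix (the base install).
        "in_venv": sys.prefix != sys.base_prefix,
        "prefix": sys.prefix,
    }


if __name__ == "__main__":
    report = environment_report()
    for key, value in report.items():
        print(f"{key}: {value}")
    if not report["in_venv"]:
        print("warning: not inside a virtual environment; "
              "did you run `source venv/bin/activate`?")
```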
First, launch Jupyter:
```bash
./venv/bin/jupyter lab
```

Then, following the [tutorial](TBD), work through the steps shown in these notebooks:

1. [`examples/datasets.ipynb`](examples/datasets.ipynb)
2. [`examples/graph.ipynb`](examples/graph.ipynb)
3. [`examples/impact.ipynb`](examples/impact.ipynb)

You can view the results (an interactive visualization of the entity
resolved knowledge graph) by loading
[`examples/big_vegas.2.html`](examples/big_vegas.2.html)
in a web browser.
The full HTML+JavaScript is large and may take several minutes to load.

## Deleting data
If you need to clear the database and start over, run this in Neo4j Desktop:
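```cypher
MATCH (n)
CALL {
  WITH n
  DETACH DELETE n
} IN TRANSACTIONS
```

If Neo4j Browser reports that this query can only be executed in an
implicit transaction, prefix it with `:auto`.

See: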
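The same cleanup can also be run from Python. This is a minimal sketch that assumes the official `neo4j` Python driver (5.x) and placeholder connection details. The helper names `clear_database` and `connect_and_clear` are hypothetical. The key design choice is that `CALL { ... } IN TRANSACTIONS` must run in an implicit (auto-commit) transaction. `Session.run()` provides that, whereas managed transactions such as `execute_write()` would reject the query.

```python
# Same statement as the Cypher block above; the server commits in batches.
DELETE_ALL = """
MATCH (n)
CALL {
  WITH n
  DETACH DELETE n
} IN TRANSACTIONS
"""


def clear_database(driver, database=None) -> None:
    """Delete all nodes and relationships using an auto-commit transaction.

    `driver` is assumed to be a `neo4j.Driver`; any object with the same
    session()/run()/consume() shape works, which also makes this easy to test.
    """
    with driver.session(database=database) as session:
        # Session.run() executes in an implicit (auto-commit) transaction,
        # which CALL { ... } IN TRANSACTIONS requires; consume() blocks
        # until the result has been fully processed.
        session.run(DELETE_ALL).consume()


def connect_and_clear(uri: str, user: str, password: str, database=None) -> None:
    """Open a driver for the given (placeholder) connection details and clear it."""
    # Imported here so clear_database() is usable without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        clear_database(driver, database=database)
```

For a local Neo4j Desktop database this might look like `connect_and_clear("neo4j://localhost:7687", "neo4j", "<your password>")`, where the URI and credentials are placeholders. Like the Cypher version, it deletes everything in the target database.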
## Kudos
Many thanks to:
[@akollegger](https://github.com/akollegger),
[@brianmacy](https://github.com/brianmacy)