https://github.com/dobraczka/forayer
forayer is a library of first aid utilities for knowledge graph exploration with an entity centric approach.
- Host: GitHub
- URL: https://github.com/dobraczka/forayer
- Owner: dobraczka
- License: mit
- Created: 2021-08-02T12:21:26.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-03-05T15:10:39.000Z (10 months ago)
- Last Synced: 2024-08-09T14:05:31.509Z (5 months ago)
- Topics: data-integration, entity-resolution, knowledge-graph
- Language: Jupyter Notebook
- Homepage: https://forayer.readthedocs.io/en/latest/
- Size: 1.39 MB
- Stars: 6
- Watchers: 3
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
forayer
About
=====
Forayer is a library of **f**irst aid utilities for kn**o**wledge g**r**aph explor**a**tion with an entit**y** c**e**ntric app**r**oach.
It is intended to make data integration of knowledge graphs easier. With entities as first-class citizens, forayer is a toolset to aid in knowledge graph exploration for data integration and specifically entity resolution.

You can easily load pre-existing entity resolution tasks:
```python
>>> from forayer.datasets import OpenEADataset
>>> ds = OpenEADataset(ds_pair="D_W", size="15K", version=1)
>>> ds.er_task
ERTask({DBpedia: (# entities: 15000, # entities_with_rel: 15000, # rel: 13359,
# entities_with_attributes: 13782, # attributes: 13782, # attr_values: 24995),
Wikidata: (# entities: 15000, # entities_with_rel: 15000, # rel: 13554,
# entities_with_attributes: 14376, # attributes: 14376, # attr_values: 114107)},
ClusterHelper(# elements:30000, # clusters:15000))
```

This entity resolution task holds two knowledge graphs and the clusters of known matches between them. You can search in knowledge graphs:
```python
>>> ds.er_task["DBpedia"].search("Dorothea")
KG(entities={'http://dbpedia.org/resource/E801200':
{'http://dbpedia.org/ontology/activeYearsStartYear': '"1948"^^',
'http://dbpedia.org/ontology/activeYearsEndYear': '"2008"^^',
'http://dbpedia.org/ontology/birthName': 'Dorothea Carothers Allen',
'http://dbpedia.org/ontology/alias': 'Allen, Dorothea Carothers',
'http://dbpedia.org/ontology/birthYear': '"1923"^^',
'http://purl.org/dc/elements/1.1/description': 'Film editor',
'http://dbpedia.org/ontology/birthDate': '"1923-12-03"^^',
'http://dbpedia.org/ontology/deathDate': '"2010-04-17"^^',
'http://dbpedia.org/ontology/deathYear': '"2010"^^'}}, rel={}, name=DBpedia)
```

You can also work with a smaller sample of the resolution task:
```python
>>> ert_sample = ds.er_task.sample(100)
>>> ert_sample
ERTask({DBpedia: (# entities: 100, # entities_with_rel: 6, # rel: 4,
# entities_with_attributes: 99, # attributes: 99, # attr_values: 274),
Wikidata: (# entities: 100, # entities_with_rel: 4, # rel: 4,
# entities_with_attributes: 100, # attributes: 100, # attr_values: 797)},
ClusterHelper(# elements:200, # clusters:100))
```

Much more can be found in the [user guide](https://forayer.readthedocs.io/en/latest/source/user_guide.html).
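
To make the data model behind the examples above more concrete, here is a minimal plain-Python sketch of the same ideas: knowledge graphs as dictionaries of entities with attributes, clusters of known matches, searching attribute values, and sampling a smaller task. It does not use forayer and is not how forayer is implemented. The toy entity ids, the `search` and `sample` helpers, the case-insensitive substring matching, and the choice to sample whole clusters are all assumptions made for illustration.

```python
import random

# Hypothetical toy data: each knowledge graph maps entity ids to attribute dicts.
dbpedia = {
    "dbp:E1": {"birthName": "Dorothea Carothers Allen", "description": "Film editor"},
    "dbp:E2": {"name": "Leipzig"},
}
wikidata = {
    "wd:Q1": {"label": "Dede Allen", "occupation": "film editor"},
    "wd:Q2": {"label": "Leipzig"},
}
# Known matches: each cluster groups ids that refer to the same real-world entity.
clusters = [{"dbp:E1", "wd:Q1"}, {"dbp:E2", "wd:Q2"}]


def search(kg, query):
    """Return entities with at least one attribute value containing the query.

    Matching is a case-insensitive substring test (an assumption of this sketch).
    """
    needle = query.lower()
    return {
        entity_id: attrs
        for entity_id, attrs in kg.items()
        if any(needle in str(value).lower() for value in attrs.values())
    }


def sample(kgs, clusters, n, seed=0):
    """Keep n randomly chosen clusters and only the entities they mention."""
    rng = random.Random(seed)
    kept = rng.sample(clusters, n)
    kept_ids = set().union(*kept)
    small_kgs = {
        name: {eid: attrs for eid, attrs in kg.items() if eid in kept_ids}
        for name, kg in kgs.items()
    }
    return small_kgs, kept


hits = search(dbpedia, "Dorothea")
small_kgs, small_clusters = sample(
    {"DBpedia": dbpedia, "Wikidata": wikidata}, clusters, n=1
)
print(sorted(hits))
print({name: len(kg) for name, kg in small_kgs.items()}, len(small_clusters))
```

Sampling by cluster keeps every retained entity together with its known match, which mirrors the sampled task shown above, where 100 clusters cover 100 entities in each graph.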
Installation
============

You can install forayer via pip:
```bash
pip install forayer
```