https://github.com/julesbelveze/concepcy
💫 SpaCy wrapper for ConceptNet 💫
- Host: GitHub
- URL: https://github.com/julesbelveze/concepcy
- Owner: JulesBelveze
- License: mit
- Created: 2022-07-21T06:47:05.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-08-17T07:08:27.000Z (over 1 year ago)
- Last Synced: 2024-10-14T04:03:33.615Z (2 months ago)
- Topics: conceptnet, nlp, spacy
- Language: Python
- Homepage: https://julesbelveze.github.io/concepcy/
- Size: 951 KB
- Stars: 88
- Watchers: 3
- Forks: 4
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# concepCy
[![PyPI version](https://badge.fury.io/py/concepCy.svg)](https://pypi.org/project/concepCy/)
[![github actions docs](https://github.com/JulesBelveze/concepcy/actions/workflows/documentation.yaml/badge.svg)](https://julesbelveze.github.io/concepcy/)
[![demo status](https://img.shields.io/website-up-down-green-red/https/hf.space/gradioiframe/JulesBelveze/concepcy/+.svg?label=demo%20status)](https://huggingface.co/spaces/JulesBelveze/concepcy)

`concepCy` is a spaCy wrapper for [ConceptNet](https://conceptnet.io/), a freely-available semantic network designed to
help computers understand the meaning of words.

`concepCy` allows you to query [ConceptNet.io](https://conceptnet.io/) to extract word meanings directly from the
resource itself.

# Install
You can install `concepCy` via pip:
```
pip install concepcy
```

Alternatively, you can clone the repository and install it using [poetry](https://python-poetry.org/docs/) by
running the following:

```
git clone https://github.com/JulesBelveze/concepcy.git
cd concepcy
poetry install
```

## Getting Started
To get started you need to install one of the pre-trained spaCy models available [here](https://spacy.io/models).
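
spaCy's pre-trained pipelines are distributed as regular Python packages, so you can check from Python whether one is already installed. The snippet below is a minimal sketch using only the standard library; the helper name `is_model_installed` is hypothetical and not part of spaCy or `concepCy`.

```python
import importlib.util


def is_model_installed(model_name: str) -> bool:
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(model_name) is not None


# "en_core_web_sm" is the small English pipeline used in the examples of this README.
model_name = "en_core_web_sm"

if is_model_installed(model_name):
    print(f"{model_name} is available")
else:
    # spaCy's standard CLI command for fetching a pipeline package
    print(f"{model_name} is missing; install it with: python -m spacy download {model_name}")
```
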
In `ConceptNet` words are represented as `Node` and relations between words as `Edge`. \
The `Node` object contains the following attributes:

* `id`: where you can look up all the information about that word
* `label`: which may be a more complete phrase such as "an example" instead of just the word "example" that appears in
the URI.
* `language`: code for what language the `label` is in
* `term`: a link to the most general version of this term. In many cases this is just the same URI.

The `Edge` object features the following attributes:
* `start`: starting `Node`
* `end`: ending `Node`
* `relation`: name of the relation for those two nodes
* `text`: some of ConceptNet's data is extracted from text; `text` shows you what this text was
* `weight`: how believable the information is

### Simple start
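
As a rough mental model of the `Node` and `Edge` attributes listed above, the sketch below mirrors them with plain dataclasses and builds an edge from a dictionary. The classes and the `edge_from_dict` helper are illustrative stand-ins written for this README, not `concepCy`'s own implementation; the sample values are taken from the example output in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str        # URI where the word can be looked up, e.g. "/c/en/company"
    label: str     # possibly a fuller phrase than the word in the URI
    language: str  # language code of the label
    term: str      # URI of the most general version of the term


@dataclass(frozen=True)
class Edge:
    start: Node
    end: Node
    relation: str  # name of the relation, e.g. "RelatedTo"
    text: str      # source text the relation was extracted from
    weight: float  # how believable the information is


def edge_from_dict(raw: dict) -> Edge:
    """Build an Edge from a dictionary with the attribute names listed above."""

    def node(d: dict) -> Node:
        return Node(id=d["id"], label=d["label"], language=d["language"], term=d["term"])

    return Edge(
        start=node(raw["start"]),
        end=node(raw["end"]),
        relation=raw["relation"],
        text=raw["text"],
        weight=float(raw["weight"]),
    )


sample = {
    "start": {"id": "/c/en/company", "type": "Node", "label": "company",
              "language": "en", "term": "/c/en/company"},
    "end": {"id": "/c/en/business", "type": "Node", "label": "business",
            "language": "en", "term": "/c/en/business"},
    "relation": "RelatedTo",
    "text": "[[company]] is related to [[business]]",
    "weight": 6.424017434596516,
}

edge = edge_from_dict(sample)
print(f"{edge.start.label} --{edge.relation}--> {edge.end.label} (weight={edge.weight:.2f})")
# prints: company --RelatedTo--> business (weight=6.42)
```
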
In this case we will simply be interested in the *RelatedTo* relations between words.
```python
import spacy
import concepcy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("concepcy")

doc = nlp("WHO is a lovely company")

# Access all the "RelatedTo" relations from the Doc
print("--- All the 'RelatedTo' relations from the Doc ---")
for word, relations in doc._.relatedto.items():
    print(f"Word: '{word}'\n{relations}")

# Access the "RelatedTo" relations word by word
print("--- The 'RelatedTo' relations word by word ---")
for token in doc:
    print(f"Word: '{token}'\n{token._.relatedto}\n")
```

```bash
--- All the 'RelatedTo' relations from the Doc ---
Word: 'company'
[{'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/business', 'type': 'Node', 'label': 'business', 'language': 'en', 'term': '/c/en/business'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[business]]', 'weight': 6.424017434596516}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/corporation', 'type': 'Node', 'label': 'corporation', 'language': 'en', 'term': '/c/en/corporation'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[corporation]]', 'weight': 4.432155231938521}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/organization', 'type': 'Node', 'label': 'organization', 'language': 'en', 'term': '/c/en/organization'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[organization]]', 'weight': 4.259107887809371}]

--- The 'RelatedTo' relations word by word ---
Word: 'WHO'
[]

Word: 'is'
[]

Word: 'a'
[]

Word: 'lovely'
[]

Word: 'company'
[{'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/business', 'type': 'Node', 'label': 'business', 'language': 'en', 'term': '/c/en/business'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[business]]', 'weight': 6.424017434596516}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/corporation', 'type': 'Node', 'label': 'corporation', 'language': 'en', 'term': '/c/en/corporation'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[corporation]]', 'weight': 4.432155231938521}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/organization', 'type': 'Node', 'label': 'organization', 'language': 'en', 'term': '/c/en/organization'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[organization]]', 'weight': 4.259107887809371}]
```

### Custom configuration
You can customize the `concepcy` wrapper by overriding the default values of the config. The two parameters of interest
are:

* `relations_of_interest: List[str]`: ConceptNet currently supports 34 word relations, some of which might not be needed
for your use case. To keep only the ones you need, pass a list of the relations you want (see all available
relations [here](https://github.com/commonsense/conceptnet5/wiki/Relations)). Each relation then becomes an extension
attribute, such as `doc._.relatedto` for *RelatedTo*.
* `filter_edge_fct: Callable[Edge]`: ConceptNet is a crowd-sourced resource, meaning that some information might be more
reliable than the rest. To keep only reliable relations, you can pass a function that takes an `Edge` as input and
returns a boolean indicating whether to filter that edge or not.

```python
import spacy
import concepcy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "concepcy",
    config={
        "relations_of_interest": ["MotivatedByGoal", "CapableOf"],
        "filter_edge_weight": 3.0,
        "filter_missing_text": True,
        "as_dict": False
    }
)
```

# Documentation 📚
📄 The full documentation, along with design decisions and examples, can be
found [here](https://julesbelveze.github.io/concepcy/).

🎮 A simple demo of how to use concepCy can be found [here](https://huggingface.co/spaces/JulesBelveze/concepcy).

# References
* [ConceptNet 5.5: An Open Multilingual Graph of General Knowledge](https://arxiv.org/abs/1612.03975)