Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/wjbmattingly/bagpipes-spacy
Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities.
- Host: GitHub
- URL: https://github.com/wjbmattingly/bagpipes-spacy
- Owner: wjbmattingly
- Created: 2023-08-26T17:01:53.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-15T01:00:03.000Z (6 months ago)
- Last Synced: 2024-09-27T20:23:25.520Z (4 months ago)
- Topics: nlp, spacy
- Language: Python
- Homepage:
- Size: 197 KB
- Stars: 8
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
[![GitHub Stars](https://img.shields.io/github/stars/wjbmattingly/bagpipes-spacy?style=social)](https://github.com/wjbmattingly/bagpipes-spacy)
[![PyPi Version](https://img.shields.io/pypi/v/bagpipes-spacy)](https://pypi.org/project/bagpipes-spacy/0.0.1/)
[![PyPi Downloads](https://img.shields.io/pypi/dm/bagpipes-spacy)](https://pypi.org/project/bagpipes-spacy/0.0.1/)

# Bagpipes spaCy
![bagpipes spacy logo](https://github.com/wjbmattingly/bagpipes-spacy/blob/main/images/bagpipes-spacy-logo.png?raw=true)
Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. These components include:
1. **Quote Detector**: Identifies and extracts quotes from the text.
2. **Phrases Extractor**: Extracts various types of phrases such as prepositional, noun, verb, adjective, and adverbial phrases.
3. **Normalizer**: Normalizes the text by expanding contractions, removing special characters, and more.
4. **Triple Detector**: Extracts triples (subject, predicate, object) from the text.
5. **Entity Similarity**: Computes similarity between entities in the text and maps similar entities.
6. **Entity Cluster**: Groups entities in the text into clusters based on similarity.
7. **Sentence Cluster**: Groups sentences in the text into clusters based on similarity.
8. **Token Cluster**: Groups tokens in the text into clusters based on similarity.
9. **Keyword Extractor**: Extracts keywords from the text based on cosine similarity with the entire text or sentence.

## Table of Contents
- [Bagpipes spaCy](#bagpipes-spacy)
- [Table of Contents](#table-of-contents)
- [Installation](#installation)
- [Usage](#usage)
- [Quote Detector](#quote-detector)
- [Phrases Extractor](#phrases-extractor)
- [Normalizer](#normalizer)
- [Triple Detector](#triple-detector)
- [Entity Similarity](#entity-similarity)
- [Sentence Cluster](#sentence-cluster)
- [Token Cluster](#token-cluster)
- [Entity Cluster](#entity-cluster)
- [Keyword Extractor](#keyword-extractor)

## Installation
To install Bagpipes spaCy, execute:
```sh
pip install bagpipes-spacy
```

## Usage
After installation, you can add any of the Bagpipes spaCy components to your spaCy pipeline using the `add_pipe` method. There's no need to import the components individually.
### Quote Detector
```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("quote_detector")

text = """
"I... oh ... very well," said the Prime Minister weakly. "Yes, I'll see Fudge."
He hurried back to his desk, straightening his tie as he went. He had barely resumed his seat, and arranged his face into what he hoped was a relaxed and unfazed expression, when bright green flames burst into life in the empty grate beneath his marble mantelpiece.
"""

doc = nlp(text)
for quote in doc._.quotes:
print(quote)
```

### Phrases Extractor
```python
import spacy

nlp = spacy.load("en_core_web_md")
nlp.add_pipe("phrases_extractor")

text = """Seconds later, he had climbed out onto a rather fine antique rug, brushing ash from the sleeves of his long pin-striped cloak, a lime-green bowler hat in his hand."""
doc = nlp(text)
print("Prepositional Phrases:", doc._.prep_phrases)
print("Noun Phrases:", doc._.noun_phrases)
print("Verb Phrases:", doc._.verb_phrases)
print("Adj Phrases:", doc._.adj_phrases)
```

### Normalizer
```python
nlp.add_pipe('normalizer', first=True)
```
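
The README registers the component but does not show its output. As a minimal sketch, assuming the component exposes the cleaned text through a `Doc` extension named `normalized_text` (that attribute name is an assumption, not documented here), usage might look like:

```python
import spacy

nlp = spacy.blank("en")
# Run the normalizer first, as above, so any later components
# receive the normalized text.
nlp.add_pipe("normalizer", first=True)

doc = nlp("I can't believe it's already 5 o'clock!!!")
# `normalized_text` is an assumed extension name; check the
# installed package's source if your version differs.
print(doc._.normalized_text)
```

### Triple Detector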
```python
nlp.add_pipe('triple_detector')
```
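
Again the README stops at `add_pipe`. A minimal sketch, assuming the extracted triples land in a `doc._.triples` extension as (subject, predicate, object) tuples (the attribute name and output shape are assumptions):

```python
import spacy

# Extracting subject/predicate/object relies on a dependency
# parse, so load a full pipeline rather than a blank one.
nlp = spacy.load("en_core_web_md")
nlp.add_pipe("triple_detector")

doc = nlp("The Prime Minister straightened his tie.")
# `doc._.triples` is an assumed extension name; verify against
# the package source.
for triple in doc._.triples:
    print(triple)
```

### Entity Similarity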
```python
nlp.add_pipe('entity_similarity')
```
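
The output attribute is likewise undocumented here. The sketch below assumes a `doc._.entity_similarities` extension holding the entity similarity mapping (an assumption), and loads a vector model since similarity scores require word vectors:

```python
import spacy

# Entity similarity needs both NER and word vectors, so use a
# model that ships with vectors (md or lg).
nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("entity_similarity")

doc = nlp("Microsoft and Twitter are companies. Tiger Woods is an athlete.")
# Assumed extension name; inspect the component if your installed
# version exposes its output differently.
print(doc._.entity_similarities)
```

### Sentence Cluster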
```python
import spacy

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("sentence_cluster", config={"threshold": 0.8})

text = "Microsoft is a company. Twitter is another company. Tiger Woods is an athlete. Michael Jordan is an athlete."
doc = nlp(text)
print(doc._.sent_cluster)
```

### Token Cluster
```python
import spacy

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("token_cluster", config={"threshold": 0.4})

# `text` repeats the sentence-cluster example so this block runs standalone
text = "Microsoft is a company. Twitter is another company. Tiger Woods is an athlete. Michael Jordan is an athlete."
doc = nlp(text)
print(doc._.token_cluster)
```

### Entity Cluster
```python
import spacy

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("entity_cluster", config={"threshold": 0.3})

# `text` repeats the sentence-cluster example so this block runs standalone
text = "Microsoft is a company. Twitter is another company. Tiger Woods is an athlete. Michael Jordan is an athlete."
doc = nlp(text)
print(doc._.ent_cluster)
```

### Keyword Extractor
The `keyword_extractor` component can be configured with the following parameters:
- `top_n`: The number of top keywords to extract for the entire document.
- `min_ngram`: The minimum size for n-grams.
- `max_ngram`: The maximum size for n-grams.
- `strict`: If set to `True`, only n-grams within the `min_ngram` to `max_ngram` range are considered.
- `top_n_sent`: The number of top keywords to extract for each sentence.

```python
import spacy

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("keyword_extractor", last=True, config={"top_n": 10, "min_ngram": 1, "max_ngram": 3, "strict": True, "top_n_sent": 3})

text = "Natural language processing is a fascinating domain of artificial intelligence. It allows computers to understand and generate human language."
doc = nlp(text)
print("Top Document Keywords:", doc._.keywords)
for sent in doc.sents:
print(f"Sentence: {sent.text}")
print("Top Sentence Keywords:", sent._.sent_keywords)
```