https://github.com/wjbmattingly/bagpipes-spacy

Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities.

- Host: GitHub
- URL: https://github.com/wjbmattingly/bagpipes-spacy
- Owner: wjbmattingly
- Created: 2023-08-26T17:01:53.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-15T01:00:03.000Z (9 months ago)
- Last Synced: 2025-01-31T02:23:01.029Z (3 months ago)
- Topics: nlp, spacy
- Language: Python
- Size: 197 KB
- Stars: 13
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Metadata Files:
  - Readme: README.md


[![GitHub Stars](https://img.shields.io/github/stars/wjbmattingly/bagpipes-spacy?style=social)](https://github.com/wjbmattingly/bagpipes-spacy)

[![PyPi Version](https://img.shields.io/pypi/v/bagpipes-spacy)](https://pypi.org/project/bagpipes-spacy/)

[![PyPi Downloads](https://img.shields.io/pypi/dm/bagpipes-spacy)](https://pypi.org/project/bagpipes-spacy/)

# Bagpipes spaCy

![bagpipes spacy logo](https://github.com/wjbmattingly/bagpipes-spacy/blob/main/images/bagpipes-spacy-logo.png?raw=true)

Bagpipes spaCy is a collection of custom spaCy pipeline components that add text-processing capabilities to spaCy pipelines. The components are:

1. **Quote Detector**: Identifies and extracts quotes from the text.

2. **Phrases Extractor**: Extracts various types of phrases such as prepositional, noun, verb, adjective, and adverbial phrases.

3. **Normalizer**: Normalizes the text by expanding contractions, removing special characters, and more.

4. **Triple Detector**: Extracts triples (subject, predicate, object) from the text.

5. **Entity Similarity**: Computes similarity between entities in the text and maps similar entities.

6. **Entity Cluster**: Groups entities in the text into clusters based on similarity.

7. **Sentence Cluster**: Groups sentences in the text into clusters based on similarity.

8. **Token Cluster**: Groups tokens in the text into clusters based on similarity.

9. **Keyword Extractor**: Extracts keywords by ranking candidate n-grams by their cosine similarity to the entire text or to each sentence.

## Table of Contents

- [Bagpipes spaCy](#bagpipes-spacy)

  - [Table of Contents](#table-of-contents)

  - [Installation](#installation)

  - [Usage](#usage)

    - [Quote Detector](#quote-detector)

    - [Phrases Extractor](#phrases-extractor)

    - [Normalizer](#normalizer)

    - [Triple Detector](#triple-detector)

    - [Entity Similarity](#entity-similarity)

    - [Sentence Cluster](#sentence-cluster)

    - [Token Cluster](#token-cluster)

    - [Entity Cluster](#entity-cluster)

    - [Keyword Extractor](#keyword-extractor)

## Installation

To install Bagpipes spaCy, execute:

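```sh
pip install bagpipes-spacy

# The usage examples below also load spaCy's pretrained English pipelines,
# which are installed separately:
python -m spacy download en_core_web_md
python -m spacy download en_core_web_lg
```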

## Usage

After installation, you can add any of the Bagpipes spaCy components to your spaCy pipeline using the `add_pipe` method. There's no need to import the components individually.

### Quote Detector

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("quote_detector")

text = """
"I... oh ... very well," said the Prime Minister weakly. "Yes, I'll see Fudge."

He hurried back to his desk, straightening his tie as he went. He had barely resumed his seat, and arranged his face into what he hoped was a relaxed and unfazed expression, when bright green flames burst into life in the empty grate beneath his marble mantelpiece.
"""

doc = nlp(text)

for quote in doc._.quotes:
    print(quote)
```
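The README does not say how `quote_detector` finds quotes. As a rough, stand-alone illustration of the idea (not the component's actual implementation), a regular expression can pull out spans enclosed in straight double quotes. The `find_quotes` helper below is hypothetical and ignores curly quotes and nested quotation.

```python
import re

# A straight double quote, any run of characters that are not double quotes, then a closing quote.
QUOTE_PATTERN = re.compile(r'"[^"]*"')


def find_quotes(text: str) -> list[str]:
    """Return each double-quoted span in `text`, including its quotation marks."""
    return [match.group(0) for match in QUOTE_PATTERN.finditer(text)]


sample = '"I... oh ... very well," said the Prime Minister weakly. "Yes, I\'ll see Fudge."'
for quote in find_quotes(sample):
    print(quote)
```

This prints the two quoted utterances from the example passage, one per line.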

### Phrases Extractor

```python
import spacy

nlp = spacy.load("en_core_web_md")
nlp.add_pipe("phrases_extractor")

text = """Seconds later, he had climbed out onto a rather fine antique rug, brushing ash from the sleeves of his long pin-striped cloak, a lime-green bowler hat in his hand."""

doc = nlp(text)

print("Prepositional Phrases:", doc._.prep_phrases)
print("Noun Phrases:", doc._.noun_phrases)
print("Verb Phrases:", doc._.verb_phrases)
print("Adj Phrases:", doc._.adj_phrases)
```
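How `phrases_extractor` builds its phrases is not described here. One common heuristic for prepositional phrases, shown below purely as an illustration, starts at each preposition (the Universal Dependencies `ADP` tag, which spaCy exposes as `token.pos_`) and extends it up to the next noun or proper noun. The `(word, tag)` input format, the hand-assigned tags, and the `prepositional_phrases` function are assumptions for this sketch.

```python
# Hypothetical input format: (word, coarse POS tag) pairs, using Universal Dependencies tags
# like the ones spaCy exposes as token.pos_.
Tagged = list[tuple[str, str]]

HEAD_TAGS = {"NOUN", "PROPN"}


def prepositional_phrases(tagged: Tagged) -> list[str]:
    """Extend each preposition (ADP) up to and including the next noun or proper noun."""
    phrases = []
    for i, (_, tag) in enumerate(tagged):
        if tag != "ADP":
            continue
        for j in range(i + 1, len(tagged)):
            if tagged[j][1] in HEAD_TAGS:
                phrases.append(" ".join(word for word, _ in tagged[i : j + 1]))
                break
    return phrases


# Hand-tagged fragment of the example sentence.
tagged = [
    ("brushing", "VERB"), ("ash", "NOUN"), ("from", "ADP"), ("the", "DET"),
    ("sleeves", "NOUN"), ("of", "ADP"), ("his", "PRON"), ("long", "ADJ"), ("cloak", "NOUN"),
]
print(prepositional_phrases(tagged))
```

On this input the sketch returns `['from the sleeves', 'of his long cloak']`. A parser-based approach (following a preposition's `pobj` subtree) would handle longer and nested phrases better than this linear scan.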

### Normalizer

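```python
import spacy

nlp = spacy.load("en_core_web_lg")

# first=True places the normalizer ahead of every other component in the pipeline
nlp.add_pipe("normalizer", first=True)

doc = nlp("I can't believe it's already over!")
```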
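The README lists contraction expansion and special-character removal among the normalizer's steps but does not show how they are done. The sketch below illustrates both steps in plain Python. The tiny `CONTRACTIONS` table, the character whitelist, and the `normalize` function are hypothetical, expanded contractions come out in lowercase, and curly apostrophes are not handled.

```python
import re

# A deliberately tiny, hypothetical contraction table.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "it's": "it is",
    "that's": "that is",
}

# Whole-word, case-insensitive match of any contraction in the table.
CONTRACTION_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b", re.IGNORECASE
)
# Anything other than letters, digits, whitespace, and basic punctuation counts as "special".
SPECIAL_CHARS = re.compile(r"[^A-Za-z0-9\s.,!?'-]")


def normalize(text: str) -> str:
    # 1. Expand contractions first, so their apostrophes are not touched by step 2.
    text = CONTRACTION_PATTERN.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)
    # 2. Remove special characters.
    text = SPECIAL_CHARS.sub("", text)
    # 3. Collapse runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


print(normalize("I can't  wait\u2026 it's #great!"))
```

Here the ellipsis character and `#` are dropped and both contractions are expanded, giving `I cannot wait it is great!`.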

### Triple Detector

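```python
import spacy

# Triple extraction typically relies on a dependency parse, so load a pipeline with a parser
nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("triple_detector")

doc = nlp("He had barely resumed his seat.")
```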
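The README does not document how `triple_detector` works. A common, minimal approach reads triples off a dependency parse by pairing each verb's nominal subject (`nsubj`) with its direct object (`dobj`), the labels used by spaCy's English pipelines. The sketch below runs on a hand-written parse, with a hypothetical `Token` tuple standing in for spaCy tokens, and ignores passives, clausal objects, and conjunctions.

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    dep: str   # dependency label, as in spaCy's token.dep_
    head: int  # index of this token's syntactic head (the root points to itself)


def extract_triples(tokens: list[Token]) -> list[tuple[str, str, str]]:
    """Pair every nsubj of a token with every dobj of the same token (a simple SVO heuristic)."""
    triples = []
    for i, predicate in enumerate(tokens):
        subjects = [t.text for t in tokens if t.head == i and t.dep == "nsubj"]
        objects = [t.text for t in tokens if t.head == i and t.dep == "dobj"]
        triples.extend((s, predicate.text, o) for s in subjects for o in objects)
    return triples


# Hand-written parse of "He had barely resumed his seat".
sentence = [
    Token("He", "nsubj", 3),
    Token("had", "aux", 3),
    Token("barely", "advmod", 3),
    Token("resumed", "ROOT", 3),
    Token("his", "poss", 5),
    Token("seat", "dobj", 3),
]
print(extract_triples(sentence))
```

This yields the single triple `('He', 'resumed', 'seat')`.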

### Entity Similarity

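```python
import spacy

# Comparing entities needs recognized entities and word vectors; en_core_web_lg provides both
nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("entity_similarity")

doc = nlp("Microsoft is a company. Twitter is another company.")
```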
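Similarity between entities is typically measured as cosine similarity between their vectors (in spaCy, `Span.vector` averages the token vectors of a span by default). The sketch below shows one possible reading of "maps similar entities": link each entity to every other entity whose cosine similarity meets a threshold. The 2-D toy vectors, the threshold value, and `map_similar` are hypothetical; this is not the component's code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def map_similar(entities: dict[str, list[float]], threshold: float) -> dict[str, list[str]]:
    """Map each entity to the other entities whose cosine similarity is >= threshold."""
    return {
        name: [
            other
            for other, other_vec in entities.items()
            if other != name and cosine(vec, other_vec) >= threshold
        ]
        for name, vec in entities.items()
    }


# Hypothetical 2-D vectors; real ones would come from, e.g., spaCy's Span.vector.
entities = {
    "Microsoft": [1.0, 0.1],
    "Twitter": [0.9, 0.2],
    "Tiger Woods": [0.1, 1.0],
}
print(map_similar(entities, threshold=0.8))
```

The two companies map to each other, while "Tiger Woods" has no match above 0.8.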

### Sentence Cluster

```python
import spacy

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("sentence_cluster", config={"threshold": 0.8})

text = "Microsoft is a company. Twitter is another company. Tiger Woods is an athlete. Michael Jordan is an athlete."
doc = nlp(text)

print(doc._.sent_cluster)
```
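The README does not define how `threshold` is applied. One plausible reading, sketched below as an assumption, is greedy "leader" clustering: each sentence vector joins the first existing cluster whose founding member is at least `threshold` cosine-similar to it, and otherwise starts a new cluster. The toy vectors stand in for spaCy's `Span.vector` values, and `threshold_cluster` is a hypothetical helper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def threshold_cluster(vectors: list[list[float]], threshold: float) -> list[int]:
    """Greedy leader clustering: return a cluster label for each vector, in input order."""
    leaders: list[list[float]] = []  # first member of each cluster
    labels: list[int] = []
    for vec in vectors:
        for cluster_id, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                labels.append(cluster_id)
                break
        else:  # no existing cluster is similar enough, so start a new one
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels


# Hypothetical vectors for the four example sentences (two about companies, two about athletes).
sentence_vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
print(threshold_cluster(sentence_vectors, threshold=0.8))
```

This assigns labels `[0, 0, 1, 1]`. Under this scheme a higher threshold yields more, tighter clusters, and results can depend on input order because each cluster is anchored to its first member.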

### Token Cluster

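```python
import spacy

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("token_cluster", config={"threshold": 0.4})

text = "Microsoft is a company. Twitter is another company. Tiger Woods is an athlete. Michael Jordan is an athlete."
doc = nlp(text)

print(doc._.token_cluster)
```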

### Entity Cluster

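```python
import spacy

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("entity_cluster", config={"threshold": 0.3})

text = "Microsoft is a company. Twitter is another company. Tiger Woods is an athlete. Michael Jordan is an athlete."
doc = nlp(text)

print(doc._.ent_cluster)
```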
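Another common way a similarity threshold can drive clustering, shown here as an alternative rather than as what `entity_cluster` does, is single-link grouping: any two entities whose cosine similarity meets the threshold are joined, and the clusters are the connected groups that result. The union-find helper, the toy vectors, and the names below are hypothetical.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def single_link_clusters(names: list[str], vectors: list[list[float]], threshold: float) -> list[list[str]]:
    """Group names whose vectors are linked, directly or through a chain, by similarity >= threshold."""
    parent = list(range(len(names)))  # union-find forest: each item starts in its own set

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(names)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two sets

    groups: dict[int, list[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


names = ["Microsoft", "Twitter", "Tiger Woods", "Michael Jordan"]
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
print(single_link_clusters(names, vectors, threshold=0.8))
```

At 0.8 this separates the companies from the athletes. With `threshold=0.3`, the same toy data collapses into one cluster because "Microsoft" and "Michael Jordan" clear that bar, which illustrates how single-link grouping chains items together at low thresholds.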

### Keyword Extractor

The `keyword_extractor` component can be configured with the following parameters:

- `top_n`: The number of top keywords to extract for the entire document.

- `min_ngram`: The minimum size for n-grams.

- `max_ngram`: The maximum size for n-grams.

- `strict`: If set to `True`, only n-grams within the `min_ngram` to `max_ngram` range are considered.

- `top_n_sent`: The number of top keywords to extract for each sentence.

```python
import spacy

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe(
    "keyword_extractor",
    last=True,
    config={"top_n": 10, "min_ngram": 1, "max_ngram": 3, "strict": True, "top_n_sent": 3},
)

text = "Natural language processing is a fascinating domain of artificial intelligence. It allows computers to understand and generate human language."
doc = nlp(text)

print("Top Document Keywords:", doc._.keywords)

for sent in doc.sents:
    print(f"Sentence: {sent.text}")
    print("Top Sentence Keywords:", sent._.sent_keywords)
```
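The parameter list above suggests an embedding-based approach: generate candidate n-grams between `min_ngram` and `max_ngram` words long, embed each candidate and the whole text, and keep the `top_n` candidates with the highest cosine similarity. The sketch below follows that outline with a hypothetical `toy_embed` function that averages made-up word vectors. It does not model `strict`, stop-word filtering, or per-sentence extraction (`top_n_sent`), whose exact behavior the README does not specify; the same ranking could presumably be run on each sentence instead of the whole text.

```python
import math

# Made-up 3-dimensional word vectors standing in for real embeddings such as spaCy's.
TOY_VECTORS = {
    "natural": [0.8, 0.2, 0.0],
    "language": [1.0, 0.0, 0.0],
    "processing": [0.9, 0.1, 0.0],
    "is": [0.0, 0.0, 1.0],
    "fun": [0.0, 1.0, 0.0],
}


def toy_embed(text: str) -> list[float]:
    """Average the toy vectors of the words in `text` (unknown words count as zeros)."""
    vectors = [TOY_VECTORS.get(word, [0.0, 0.0, 0.0]) for word in text.split()]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_ngrams(words: list[str], min_ngram: int, max_ngram: int) -> list[str]:
    """Every contiguous n-gram with min_ngram <= n <= max_ngram, shortest first."""
    return [
        " ".join(words[i : i + n])
        for n in range(min_ngram, max_ngram + 1)
        for i in range(len(words) - n + 1)
    ]


def top_keywords(text: str, top_n: int, min_ngram: int = 1, max_ngram: int = 3) -> list[str]:
    """Rank unique candidate n-grams by cosine similarity to the embedding of the whole text."""
    doc_vector = toy_embed(text)
    # Sorting alphabetically first makes tie-breaking deterministic (Python's sort is stable).
    candidates = sorted(set(candidate_ngrams(text.split(), min_ngram, max_ngram)))
    ranked = sorted(candidates, key=lambda c: cosine(toy_embed(c), doc_vector), reverse=True)
    return ranked[:top_n]


print(top_keywords("natural language processing is fun", top_n=3, max_ngram=2))
```

With these toy vectors "natural" ranks first and the function word "is" scores lowest; in practice, stop-word filtering or a diversity step is often added so that candidates like "processing is" do not compete with content-bearing phrases.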
