Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/wjbmattingly/spacyex
SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax.
https://github.com/wjbmattingly/spacyex
nlp spacy
Last synced: 3 months ago
JSON representation
SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax.
- Host: GitHub
- URL: https://github.com/wjbmattingly/spacyex
- Owner: wjbmattingly
- Created: 2024-05-01T19:23:57.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-05-03T09:42:43.000Z (9 months ago)
- Last Synced: 2024-10-21T04:43:58.100Z (3 months ago)
- Topics: nlp, spacy
- Language: Python
- Homepage:
- Size: 58.6 KB
- Stars: 57
- Watchers: 3
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# spaCyEx
[![PyPI version](https://badge.fury.io/py/spacyex.svg)](https://pypi.org/project/spacyex/)
[![GitHub stars](https://img.shields.io/github/stars/wjbmattingly/spacyex.svg?style=social&label=Star&maxAge=2592000)](https://github.com/wjbmattingly/spacyex/stargazers)
[![GitHub forks](https://img.shields.io/github/forks/wjbmattingly/spacyex.svg?style=social&label=Fork&maxAge=2592000)](https://github.com/wjbmattingly/spacyex/network)![logo](https://github.com/wjbmattingly/spacyex/blob/main/images/spacyex-logo.png?raw=true)
`spaCyEx` is a powerful extension for spaCy, designed to make pattern matching as flexible and easy as using regular expressions. It builds upon the existing capabilities of spaCy's `Matcher`, enhancing it with a more accessible syntax for defining complex patterns. `spaCyEx` allows for intuitive and detailed text pattern specifications, perfect for extracting detailed linguistic features from texts.
## Installation
You can install `spaCyEx` via pip:
```bash
pip install spacyex
```## Features
- **Dynamic Pattern Creation**: Create complex token matching patterns using a simple string-based syntax.
- **Integration with spaCy**: Leverage spaCy's Matcher capabilities to find sequences in text that match defined patterns.
- **Customizable Matching Rules**: Define token attributes including text characteristics, lexical attributes, and grammatical properties.### Creating Patterns
Define patterns using a string syntax where each token and its attributes are encapsulated by parentheses. Token attributes are specified by key-value pairs, separated by an equals sign (`=`), and multiple attributes are divided by a pipe (`|`).
#### Syntax Examples
- **Single Attribute**: `(pos=NOUN)`
- **Multiple Attributes**: `(pos=NOUN|lemma=run)`
- **Using List Values**: `(lemma=in[run,walk])`
- **Using Operators**: `(ent_type=person|op={2,3})`### Pattern Matching
Once a pattern is defined, it can be used to search text for matches.
## Usage
Here is a simple example to get started with `spaCyEx`:
```python
import spacyex as se
import spacynlp = spacy.load("en_core_web_sm")
text = "John Smith runs fast, but Jacob Smith walks slowly."
pattern = "(ent_type=person|op={2}) (lemma=in[run,walk]) (pos=ADV)"results = se.search(pattern, text, nlp)
for match in results:
print(match[0].text, "Start:", match[1], "End:", match[2])
```This code will match sequences in the text based on the defined pattern, using named entities, lemmas, and parts of speech.
## Roadmap
- [ ] Support for all dictionary properties in patterns.
- [ ] Additional utilities and helper functions for more complex pattern scenarios.