https://github.com/curiousitydrives/named-entity-recognition
- Host: GitHub
- URL: https://github.com/curiousitydrives/named-entity-recognition
- Owner: curiousityDrives
- Created: 2024-06-28T03:01:33.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-08-16T09:48:52.000Z (4 months ago)
- Last Synced: 2024-12-13T03:16:39.036Z (14 days ago)
- Topics: named-entity-recognition, natural-language-processing, python, regex-pattern, spacy
- Language: Jupyter Notebook
- Homepage:
- Size: 5.45 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Named Entity Recognition (NER) using spaCy
## Overview
This project implements Named Entity Recognition (NER) using the spaCy library in Python. NER is a fundamental task in
Natural Language Processing (NLP) that involves identifying and categorizing entities (such as names of persons, organizations,
locations, dates, etc.) within a body of text. The goal of this project is to demonstrate how to extract these entities using a
pre-trained spaCy model and additional custom rules.

## Features
- **Entity Types Recognized**:
  - **PERSON**: Names of persons (e.g., "John Smith", "Mary Jane").
  - **GPE (Geo-Political Entity)**: Names of countries, cities, states, etc.
  - **ORG**: Names of organizations, companies, institutions, etc.
  - **TITLE**: Government official titles (e.g., "President", "Prime Minister").
  - **DATE**: Dates and times mentioned in the text (e.g., "January 1, 2023", "tomorrow").
- **Implementation Details**:
  - **spaCy Integration**: Utilizes the `en_core_web_sm` model from spaCy, which is pre-trained on a large corpus of text.
  - **Regex Patterns**: Custom regex patterns are used to enhance entity recognition for specific categories like country names and dates.
  - **Functionality**: Includes functions for extracting each type of entity separately, providing flexibility and modularity in usage.

## Project Structure
- **`README.md`**: This file provides an overview of the project, installation instructions, usage guidelines, and examples.
- **`ner_project.py`**: Python script containing functions for extracting named entities using spaCy and regex.
- **`NER_Dataset.csv`**: Sample dataset used for testing the NER functions.
- **`requirements.txt`**: Lists the Python packages required to run the project (e.g., spaCy, pandas).
- **`LICENSE`**: License information for the project.

## Usage
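
The contents of `ner_project.py` are not reproduced in this README. As a rough mental model for the extraction functions used in the steps below, here is a minimal sketch of how spaCy entities and regex patterns might be combined. Every helper name (`find_titles`, `find_dates`, `group_by_label`, `spacy_entities`), the title list, and the date patterns are illustrative assumptions, not the project's actual code. The spaCy part assumes the `en_core_web_sm` model has been installed, for example with `python -m spacy download en_core_web_sm`.

```python
import re
from collections import defaultdict

# Hypothetical title list; the project's real patterns are not shown in the README.
TITLE_PATTERN = re.compile(
    r"\b(?:Prime Minister|President|Chancellor|Governor|Minister)\b"
)

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Matches dates such as "January 1, 2023" or "2023-01-01" (illustrative only).
DATE_PATTERN = re.compile(
    rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b|\b\d{{4}}-\d{{2}}-\d{{2}}\b"
)


def find_titles(text):
    """Return official titles found by the regex, in order of appearance."""
    return TITLE_PATTERN.findall(text)


def find_dates(text):
    """Return date strings found by the regex, in order of appearance."""
    return DATE_PATTERN.findall(text)


def group_by_label(pairs, wanted=("PERSON", "GPE", "ORG")):
    """Group (entity_text, label) pairs by label, dropping duplicates."""
    grouped = defaultdict(list)
    for ent_text, label in pairs:
        if label in wanted and ent_text not in grouped[label]:
            grouped[label].append(ent_text)
    return dict(grouped)


def spacy_entities(text, model="en_core_web_sm"):
    """Run a spaCy pipeline and group its PERSON, GPE, and ORG entities."""
    import spacy  # imported lazily so the regex helpers work without spaCy

    nlp = spacy.load(model)
    doc = nlp(text)
    return group_by_label((ent.text, ent.label_) for ent in doc.ents)
```

Keeping the regex helpers separate from the spaCy call mirrors the README's description of one function per entity type, and it lets the pattern-based parts be tested without loading a model.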
1. **Installation**:
   - Clone the repository: `git clone https://github.com/cur10usityDrives/Named-Entity-Recognition.git`
   - Install dependencies: `pip install -r requirements.txt`
2. **Running the Code**:
   - Modify `ner_project.py` to process your specific text data, or integrate it into your own Python projects.
   - Use functions such as `extract_person_names` and `extract_country_names` to extract named entities from text data.
3. **Example**:
```python
from ner_project import (
    extract_person_names,
    extract_country_names,
    extract_organization_names,
    extract_official_titles,
    extract_datetimes,
)

# Sample sentence containing a city, an organization, a title, and a person
sample_text = "In Beirut, a string of officials voiced their anger, while at the United Nations summit in New York, Prime Minister Fouad Siniora said the Lebanese people are resolute in preventing such attempts from destroying their spirit."

print("Person Names:", extract_person_names(sample_text))
print("Country Names:", extract_country_names(sample_text))
print("Organization Names:", extract_organization_names(sample_text))
print("Official Titles:", extract_official_titles(sample_text))
print("Datetimes:", extract_datetimes(sample_text))
```

## Contributing
Contributions to improve this project are welcome! You can contribute by:
- Opening issues for bugs or feature requests.
- Forking the repository and submitting pull requests with improvements.
- Providing feedback on the project and its documentation.

## Author
Natnael Haile