Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cloudera/cml_amp_spacy_entity_extraction
A Jupyter notebook demonstrating entity extraction on headlines with SpaCy.
entity-extraction named-entity-recognition nlp spacy
- Host: GitHub
- URL: https://github.com/cloudera/cml_amp_spacy_entity_extraction
- Owner: cloudera
- License: apache-2.0
- Created: 2021-01-28T22:16:25.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2022-04-06T21:32:56.000Z (almost 3 years ago)
- Last Synced: 2023-10-20T18:57:46.759Z (over 1 year ago)
- Topics: entity-extraction, named-entity-recognition, nlp, spacy
- Language: Jupyter Notebook
- Homepage:
- Size: 3.36 MB
- Stars: 3
- Watchers: 8
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Analyzing News Headlines with SpaCy
[spaCy](https://spacy.io/) wraps industrial-strength natural language processing capabilities into a Python library with an elegant and powerful API. The notebook in this repo demonstrates its use for Named Entity Recognition (NER) on a real-world news dataset.
![Sentences with named entities highlighted.](docs/images/NER.png)
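As a minimal sketch of how spaCy exposes entities (this is not the notebook's own code), the example below builds a blank English pipeline and adds a rule-based `entity_ruler` so that it runs without downloading a trained model. The headline and the patterns are made up for illustration. For statistical NER, a trained pipeline such as `en_core_web_sm`, loaded with `spacy.load`, would take the place of the blank pipeline.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

# Rule-based stand-in for a trained NER component (an assumption made
# for this sketch so that no model download is needed).
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Boeing"},
    {"label": "ORG", "pattern": "Reuters"},
])


def extract_entities(text):
    """Return (entity text, entity label) pairs found in `text`."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


# Hypothetical headline, not taken from the dataset.
entities = extract_entities("Boeing shares fall after Reuters report")
print(entities)
```

Whichever component produces them, entities are read from `doc.ents` in the same way, so swapping in a trained pipeline should only change how `nlp` is constructed.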
We take a public domain dataset of [Reuters news headlines](https://www.kaggle.com/notlucasp/financial-news-headlines) and use spaCy to extract named entities. We demonstrate three example downstream use cases:
- investigating the organizations that appeared most often in Reuters in 2020
- viewing the mentions of any given organization over time
- inspecting which organizations appear in headlines together

## Deploying on Cloudera Machine Learning (CML)
There are three ways to launch this notebook on CML:
1. **From Prototype Catalog** - Navigate to the Prototype Catalog in a CML workspace, select the "Analyzing News Headlines with SpaCy" tile, click "Launch as Project", click "Configure Project"
2. **As ML Prototype** - In a CML workspace, click "New Project", add a Project Name, select "ML Prototype" as the Initial Setup option, copy in the [repo URL](https://github.com/cloudera/CML_AMP_SpaCy_Entity_Extraction), click "Create Project", click "Configure Project"
3. **Manual Setup** - In a CML workspace, click "New Project", add a Project Name, select "Git" as the Initial Setup option, copy in the [repo URL](https://github.com/cloudera/CML_AMP_SpaCy_Entity_Extraction), click "Create Project".

Once the project has been initialized in a CML workspace, run the notebook by starting a Python 3 Jupyter notebook server session. All library and model dependencies are installed inline in the notebook.
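For a sense of what the three downstream use cases listed earlier involve, they can largely be expressed as counting once entities have been extracted. The sketch below is not the notebook's code: it assumes each headline has already been reduced to a hypothetical record with a `date` string and a list of `orgs`, and the sample records are invented for illustration. It uses only the Python standard library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, hand-written records standing in for NER output.
records = [
    {"date": "2020-01-15", "orgs": ["Boeing", "FAA"]},
    {"date": "2020-01-20", "orgs": ["Boeing"]},
    {"date": "2020-02-03", "orgs": ["Boeing", "FAA", "Airbus"]},
]


def top_orgs(records, n=3):
    """Organizations mentioned most often, counted at most once per headline."""
    counts = Counter(org for r in records for org in set(r["orgs"]))
    return counts.most_common(n)


def mentions_by_month(records, org):
    """Number of headlines mentioning `org`, keyed by YYYY-MM."""
    return dict(Counter(r["date"][:7] for r in records if org in r["orgs"]))


def cooccurrences(records):
    """Count unordered pairs of organizations appearing in the same headline."""
    pairs = Counter()
    for r in records:
        # Sorting makes each pair's key independent of mention order.
        pairs.update(combinations(sorted(set(r["orgs"])), 2))
    return pairs


print(top_orgs(records))
print(mentions_by_month(records, "FAA"))
print(cooccurrences(records).most_common(3))
```

Counting each organization at most once per headline is a design choice of this sketch; counting every mention instead would only require dropping the `set(...)` call in `top_orgs`.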
Happy hacking!