https://github.com/hiejulia/spark-nlp
Spark NLP
- Host: GitHub
- URL: https://github.com/hiejulia/spark-nlp
- Owner: hiejulia
- Created: 2020-09-14T18:37:37.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2020-09-14T18:59:36.000Z (about 5 years ago)
- Last Synced: 2025-02-08T21:46:19.992Z (8 months ago)
- Size: 5.86 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# spark-nlp
Spark NLP

## Dataset
- https://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups

## Install
- Anaconda
- Spark

## NLP related terms

### Annotators

#### DocumentAssembler
A Transformer that creates a column that contains documents.
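
To make "a column that contains documents" concrete, here is a minimal pure-Python sketch. It is not the Spark NLP API: the `Annotation` class and `assemble_documents` function are hypothetical names, and the fields are only loosely modeled on a Spark NLP annotation (type, begin, end, result, metadata). Each raw string becomes one `document` annotation spanning the whole text.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Toy record loosely modeled on the fields of a Spark NLP annotation (hypothetical)."""

    annotator_type: str
    begin: int  # offset of the first character
    end: int  # offset of the last character (inclusive)
    result: str
    metadata: dict = field(default_factory=dict)


def assemble_documents(texts):
    """Wrap every raw string of a text column in a single 'document' annotation.

    Each row holds a list of annotations, because later annotators may
    produce several annotations (sentences, tokens) per row.
    """
    return [[Annotation("document", 0, len(text) - 1, text)] for text in texts]


rows = assemble_documents(["Spark NLP is fun.", "Hello world"])
print(rows[0][0].annotator_type, rows[0][0].begin, rows[0][0].end)  # -> document 0 16
```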

#### Sentence Segmenter
An annotator that produces the sentences of the document.
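
A sentence segmenter turns one document into several sentence spans. The sketch below is a deliberately naive stand-in, assuming sentences end at `.`, `!` or `?`; Spark NLP's own sentence detection uses richer rules, and `segment_sentences` is a hypothetical helper. It keeps inclusive character offsets so each sentence can be traced back to the document.

```python
import re

# A run of non-terminal characters followed by terminal punctuation or the end of the text.
SENTENCE_PATTERN = re.compile(r"[^.!?]+(?:[.!?]+|$)")


def segment_sentences(text):
    """Return (begin, end, sentence) triples with inclusive character offsets."""
    sentences = []
    for match in SENTENCE_PATTERN.finditer(text):
        chunk = match.group()
        sentence = chunk.strip()
        if not sentence:
            continue  # skip whitespace-only leftovers
        # Shift the start past any leading whitespace that strip() removed.
        begin = match.start() + len(chunk) - len(chunk.lstrip())
        sentences.append((begin, begin + len(sentence) - 1, sentence))
    return sentences


for begin, end, sentence in segment_sentences("Spark NLP is fast. It scales! Does it?"):
    print(begin, end, sentence)
```

Abbreviations such as "e.g." would be split incorrectly here, which is exactly why production segmenters go beyond a single regular expression.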

#### Tokenizer
An annotator that produces the tokens of the sentences.
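
Tokenization splits each sentence into words and punctuation. As a hedged illustration (a regex rule of my own choosing, not Spark NLP's tokenizer), the hypothetical `tokenize` function keeps words with one internal apostrophe together and emits every other non-space symbol as its own token. The `offset` parameter shifts positions from sentence coordinates back into document coordinates.

```python
import re

# Words (optionally with one internal apostrophe, e.g. "Don't") or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(sentence, offset=0):
    """Return (begin, end, token) triples; offset maps positions into the document."""
    return [
        (offset + m.start(), offset + m.end() - 1, m.group())
        for m in TOKEN_PATTERN.finditer(sentence)
    ]


print(tokenize("Don't panic, Spark!"))
```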

#### SpellChecker
An annotator that produces the spelling-corrected tokens.
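
One classic way to correct a token is Peter Norvig's approach: generate every string one edit away and keep the most frequent known word. The sketch below follows that idea in plain Python; `WORD_COUNTS` is a made-up frequency table standing in for a real corpus, and `edits1`/`correct` are hypothetical helpers rather than Spark NLP functions.

```python
from collections import Counter

# Hypothetical word frequencies standing in for counts from a real corpus.
WORD_COUNTS = Counter({"spark": 50, "speak": 20, "nlp": 30, "the": 100, "language": 40})
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(token):
    """Return the lowercased known word, the best one-edit candidate, or the token unchanged."""
    word = token.lower()
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return token  # unknown and not close to anything known: leave it alone
    return max(candidates, key=WORD_COUNTS.__getitem__)


print([correct(t) for t in ["sparck", "langauge", "xyzzy"]])  # -> ['spark', 'language', 'xyzzy']
```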

#### Stemmer
An annotator that produces the stems of the tokens.
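
A stem is what remains after chopping suffixes off a word, and it need not be a dictionary word. The following crude suffix-stripping sketch is much simpler than the Porter algorithm commonly used for English and is not what Spark NLP runs; the `SUFFIXES` list and the three-character minimum are arbitrary assumptions for illustration.

```python
# Hypothetical suffix list, checked in order so longer suffixes win ("ies" before "es" before "s").
SUFFIXES = ("ness", "ing", "ies", "ed", "ly", "es", "s")
MIN_STEM = 3  # never cut a word down below three characters


def stem(token):
    """Strip the first matching suffix, mapping 'ies' to 'i' as Porter-style stemmers do."""
    word = token.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            if suffix == "ies":
                return word[:-3] + "i"
            return word[: -len(suffix)]
    return word


print([stem(t) for t in ["running", "studies", "jumped", "cats", "is"]])
# -> ['runn', 'studi', 'jump', 'cat', 'is']
```

Results like "runn" and "studi" show why stems are cheap but rough: they group related forms without producing readable words.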

#### Lemmatizer
An annotator that produces the lemmas of the tokens.
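
Unlike a stem, a lemma is the dictionary form of a word, so lemmatization is usually driven by a lookup table rather than suffix rules. This sketch assumes a tiny, hypothetical lemma-to-forms dictionary, inverts it into a form-to-lemma map, and falls back to the lowercased token when no entry exists. It illustrates the idea only and does not use Spark NLP.

```python
# Hypothetical lemma dictionary: lemma -> inflected forms.
LEMMA_FORMS = {
    "be": ["am", "is", "are", "was", "were", "been", "being"],
    "study": ["studies", "studied", "studying"],
    "mouse": ["mice"],
}

# Invert it into a form -> lemma lookup table.
FORM_TO_LEMMA = {form: lemma for lemma, forms in LEMMA_FORMS.items() for form in forms}


def lemmatize(token):
    """Return the lemma for a known form, otherwise the lowercased token itself."""
    word = token.lower()
    return FORM_TO_LEMMA.get(word, word)


print([lemmatize(t) for t in ["Studies", "mice", "was", "spark"]])  # -> ['study', 'mouse', 'be', 'spark']
```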

#### POS Tagger
An annotator that produces the parts of speech of the associated tokens.
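
Spark NLP's part-of-speech tagger is perceptron-based; the sketch below swaps in a far simpler most-frequent-tag baseline so the idea of "learn tags from labeled tokens, then assign them" fits in a few lines. The training sentences are hypothetical and use Penn Treebank tags; unseen words fall back to `NN`, a common default for such baselines.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged sentences (Penn Treebank tags) used as training data.
TRAINING = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("runs", "NNS"), ("were", "VBD"), ("long", "JJ")],
    [("a", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("she", "PRP"), ("runs", "VBZ"), ("fast", "RB")],
]


def train_most_frequent_tag(sentences):
    """Map each lowercased word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(tokens, model, default="NN"):
    """Pair each token with its learned tag, or the default tag for unseen words."""
    return [(token, model.get(token.lower(), default)) for token in tokens]


model = train_most_frequent_tag(TRAINING)
print(tag(["The", "dog", "runs", "quickly"], model))
# -> [('The', 'DT'), ('dog', 'NN'), ('runs', 'VBZ'), ('quickly', 'NN')]
```

The wrong `NN` for "quickly" shows the baseline's weakness: it ignores context and cannot handle unseen words, which is what learned models such as a perceptron tagger address.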