https://github.com/codersales/msc-y1-s1-w10-thu-lang-eng-python-12-2h-lecture

# MSc-Y1-S1-W10-Thu-Lang-Eng-Python-12-2h-Lecture

## Description
Summary attempt for the MSc-Y1-S1-W10-Thu-Lang-Eng-Python-12-2h-Lecture.

## Content

NLTK

Tokenisation

To find the relevant Wikipedia page:

Step 1: open the data security article on tokenization and follow the note at the top:

- [Not to be confused with tokenization (lexical analysis).](https://en.wikipedia.org/wiki/Tokenization_(data_security))

Step 2: click through to lexical analysis, which links to the relevant section of that page:

- "A lexical token is a string with an assigned and thus identified meaning, in contrast to the probabilistic token used in large language models." [Lexical analysis > Lexical token and lexical tokenization | Wikipedia](https://en.wikipedia.org/wiki/Lexical_analysis#Tokenization)
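The idea of a string with an assigned meaning can be sketched with a small regular-expression tokenizer. This is a minimal illustration using only the standard library, not the lecture's own code; the token kinds (`NUMBER`, `WORD`, `PUNCT`) and the `tokenize` helper are assumptions made for the example. NLTK provides its own tokenizers, such as `nltk.tokenize.word_tokenize`, which are more thorough.

```python
import re

# Hypothetical token kinds chosen for illustration; they are not from the lecture.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),              # integers and simple decimals
    ("WORD", r"[A-Za-z]+(?:'[A-Za-z]+)?"),     # words, optionally with an apostrophe
    ("PUNCT", r"[^\w\s]"),                     # any single punctuation character
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return (kind, lexeme) pairs; whitespace between tokens is skipped."""
    return [(match.lastgroup, match.group()) for match in TOKEN_RE.finditer(text)]


tokens = tokenize("The cat sat, 2 times.")
for kind, lexeme in tokens:
    print(f"{kind:<6} {lexeme}")
```

Each pair couples the raw string with the category the tokenizer assigned to it, which is the "assigned and thus identified meaning" in the quotation above.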

Stemming - "reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form." [Stemming | Wikipedia](https://en.wikipedia.org/wiki/Stemming)
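A rough sketch of stemming as suffix stripping, assuming a tiny hand-picked suffix list. The `SUFFIXES` tuple, the `naive_stem` function and its `min_stem_len` parameter are hypothetical names for this example. It is a deliberately naive stand-in and not the Porter algorithm that NLTK exposes as `nltk.stem.PorterStemmer`.

```python
# Hypothetical suffix list, checked in this order; real stemmers use far richer rules.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


stems = [naive_stem(w) for w in ["walks", "walked", "walking", "walk"]]
print(stems)

# A stem does not have to be a dictionary word.
print(naive_stem("running"))
```

All four inflected forms reduce to the same stem, while `running` shows the usual limitation: the result is a written word form, not necessarily a valid word.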

Lemmatization - "the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form." [Lemmatization | Wikipedia](https://en.wikipedia.org/wiki/Lemmatization)
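Lemmatization can be sketched as a dictionary lookup that groups inflected forms under one lemma. The `LEMMA_TABLE` below is a hypothetical, hand-written table for illustration only, and `lemmatize` and `group_by_lemma` are made-up helper names. NLTK's `nltk.stem.WordNetLemmatizer` does this against the WordNet corpus, which has to be downloaded separately.

```python
from collections import defaultdict

# Hypothetical lookup table; a real lemmatizer consults a full lexicon.
LEMMA_TABLE = {
    "am": "be", "is": "be", "are": "be", "was": "be", "were": "be",
    "better": "good", "best": "good",
    "mice": "mouse",
}


def lemmatize(word):
    """Return the dictionary form if known, otherwise the lowercased word."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


def group_by_lemma(words):
    """Group inflected forms so they can be analysed as a single item."""
    groups = defaultdict(list)
    for word in words:
        groups[lemmatize(word)].append(word)
    return dict(groups)


groups = group_by_lemma(["is", "was", "mice", "mouse", "were"])
print(groups)
```

Unlike the suffix-stripping approach to stemming, the lookup handles irregular forms such as `was` and `mice`, at the cost of needing a lexicon.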

## References

Language Engineering Module