https://github.com/stephanielees/smoothingngram
Apply Bayesian statistics through DLM to smooth a 4-gram frequency series
bayesian-statistics language smoothing timeseries
- Host: GitHub
- URL: https://github.com/stephanielees/smoothingngram
- Owner: stephanielees
- Created: 2024-05-11T07:50:37.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-05-11T08:27:26.000Z (8 months ago)
- Last Synced: 2024-11-08T15:12:39.314Z (about 2 months ago)
- Topics: bayesian-statistics, language, smoothing, timeseries
- Homepage:
- Size: 31.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# smoothingNGram
Apply Bayesian statistics through a Dynamic Linear Model (DLM) to smooth a 4-gram frequency series. A video demonstrating the code and the smoothing theory is [here](https://youtu.be/GxO03B-0xFg).

# Goal
To find the trend in the usage of some English phrases.

# Data
- The source is Google Books Ngram Viewer. The data can also be obtained with the `ngramr` package in R. To get the raw data, use `smoothing=0`.
- Time range: 1980-2019
- Ngram data is computed from a specific corpus. This project looks at ngram data from the English, American English, British English, and fiction corpora.

# Method
After getting the raw data, we fit a Dynamic Linear Model (in state space terms, this is the filtering step). Then we smooth the filtered sequence by applying the propositions for smoothing, which work backward from the last filtered estimate.
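
The filter-then-smooth procedure can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the code from this repository: it assumes the simplest DLM, a local-level model (observation = level + noise, level = previous level + noise), with both variances fixed by hand rather than estimated. The function names `filter_local_level` and `smooth_local_level`, the variance values, and the input series are all hypothetical.

```python
def filter_local_level(y, obs_var, state_var, m0=0.0, c0=1e6):
    """Forward (Kalman) filter for an assumed local-level DLM.

    Returns the one-step-ahead prior means/variances (a, r) and the
    filtered means/variances (m, c) for every time step.
    """
    a, r, m, c = [], [], [], []
    m_prev, c_prev = m0, c0          # vague prior on the initial level
    for obs in y:
        a_t = m_prev                 # prior mean of the level at time t
        r_t = c_prev + state_var     # prior variance of the level
        q_t = r_t + obs_var          # variance of the one-step forecast
        gain = r_t / q_t             # weight given to the new observation
        m_t = a_t + gain * (obs - a_t)
        c_t = r_t * (1.0 - gain)
        a.append(a_t)
        r.append(r_t)
        m.append(m_t)
        c.append(c_t)
        m_prev, c_prev = m_t, c_t
    return a, r, m, c


def smooth_local_level(a, r, m, c):
    """Backward smoothing recursion applied to the filter output."""
    s, s_var = list(m), list(c)      # last smoothed value = last filtered value
    for t in range(len(m) - 2, -1, -1):
        b = c[t] / r[t + 1]
        s[t] = m[t] + b * (s[t + 1] - a[t + 1])
        s_var[t] = c[t] + b * b * (s_var[t + 1] - r[t + 1])
    return s, s_var


# Made-up numbers standing in for a yearly frequency series (not real ngram data).
freq = [2.1, 2.4, 1.9, 2.8, 3.0, 2.7, 3.4, 3.1, 3.6, 3.9]
a, r, m, c = filter_local_level(freq, obs_var=0.1, state_var=0.02)
trend, trend_var = smooth_local_level(a, r, m, c)

for year, raw, est in zip(range(1980, 1990), freq, trend):
    print(year, raw, round(est, 3))
```

The filtered value at each year uses only the data up to that year, while the smoothed value also uses later years, so the smoothed variance is never larger than the filtered one. In practice the two variances would be estimated from the data rather than set by hand.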