https://github.com/stephanielees/smoothingngram

Apply Bayesian statistics through DLM to smooth a 4-gram frequency series

bayesian-statistics language smoothing timeseries


Host: GitHub
URL: https://github.com/stephanielees/smoothingngram
Owner: stephanielees
Created: 2024-05-11T07:50:37.000Z (8 months ago)
Default Branch: main
Last Pushed: 2024-05-11T08:27:26.000Z (8 months ago)
Last Synced: 2024-11-08T15:12:39.314Z (about 2 months ago)
Topics: bayesian-statistics, language, smoothing, timeseries
Homepage:
Size: 31.3 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# smoothingNGram

Apply Bayesian statistics through a Dynamic Linear Model (DLM) to smooth a 4-gram frequency series. A video demonstrating the code and the smoothing theory is available [here](https://youtu.be/GxO03B-0xFg).

# Goal

To find the trend in the usage frequency of selected English phrases (4-grams) over time.

# Data

 - The source is the Google Books Ngram Viewer. The data can also be obtained with the `ngramr` package in R. To get the raw (unsmoothed) data, set `smoothing=0`.

 - Time range: 1980-2019

 - Ngram frequencies depend on the corpus they are computed from. This project looks at the ngram data from the English, American English, British English, and fiction corpora.

# Method

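After getting the raw data, we fit a Dynamic Linear Model (DLM) to the series. In state space model terms this is the filtering step: a forward pass that estimates the state at each year from the observations up to that year. We then smooth the filtered sequence with a backward pass, using the smoothing propositions (recursions) for DLMs, so that each year's estimate uses the whole series.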
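
The two passes can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code. It assumes the simplest DLM, a local level model (`y_t = theta_t + v_t`, `theta_t = theta_{t-1} + w_t`), with known observation variance `V` and state variance `W`. The function names, the variance values, and the sample series are all hypothetical.

```python
def kalman_filter(y, V, W, m0=0.0, C0=1e7):
    """Forward pass for a local level DLM (an assumed model, for illustration).

    Returns the one-step prior means/variances (a, R) and the
    filtered means/variances (m, C) for every time step.
    """
    a_list, R_list, m_list, C_list = [], [], [], []
    m, C = m0, C0
    for obs in y:
        a = m              # prior mean: the state follows a random walk
        R = C + W          # prior variance
        Q = R + V          # one-step forecast variance
        K = R / Q          # Kalman gain
        m = a + K * (obs - a)   # filtered mean
        C = R - K * R           # filtered variance
        a_list.append(a)
        R_list.append(R)
        m_list.append(m)
        C_list.append(C)
    return a_list, R_list, m_list, C_list


def smooth(a, R, m, C):
    """Backward pass: smoothing recursion applied to the filtered output."""
    n = len(m)
    s, S = list(m), list(C)    # the last smoothed value equals the last filtered value
    for t in range(n - 2, -1, -1):
        B = C[t] / R[t + 1]
        s[t] = m[t] + B * (s[t + 1] - a[t + 1])
        S[t] = C[t] + B * B * (S[t + 1] - R[t + 1])
    return s, S


# Made-up yearly values standing in for a 4-gram frequency series.
y = [2.1, 2.4, 1.9, 2.8, 3.0, 2.7, 3.4, 3.1, 3.6, 3.9]
a, R, m, C = kalman_filter(y, V=0.1, W=0.02)   # V and W are assumed, not estimated
s, S = smooth(a, R, m, C)
```

Here `m` is the filtered sequence and `s` is the smoothed trend. Because the backward pass uses later observations as well as earlier ones, the smoothed variances `S` are never larger than the filtered variances `C`. In practice `V` and `W` would be estimated from the data rather than fixed by hand.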