https://github.com/sakan811/stress-pattern-occurrence-in-english-words

This project is intended to provide English learners with data that allows them to make a data-driven guess when encountering words that they aren't sure where to stress
https://github.com/sakan811/stress-pattern-occurrence-in-english-words

data-analysis data-visualization english english-language english-learning language powerbi powerbi-report powerbi-visuals

Last synced: 10 months ago
JSON representation

This project is intended to provide English learners with data that allows them to make a data-driven guess when encountering words that they aren't sure where to stress

Host: GitHub
URL: https://github.com/sakan811/stress-pattern-occurrence-in-english-words
Owner: sakan811
License: apache-2.0
Created: 2023-11-04T15:49:05.000Z (about 2 years ago)
Default Branch: master
Last Pushed: 2024-07-26T03:53:21.000Z (over 1 year ago)
Last Synced: 2025-01-05T06:34:55.076Z (11 months ago)
Topics: data-analysis, data-visualization, english, english-language, english-learning, language, powerbi, powerbi-report, powerbi-visuals
Language: Python
Homepage:
Size: 4.98 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# Stress Pattern Occurrence in English Words
Project Latest Update: 26 July 2024

The stress pattern was based on **The CMU Pronouncing Dictionary**.

The **cmudict** module from the **NLTK** library was used to extract the stress pattern from the dataset.

The English words dataset was based on the **SubtlexUS** dataset.

## Disclaimers
According to what is mentioned on the CMU Pronouncing Dictionary website,
"Stress is difficult to get right and people disagree about it."

## Visualizations
Visualizations Latest Update: 6 May 2024

[Power BI](https://app.powerbi.com/view?r=eyJrIjoiMzhkYmVjOGUtMmE5Ni00NmUxLWIzYWYtMzk2ODQ2YmU2NGM2IiwidCI6ImZlMzViMTA3LTdjMmYtNGNjMy1hZDYzLTA2NTY0MzcyMDg3OCIsImMiOjEwfQ%3D%3D)

[Instagram](https://www.instagram.com/p/C6oRlWmM5WL/?utm_source=ig_web_copy_link&igsh=MzRlODBiNWFlZA==)

[Facebook](https://www.facebook.com/permalink.php?story_fbid=pfbid0LQsXGdyJCBBxEvjQeF7tD4tvZVkK9vVvWknG4exkd94jtmVV3Ma8wfYbBUTW5C4Cl&id=61553626169836)

## Codebase Details
### Test Status
[![CodeQL](https://github.com/sakan811/Stress-Pattern-Occurrence-in-English-Words/actions/workflows/codeql.yml/badge.svg)](https://github.com/sakan811/Stress-Pattern-Occurrence-in-English-Words/actions/workflows/codeql.yml)

[![Python application](https://github.com/sakan811/Stress-Pattern-Occurrence-in-English-Words/actions/workflows/python-app.yml/badge.svg)](https://github.com/sakan811/Stress-Pattern-Occurrence-in-English-Words/actions/workflows/python-app.yml)

### To Run the Script to Get the English Words' Stress Pattern data
- Execute ```main.py```
- **Data** is loaded to a local **SQLite** database automatically.
- A local **SQLite** database is created **automatically** if not exist in the given path.
- You can change the path by adjusting this variable:
```
db = 'eng_stress_pattern.db'
```

### ```stress_pattern_finder``` Package

```eng_stress_pattern_finder.py```
- Find the stress pattern of the English words with the given dataset.

### ```stress_pattern_etl``` Package

```extract_data.py```
- Extract data from **SubtlexUS** dataset.

```transform_data.py```
- Transform data to find a syllable count and stress pattern of each English word.
- Words that aren't in the dictionary will be filtered out.

```load_to_sqlite.py```
- Load data to SQLite database tables.

## Database Details
Tables:
- StressPattern
- Store **syllable count**, **stress pattern**, and **primary** and **secondary stress position** of each word

## Sources
The CMU Pronouncing Dictionary: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
“SubtlexUS” dataset: http://www.lexique.org/?page_id=241

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/sakan811/stress-pattern-occurrence-in-english-words

Awesome Lists containing this project

README