Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sakan811/stress-pattern-occurrence-in-english-words
This project provides English learners with data for making a data-driven guess about where the stress falls in words they are unsure how to stress.
- Host: GitHub
- URL: https://github.com/sakan811/stress-pattern-occurrence-in-english-words
- Owner: sakan811
- License: apache-2.0
- Created: 2023-11-04T15:49:05.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-07-26T03:53:21.000Z (6 months ago)
- Last Synced: 2024-07-26T04:40:00.499Z (6 months ago)
- Topics: data-analysis, data-visualization, english, english-language, english-learning, language, powerbi, powerbi-report, powerbi-visuals
- Language: Python
- Homepage:
- Size: 4.98 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Stress Pattern Occurrence in English Words
Project Latest Update: 26 July 2024

The stress pattern was based on **The CMU Pronouncing Dictionary**.
The **cmudict** module from the **NLTK** library was used to extract the stress pattern from the dataset.
The English words dataset was based on the **SubtlexUS** dataset.
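
As a rough illustration of what extracting a stress pattern means, the sketch below reads the stress digits off a pronunciation written in the CMU dictionary format. It is not the project's actual code: the names `SAMPLE_PRONUNCIATIONS`, `stress_pattern`, and `describe` are assumptions made for this example, and the two entries are hard-coded so it runs without NLTK.

```python
# Sample entries written in the CMU Pronouncing Dictionary format: each vowel
# carries a stress digit (1 = primary, 2 = secondary, 0 = unstressed).
# They are hard-coded so the sketch runs without any download; with NLTK
# installed, nltk.corpus.cmudict.dict() returns a mapping of the same shape.
SAMPLE_PRONUNCIATIONS = {
    "banana": [["B", "AH0", "N", "AE1", "N", "AH0"]],
    "information": [["IH2", "N", "F", "ER0", "M", "EY1", "SH", "AH0", "N"]],
}


def stress_pattern(phones):
    """Return the stress digits of one pronunciation as a string."""
    return "".join(p[-1] for p in phones if p[-1].isdigit())


def describe(word, pronunciations=SAMPLE_PRONUNCIATIONS):
    """Return (syllable_count, stress_pattern), or None for unknown words."""
    entries = pronunciations.get(word.lower())
    if not entries:
        return None  # the word is not in the dictionary
    pattern = stress_pattern(entries[0])  # use the first listed pronunciation
    # One stress digit per vowel, so the pattern length is the syllable count.
    return len(pattern), pattern


print(describe("banana"))
print(describe("information"))
```

Taking only the first listed pronunciation is a simplification chosen for this sketch; a word with several pronunciations may have several valid stress patterns.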
## Disclaimers
According to the CMU Pronouncing Dictionary website,
"Stress is difficult to get right and people disagree about it."

## Visualizations
Visualizations Latest Update: 6 May 2024

[Power BI](https://app.powerbi.com/view?r=eyJrIjoiMzhkYmVjOGUtMmE5Ni00NmUxLWIzYWYtMzk2ODQ2YmU2NGM2IiwidCI6ImZlMzViMTA3LTdjMmYtNGNjMy1hZDYzLTA2NTY0MzcyMDg3OCIsImMiOjEwfQ%3D%3D)
[Instagram](https://www.instagram.com/p/C6oRlWmM5WL/?utm_source=ig_web_copy_link&igsh=MzRlODBiNWFlZA==)
[Facebook](https://www.facebook.com/permalink.php?story_fbid=pfbid0LQsXGdyJCBBxEvjQeF7tD4tvZVkK9vVvWknG4exkd94jtmVV3Ma8wfYbBUTW5C4Cl&id=61553626169836)
## Codebase Details
### Test Status
[![CodeQL](https://github.com/sakan811/Stress-Pattern-Occurrence-in-English-Words/actions/workflows/codeql.yml/badge.svg)](https://github.com/sakan811/Stress-Pattern-Occurrence-in-English-Words/actions/workflows/codeql.yml)[![Python application](https://github.com/sakan811/Stress-Pattern-Occurrence-in-English-Words/actions/workflows/python-app.yml/badge.svg)](https://github.com/sakan811/Stress-Pattern-Occurrence-in-English-Words/actions/workflows/python-app.yml)
### Running the Script to Get the Stress Pattern Data for English Words
- Execute ```main.py```.
- **Data** is loaded into a local **SQLite** database automatically.
- The local **SQLite** database is created **automatically** if it does not exist at the given path.
- You can change the path by adjusting this variable:
```
db = 'eng_stress_pattern.db'
```

### ```stress_pattern_finder``` Package
```eng_stress_pattern_finder.py```
- Find the stress pattern of the English words with the given dataset.

### ```stress_pattern_etl``` Package
```extract_data.py```
- Extract data from the **SubtlexUS** dataset.

```transform_data.py```
- Transform data to find the syllable count and stress pattern of each English word.
- Words that aren't in the dictionary are filtered out.

```load_to_sqlite.py```
- Load data into SQLite database tables.

## Database Details
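
The table listed in this section could be created and filled with Python's built-in `sqlite3` module roughly as follows. This is a minimal sketch, not the project's actual code: the function name `load_rows` and the column names are assumptions made for illustration, and the stress-position columns are left out for brevity.

```python
import sqlite3


def load_rows(db_path, rows):
    """Create the table if needed, insert rows, and return the row count."""
    # sqlite3.connect creates the database file when it does not exist yet.
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                """
                CREATE TABLE IF NOT EXISTS StressPattern (
                    word TEXT PRIMARY KEY,      -- hypothetical column names
                    syllable_count INTEGER,
                    stress_pattern TEXT
                )
                """
            )
            conn.executemany(
                "INSERT OR REPLACE INTO StressPattern VALUES (?, ?, ?)", rows
            )
        return conn.execute("SELECT COUNT(*) FROM StressPattern").fetchone()[0]
    finally:
        conn.close()


# ":memory:" keeps this example off disk; a path such as
# 'eng_stress_pattern.db' would create that file if it is missing.
print(load_rows(":memory:", [("banana", 3, "010"), ("information", 4, "2010")]))
```

`INSERT OR REPLACE` is used here so that running the load twice does not fail on duplicate words; whether the project handles reruns this way is not stated in the README.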
Tables:
- StressPattern
  - Stores the **syllable count**, **stress pattern**, and **primary** and **secondary stress positions** of each word

## Sources
The CMU Pronouncing Dictionary: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
“SubtlexUS” dataset: http://www.lexique.org/?page_id=241