Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lefteris-souflas/election-classification-and-clustering-analysis
Creating predictive models to classify Trump's vote share and clustering counties based on demographics and economic variables. Report findings in PDF with detailed methodologies, model assessments, and R code for the project.
- Host: GitHub
- URL: https://github.com/lefteris-souflas/election-classification-and-clustering-analysis
- Owner: Lefteris-Souflas
- License: mit
- Created: 2024-03-27T17:05:07.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-04-17T21:18:39.000Z (7 months ago)
- Last Synced: 2024-04-17T22:29:22.807Z (7 months ago)
- Topics: agglomerative-algorithm, bootstrap-sampling, classification, clustering, cross-validation, data-cleaning, decision-tree, hierarchical-clustering, model-evaluation, model-interpretation, predictive-analytics, r, random-forest, silhouette-analysis, statistics, support-vector-machine, variable-importance
- Language: R
- Size: 587 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Election Classification & Clustering Analysis
Classification & Clustering Assignment for the Statistics II Course of AUEB's MSc in Business Analytics
## Part I
The first part aims to build a predictive model that classifies whether Trump received more than 50% of the votes. You must use at least 3 distinct methods and assess how good the predictions of each model are.
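As a rough illustration of comparing three classifiers, the sketch below runs 5-fold cross-validation on synthetic data. Everything in it is an assumption made for the example: the data frame `counties`, the predictors `x1` and `x2`, the target `y`, and the choice of logistic regression, a decision tree, and linear discriminant analysis (picked because `rpart` and `MASS` ship with R). It is not the method used in the repository's report.

```r
# Illustrative sketch only: synthetic data stands in for the county dataset.
library(rpart)  # decision trees (recommended package shipped with R)
library(MASS)   # lda()

set.seed(42)
n <- 300
# Hypothetical predictors; the real data uses census codes such as PST045214.
x1 <- rnorm(n)
x2 <- rnorm(n)
# Hypothetical binary target: "yes" if the vote share is above 50%.
y <- factor(ifelse(x1 + 0.5 * x2 + rnorm(n, sd = 0.5) > 0, "yes", "no"))
counties <- data.frame(x1, x2, y)

# Assign every row to one of k folds at random.
k <- 5
folds <- sample(rep(1:k, length.out = n))

# Each function fits on `train` and returns predicted labels for `test`.
predict_logit <- function(train, test) {
  fit <- glm(y ~ ., data = train, family = binomial)
  p <- predict(fit, newdata = test, type = "response")  # P(y == "yes")
  ifelse(p > 0.5, "yes", "no")
}
predict_tree <- function(train, test) {
  fit <- rpart(y ~ ., data = train, method = "class")
  as.character(predict(fit, newdata = test, type = "class"))
}
predict_lda <- function(train, test) {
  fit <- lda(y ~ ., data = train)
  as.character(predict(fit, newdata = test)$class)
}
methods <- list(logistic = predict_logit, tree = predict_tree, lda = predict_lda)

# Mean held-out accuracy across the k folds, per method.
cv_accuracy <- sapply(methods, function(predict_fn) {
  fold_acc <- sapply(1:k, function(i) {
    train <- counties[folds != i, ]
    test <- counties[folds == i, ]
    mean(predict_fn(train, test) == test$y)
  })
  mean(fold_acc)
})
print(round(cv_accuracy, 3))
```

Accuracy is only one possible yardstick; with an imbalanced target, a confusion matrix or sensitivity and specificity per fold would likely give a fuller picture.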
## Part II
The second part refers to clustering. Forget about the elections and the votes. You do not have to use them any further.
We split the variables into two groups:
1. **Demographics:** These are the variables:
- PST045214
- PST040210
- PST120214
- POP010210
- AGE135214
- AGE295214
- AGE775214
- SEX255214
- RHI125214
- RHI225214
- RHI325214
- RHI425214
- RHI525214
- RHI625214
- RHI725214
- RHI825214
- POP715213
- POP645213
- POP815213
- EDU635213
- EDU685213
- VET605213
2. **Economic Related:** All the remaining variables not listed above.
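The split into the two groups can be sketched in R as follows. The demographic codes are copied from the list above; the data frame `counties` and the columns `ECON_A` and `ECON_B` are hypothetical placeholders, since the economic variables are not named here. In the real data, any identifier columns (county name, state, and so on) would presumably need to be dropped before taking "all the remaining variables".

```r
# Demographic variable codes exactly as listed in the README.
demographic_vars <- c(
  "PST045214", "PST040210", "PST120214", "POP010210", "AGE135214",
  "AGE295214", "AGE775214", "SEX255214", "RHI125214", "RHI225214",
  "RHI325214", "RHI425214", "RHI525214", "RHI625214", "RHI725214",
  "RHI825214", "POP715213", "POP645213", "POP815213", "EDU635213",
  "EDU685213", "VET605213"
)

# Toy data frame standing in for the county data (values are random).
set.seed(1)
n <- 6
counties <- as.data.frame(matrix(runif(n * length(demographic_vars)), nrow = n))
names(counties) <- demographic_vars
counties$ECON_A <- runif(n)  # hypothetical economic column
counties$ECON_B <- runif(n)  # hypothetical economic column

# Economic variables are whatever is left after removing the demographic ones.
economic_vars <- setdiff(names(counties), demographic_vars)

demographics <- counties[, demographic_vars]
economics <- counties[, economic_vars]
```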
Here we want to use the "demographic related" variables to cluster the counties and then use the "economic related" variables to describe the clusters you have found.
You can use whatever method you like. You need to explain why you selected the variables you used from the list and, in general, describe your approach in sufficient detail.
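One possible shape for this workflow is sketched below: agglomerative hierarchical clustering on standardized demographic variables, the average silhouette width to pick the number of clusters, and per-cluster means of an economic variable to describe the result. The data are synthetic, the column names `demo_a`, `demo_b`, and `econ_a` are hypothetical, and Ward linkage with Euclidean distance is an assumption chosen for the example rather than a requirement of the assignment.

```r
# Illustrative sketch only: synthetic data stands in for the county dataset.
library(cluster)  # silhouette(); recommended package shipped with R

set.seed(7)
n <- 120
# Three artificial groups so the example has structure to recover.
group <- rep(1:3, each = n / 3)
demo <- data.frame(
  demo_a = rnorm(n, mean = c(0, 4, 8)[group]),  # hypothetical demographic column
  demo_b = rnorm(n, mean = c(0, 4, 0)[group])   # hypothetical demographic column
)
econ <- data.frame(
  econ_a = rnorm(n, mean = c(10, 20, 30)[group])  # hypothetical economic column
)

# Standardize so no variable dominates the distance because of its units.
demo_scaled <- scale(demo)
d <- dist(demo_scaled)                 # Euclidean distances between counties
tree <- hclust(d, method = "ward.D2")  # agglomerative clustering, Ward linkage

# Average silhouette width for each candidate number of clusters.
candidate_k <- 2:6
avg_sil <- sapply(candidate_k, function(k) {
  mean(silhouette(cutree(tree, k = k), d)[, "sil_width"])
})
best_k <- candidate_k[which.max(avg_sil)]
clusters <- cutree(tree, k = best_k)

# Describe the clusters using the economic variable, which played no part
# in forming them.
profile <- aggregate(econ, by = list(cluster = clusters), FUN = mean)
print(profile)
```

Keeping the economic variables out of the distance computation is what lets them serve as an independent description of the clusters afterwards.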
## Deliverables
Provide a report in PDF format with your findings. Be as detailed as possible so that your report is self-explanatory. Explain the methodologies used, how good you think they are, and any limitations that may apply. You also have to upload a separate file containing the R code you used for the project.