https://github.com/lefteris-souflas/election-classification-and-clustering-analysis

Creating predictive models to classify Trump's vote share and clustering counties based on demographics and economic variables. Report findings in PDF with detailed methodologies, model assessments, and R code for the project.


Host: GitHub
URL: https://github.com/lefteris-souflas/election-classification-and-clustering-analysis
Owner: Lefteris-Souflas
License: mit
Created: 2024-03-27T17:05:07.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-04-17T21:18:39.000Z (over 1 year ago)
Last Synced: 2025-03-02T08:24:41.691Z (10 months ago)
Topics: agglomerative-algorithm, bootstrap-sampling, classification, clustering, cross-validation, data-cleaning, decision-tree, hierarchical-clustering, model-evaluation, model-interpretation, predictive-analytics, r, random-forest, silhouette-analysis, statistics, support-vector-machine, variable-importance
Language: R
Homepage:
Size: 587 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Election Classification & Clustering Analysis

Classification & Clustering Assignment for the Statistics II Course of AUEB's MSc in Business Analytics

## Part I

The first part aims to build a predictive model that classifies whether Trump received more than 50% of the votes in a county. Use at least three distinct methods and assess how well each model predicts.
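As a rough illustration of comparing three classifiers, the sketch below fits a logistic regression, a decision tree, and linear discriminant analysis, then estimates their accuracy with 5-fold cross-validation. Everything here is an assumption for demonstration purposes: the data are simulated, the predictors `x1` to `x3` and the target `trump_majority` are hypothetical names, and the three methods are stand-ins rather than the methods used in the project report.

```r
# Sketch only: simulated data stands in for the county dataset.
library(rpart)  # decision trees (recommended package shipped with R)
library(MASS)   # lda()

set.seed(42)
n <- 600
# Hypothetical predictors and a binary target (1 = more than 50% of the votes).
counties <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
p <- plogis(1.5 * counties$x1 - counties$x2)
counties$trump_majority <- factor(rbinom(n, 1, p), levels = c(0, 1))

# Each function trains on `train` and returns predicted classes for `test`.
fit_predict <- list(
  logistic = function(train, test) {
    m <- glm(trump_majority ~ ., data = train, family = binomial)
    pr <- predict(m, newdata = test, type = "response")
    factor(as.integer(pr > 0.5), levels = c(0, 1))
  },
  tree = function(train, test) {
    m <- rpart(trump_majority ~ ., data = train, method = "class")
    predict(m, newdata = test, type = "class")
  },
  lda = function(train, test) {
    m <- lda(trump_majority ~ ., data = train)
    predict(m, newdata = test)$class
  }
)

# 5-fold cross-validation: every row is held out exactly once.
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))
cv_accuracy <- sapply(fit_predict, function(f) {
  correct <- 0
  for (i in seq_len(k)) {
    train <- counties[fold != i, ]
    test  <- counties[fold == i, ]
    pred  <- f(train, test)
    correct <- correct + sum(pred == test$trump_majority)
  }
  correct / n
})
print(round(cv_accuracy, 3))
```

Accuracy is only one possible yardstick. If the two classes are unbalanced, a confusion matrix built with `table(pred, test$trump_majority)` gives a fuller picture of where each model errs.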

## Part II

The second part refers to clustering. Forget about the elections and the votes. You do not have to use them any further.

We split the variables into two groups:

1. **Demographics:** These are the variables:
   - PST045214
   - PST040210
   - PST120214
   - POP010210
   - AGE135214
   - AGE295214
   - AGE775214
   - SEX255214
   - RHI125214
   - RHI225214
   - RHI325214
   - RHI425214
   - RHI525214
   - RHI625214
   - RHI725214
   - RHI825214
   - POP715213
   - POP645213
   - POP815213
   - EDU635213
   - EDU685213
   - VET605213

2. **Economic Related:** All the remaining variables not listed above.

The goal is to cluster the counties using the demographic variables and then use the economic variables to describe the clusters you have found.

You may use any clustering method you like. Explain why you selected the variables you used from the list and, more generally, describe your approach in sufficient detail.
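One possible workflow, shown here as a minimal sketch rather than the approach taken in the report, is to standardise a subset of the demographic variables, run agglomerative hierarchical clustering, pick the number of clusters by average silhouette width, and then summarise the economic variables per cluster. The data below are simulated: the three demographic column names come from the list above but their values are made up, `econ_income` and `econ_poverty` are hypothetical names for economic variables, and Ward linkage is just one reasonable choice.

```r
library(cluster)  # silhouette(); recommended package shipped with R

set.seed(1)
n <- 300
# Simulated stand-in data with three latent groups of counties.
group <- rep(1:3, each = n / 3)
counties <- data.frame(
  AGE775214    = rnorm(n, mean = c(12, 16, 20)[group], sd = 2),
  EDU685213    = rnorm(n, mean = c(30, 20, 15)[group], sd = 4),
  POP645213    = rnorm(n, mean = c(10, 4, 2)[group], sd = 1.5),
  econ_income  = rnorm(n, mean = c(60, 48, 40)[group], sd = 6),  # hypothetical
  econ_poverty = rnorm(n, mean = c(11, 15, 19)[group], sd = 3)   # hypothetical
)
demo_vars <- c("AGE775214", "EDU685213", "POP645213")
econ_vars <- setdiff(names(counties), demo_vars)  # "all the remaining variables"

# Standardise so that no single variable dominates the Euclidean distances.
x  <- scale(counties[, demo_vars])
d  <- dist(x)
hc <- hclust(d, method = "ward.D2")  # agglomerative clustering, Ward linkage

# Average silhouette width for k = 2..6; larger values indicate better separation.
ks <- 2:6
avg_sil <- sapply(ks, function(k) {
  mean(silhouette(cutree(hc, k = k), d)[, "sil_width"])
})
best_k <- ks[which.max(avg_sil)]
counties$cluster <- cutree(hc, k = best_k)

# Describe the clusters using variables that were NOT used to form them.
profile <- aggregate(counties[, econ_vars],
                     by = list(cluster = counties$cluster), FUN = mean)
print(profile)
```

Keeping the economic variables out of the distance computation means that any differences in `profile` describe the clusters rather than define them, which matches the split the assignment asks for.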

## Deliverables

Provide a report in PDF format with your findings. Be as detailed as possible so that your report is self-explanatory. Explain the methodologies used, how well you think they perform, and any limitations that apply. You must also upload a separate file containing the R code you used for the project.
