https://github.com/lefteris-souflas/election-classification-and-clustering-analysis

Predictive models that classify whether Trump's vote share exceeded 50%, and a clustering of counties based on demographic variables, with economic variables used to describe the clusters. Findings are reported in a PDF with detailed methodology and model assessment, together with the R code for the project.

agglomerative-algorithm bootstrap-sampling classification clustering cross-validation data-cleaning decision-tree hierarchical-clustering model-evaluation model-interpretation predictive-analytics r random-forest silhouette-analysis statistics support-vector-machine variable-importance


# Election Classification & Clustering Analysis

Classification and clustering assignment for the Statistics II course of AUEB's MSc in Business Analytics.

## Part I

The first part aims to build a predictive model that classifies whether Trump received more than 50% of the votes in a county. You have to use at least three distinct methods and assess how good the predictions of each model are.
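
A common way to assess how good the predictions are is k-fold cross-validation, which scores every county with a model that never saw it during fitting. The following is a minimal pure-Python sketch under stated assumptions, not the project's R code: it turns a vote share into a binary label (strictly above 0.5, to match "more than 50%"), splits rows into folds, and reports accuracy, sensitivity and specificity. All function names are hypothetical, and `fit_majority` is an illustrative majority-class baseline that any of the three real methods (e.g. a decision tree or random forest) should beat.

```python
import random
from typing import Callable, Sequence

Row = Sequence[float]
Model = Callable[[Row], int]  # a fitted model maps one feature row to 0 or 1


def make_label(vote_share: float, threshold: float = 0.5) -> int:
    """1 if the vote share is strictly above the threshold ("more than 50%"), else 0."""
    return 1 if vote_share > threshold else 0


def kfold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle row indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity (recall on 1s) and specificity (recall on 0s)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def cross_validate(
    X: Sequence[Row],
    y: Sequence[int],
    fit: Callable[[list[Row], list[int]], Model],
    k: int = 5,
    seed: int = 0,
) -> dict[str, float]:
    """Score out-of-fold predictions: each row is predicted by a model fitted without it."""
    preds = [0] * len(y)
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = model(X[i])
    return confusion_metrics(y, preds)


def fit_majority(X_train: list[Row], y_train: list[int]) -> Model:
    """Hypothetical baseline: always predict the most common training label."""
    majority = 1 if 2 * sum(y_train) >= len(y_train) else 0
    return lambda row: majority
```

Each real method would be wrapped as another `fit(X_train, y_train)` returning a predictor. Calling `cross_validate` for every method with the same `seed` gives all of them identical folds, so their metrics are directly comparable.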

## Part II

The second part is about clustering. Set the elections and the votes aside; you do not need to use them any further.

We split the variables into two groups:

1. **Demographics:** These are the variables:
   - PST045214
   - PST040210
   - PST120214
   - POP010210
   - AGE135214
   - AGE295214
   - AGE775214
   - SEX255214
   - RHI125214
   - RHI225214
   - RHI325214
   - RHI425214
   - RHI525214
   - RHI625214
   - RHI725214
   - RHI825214
   - POP715213
   - POP645213
   - POP815213
   - EDU635213
   - EDU685213
   - VET605213

2. **Economic Related:** All the remaining variables not listed above.
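
The split above is mechanical once the data are loaded. A small sketch, assuming the county table's column headers are these codes (the identifier and economic column names in the test, such as `fips` and `median_income`, are hypothetical placeholders): it keeps the 22 listed codes as demographic and treats every other non-identifier column as economic, as the definition of group 2 states.

```python
from typing import Iterable

# The 22 demographic column codes listed above; every other variable is "economic".
DEMOGRAPHIC = [
    "PST045214", "PST040210", "PST120214", "POP010210",
    "AGE135214", "AGE295214", "AGE775214", "SEX255214",
    "RHI125214", "RHI225214", "RHI325214", "RHI425214",
    "RHI525214", "RHI625214", "RHI725214", "RHI825214",
    "POP715213", "POP645213", "POP815213",
    "EDU635213", "EDU685213", "VET605213",
]


def split_columns(
    columns: Iterable[str], id_columns: Iterable[str] = ()
) -> tuple[list[str], list[str]]:
    """Partition column names into (demographic, economic), preserving input order.

    Identifier columns (hypothetical here, e.g. a county code or name) are
    excluded so they are not mistaken for economic variables.
    """
    demo_set = set(DEMOGRAPHIC)
    skip = set(id_columns)
    cols = list(columns)
    demographic = [c for c in cols if c in demo_set]
    economic = [c for c in cols if c not in demo_set and c not in skip]
    return demographic, economic
```

Excluding identifiers explicitly matters because "all the remaining variables" would otherwise sweep in ID or name columns that carry no economic information.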

Here we want to use the demographic variables to cluster the counties, and then use the economic variables to describe the clusters you have found.
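
Since the method is left open, the following is one possible pipeline for the two-step task, written as a hedged pure-Python sketch rather than the project's R code: standardize the demographic variables, group counties with average-linkage agglomerative clustering, then describe each cluster by the mean of every economic variable. The function names are hypothetical and the clustering is a naive O(n³) loop meant only to show the mechanics; in R, `hclust(dist(scale(x)), method = "average")` followed by `cutree()` performs the same clustering efficiently on the full county table.

```python
import math
from statistics import mean, pstdev


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Z-score each column so variables on different scales
    (population counts vs. percentages) contribute comparably to distances."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]  # constant column: divide by 1
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def average_linkage(rows: list[list[float]], k: int) -> list[int]:
    """Naive agglomerative clustering with average linkage: start with every row
    in its own cluster and repeatedly merge the two clusters whose mean pairwise
    Euclidean distance is smallest, until k clusters remain."""
    clusters = [[i] for i in range(len(rows))]

    def linkage(a: list[int], b: list[int]) -> float:
        return mean(math.dist(rows[i], rows[j]) for i in a for j in b)

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x].extend(clusters.pop(y))  # y > x, so index x is unaffected
    labels = [0] * len(rows)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def profile(labels: list[int], econ_rows: list[list[float]]) -> dict[int, list[float]]:
    """Describe each cluster by the mean of every economic variable over its members."""
    out = {}
    for c in sorted(set(labels)):
        members = [econ_rows[i] for i, lab in enumerate(labels) if lab == c]
        out[c] = [mean(col) for col in zip(*members)]
    return out
```

Standardizing first matters here because the demographic list mixes raw population counts with percentages; without it, the count variables would dominate the distances. Cluster means are the simplest profile; medians would be less sensitive to a few very large counties.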

You can use whatever method you like. You need to explain why you selected the variables you used from the list and, in general, describe your approach in sufficient detail.
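
Part of describing the approach is justifying the number of clusters. One widely used criterion is the average silhouette width, which can be computed for several candidate values of k and compared. A minimal pure-Python sketch, assuming Euclidean distance on already-standardized rows (the function name is hypothetical; in R, `cluster::silhouette()` provides the same measure):

```python
import math
from collections.abc import Hashable, Sequence
from statistics import mean


def silhouette(rows: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """Average silhouette width of a clustering (Euclidean distance).

    For each point: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Values near 1 indicate compact, well-separated
    clusters; values near 0 or below suggest overlapping or misassigned ones.
    """
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, xi in enumerate(rows):
        # Distances from point i to every other point, grouped by cluster label.
        by_cluster: dict = {}
        for j, xj in enumerate(rows):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(xi, xj))
        own = by_cluster.get(labels[i])
        if not own:  # singleton cluster: s is defined as 0 by convention
            scores.append(0.0)
            continue
        a = mean(own)
        b = min(mean(d) for c, d in by_cluster.items() if c != labels[i])
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return mean(scores)
```

A higher average width for one k than another is evidence, not proof, that it gives a better partition; it is worth reporting alongside how interpretable the resulting clusters are.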

## Deliverables

Provide a report in PDF format with your findings. Make it as detailed as possible so that it is self-explanatory. Explain the methodologies used, how good you think they are, and any limitations that may apply. You also have to upload a separate file containing the R code you used for the project.