https://github.com/jasper-koops/analyse_verkiezingen_2e_kamer_2023
- Host: GitHub
- URL: https://github.com/jasper-koops/analyse_verkiezingen_2e_kamer_2023
- Owner: Jasper-Koops
- Created: 2023-11-06T16:07:24.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-20T15:21:58.000Z (about 1 year ago)
- Last Synced: 2024-10-30T09:16:18.340Z (2 months ago)
- Language: Jupyter Notebook
- Size: 13.5 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
```python
import pandas as pd
from pca import pca
import seaborn as sns
import matplotlib.pyplot as plt
```

# 1. Principal component analysis
Principal component analysis (PCA) is a statistical technique for dimensionality reduction: it reduces the number of dimensions (variables) in a dataset while retaining as much relevant information as possible. The goal is to reduce the complexity of the data by projecting the dataset onto a lower-dimensional space in which the important patterns and relationships are preserved.

The remaining dimensions, the 'principal components', are combinations of the original variables and can be used to identify the main patterns in the data. This makes it possible to visualize the data in a chart with fewer dimensions. Not every component explains the same share of the variance in the data. The first component explains the most variance, the second component explains the most of the variance that remains, and so on. The added value of each extra component therefore decreases, because every new component explains a smaller share of the variance.
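
The decreasing share of variance per component can be illustrated with a small sketch. It uses only the standard library and a toy dataset with two statements, so the eigenvalues of the 2x2 covariance matrix can be computed in closed form. The function name `explained_variance_ratio_2d` and the answers are assumptions made for illustration; they are not part of the analysis itself.

```python
import math
import statistics


def explained_variance_ratio_2d(xs, ys):
    """Share of the total variance captured by each of the two principal components."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    var_x = statistics.variance(xs)
    var_y = statistics.variance(ys)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

    # Eigenvalues of the covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
    half_trace = (var_x + var_y) / 2
    spread = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    eigenvalues = [half_trace + spread, half_trace - spread]

    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


# Hypothetical answers of five parties to two strongly related statements.
statement_a = [2, 1, 0, -1, -2]
statement_b = [2, 2, 0, -1, -2]

ratios = explained_variance_ratio_2d(statement_a, statement_b)
print([round(ratio, 3) for ratio in ratios])
```

Because the two toy statements move together, nearly all of the variance ends up in the first component and very little is left for the second.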
# 2. Analysis of the voting aids
This is an analysis of how the parties relate to one another, based on the statements in the voting aids. The goal is to give a better picture of the answers to questions such as "Where is NSC on the political spectrum?" and "How much overlap is there between the parties?".

Each question in a voting aid represents one dimension, so a voting aid with 30 questions yields a 30-dimensional dataset. To still be able to visualize this data, principal component analysis was applied to the data of every voting aid to bring the number of dimensions down to 1, 2 or 3. This makes it possible to plot the positions of the parties in a chart. Parties that lie close together agree with one another; parties that lie far apart disagree. It is important to realize that not every axis ('component') explains the same share of the variance. Each chart states per axis which share of the variance it explains, and therefore how 'important' that axis is.
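
Reading agreement from distance in the reduced space can be sketched as follows. The coordinates and party names are hypothetical and do not come from the analysis; `closest_party` is a helper introduced here only for illustration.

```python
import math

# Hypothetical (PC1, PC2) coordinates; not values from the analysis.
positions = {
    "Party A": (1.8, 0.4),
    "Party B": (1.5, 0.1),
    "Party C": (-2.0, 0.9),
}


def distance(first, second):
    """Euclidean distance between two parties in the reduced space."""
    return math.dist(positions[first], positions[second])


def closest_party(name):
    """Return the party that lies nearest to `name` in the reduced space."""
    others = (party for party in positions if party != name)
    return min(others, key=lambda other: distance(name, other))


print(closest_party("Party A"))
```

A small distance only signals agreement to the extent that the plotted axes explain the variance, which is why the explained share per axis matters.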

For each component there is an analysis of the statements that have the largest influence on the positions of the parties, which shows what an axis in the chart represents. The parties' answers to the statements are coded as numbers: the higher the number, the more the party agrees with the statement, and negative numbers mean that the party disagrees. These answers are multiplied by the values ('loadings') that the statements have for the component. The sum of these products is the party's position on the component.
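
The multiply-and-sum step can be written out as a minimal sketch. The loadings and answers below are hypothetical. Standard PCA also subtracts each statement's mean across parties before this step; that centering is left out here for brevity.

```python
# Hypothetical loadings of three statements on a single component.
loadings = {"Statement 1": 0.7, "Statement 2": -0.5, "Statement 3": 0.1}

# Hypothetical coded answers (positive = agree, negative = disagree).
answers = {
    "Party A": {"Statement 1": 2, "Statement 2": -2, "Statement 3": 1},
    "Party B": {"Statement 1": -2, "Statement 2": 2, "Statement 3": 1},
}


def component_score(party_answers, component_loadings):
    """Sum of answer * loading over all statements of the component."""
    return sum(
        party_answers[statement] * weight
        for statement, weight in component_loadings.items()
    )


scores = {party: component_score(given, loadings) for party, given in answers.items()}
print(scores)
```

Statements with a loading near zero barely move a party along the axis, so the statements with the largest absolute loadings indicate what the axis stands for.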

Finally, I calculated the correlation between the parties' answers. The higher the correlation, the more the parties agree with one another.
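
As an illustration of what such a correlation measures, here is a standard-library sketch of the Pearson coefficient, which is also the default method of `DataFrame.corr()` in pandas. The votes are hypothetical (1 = in favour, 0 = against).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long answer lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (norm_x * norm_y)


# Hypothetical votes on six motions.
votes = {
    "Party A": [1, 1, 0, 1, 0, 0],
    "Party B": [1, 1, 0, 1, 0, 1],
    "Party C": [0, 0, 1, 0, 1, 1],
}

print(round(pearson(votes["Party A"], votes["Party B"]), 3))
print(round(pearson(votes["Party A"], votes["Party C"]), 3))
```

Parties that always vote the same way reach a correlation of 1, and parties that always vote opposite ways reach -1.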
## 2.1 Check je stem
[Check je stem](https://checkjestem.nl/) is a voting aid that tracks the parties' voting behaviour rather than their stated positions. The problem with this checker is that BIJ1 (7) and NSC (3) were not always present at the selected votes. Those votes therefore *either* have to be left out of the analysis, *or* the analysis has to be run without these parties. Given the large number of missed votes (7) and the fact that the party is polling at a stable 0 seats, BIJ1 was left out of the analysis. NSC was kept because of its large number of seats in the polls.

Of the 26 statements, 23 remain as a result. The voting behaviour of GroenLinks and the PvdA overlapped 100%, so the two were merged into the 'GL-PvdA' combination under which they also appear on the ballot.
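
The trade-off between dropping a party and dropping statements can be sketched with toy data. The votes below are hypothetical, `None` marks a missed vote, and `complete_statements` is a helper introduced only for this example.

```python
# Hypothetical votes on five statements; None marks a vote the party missed.
votes = {
    "Party A": [1, 0, 1, 1, 0],
    "Party B": [1, None, 0, None, 0],
    "Party C": [0, 0, None, 1, 1],
}


def complete_statements(table):
    """Indices of the statements on which every party in `table` voted."""
    count = len(next(iter(table.values())))
    return [
        index
        for index in range(count)
        if all(row[index] is not None for row in table.values())
    ]


# Option 1: keep every party and drop each statement with a missing vote.
with_all_parties = complete_statements(votes)

# Option 2: leave out the party with the most absences first.
without_party_b = complete_statements(
    {party: row for party, row in votes.items() if party != "Party B"}
)

print(len(with_all_parties), len(without_party_b))
```

Leaving out the party with the most absences keeps more statements for the remaining parties, which is the same reasoning as dropping BIJ1 before removing incomplete rows.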
```python
# Load data and prepare DataFrame for analysis
check_je_stem_df = pd.read_csv('data/check_je_stem.csv')
# Leave BIJ1 out, then drop every vote that one of the remaining parties missed
check_je_stem_df = check_je_stem_df.drop("BIJ1", axis=1)
check_je_stem_df = check_je_stem_df.dropna(axis=0)
# The first column holds the statements; the other columns are the parties
check_je_stem_labels = check_je_stem_df.columns[1:]
# Skip the first data row
check_je_stem_df = check_je_stem_df.iloc[1:]
check_je_stem_questions = check_je_stem_df["vraag"]
check_je_stem_df = check_je_stem_df.iloc[:, 1:]

# Prepared DataFrame is assigned its own variable to allow the original to be used in the Merged model.
check_je_stem_prepared = pd.DataFrame(data=check_je_stem_df, columns=check_je_stem_labels)
check_je_stem_prepared = check_je_stem_prepared.rename(index=check_je_stem_questions)
# Transpose so that the rows are parties and the columns are statements
check_je_stem_prepared = check_je_stem_prepared.transpose()
```

### Explained variance per component
```python
model = pca(n_components=5)
results = model.fit_transform(check_je_stem_prepared, row_labels=check_je_stem_labels)
model.plot()
```

    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [22] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[5]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_6_2.png)

### Visualization of the party positions
#### 1d
```python
model = pca(n_components=2)
results = model.fit_transform(check_je_stem_prepared, row_labels=check_je_stem_labels)
# The title reads "1D chart (diagonal for readability)"; both axes show the first component
model.scatter(legend=False, figsize=(20,20), labels=check_je_stem_labels, PC=(0,0), fontsize=16, title="1d Grafiek (diagonaal vanwege leesbaarheid)")
```

    [scatterd] >INFO> Create scatterplot
    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [22] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_9_3.png)
#### 2d
```python
model = pca(n_components=2)
results = model.fit_transform(check_je_stem_prepared, row_labels=check_je_stem_labels)
model.scatter(legend=False, figsize=(20,20), labels=check_je_stem_labels)
```

    [scatterd] >INFO> Create scatterplot
    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [22] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_11_3.png)
#### 3d
```python
model = pca(n_components=3)
results = model.fit_transform(check_je_stem_prepared, row_labels=check_je_stem_labels)
model.scatter3d(legend=False, figsize=(20,20), labels=check_je_stem_labels)
```

    [scatterd] >INFO> Create scatterplot
    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [22] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[3]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_13_3.png)

### Component analysis
```python
component_df = model.results["loadings"].iloc[0].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()
```
![png](images/output_15_0.png)
```python
component_df = model.results["loadings"].iloc[1].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()
```
![png](images/output_16_0.png)
### Correlation of the parties' voting behaviour
```python
fig, ax = plt.subplots(figsize=(10,10))
sns.heatmap(check_je_stem_df.corr())
```
![png](images/output_18_1.png)
```python
check_je_stem_df.corr()
```

| | SP | PVV | GL-PvdA | Partij voor de Dieren | FVD | BBB | DENK | D66 | BVNL | VOLT | SGP | JA21 | CU | VVD | NSC | CDA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SP | 1.000000 | 0.398527 | 0.312500 | 0.541667 | -0.094356 | 0.019920 | 0.385758 | 4.082483e-01 | -0.113228 | 0.385758 | -0.094356 | -0.149071 | 0.094356 | -0.312500 | -0.149071 | -3.857584e-01 |
| PVV | 0.398527 | 1.000000 | -0.332106 | -0.088561 | 0.431187 | 0.328139 | -0.184482 | -1.084652e-01 | 0.451243 | -0.184482 | 0.210580 | 0.376256 | -0.210580 | 0.332106 | 0.158424 | -4.099600e-02 |
| GL-PvdA | 0.312500 | -0.332106 | 1.000000 | 0.770833 | -0.509525 | -0.418330 | 0.385758 | 6.123724e-01 | -0.528396 | 0.597925 | -0.509525 | -0.559017 | 0.301941 | -0.541667 | -0.354044 | -1.735913e-01 |
| Partij voor de Dieren | 0.541667 | -0.088561 | 0.770833 | 1.000000 | -0.301941 | -0.199205 | 0.385758 | 4.082483e-01 | -0.528396 | 0.597925 | -0.301941 | -0.354044 | 0.094356 | -0.541667 | -0.354044 | -1.735913e-01 |
| FVD | -0.094356 | 0.431187 | -0.509525 | -0.301941 | 1.000000 | 0.424043 | -0.244600 | -4.622502e-01 | 0.692308 | -0.628971 | 0.435897 | 0.725797 | -0.435897 | 0.509525 | 0.354459 | 2.445998e-01 |
| BBB | 0.019920 | 0.328139 | -0.418330 | -0.199205 | 0.424043 | 1.000000 | 0.092214 | -2.927700e-01 | 0.369910 | -0.313527 | 0.622532 | 0.552340 | -0.027067 | 0.199205 | 0.748331 | 5.163978e-01 |
| DENK | 0.385758 | -0.184482 | 0.385758 | 0.385758 | -0.244600 | 0.092214 | 1.000000 | 3.779645e-01 | -0.524142 | 0.607143 | -0.052414 | -0.310530 | 0.244600 | -0.597925 | -0.120761 | -2.142857e-01 |
| D66 | 0.408248 | -0.108465 | 0.612372 | 0.408248 | -0.462250 | -0.292770 | 0.377964 | 1.000000e+00 | -0.462250 | 0.566947 | -0.277350 | -0.547723 | 0.647150 | -0.408248 | -0.182574 | 1.573593e-17 |
| BVNL | -0.113228 | 0.451243 | -0.528396 | -0.528396 | 0.692308 | 0.369910 | -0.524142 | -4.622502e-01 | 1.000000 | -0.716328 | 0.316239 | 0.573886 | -0.316239 | 0.528396 | 0.388217 | 1.397713e-01 |
| VOLT | 0.385758 | -0.184482 | 0.597925 | 0.597925 | -0.628971 | -0.313527 | 0.607143 | 5.669467e-01 | -0.716328 | 1.000000 | -0.436785 | -0.690066 | 0.244600 | -0.597925 | -0.500298 | -4.107143e-01 |
| SGP | -0.094356 | 0.210580 | -0.509525 | -0.301941 | 0.435897 | 0.622532 | -0.052414 | -2.773501e-01 | 0.316239 | -0.436785 | 1.000000 | 0.725797 | 0.128205 | 0.301941 | 0.725797 | 4.367853e-01 |
| JA21 | -0.149071 | 0.376256 | -0.559017 | -0.354044 | 0.725797 | 0.552340 | -0.310530 | -5.477226e-01 | 0.573886 | -0.690066 | 0.725797 | 1.000000 | -0.168790 | 0.559017 | 0.633333 | 5.002975e-01 |
| CU | 0.094356 | -0.210580 | 0.301941 | 0.094356 | -0.435897 | -0.027067 | 0.244600 | 6.471502e-01 | -0.316239 | 0.244600 | 0.128205 | -0.168790 | 1.000000 | -0.094356 | 0.202548 | 3.319569e-01 |
| VVD | -0.312500 | 0.332106 | -0.541667 | -0.541667 | 0.509525 | 0.199205 | -0.597925 | -4.082483e-01 | 0.528396 | -0.597925 | 0.301941 | 0.559017 | -0.094356 | 1.000000 | 0.354044 | 3.857584e-01 |
| NSC | -0.149071 | 0.158424 | -0.354044 | -0.354044 | 0.354459 | 0.748331 | -0.120761 | -1.825742e-01 | 0.388217 | -0.500298 | 0.725797 | 0.633333 | 0.202548 | 0.354044 | 1.000000 | 6.900656e-01 |
| CDA | -0.385758 | -0.040996 | -0.173591 | -0.173591 | 0.244600 | 0.516398 | -0.214286 | 1.573593e-17 | 0.139771 | -0.410714 | 0.436785 | 0.500298 | 0.331957 | 0.385758 | 0.690066 | 1.000000e+00 |

## 2.2 StemWijzer
The [StemWijzer](https://www.stemwijzer.nl/) is made by the organisation ProDemos and contains 30 statements. Questions can be answered with 'ja' (yes), 'geen mening' (no opinion) and 'nee' (no), which I coded as '2', '1' and '0' respectively.
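
The coding step could look like the sketch below. The mapping follows the codes named above; the helper `encode_answers` and the sample answers are assumptions made for illustration, since the coding itself was done in the CSV file.

```python
# Codes as described in the text: nee = 0, geen mening = 1, ja = 2.
ANSWER_CODES = {"nee": 0, "geen mening": 1, "ja": 2}


def encode_answers(raw_answers):
    """Translate textual StemWijzer answers into their numeric codes."""
    return [ANSWER_CODES[answer.strip().lower()] for answer in raw_answers]


# Hypothetical answers of one party to four statements.
encoded = encode_answers(["Ja", "nee", "Geen mening", "ja"])
print(encoded)
```

With this coding, 'no opinion' sits exactly halfway between agreeing and disagreeing.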
```python
# Load data and prepare DataFrame for analysis
stemwijzer_df = pd.read_csv('data/stemwijzer.csv')
# The first column holds the statements; the other columns are the parties
stemwijzer_labels = stemwijzer_df.columns[1:]
# Skip the first data row
stemwijzer_df = stemwijzer_df.iloc[1:]
stemwijzer_questions = stemwijzer_df["Stelling (0=Nee; 1=Geen mening; 2=Ja)"]
stemwijzer_df = stemwijzer_df.iloc[:, 1:]

# Prepared DataFrame is assigned its own variable to allow the original to be used in the Merged model.
stemwijzer_df_prepared = pd.DataFrame(data=stemwijzer_df, columns=stemwijzer_labels)
stemwijzer_df_prepared = stemwijzer_df_prepared.rename(index=stemwijzer_questions)
# Transpose so that the rows are parties and the columns are statements
stemwijzer_df_prepared = stemwijzer_df_prepared.transpose()
```

### Explained variance per component
```python
model = pca(n_components=5)
results = model.fit_transform(stemwijzer_df_prepared, row_labels=stemwijzer_labels)
model.plot()
```

    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[5]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_23_2.png)

### Visualization of the party positions
#### 1d
```python
model = pca(n_components=2)
results = model.fit_transform(stemwijzer_df_prepared, row_labels=stemwijzer_labels)
# The title reads "1D chart (diagonal for readability)"; both axes show the first component
model.scatter(legend=False, figsize=(20,20), labels=stemwijzer_labels, PC=(0,0), fontsize=16, title="1d Grafiek (diagonaal vanwege leesbaarheid)")
```

    [scatterd] >INFO> Create scatterplot
    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_26_3.png)
#### 2d
```python
model = pca(n_components=2)
results = model.fit_transform(stemwijzer_df_prepared, row_labels=stemwijzer_labels)
model.scatter(legend=False, figsize=(20,20), labels=stemwijzer_labels)
```

    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]
    [scatterd] >INFO> Create scatterplot

![png](images/output_28_3.png)
#### 3d
```python
model = pca(n_components=3)
results = model.fit_transform(stemwijzer_df_prepared, row_labels=stemwijzer_labels)
model.scatter3d(legend=False, figsize=(20,20), labels=stemwijzer_labels)
```

    [scatterd] >INFO> Create scatterplot
    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[3]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_31_3.png)

### Component analysis
```python
component_df = model.results["loadings"].iloc[0].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()
```

    /tmp/ipykernel_13280/279368595.py:4: UserWarning: Tight layout not applied. The left and right margins cannot be made large enough to accommodate all axes decorations.
      fig.tight_layout()

![png](images/output_33_1.png)
```python
component_df = model.results["loadings"].iloc[1].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()
```

    /tmp/ipykernel_13280/845681879.py:4: UserWarning: Tight layout not applied. The left and right margins cannot be made large enough to accommodate all axes decorations.
      fig.tight_layout()

![png](images/output_34_1.png)
### Correlation of the parties' positions
```python
fig, ax = plt.subplots(figsize=(10,10))
sns.heatmap(stemwijzer_df.corr())
```
![png](images/output_36_1.png)
```python
stemwijzer_df.corr()
```

| | SGP | VVD | CDA | FVD | CU | JA21 | NSC | BBB | VOLT | DENK | PVV | BVNL | SP | Partij voor de Dieren | GL-PvdA | D66 | BIJ1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SGP | 1.000000 | 0.312527 | 0.264683 | 0.330086 | 0.242056 | 0.358273 | 0.219330 | 0.113996 | 0.022939 | -0.264683 | 0.027696 | 0.164488 | -0.421332 | -0.264683 | -0.312527 | -0.253024 | -0.404716 |
| VVD | 0.312527 | 1.000000 | 0.353273 | 0.240300 | -0.330514 | 0.490025 | 0.009855 | 0.380019 | 0.029053 | -0.064684 | 0.127558 | 0.318182 | -0.497568 | -0.641863 | -0.267677 | 0.063564 | -0.476072 |
| CDA | 0.264683 | 0.353273 | 1.000000 | 0.028989 | 0.217078 | 0.273402 | 0.650516 | 0.226604 | 0.257603 | 0.137255 | -0.180962 | 0.079611 | -0.289216 | -0.289216 | -0.064684 | 0.260927 | -0.263048 |
| FVD | 0.330086 | 0.240300 | 0.028989 | 1.000000 | -0.341309 | 0.631187 | 0.239237 | 0.354436 | -0.410337 | -0.169100 | 0.366624 | 0.613010 | -0.028989 | -0.449323 | -0.524737 | -0.545205 | -0.349760 |
| CU | 0.242056 | -0.330514 | 0.217078 | -0.341309 | 1.000000 | -0.394748 | 0.158676 | -0.207762 | 0.507557 | 0.157640 | -0.174902 | -0.430193 | 0.082696 | 0.382471 | 0.406585 | 0.385162 | 0.188390 |
| JA21 | 0.358273 | 0.490025 | 0.273402 | 0.631187 | -0.394748 | 1.000000 | 0.417118 | 0.256188 | -0.533440 | -0.344832 | 0.418015 | 0.670034 | -0.201973 | -0.630550 | -0.562528 | -0.275326 | -0.677076 |
| NSC | 0.219330 | 0.009855 | 0.650516 | 0.239237 | 0.158676 | 0.417118 | 1.000000 | 0.148796 | -0.041231 | -0.228166 | 0.022402 | 0.275946 | -0.087383 | -0.228166 | -0.295656 | 0.005168 | -0.270336 |
| BBB | 0.113996 | 0.380019 | 0.226604 | 0.354436 | -0.207762 | 0.256188 | 0.148796 | 1.000000 | -0.149049 | 0.273402 | 0.131340 | 0.417521 | -0.298033 | -0.512321 | -0.307515 | -0.180929 | -0.335421 |
| VOLT | 0.022939 | 0.029053 | 0.257603 | -0.410337 | 0.507557 | -0.533440 | -0.041231 | -0.149049 | 1.000000 | 0.195154 | -0.533654 | -0.565213 | -0.182143 | 0.270613 | 0.430512 | 0.617731 | 0.234474 |
| DENK | -0.264683 | -0.064684 | 0.137255 | -0.169100 | 0.157640 | -0.344832 | -0.228166 | 0.273402 | 0.195154 | 1.000000 | -0.037700 | -0.223906 | 0.289216 | 0.004902 | 0.064684 | 0.041748 | 0.335013 |
| PVV | 0.027696 | 0.127558 | -0.180962 | 0.366624 | -0.174902 | 0.418015 | 0.022402 | 0.131340 | -0.533654 | -0.037700 | 1.000000 | 0.390326 | 0.253849 | -0.256362 | -0.349508 | -0.398675 | -0.141233 |
| BVNL | 0.164488 | 0.318182 | 0.079611 | 0.613010 | -0.430193 | 0.670034 | 0.275946 | 0.417521 | -0.565213 | -0.223906 | 0.390326 | 1.000000 | -0.079611 | -0.512495 | -0.464646 | -0.524404 | -0.400505 |
| SP | -0.421332 | -0.497568 | -0.289216 | -0.028989 | 0.082696 | -0.201973 | -0.087383 | -0.298033 | -0.182143 | 0.289216 | 0.253849 | -0.079611 | 1.000000 | 0.431373 | 0.208979 | -0.260927 | 0.550911 |
| Partij voor de Dieren | -0.264683 | -0.641863 | -0.289216 | -0.449323 | 0.382471 | -0.630550 | -0.228166 | -0.512321 | 0.270613 | 0.004902 | -0.256362 | -0.512495 | 0.431373 | 1.000000 | 0.641863 | 0.193086 | 0.550911 |
| GL-PvdA | -0.312527 | -0.267677 | -0.064684 | -0.524737 | 0.406585 | -0.562528 | -0.295656 | -0.307515 | 0.430512 | 0.064684 | -0.349508 | -0.464646 | 0.208979 | 0.641863 | 1.000000 | 0.550889 | 0.403024 |
| D66 | -0.253024 | 0.063564 | 0.260927 | -0.545205 | 0.385162 | -0.275326 | 0.005168 | -0.180929 | 0.617731 | 0.041748 | -0.398675 | -0.524404 | -0.260927 | 0.193086 | 0.550889 | 1.000000 | -0.058121 |
| BIJ1 | -0.404716 | -0.476072 | -0.263048 | -0.349760 | 0.188390 | -0.677076 | -0.270336 | -0.335421 | 0.234474 | 0.335013 | -0.141233 | -0.400505 | 0.550911 | 0.550911 | 0.403024 | -0.058121 | 1.000000 |

## 2.3 StemmenTracker
Made by the organisation behind the **StemWijzer**, the [StemmenTracker](https://www.stemmentracker.nl/) distinguishes itself by using the parties' voting behaviour as data, just like **Check je stem**. The StemmenTracker runs into the same problem as **Check je stem**, though to a lesser degree: both BIJ1 (4) and NSC (4) were absent from some of the votes.

Because the absences of the two parties do not overlap, I would have to discard 8 of the 30 statements to include both parties. I therefore chose to leave BIJ1 out here as well and to keep NSC. This leaves 26 statements.
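
The arithmetic behind this choice can be checked with sets. The statement numbers below are hypothetical; only the counts (30 statements, four absences per party, no overlap) come from the text.

```python
# Hypothetical numbers of the statements each party missed (four each, no overlap).
missed = {
    "BIJ1": {3, 11, 17, 25},
    "NSC": {6, 14, 20, 29},
}
total_statements = 30

# Keeping both parties means dropping every statement that either of them missed.
lost_if_both_kept = len(missed["BIJ1"] | missed["NSC"])
remaining_if_both_kept = total_statements - lost_if_both_kept  # 22

# Leaving BIJ1 out means only NSC's absences cost statements.
remaining_without_bij1 = total_statements - len(missed["NSC"])  # 26

print(lost_if_both_kept, remaining_if_both_kept, remaining_without_bij1)
```

If the absences had overlapped, the union would have been smaller than 8 and keeping both parties would have cost fewer statements.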

The voting behaviour of GroenLinks and the PvdA overlapped 100%, so the two were merged into the 'GL-PvdA' combination under which they also appear on the ballot.
```python
# Load data and prepare DataFrame for analysis
stemmen_tracker_df = pd.read_csv('data/stemmentracker.csv')
# Leave BIJ1 out, then drop every vote that one of the remaining parties missed
stemmen_tracker_df = stemmen_tracker_df.drop("BIJ1", axis=1)
stemmen_tracker_df = stemmen_tracker_df.dropna(axis=0)
# The first column holds the motions; the other columns are the parties
stemmen_tracker_labels = stemmen_tracker_df.columns[1:]
# Skip the first data row
stemmen_tracker_df = stemmen_tracker_df.iloc[1:]
stemmen_tracker_questions = stemmen_tracker_df["Motie"]
stemmen_tracker_df = stemmen_tracker_df.iloc[:, 1:]

# Prepared DataFrame is assigned its own variable to allow the original to be used in the Merged model.
stemmen_tracker_df_prepared = pd.DataFrame(data=stemmen_tracker_df, columns=stemmen_tracker_labels)
stemmen_tracker_df_prepared = stemmen_tracker_df_prepared.rename(index=stemmen_tracker_questions)
# Transpose so that the rows are parties and the columns are motions
stemmen_tracker_df_prepared = stemmen_tracker_df_prepared.transpose()
```

### Explained variance per component
```python
model = pca(n_components=5)
results = model.fit_transform(stemmen_tracker_df_prepared, row_labels=stemmen_tracker_labels)
model.plot()
```

    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [25] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[5]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_41_2.png)

### Visualization of the party positions
#### 1d
```python
model = pca(n_components=2)
results = model.fit_transform(stemmen_tracker_df_prepared, row_labels=stemmen_tracker_labels)
# The title reads "1D chart (diagonal for readability)"; both axes show the first component
model.scatter(legend=False, figsize=(20,20), labels=stemmen_tracker_labels, PC=(0,0), fontsize=16, title="1d Grafiek (diagonaal vanwege leesbaarheid)")
```

    [scatterd] >INFO> Create scatterplot
    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [25] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_44_3.png)
#### 2d
```python
model = pca(n_components=2)
results = model.fit_transform(stemmen_tracker_df_prepared, row_labels=stemmen_tracker_labels)
model.scatter(legend=False, figsize=(20,20), labels=stemmen_tracker_labels)
```

    [scatterd] >INFO> Create scatterplot
    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [25] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_46_3.png)
#### 3d
```python
model = pca(n_components=3)
results = model.fit_transform(stemmen_tracker_df_prepared, row_labels=stemmen_tracker_labels)
model.scatter3d(legend=False, figsize=(20,20), labels=stemmen_tracker_labels)
```

    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [25] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[3]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]
    [scatterd] >INFO> Create scatterplot

![png](images/output_48_3.png)

### Component analysis
```python
component_df = model.results["loadings"].iloc[0].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()
```
![png](images/output_50_0.png)
```python
component_df = model.results["loadings"].iloc[1].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()
```
![png](images/output_51_0.png)
### Correlation of the parties' voting behaviour
```python
fig, ax = plt.subplots(figsize=(10,10))
sns.heatmap(stemmen_tracker_df.corr())
```
![png](images/output_53_1.png)
```python
stemmen_tracker_df.corr()
```

| | PVV | FVD | JA21 | BBB | BVNL | VVD | D66 | CDA | SP | GL-PvdA | Partij voor de Dieren | CU | SGP | DENK | VOLT | NSC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PVV | 1.000000 | 0.510355 | 0.479167 | 0.557370 | 0.378726 | -0.068041 | -0.113424 | 0.041667 | -0.041667 | -0.329044 | -0.113424 | -0.068041 | 0.174595 | 0.131944 | -0.386976 | 0.174595 |
| FVD | 0.510355 | 1.000000 | 0.510355 | 0.428412 | 0.773906 | -0.098693 | -0.438719 | -0.174595 | -0.161165 | -0.461039 | -0.277425 | -0.427669 | -0.136364 | -0.161165 | -0.529043 | -0.136364 |
| JA21 | 0.479167 | 0.510355 | 1.000000 | 0.378726 | 0.557370 | -0.068041 | -0.447024 | 0.041667 | -0.215278 | -0.664804 | -0.447024 | -0.238145 | 0.174595 | -0.215278 | -0.553777 | 0.174595 |
| BBB | 0.557370 | 0.428412 | 0.378726 | 1.000000 | 0.264706 | -0.140028 | -0.199098 | 0.157207 | 0.200082 | -0.262575 | -0.199098 | 0.035007 | 0.255665 | 0.021437 | -0.144174 | 0.428412 |
| BVNL | 0.378726 | 0.773906 | 0.557370 | 0.264706 | 1.000000 | 0.035007 | -0.370734 | -0.378726 | 0.021437 | -0.435322 | -0.199098 | -0.665133 | -0.262575 | -0.157207 | -0.315810 | -0.262575 |
| VVD | -0.068041 | -0.098693 | -0.068041 | -0.140028 | 0.035007 | 1.000000 | 0.359546 | 0.578352 | 0.102062 | 0.065795 | -0.130744 | 0.333333 | 0.065795 | -0.068041 | 0.130744 | 0.065795 |
| D66 | -0.113424 | -0.438719 | -0.447024 | -0.199098 | -0.370734 | 0.359546 | 1.000000 | 0.280224 | 0.386976 | 0.690337 | 0.519231 | 0.522976 | 0.045162 | 0.220176 | 0.602564 | 0.206456 |
| CDA | 0.041667 | -0.174595 | 0.041667 | 0.157207 | -0.378726 | 0.578352 | 0.280224 | 1.000000 | 0.041667 | -0.006715 | -0.220176 | 0.748455 | 0.496924 | 0.041667 | 0.053376 | 0.496924 |
| SP | -0.041667 | -0.161165 | -0.215278 | 0.200082 | 0.021437 | 0.102062 | 0.386976 | 0.041667 | 1.000000 | 0.510355 | 0.386976 | 0.102062 | 0.174595 | 0.305556 | 0.447024 | 0.342475 |
| GL-PvdA | -0.329044 | -0.461039 | -0.664804 | -0.262575 | -0.435322 | 0.065795 | 0.690337 | -0.006715 | 0.510355 | 1.000000 | 0.690337 | 0.230283 | -0.136364 | 0.342475 | 0.600012 | 0.025974 |
| Partij voor de Dieren | -0.113424 | -0.277425 | -0.447024 | -0.199098 | -0.199098 | -0.130744 | 0.519231 | -0.220176 | 0.386976 | 0.690337 | 1.000000 | 0.032686 | -0.116131 | 0.386976 | 0.602564 | 0.045162 |
| CU | -0.068041 | -0.427669 | -0.238145 | 0.035007 | -0.665133 | 0.333333 | 0.522976 | 0.748455 | 0.102062 | 0.230283 | 0.032686 | 1.000000 | 0.559259 | 0.272166 | 0.294174 | 0.559259 |
| SGP | 0.174595 | -0.136364 | 0.174595 | 0.255665 | -0.262575 | 0.065795 | 0.045162 | 0.496924 | 0.174595 | -0.136364 | -0.116131 | 0.559259 | 1.000000 | 0.006715 | 0.116131 | 0.675325 |
| DENK | 0.131944 | -0.161165 | -0.215278 | 0.021437 | -0.157207 | -0.068041 | 0.220176 | 0.041667 | 0.305556 | 0.342475 | 0.386976 | 0.272166 | 0.006715 | 1.000000 | 0.280224 | 0.006715 |
| VOLT | -0.386976 | -0.529043 | -0.553777 | -0.144174 | -0.315810 | 0.130744 | 0.602564 | 0.053376 | 0.447024 | 0.600012 | 0.602564 | 0.294174 | 0.116131 | 0.280224 | 1.000000 | 0.116131 |
| NSC | 0.174595 | -0.136364 | 0.174595 | 0.428412 | -0.262575 | 0.065795 | 0.206456 | 0.496924 | 0.342475 | 0.025974 | 0.045162 | 0.559259 | 0.675325 | 0.006715 | 0.116131 | 1.000000 |

## 2.4 Kieskompas
The [Kieskompas](https://www.kieskompas.nl/) is a voting aid made by the newspaper Trouw, with 30 statements. Afterwards the parties are placed on a 2D chart with 'left / right' and 'progressive / conservative' as its axes.

The questions are answered on a [Likert scale](https://en.wikipedia.org/wiki/Likert_scale) with the values -2, -1, 0, 1 and 2. The Kieskompas also offers the option 'no opinion', but none of the parties used it, so it is not included in the coding.
```python
# Load data and prepare DataFrame for analysis
kieskompas_df = pd.read_csv('data/kieskompas.csv')
# Drop every column that contains missing values
kieskompas_df = kieskompas_df.dropna(axis=1)
# The first column holds the statements; the other columns are the parties
kieskompas_df_labels = kieskompas_df.columns[1:]
# Skip the first data row
kieskompas_df = kieskompas_df.iloc[1:]
kieskompas_df_questions = kieskompas_df["stelling (-2; -1; 0=neutraal; 1; 2)"]
kieskompas_df = kieskompas_df.iloc[:, 1:]

# Prepared DataFrame is assigned its own variable to allow the original to be used in the Merged model.
kieskompas_df_prepared = pd.DataFrame(data=kieskompas_df, columns=kieskompas_df_labels)
kieskompas_df_prepared = kieskompas_df_prepared.rename(index=kieskompas_df_questions)
# Transpose so that the rows are parties and the columns are statements
kieskompas_df_prepared = kieskompas_df_prepared.transpose()
```

### Explained variance per component
```python
model = pca(n_components=5)
results = model.fit_transform(kieskompas_df_prepared, row_labels=kieskompas_df_labels)
model.plot()
```

    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[5]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_58_2.png)

### Visualization of the party positions
#### 1d
```python
model = pca(n_components=2)
results = model.fit_transform(kieskompas_df_prepared, row_labels=kieskompas_df_labels)
# The title reads "1D chart (diagonal for readability)"; both axes show the first component
model.scatter(legend=False, figsize=(20,20), labels=kieskompas_df_labels, PC=(0,0), fontsize=16, title="1d Grafiek (diagonaal vanwege leesbaarheid)")
```

    [scatterd] >INFO> Create scatterplot
    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_61_3.png)
#### 2d
```python
model = pca(n_components=2)
results = model.fit_transform(kieskompas_df_prepared, row_labels=kieskompas_df_labels)
model.scatter(legend=False, figsize=(20,20), labels=kieskompas_df_labels)
```

    [scatterd] >INFO> Create scatterplot
    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_63_3.png)
#### 3d
```python
model = pca(n_components=3)
results = model.fit_transform(kieskompas_df_prepared, row_labels=kieskompas_df_labels)
model.scatter3d(legend=False, figsize=(20,20), labels=kieskompas_df_labels)
```

    [scatterd] >INFO> Create scatterplot
    [pca] >Extracting column labels from dataframe.
    [pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
    [pca] >Fit using PCA.
    [pca] >Compute loadings and PCs.
    [pca] >Compute explained variance.
    [pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[3]
    [pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
    [pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_65_3.png)

### Component analysis
```python
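# Refit with two components and inspect which columns drive the first one.
model = pca(n_components=2)
results = model.fit_transform(kieskompas_df_prepared, row_labels=kieskompas_df_labels)
# Row 0 of the loadings holds the weights of the first principal component.
component_df = model.results["loadings"].iloc[0].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()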
```
[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]
![png](images/output_67_1.png)
```python
# Row 1 of the loadings holds the weights of the second principal component.
component_df = model.results["loadings"].iloc[1].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()
```
![png](images/output_68_0.png)
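The bar charts above sort the loadings by signed value, so the columns that matter most sit at both ends of the chart. A minimal sketch of an alternative ranking by absolute weight follows. It uses a plain SVD on made-up data rather than the `pca` package, and the `statement_*` column names and the random values are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical data: 17 rows, 5 made-up statement columns.
columns = [f"statement_{i}" for i in range(1, 6)]
df = pd.DataFrame(rng.normal(size=(17, 5)), columns=columns)

# Centre the columns; the rows of vt are the principal axes (loadings).
centred = df - df.mean()
_, _, vt = np.linalg.svd(centred.to_numpy(), full_matrices=False)

loadings = pd.DataFrame(
    vt,
    columns=columns,
    index=[f"PC{i}" for i in range(1, vt.shape[0] + 1)],
)

# Rank the columns by absolute weight on PC1, keeping the sign for interpretation.
pc1 = loadings.loc["PC1"]
top = pc1.reindex(pc1.abs().sort_values(ascending=False).index)
print(top.head(3))
```

The sign of a loading only says which direction of the axis a column pushes towards; the absolute value says how strongly.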
### Correlation between party positions
```python
# Pairwise Pearson correlation between the parties' positions, drawn as a heatmap.
fig, ax = plt.subplots(figsize=(10,10))
sns.heatmap(kieskompas_df.corr(), ax=ax)
```
![png](images/output_70_1.png)
```python
kieskompas_df.corr()
```

| | SGP | VVD | CDA | FVD | CU | JA21 | NSC | BBB | VOLT | DENK | PVV | BVNL | SP | Partij voor de Dieren | GL-PvdA | D66 | BIJ1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SGP | 1.000000 | 0.498676 | 0.742955 | 0.496533 | 0.295812 | 0.593788 | 0.455999 | 0.642521 | -0.280852 | -0.036527 | 0.403871 | 0.447891 | -0.141408 | -0.527947 | -0.328309 | -0.217313 | -0.555560 |
| VVD | 0.498676 | 1.000000 | 0.563205 | 0.639830 | -0.280328 | 0.745577 | 0.362060 | 0.624179 | -0.269583 | -0.513088 | 0.534769 | 0.731928 | -0.557567 | -0.754800 | -0.537104 | -0.137651 | -0.743069 |
| CDA | 0.742955 | 0.563205 | 1.000000 | 0.347505 | 0.328573 | 0.410906 | 0.782175 | 0.737032 | -0.060893 | 0.011466 | 0.294340 | 0.234035 | -0.106885 | -0.357807 | -0.097485 | 0.174447 | -0.404159 |
| FVD | 0.496533 | 0.639830 | 0.347505 | 1.000000 | -0.333067 | 0.932560 | 0.111626 | 0.511349 | -0.613763 | -0.544102 | 0.659435 | 0.883869 | -0.530002 | -0.870775 | -0.869068 | -0.585792 | -0.890381 |
| CU | 0.295812 | -0.280328 | 0.328573 | -0.333067 | 1.000000 | -0.305816 | 0.507366 | 0.086527 | 0.501477 | 0.641826 | -0.231494 | -0.446368 | 0.466801 | 0.426793 | 0.542719 | 0.362416 | 0.358504 |
| JA21 | 0.593788 | 0.745577 | 0.410906 | 0.932560 | -0.305816 | 1.000000 | 0.187794 | 0.585716 | -0.661629 | -0.545692 | 0.699018 | 0.922785 | -0.540137 | -0.925324 | -0.848436 | -0.579381 | -0.945880 |
| NSC | 0.455999 | 0.362060 | 0.782175 | 0.111626 | 0.507366 | 0.187794 | 1.000000 | 0.683809 | 0.098625 | 0.256326 | 0.303470 | -0.008167 | 0.100093 | -0.074291 | 0.150841 | 0.322966 | -0.127138 |
| BBB | 0.642521 | 0.624179 | 0.737032 | 0.511349 | 0.086527 | 0.585716 | 0.683809 | 1.000000 | -0.304223 | -0.050758 | 0.603367 | 0.447431 | -0.171843 | -0.494562 | -0.311638 | -0.158093 | -0.518591 |
| VOLT | -0.280852 | -0.269583 | -0.060893 | -0.613763 | 0.501477 | -0.661629 | 0.098625 | -0.304223 | 1.000000 | 0.306212 | -0.632619 | -0.574231 | 0.243366 | 0.570185 | 0.726941 | 0.771245 | 0.629898 |
| DENK | -0.036527 | -0.513088 | 0.011466 | -0.544102 | 0.641826 | -0.545692 | 0.256326 | -0.050758 | 0.306212 | 1.000000 | -0.147556 | -0.689285 | 0.589468 | 0.511477 | 0.539233 | 0.286898 | 0.584174 |
| PVV | 0.403871 | 0.534769 | 0.294340 | 0.659435 | -0.231494 | 0.699018 | 0.303470 | 0.603367 | -0.632619 | -0.147556 | 1.000000 | 0.624747 | -0.126305 | -0.613352 | -0.582898 | -0.558314 | -0.595962 |
| BVNL | 0.447891 | 0.731928 | 0.234035 | 0.883869 | -0.446368 | 0.922785 | -0.008167 | 0.447431 | -0.574231 | -0.689285 | 0.624747 | 1.000000 | -0.640855 | -0.893219 | -0.842626 | -0.582158 | -0.898951 |
| SP | -0.141408 | -0.557567 | -0.106885 | -0.530002 | 0.466801 | -0.540137 | 0.100093 | -0.171843 | 0.243366 | 0.589468 | -0.126305 | -0.640855 | 1.000000 | 0.703695 | 0.624559 | 0.214847 | 0.660983 |
| Partij voor de Dieren | -0.527947 | -0.754800 | -0.357807 | -0.870775 | 0.426793 | -0.925324 | -0.074291 | -0.494562 | 0.570185 | 0.511477 | -0.613352 | -0.893219 | 0.703695 | 1.000000 | 0.870506 | 0.490938 | 0.920840 |
| GL-PvdA | -0.328309 | -0.537104 | -0.097485 | -0.869068 | 0.542719 | -0.848436 | 0.150841 | -0.311638 | 0.726941 | 0.539233 | -0.582898 | -0.842626 | 0.624559 | 0.870506 | 1.000000 | 0.711516 | 0.877832 |
| D66 | -0.217313 | -0.137651 | 0.174447 | -0.585792 | 0.362416 | -0.579381 | 0.322966 | -0.158093 | 0.771245 | 0.286898 | -0.558314 | -0.582158 | 0.214847 | 0.490938 | 0.711516 | 1.000000 | 0.576962 |
| BIJ1 | -0.555560 | -0.743069 | -0.404159 | -0.890381 | 0.358504 | -0.945880 | -0.127138 | -0.518591 | 0.629898 | 0.584174 | -0.595962 | -0.898951 | 0.660983 | 0.920840 | 0.877832 | 0.576962 | 1.000000 |
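
A 17 by 17 matrix is hard to scan by eye, so it can help to list the party pairs with the strongest agreement and disagreement. Below is a minimal sketch that does this for a five-party excerpt of the matrix. The values are copied from the output above; the selection of parties and the variable names are only for illustration. With the full data, `kieskompas_df.corr()` could presumably take the place of the hand-built `corr`.

```python
import numpy as np
import pandas as pd

# Five-party excerpt of the correlation matrix shown above.
parties = ["FVD", "JA21", "BVNL", "Partij voor de Dieren", "BIJ1"]
values = [
    [1.000000, 0.932560, 0.883869, -0.870775, -0.890381],
    [0.932560, 1.000000, 0.922785, -0.925324, -0.945880],
    [0.883869, 0.922785, 1.000000, -0.893219, -0.898951],
    [-0.870775, -0.925324, -0.893219, 1.000000, 0.920840],
    [-0.890381, -0.945880, -0.898951, 0.920840, 1.000000],
]
corr = pd.DataFrame(values, index=parties, columns=parties)

# Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values()

print("Most opposed:", pairs.index[0], pairs.iloc[0])
print("Most aligned:", pairs.index[-1], pairs.iloc[-1])
```

In this excerpt the most opposed pair is JA21 and BIJ1 (-0.945880) and the most aligned pair is FVD and JA21 (0.932560). Both are also the extremes of the full matrix.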