https://github.com/jasper-koops/analyse_verkiezingen_2e_kamer_2023



```python
import pandas as pd
from pca import pca
import seaborn as sns
import matplotlib.pyplot as plt
```

# 1. Principal component analysis

Principal component analysis (PCA) is a statistical technique for dimensionality reduction: it reduces the number of dimensions (variables) of a dataset while retaining as much of the relevant information as possible. The goal is to reduce the complexity of the data by transforming the dataset into a lower-dimensional space, while preserving the important patterns and relationships in the data.
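To make the idea concrete, here is a minimal sketch of PCA written directly with NumPy's singular value decomposition: center the columns, decompose, and keep the first `k` directions. It is not the implementation of the `pca` package used in this notebook; the function name `pca_project` and the toy matrix are made up for illustration.

```python
import numpy as np


def pca_project(X, k):
    """Project the rows of X onto the first k principal components.

    Returns (scores, components): scores has shape (n_rows, k),
    components has shape (k, n_columns).
    """
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centered columns
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]
    scores = X_centered @ components.T  # coordinates in the reduced space
    return scores, components


# Toy data: 4 observations with 3 variables (made-up numbers)
X = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 2.0, 1.0],
    [1.0, 1.0, 2.0],
])
scores, components = pca_project(X, k=2)
print(scores.shape)  # (4, 2)
```

Keeping all components (`k=3` here) reproduces the original data exactly; keeping fewer gives the lower-dimensional approximation.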

The remaining dimensions, the 'principal components', are linear combinations of the original variables and can be used to identify the most important patterns in the data. This makes it possible to visualize the data in a chart with fewer dimensions. Not every component explains the same amount of the variance in the data. The first component explains the most variance; the second component explains as much of the remaining variance as possible while being uncorrelated with the first, and so on. The added value of each component therefore decreases as more components are added, because every new component explains a smaller share of the variance.
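A sketch of how the explained-variance shares can be computed with plain NumPy, on random data of an assumed shape (16 parties by 10 statements). Because the singular values come out sorted from large to small, the shares are non-increasing, which is the diminishing return described above.

```python
import numpy as np


def explained_variance_ratio(X):
    """Share of the total variance explained by each principal component."""
    X_centered = X - X.mean(axis=0)
    # Singular values are returned in descending order
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    variances = singular_values ** 2  # proportional to each component's variance
    return variances / variances.sum()


rng = np.random.default_rng(0)
X = rng.normal(size=(16, 10))  # illustrative: 16 "parties" x 10 "statements"
ratios = explained_variance_ratio(X)
print(np.round(ratios, 3))
print(np.cumsum(ratios)[:3])  # variance explained by the first 1, 2 and 3 components
```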

# 2. Analysis of the voting advice applications

This is an analysis of how the parties relate to each other, based on the statements in the voting advice applications. The aim is to give a better answer to questions such as "Where does NSC sit on the political spectrum?" and "How much overlap is there between the parties?"

Each question in a voting advice application represents a dimension, so an application with 30 questions yields a 30-dimensional dataset. To still be able to visualize this data, a principal component analysis was applied to the data of each voting advice application to reduce the number of dimensions to 1, 2 or 3. This makes it possible to plot the positions of the parties in a chart. Parties that are close to each other largely agree; parties that are far apart largely disagree. Keep in mind that not every axis ('component') explains the same share of the variance. For each axis, the charts show which share of the variance it explains, and therefore how 'important' that axis is.
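A small sketch of this setup, with made-up parties and answers: each statement is a column, each party a row, PCA (here via NumPy's SVD, not the `pca` package) reduces the columns to two coordinates, and the distance between two parties in that 2d chart serves as a rough measure of how much they agree.

```python
import numpy as np
import pandas as pd

# Hypothetical coded answers: one row per party, one column per statement
# (2 = agree, 1 = no opinion, 0 = disagree)
answers = pd.DataFrame(
    [[2, 2, 0, 0, 2],
     [2, 2, 0, 1, 2],
     [0, 0, 2, 2, 0],
     [0, 1, 2, 2, 0]],
    index=["Party A", "Party B", "Party C", "Party D"],
    columns=[f"statement_{i}" for i in range(1, 6)],
)

# PCA via SVD of the mean-centered matrix; keep the first two components
X = answers.to_numpy(dtype=float)
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
positions = pd.DataFrame(
    X_centered @ Vt[:2].T, index=answers.index, columns=["PC1", "PC2"]
)

# Euclidean distance between every pair of parties in the 2d chart
coords = positions.to_numpy()
diff = coords[:, None, :] - coords[None, :, :]
distances = pd.DataFrame(
    np.linalg.norm(diff, axis=-1), index=answers.index, columns=answers.index
)
print(distances.round(2))
```

Party A and Party B, which differ on one statement, end up much closer together than Party A and Party C, which disagree on every statement.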

For each component, the statements with the largest influence on the parties' positions are analyzed. This makes clear what an axis in the chart represents. The parties' answers to the statements are coded as numbers: the higher the number, the more the party agrees with the statement, and the lowest values (negative numbers, in codings that use them) mean that the party disagrees. These answers, after subtracting the mean answer to each statement, are multiplied by the weights ('loadings') that the statements have for the component. The sum of these products is the party's position on the component.
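The computation of a position can be written out as follows. This sketch assumes the data are only mean-centered before the PCA (no scaling); the answers, party names and statement names are hypothetical. It also ranks the statements by the absolute value of their loading, which is one way to see which statements drive a component.

```python
import numpy as np
import pandas as pd

# Hypothetical coded answers (rows: parties, columns: statements)
answers = pd.DataFrame(
    [[2, 0, 1, 2],
     [0, 2, 1, 0],
     [2, 1, 0, 2],
     [0, 2, 2, 1]],
    index=["Party A", "Party B", "Party C", "Party D"],
    columns=["s1", "s2", "s3", "s4"],
)

X = answers.to_numpy(dtype=float)
means = X.mean(axis=0)  # mean answer per statement
_, _, Vt = np.linalg.svd(X - means, full_matrices=False)
loadings_pc1 = pd.Series(Vt[0], index=answers.columns)  # one loading per statement

# Position of Party A on PC1: sum over statements of (answer - mean) * loading
position_a = sum(
    (answers.loc["Party A", s] - means[i]) * loadings_pc1[s]
    for i, s in enumerate(answers.columns)
)
# The same computation for all parties at once, as a matrix product
positions_pc1 = (X - means) @ Vt[0]
print(round(position_a, 6), round(positions_pc1[0], 6))

# Statements that weigh most on PC1 (largest absolute loading first)
top_statements = loadings_pc1.abs().sort_values(ascending=False)
print(top_statements.head(2))
```

The sign of a component is arbitrary, so a large negative loading matters as much as a large positive one; that is why the ranking uses absolute values.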

Finally, I calculated the correlation (Pearson, the pandas default) between the parties' answers. The higher the correlation, the more the parties agree with each other.
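A minimal sketch of the correlation step with hypothetical answers. `DataFrame.corr()` computes the Pearson correlation between columns by default; masking the diagonal then gives, for each party, the other party it correlates with most.

```python
import numpy as np
import pandas as pd

# Hypothetical coded answers: one column per party, one row per statement
answers = pd.DataFrame({
    "Party A": [2, 2, 0, 1, 0],
    "Party B": [2, 1, 0, 1, 0],
    "Party C": [0, 0, 2, 1, 2],
})

corr = answers.corr()  # Pearson correlation between the party columns
print(corr.round(2))

# Hide each party's correlation with itself, then pick the highest remaining value
off_diagonal = corr.mask(np.eye(len(corr), dtype=bool))
most_similar = off_diagonal.idxmax()
print(most_similar.to_dict())
# {'Party A': 'Party B', 'Party B': 'Party A', 'Party C': 'Party B'}
```

Party C answers exactly opposite to Party A here, so their correlation is -1.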

## 2.1 Check je stem

[Check je stem](https://checkjestem.nl/) is a voting advice application that tracks the parties' voting behavior rather than their stated positions. The problem with this tool is that BIJ1 (7 votes) and NSC (3 votes) were not present for all of the selected votes. Those votes must therefore *either* be left out of the analysis, *or* the analysis must be done without those parties. Given the large number of missed votes (7) and the fact that the party is consistently polling at 0 seats, I decided to leave BIJ1 out of the analysis. NSC was kept because of its large number of seats in the polls.
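The trade-off between dropping votes and dropping a party can be checked by counting the missing values per party. The table below is hypothetical (1 = for, -1 = against, NaN = absent); the coding used in the actual CSV files may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical vote table: one row per vote, one column per party; NaN = party absent
votes = pd.DataFrame({
    "Party A": [1, -1, 1, 1, -1],
    "Party B": [1, np.nan, np.nan, 1, np.nan],
    "Party C": [np.nan, -1, 1, 1, -1],
})

print(votes.isna().sum())  # missed votes per party

# Option 1: keep every party, drop every vote with a missing value
keep_all = votes.dropna(axis=0)
# Option 2: drop the party with the most absences first, then the incomplete votes
drop_b = votes.drop(columns="Party B").dropna(axis=0)
print(len(keep_all), len(drop_b))  # 1 4
```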

Of the 26 statements, 23 therefore remain. The voting behavior of GroenLinks and the PvdA overlapped 100%, so they have been merged into the 'GL-PvdA' combination under which they also appear on the ballot in the election.
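Checking for 100% overlap and merging the two columns could look like the sketch below; the votes are made up and the check assumes there are no missing values.

```python
import pandas as pd

# Hypothetical votes in which two parties voted identically on every motion
votes = pd.DataFrame({
    "GroenLinks": [1, -1, 1],
    "PvdA": [1, -1, 1],
    "VVD": [-1, -1, 1],
})

# 100% overlap: every vote is the same (assumes no NaN values)
if (votes["GroenLinks"] == votes["PvdA"]).all():
    votes = votes.drop(columns=["PvdA"]).rename(columns={"GroenLinks": "GL-PvdA"})

print(list(votes.columns))  # ['GL-PvdA', 'VVD']
```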

```python
# Load data and prepare DataFrame for analysis
check_je_stem_df = pd.read_csv('data/check_je_stem.csv')
check_je_stem_df = check_je_stem_df.drop("BIJ1", axis=1)  # leave BIJ1 out (see above)
check_je_stem_df = check_je_stem_df.dropna(axis=0)        # drop votes at which a remaining party was absent
check_je_stem_labels = check_je_stem_df.columns[1:]       # party names; column 0 holds the statements
check_je_stem_df = check_je_stem_df.iloc[1:]              # skip the first data row (the PCA log reports 22 columns, while the text mentions 23 statements)
check_je_stem_questions = check_je_stem_df["vraag"]
check_je_stem_df = check_je_stem_df.iloc[:, 1:]           # keep only the party columns

# Prepared DataFrame is assigned its own variable to allow the original to be used in the Merged model.
check_je_stem_prepared = pd.DataFrame(data=check_je_stem_df, columns=check_je_stem_labels)
check_je_stem_prepared = check_je_stem_prepared.rename(index=check_je_stem_questions)  # statements as row labels
check_je_stem_prepared = check_je_stem_prepared.transpose()  # rows: parties, columns: statements
```

### Explained variance per component

```python
model = pca(n_components=5)
results = model.fit_transform(check_je_stem_prepared, row_labels=check_je_stem_labels)
model.plot()
```

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [22] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[5]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_6_2.png)

### Visualization of the parties' positions

#### 1d

```python
model = pca(n_components=2)
results = model.fit_transform(check_je_stem_prepared, row_labels=check_je_stem_labels)
model.scatter(legend=False, figsize=(20,20), labels=check_je_stem_labels, PC=(0,0), fontsize=16, title="1d Grafiek (diagonaal vanwege leesbaarheid)")

```

[scatterd] >INFO> Create scatterplot

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [22] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_9_3.png)

#### 2d

```python
model = pca(n_components=2)
results = model.fit_transform(check_je_stem_prepared, row_labels=check_je_stem_labels)
model.scatter(legend=False, figsize=(20,20), labels=check_je_stem_labels)
```

[scatterd] >INFO> Create scatterplot

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [22] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_11_3.png)

#### 3d

```python
model = pca(n_components=3)
results = model.fit_transform(check_je_stem_prepared, row_labels=check_je_stem_labels)
model.scatter3d(legend=False, figsize=(20,20), labels=check_je_stem_labels)
```

[scatterd] >INFO> Create scatterplot

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [22] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[3]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_13_3.png)

### Component analysis

```python
component_df = model.results["loadings"].iloc[0].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()
```


![png](images/output_15_0.png)

```python
component_df = model.results["loadings"].iloc[1].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()
```


![png](images/output_16_0.png)

### Correlation of the parties' voting behavior

```python
fig, ax = plt.subplots(figsize=(10,10))
sns.heatmap(check_je_stem_df.corr())
```


![png](images/output_18_1.png)

```python
check_je_stem_df.corr()
```

| | SP | PVV | GL-PvdA | Partij voor de Dieren | FVD | BBB | DENK | D66 | BVNL | VOLT | SGP | JA21 | CU | VVD | NSC | CDA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SP | 1.000000 | 0.398527 | 0.312500 | 0.541667 | -0.094356 | 0.019920 | 0.385758 | 0.408248 | -0.113228 | 0.385758 | -0.094356 | -0.149071 | 0.094356 | -0.312500 | -0.149071 | -0.385758 |
| PVV | 0.398527 | 1.000000 | -0.332106 | -0.088561 | 0.431187 | 0.328139 | -0.184482 | -0.108465 | 0.451243 | -0.184482 | 0.210580 | 0.376256 | -0.210580 | 0.332106 | 0.158424 | -0.040996 |
| GL-PvdA | 0.312500 | -0.332106 | 1.000000 | 0.770833 | -0.509525 | -0.418330 | 0.385758 | 0.612372 | -0.528396 | 0.597925 | -0.509525 | -0.559017 | 0.301941 | -0.541667 | -0.354044 | -0.173591 |
| Partij voor de Dieren | 0.541667 | -0.088561 | 0.770833 | 1.000000 | -0.301941 | -0.199205 | 0.385758 | 0.408248 | -0.528396 | 0.597925 | -0.301941 | -0.354044 | 0.094356 | -0.541667 | -0.354044 | -0.173591 |
| FVD | -0.094356 | 0.431187 | -0.509525 | -0.301941 | 1.000000 | 0.424043 | -0.244600 | -0.462250 | 0.692308 | -0.628971 | 0.435897 | 0.725797 | -0.435897 | 0.509525 | 0.354459 | 0.244600 |
| BBB | 0.019920 | 0.328139 | -0.418330 | -0.199205 | 0.424043 | 1.000000 | 0.092214 | -0.292770 | 0.369910 | -0.313527 | 0.622532 | 0.552340 | -0.027067 | 0.199205 | 0.748331 | 0.516398 |
| DENK | 0.385758 | -0.184482 | 0.385758 | 0.385758 | -0.244600 | 0.092214 | 1.000000 | 0.377964 | -0.524142 | 0.607143 | -0.052414 | -0.310530 | 0.244600 | -0.597925 | -0.120761 | -0.214286 |
| D66 | 0.408248 | -0.108465 | 0.612372 | 0.408248 | -0.462250 | -0.292770 | 0.377964 | 1.000000 | -0.462250 | 0.566947 | -0.277350 | -0.547723 | 0.647150 | -0.408248 | -0.182574 | 0.000000 |
| BVNL | -0.113228 | 0.451243 | -0.528396 | -0.528396 | 0.692308 | 0.369910 | -0.524142 | -0.462250 | 1.000000 | -0.716328 | 0.316239 | 0.573886 | -0.316239 | 0.528396 | 0.388217 | 0.139771 |
| VOLT | 0.385758 | -0.184482 | 0.597925 | 0.597925 | -0.628971 | -0.313527 | 0.607143 | 0.566947 | -0.716328 | 1.000000 | -0.436785 | -0.690066 | 0.244600 | -0.597925 | -0.500298 | -0.410714 |
| SGP | -0.094356 | 0.210580 | -0.509525 | -0.301941 | 0.435897 | 0.622532 | -0.052414 | -0.277350 | 0.316239 | -0.436785 | 1.000000 | 0.725797 | 0.128205 | 0.301941 | 0.725797 | 0.436785 |
| JA21 | -0.149071 | 0.376256 | -0.559017 | -0.354044 | 0.725797 | 0.552340 | -0.310530 | -0.547723 | 0.573886 | -0.690066 | 0.725797 | 1.000000 | -0.168790 | 0.559017 | 0.633333 | 0.500298 |
| CU | 0.094356 | -0.210580 | 0.301941 | 0.094356 | -0.435897 | -0.027067 | 0.244600 | 0.647150 | -0.316239 | 0.244600 | 0.128205 | -0.168790 | 1.000000 | -0.094356 | 0.202548 | 0.331957 |
| VVD | -0.312500 | 0.332106 | -0.541667 | -0.541667 | 0.509525 | 0.199205 | -0.597925 | -0.408248 | 0.528396 | -0.597925 | 0.301941 | 0.559017 | -0.094356 | 1.000000 | 0.354044 | 0.385758 |
| NSC | -0.149071 | 0.158424 | -0.354044 | -0.354044 | 0.354459 | 0.748331 | -0.120761 | -0.182574 | 0.388217 | -0.500298 | 0.725797 | 0.633333 | 0.202548 | 0.354044 | 1.000000 | 0.690066 |
| CDA | -0.385758 | -0.040996 | -0.173591 | -0.173591 | 0.244600 | 0.516398 | -0.214286 | 0.000000 | 0.139771 | -0.410714 | 0.436785 | 0.500298 | 0.331957 | 0.385758 | 0.690066 | 1.000000 |

## 2.2 StemWijzer

The [StemWijzer](https://www.stemwijzer.nl/) is made by the organization ProDemos and contains 30 statements. Each statement can be answered with 'ja' (yes), 'geen mening' (no opinion) or 'nee' (no), which I coded as '2', '1' and '0' respectively.
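This coding can be applied with a simple mapping. A sketch: the label spellings in `ENCODING` and the party columns are assumptions and may not match the raw StemWijzer export.

```python
import pandas as pd

# Assumed answer labels mapped to the coding described above
ENCODING = {"ja": 2, "geen mening": 1, "nee": 0}

# Hypothetical raw answers: one column per party, one row per statement
raw = pd.DataFrame({
    "Party A": ["ja", "nee", "geen mening"],
    "Party B": ["Nee", "nee", " Ja"],
})

# Normalize whitespace and case before mapping each label to its number
coded = raw.apply(lambda col: col.str.strip().str.lower().map(ENCODING))
print(coded)
```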

```python
# Load data and prepare DataFrame for analysis
stemwijzer_df = pd.read_csv('data/stemwijzer.csv')
stemwijzer_labels = stemwijzer_df.columns[1:]  # party names; column 0 holds the statements
stemwijzer_df = stemwijzer_df.iloc[1:]         # skip the first data row (the PCA log reports 29 columns for 30 statements)
stemwijzer_questions = stemwijzer_df["Stelling (0=Nee; 1=Geen mening; 2=Ja)"]
stemwijzer_df = stemwijzer_df.iloc[:, 1:]      # keep only the party columns

# Prepared DataFrame is assigned its own variable to allow the original to be used in the Merged model.
stemwijzer_df_prepared = pd.DataFrame(data=stemwijzer_df, columns=stemwijzer_labels)
stemwijzer_df_prepared = stemwijzer_df_prepared.rename(index=stemwijzer_questions)  # statements as row labels
stemwijzer_df_prepared = stemwijzer_df_prepared.transpose()  # rows: parties, columns: statements
```

### Explained variance per component

```python
model = pca(n_components=5)
results = model.fit_transform(stemwijzer_df_prepared, row_labels=stemwijzer_labels)
model.plot()
```

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[5]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_23_2.png)

### Visualization of the parties' positions

#### 1d

```python
model = pca(n_components=2)
results = model.fit_transform(stemwijzer_df_prepared, row_labels=stemwijzer_labels)
model.scatter(legend=False, figsize=(20,20), labels=stemwijzer_labels, PC=(0,0), fontsize=16, title="1d Grafiek (diagonaal vanwege leesbaarheid)")
```

[scatterd] >INFO> Create scatterplot

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_26_3.png)

#### 2d

```python
model = pca(n_components=2)
results = model.fit_transform(stemwijzer_df_prepared, row_labels=stemwijzer_labels)
model.scatter(legend=False, figsize=(20,20), labels=stemwijzer_labels)
```

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

[scatterd] >INFO> Create scatterplot

![png](images/output_28_3.png)

#### 3d

```python
model = pca(n_components=3)
results = model.fit_transform(stemwijzer_df_prepared, row_labels=stemwijzer_labels)
model.scatter3d(legend=False, figsize=(20,20), labels=stemwijzer_labels)
```

[scatterd] >INFO> Create scatterplot

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[3]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_31_3.png)

### Component analysis

```python
component_df = model.results["loadings"].iloc[0].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()
```

/tmp/ipykernel_13280/279368595.py:4: UserWarning: Tight layout not applied. The left and right margins cannot be made large enough to accommodate all axes decorations.
fig.tight_layout()


![png](images/output_33_1.png)

```python
component_df = model.results["loadings"].iloc[1].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()
```

/tmp/ipykernel_13280/845681879.py:4: UserWarning: Tight layout not applied. The left and right margins cannot be made large enough to accommodate all axes decorations.
fig.tight_layout()


![png](images/output_34_1.png)

### Correlation of the parties' positions

```python
fig, ax = plt.subplots(figsize=(10,10))
sns.heatmap(stemwijzer_df.corr())
```


![png](images/output_36_1.png)

```python
stemwijzer_df.corr()
```

| | SGP | VVD | CDA | FVD | CU | JA21 | NSC | BBB | VOLT | DENK | PVV | BVNL | SP | Partij voor de Dieren | GL-PvdA | D66 | BIJ1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SGP | 1.000000 | 0.312527 | 0.264683 | 0.330086 | 0.242056 | 0.358273 | 0.219330 | 0.113996 | 0.022939 | -0.264683 | 0.027696 | 0.164488 | -0.421332 | -0.264683 | -0.312527 | -0.253024 | -0.404716 |
| VVD | 0.312527 | 1.000000 | 0.353273 | 0.240300 | -0.330514 | 0.490025 | 0.009855 | 0.380019 | 0.029053 | -0.064684 | 0.127558 | 0.318182 | -0.497568 | -0.641863 | -0.267677 | 0.063564 | -0.476072 |
| CDA | 0.264683 | 0.353273 | 1.000000 | 0.028989 | 0.217078 | 0.273402 | 0.650516 | 0.226604 | 0.257603 | 0.137255 | -0.180962 | 0.079611 | -0.289216 | -0.289216 | -0.064684 | 0.260927 | -0.263048 |
| FVD | 0.330086 | 0.240300 | 0.028989 | 1.000000 | -0.341309 | 0.631187 | 0.239237 | 0.354436 | -0.410337 | -0.169100 | 0.366624 | 0.613010 | -0.028989 | -0.449323 | -0.524737 | -0.545205 | -0.349760 |
| CU | 0.242056 | -0.330514 | 0.217078 | -0.341309 | 1.000000 | -0.394748 | 0.158676 | -0.207762 | 0.507557 | 0.157640 | -0.174902 | -0.430193 | 0.082696 | 0.382471 | 0.406585 | 0.385162 | 0.188390 |
| JA21 | 0.358273 | 0.490025 | 0.273402 | 0.631187 | -0.394748 | 1.000000 | 0.417118 | 0.256188 | -0.533440 | -0.344832 | 0.418015 | 0.670034 | -0.201973 | -0.630550 | -0.562528 | -0.275326 | -0.677076 |
| NSC | 0.219330 | 0.009855 | 0.650516 | 0.239237 | 0.158676 | 0.417118 | 1.000000 | 0.148796 | -0.041231 | -0.228166 | 0.022402 | 0.275946 | -0.087383 | -0.228166 | -0.295656 | 0.005168 | -0.270336 |
| BBB | 0.113996 | 0.380019 | 0.226604 | 0.354436 | -0.207762 | 0.256188 | 0.148796 | 1.000000 | -0.149049 | 0.273402 | 0.131340 | 0.417521 | -0.298033 | -0.512321 | -0.307515 | -0.180929 | -0.335421 |
| VOLT | 0.022939 | 0.029053 | 0.257603 | -0.410337 | 0.507557 | -0.533440 | -0.041231 | -0.149049 | 1.000000 | 0.195154 | -0.533654 | -0.565213 | -0.182143 | 0.270613 | 0.430512 | 0.617731 | 0.234474 |
| DENK | -0.264683 | -0.064684 | 0.137255 | -0.169100 | 0.157640 | -0.344832 | -0.228166 | 0.273402 | 0.195154 | 1.000000 | -0.037700 | -0.223906 | 0.289216 | 0.004902 | 0.064684 | 0.041748 | 0.335013 |
| PVV | 0.027696 | 0.127558 | -0.180962 | 0.366624 | -0.174902 | 0.418015 | 0.022402 | 0.131340 | -0.533654 | -0.037700 | 1.000000 | 0.390326 | 0.253849 | -0.256362 | -0.349508 | -0.398675 | -0.141233 |
| BVNL | 0.164488 | 0.318182 | 0.079611 | 0.613010 | -0.430193 | 0.670034 | 0.275946 | 0.417521 | -0.565213 | -0.223906 | 0.390326 | 1.000000 | -0.079611 | -0.512495 | -0.464646 | -0.524404 | -0.400505 |
| SP | -0.421332 | -0.497568 | -0.289216 | -0.028989 | 0.082696 | -0.201973 | -0.087383 | -0.298033 | -0.182143 | 0.289216 | 0.253849 | -0.079611 | 1.000000 | 0.431373 | 0.208979 | -0.260927 | 0.550911 |
| Partij voor de Dieren | -0.264683 | -0.641863 | -0.289216 | -0.449323 | 0.382471 | -0.630550 | -0.228166 | -0.512321 | 0.270613 | 0.004902 | -0.256362 | -0.512495 | 0.431373 | 1.000000 | 0.641863 | 0.193086 | 0.550911 |
| GL-PvdA | -0.312527 | -0.267677 | -0.064684 | -0.524737 | 0.406585 | -0.562528 | -0.295656 | -0.307515 | 0.430512 | 0.064684 | -0.349508 | -0.464646 | 0.208979 | 0.641863 | 1.000000 | 0.550889 | 0.403024 |
| D66 | -0.253024 | 0.063564 | 0.260927 | -0.545205 | 0.385162 | -0.275326 | 0.005168 | -0.180929 | 0.617731 | 0.041748 | -0.398675 | -0.524404 | -0.260927 | 0.193086 | 0.550889 | 1.000000 | -0.058121 |
| BIJ1 | -0.404716 | -0.476072 | -0.263048 | -0.349760 | 0.188390 | -0.677076 | -0.270336 | -0.335421 | 0.234474 | 0.335013 | -0.141233 | -0.400505 | 0.550911 | 0.550911 | 0.403024 | -0.058121 | 1.000000 |

## 2.3 StemmenTracker

Made by the organization behind the **StemWijzer**, the [StemmenTracker](https://www.stemmentracker.nl/) sets itself apart by using the parties' voting behavior as data, just like **Check je stem**. The StemmenTracker runs into the same problem as **Check je stem**, though to a lesser extent: both BIJ1 (4 votes) and NSC (4 votes) were not present at every vote.

Because the absences of the two parties do not overlap, I would have to discard 8 of the 30 statements to include both parties. I therefore chose to leave BIJ1 out here as well, but to keep NSC. This leaves 26 statements.
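The arithmetic behind this choice, as a sketch: with non-overlapping absences, keeping both parties removes the union of their missed votes. The vote indices below are hypothetical; only the counts (4 and 4 out of 30) come from the text.

```python
n_statements = 30

# Hypothetical indices of the votes each party missed (disjoint sets of 4)
missed = {"BIJ1": {0, 1, 2, 3}, "NSC": {10, 11, 12, 13}}

# Keep both parties: drop every vote that either party missed
keep_both = n_statements - len(missed["BIJ1"] | missed["NSC"])
# Drop BIJ1 and keep NSC: only the votes NSC missed are lost
drop_bij1 = n_statements - len(missed["NSC"])

print(keep_both, drop_bij1)  # 22 26
```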

The voting behavior of GroenLinks and the PvdA overlapped 100%, so they have been merged into the 'GL-PvdA' combination under which they also appear on the ballot in the election.

```python
# Load data and prepare DataFrame for analysis
stemmen_tracker_df = pd.read_csv('data/stemmentracker.csv')
stemmen_tracker_df = stemmen_tracker_df.drop("BIJ1", axis=1)  # leave BIJ1 out (see above)
stemmen_tracker_df = stemmen_tracker_df.dropna(axis=0)        # drop motions at which a remaining party was absent
stemmen_tracker_labels = stemmen_tracker_df.columns[1:]       # party names; column 0 holds the motions
stemmen_tracker_df = stemmen_tracker_df.iloc[1:]              # skip the first data row (the PCA log reports 25 columns, while the text mentions 26 statements)
stemmen_tracker_questions = stemmen_tracker_df["Motie"]
stemmen_tracker_df = stemmen_tracker_df.iloc[:, 1:]           # keep only the party columns

# Prepared DataFrame is assigned its own variable to allow the original to be used in the Merged model.
stemmen_tracker_df_prepared = pd.DataFrame(data=stemmen_tracker_df, columns=stemmen_tracker_labels)
stemmen_tracker_df_prepared = stemmen_tracker_df_prepared.rename(index=stemmen_tracker_questions)  # motions as row labels
stemmen_tracker_df_prepared = stemmen_tracker_df_prepared.transpose()  # rows: parties, columns: motions
```

### Explained variance per component

```python
model = pca(n_components=5)
results = model.fit_transform(stemmen_tracker_df_prepared, row_labels=stemmen_tracker_labels)
model.plot()
```

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [25] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[5]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_41_2.png)

### Visualization of the parties' positions

#### 1d

```python
model = pca(n_components=2)
results = model.fit_transform(stemmen_tracker_df_prepared, row_labels=stemmen_tracker_labels)
model.scatter(legend=False, figsize=(20,20), labels=stemmen_tracker_labels, PC=(0,0), fontsize=16, title="1d Grafiek (diagonaal vanwege leesbaarheid)")

```

[scatterd] >INFO> Create scatterplot

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [25] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_44_3.png)

#### 2d

```python
model = pca(n_components=2)
results = model.fit_transform(stemmen_tracker_df_prepared, row_labels=stemmen_tracker_labels)
model.scatter(legend=False, figsize=(20,20), labels=stemmen_tracker_labels)
```

[scatterd] >INFO> Create scatterplot

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [25] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_46_3.png)

#### 3d

```python
model = pca(n_components=3)
results = model.fit_transform(stemmen_tracker_df_prepared, row_labels=stemmen_tracker_labels)
model.scatter3d(legend=False, figsize=(20,20), labels=stemmen_tracker_labels)
```

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [25] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[3]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

[scatterd] >INFO> Create scatterplot

![png](images/output_48_3.png)

### Component analysis

```python
component_df = model.results["loadings"].iloc[0].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()
```


![png](images/output_50_0.png)

```python
component_df = model.results["loadings"].iloc[1].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()
```


![png](images/output_51_0.png)

### Correlation of the parties' voting behavior

```python
fig, ax = plt.subplots(figsize=(10,10))
sns.heatmap(stemmen_tracker_df.corr())
```


![png](images/output_53_1.png)

```python
stemmen_tracker_df.corr()
```

| | PVV | FVD | JA21 | BBB | BVNL | VVD | D66 | CDA | SP | GL-PvdA | Partij voor de Dieren | CU | SGP | DENK | VOLT | NSC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PVV | 1.000000 | 0.510355 | 0.479167 | 0.557370 | 0.378726 | -0.068041 | -0.113424 | 0.041667 | -0.041667 | -0.329044 | -0.113424 | -0.068041 | 0.174595 | 0.131944 | -0.386976 | 0.174595 |
| FVD | 0.510355 | 1.000000 | 0.510355 | 0.428412 | 0.773906 | -0.098693 | -0.438719 | -0.174595 | -0.161165 | -0.461039 | -0.277425 | -0.427669 | -0.136364 | -0.161165 | -0.529043 | -0.136364 |
| JA21 | 0.479167 | 0.510355 | 1.000000 | 0.378726 | 0.557370 | -0.068041 | -0.447024 | 0.041667 | -0.215278 | -0.664804 | -0.447024 | -0.238145 | 0.174595 | -0.215278 | -0.553777 | 0.174595 |
| BBB | 0.557370 | 0.428412 | 0.378726 | 1.000000 | 0.264706 | -0.140028 | -0.199098 | 0.157207 | 0.200082 | -0.262575 | -0.199098 | 0.035007 | 0.255665 | 0.021437 | -0.144174 | 0.428412 |
| BVNL | 0.378726 | 0.773906 | 0.557370 | 0.264706 | 1.000000 | 0.035007 | -0.370734 | -0.378726 | 0.021437 | -0.435322 | -0.199098 | -0.665133 | -0.262575 | -0.157207 | -0.315810 | -0.262575 |
| VVD | -0.068041 | -0.098693 | -0.068041 | -0.140028 | 0.035007 | 1.000000 | 0.359546 | 0.578352 | 0.102062 | 0.065795 | -0.130744 | 0.333333 | 0.065795 | -0.068041 | 0.130744 | 0.065795 |
| D66 | -0.113424 | -0.438719 | -0.447024 | -0.199098 | -0.370734 | 0.359546 | 1.000000 | 0.280224 | 0.386976 | 0.690337 | 0.519231 | 0.522976 | 0.045162 | 0.220176 | 0.602564 | 0.206456 |
| CDA | 0.041667 | -0.174595 | 0.041667 | 0.157207 | -0.378726 | 0.578352 | 0.280224 | 1.000000 | 0.041667 | -0.006715 | -0.220176 | 0.748455 | 0.496924 | 0.041667 | 0.053376 | 0.496924 |
| SP | -0.041667 | -0.161165 | -0.215278 | 0.200082 | 0.021437 | 0.102062 | 0.386976 | 0.041667 | 1.000000 | 0.510355 | 0.386976 | 0.102062 | 0.174595 | 0.305556 | 0.447024 | 0.342475 |
| GL-PvdA | -0.329044 | -0.461039 | -0.664804 | -0.262575 | -0.435322 | 0.065795 | 0.690337 | -0.006715 | 0.510355 | 1.000000 | 0.690337 | 0.230283 | -0.136364 | 0.342475 | 0.600012 | 0.025974 |
| Partij voor de Dieren | -0.113424 | -0.277425 | -0.447024 | -0.199098 | -0.199098 | -0.130744 | 0.519231 | -0.220176 | 0.386976 | 0.690337 | 1.000000 | 0.032686 | -0.116131 | 0.386976 | 0.602564 | 0.045162 |
| CU | -0.068041 | -0.427669 | -0.238145 | 0.035007 | -0.665133 | 0.333333 | 0.522976 | 0.748455 | 0.102062 | 0.230283 | 0.032686 | 1.000000 | 0.559259 | 0.272166 | 0.294174 | 0.559259 |
| SGP | 0.174595 | -0.136364 | 0.174595 | 0.255665 | -0.262575 | 0.065795 | 0.045162 | 0.496924 | 0.174595 | -0.136364 | -0.116131 | 0.559259 | 1.000000 | 0.006715 | 0.116131 | 0.675325 |
| DENK | 0.131944 | -0.161165 | -0.215278 | 0.021437 | -0.157207 | -0.068041 | 0.220176 | 0.041667 | 0.305556 | 0.342475 | 0.386976 | 0.272166 | 0.006715 | 1.000000 | 0.280224 | 0.006715 |
| VOLT | -0.386976 | -0.529043 | -0.553777 | -0.144174 | -0.315810 | 0.130744 | 0.602564 | 0.053376 | 0.447024 | 0.600012 | 0.602564 | 0.294174 | 0.116131 | 0.280224 | 1.000000 | 0.116131 |
| NSC | 0.174595 | -0.136364 | 0.174595 | 0.428412 | -0.262575 | 0.065795 | 0.206456 | 0.496924 | 0.342475 | 0.025974 | 0.045162 | 0.559259 | 0.675325 | 0.006715 | 0.116131 | 1.000000 |

## 2.4 Kieskompas

The [Kieskompas](https://www.kieskompas.nl/) is a voting advice application with 30 statements, created by the newspaper Trouw. At the end, the parties are placed on a 2d chart with 'left / right' and 'progressive / conservative' as axes.

The statements are answered on a [Likert scale](https://en.wikipedia.org/wiki/Likert_scale), coded as -2, -1, 0, 1 and 2. The Kieskompas also offers a 'geen mening' (no opinion) option, but none of the parties used it, so it is not included in the coding.
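A sketch of this coding with `Series.map`. The Dutch answer labels in `LIKERT` are assumptions and may not match the Kieskompas wording. Any label that is not in the mapping, such as 'geen mening', becomes `NaN` instead of a number, so it cannot silently be treated as neutral.

```python
import pandas as pd

# Assumed answer labels on the five-point Likert scale
LIKERT = {
    "helemaal oneens": -2,
    "oneens": -1,
    "neutraal": 0,
    "eens": 1,
    "helemaal eens": 2,
}

raw = pd.Series(["eens", "helemaal oneens", "neutraal", "geen mening"])
coded = raw.map(LIKERT)  # labels missing from LIKERT become NaN
print(coded.tolist())  # [1.0, -2.0, 0.0, nan]
```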

```python
# Load data and prepare DataFrame for analysis
kieskompas_df = pd.read_csv('data/kieskompas.csv')
kieskompas_df = kieskompas_df.dropna(axis=1)        # drop columns that contain missing values
kieskompas_df_labels = kieskompas_df.columns[1:]    # party names; column 0 holds the statements
kieskompas_df = kieskompas_df.iloc[1:]              # skip the first data row (the PCA log reports 29 columns for 30 statements)
kieskompas_df_questions = kieskompas_df["stelling (-2; -1; 0=neutraal; 1; 2)"]
kieskompas_df = kieskompas_df.iloc[:, 1:]           # keep only the party columns

# Prepared DataFrame is assigned its own variable to allow the original to be used in the Merged model.
kieskompas_df_prepared = pd.DataFrame(data=kieskompas_df, columns=kieskompas_df_labels)
kieskompas_df_prepared = kieskompas_df_prepared.rename(index=kieskompas_df_questions)  # statements as row labels
kieskompas_df_prepared = kieskompas_df_prepared.transpose()  # rows: parties, columns: statements
```

### Explained variance per component

```python
model = pca(n_components=5)
results = model.fit_transform(kieskompas_df_prepared, row_labels=kieskompas_df_labels)
model.plot()
```

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[5]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_58_2.png)

### Visualization of the parties' positions

#### 1d

```python
model = pca(n_components=2)
results = model.fit_transform(kieskompas_df_prepared, row_labels=kieskompas_df_labels)
model.scatter(legend=False, figsize=(20,20), labels=kieskompas_df_labels, PC=(0,0), fontsize=16, title="1d Grafiek (diagonaal vanwege leesbaarheid)")
```

[scatterd] >INFO> Create scatterplot

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_61_3.png)

#### 2d

```python
model = pca(n_components=2)
results = model.fit_transform(kieskompas_df_prepared, row_labels=kieskompas_df_labels)
model.scatter(legend=False, figsize=(20,20), labels=kieskompas_df_labels)
```

[scatterd] >INFO> Create scatterplot

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_63_3.png)

#### 3d

```python
model = pca(n_components=3)
results = model.fit_transform(kieskompas_df_prepared, row_labels=kieskompas_df_labels)
model.scatter3d(legend=False, figsize=(20,20), labels=kieskompas_df_labels)
```

[scatterd] >INFO> Create scatterplot

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[3]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]

![png](images/output_65_3.png)

### Component analysis

```python
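# Fit a 2-component PCA on the Kieskompas answers (one column per statement)
model = pca(n_components=2)
results = model.fit_transform(kieskompas_df_prepared, row_labels=kieskompas_df_labels)
# Loadings of PC1: the weight of each statement in the first component, sorted
component_df = model.results["loadings"].iloc[0].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()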
```

[pca] >Extracting column labels from dataframe.
[pca] >The PCA reduction is performed on the [29] columns of the input dataframe.
[pca] >Fit using PCA.
[pca] >Compute loadings and PCs.
[pca] >Compute explained variance.
[pca] >Outlier detection using Hotelling T2 test with alpha=[0.05] and n_components=[2]
[pca] >Multiple test correction applied for Hotelling T2 test: [fdr_bh]
[pca] >Outlier detection using SPE/DmodX with n_std=[3]


![png](images/output_67_1.png)

```python
# Same plot for PC2 (second row of the loadings table)
component_df = model.results["loadings"].iloc[1].sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(20,12))
component_df.plot.barh(ax=ax)
fig.tight_layout()
```


![png](images/output_68_0.png)
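
The bar charts rank the statements by their signed loading. The sketch below instead picks the statements with the largest absolute loading, since a strongly negative loading shapes an axis as much as a strongly positive one. The helper `top_loadings` and the statement names are hypothetical. The sketch assumes the loadings table has one row per component and one column per statement, as the `.iloc[0]` and `.iloc[1]` calls above suggest.

```python
import pandas as pd


def top_loadings(loadings: pd.DataFrame, component: str, k: int = 3) -> pd.Series:
    """Return the k statements with the largest absolute loading on one component."""
    row = loadings.loc[component]
    strongest = row.abs().sort_values(ascending=False).index[:k]
    return row.loc[strongest]  # keep the sign: it shows which side of the axis the statement pulls to


# Hypothetical loadings table: one row per component, one column per statement.
loadings = pd.DataFrame(
    {
        "statement_a": [0.55, 0.10],
        "statement_b": [-0.48, 0.05],
        "statement_c": [0.12, -0.70],
        "statement_d": [0.05, 0.60],
    },
    index=["PC1", "PC2"],
)

print(top_loadings(loadings, "PC1", k=2))
print(top_loadings(loadings, "PC2", k=2))
```

The same call could be applied to `model.results["loadings"]`. This assumes its rows are labeled `PC1`, `PC2`, …; if they are not, select the row with `.iloc` instead.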

### Correlation between the parties' positions

```python
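fig, ax = plt.subplots(figsize=(10,10))
# Pairwise (Pearson) correlation between the parties' answers, drawn on the axes created above
sns.heatmap(kieskompas_df.corr(), ax=ax)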
```


![png](images/output_70_1.png)
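
By default, pandas' `DataFrame.corr()` computes the Pearson correlation between every pair of columns. A value near +1 means two parties answer the statements in the same direction, and a value near -1 means they systematically take opposite positions. Here is a small check on made-up data (three hypothetical parties, five statements), comparing the result with `numpy.corrcoef`:

```python
import numpy as np
import pandas as pd

# Hypothetical answers of three parties (columns) to five statements (rows).
df = pd.DataFrame({
    "party_x": [2, 1, -1, 0, -2],
    "party_y": [1, 2, -2, 0, -1],
    "party_z": [-2, -1, 1, 0, 2],  # exactly the opposite of party_x
})

corr = df.corr()  # pairwise Pearson correlation between columns (pandas' default method)
manual = np.corrcoef(df["party_x"], df["party_y"])[0, 1]  # same coefficient via NumPy

print(corr.round(2))
print(round(manual, 2))  # 0.8
```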

```python
kieskompas_df.corr()
```

| | SGP | VVD | CDA | FVD | CU | JA21 | NSC | BBB | VOLT | DENK | PVV | BVNL | SP | Partij voor de Dieren | GL-PvdA | D66 | BIJ1 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| SGP | 1.000000 | 0.498676 | 0.742955 | 0.496533 | 0.295812 | 0.593788 | 0.455999 | 0.642521 | -0.280852 | -0.036527 | 0.403871 | 0.447891 | -0.141408 | -0.527947 | -0.328309 | -0.217313 | -0.555560 |
| VVD | 0.498676 | 1.000000 | 0.563205 | 0.639830 | -0.280328 | 0.745577 | 0.362060 | 0.624179 | -0.269583 | -0.513088 | 0.534769 | 0.731928 | -0.557567 | -0.754800 | -0.537104 | -0.137651 | -0.743069 |
| CDA | 0.742955 | 0.563205 | 1.000000 | 0.347505 | 0.328573 | 0.410906 | 0.782175 | 0.737032 | -0.060893 | 0.011466 | 0.294340 | 0.234035 | -0.106885 | -0.357807 | -0.097485 | 0.174447 | -0.404159 |
| FVD | 0.496533 | 0.639830 | 0.347505 | 1.000000 | -0.333067 | 0.932560 | 0.111626 | 0.511349 | -0.613763 | -0.544102 | 0.659435 | 0.883869 | -0.530002 | -0.870775 | -0.869068 | -0.585792 | -0.890381 |
| CU | 0.295812 | -0.280328 | 0.328573 | -0.333067 | 1.000000 | -0.305816 | 0.507366 | 0.086527 | 0.501477 | 0.641826 | -0.231494 | -0.446368 | 0.466801 | 0.426793 | 0.542719 | 0.362416 | 0.358504 |
| JA21 | 0.593788 | 0.745577 | 0.410906 | 0.932560 | -0.305816 | 1.000000 | 0.187794 | 0.585716 | -0.661629 | -0.545692 | 0.699018 | 0.922785 | -0.540137 | -0.925324 | -0.848436 | -0.579381 | -0.945880 |
| NSC | 0.455999 | 0.362060 | 0.782175 | 0.111626 | 0.507366 | 0.187794 | 1.000000 | 0.683809 | 0.098625 | 0.256326 | 0.303470 | -0.008167 | 0.100093 | -0.074291 | 0.150841 | 0.322966 | -0.127138 |
| BBB | 0.642521 | 0.624179 | 0.737032 | 0.511349 | 0.086527 | 0.585716 | 0.683809 | 1.000000 | -0.304223 | -0.050758 | 0.603367 | 0.447431 | -0.171843 | -0.494562 | -0.311638 | -0.158093 | -0.518591 |
| VOLT | -0.280852 | -0.269583 | -0.060893 | -0.613763 | 0.501477 | -0.661629 | 0.098625 | -0.304223 | 1.000000 | 0.306212 | -0.632619 | -0.574231 | 0.243366 | 0.570185 | 0.726941 | 0.771245 | 0.629898 |
| DENK | -0.036527 | -0.513088 | 0.011466 | -0.544102 | 0.641826 | -0.545692 | 0.256326 | -0.050758 | 0.306212 | 1.000000 | -0.147556 | -0.689285 | 0.589468 | 0.511477 | 0.539233 | 0.286898 | 0.584174 |
| PVV | 0.403871 | 0.534769 | 0.294340 | 0.659435 | -0.231494 | 0.699018 | 0.303470 | 0.603367 | -0.632619 | -0.147556 | 1.000000 | 0.624747 | -0.126305 | -0.613352 | -0.582898 | -0.558314 | -0.595962 |
| BVNL | 0.447891 | 0.731928 | 0.234035 | 0.883869 | -0.446368 | 0.922785 | -0.008167 | 0.447431 | -0.574231 | -0.689285 | 0.624747 | 1.000000 | -0.640855 | -0.893219 | -0.842626 | -0.582158 | -0.898951 |
| SP | -0.141408 | -0.557567 | -0.106885 | -0.530002 | 0.466801 | -0.540137 | 0.100093 | -0.171843 | 0.243366 | 0.589468 | -0.126305 | -0.640855 | 1.000000 | 0.703695 | 0.624559 | 0.214847 | 0.660983 |
| Partij voor de Dieren | -0.527947 | -0.754800 | -0.357807 | -0.870775 | 0.426793 | -0.925324 | -0.074291 | -0.494562 | 0.570185 | 0.511477 | -0.613352 | -0.893219 | 0.703695 | 1.000000 | 0.870506 | 0.490938 | 0.920840 |
| GL-PvdA | -0.328309 | -0.537104 | -0.097485 | -0.869068 | 0.542719 | -0.848436 | 0.150841 | -0.311638 | 0.726941 | 0.539233 | -0.582898 | -0.842626 | 0.624559 | 0.870506 | 1.000000 | 0.711516 | 0.877832 |
| D66 | -0.217313 | -0.137651 | 0.174447 | -0.585792 | 0.362416 | -0.579381 | 0.322966 | -0.158093 | 0.771245 | 0.286898 | -0.558314 | -0.582158 | 0.214847 | 0.490938 | 0.711516 | 1.000000 | 0.576962 |
| BIJ1 | -0.555560 | -0.743069 | -0.404159 | -0.890381 | 0.358504 | -0.945880 | -0.127138 | -0.518591 | 0.629898 | 0.584174 | -0.595962 | -0.898951 | 0.660983 | 0.920840 | 0.877832 | 0.576962 | 1.000000 |
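
A 17 × 17 matrix is hard to scan, so it can help to list each pair of parties once, ranked by correlation. Below is a minimal sketch with a hypothetical helper `ranked_pairs`. For brevity it uses only four parties, with values copied from the correlation table above; on the full matrix one would pass `kieskompas_df.corr()` directly.

```python
from itertools import combinations

import pandas as pd


def ranked_pairs(corr: pd.DataFrame) -> pd.Series:
    """List every pair of parties once, sorted from most to least correlated."""
    pairs = {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)}
    return pd.Series(pairs).sort_values(ascending=False)


# Excerpt of four parties, values copied from the correlation table above.
parties = ["FVD", "JA21", "Partij voor de Dieren", "BIJ1"]
corr = pd.DataFrame(
    [
        [1.000000, 0.932560, -0.870775, -0.890381],
        [0.932560, 1.000000, -0.925324, -0.945880],
        [-0.870775, -0.925324, 1.000000, 0.920840],
        [-0.890381, -0.945880, 0.920840, 1.000000],
    ],
    index=parties,
    columns=parties,
)

pairs = ranked_pairs(corr)
print(pairs.head(2))  # most similar: FVD/JA21, Partij voor de Dieren/BIJ1
print(pairs.tail(2))  # most opposed: JA21/Partij voor de Dieren, JA21/BIJ1
```

On the full matrix, `ranked_pairs(kieskompas_df.corr())` would rank all 136 pairs of the 17 parties.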