
### CzechFOI-SIM
**Czech FOI Simulation Analysis**



**Investigates whether there is a reliable statistical way to determine the dAEFI rate when the baseline is unknown (the real-world case).**
**As far as I know, this vital problem has not yet been solved.**

Simulates dAEFIs to analyse their impact on the mortality curves and back-calculates the dAEFI rate, comparing the known-baseline and unknown-baseline cases.
Uses real Czech FOI (Freedom of Information) data, or generates d, dvx, duvx data as modulated sine waves.

Simulated data can be used to check your own code for calculation errors. A CSV file with the data of all plotted curves (days 1-1534) can be created.
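
As a rough illustration of what "modulated sine wave" test data could look like, here is a minimal sketch using only the standard library. The function name `modulated_sine`, all parameter values, and the way deaths are split into dvx and duvx are assumptions made for this example, not taken from the repository scripts.

```python
import math

N_DAYS = 1534  # days 1..1534, the range mentioned above


def modulated_sine(n_days, base, amplitude, period_days, mod_period_days, mod_depth):
    """Return a daily series: a seasonal sine wave whose amplitude is
    slowly modulated by a second, longer sine wave (illustrative values)."""
    series = []
    for day in range(1, n_days + 1):
        envelope = 1.0 + mod_depth * math.sin(2 * math.pi * day / mod_period_days)
        value = base + amplitude * envelope * math.sin(2 * math.pi * day / period_days)
        series.append(max(value, 0.0))  # deaths cannot be negative
    return series


# Hypothetical parameters: total deaths d with a yearly cycle.
d = modulated_sine(N_DAYS, base=40.0, amplitude=10.0, period_days=365.0,
                   mod_period_days=1000.0, mod_depth=0.3)

# Hypothetical vaccinated share that grows linearly and is capped at 80 %.
vx_share = [min(day / 700.0, 0.8) for day in range(1, N_DAYS + 1)]
dvx = [total * share for total, share in zip(d, vx_share)]
duvx = [total - vaxed for total, vaxed in zip(d, dvx)]
```

Because every curve is generated from known parameters, any back-calculation can be checked against the values that were put in.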

The [Python Scripts](https://github.com/gitfrid/CzechFOI-SIM/tree/main/Py%20Scripts) process and visualize CSV data from the [TERRA folder](https://github.com/gitfrid/CzechFOI-SIM/tree/main/TERRA), generating interactive HTML plots.
Each plot compares two age groups. To interact with the plots, click on a legend entry to show/hide curves.

**Refactored scripts AF) and AG)** compare age groups (AG), e.g., in 1-year intervals, by calculating differences between adjacent age groups. The differences are summed, and simulated dAEFIs are added to examine the curves with and without dAEFIs. Multiple age groups are plotted into a single HTML file for comparison.

Download the processed plots for analysis from the [Plot Results Folder](https://github.com/gitfrid/CzechFOI-SIM/tree/main/Plot%20Results/dAEFI). Or simply adapt and run the [Python script](https://github.com/gitfrid/CzechFOI-SIM/blob/main/Py%20Scripts/AB%29%20backcalc%20dAEFI%20simulation.py) to meet your own analysis requirements!

Dates are counted as the number of days since [January 1, 2020](https://github.com/gitfrid/CzechFOI-SIM/blob/main/Plot%20Results/Days%20to%20Date%20Translation%20Day%20Date%20Translation/Days%20to%20Date%20Translation%20Day%20Date%20Translation.png), for easier processing. "AGE_2023" represents age on January 1, 2023.
The data can optionally be normalized per 100,000 for comparison.
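
The day counting and the per-100,000 normalization can be sketched as follows. Whether January 1, 2020 is counted as day 0 or day 1 is not stated here, so the `first_day` parameter is an assumption; both helper names are hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(2020, 1, 1)  # reference date used for day counting


def day_to_date(day_number, first_day=0):
    """Convert a day count to a calendar date.

    first_day is the number assigned to January 1, 2020 (assumed 0 here;
    pass 1 if your data counts that date as day 1).
    """
    return EPOCH + timedelta(days=day_number - first_day)


def per_100k(count, population):
    """Normalize a raw count to a rate per 100,000 people."""
    return count * 100_000 / population if population else 0.0


print(day_to_date(366))        # 2021-01-01 with first_day=0 (2020 was a leap year)
print(per_100k(12, 480_000))   # 2.5
```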

Access the original Czech FOI data from a [Freedom of Information request](https://github.com/PalackyUniversity/uzis-data-analysis/blob/main/data/Vesely_106_202403141131.tar.xz). To learn how the Pivot CSV files in the TERRA folder were created, see the [wiki](https://github.com/gitfrid/CzechFOI-DA/wiki)


**Abbreviations:** The figures are per age group, taken from the CSV files in the TERRA folder:

| **Deaths**    | **Definition**                                             | **Population/Doses** | **Definition**                      |
|---------------|------------------------------------------------------------|----------------------|-------------------------------------|
| NUM_D         | Number of deaths                                           | NUM_POP              | Total number of people              |
| NUM_DUVX      | Number of unvaxed deaths                                   | NUM_UVX              | Number of unvaxed people            |
| NUM_DVX       | Number of vaxed deaths                                     | NUM_VX               | Number of vaxed people              |
| NUM_DVD1-DVD7 | Number of deaths, doses 1 - 7                              | NUM_VD1-VD7          | Number of vax doses 1 - 7           |
| NUM_DVDA      | Number of deaths from all doses                            | NUM_VDA              | Total number of all vax doses (sum) |
| dAEFI         | Simulated deaths as adverse events following immunization  |                      |                                     |

_________________________________________
**DoWhy Analysis**

**Python script [AI) dowhy diff all-agegrp-in-same-plot](https://github.com/gitfrid/CzechFOI-SIM/blob/main/Py%20Scripts/AI%29%20dowhy%20diff%20all-agegrp-in-same-plot.py)**
Uses the DoWhy library: https://github.com/py-why/dowhy


DoWhy is a Python library for causal inference that allows modeling and testing of causal assumptions, based on a unified language for causal inference.
See the book *Causality: Models, Reasoning, and Inference* by Judea Pearl for deeper insights, which go far beyond my horizon.




DoWhy Causal Impact estimates, showing the effect of changes in doses and deaths between age groups one year apart.


Blue crosses represent the mean points of two age groups (one year apart), showing the average differences in treatment dose (Doses_curve) and observed outcome (RAW D_Curve).


Red crosses represent the AEF D_Curve (RAW D_Curve with added dAEFIs), in this example, 1 dAEFI per 5000 doses given.


Causal effects are not yet shown in the plot.



Phase diagram of doses vs. deaths between age groups one year apart.




[Download html](https://github.com/gitfrid/CzechFOI-SIM/blob/main/Plot%20Results/AI%29%20dowhy%20diff%20all-agegrp-in-same-plot/AI%29%20dowhy%20diff%20all-agegrp-in-same-plot%20dAEFI%20causalimpact%20AG_15-85.html)


_________________________________________
**dAEFI simulation known Baseline.
One dAEFI per 5000 Doses RAND_DAY_RANGE 1-250 AVG_WND 14: AG_50-54**



**If the baseline is known (which is not the case in practice), the estimate is quite accurate, e.g., one dAEFI per 4408 doses estimated vs. one per 5000 simulated.**
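
The known-baseline case can be sketched in a few lines. This is a simplified illustration, not the repository's script: the constant baseline, the dose schedule, and the function names `add_daefi` and `doses_per_daefi` are assumptions. Only the rate (one per 5000 doses) and the 1-250 day window come from the setup described above.

```python
import random

RATE = 1 / 5000             # one simulated dAEFI per 5000 doses
RAND_DAY_RANGE = (1, 250)   # delay between dose and dAEFI, in days
N_DAYS = 1534


def add_daefi(deaths, doses, rate, day_range, rng):
    """Return a copy of `deaths` with simulated dAEFIs added.

    Each dose triggers a dAEFI with probability `rate`, placed on a random
    day within `day_range` after the dose; events past the end are dropped.
    """
    out = list(deaths)
    for day, n_doses in enumerate(doses):
        for _ in range(n_doses):
            if rng.random() < rate:
                target = day + rng.randint(*day_range)
                if target < len(out):
                    out[target] += 1
    return out


def doses_per_daefi(observed, baseline, doses):
    """Back-calculate doses per dAEFI from the excess over a known baseline."""
    excess = sum(o - b for o, b in zip(observed, baseline))
    return sum(doses) / excess if excess > 0 else float("inf")


rng = random.Random(42)
baseline = [40] * N_DAYS  # hypothetical known baseline (deaths per day)
# Hypothetical campaign: 2000 doses per day for 250 days (500,000 doses).
doses = [2000 if 400 <= day < 650 else 0 for day in range(N_DAYS)]
observed = add_daefi(baseline, doses, RATE, RAND_DAY_RANGE, rng)

estimate = doses_per_daefi(observed, baseline, doses)
print(f"estimated doses per dAEFI: {estimate:.0f} (simulated: 5000)")
```

In this sketch the estimate differs from 5000 only because of the random draw, since the baseline is subtracted exactly.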

_________________________________________
**dAEFI simulation known Baseline.
One dAEFI per 5000 Doses RAND_DAY_RANGE 1-250 AVG_WND 14: AG_75-79**



**Estimated: one dAEFI per 4179 doses vs. one per 5000 simulated.**
_________________________________________
**dAEFI simulation unknown Baseline (real world).
One dAEFI per 5000 Doses RAND_DAY_RANGE 1-250 AVG_WND 14: AG_50-54**



**If the baseline is unknown (which is the case in practice), the estimate is not reliable, e.g., one dAEFI per 136 doses estimated vs. one per 5000 simulated.**
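
One way to see why a moving-average baseline struggles is the sketch below. It uses a deliberately naive estimator, assumed here for illustration only and not necessarily the one in the repository: every positive deviation from a 14-day trailing moving average is counted as excess. The noise model, the dose total, and all names are hypothetical.

```python
import random

AVG_WND = 14            # moving-average window in days
N_DAYS = 1534
TOTAL_DOSES = 500_000   # hypothetical number of doses
TRUE_DAEFI = 100        # corresponds to one dAEFI per 5000 doses


def moving_average(series, window):
    """Trailing moving average; early days average over the days available."""
    out, running = [], 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            running -= series[i - window]
        out.append(running / min(i + 1, window))
    return out


def doses_per_daefi_unknown_baseline(observed, total_doses, window):
    """Naive estimator: positive deviations from the moving average count as excess."""
    baseline = moving_average(observed, window)
    excess = sum(max(o - b, 0.0) for o, b in zip(observed, baseline))
    return total_doses / excess if excess > 0 else float("inf")


rng = random.Random(1)
raw = [40 + rng.randint(-6, 6) for _ in range(N_DAYS)]  # noisy deaths, no dAEFIs
with_ae = list(raw)
for _ in range(TRUE_DAEFI):
    with_ae[rng.randint(400, 899)] += 1                 # add the simulated dAEFIs

est_raw = doses_per_daefi_unknown_baseline(raw, TOTAL_DOSES, AVG_WND)
est_ae = doses_per_daefi_unknown_baseline(with_ae, TOTAL_DOSES, AVG_WND)
print(round(est_raw), round(est_ae))
```

Under these assumptions the two estimates are close to each other and far from 5000: the moving average follows the observed curve, so the result is driven by day-to-day noise rather than by the added dAEFIs.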

_________________________________________
**dAEFI simulation unknown Baseline (real world).
One dAEFI per 5000 Doses RAND_DAY_RANGE 1-250 AVG_WND 14: AG_75-79**



**Estimated: one dAEFI per 39 doses vs. one per 5000 simulated.**
_________________________________________

**D, DVX, DUVX plots.
With added dAEFIs (1/5000 doses) vs. without added dAEFIs: AG_50-54 vs. 75-79**




**As you can see, the added dAEFIs have little impact on the top D-curves for age group 75-79, making it hard to detect a signal without knowing the baseline.
I struggled to find a reliable method to back-calculate the dAEFI rate using only the moving average as the baseline (real world). This is particularly true for the older age groups.**







_________________________________________

**dAEFI simulation known Baseline.
One dAEFI per 5000 Doses RAND_DAY_RANGE 1-250 AVG_WND 14: AG_54-59**



**Simulation of sine curves for D, DVX, and DUVX, adding one dAEFI per 5000 doses in a random 1-250 day window after each dose.**
Estimated: one dAEFI per 5138 doses vs. one per 5000 simulated, if the baseline is known.
_________________________________________

**dAEFI simulation unknown Baseline.
One dAEFI per 5000 Doses RAND_DAY_RANGE 1-250 AVG_WND 14: AG_54-59**

Curves with the legend label "pr" are the mortality curves for D, DUVX, and DVX, calculated under the assumption that the vx and uvx populations have the same hypothetical mortality probability (distribution); see the upper part of the first plot.

The lower part of the first plot shows the normalized mortality curves (deaths per 100,000 people, legend label "n"). Since the mortality probability for D, VX, and UVX is assumed to be identical, the normalized curves overlap.
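
The overlap of the normalized curves follows directly from the equal-probability assumption. The sketch below shows one possible reading of the "pr" and "n" curves, in which total deaths are split in proportion to the vx and uvx population sizes. The function names and the toy numbers are hypothetical.

```python
def split_by_population_share(deaths, n_vx, n_uvx):
    """Split total deaths into vx/uvx parts in proportion to population size,
    i.e. assuming both groups have the same mortality probability."""
    dvx_pr, duvx_pr = [], []
    for d, vx, uvx in zip(deaths, n_vx, n_uvx):
        pop = vx + uvx
        dvx_pr.append(d * vx / pop if pop else 0.0)
        duvx_pr.append(d * uvx / pop if pop else 0.0)
    return dvx_pr, duvx_pr


def normalize(deaths, population):
    """Deaths per 100,000 people, day by day (0.0 where the group is empty)."""
    return [d * 100_000 / p if p else 0.0 for d, p in zip(deaths, population)]


# Hypothetical toy data for three days.
deaths = [10, 12, 8]
n_vx = [20_000, 50_000, 80_000]
n_uvx = [80_000, 50_000, 20_000]
n_pop = [vx + uvx for vx, uvx in zip(n_vx, n_uvx)]

dvx_pr, duvx_pr = split_by_population_share(deaths, n_vx, n_uvx)
d_n = normalize(deaths, n_pop)
dvx_n = normalize(dvx_pr, n_vx)
duvx_n = normalize(duvx_pr, n_uvx)
```

Here `d_n`, `dvx_n`, and `duvx_n` agree on every day (up to floating-point rounding), which is the overlap described above.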


Additionally, dAEFIs are added at a rate of 1/5000 doses (legend label "ae").



Estimated: one dAEFI per 388 doses vs. one per 5000 simulated, if the baseline is unknown.
_________________________________________

**Python script [AC) calc dAEFI diff all-agegrp-in-same-plot](https://github.com/gitfrid/CzechFOI-SIM/blob/main/Py%20Scripts/AC%29%20calc%20dAEFI%20diff%20all-agegrp-in-same-plot.py)**

The script compares age groups in 1-year intervals.

The idea is that the two populations, which are one year apart, can be considered comparable.

It calculates the difference in normalized death rates (per 100,000 people) and takes a rolling average of this difference as the baseline. It also calculates the difference in normalized doses administered (per 100,000 doses). However, a reliable and accurate method for calculating estimated dAEFIs has not yet been found.
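
The difference-and-baseline steps described above can be sketched as follows. The toy numbers, the window of 3 days, and the helper names are assumptions; doses are normalized per 100,000 people of each age group here, which may differ from the script's own normalization.

```python
def per_100k(counts, population):
    """Normalize daily counts to a rate per 100,000 people."""
    return [c * 100_000 / population for c in counts]


def difference(a, b):
    """Element-wise difference a - b of two daily series."""
    return [x - y for x, y in zip(a, b)]


def rolling_mean(series, window):
    """Trailing rolling mean; early days average over the days available."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical toy data for two adjacent age groups (50 and 51 years).
deaths_50 = [2, 3, 2, 4, 3, 2]
deaths_51 = [3, 3, 4, 4, 2, 5]
doses_50 = [500, 400, 0, 0, 100, 0]
doses_51 = [750, 500, 0, 125, 0, 0]
pop_50, pop_51 = 100_000, 125_000

dif_d = difference(per_100k(deaths_51, pop_51), per_100k(deaths_50, pop_50))
dif_vda = difference(per_100k(doses_51, pop_51), per_100k(doses_50, pop_50))

baseline = rolling_mean(dif_d, window=3)   # rolling average used as baseline
residual = difference(dif_d, baseline)     # what is left after removing the baseline
```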

The script can also calculate rolling and phase-shift correlations.
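
For readers unfamiliar with these two measures, here is a minimal standard-library sketch of a rolling Pearson correlation and of a phase-shift (lagged) correlation. The function names, the window, and the synthetic sine data are assumptions for illustration; the repository scripts may compute these differently.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long series (nan if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def rolling_pearson(x, y, window):
    """Pearson correlation over a trailing window, one value per full window."""
    return [pearson(x[i - window + 1: i + 1], y[i - window + 1: i + 1])
            for i in range(window - 1, len(x))]


def shift_correlation(x, y, max_shift):
    """Correlation of x[t] with y[t + shift] for shift = 0..max_shift."""
    return {shift: pearson(x[:len(x) - shift], y[shift:])
            for shift in range(max_shift + 1)}


# Synthetic example: y is x delayed by 10 days.
N, DELAY = 300, 10
x = [math.sin(2 * math.pi * t / 60) for t in range(N)]
y = [math.sin(2 * math.pi * (t - DELAY) / 60) for t in range(N)]

rolling = rolling_pearson(x, y, window=30)
by_shift = shift_correlation(x, y, max_shift=30)
best_shift = max(by_shift, key=by_shift.get)
print(best_shift)  # 10
```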

The database and the CSV files in the TERRA folder were created with [All AG SQL Time.sql]().
_________________________________________

**DIF-VDA n all AgeGroups**

Shows the estimated mean dAEFI values for AG 1 to 113.




Shows the normalized DIF-VDA (All Doses difference for all AG)





_________________________________________

**DIF-VDA Baseline Mean estimate dAEFI n - Some examples of different AGs**

**For AG 13-14**


**For AG 14-15**


**For AG 17-18**




**For AG 42-43**




**For AG 71-72**




**For AG 81-82**




**For AG 107-108**



_________________________________________

**Refactored Scripts AF)**

The [Python script](https://github.com/gitfrid/CzechFOI-SIM/blob/main/Py%20Scripts/AF%29%20calc%20dAEFI%20diff%20norm%20all-agegrp-in-same-plot.py) calculates the differences in doses and deaths for similar age bands (one year apart), as specified in the `age_band_compare` list. It then sums the differences and adds dAEFIs (one per 5,000 doses). Additionally, it compares the rolling and phase-shift correlations of the raw D-curve with those of the D-curve that includes the added dAEFI events.

With one dAEFI per 5,000 doses, neither the D-curves nor the rolling Pearson correlation change significantly, so the rolling correlation is not useful for detecting dAEFIs in this context.
Although the amplitude of the phase-shift correlation changes significantly, this is not helpful since the baseline is unknown.





Zoomed in to highlight the minimal difference at 1/5,000 doses.







[Download html](https://github.com/gitfrid/CzechFOI-SIM/blob/main/Plot%20Results/AF%29%20calc%20dAEFI%20diff%20all-agegrp-in-same-plot/AF%29%20calc%20dAEFI%20diff%20norm%20all-agegrp-in-same-plot%20dAEFI%20AG_15-85.html)


_________________________________________

**Refactored Scripts AG)**

This [Python script](https://github.com/gitfrid/CzechFOI-SIM/blob/main/Plot%20Results/AG%29%20calc%20dAEFI%20diff%20all-agegrp-in-same-plot/AG%29%20calc%20dAEFI%20diff%20all-agegrp-in-same-plot.py) employs a different approach but produces results similar to those of the AF script.
It calculates the rolling Pearson correlation based on changes in cumulative doses, revealing a strong correlation.
However, this correlation is not relevant in the context of rare dAEFIs. Additionally, although the amplitude of the phase-shift correlation changes significantly, this information is not useful without a known baseline.







[Download html](https://github.com/gitfrid/CzechFOI-SIM/raw/main/Plot%20Results/AG%29%20calc%20dAEFI%20diff%20all-agegrp-in-same-plot/AG%29%20calc%20dAEFI%20diff%20all-agegrp-in-same-plot%20dAEFI%20AG_15-85.html)


_________________________________________

### Software Requirements:
- [Python 3.12.5](https://www.python.org/downloads/) to run the scripts.
- [Visual Studio Code 1.92.2](https://code.visualstudio.com/download) to edit and run scripts.
- [Optional - DB Browser for SQLite 3.13.0](https://sqlitebrowser.org/dl/) for database creation, SQL queries, and CSV export.

### Disclaimer:
**The results have not been checked for errors. Neither methodological nor technical checks or data cleansing have been performed.**