{"id":17154248,"url":"https://github.com/jaydu1/gcate","last_synced_at":"2025-03-24T13:24:34.539Z","repository":{"id":216130310,"uuid":"627412890","full_name":"jaydu1/gcate","owner":"jaydu1","description":"Generalized confounder adjustment for testing and estimation","archived":false,"fork":false,"pushed_at":"2025-03-15T11:26:20.000Z","size":4820,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-15T12:25:14.473Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jaydu1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-13T12:18:11.000Z","updated_at":"2025-03-15T11:26:25.000Z","dependencies_parsed_at":"2024-03-13T13:54:43.194Z","dependency_job_id":"d19e685b-9199-422f-b4dc-898b902ad0e0","html_url":"https://github.com/jaydu1/gcate","commit_stats":null,"previous_names":["jaydu1/gcate"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaydu1%2Fgcate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaydu1%2Fgcate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaydu1%2Fgcate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaydu1%2Fgcate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jaydu1","download_url":"https://codeload.github.com/jaydu1/gcate/tar.gz/re
fs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245277139,"owners_count":20589102,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-14T21:48:43.335Z","updated_at":"2025-03-24T13:24:34.515Z","avatar_url":"https://github.com/jaydu1.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Generalized confounder adjustment for testing and estimation (GCATE)\n\nThis repository contains the code for reproducing simulation and real data analysis results of the paper \"Simultaneous inference for generalized linear models with unmeasured confounders\".\n\n\n## Files\n\n\n### Python module\n\n- gcate: The main module for GCATE.\n\n### Scripts\n\n- ex1: Simulation with Poisson DGP with sample splitting\n    - `ex1_generate_data.py`: Generate simulated data.\n    - `ex1_run_gcate.py`: Run GCATE.\n- ex2: Simulation with Poisson DGP without sample splitting\n    - `ex2_generate_data.py`: Generate simulated data.\n    - `ex2_run_gcate.py`: Run GCATE.\n    - `ex2_run_glm.py`: Run GLM oracle and GLM naive.\n    - `ex2_run_cate.R`: Run CATE.\n- ex3: Simulation with Splatter simulator\n    - `ex3_generate_data.py`: Generate simulated data.\n    - `ex3_run_gcate.py`: Run GCATE.\n    - `ex3_run_glm.py`: Run GLM naive.\n    - `ex3_run_cate.R`: Run CATE.\n- ex4: Lupus data\n    - `ex4_preprocess_lupus.py`: Preprocess the lupus data.\n    - `ex4_run_glm.py`: Run GLM on the subset and the full set of covariates.\n    - `ex4_run_gcate.py`: Run GCATE on the subset of covariates.\n    - 
`ex4_run_gcate_full.py`: Run GCATE on the full set of covariates.\n    - `ex4_run_cate.R`: Run CATE on the subset of covariates.\n    - `ex4_run_cate_full.R`: Run CATE on the full set of covariates.\n    - `ex4_GO.R`: Gene ontology analysis.\n- ex5: Simulation with varying dimensions\n    - `ex5_blessing_dim.py`: Run GCATE on varying dimensions.\n\n### Jupyter notebooks\n- `Plot_simu.ipynb`: Reproduce the figures and tables for simulation studies.\n- `Plot_lupus.ipynb`: Reproduce the figures and tables for the lupus data analysis.\n\n\n## Requirements\n\nThe following packages are required for the reproducibility workflow.\n\n\n### Python packages\n\nPackage | Version\n---|---\nanndata | 0.9.2 \ncvxpy | 1.1.18 \nh5py | 3.1.0 \njoblib | 1.1.0 \njupyter | 1.0.0\nmatplotlib | 3.4.3\nnumba | 0.54.1 \nnumpy | 1.22.0 \npandas | 1.3.3 \npython | 3.8.12\nscanpy | 1.9.3 \nscikit-learn | 1.1.2 \nscipy | 1.10.1 \nseaborn | 0.13.0\nstatsmodels | 0.13.5 \ntqdm | 4.62.3\n\n### R packages\n\nPackage | Version\n---|---\nAnnotationDbi | 1.56.2\ncate | 1.1.1 \nclusterProfiler | 4.2.2\norg.Hs.eg.db | 3.14.0\nqvalue | 2.26 \nR | 3.8.2\nreticulate | 1.31 \nrrvgo | 1.6.0\ntidyverse | 1.3.1\n\n\n\n## Reproducibility workflow\n\nFor simulation studies, the workflow is as follows:\n\n- Run script `ex1_generate_data.py` to generate simulated data, which will be stored in the folder `/data/ex1/`. 
The data for the second and the third experiments can be similarly generated by running `ex2_generate_data.py` and `ex3_generate_data.py`, respectively.\n- Run scripts of individual methods for each experiment as described below, and the results will be stored in the folder `result/`:\n    - Ex1: `ex1_run_gcate.py`\n    - Ex2: `ex2_run_glm.py`, `ex2_run_gcate.py`, `ex2_run_cate.R`\n    - Ex3: `ex3_run_glm.py`, `ex3_run_gcate.py`, `ex3_run_cate.R`\n- For experiments on varying dimensions, run `ex5_blessing_dim.py`.\n- Use `Plot_simu.ipynb` to reproduce the figures (Figures 2-6, F1-F2, and G3) and table (Table G2) based on the previous results.\n\n\nFor real data analysis, the workflow is as follows:\n\n- Obtain the h5ad file of the [lupus data](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE174188) from the authors of the original paper and store it in the folder `data/lupus/GSE174188_CLUES1_adjusted.h5ad`.\n- Run `ex4_preprocess_lupus.py` to preprocess the lupus data.\n- Run scripts of individual methods (`ex4_run_glm.py`, `ex4_run_gcate.py`, `ex4_run_gcate_full.py`, `ex4_run_cate.R`, `ex4_run_cate_full.R`), and the results will be stored in the folder `result/lupus/`.\n- Use `Plot_lupus.ipynb` to reproduce the figures (Figures 6, G4-G9, and G11) and tables (Tables G3-G4) based on the previous results.\n- Run `ex4_GO.R` to perform gene ontology analysis (Figure G10).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaydu1%2Fgcate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjaydu1%2Fgcate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaydu1%2Fgcate/lists"}