{"id":19261105,"url":"https://github.com/abhi18av/ljmu_masters_dissertation","last_synced_at":"2026-05-07T04:33:38.943Z","repository":{"id":113603085,"uuid":"262081640","full_name":"abhi18av/LJMU_Masters_Dissertation","owner":"abhi18av","description":null,"archived":false,"fork":false,"pushed_at":"2020-11-28T16:48:01.000Z","size":20549,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-05T10:08:10.530Z","etag":null,"topics":["drug-resistance-prediction","h2oai","machine-learning","python","tuberculosis"],"latest_commit_sha":null,"homepage":"  https://zenodo.org/badge/latestdoi/262081640","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/abhi18av.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-07T15:01:45.000Z","updated_at":"2022-03-31T18:16:03.000Z","dependencies_parsed_at":null,"dependency_job_id":"fde227c8-dd4f-4aee-9e25-4899e4695c19","html_url":"https://github.com/abhi18av/LJMU_Masters_Dissertation","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhi18av%2FLJMU_Masters_Dissertation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhi18av%2FLJMU_Masters_Dissertation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhi18av%2FLJMU_Masters_Dissertation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/abhi18av%2FLJMU_Masters_Dissertation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/abhi18av","download_url":"https://codeload.github.com/abhi18av/LJMU_Masters_Dissertation/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240358153,"owners_count":19788843,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["drug-resistance-prediction","h2oai","machine-learning","python","tuberculosis"],"created_at":"2024-11-09T19:24:40.690Z","updated_at":"2025-10-19T02:15:03.786Z","avatar_url":"https://github.com/abhi18av.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"Application of ML for DRP using WGS data on MTB genomes.\n==============================\n\nThis repository contains the code for my master's dissertation.\n\n[![DOI](https://zenodo.org/badge/262081640.svg)](https://zenodo.org/badge/latestdoi/262081640)\n\n\n\n\nTo execute the code, the following execution environments are recommended.\n\n1. AWS/Azure Batch for genomic pre-processing.\n\n2. 
Azure ML Studio for notebooks, with a decent server.\n\n\nThe rest of the instructions are embedded within the `notebooks/FINAL/*ipynb` notebooks.\n\n\nProject Organization\n------------\n\n    ├── LICENSE\n    ├── README.md\n    │\n    ├── conda_enviroment.yml \u003c- The minimal conda file needed to recreate the environment.\n    ├── azure_enviroment.yml \u003c- The conda file for the Azure ML Studio.\n    │\n    ├── data\n    │   ├── interim        \u003c- Intermediate data that has been transformed.\n    │   ├── processed      \u003c- The final, canonical data sets for modeling.\n    │   └── raw            \u003c- The original, immutable data dump.\n    │\n    ├── models             \n    │      ├── ALL_FEATURES   \u003c- Models trained on all features.\n    │      │      ├── FINAL   \n    │      │\n    │      └── PCA300         \u003c- Models trained on PCA300 features.\n    │\n    ├── notebooks          \n    │   ├── FINAL          \u003c- The final Jupyter notebooks, named as per their execution order.\n    │      └── 001_feature_engineering.ipynb\n    │      └── 002_choose_limited_tbportals_genomes.ipynb \u003c- Contains the SRA IDs of the genomes, which can be downloaded through download.nf\n    │      └── 003_eda_mono_resistance.ipynb\n    │      └── 004_model_grids.ipynb\n    │      └── 005_stacked_ensemble.ipynb\n    │      └── 006_pca_based_ml.ipynb\n    │      └── 007_model_inspection_with_without_pca.ipynb\n    │\n    ├── src                \n    │   ├── genomic_preprocessing           \u003c- Scripts for genomic pre-processing\n    │      └── nyu_gatk.sh\n    │      └── download.nf\n    │      └── bwa.nf\n    │      └── fastqc.nf\n    │      └── gatk.nf\n    │      └── picard.nf\n    │      └── samtools.nf\n    │      └── tb_profiler.nf\n    │      └── trimmomatic.nf\n    │   \n    │   \n    │   ├── features       \u003c- Scripts to turn raw VCF data into tabular data for modeling\n    │       └── 01_tbprofiler.py\n    │       └── 02_vcf_drop_cols.py\n    │  
     └── 03_filter_unique_snps.py\n    │       └── 04_binarize_vcf.py\n    │       └── 05_final_snp_df.py\n\n--------\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabhi18av%2Fljmu_masters_dissertation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabhi18av%2Fljmu_masters_dissertation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabhi18av%2Fljmu_masters_dissertation/lists"}