{"id":44964475,"url":"https://github.com/bioinfomachinelearning/mintomics","last_synced_at":"2026-02-18T14:09:29.253Z","repository":{"id":296783049,"uuid":"704731828","full_name":"BioinfoMachineLearning/mintomics","owner":"BioinfoMachineLearning","description":null,"archived":false,"fork":false,"pushed_at":"2025-06-02T01:33:33.000Z","size":40948,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-09T16:34:19.049Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BioinfoMachineLearning.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-10-13T23:48:29.000Z","updated_at":"2025-06-02T01:33:36.000Z","dependencies_parsed_at":null,"dependency_job_id":"86f08dba-a344-497c-81af-8c1b287cda49","html_url":"https://github.com/BioinfoMachineLearning/mintomics","commit_stats":null,"previous_names":["bioinfomachinelearning/mintomics"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/BioinfoMachineLearning/mintomics","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioinfoMachineLearning%2Fmintomics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioinfoMachineLearning%2Fmintomics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioinfoMachineLearning%2Fmintomics/releases","manif
ests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioinfoMachineLearning%2Fmintomics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BioinfoMachineLearning","download_url":"https://codeload.github.com/BioinfoMachineLearning/mintomics/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioinfoMachineLearning%2Fmintomics/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29581621,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-18T13:56:48.962Z","status":"ssl_error","status_checked_at":"2026-02-18T13:54:34.145Z","response_time":162,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-18T14:09:28.542Z","updated_at":"2026-02-18T14:09:29.245Z","avatar_url":"https://github.com/BioinfoMachineLearning.png","language":"Python","readme":"# Mintomics - Integration of Transcriptomics to Proteomics using Transformer\n\n![Workflow](F4.large.jpg)\n\n## Overview\n\n**mintomics** is a multi-omics analysis pipeline designed to integrate transcriptomic and proteomic data using a transformer-based model. 
The objective is to elucidate the adaptive characteristics of the oviduct during natural fertilization, as described in [Finnerty et al., eLife 2025](https://elifesciences.org/articles/100705).\n\n## Methodology\n\n### Biological Context\nThe oviduct (fallopian tube) is the site of fertilization and preimplantation embryo development in mammals. This project investigates how the presence of gametes and embryos modulates oviductal gene and protein expression, using multi-omics data and advanced machine learning.\n\n### Multi-Omics Integration\n- **Transcriptomics**: Bulk RNA-seq data from mouse oviductal tissues at various preimplantation stages.\n- **Proteomics**: Protein abundance data from oviductal fluid, comparing natural fertilization and superovulation.\n- **Machine Learning**: A transformer-based model integrates transcriptomic and proteomic data to predict protein abundance from gene expression and identify key transcription factors.\n\nFor more details, see the [eLife article](https://elifesciences.org/articles/100705).\n\n## Data Sampling \u0026 Structure\n\n### Data Sources\n- **Gene Expression**: Processed CPM (counts per million) data for different stages:\n  - `Data/Data_cpm/Data_control.csv`\n  - `Data/Data_cpm/Data_0_5preg.csv`\n  - `Data/Data_cpm/Data_1_5preg.csv`\n  - `Data/Data_cpm/Data_2_5preg.csv`\n- **Protein Labels**: Normalized protein abundance labels:\n  - `Data/Labels_proc_log10_minmax/Labels_control.csv`\n  - `Data/Labels_proc_log10_minmax/Labels_0_5preg.csv`\n  - `Data/Labels_proc_log10_minmax/Labels_1_5preg.csv`\n  - `Data/Labels_proc_log10_minmax/Labels_2_5preg.csv`\n- **Transcription Factors**: List in `Mouse_TFs1`\n- **Differential Data**: Differentially expressed genes and proteins in `Data/Diff_data/` and `Data/Diff_labels/`\n\n### Data Preparation\n- Raw data is filtered, normalized (CPM, log, min-max), and split by stage.\n- Transcription factors are annotated in the gene list.\n- Data loaders sample gene-protein pairs for model 
training/testing.\n\n## Pipeline Usage\n\n### Requirements\n- Python 3.8+\n- PyTorch, PyTorch Lightning, pandas, numpy, seaborn, matplotlib, wandb, mlxtend, rnanorm, scipy, torchmetrics\n\nInstall dependencies (example):\n```bash\npip install torch pytorch-lightning pandas numpy seaborn matplotlib wandb mlxtend rnanorm scipy torchmetrics\n```\n\n### Training the Model\nRun the following command to train the transformer-based model:\n```bash\npython Training.py --num_gpus 1 --nodes 1 --num_epochs 100 --batch_size 8 --save_dir tempo\n```\n- Training and validation data are automatically loaded from the `Dataset/` directory.\n- Model checkpoints and logs are saved in `Trainings/`.\n\n### Inference\nTo run inference on test data using a trained checkpoint:\n```bash\npython Inference.py --save_dir tempo --chkpt \u003ccheckpoint_file.ckpt\u003e\n```\n- Replace `\u003ccheckpoint_file.ckpt\u003e` with the actual checkpoint filename from `Trainings/tempo/`.\n\n### Data Preprocessing\nData preprocessing scripts are provided (see `Data_preprocessing.py`).\n- Generates normalized gene and protein data for each stage.\n- Example: `python Data_preprocessing.py`\n\n## Output \u0026 Analysis\n- Model outputs include predicted protein abundances and attention scores for gene-protein relationships.\n- Result analysis scripts (e.g., `Result_analysis.py`) help interpret key transcription factors and proteins.\n- Visualizations and logs are available via Weights \u0026 Biases (wandb).\n\n## Reference\n- Finnerty RM, Carulli DJ, Hegde A, et al. (2025). Multi-omics analyses and machine learning prediction of oviductal responses in the presence of gametes and embryos. _eLife_ 13:RP100705. 
[https://elifesciences.org/articles/100705](https://elifesciences.org/articles/100705)\n\n## License\nThe code in this repository is released under the MIT License (see `LICENSE`). The referenced publication is distributed under the terms of the Creative Commons Attribution License.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbioinfomachinelearning%2Fmintomics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbioinfomachinelearning%2Fmintomics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbioinfomachinelearning%2Fmintomics/lists"}