{"id":20074740,"url":"https://github.com/greenelab/linear_signal","last_synced_at":"2025-07-04T03:06:50.106Z","repository":{"id":37466252,"uuid":"267877185","full_name":"greenelab/linear_signal","owner":"greenelab","description":"Comparing the performance of linear and nonlinear models in transcriptomic prediction","archived":false,"fork":false,"pushed_at":"2023-01-30T23:10:06.000Z","size":70134,"stargazers_count":5,"open_issues_count":2,"forks_count":3,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-05-23T00:26:40.422Z","etag":null,"topics":["analysis","deep-learning","gene-expression","machine-learning","tool"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/greenelab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-29T14:23:49.000Z","updated_at":"2024-03-12T12:46:16.000Z","dependencies_parsed_at":"2024-11-13T14:54:20.926Z","dependency_job_id":"c7c2ba5b-266d-4603-b0f2-02d71c57a269","html_url":"https://github.com/greenelab/linear_signal","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/greenelab/linear_signal","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Flinear_signal","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Flinear_signal/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Flinear_signal/releases","manifests_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Flinear_signal/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/greenelab","download_url":"https://codeload.github.com/greenelab/linear_signal/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Flinear_signal/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263437345,"owners_count":23466368,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analysis","deep-learning","gene-expression","machine-learning","tool"],"created_at":"2024-11-13T14:54:07.768Z","updated_at":"2025-07-04T03:06:50.084Z","avatar_url":"https://github.com/greenelab.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Linear and Nonlinear Signals\nThis repo contains the code to reproduce the results of the manuscript \"The Effects of Nonlinear Signal on Expression-Based Prediction Performance\".\nIn short, we compare linear and nonlinear models in multiple prediction tasks, and find that their predictive ability is roughly equivalent.\nFurther, this similarity is despite the fact that predictive nonlinear signal exists in the data for each of the tasks.\n\n![model comparison figure](https://raw.githubusercontent.com/greenelab/linear_signal/master/figures/full_signal_combined.svg)\n\n## Installation\n\n### Python dependencies\nThe Python dependencies for this project are managed via [Conda](https://docs.conda.io/en/latest/miniconda.html).\nTo install them and activate the environment, 
use the following commands in bash:\n\n``` bash\nconda env create --file environment.yml\nconda activate linear_models\n```\n\n### R setup\nThe R dependencies for this project are managed via [Renv](https://rstudio.github.io/renv/articles/renv.html).\nTo set up Renv for the repository, use the commands below within R while working in the `linear_signals` repo:\n\n``` R\ninstall.packages('renv')\nrenv::init()\nrenv::restore()\n```\n\n### Sex prediction setup\nBefore running scripts involving sex prediction, you need to download the Flynn et al. labels from [this link](https://figshare.com/s/985621c1705043421962) and put the downloaded files in the `saged/data` directory.\nBecause of the settings on the figshare repo, it isn't possible to incorporate that part of the data download into the Snakefile; otherwise I would.\n\n\n### Neptune setup\nIf you want to log training results, you will need to sign up for a free Neptune account [here](https://neptune.ai/).\n1. The neptune module is already installed as part of the saged conda environment, but you'll need to grab an API token from the website.\n2. Create a Neptune project for storing your logs.\n3. Store the token in secrets.yml in the format `neptune_api_token: \"\u003cyour_token\u003e\"`, and update the `neptune_config` file to use your info.\n\n## Reproducing results\nThe pipeline that downloads all the data and produces all the results shown in the manuscript is managed by Snakemake.\nTo reproduce all results files and figures, run\n``` bash\nsnakemake -j \u003cNUM_CORES\u003e\n```\n\nSuccessfully running the full pipeline takes a few months on a single machine.\nFor reference, my machine has 64 GB of RAM, an AMD Ryzen 7 3800xt processor, and an NVIDIA 3090 GPU.\nYou can get by with less RAM, less VRAM, and fewer processor cores by reducing the degree of parallelism. 
\nI imagine the analyses can comfortably fit on a machine with 32 GB of RAM and a ~1080ti GPU, but I haven't tested the pipeline in such an environment.\n\nIf you want to speed up the process and see similar results, you can run the pipeline without hyperparameter optimization with\n\n``` bash\nsnakemake -s no_hyperopt_snakefile -j \u003cNUM_CORES\u003e\n```\n\nIf you are going to be running the pipeline in a cluster environment, it may be helpful to read through the file `slurm_snakefile`.\n[This blog post](https://bluegenes.github.io/snakemake-via-slurm/) might also be helpful.\n\n\n### Intermediate steps\nWhen running the full pipeline via Snakemake, the required data will be downloaded automatically (excluding the sex prediction labels described in the Sex prediction setup section above).\nIf you'd like to skip the data download (and in doing so save yourself about a week of downloading and processing things), you can rehydrate [this Zenodo archive](https://zenodo.org/record/6711450) into the `data/` dir.\n\nLikewise, if you'd like to download the results files, they can be found [here](https://zenodo.org/record/6685655).\nIf you only need the saved models, they can be found [here](https://zenodo.org/record/6703144).\n\n\n## Directory Layout\n|File/dir|Description|\n|--------|-----------|\n|Snakefile | Contains the rules Snakemake uses to run the full project |\n|environment.yml | Lists the Python dependencies and their versions in a format readable by Conda |\n|neptune.yml | Lists information for Neptune logging |\n|secrets.yml | Stores the Neptune API token (see Neptune setup section) |\n|||\n|data/| Stores the raw and intermediate data files used for training models |\n|dataset_configs/| Stores config information telling Dataset objects how to construct themselves |\n|figures/| Contains images visualizing the results of the various analyses |\n|logs/| Holds serialized versions of trained models |\n|model_configs/| Stores config information for models such as default 
hyperparameters |\n|notebook/| Stores notebooks used for visualizing results |\n|results/| Records the accuracies of the models on various tasks |\n|src/| The source code used to run the analyses |\n|test/| Tests for the source code (runnable with pytest) |\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreenelab%2Flinear_signal","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgreenelab%2Flinear_signal","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreenelab%2Flinear_signal/lists"}