{"id":15651659,"url":"https://github.com/tlatkowski/deep-learning-gene-expression","last_synced_at":"2025-04-30T18:01:51.609Z","repository":{"id":53532163,"uuid":"105190573","full_name":"tlatkowski/deep-learning-gene-expression","owner":"tlatkowski","description":"Deep learning methods for feature selection in gene expression autism data.","archived":false,"fork":false,"pushed_at":"2022-06-21T21:16:54.000Z","size":44426,"stargazers_count":36,"open_issues_count":2,"forks_count":17,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-30T18:34:35.142Z","etag":null,"topics":["annotated-genes","autism","autism-data","data-mining","deep-learning","feature-detection","feature-extraction","feature-selection","features-extraction","fisher-score","gene-annotation","gene-expression","gene-expression-profiles","google-colab","microarray-data","ncbi-database","neural-network","numpy","python3","ttest"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tlatkowski.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-09-28T19:26:19.000Z","updated_at":"2025-02-16T13:04:39.000Z","dependencies_parsed_at":"2022-09-18T09:12:45.573Z","dependency_job_id":null,"html_url":"https://github.com/tlatkowski/deep-learning-gene-expression","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlatkowski%2Fdeep-learning-gene-expression","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlatkowski%2Fdeep-learning-gene-expression
/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlatkowski%2Fdeep-learning-gene-expression/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlatkowski%2Fdeep-learning-gene-expression/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tlatkowski","download_url":"https://codeload.github.com/tlatkowski/deep-learning-gene-expression/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251758162,"owners_count":21638988,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["annotated-genes","autism","autism-data","data-mining","deep-learning","feature-detection","feature-extraction","feature-selection","features-extraction","fisher-score","gene-annotation","gene-expression","gene-expression-profiles","google-colab","microarray-data","ncbi-database","neural-network","numpy","python3","ttest"],"created_at":"2024-10-03T12:39:32.089Z","updated_at":"2025-04-30T18:01:51.543Z","avatar_url":"https://github.com/tlatkowski.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"![](https://img.shields.io/badge/Python-3.6-blue.svg) ![](https://img.shields.io/badge/NumPy-1.14.2-blue.svg) ![](https://img.shields.io/badge/License-MIT-blue.svg)\n\n# Deep learning methods for gene expression\nDeep learning methods for feature selection in gene expression autism data.\n# Description\nThis project implements several feature selection algorithms intended for finding the most significant subset of genes 
and gene sequences stored in a gene expression microarray dataset.\n\nThe current version of the project provides the following feature selection algorithms:\n* Fisher discriminant analysis\n* two-sample t-test\n* feature correlation with a class\n  \nMore implementation details of the above methods can be found here:\n\n[Data mining for feature selection in gene expression autism data](http://www.sciencedirect.com/science/article/pii/S0957417414005259)\n\n[Feature selection methods in application to gene expression: autism data](http://www.pe.org.pl/articles/2014/8/47.pdf)\n\nThe outcome of the feature selection stage is consumed by a fully connected feedforward neural network. The following hyperparameters can be configured in this neural network:\n* number of layers,\n* number of hidden units in each layer,\n* activation function: sigmoid, tanh or ReLU,\n* L2 regularization parameter (lambda),\n* batch size,\n* number of epochs.\n\n# Model Flow\nThe diagram below depicts the training and testing procedures:\n\n![](pics/model_flow.png)\n\n# Dataset\n\nThe dataset is publicly available and was downloaded from the [GEO (NCBI) repository](https://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS4431). The data file in this repository was cleaned up and contains only raw data with annotated genes and gene sequence annotations.\n\n## Dataset details\nThe dataset contains 146 observations and 54613 genes. It consists of two classes: children with autism (n=82) and control (healthy) children (n=64). Blood draws for all subjects were done between the spring and summer of 2004. Total RNA was extracted for microarray experiments with Affymetrix Human U133 Plus 2.0 3' Expression Arrays. 
\n\n\n## Run the pipeline locally\n\n### Installation (Ubuntu)\n\nTo install all requirements, run the install script. If needed, first add the 'execute' permission to *install.sh*:\n```bash\nchmod a+x bin/install.sh\n```\n\nThen run the script:\n```bash\n./bin/install.sh\n```\n\nNext, activate the virtual environment (if needed):\n```bash\nsource .venv/bin/activate\n```\nTo run the pipeline, execute:\n```bash\npython pipeline.py\n```\n\n## Run the pipeline on Google Colab\nTo run the pipeline on Google Colab, use the following notebook:\n[Deep Learning Gene Expression in Google Colab](https://github.com/tlatkowski/deep-learning-gene-expression/blob/master/colab/deep_learning_feature_selection.ipynb)\n\n## Pipeline configuration\n\nThe pipeline lets you tweak the training parameters. To modify them, edit the configuration file\nplaced in `./config/experiment_setup.yml`. The default configuration is shown below:\n\n```yaml\nselection_methods:\n  - method: fisher\n    num_features: 100\n  - method: ttest\n    num_features: 100\n  - method: corr\n    num_features: 100\n  - method: random\n    num_features: 100\nhyperparameters:\n  learning_rate: 0.001\n  input_size: 100\n  hidden_sizes: [80]\n  output_size: 1\n  num_features: 100\n  activation_function: 'tanh'\n  lambda_reg: 0.8\n  norm_data: True\n  data_file: 'data/data.tsv'\ntraining:\n  num_epochs: 10000\n  cross_validation_folds: 10\n  batch_size: 20  # online learning when batch_size=1\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftlatkowski%2Fdeep-learning-gene-expression","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftlatkowski%2Fdeep-learning-gene-expression","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftlatkowski%2Fdeep-learning-gene-expression/lists"}