{"id":25167172,"url":"https://github.com/biocomputingup/caid","last_synced_at":"2025-04-30T22:01:44.817Z","repository":{"id":52147794,"uuid":"288403234","full_name":"BioComputingUP/CAID","owner":"BioComputingUP","description":"Critical Assessment of Intrinsic Disorder","archived":false,"fork":false,"pushed_at":"2022-11-28T14:32:34.000Z","size":117720,"stargazers_count":13,"open_issues_count":0,"forks_count":1,"subscribers_count":6,"default_branch":"vectorized","last_synced_at":"2025-03-30T20:22:35.522Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BioComputingUP.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-08-18T08:47:44.000Z","updated_at":"2024-11-02T23:03:16.000Z","dependencies_parsed_at":"2023-01-23T13:45:34.643Z","dependency_job_id":null,"html_url":"https://github.com/BioComputingUP/CAID","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioComputingUP%2FCAID","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioComputingUP%2FCAID/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioComputingUP%2FCAID/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioComputingUP%2FCAID/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BioComputingUP","download_url":"https://codeload.github.com/BioComputingUP/CAID/tar.gz/refs/heads/vectorized","host":{"name":"GitHub","url":"https://github.com","kind":"github",
"repositories_count":251789604,"owners_count":21644084,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-09T06:19:48.028Z","updated_at":"2025-04-30T22:01:44.744Z","avatar_url":"https://github.com/BioComputingUP.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Introduction\nThe CAID software produces all the outputs needed for a CAID edition, including baselines, references, metrics and plots, \nstarting from predictions and a reference (see the Data section for how to obtain this data). \nThe CAID software package wraps the vectorized_cls_metrics repository (https://github.com/marnec/vectorized_cls_metrics), \nwhich computes the classification metrics used throughout CAID. More information is available at \nhttp://disprotcentral.org/caid\n\n\n## Index\n* Requirements\n* Installation\n* Demo\n* Data\n* Usage\n* License\n\n## Requirements\n### Interpreter\nPython 3.6+\n\n### Dependencies\n* `vectorized_metrics` https://github.com/marnec/vectorized_cls_metrics\n* `numpy`\n* `matplotlib`\n* `seaborn`\n* `scipy`\n* `pandas`\n\n## Installation\nInstallation is only possible on Unix systems and typically takes about 1 minute. To install the package, follow these steps:\n\n1. Clone or download the package from its GitHub repository:\n\n```\ngit clone https://github.com/BioComputingUP/CAID.git\n```\n\n2. CAID relies on the `vectorized_cls_metrics` library. Clone or download it from its GitHub repository:\n\n```\ngit clone https://github.com/marnec/vectorized_cls_metrics\n```\n\n3. 
Add the `vectorized_cls_metrics` library to the PYTHONPATH environment variable:\n\n```\nexport PYTHONPATH=\"${PYTHONPATH}:/path/where/the/library/was/cloned\"\n```\n\nInstallation is now complete. To copy-paste the commands in this README without customizing paths,\nplace the CAID package in the following folder structure:\n\n```\nCAID-root\n├── baseline\n├── caid                --\u003e (CAID repository)\n├── data\n│   ├── annotations\n│   ├── predictions\n│   │   ├── binding\n│   │   └── disorder\n│   └── references\n│       ├── binding\n│       └── disorder\n├── plots\n└── results\n```\n\n## Demo\nOnce installed, you can check that everything works by running `./demo.sh`. \nIn about one and a half minutes it should produce the following files in the `caid/demo/demo_output` folder:\n\n```\nD018_ESpritz-D.rawscores.distribution.txt\nD018_ESpritz-D.thresholds.distribution.txt\nD019_ESpritz-N.rawscores.distribution.txt\nD019_ESpritz-N.thresholds.distribution.txt\nD020_ESpritz-X.rawscores.distribution.txt\nD020_ESpritz-X.thresholds.distribution.txt\ndemo-reference.analysis.all.bootstrap.bac.metrics.csv\ndemo-reference.analysis.all.bootstrap.csi.metrics.csv\ndemo-reference.analysis.all.bootstrap.default.metrics.csv\ndemo-reference.analysis.all.bootstrap.f1s.metrics.csv\ndemo-reference.analysis.all.bootstrap.f2s.metrics.csv\ndemo-reference.analysis.all.bootstrap.f05.metrics.csv\ndemo-reference.analysis.all.bootstrap.fnr.metrics.csv\ndemo-reference.analysis.all.bootstrap.fom.metrics.csv\ndemo-reference.analysis.all.bootstrap.fpr.metrics.csv\ndemo-reference.analysis.all.bootstrap.inf.metrics.csv\ndemo-reference.analysis.all.bootstrap.mcc.metrics.csv\ndemo-reference.analysis.all.bootstrap.mk.metrics.csv\ndemo-reference.analysis.all.bootstrap.npv.metrics.csv\ndemo-reference.analysis.all.bootstrap.ppv.metrics.csv\ndemo-reference.analysis.all.bootstrap.tnr.metrics.csv\ndemo-reference.analysis.all.bootstrap.tpr.metrics.csv\ndemo-reference.analysis.all.ci.bac.metrics.csv\ndemo-reference.analysis.all.ci.csi.metrics.csv\ndemo-reference.analysis.all.ci.default.metrics.csv\ndemo-reference.analysis.all.ci.f1s.metrics.csv\ndemo-reference.analysis.all.ci.f2s.metrics.csv\ndemo-reference.analysis.all.ci.f05.metrics.csv\ndemo-reference.analysis.all.ci.fnr.metrics.csv\ndemo-reference.analysis.all.ci.fom.metrics.csv\ndemo-reference.analysis.all.ci.fpr.metrics.csv\ndemo-reference.analysis.all.ci.inf.metrics.csv\ndemo-reference.analysis.all.ci.mcc.metrics.csv\ndemo-reference.analysis.all.ci.mk.metrics.csv\ndemo-reference.analysis.all.ci.npv.metrics.csv\ndemo-reference.analysis.all.ci.ppv.metrics.csv\ndemo-reference.analysis.all.ci.tnr.metrics.csv\ndemo-reference.analysis.all.ci.tpr.metrics.csv\ndemo-reference.analysis.all.dataset._.cmat.csv\ndemo-reference.analysis.all.dataset._.pr.csv\ndemo-reference.analysis.all.dataset._.predictions.csv\ndemo-reference.analysis.all.dataset._.roc.csv\ndemo-reference.analysis.all.dataset.bac.cmat.csv\ndemo-reference.analysis.all.dataset.bac.metrics.csv\ndemo-reference.analysis.all.dataset.csi.cmat.csv\ndemo-reference.analysis.all.dataset.csi.metrics.csv\ndemo-reference.analysis.all.dataset.default.cmat.csv\ndemo-reference.analysis.all.dataset.default.metrics.csv\ndemo-reference.analysis.all.dataset.f1s.cmat.csv\ndemo-reference.analysis.all.dataset.f1s.metrics.csv\ndemo-reference.analysis.all.dataset.f2s.cmat.csv\ndemo-reference.analysis.all.dataset.f2s.metrics.csv\ndemo-reference.analysis.all.dataset.f05.cmat.csv\ndemo-reference.analysis.all.dataset.f05.metrics.csv\ndemo-reference.analysis.all.dataset.fnr.cmat.csv\ndemo-reference.analysis.all.dataset.fnr.metrics.csv\ndemo-reference.analysis.all.dataset.fom.cmat.csv\ndemo-reference.analysis.all.dataset.fom.metrics.csv\ndemo-reference.analysis.all.dataset.fpr.cmat.csv\ndemo-reference.analysis.all.dataset.fpr.metrics.csv\ndemo-reference.analysis.all.dataset.inf.cmat.csv\ndemo-reference.analysis.all.dataset.inf.metrics.csv\ndemo-reference.analysis.all.dataset.mcc.cmat.csv\ndemo-reference.analysis.all.dataset.mcc.metrics.csv\ndemo-reference.analysis.all.dataset.mk.cmat.csv\ndemo-reference.analysis.all.dataset.mk.metrics.csv\ndemo-reference.analysis.all.dataset.npv.cmat.csv\ndemo-reference.analysis.all.dataset.npv.metrics.csv\ndemo-reference.analysis.all.dataset.ppv.cmat.csv\ndemo-reference.analysis.all.dataset.ppv.metrics.csv\ndemo-reference.analysis.all.dataset.tnr.cmat.csv\ndemo-reference.analysis.all.dataset.tnr.metrics.csv\ndemo-reference.analysis.all.dataset.tpr.cmat.csv\ndemo-reference.analysis.all.dataset.tpr.metrics.csv\ndemo-reference.analysis.all.target.bac.metrics.csv\ndemo-reference.analysis.all.target.csi.metrics.csv\ndemo-reference.analysis.all.target.default.metrics.csv\ndemo-reference.analysis.all.target.f1s.metrics.csv\ndemo-reference.analysis.all.target.f2s.metrics.csv\ndemo-reference.analysis.all.target.f05.metrics.csv\ndemo-reference.analysis.all.target.fnr.metrics.csv\ndemo-reference.analysis.all.target.fom.metrics.csv\ndemo-reference.analysis.all.target.fpr.metrics.csv\ndemo-reference.analysis.all.target.inf.metrics.csv\ndemo-reference.analysis.all.target.mcc.metrics.csv\ndemo-reference.analysis.all.target.mk.metrics.csv\ndemo-reference.analysis.all.target.npv.metrics.csv\ndemo-reference.analysis.all.target.ppv.metrics.csv\ndemo-reference.analysis.all.target.tnr.metrics.csv\ndemo-reference.analysis.all.target.tpr.metrics.csv\ndemo-reference.analysis.D018_ESpritz-D.bootstrap.metrics.csv\ndemo-reference.analysis.D018_ESpritz-D.dataset.metrics.csv\ndemo-reference.analysis.D018_ESpritz-D.target.metrics.csv\ndemo-reference.analysis.D019_ESpritz-N.bootstrap.metrics.csv\ndemo-reference.analysis.D019_ESpritz-N.dataset.metrics.csv\ndemo-reference.analysis.D019_ESpritz-N.target.metrics.csv\ndemo-reference.analysis.D020_ESpritz-X.bootstrap.metrics.csv\ndemo-reference.analysis.D020_ESpritz-X.dataset.metrics.csv\ndemo-reference.analysis.D020_ESpritz-X.target.metrics.csv\n```\n\nThe content 
of `demo-reference.analysis.all.dataset.bac.metrics.csv` should look like this:\n\n```\n,bac,csi,f05,f1s,f2s,fnr,fom,fpr,inf,mcc,mk,npv,ppv,tnr,tpr,aucroc,aucpr,aps,thr\nD020_ESpritz-X,0.692,0.261,0.329,0.414,0.56,0.268,0.074,0.349,0.383,0.287,0.215,0.926,0.289,0.651,0.732,0.739,0.304,0.303,0.048\nD019_ESpritz-N,0.67,0.249,0.322,0.398,0.521,0.344,0.089,0.317,0.339,0.258,0.197,0.911,0.286,0.683,0.656,0.714,0.296,0.296,0.345\nD018_ESpritz-D,0.704,0.27,0.338,0.426,0.576,0.248,0.068,0.345,0.407,0.305,0.229,0.932,0.297,0.655,0.752,0.774,0.41,0.409,0.248\n```\n\n## Data\nCAID relies on data obtained from different sources to build its references and baselines.\n\n### Raw data\nTo use the commands in this README as they are (without customizing paths), place the following files in\n`CAID-root/data/annotations`:\n\ndisprot-2018-11-disorder.fasta obtained from: \nhttps://disprot.org/api/search?release=2018_11\u0026show_ambiguous=false\u0026show_obsolete=false\u0026format=fasta\u0026namespace=structural_state\u0026get_consensus=true\n\ndisprot-2016-10-disorder.fasta obtained from: \nhttps://disprot.org/api/search?release=2016_10\u0026show_ambiguous=false\u0026show_obsolete=false\u0026format=fasta\u0026namespace=structural_state\u0026get_consensus=true\n\ndisprot-2018-11-interaction.fasta obtained from: \nhttps://disprot.org/api/search?release=2018_11\u0026show_ambiguous=false\u0026show_obsolete=false\u0026format=fasta\u0026namespace=interaction_partner\u0026get_consensus=true\n\ndisprot-2018-11.json obtained from: \nhttps://disprot.org/api/search?release=current\u0026show_ambiguous=false\u0026show_obsolete=false\u0026format=json\n\nInterProScan (5.38-76.0) output, generated and parsed with the following commands:\n\n```\ninterproscan.sh -f tsv -dp -iprlookup -T /tmp/ -i disprot-2018-11_seq.fasta -o caid_interproscan\npython parse_interproscan.py ../data/annotations/disprot-2018-11-disorder.fasta ../data/annotations/disprot-2018-11.json caid_interproscan \u003e ../data/annotations/gene3d.fasta\n```\n\nPDB annotations were obtained from MobiDB, downloaded on 06/03/2020 from: \nhttps://mobidb.bio.unipd.it/mobidb3_datasets/latest/derived_disorder.mjson.gz\n\nPDB-atleast definition:\n\n```\npython parse_mobidb.py ../data/annotations/disprot-2018-11.json ../data/annotations/derived_disorder.mjson.gz \u003e ../data/annotations/pdb-atleast.fasta\n```\n\n### Predictions\nTo use the commands in this README as they are (without customizing paths), place the prediction files in\n`CAID-root/data/predictions/{disorder|binding}`.\n\nPredictions can be obtained from: \nhttps://mobidb.org/caid/1/predictions\n\nDisorder prediction filenames start with the character `D`; binding prediction filenames start with `B`.\n\n\n## Usage\nCreate the references:\n\n```\npython datasets/make_references.py ../data/annotations/disprot-2018-11.json -d ../data/annotations/disprot-2018-11-disorder.fasta -e ../data/annotations/disprot-2016-10-disorder.fasta -s ../data/annotations/pdb-atleast.fasta ../data/annotations/gene3d.fasta -i ../data/annotations/disprot-2018-11-interaction.fasta\n```\n\nCalculate reference statistics:\n\n```\npython reference_stats.py ../data/annotations/disprot-2018-11.json ../data/references/disorder/disprot-disorder* ../data/references/binding/disprot-binding* -o ../data/dataset_stats/\n```\n\nCalculate evaluation metrics:\n\n```\nbash launch_all.sh\n```\n\nDraw plots:\n\n```\npython plots.py ../results/ ../baseline/ ../data/references/disorder/ ../data/dataset_stats/ -o ../plots/ -n data/caid_names.json -ll DEBUG -g 'disprot-disorder*'\npython plots.py ../results/ ../baseline/ ../data/references/binding/ ../data/dataset_stats/ -o ../plots/ -n data/caid_names.json -ll DEBUG -g 'disprot-binding*'\n```\n\n## License\n[CC BY 3.0](https://creativecommons.org/licenses/by/3.0/)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbiocomputingup%2Fcaid","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbiocomputingup%2Fcaid","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbiocomputingup%2Fcaid/lists"}