{"id":27390501,"url":"https://github.com/biocomputingup/caid-reference","last_synced_at":"2025-04-13T20:01:36.237Z","repository":{"id":287155354,"uuid":"963780204","full_name":"BioComputingUP/caid-reference","owner":"BioComputingUP","description":"Critical Assessment of Protein Intrinsic Disorder (CAID) - Reference generation and analysis","archived":false,"fork":false,"pushed_at":"2025-04-10T08:03:10.000Z","size":23341,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-10T08:44:16.115Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BioComputingUP.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-04-10T07:41:33.000Z","updated_at":"2025-04-10T08:03:13.000Z","dependencies_parsed_at":"2025-04-10T08:54:29.690Z","dependency_job_id":null,"html_url":"https://github.com/BioComputingUP/caid-reference","commit_stats":null,"previous_names":["biocomputingup/caid-reference"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioComputingUP%2Fcaid-reference","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioComputingUP%2Fcaid-reference/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioComputingUP%2Fcaid-reference/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BioComputingUP%2Fcaid-reference/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BioComputingUP","download_url":"https://codeload.github.com/BioComputingUP/caid-reference/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248774935,"owners_count":21159533,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-13T20:00:25.222Z","updated_at":"2025-04-13T20:01:36.230Z","avatar_url":"https://github.com/BioComputingUP.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Critical Assessment of Protein Intrinsic Disorder (CAID)\n\n## Reference generation and analysis\n\nThis is the official repository of the [Critical Assessment of Protein Intrinsic Disorder (CAID)](https://caid.idpcentral.org/) challenge.\n\nThe previous CAID2 repository is available [here](https://github.com/BioComputingUP/caid2-reference).\n\nFollow the instructions below to reproduce the generation of the references and compute some useful statistics.\n\n```bash\n# Generate the folder structure\nmkdir -p data/{disprot,sifts,alphafold,output/{references,references_stat,references_merge_analysis,homology,new_taxdump}}\n```\n\n## references\nGenerate the references from two snapshots of the DisProt database (MongoDB exports).\nDisProt data can be obtained either by directly exporting the relevant database collections (ask the developers)\nor by using the download service on the website (the latest annotations might not be available to the public).\nNote that the two formats are slightly different.\n\n## Download DisProt data\nUse MongoDB Compass to download the current public collection and the current \"curators\" collection:\n\n- Public: `2023_12`\n- Current (curators): `2024_12_c` (29 Oct 2024)\n\n```bash\n# CAID3 CASP-16 dataset: export the current curators collection\nmongoexport --uri \"mongodb://moros:27017/disprot8\" --collection entries_2024_12_c \u003e data/disprot/disprot8.entries_2024_12_c.json\n\n# Download data\nwget -O data/sifts/uniprot_segments_observed.tsv.gz ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/flatfiles/tsv/uniprot_segments_observed.tsv.gz\nwget -O data/disprot/go-basic.obo http://purl.obolibrary.org/obo/go/go-basic.obo\n```\n\n## homology\nParse the BLAST output, extract information about the best match, and compute optimal pairwise alignments.\nComparisons are made between CAID and the \"old\" DisProt, and between CAID and the PDB seqres.\n\n```bash\n# Generate BLAST alignments of the new DisProt against the old DisProt and against PDB seqres\n# Install BLAST+ in your home directory (check the version and paths)\nwget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.15.0+-x64-linux.tar.gz\ntar -xf ncbi-blast-2.15.0+-x64-linux.tar.gz -C \"$HOME\"\nexport PATH=\"$HOME/ncbi-blast-2.15.0+/bin:$PATH\"\n\n# Download PDB seqres\nwget https://files.wwpdb.org/pub/pdb/derived_data/pdb_seqres.txt.gz -O data/output/homology/pdb_seqres.txt.gz\ngunzip data/output/homology/pdb_seqres.txt.gz\n\n# Make BLAST databases\nmakeblastdb -in data/output/homology/disprot_old.fasta -dbtype prot\nmakeblastdb -in data/output/homology/pdb_seqres.txt -dbtype prot\n\n# Run BLAST (tabular output, -outfmt 6)\nblastp -db data/output/homology/disprot_old.fasta -query data/output/homology/disprot_new.fasta -out data/output/homology/disprot_new_old.blast -outfmt 6 -num_threads 12\nblastp -db data/output/homology/pdb_seqres.txt -query data/output/homology/disprot_new.fasta -out data/output/homology/disprot_new_pdb.blast -outfmt 6 -num_threads 12\n```\n\n## homology_plot\nGenerate plots from the output of the homology notebook.\n\n## references_stat\nGenerate statistics about the references.\n\n```bash\n# Download taxonomy data\nwget -O data/new_taxdump.tar.gz ftp://ftp.ncbi.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.tar.gz\n# The extraction target must exist before tar -C can use it\nmkdir -p data/new_taxdump\ntar -xf data/new_taxdump.tar.gz -C data/new_taxdump\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbiocomputingup%2Fcaid-reference","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbiocomputingup%2Fcaid-reference","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbiocomputingup%2Fcaid-reference/lists"}