{"id":15589762,"url":"https://github.com/emptyport/bacterial_proteomes","last_synced_at":"2025-03-29T09:44:07.568Z","repository":{"id":73125955,"uuid":"102652752","full_name":"emptyport/bacterial_proteomes","owner":"emptyport","description":"Scripts to investigate relationships between proteins between species of bacteria","archived":false,"fork":false,"pushed_at":"2017-10-02T18:07:48.000Z","size":12,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-04T01:08:24.829Z","etag":null,"topics":["bacteria","bioinformatics","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/emptyport.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-09-06T20:01:43.000Z","updated_at":"2021-02-27T14:32:53.000Z","dependencies_parsed_at":null,"dependency_job_id":"8c187e35-a70c-4411-b073-61bb92ca80f1","html_url":"https://github.com/emptyport/bacterial_proteomes","commit_stats":{"total_commits":7,"total_committers":1,"mean_commits":7.0,"dds":0.0,"last_synced_commit":"dd3c38bc6f61517fa3f7a3fa7a80a1989f807c89"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emptyport%2Fbacterial_proteomes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emptyport%2Fbacterial_proteomes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emptyport%2Fbacterial_proteomes/releases","ma
nifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emptyport%2Fbacterial_proteomes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/emptyport","download_url":"https://codeload.github.com/emptyport/bacterial_proteomes/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246168091,"owners_count":20734389,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bacteria","bioinformatics","python"],"created_at":"2024-10-02T23:04:48.630Z","updated_at":"2025-03-29T09:44:07.543Z","avatar_url":"https://github.com/emptyport.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Bacterial Proteome Relationships\n\n\u003e This repository consists of some code I'm developing while learning about gene coupling and co-evolution\n\n## Obtaining the Data\nI downloaded bacterial proteomes from Uniprot using the following commands in a bash terminal:\n```shell\nmkdir bacteria_proteomes\n\ncd bacteria_proteomes\n\nwget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/Bacteria/*\n```\nI downloaded the proteomes on Tuesday, Sep 5, 2017. It took a good part of the day to finish. The \"Date Modified\" says 8/30/17 on Uniprot's ftp server.\n\nFrom here, I ran the script ```create_protein_database.py``` to save the proteins to a MySQL table. Once you install MySQL, just create a database and user and put that information in the script; the script will take care of creating the table. 
The table structure is as follows:\n\n* **id** is the primary key and is set to autoincrement\n\n* **organism** follows the format \"UP000028641\" and, as far as I can tell, is an identifier set by Uniprot\n\n* **header** is the header information for each protein sequence in the FASTA file\n\n* **sequence** is the protein sequence itself\n\nEach FASTA file is uncompressed and read into memory, and then its protein sequences are inserted into the database. Even though each file's sequences are inserted all at once, this process is still quite time-consuming (there are over 6000 proteomes to process).\n\nAfter running ```create_protein_database.py```, run the following MySQL commands:\n```sql\nDROP TABLE IF EXISTS `organisms`;\nCREATE TABLE `organisms` (\n\t`organism` VARCHAR(32)\n    );\nINSERT INTO `organisms`\n(`organism`)\nSELECT DISTINCT `organism` FROM `proteins`;\n```\nThis creates a table of all the unique organisms for which we have protein sequences. Eventually this should be incorporated into the Python script, but for now it exists as a separate MySQL script.\n\nAlso add an index on the `organism` column of the `proteins` table. This will speed up later queries that filter by organism.\n\n## BLAST Preparation\n\nInstall NCBI BLAST and make sure it is in your path. 
For example, you should be able to type 'makeblastdb' from the command line/terminal without receiving an error about it being an unknown program.\n\nOnce BLAST is installed, run ```make_blast_databases.py``` to create a BLAST database for each organism.\n\n## Next Steps\n* Multiple sequence alignment\n* Looking at conserved residues\n* Putting it all together\n\n\n\n\n## OrthoDB stuff\n\nDownload from http://www.orthodb.org/?page=filelist\n\n```grep -v '/\\*' ODB.sql \u003e ODB_mod.sql``` I don't know if this is required, but I ran it before the next step\n\n```sed -i 's/ TYPE=MyISAM;/;/g' ODB_mod.sql``` to make the file compatible with the current MySQL version\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femptyport%2Fbacterial_proteomes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Femptyport%2Fbacterial_proteomes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femptyport%2Fbacterial_proteomes/lists"}