{"id":26129751,"url":"https://github.com/project-gemmi/pdb-stats","last_synced_at":"2025-04-13T18:53:03.602Z","repository":{"id":52607654,"uuid":"108165585","full_name":"project-gemmi/pdb-stats","owner":"project-gemmi","description":"Clickable PDB statistics: synchrotrons and MX software","archived":false,"fork":false,"pushed_at":"2025-03-07T19:42:06.000Z","size":8056,"stargazers_count":8,"open_issues_count":0,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-27T09:45:27.734Z","etag":null,"topics":["cross-filter","pdb-files","protein-data-bank","statistics","synchrotron","x-ray-crystallography"],"latest_commit_sha":null,"homepage":"https://project-gemmi.github.io/pdb-stats/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/project-gemmi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-10-24T18:12:39.000Z","updated_at":"2025-03-07T19:42:09.000Z","dependencies_parsed_at":"2024-12-12T12:29:56.858Z","dependency_job_id":"db7811ea-7e8f-43df-87e5-a1af277d1c88","html_url":"https://github.com/project-gemmi/pdb-stats","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/project-gemmi%2Fpdb-stats","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/project-gemmi%2Fpdb-stats/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/project-gemmi%2Fpdb-stats/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/project-gemmi%2Fpdb-stats/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/project-gemmi","download_url":"https://codeload.github.com/project-gemmi/pdb-stats/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248766029,"owners_count":21158296,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cross-filter","pdb-files","protein-data-bank","statistics","synchrotron","x-ray-crystallography"],"created_at":"2025-03-10T19:58:52.985Z","updated_at":"2025-04-13T18:53:03.563Z","avatar_url":"https://github.com/project-gemmi.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\nThese statistics serve as an example of how to use `gemmi-grep`\n(one of the utilities provided by the [project gemmi][1])\nto quickly extract data from mmCIF files.\n\nThe data of interest is extracted from a local copy of the PDB archive\n(mmCIF coordinate files) to a file `grep.out`:\n\n    $ ./extract.sh --auto\n\nThis script uses `gemmi-grep` to extract metadata to a file `grep.out`.\nReading the whole archive (69 GB of compressed files) takes about 30 min.\nThe output is redirected to a file (`grep.out`) that is then used\nto prepare the JSON files used in our web pages.\n\n### Filter-able statistics\n\n`./process.py \u003edata.json` makes a concise JSON file that includes only:\n\n* entries deposited since 2008 (arbitrary cut-off),\n* obtained using X-ray crystallography,\n* in case of group depositions (the PDB currently has only a dozen such groups)\n  we take one entry 
from each group.\n\nThe web app itself is contained in a single file (index.html).\nIt depends on three external libraries: dc.js, d3.js and crossfilter.\n\nHere is an interactive demo (each plot can be used as a filter):\n\nhttps://project-gemmi.github.io/pdb-stats/\n\n### Synchrotron work patterns\n\n`./coldates.py` prepares JSON files for calendar.html, see `extract.sh`.\n\nhttps://project-gemmi.github.io/pdb-stats/calendar.html\n\n### Residue statistics\n\n    $ curl -O ftp://ftp.wwpdb.org/pub/pdb/data/monomers/components.cif.gz\n    $ ./resistat.py $PDB_DIR/structures/divided/mmCIF \u003e residues.json\n    $ # update file_count and date in residues.html (take file_count from json)\n\nhttps://project-gemmi.github.io/pdb-stats/residues.html\n\n### Tag statistics\n\nThe file entries.idx is used to sort entries starting from the most recent,\nso that the example PDB ID in the tooltip is the newest entry with a given tag.\n\n    $ curl -O https://files.wwpdb.org/pub/pdb/derived_data/index/entries.idx\n    $ gemmi tags --full components.cif.gz \u003e ccd-tags.tsv\n    $ gemmi tags --full --entries-idx=entries.idx $PDB_DIR/structures/divided/mmCIF \u003e mmcif-tags.tsv\n    $ gemmi tags --full --entries-idx=entries.idx --sf $PDB_DIR/structures/divided/structure_factors \u003e sf-tags.tsv\n    $ sed -i s\"/ on 20..-..-../ on $(date -Idate)/\" tags.html\n\nhttps://project-gemmi.github.io/pdb-stats/tags.html\n\nSimilarly, for the COD:\n\n    $ gemmi tags --full path/to/cod/cif/ \u003e cod-cif-tags.tsv\n    $ gemmi tags --full --glob='*.hkl' path/to/cod/hkl/ \u003e cod-hkl-tags.tsv\n    $ sed -i s\"/ on 20..-..-../ on $(date -Idate)/\" cod-tags.html\n\nhttps://project-gemmi.github.io/pdb-stats/cod-tags.html\n\n[1]: 
https://project-gemmi.github.io/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fproject-gemmi%2Fpdb-stats","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fproject-gemmi%2Fpdb-stats","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fproject-gemmi%2Fpdb-stats/lists"}