{"id":48547849,"url":"https://github.com/lucinamay/fairfetched","last_synced_at":"2026-04-08T07:31:02.681Z","repository":{"id":342662499,"uuid":"1165813330","full_name":"lucinamay/fairfetched","owner":"lucinamay","description":"data APIs for reproducible data fetching in cheminformatics in line with FAIR-principles","archived":false,"fork":false,"pushed_at":"2026-03-20T13:14:00.000Z","size":160,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-21T02:12:51.371Z","etag":null,"topics":["cheminformatics","data-pipeline","database","fair-data"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/fairfetched/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lucinamay.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-24T15:16:52.000Z","updated_at":"2026-03-20T13:14:04.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/lucinamay/fairfetched","commit_stats":null,"previous_names":["lucinamay/fairfetched"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lucinamay/fairfetched","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucinamay%2Ffairfetched","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucinamay%2Ffairfetched/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucinamay%2F
fairfetched/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucinamay%2Ffairfetched/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lucinamay","download_url":"https://codeload.github.com/lucinamay/fairfetched/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucinamay%2Ffairfetched/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31545904,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T16:28:08.000Z","status":"online","status_checked_at":"2026-04-08T02:00:06.127Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cheminformatics","data-pipeline","database","fair-data"],"created_at":"2026-04-08T07:31:02.061Z","updated_at":"2026-04-08T07:31:02.676Z","avatar_url":"https://github.com/lucinamay.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# fairfetched\ndata APIs for reproducible data fetching in cheminformatics in line with FAIR principles\n\n# installation \nyou can install this package through\n`uv add fairfetched` (recommended)\n\nor if you do not use the uv package manager:\n`pip install fairfetched`\n\n\n# examples\nyou can download Chembl or Papyrus through:\n```python\nfrom fairfetched.get import Chembl, Papyrus\nmychembl = Chembl.from_latest() # this downloads Chembl raw files + extracts parquet 
files to wherever you\n                                # have set the environment variable FAIRFETCHED_HOME, PYSTOW_HOME,\n                                # or \u003cHOME\u003e/.data if not in environment variables.\n                                # from there, fairfetched saves it to a folder chembl/\u003cversion\u003e\n\nmychembl.lfs                  # a dictionary of all chembl files in polars LazyFrame format, scanned directly from the extracted .parquet files\n\n\nmychembl.consolidated_paths   # the paths to the parquet-converted tabular data files in the Chembl .db file\n\nmychembl.raw_paths            # the paths to the raw chembl file as downloaded from Chembl. currently does include an uncompressed .db file\n\nmychembl.compounds            # NOT YET IMPLEMENTED !! convenience alias for mychembl.compose()[\"compounds\"], which uses mychembl.lfs LazyFrame joins to obtain an intuitive join of the data.\n                              # from there, you can \n```\n\n### examples of how to use the LazyFrames:\n\n#### checking which columns+datatypes are in the file, so that you can choose to join them:\n```python\n\u003e\u003e\u003e mychembl.lfs[\"activities\"].collect_schema()\nSchema({'activity_id': Int64, 'assay_id': Int64, 'doc_id': Int64, 'record_id': Int64, 'molregno': Int64, 'standard_relation': String, 'standard_value': Float64, 'standard_units': String, 'standard_flag': Int64, 'standard_type': String, 'activity_comment': String, 'data_validity_comment': String, 'potential_duplicate': Int64, 'pchembl_value': Float64, 'bao_endpoint': String, 'uo_units': String, 'qudt_units': String, 'toid': Int64, 'upper_value': Float64, 'standard_upper_value': Null, 'src_id': Int64, 'type': String, 'relation': String, 'value': Float64, 'units': String, 'text_value': String, 'standard_text_value': String, 'action_type': String})\n```\n#### selecting all entries based on doc_id:\n```python\n\u003e\u003e\u003e 
mychembl.lfs[\"activities\"].filter(doc_id=89530).drop_nulls(\"units\").collect()\nshape: (107, 28)\n┌─────────────┬──────────┬────────┬───────────┬───┬───────┬────────────┬─────────────────────┬─────────────┐\n│ activity_id ┆ assay_id ┆ doc_id ┆ record_id ┆ … ┆ units ┆ text_value ┆ standard_text_value ┆ action_type │\n│ ---         ┆ ---      ┆ ---    ┆ ---       ┆   ┆ ---   ┆ ---        ┆ ---                 ┆ ---         │\n│ i64         ┆ i64      ┆ i64    ┆ i64       ┆   ┆ str   ┆ str        ┆ str                 ┆ str         │\n╞═════════════╪══════════╪════════╪═══════════╪═══╪═══════╪════════════╪═════════════════════╪═════════════╡\n│ 15120638    ┆ 1431503  ┆ 89530  ┆ 2256150   ┆ … ┆ uM    ┆ null       ┆ null                ┆ null        │\n│ 15120639    ┆ 1431503  ┆ 89530  ┆ 2256151   ┆ … ┆ uM    ┆ null       ┆ null                ┆ null        │\n│ 15120640    ┆ 1431503  ┆ 89530  ┆ 2256152   ┆ … ┆ uM    ┆ null       ┆ null                ┆ null        │\n│ 15120641    ┆ 1431503  ┆ 89530  ┆ 2256153   ┆ … ┆ uM    ┆ null       ┆ null                ┆ null        │\n│ 15120642    ┆ 1431503  ┆ 89530  ┆ 2256154   ┆ … ┆ uM    ┆ null       ┆ null                ┆ null        │\n│ …           ┆ …        ┆ …      ┆ …         ┆ … ┆ …     ┆ …          ┆ …                   ┆ …           │\n│ 15125200    ┆ 1431507  ┆ 89530  ┆ 2256167   ┆ … ┆ uM    ┆ null       ┆ null                ┆ null        │\n│ 15125201    ┆ 1431507  ┆ 89530  ┆ 2256168   ┆ … ┆ uM    ┆ null       ┆ null                ┆ null        │\n│ 15125202    ┆ 1431507  ┆ 89530  ┆ 2256169   ┆ … ┆ uM    ┆ null       ┆ null                ┆ null        │\n│ 15125203    ┆ 1431507  ┆ 89530  ┆ 2256170   ┆ … ┆ uM    ┆ null       ┆ null                ┆ null        │\n│ 15125204    ┆ 1431507  ┆ 89530  ┆ 2256171   ┆ … ┆ uM    ┆ null       ┆ null                ┆ null        │\n└─────────────┴──────────┴────────┴───────────┴───┴───────┴────────────┴─────────────────────┴─────────────┘\n```\n\n#### adding compound 
structure info to the activities on molregno\n```python\n\u003e\u003e\u003e mychembl.lfs[\"activities\"].join(mychembl.lfs[\"compound_structures\"],on=\"molregno\",how=\"left\",validate=\"m:1\").head().collect()\nshape: (5, 32)\n┌─────────────┬──────────┬────────┬───────────┬───┬────────────────────────┬─────────────────────────────────┬─────────────────────────────┬─────────────────────────────────┐\n│ activity_id ┆ assay_id ┆ doc_id ┆ record_id ┆ … ┆ molfile                ┆ standard_inchi                  ┆ standard_inchi_key          ┆ canonical_smiles                │\n│ ---         ┆ ---      ┆ ---    ┆ ---       ┆   ┆ ---                    ┆ ---                             ┆ ---                         ┆ ---                             │\n│ i64         ┆ i64      ┆ i64    ┆ i64       ┆   ┆ str                    ┆ str                             ┆ str                         ┆ str                             │\n╞═════════════╪══════════╪════════╪═══════════╪═══╪════════════════════════╪═════════════════════════════════╪═════════════════════════════╪═════════════════════════════════╡\n│ 31863       ┆ 54505    ┆ 6424   ┆ 206172    ┆ … ┆                        ┆ InChI=1S/C20H12N2O2/c1-2-7-13(… ┆ BEBACPIIZGRKGG-UHFFFAOYSA-N ┆ c1ccc(-c2nc3c(-c4nc5ccccc5o4)c… │\n│             ┆          ┆        ┆           ┆   ┆      RDKit          2D ┆                                 ┆                             ┆                                 │\n│             ┆          ┆        ┆           ┆   ┆                        ┆                                 ┆                             ┆                                 │\n│             ┆          ┆        ┆           ┆   ┆  24 2…                 ┆                                 ┆                             ┆                                 │\n│ 31864       ┆ 83907    ┆ 6432   ┆ 208970    ┆ … ┆                        ┆ InChI=1S/C23H14N2O5/c1-12-5-8-… ┆ SUKVIELCKKEBOJ-UHFFFAOYSA-N ┆ Cc1ccc2oc(-c3cccc(N4C(=O)c5ccc… │\n│          
   ┆          ┆        ┆           ┆   ┆      RDKit          2D ┆                                 ┆                             ┆                                 │\n│             ┆          ┆        ┆           ┆   ┆                        ┆                                 ┆                             ┆                                 │\n│             ┆          ┆        ┆           ┆   ┆  30 3…                 ┆                                 ┆                             ┆                                 │\n│ 31865       ┆ 88152    ┆ 6432   ┆ 208970    ┆ … ┆                        ┆ InChI=1S/C23H14N2O5/c1-12-5-8-… ┆ SUKVIELCKKEBOJ-UHFFFAOYSA-N ┆ Cc1ccc2oc(-c3cccc(N4C(=O)c5ccc… │\n│             ┆          ┆        ┆           ┆   ┆      RDKit          2D ┆                                 ┆                             ┆                                 │\n│             ┆          ┆        ┆           ┆   ┆                        ┆                                 ┆                             ┆                                 │\n│             ┆          ┆        ┆           ┆   ┆  30 3…                 ┆                                 ┆                             ┆                                 │\n│ 31866       ┆ 83907    ┆ 6432   ┆ 208987    ┆ … ┆                        ┆ InChI=1S/C30H20N2O7/c1-37-24-6… ┆ ZFJHZUAZBGPPQK-UHFFFAOYSA-N ┆ COc1ccccc1-c1ccc2oc(-c3ccc(OC)… │\n│             ┆          ┆        ┆           ┆   ┆      RDKit          2D ┆                                 ┆                             ┆                                 │\n│             ┆          ┆        ┆           ┆   ┆                        ┆                                 ┆                             ┆                                 │\n│             ┆          ┆        ┆           ┆   ┆  39 4…                 ┆                                 ┆                             ┆                                 │\n│ 31867       ┆ 88153    ┆ 6432   ┆ 208987    ┆ … ┆                        
┆ InChI=1S/C30H20N2O7/c1-37-24-6… ┆ ZFJHZUAZBGPPQK-UHFFFAOYSA-N ┆ COc1ccccc1-c1ccc2oc(-c3ccc(OC)… │\n│             ┆          ┆        ┆           ┆   ┆      RDKit          2D ┆                                 ┆                             ┆                                 │\n│             ┆          ┆        ┆           ┆   ┆                        ┆                                 ┆                             ┆                                 │\n│             ┆          ┆        ┆           ┆   ┆  39 4…                 ┆                                 ┆                             ┆                                 │\n└─────────────┴──────────┴────────┴───────────┴───┴────────────────────────┴─────────────────────────────────┴─────────────────────────────┴─────────────────────────────────┘\n```\n\n#### move it to pandas for direct drop-in use (if you really want pandas...)\nideally, call `.collect().to_pandas()` as late as possible, after all filtering is complete (see the polars documentation for more info)\n```python\nmychembl.lfs[\"activities\"].collect().to_pandas()\n```\n\n\n\n\n# roadmap\n- [ ] papyrus database support\n  - [x] papyrus latest version download\n  - [x] simple nested filtering\n  - [ ] efficient nested filtering\n  - [ ] all-version support\n  - [ ] built-in pivots\n- [ ] chembl database support\n  - [x] database to tables (parquet)\n  - [ ] intuitive pre-merged flat files\n  - [ ] database visualisation\n  - [ ] remove the need for storing uncompressed .db\n- [ ] reproduction from downloaded raw file\n- [ ] reproducible molecular (and protein?) 
standardisation\n- [ ] automated time-url logging and manifest files\n- [ ] well-organised logging\n- [ ] dependency minimisation\n- [ ] other database support\n- [ ] preservation of api and parsing logic per major version\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucinamay%2Ffairfetched","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucinamay%2Ffairfetched","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucinamay%2Ffairfetched/lists"}