{"id":38522789,"url":"https://github.com/flyconnectome/cocoa","last_synced_at":"2026-01-17T06:43:28.656Z","repository":{"id":229824455,"uuid":"629697843","full_name":"flyconnectome/cocoa","owner":"flyconnectome","description":"Comparative Connectomics for Python","archived":false,"fork":false,"pushed_at":"2025-11-11T13:49:50.000Z","size":3131,"stargazers_count":8,"open_issues_count":1,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-11-11T15:21:10.435Z","etag":null,"topics":["celltypes","clustering","connectomics","neurobiology","neurons"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/flyconnectome.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-04-18T21:04:15.000Z","updated_at":"2025-11-11T11:19:53.000Z","dependencies_parsed_at":null,"dependency_job_id":"51ffd84c-5b84-44b1-8d6a-131614222f71","html_url":"https://github.com/flyconnectome/cocoa","commit_stats":null,"previous_names":["flyconnectome/cocoa"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/flyconnectome/cocoa","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flyconnectome%2Fcocoa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flyconnectome%2Fcocoa/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flyconnectome%2Fcocoa/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/flyconnectome%2Fcocoa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/flyconnectome","download_url":"https://codeload.github.com/flyconnectome/cocoa/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flyconnectome%2Fcocoa/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28502819,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T04:31:57.058Z","status":"ssl_error","status_checked_at":"2026-01-17T04:31:45.816Z","response_time":85,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["celltypes","clustering","connectomics","neurobiology","neurons"],"created_at":"2026-01-17T06:43:28.091Z","updated_at":"2026-01-17T06:43:28.650Z","avatar_url":"https://github.com/flyconnectome.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![cocoa](docs/_static/cocoa.svg)\n\n# cocoa\n\n`cocoa` is a Python library for **co**mparative **co**nnectomics **a**nalyses.\n\nIt implements various dataset-agnostic as well as dataset-specific methods\nfor matching, connectivity, co-clustering and cell typing.\n\nCurrently implemented are:\n\n1. [FlyWire](https://flywire.ai)\n2. 
[hemibrain](https://neuprint.janelia.org/?dataset=hemibrain%3Av1.2.1\u0026qt=findneurons)\n3. [MANC](https://neuprint.janelia.org/?dataset=manc%3Av1.2.3\u0026qt=findneurons)\n4. [male CNS](https://neuprint.janelia.org/?dataset=male-cns%3Av0.9\u0026qt=findneurons)\n\nOn the TO-DO list:\n- female adult nerve cord (FANC)\n- brain and nerve cord (BANC)\n\nFeel free to open an Issue or a PR if you want a specific dataset added.\n\n## Install\n\n```bash\npip3 install git+https://github.com/flyconnectome/cocoa.git -U\n```\n\n### Other requirements\n\nAll dependencies should be installed automatically. However, to use the\npredefined datasets you will need to set a couple of environment variables and\nsecrets:\n1. To use the neuPrint datasets (hemibrain, MANC and maleCNS) you need to set your\n   API token as `NEUPRINT_APPLICATION_CREDENTIALS`\n   (see [neuprint-python](https://github.com/connectome-neuprint/neuprint-python))\n2. To use the CAVE/chunkedgraph datasets (FlyWire, FANC) you need to have your\n   CAVE token set (see [fafbseg](https://fafbseg-py.readthedocs.io/en/latest/source/tutorials/flywire_setup.html))\n3. _For internal use only_: if you want to use the live annotations from flytable\n   make sure to set the `SEATABLE_SERVER` and `SEATABLE_TOKEN` environment variables\n   (see [sea-serpent](https://github.com/schlegelp/sea-serpent))\n\n## Concepts\n\nThe main concept in `cocoa` is that of a `DataSet`. A `DataSet` represents\na collection of neurons from a specific source (e.g. FlyWire or hemibrain),\nand provides methods to fetch annotations and connectivity.\n\nWhile you can use `cocoa` to run clusterings on just a single dataset,\nits real power lies in co-clustering neurons from multiple datasets. To do\nthis, it auto-magically computes mappings between neurons from different\ndatasets based on available labels. 
These labels are then used to\ngenerate a joint connectivity vector from which we can compute pairwise\ndistances.\n\n## Examples\n\n```Python\n\u003e\u003e\u003e import cocoa as cc\n\u003e\u003e\u003e # Define the sets of neurons to co-cluster\n\u003e\u003e\u003e hb = cc.Hemibrain(label='hemibrain',\n...                   ).add_neurons(['SLP001', 'SLP003'])\n\u003e\u003e\u003e fwl = cc.FlyWire(label='FlyWire_left',\n...                  materialization=783,\n...                  ).add_neurons(['SLP001', 'SLP003'], sides='left')\n\u003e\u003e\u003e fwr = cc.FlyWire(label='FlyWire_right',\n...                  materialization=783,\n...                  ).add_neurons(['SLP001', 'SLP003'], sides='right')\n\u003e\u003e\u003e # Combine into a clustering and co-cluster\n\u003e\u003e\u003e cl = cc.Clustering([hb, fwl, fwr]).compile()\n\u003e\u003e\u003e # The clustering `cl` contains the results of the clustering.\n\u003e\u003e\u003e # The joint connectivity vector:\n\u003e\u003e\u003e cl.vect_\n                   downstream                          ... upstream\n                      LHAV1b1 LHPV4g1 LHAV5e1 LHAV1b3  ...    CL018 CL077 SLP202 LC9\n294437347                   0       0       1       0  ...        0     0      0   0\n543692985                   0       0       0       4  ...        0     6      0   1\n720575940617091414          0       0       1       0  ...        0     0      0   0\n720575940623050334          0       0       0       2  ...        1     1      0   0\n720575940627960442          0       0       1       0  ...        0     0      1   0\n720575940628895750          1       4       0       3  ...        0     5      0   0\n\u003e\u003e\u003e # The pairwise (cosine) distances:\n\u003e\u003e\u003e cl.dists_\n                    SLP001_hemibrain  ...  SLP003_FlyWire_right\n294437347                   0.000000  ...              0.990616\n543692985                   0.988929  ...              0.092726\n720575940617091414          0.141363  ...    
          0.994823\n720575940623050334          0.993146  ...              0.046200\n720575940627960442          0.218134  ...              0.992618\n720575940628895750          0.990616  ...              0.000000\n\u003e\u003e\u003e # It also provides some useful methods to work with the data\n\u003e\u003e\u003e table = cl.to_table(clusters=cl.extract_homogeneous_clusters())\n\u003e\u003e\u003e table\n                   id   label        dataset  cn_frac_used  dend_ix  cluster\n0           543692985  SLP003      hemibrain      0.503151        0        0\n1  720575940623050334  SLP003   FlyWire_left      0.541004        1        0\n2  720575940628895750  SLP003  FlyWire_right      0.545074        2        0\n3           294437347  SLP001      hemibrain      0.308048        3        1\n4  720575940617091414  SLP001   FlyWire_left      0.375770        4        1\n5  720575940627960442  SLP001  FlyWire_right      0.328080        5        1\n\u003e\u003e\u003e # See also `cl.plot_clustermap` for a quick visualization\n```\n\nAlternatively, you can also use the `generate_clustering` helper function.\nThat may be enough in cases where you don't need fine-grained control.\n\n```Python\n\u003e\u003e\u003e cl = cc.generate_clustering(\n...            fw=['SLP001', 'SLP002'],\n...            hb=['SLP001', 'SLP002']\n...         
).compile()\n```\n\n## Documentation\n\n`cocoa` does not yet have dedicated documentation but we provide a number of\n[examples/](examples/) that show how to use the library for various tasks:\n\n- `0_flywire_hemibrain_FC1-3.ipynb`: demonstrates co-clustering for a small group of neurons, including visualization of the results\n- `1_malecns_flywire_mapping.ipynb`: shows how to use `cocoa` to generate mappings between neurons from different datasets\n- `2_malecns_flywire_optic_lobes.ipynb`: demonstrates a large-scale (~160k neurons) co-clustering between two datasets\n\nIn addition, all functions/classes have extensive docstrings:\n\n```python\n\u003e\u003e\u003e help(cc.Clustering.compile)\ncc.Clustering.compile(\n    self,\n    join='outer',\n    metric='cosine',\n    mapper=\u003cclass 'cocoa.mappers.GraphMapper'\u003e,\n    force_recompile=False,\n    exclude_labels=None,\n    include_labels=None,\n    ignore_unlabeled=True,\n    cn_frac_threshold=None,\n    augment=None,\n    n_batches='auto',\n    verbose=True,\n)\nDocstring:\nCompile combined connectivity vector and calculate distance matrix.\n\nParameters\n----------\njoin :      \"inner\" | \"outer\" | \"existing\"\n            How to combine the dataset connectivity vectors:\n              - \"existing\" (default) will check if a label exists in\n                theory and use it even if it's not present in the\n                connectivity vectors of all datasets\n              - \"inner\" will get the intersection of all labels across\n                the connectivity vectors\n              - \"outer\" will use all available labels\n            Note: if you are using a GraphMapper, you should use \"outer\"\n            as the mapper will already have filtered out non-matching\n            labels.\nmetric :    \"cosine\" | \"Euclidean\"\n            Metric to use for distance calculations.\nmapper :    cocoa.Mapper | dict\n            The mapper used to match neuron labels across datasets.\n            
Examples are `cocoa.GraphMapper` and `cocoa.SimpleMapper`.\n            See the mapper's documentation for more information.\n            Alternatively, you can also provide a dictionary that maps\n            IDs to labels.\nexclude_labels : str | list of str, optional\n            If provided will exclude given labels from the observation\n            vector. This uses regex!\n[...]\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflyconnectome%2Fcocoa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fflyconnectome%2Fcocoa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflyconnectome%2Fcocoa/lists"}