{"id":34497516,"url":"https://github.com/scbirlab/carabiner","last_synced_at":"2026-04-27T05:33:30.216Z","repository":{"id":213557835,"uuid":"734377457","full_name":"scbirlab/carabiner","owner":"scbirlab","description":"🪨  Useful python utilities.","archived":false,"fork":false,"pushed_at":"2026-04-10T11:15:34.000Z","size":82,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-10T11:28:33.377Z","etag":null,"topics":["utilities","utilities-python"],"latest_commit_sha":null,"homepage":"https://carabiner-docs.readthedocs.org","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/scbirlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-12-21T14:28:49.000Z","updated_at":"2026-04-10T11:14:15.000Z","dependencies_parsed_at":"2024-04-23T10:42:42.308Z","dependency_job_id":null,"html_url":"https://github.com/scbirlab/carabiner","commit_stats":null,"previous_names":["scbirlab/carabiner"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/scbirlab/carabiner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scbirlab%2Fcarabiner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scbirlab%2Fcarabiner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scbirlab%2Fcarabiner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/scbirlab%2Fcarabiner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/scbirlab","download_url":"https://codeload.github.com/scbirlab/carabiner/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scbirlab%2Fcarabiner/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32324547,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T23:26:28.701Z","status":"online","status_checked_at":"2026-04-27T02:00:06.769Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["utilities","utilities-python"],"created_at":"2025-12-24T01:47:20.871Z","updated_at":"2026-04-27T05:33:30.204Z","avatar_url":"https://github.com/scbirlab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🪨 carabiner\n\n![GitHub Workflow Status (with branch)](https://img.shields.io/github/actions/workflow/status/scbirlab/carabiner/python-publish.yml)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/carabiner-tools)\n![PyPI](https://img.shields.io/pypi/v/carabiner-tools)\n\nA ragtag collection of useful Python functions and classes.\n\n- [Installation](#installation)\n- [Fast and flexible reading and random access of very large files](#fast-and-flexible-reading-and-random-access-of-very-large-files)\n    - [Reading tabular data](#reading-tabular-data)\n- [Utilities to simplify 
building command-line apps](#utilities-to-simplify-building-command-line-apps)\n- [Reservoir sampling](#reservoir-sampling)\n- [Multikey dictionaries](#multikey-dictionaries)\n- [Decorators](#decorators)\n    - [Vectorized functions](#vectorized-functions)\n    - [Return `None` instead of error](#return-none-instead-of-error)\n    - [Decorators with parameters](#decorators-with-parameters)\n- [Colorblind palette](#colorblind-palette)\n- [Grids with sensible defaults in Matplotlib](#grids-with-sensible-defaults-in-matplotlib)\n- [Fast indicator matrix x dense matrix multiplication in Tensorflow](#fast-indicator-matrix-x-dense-matrix-multiplication-in-tensorflow)\n- [Issues, problems, suggestions](#issues-problems-suggestions)\n- [Documentation](#documentation)\n\n## Installation\n\n### The easy way\n\nInstall the pre-compiled version from PyPI:\n\n```bash\n$ pip install carabiner-tools\n```\n\nIf you want to use the `tensorflow`, `pandas`, or `matplotlib` utilities, these must be installed separately\nor together:\n\n```bash\n$ pip install carabiner-tools[deep]\n# or\n$ pip install carabiner-tools[pd]\n# or\n$ pip install carabiner-tools[mpl]\n# or\n$ pip install carabiner-tools[all]\n```\n\n### From source\n\nClone the repository, then `cd` into it. Then run:\n\n```bash\n$ pip install -e .\n```\n\n## Fast and flexible reading and random access of very large files\n\nSubsets of lines from very large, optionally compressed, files can be read quickly \ninto memory. For example, we can read the first 10,000 lines of an arbitrarily large \nfile:\n\n```python\n\u003e\u003e\u003e from carabiner.io import get_lines\n\n\u003e\u003e\u003e get_lines(\"big-table.tsv.gz\", lines=10_000)\n```\n\nSpecific lines can also be accessed at random. 
Hundreds of millions of lines can be \nparsed per minute.\n\n```python\n\u003e\u003e\u003e get_lines(\"big-table.tsv.gz\", lines=[999999, 10000000, 100000001])\n```\n\nThis pattern samples a random subset of lines:\n\n```python\n\u003e\u003e\u003e from random import sample\n\u003e\u003e\u003e from carabiner.io import count_lines, get_lines\n\n\u003e\u003e\u003e number_of_lines = count_lines(\"big-table.tsv.gz\")\n\u003e\u003e\u003e line_sample = sample(range(number_of_lines), k=1000)\n\u003e\u003e\u003e get_lines(\"big-table.tsv.gz\", lines=line_sample)\n```\n\n### Reading tabular data\n\nWith this backend, we can read subsets of very large files more quickly \nand flexibly than plain `pandas.read_csv`. Formats (delimiters) including Excel \nare inferred from file extensions, but can also be overridden with the `format` \nparameter.\n\n```python\n\u003e\u003e\u003e from carabiner.pd import read_table\n\n\u003e\u003e\u003e read_table(\"big-table.tsv.gz\", lines=10_000)\n```\n\nThe same fast random access is available as for reading lines. Hundreds of \nmillions of records can be looped through per minute.\n\n```python\n\u003e\u003e\u003e from random import sample\n\u003e\u003e\u003e from carabiner.io import count_lines\n\u003e\u003e\u003e from carabiner.pd import read_table\n\n\u003e\u003e\u003e number_of_lines = count_lines(\"big-table.tsv.gz\")\n\u003e\u003e\u003e line_sample = sample(range(number_of_lines), k=1000)\n\u003e\u003e\u003e read_table(\"big-table.tsv.gz\", lines=line_sample)\n```\n\n## Utilities to simplify building command-line apps\n\nThe standard library `argparse` is robust but verbose when building command-line apps with several sub-commands, each with many options. `carabiner.cliutils` smooths this process. 
Apps are built by defining `CLIOption`s, which are assigned to `CLICommand`s that name the function to run when called; the commands then form part of a `CLIApp`.\n\nFirst define the options:\n\n```python\nimport sys\nfrom argparse import FileType\n\nfrom carabiner.cliutils import CLIApp, CLICommand, CLIOption\n\ninputs = CLIOption('inputs',\n                    type=str,\n                    default=[],\n                    nargs='*',\n                    help='')\noutput = CLIOption('--output', '-o', \n                    type=FileType('w'),\n                    default=sys.stdout,\n                    help='Output file. Default: STDOUT')\nformatting = CLIOption('--format', '-f', \n                        type=str,\n                        default='TSV',\n                        choices=['TSV', 'CSV', 'tsv', 'csv'],\n                        help='Format of files. Default: %(default)s')\n```\n\nThen the commands:\n\n```python\ntest = CLICommand(\"test\",\n                    description=\"Test CLI subcommand using Carabiner utilities.\",\n                    options=[inputs, output, formatting],\n                    main=_main)\n```\n\nThe same options can be assigned to multiple commands if necessary.\n\nFinally, define the app and run it:\n\n```python\n\napp = CLIApp(\"Carabiner\", \n             version=__version__,\n             description=\"Test CLI app using Carabiner utilities.\",\n             commands=[test])\n\napp.run()\n```\n\n## Reservoir sampling\n\nIf you need to sample a random subset from an iterator of unknown length by looping through only once, you can use this pure Python implementation of [reservoir sampling](https://en.wikipedia.org/wiki/Reservoir_sampling).\n\nAn important limitation is that while the population to be sampled is not necessarily in memory, the sampled population must fit in memory.\n\nOriginally written in [Python Bugs](https://bugs.python.org/issue41311).\n\nBased on [this GitHub Gist](https://gist.github.com/oscarbenjamin/4c1b977181f34414a425f68589e895d1).\n\n```python\n\u003e\u003e\u003e from carabiner.random import 
sample_iter\n\u003e\u003e\u003e from string import ascii_letters\n\u003e\u003e\u003e from itertools import chain\n\u003e\u003e\u003e from random import seed\n\u003e\u003e\u003e seed(1)\n\u003e\u003e\u003e sample_iter(chain.from_iterable(ascii_letters for _ in range(1000000)), 10)\n['X', 'c', 'w', 'q', 'T', 'e', 'u', 'w', 'E', 'h']\n\u003e\u003e\u003e seed(1)\n\u003e\u003e\u003e sample_iter(chain.from_iterable(ascii_letters for _ in range(1000000)), 10, shuffle_output=False)\n['T', 'h', 'u', 'X', 'E', 'e', 'w', 'q', 'c', 'w']\n\n```\n\n## Multikey dictionaries\n\nConveniently return the values of multiple keys from a dictionary without manually looping.\n\n```python\n\u003e\u003e\u003e from carabiner.collections import MultiKeyDict\n\u003e\u003e\u003e d = MultiKeyDict(a=1, b=2, c=3)\n\u003e\u003e\u003e d\n{'a': 1, 'b': 2, 'c': 3}\n\u003e\u003e\u003e d['c']\n{'c': 3}\n\u003e\u003e\u003e d['a', 'b']\n{'a': 1, 'b': 2}\n```\n\n## Decorators\n\n`carabiner` provides several decorators to facilitate functional programming.\n\n### Vectorized functions\n\nIn scientific programming frameworks like `numpy` we are used to functions that take a scalar or a vector and are applied to every element. It is occasionally useful to convert functions from arbitrary packages to behave in a vectorized manner on Python iterables.\n\nScalar functions can be converted to a vectorized form easily using `@vectorize`.\n\n```python\n\u003e\u003e\u003e @vectorize\n... def vector_adder(x): return x + 1\n...\n\u003e\u003e\u003e list(vector_adder(range(3)))\n[1, 2, 3]\n\u003e\u003e\u003e list(vector_adder((4, 5, 6)))\n[5, 6, 7]\n\u003e\u003e\u003e vector_adder([10])\n11\n\u003e\u003e\u003e vector_adder(10)\n11\n```\n\n### Return `None` instead of error\n\nWhen it is useful for a function to not fail, but have a testable indicator of success, you can wrap it in `@return_none_on_error`.\n\n```python\n\u003e\u003e\u003e def error_maker(x): raise KeyError\n... \n\u003e\u003e\u003e @return_none_on_error\n... 
def error_maker2(x): raise KeyError\n... \n\u003e\u003e\u003e @return_none_on_error(exception=ValueError)\n... def error_maker3(x): raise KeyError\n... \n\n\u003e\u003e\u003e error_maker('a')  # Causes an error\nTraceback (most recent call last):\nFile \"\u003cstdin\u003e\", line 1, in \u003cmodule\u003e\nFile \"\u003cstdin\u003e\", line 1, in error_maker\nKeyError\n\n\u003e\u003e\u003e error_maker2('a')  # Wrapped returns None\n\n\u003e\u003e\u003e error_maker3('a')  # Only catches ValueError\nTraceback (most recent call last):\nFile \"\u003cstdin\u003e\", line 1, in \u003cmodule\u003e\nFile \".../carabiner/decorators.py\", line 59, in wrapped_function\n    \nFile \"\u003cstdin\u003e\", line 2, in error_maker3\nKeyError\n```\n\n### Decorators with parameters\n\nSometimes a decorator has optional parameters to control its behavior. It's convenient to use it in the form `@decorator` when you want the default behavior, or `@decorator(**kwargs)` when you want to customize the behavior. Usually this requires some convoluted code, but this has been packed up into `@decorator_with_params`, to decorate your decorator definitions!\n\n```python\n\u003e\u003e\u003e def decor(f, suffix=\"World\"): \n...     return lambda x: f(x + suffix)\n...\n\u003e\u003e\u003e @decor\n... def printer(x): \n...     print(x)\n... \n\n# doesn't work, raises an error!\n\u003e\u003e\u003e @decor(suffix=\"everyone\")  \n... def printer2(x): \n...     print(x)\n... \nTraceback (most recent call last):\nFile \"\u003cstdin\u003e\", line 1, in \u003cmodule\u003e\nTypeError: decor() missing 1 required positional argument: 'f'\n\n# decorate the decorator!\n\u003e\u003e\u003e @decorator_with_params\n... def decor2(f, suffix=\"World\"): \n...     return lambda x: f(x + suffix)\n... \n\n# Now it works!\n\u003e\u003e\u003e @decor2(suffix=\"everyone\")  \n... def printer3(x): \n...     print(x)\n... 
\n\n\u003e\u003e\u003e printer(\"Hello \")\nHello World\n\u003e\u003e\u003e printer3(\"Hello \")\nHello everyone\n```\n\n## Colorblind palette\n\nHere's a qualitative palette that's colorblind friendly.\n\n```python\n\u003e\u003e\u003e from carabiner import colorblind_palette\n\n\u003e\u003e\u003e colorblind_palette()\n('#EE7733', '#0077BB', '#33BBEE', '#EE3377', '#CC3311', '#009988', '#BBBBBB', '#000000')\n\n# subsets\n\u003e\u003e\u003e colorblind_palette(range(2))\n('#EE7733', '#0077BB')\n\u003e\u003e\u003e colorblind_palette(slice(3, 6))\n('#EE3377', '#CC3311', '#009988')\n```\n\n## Grids with sensible defaults in Matplotlib\n\nWhile `plt.subplots()` is very flexible, it requires many defaults to be defined. Instead, `carabiner.mpl.grid()` generates the `fig, ax` tuple with sensible defaults of a 1x1 grid with panel size 3 and a `constrained` layout.\n\n```python\nfrom carabiner.mpl import grid\nfig, ax = grid()  # 1x1 grid\nfig, ax = grid(ncol=3)  # 1x3 grid; figsize expands appropriately\nfig, ax = grid(ncol=3, nrow=2, sharex=True)  #additional parameters are passed to `plt.subplots()`\n```\n\n## Fast indicator matrix x dense matrix multiplication in Tensorflow\n\nIf you want to multiply an indicator matrix, i.e. 
a sparse matrix of zeros and ones with the same number of non-zero entries per row (as in linear models), as part of a TensorFlow model, it will be faster than using `tensorflow.sparse.SparseTensor` if you first convert the indicator matrix to a `[n x 1]` matrix providing the index of the non-zero element per row.\n\n## Issues, problems, suggestions\n\nAdd to the [issue tracker](https://github.com/scbirlab/carabiner/issues).\n\n## Documentation\n\nAvailable at [ReadTheDocs](https://carabiner-docs.readthedocs.org).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscbirlab%2Fcarabiner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscbirlab%2Fcarabiner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscbirlab%2Fcarabiner/lists"}