{"id":18008891,"url":"https://github.com/eerkela/bertrand","last_synced_at":"2026-03-27T02:35:48.383Z","repository":{"id":53916618,"uuid":"478306486","full_name":"eerkela/bertrand","owner":"eerkela","description":"flexible type extensions for pandas","archived":false,"fork":false,"pushed_at":"2026-03-16T10:28:47.000Z","size":39530,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-16T22:39:23.955Z","etag":null,"topics":["conversions","data-analysis","data-engineering","data-science","multiple-dispatch","numpy","pandas","type-checking","type-inference","types"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eerkela.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGELOG.rst","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-04-05T21:30:04.000Z","updated_at":"2026-03-16T10:28:52.000Z","dependencies_parsed_at":"2023-09-24T09:14:45.599Z","dependency_job_id":"d03f23dd-44d6-4c18-968c-a985203a5b0d","html_url":"https://github.com/eerkela/bertrand","commit_stats":null,"previous_names":["eerkela/bertrand"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/eerkela/bertrand","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eerkela%2Fbertrand","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eerkela%2Fbertrand/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/eerkela%2Fbertrand/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eerkela%2Fbertrand/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eerkela","download_url":"https://codeload.github.com/eerkela/bertrand/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eerkela%2Fbertrand/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31010639,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-27T02:33:22.146Z","status":"ssl_error","status_checked_at":"2026-03-27T02:33:21.763Z","response_time":164,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conversions","data-analysis","data-engineering","data-science","multiple-dispatch","numpy","pandas","type-checking","type-inference","types"],"created_at":"2024-10-30T02:07:46.630Z","updated_at":"2026-03-27T02:35:48.340Z","avatar_url":"https://github.com/eerkela.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":".. NOTE: whenever a change is made to this file, make sure to update the\n.. start and end lines of index.rst to allow doctests to run.\n\n.. CI BADGES\n\n.. latest PyPI version\n.. 
|current_version| image:: https://img.shields.io/pypi/v/bertrand\n   :alt: PyPI version\n   :target: https://pypi.org/project/bertrand/\n\n.. supported Python versions\n.. |requires_python| image:: https://img.shields.io/badge/python-3.7%2B-blue\n   :alt: Supported Python versions\n   :target: https://pypi.org/project/bertrand/\n\n.. TODO: add tests pass/fail status + coverage report when applicable\n\n.. latest cibuildwheel report\n.. |cibuildwheel| image:: https://github.com/eerkela/pdcast/actions/workflows/cibuildwheel.yml/badge.svg\n   :alt: PyPI wheels\n   :target: https://github.com/eerkela/pdcast/actions/workflows/cibuildwheel.yml\n\n.. latest pylint report\n.. |pylint| image:: https://github.com/eerkela/pdcast/actions/workflows/pylint.yml/badge.svg\n   :alt: Pylint report\n   :target: https://github.com/eerkela/pdcast/actions/workflows/pylint.yml\n\n.. latest mypy report\n.. |mypy| image:: https://github.com/eerkela/pdcast/actions/workflows/mypy.yml/badge.svg\n   :alt: Mypy report\n   :target: https://github.com/eerkela/pdcast/actions/workflows/mypy.yml\n\n.. latest black format report\n.. |black| image:: https://github.com/eerkela/pdcast/actions/workflows/black.yml/badge.svg\n   :alt: Black format\n   :target: https://github.com/eerkela/pdcast/actions/workflows/black.yml\n\npdcast - flexible type extensions for pandas\n============================================\n|current_version| |requires_python| |cibuildwheel| |pylint| |mypy| |black|\n\n``pdcast`` enhances the numpy/pandas typing infrastructure, allowing users to\nwrite powerful, modular extensions for arbitrary data.\n\n.. 
contents::\n   :local:\n\nWhat pdcast does\n----------------\n``pdcast`` provides a robust toolset for handling custom data types, including:\n\n*  **Automatic creation of ExtensionDtypes**: ``pdcast`` simplifies and\n   streamlines the creation of new data types for the pandas ecosystem.\n*  **Universal conversions**: ``pdcast`` implements a single, overloadable\n   conversion function that can losslessly convert data within its expanded\n   type system.\n*  **Type inference and schema validation**: ``pdcast`` can efficiently infer\n   the types of arbitrary data and compare them against an external schema,\n   increasing confidence and reliability in complex data pipelines.\n*  **First-class support for missing values and mixed-type data**: ``pdcast``\n   implements a separate data type for missing values, and can naturally\n   process composite vectors via a split-apply-combine strategy.\n*  **Data compression**: ``pdcast`` can losslessly compress data into a more\n   efficient representation, reducing memory usage and increasing performance.\n*  **Compatibility with third-party libraries**: ``pdcast`` bridges the gap\n   between dynamically-typed Python and statically-typed extension libraries,\n   allowing users to optimize their code without sacrificing flexibility.\n\nFeatures\n--------\n``pdcast`` implements a rich `type system\n\u003chttps://en.wikipedia.org/wiki/Type_system\u003e`_ for numpy/pandas ``dtype``\nobjects, adding support for:\n\n*  **Abstract hierarchies** representing different subtypes and\n   implementations.  These are lightweight, efficient, and highly extensible,\n   with new types added in as little as :ref:`10 lines of code \u003ctutorial\u003e`.\n\n   .. doctest::\n\n      \u003e\u003e\u003e @register\n      ... class CustomType(ScalarType):\n      ...     name = \"custom\"\n      ...     aliases = {\"foo\", \"bar\"}\n      ... \n      ...     def __init__(self, x=None):\n      ...         
super().__init__(x=x)\n\n*  A configurable, **domain-specific mini-language** for resolving types.  This\n   represents a superset of the existing numpy/pandas syntax, with customizable\n   aliases and semantics.\n\n   .. doctest::\n\n      \u003e\u003e\u003e resolve_type(\"foo\")\n      CustomType(x=None)\n      \u003e\u003e\u003e resolve_type(\"foo\").aliases.add(\"baz\")\n      \u003e\u003e\u003e resolve_type(\"baz[x]\")\n      CustomType(x='x')\n\n*  Vectorized **type detection** for example data in any format.  This is\n   highly optimized and works regardless of an example's ``.dtype`` attribute,\n   allowing ``pdcast`` to infer the types of ambiguous sequences such as lists,\n   tuples, generators, and ``dtype: object`` arrays, no matter their contents.\n\n   .. doctest::\n\n      \u003e\u003e\u003e detect_type([1, 2, 3])\n      PythonIntegerType()\n      \u003e\u003e\u003e detect_type([1, 2.3, 4+5j])   # doctest: +SKIP\n      CompositeType({int[python], float64[python], complex128[python]})\n\n*  Efficient **type checks** for vectorized data.  These combine the above\n   tools to perform ``isinstance()``-like hierarchical checks for any node in\n   the ``pdcast`` type system.  If the data are properly labeled, then this is\n   done in constant time, allowing users to add checks wherever they are\n   needed.\n\n   .. doctest::\n\n      \u003e\u003e\u003e df = pd.DataFrame({\"a\": [1, 2], \"b\": [1., 2.], \"c\": [\"a\", \"b\"]})\n      \u003e\u003e\u003e typecheck(df, {\"a\": \"int\", \"b\": \"float\", \"c\": \"string\"})\n      True\n      \u003e\u003e\u003e typecheck(df[\"a\"], \"int\")\n      True\n\n*  Support for **composite** and **decorator** types.  These can be used to\n   represent mixed data and/or add new functionality to an existing type\n   without modifying its original implementation (for instance by marking it as\n   ``sparse`` or ``categorical``).\n\n   .. 
doctest::\n\n      \u003e\u003e\u003e resolve_type(\"int, float, complex\")  # doctest: +SKIP\n      CompositeType({int, float, complex})\n      \u003e\u003e\u003e resolve_type(\"sparse[int, 23]\")\n      SparseType(wrapped=IntegerType(), fill_value=23)\n\n*  **Multiple dispatch** based on the inferred type of one or more of a\n   function's arguments.  With the ``pdcast`` type system, this can be extended\n   to cover vectorized data in any representation, including those containing\n   mixed elements.\n\n   .. doctest::\n\n      \u003e\u003e\u003e @dispatch(\"x\", \"y\")\n      ... def add(x, y):\n      ...     return x + y\n\n      \u003e\u003e\u003e @add.overload(\"int\", \"int\")\n      ... def add_integer(x, y):\n      ...     return x - y\n\n      \u003e\u003e\u003e add([1, 2, 3], 1)\n      0    0\n      1    1\n      2    2\n      dtype: int[python]\n      \u003e\u003e\u003e add([1, 2, 3], [1, True, 1.0])\n      0      0\n      1      3\n      2    4.0\n      dtype: object\n\n*  **Metaprogrammable extension functions** with dynamic arguments.  These can\n   be used to actively manage the values that are supplied to a function by\n   defining validators for one or more arguments, which pass their results into\n   the body of the function in-place.  They can also be used to\n   programmatically add new arguments at runtime, making them available to any\n   virtual implementations that might request them.\n\n   .. doctest::\n\n      \u003e\u003e\u003e @extension_func\n      ... def add(x, y, **kwargs):\n      ...     return x + y\n\n      \u003e\u003e\u003e @add.argument\n      ... def y(val, context: dict) -\u003e int:\n      ...     
return int(val)\n\n      \u003e\u003e\u003e add(1, \"2\")\n      3\n      \u003e\u003e\u003e add.y = 2\n      \u003e\u003e\u003e add(1)\n      3\n      \u003e\u003e\u003e del add.y\n      \u003e\u003e\u003e add(1)\n      Traceback (most recent call last):\n         ...\n      TypeError: add() missing 1 required positional argument: 'y'\n\n*  **Attachable functions** with a variety of access patterns.  These can be\n   used to export a function to an existing class as a virtual attribute,\n   dynamically modifying its interface at runtime.  These attributes can mask\n   existing behavior while preserving access to the original implementation,\n   or they can be hidden behind virtual namespaces to avoid conflicts\n   altogether, similar to ``Series.str``, ``Series.dt``, etc.\n\n   .. doctest::\n\n      \u003e\u003e\u003e pdcast.attach()\n      \u003e\u003e\u003e series = pd.Series([1, 2, 3])\n      \u003e\u003e\u003e series.element_type == detect_type(series)\n      True\n      \u003e\u003e\u003e series.typecheck(\"int\") == typecheck(series, \"int\")\n      True\n\nTogether, these features enable a functional approach to extending pandas with\nsmall, fully encapsulated functions that perform special logic based on the\ntypes of their arguments.  Users are thus able to surgically overload virtually\nany aspect of the pandas interface or add entirely new behavior specific to\none or more of their own data types, all while maintaining the pandas tools\nthey know and love.\n\n..\n   Installation\n   ------------\n   Wheels are built using `cibuildwheel\n   \u003chttps://cibuildwheel.readthedocs.io/en/stable/\u003e`_ and are available for most\n   platforms via the Python Package Index (PyPI).\n\n   .. TODO: add hyperlink to PyPI page when it goes live\n\n   .. 
code:: console\n\n      (.venv) $ pip install pdcast\n\n   If a wheel is not available for your system, ``pdcast`` also provides a\n   source distribution to allow pip to build locally, although doing so\n   requires a valid `Cython \u003chttps://cython.org/\u003e`_ installation as well as a C\n   compiler such as `gcc \u003chttps://gcc.gnu.org/\u003e`_ for Mac/Linux or `MinGW\n   \u003chttps://sourceforge.net/projects/mingw/\u003e`_ for Windows.\n\n   .. code:: console\n\n      (.venv) $ git clone https://github.com/eerkela/pdcast\n      (.venv) $ pip install pdcast/\n\n   This should take around 5 minutes to build.  An editable install can be\n   created by running:\n\n   .. code:: console\n\n      (.venv) $ git clone https://github.com/eerkela/pdcast\n      (.venv) $ cd pdcast/\n      (.venv) $ pip install -e \".[dev]\"\n      (.venv) $ make help\n\n   Manual installs may also require Python development headers if they are\n   not already present.  These can be installed via your system's package\n   manager:\n\n   *  On Ubuntu (or other Debian-based systems), run\n      ``sudo apt-get install python3-dev``.\n   *  On CentOS, run ``sudo yum install python3-devel``.\n   *  On Fedora, run ``sudo dnf install python3-devel``.\n\nUsage\n-----\n``pdcast`` combines its advanced features to implement its own supercharged\n:func:`cast() \u003cpdcast.cast\u003e` function, which can perform universal data\nconversions within its expanded type system.  Here's a round-trip journey\nthrough each of the core families of the ``pdcast`` type system:\n\n.. doctest::\n\n   \u003e\u003e\u003e import numpy as np\n\n   \u003e\u003e\u003e class CustomObj:\n   ...     def __init__(self, x):  self.x = x\n   ...     def __str__(self):  return f\"CustomObj({self.x})\"\n   ...     
def __repr__(self):  return str(self)\n\n   \u003e\u003e\u003e pdcast.to_boolean([1+0j, \"False\", None])  # non-homogeneous to start\n   0     True\n   1    False\n   2     \u003cNA\u003e\n   dtype: boolean\n   \u003e\u003e\u003e _.cast(np.dtype(np.int8))  # to integer\n   0       1\n   1       0\n   2    \u003cNA\u003e\n   dtype: Int8\n   \u003e\u003e\u003e _.cast(\"double\")  # to float\n   0    1.0\n   1    0.0\n   2    NaN\n   dtype: float64\n   \u003e\u003e\u003e _.cast(np.complex128, downcast=True)  # to complex (minimizing memory usage)\n   0    1.0+0.0j\n   1    0.0+0.0j\n   2   N000a000N\n   dtype: complex64\n   \u003e\u003e\u003e _.cast(\"sparse[decimal, 1]\")  # to decimal (sparse)\n   0      1\n   1      0\n   2    NaN\n   dtype: Sparse[object, Decimal('1')]\n   \u003e\u003e\u003e _.cast(\"datetime\", unit=\"Y\", since=\"j2000\")  # to datetime (years since j2000 epoch)\n   0   2001-01-01 12:00:00\n   1   2000-01-01 12:00:00\n   2                   NaT\n   dtype: datetime64[ns]\n   \u003e\u003e\u003e _.cast(\"timedelta[python]\", since=\"Jan 1st, 2000 at 12:00 PM\")  # to timedelta (µs since j2000)\n   0    366 days, 0:00:00\n   1              0:00:00\n   2                  NaT\n   dtype: timedelta[python]\n   \u003e\u003e\u003e _.cast(CustomObj)  # to custom Python object\n   0    CustomObj(366 days, 0:00:00)\n   1              CustomObj(0:00:00)\n   2                            \u003cNA\u003e\n   dtype: object[\u003cclass 'CustomObj'\u003e]\n   \u003e\u003e\u003e _.cast(\"categorical[str[pyarrow]]\")  # to string (categorical with PyArrow backend)\n   0    CustomObj(366 days, 0:00:00)\n   1              CustomObj(0:00:00)\n   2                            \u003cNA\u003e\n   dtype: category\n   Categories (2, string): [CustomObj(0:00:00), CustomObj(366 days, 0:00:00)]\n   \u003e\u003e\u003e _.cast(\"bool\", true=\"*\", false=\"CustomObj(0:00:00)\")  # back to our original data\n   0     True\n   1    False\n   2     \u003cNA\u003e\n   dtype: 
boolean\n\nNew implementations for :func:`cast() \u003cpdcast.cast\u003e` can be added dynamically,\nwith customization for both the source and destination types.\n\n.. doctest::\n\n   \u003e\u003e\u003e @cast.overload(\"bool[python]\", \"int[python]\")\n   ... def my_custom_conversion(series, dtype, **unused):\n   ...     print(\"calling my custom conversion...\")\n   ...     return series.apply(int, convert_dtype=False)\n\n   \u003e\u003e\u003e pd.Series([True, False], dtype=object).cast(int)\n   calling my custom conversion...\n   0    1\n   1    0\n   dtype: object\n\nFinally, ``pdcast``'s powerful suite of function decorators allows users to\nwrite their own specialized extensions for existing pandas behavior:\n\n.. doctest::\n\n   \u003e\u003e\u003e @attachable\n   ... @dispatch(\"self\", \"other\")\n   ... def __add__(self, other):\n   ...     return getattr(self.__add__, \"original\", self.__add__)(other)\n\n   \u003e\u003e\u003e @__add__.overload(\"int\", \"int\")\n   ... def add_integer(self, other):\n   ...     return self - other\n\n   \u003e\u003e\u003e __add__.attach_to(pd.Series)\n   \u003e\u003e\u003e pd.Series([1, 2, 3]) + 1\n   0    0\n   1    1\n   2    2\n   dtype: int64\n   \u003e\u003e\u003e pd.Series([1, 2, 3]) + [1, True, 1.0]\n   0      0\n   1      3\n   2    4.0\n   dtype: object\n\nThe same decorators can also be used to create entirely new attributes and\nmethods beyond those that pandas includes by default:\n\n.. doctest::\n\n   \u003e\u003e\u003e @attachable\n   ... @dispatch(\"series\")\n   ... def bar(series):\n   ...     raise NotImplementedError(\"bar is only defined for floating point values\")\n\n   \u003e\u003e\u003e @bar.overload(\"float\")\n   ... def float_bar(series):\n   ...     print(\"Hello, World!\")\n   ...     
return series\n\n   \u003e\u003e\u003e bar.attach_to(pd.Series, namespace=\"foo\", pattern=\"property\")\n   \u003e\u003e\u003e pd.Series([1.0, 2.0, 3.0]).foo.bar\n   Hello, World!\n   0    1.0\n   1    2.0\n   2    3.0\n   dtype: float64\n   \u003e\u003e\u003e pd.Series([1, 2, 3]).foo.bar\n   Traceback (most recent call last):\n      ...\n   NotImplementedError: bar is only defined for floating point values\n\n.. \n   Documentation\n   -------------\n   Detailed documentation is hosted on readthedocs.\n\nLicense\n-------\n``pdcast`` is available under an `MIT license\n\u003chttps://github.com/eerkela/pdcast/blob/main/LICENSE\u003e`_.\n\nContributing\n------------\n``pdcast`` is open-source and welcomes contributions.  For more information,\nplease contact the package maintainer or submit a pull request on\n`GitHub \u003chttps://github.com/eerkela/pdcast\u003e`_.\n\nContact\n-------\nThe package maintainer can be contacted via the\n`GitHub issue tracker \u003chttps://github.com/eerkela/pdcast/issues\u003e`_, or directly\nat eerkela42@gmail.com.\n\nRelated Projects\n----------------\n*  `pdlearn \u003chttps://github.com/eerkela/pdlearn\u003e`_ - AutoML integration for\n   pandas DataFrames using the ``pdcast`` type system.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feerkela%2Fbertrand","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feerkela%2Fbertrand","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feerkela%2Fbertrand/lists"}