{"id":13769097,"url":"https://github.com/genomoncology/FuzzTypes","last_synced_at":"2025-05-11T01:31:40.158Z","repository":{"id":227158634,"uuid":"759087350","full_name":"genomoncology/FuzzTypes","owner":"genomoncology","description":"Pydantic extension for annotating autocorrecting fields.","archived":true,"fork":false,"pushed_at":"2024-06-20T19:10:55.000Z","size":368,"stargazers_count":219,"open_issues_count":0,"forks_count":4,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-20T19:24:00.452Z","etag":null,"topics":["data-cleaning","fuzzy-string-matching","named-entity-linking","pydantic"],"latest_commit_sha":null,"homepage":"https://genomoncology.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/genomoncology.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-17T18:44:02.000Z","updated_at":"2025-02-19T17:56:44.000Z","dependencies_parsed_at":"2024-04-22T16:32:06.181Z","dependency_job_id":"1eb5dafe-411f-481d-be1d-6768e4f6a00d","html_url":"https://github.com/genomoncology/FuzzTypes","commit_stats":null,"previous_names":["genomoncology/fuzztypes"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genomoncology%2FFuzzTypes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genomoncology%2FFuzzTypes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genomoncology%2FFuzzTypes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/genomoncology%2FFuzzTypes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/genomoncology","download_url":"https://codeload.github.com/genomoncology/FuzzTypes/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253504544,"owners_count":21918827,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-cleaning","fuzzy-string-matching","named-entity-linking","pydantic"],"created_at":"2024-08-03T17:00:17.204Z","updated_at":"2025-05-11T01:31:39.850Z","avatar_url":"https://github.com/genomoncology.png","language":"Python","funding_links":[],"categories":["Python Libraries","Learning"],"sub_categories":["Repositories"],"readme":"# FuzzTypes\n\nFuzzTypes is a set of \"autocorrecting\" annotation types that expands\nupon [Pydantic](https://github.com/pydantic/pydantic)'s included [data\nconversions.](https://docs.pydantic.dev/latest/concepts/conversion_table/)\nDesigned for simplicity, it provides powerful normalization capabilities\n(e.g. named entity linking) to ensure structured data is composed of\n\"smart things\" not \"dumb strings\".\n\n\n## Getting Started\n\nPydantic supports basic conversion of data between types. 
For instance:\n\n```python\nfrom pydantic import BaseModel\n\nclass Normal(BaseModel):\n    boolean: bool\n    float: float\n    integer: int\n    \nobj = Normal(\n    boolean='yes',\n    float='2',\n    integer='3',\n)\nassert obj.boolean is True\nassert obj.float == 2.0\nassert obj.integer == 3\n```\n\nFuzzTypes expands on the standard data conversions handled by Pydantic and\nprovides a variety of autocorrecting annotation types. \n\n```python\nfrom datetime import datetime\nfrom typing import Annotated\n\nfrom pydantic import BaseModel\n\nfrom fuzztypes import (\n    ASCII,\n    Datetime,\n    Email,\n    Fuzzmoji,\n    InMemoryValidator,\n    Integer,\n    Person,\n    RegexValidator,\n    ZipCode,\n    flags,\n)\n\n# define a source, see EntitySource for using TSV, CSV, JSONL\ninventors = [\"Ada Lovelace\", \"Alan Turing\", \"Claude Shannon\"]\n\n# define a in memory validator with fuzz search enabled.\nInventor = Annotated[\n    str, InMemoryValidator(inventors, search_flag=flags.FuzzSearch)\n]\n\n# custom Regex type for finding twitter handles.\nHandle = Annotated[\n    str, RegexValidator(r\"@\\w{1,15}\", examples=[\"@genomoncology\"])\n]\n\n# define a Pydantic class with 9 fuzzy type attributes\nclass Fuzzy(BaseModel):\n    ascii: ASCII\n    email: Email\n    emoji: Fuzzmoji\n    handle: Handle\n    integer: Integer\n    inventor: Inventor\n    person: Person\n    time: Datetime\n    zipcode: ZipCode\n\n# create an instance of class Fuzzy\nobj = Fuzzy(\n    ascii=\"άνθρωπος\",\n    email=\"John Doe \u003cjdoe@example.com\u003e\",\n    emoji='thought bubble',\n    handle='Ian Maurer (@imaurer)',\n    integer='fifty-five',\n    inventor='ada luvlace',\n    person='mr. 
arthur herbert fonzarelli (fonzie)',\n    time='5am on Jan 1, 2025',\n    zipcode=\"(Zipcode: 12345-6789)\",\n)\n\n# test the autocorrecting performed\n\n# greek for man: https://en.wiktionary.org/wiki/άνθρωπος\nassert obj.ascii == \"anthropos\"\n\n# extract email via regular expression\nassert obj.email == \"jdoe@example.com\"\n\n# fuzzy match \"thought bubble\" to \"thought balloon\" emoji\nassert obj.emoji == \"💭\"\n\n# simple, inline regex example (see above Handle type)\nassert obj.handle == \"@imaurer\"\n\n# convert integer word phrase to integer value\nassert obj.integer == 55\n\n# case-insensitive fuzzy match on lowercase, misspelled name\nassert obj.inventor == \"Ada Lovelace\"\n\n# human name parser (title, first, middle, last, suffix, nickname)\nassert str(obj.person) == \"Mr. Arthur H. Fonzarelli (fonzie)\"\nassert obj.person.short_name == \"Arthur Fonzarelli\"\nassert obj.person.nickname == \"fonzie\"\nassert obj.person.last == \"Fonzarelli\"\n\n# convert time phrase to datetime object\nassert obj.time.isoformat() == \"2025-01-01T05:00:00\"\n\n# extract zip5 or zip9 formats using regular expressions\nassert obj.zipcode == \"12345-6789\"\n\n# print JSON on success\nassert obj.model_dump() == {\n    \"ascii\": \"anthropos\",\n    \"email\": \"jdoe@example.com\",\n    \"emoji\": \"💭\",\n    \"handle\": \"@imaurer\",\n    \"integer\": 55,\n    \"inventor\": \"Ada Lovelace\",\n    \"person\": {\n        \"first\": \"Arthur\",\n        \"init_format\": \"{first} {middle} {last}\",\n        \"last\": \"Fonzarelli\",\n        \"middle\": \"H.\",\n        \"name_format\": \"{title} {first} {middle} {last} {suffix} \"\n        \"({nickname})\",\n        \"nickname\": \"fonzie\",\n        \"suffix\": \"\",\n        \"title\": \"Mr.\",\n    },\n    \"time\": datetime(2025, 1, 1, 5),\n    \"zipcode\": \"12345-6789\",\n}\n```\n\n## Installation\n\nAvailable on [PyPI](https://pypi.org/project/FuzzTypes/):\n\n```bash\npip install fuzztypes\n```\n\nTo install all 
dependencies (see below), you can copy and paste this:\n\n```bash\npip install anyascii dateparser emoji lancedb nameparser number-parser rapidfuzz sentence-transformers tantivy\n```\n\n\n## Google Colab Notebook\n\nThere is a read-only notebook that you can copy and edit to try out FuzzTypes:\n\n[https://colab.research.google.com/drive/1GNngxcTUXpWDqK_qNsJoP2NhSN9vKCzZ?usp=sharing](https://colab.research.google.com/drive/1GNngxcTUXpWDqK_qNsJoP2NhSN9vKCzZ?usp=sharing)\n\n\n## Base Validators\n\nBase validators are the building blocks of FuzzTypes that can be used for creating custom \"usable types\".\n\n| Type                | Description                                                                                 |\n|---------------------|---------------------------------------------------------------------------------------------|\n| `DateValidator`     | Base date type; pass in arguments such as `date_order`, `strict` and `relative_base`.       |\n| `FuzzValidator`     | Validator class that calls a provided function and handles core and JSON schema config.     |\n| `InMemoryValidator` | Matches entities held in memory using exact, alias, fuzzy, or semantic search.              |\n| `OnDiskValidator`   | Matches entities stored on disk using exact, alias, fuzzy, or semantic search.              |\n| `RegexValidator`    | Regular expression pattern matching base validator.                                         |\n| `DatetimeValidator` | Base datetime type; pass in arguments such as `date_order`, `timezone` and `relative_base`. |\n\nThese base types offer flexibility and extensibility, enabling you to create custom annotation types that suit your\nspecific data validation and normalization requirements.\n\n\n## Usable Types\n\nUsable types are pre-built annotation types in FuzzTypes that can be directly used in Pydantic models. 
They provide\nconvenient and ready-to-use functionality for common data types and scenarios.\n\n| Type           | Description                                                                               |\n|----------------|-------------------------------------------------------------------------------------------|\n| `ASCII`        | Converts Unicode strings to ASCII equivalents using either `anyascii` or `unidecode`.     |\n| `Date`         | Converts date strings to `date` objects using `dateparser`.                               |\n| `Email`        | Extracts email addresses from strings using a regular expression.                         |\n| `Emoji`        | Matches emojis based on Unicode Consortium aliases using the `emoji` library.             |\n| `Fuzzmoji`     | Matches emojis using fuzzy string matching against aliases.                               |\n| `Integer`      | Converts numeric strings or words to integers using `number-parser`.                      |\n| `LanguageCode` | Resolves language to ISO language codes (e.g., \"en\").                                     |\n| `LanguageName` | Resolves language to ISO language names (e.g., \"English\").                                |\n| `Language`     | Resolves language to ISO language object (name, alpha_2, alpha_3, scope, type, etc.).     |\n| `Person`       | Parses person names into subfields (e.g., first, last, suffix) using `python-nameparser`. |\n| `SSN`          | Extracts U.S. Social Security Numbers from strings using a regular expression.            |\n| `Datetime`     | Converts datetime strings to `datetime` objects using `dateparser`.                       |\n| `Vibemoji`     | Matches emojis using semantic similarity against aliases.                                 |\n| `ZipCode`      | Extracts U.S. ZIP codes (5 or 9 digits) from strings using a regular expression.          
|\n\nThese usable types provide a wide range of commonly needed data validations and transformations, making it\neasier to work with various data formats and perform tasks like parsing, extraction, and matching.\n\n\n## InMemoryValidator and OnDiskValidator Configuration\n\nThe InMemory and OnDisk Validator objects work with lists of Entities.\n\nThe following table describes the available configuration options:\n\n| Argument          | Type                                    | Default               | Description                                                                                                                                                                                                                                                                                                                             |\n|-------------------|-----------------------------------------|-----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `case_sensitive`  | `bool`                                  | `False`               | If `True`, matches are case-sensitive. If `False`, matches are case-insensitive.                                                                                                                                                                                                                                                        |\n| `device`          | `Literal[\"cpu\", \"cuda\", \"mps\"]`         | `\"cpu\"`               | The device to use for generating semantic embeddings and LanceDB indexing. Available options are \"cpu\", \"cuda\" (for NVIDIA GPUs), and \"mps\" (for Apple's Metal Performance Shaders).                                          
                                                                                                          |\n| `encoder`         | `Union[Callable, str, Any]`             | `None`                | The encoder to use for generating semantic embeddings. It can be a callable function, a string specifying the name or path of a pre-trained model, or any other object that implements the encoding functionality.                                                                                                                      |\n| `examples`        | `List[Any]`                             | `None`                | A list of example values to be used in schema generation. These examples are included in the generated JSON schema to provide guidance on the expected format of the input values.                                                                                                                                                      |\n| `fuzz_scorer`     | `Literal[\"token_sort_ratio\", ...]`      | `\"token_sort_ratio\"`  | The scoring algorithm to use for fuzzy string matching. Available options include \"token_sort_ratio\", \"ratio\", \"partial_ratio\", \"token_set_ratio\", \"partial_token_set_ratio\", \"token_ratio\", \"partial_token_ratio\", \"WRatio\", and \"QRatio\". Each algorithm has its own characteristics and trade-offs between accuracy and performance. |\n| `limit`           | `int`                                   | `10`                  | The maximum number of matches to return when performing fuzzy or semantic searches.                                                                                                                                                                                                                                                     |\n| `min_similarity`  | `float`                                 | `80.0`                | The minimum similarity score required for a match to be considered valid. 
Matches with a similarity score below this threshold will be discarded.                                                                                                                                                                                       |\n| `notfound_mode`   | `Literal[\"raise\", \"none\", \"allow\"]`     | `\"raise\"`             | The action to take when a matching entity is not found. Available options are \"raise\" (raises an exception), \"none\" (returns `None`), and \"allow\" (returns the input key as the value).                                                                                                                                                 |\n| `search_flag`     | `flags.SearchFlag`                      | `flags.DefaultSearch` | The search strategy to use for finding matches. It is a combination of flags that determine which fields of the `NamedEntity` are considered for matching and whether fuzzy or semantic search is enabled. Available options are defined in the `flags` module.                                                                         |\n| `tiebreaker_mode` | `Literal[\"raise\", \"lesser\", \"greater\"]` | `\"raise\"`             | The strategy to use for resolving ties when multiple matches have the same similarity score. Available options are \"raise\" (raises an exception), \"lesser\" (returns the match with the lower value), and \"greater\" (returns the match with the greater value).                                                                          |\n\n\n## Lazy Dependencies\n\nFuzzTypes leverages several powerful libraries to extend its functionality.\n\nThese dependencies are not installed by default with FuzzTypes to keep the\ninstallation lightweight. 
Instead, they are optional and can be installed\nas needed depending on which types you use.\n\nBelow is a list of these dependencies, including their licenses, purpose, and which\nspecific types require them.\n\nRight now, you must pip install the modules directly; in the future you will \nbe able to install them automatically as part of the main install using pip extras.\n\nTo install all dependencies, you can copy and paste this:\n\n```bash\npip install anyascii dateparser emoji lancedb nameparser number-parser rapidfuzz sentence-transformers tantivy\n```\n\n\n| Fuzz Type         | Library                                                                  | License    | Purpose                                                    |\n|-------------------|--------------------------------------------------------------------------|------------|------------------------------------------------------------|\n| ASCII             | [anyascii](https://github.com/anyascii/anyascii)                         | ISC        | Converting Unicode into ASCII equivalents (not GPL)        |\n| ASCII             | [unidecode](https://github.com/avian2/unidecode)                         | GPL        | Converting Unicode into ASCII equivalents (better quality) |\n| Date              | [dateparser](https://github.com/scrapinghub/dateparser)                  | BSD-3      | Parsing dates from strings                                 |\n| Emoji             | [emoji](https://github.com/carpedm20/emoji/)                             | BSD        | Handling and manipulating emoji characters                 |\n| Fuzz              | [rapidfuzz](https://github.com/rapidfuzz/RapidFuzz)                      | MIT        | Performing fuzzy string matching                           |\n| InMemoryValidator | [numpy](https://numpy.org/)                                              | BSD        | Numerical computing in Python                              |\n| InMemoryValidator | 
[scikit-learn](https://scikit-learn.org/)                                | BSD        | Machine learning in Python                                 |\n| InMemoryValidator | [sentence-transformers](https://github.com/UKPLab/sentence-transformers) | Apache-2.0 | Encoding sentences into high-dimensional vectors           |\n| Integer           | [number-parser](https://github.com/scrapinghub/number-parser)            | BSD-3      | Parsing numbers from strings                               |\n| OnDiskValidator   | [lancedb](https://github.com/lancedb/lancedb)                            | Apache-2.0 | High-performance, on-disk vector database                  |\n| OnDiskValidator   | [pyarrow](https://github.com/apache/arrow)                               | Apache-2.0 | In-memory columnar data format and processing library      |\n| OnDiskValidator   | [sentence-transformers](https://github.com/UKPLab/sentence-transformers) | Apache-2.0 | Encoding sentences into high-dimensional vectors           |\n| OnDiskValidator   | [tantivy](https://github.com/quickwit-oss/tantivy-py)                    | MIT        | Full-text search (FTS) for LanceDB.                        |\n| Person            | [nameparser](https://github.com/derek73/python-nameparser)               | LGPL       | Parsing person names                                       |\n\n\n## Maintainer\n\nFuzzTypes was created by [Ian Maurer](https://x.com/imaurer), the CTO of [GenomOncology](https://genomoncology.com).\n\nThis MIT-based open-source project was extracted from our product which includes the ability to normalize biomedical\ndata for use in precision oncology clinical decision support systems. 
Contact me to learn more about our product\nofferings.\n\n\n| Type           | Description                                                                               |\n|----------------|-------------------------------------------------------------------------------------------|\n| `AirportCode`  | Represents airport codes (e.g., \"ORD\").                                                   |\n| `Airport`      | Represents airport names (e.g., \"O'Hare International Airport\").                          |\n| `CountryCode`  | Represents ISO country codes (e.g., \"US\").                                                |\n| `Country`      | Represents country names (e.g., \"United States\").                                         |\n| `Currency`     | Represents currency codes (e.g., \"USD\").                                                  |\n| `Quantity`     | Converts strings to `Quantity` objects with value and unit using `pint`.                  |\n| `URL`          | Represents normalized URLs with tracking parameters removed using `url-normalize`.        |\n| `USStateCode`  | Represents U.S. state codes (e.g., \"CA\").                                                 |\n| `USState`      | Represents U.S. state names (e.g., \"California\").                                         |\n\n\n## Structured Data Generation via LLM Function Calling and Custom GPT Actions\n\nSeveral libraries (e.g. 
[Instructor](https://github.com/jxnl/instructor),\n[Outlines](https://github.com/outlines-dev/outlines),\n[Marvin](https://github.com/prefecthq/marvin)) use Pydantic to define models for structured data generation\nusing Large Language Models (LLMs) via function calling or a grammar- or regex-based\nsampling approach driven by the [JSON schema generated by Pydantic](https://docs.pydantic.dev/latest/concepts/json_schema/).\n\nThis approach allows for the enumeration of allowed values using\nPython's `Literal`, `Enum` or JSON Schema's `examples` field directly\nin your Pydantic class declaration, which is used by the LLM to\ngenerate valid values. This approach works exceptionally well for\nlow-cardinality fields (few unique allowed values) such as the world's\ncontinents (7 in total).\n\nThis approach, however, doesn't scale well for high-cardinality fields (many unique\nallowed values) such as known human genomic variants (~325M).\nWhere exactly the cutoff is between \"low\" and \"high\" cardinality is an exercise\nleft to the reader and their use case.\n\nThat's where FuzzTypes comes in. The allowed values are managed by the FuzzTypes\nannotations and the values are resolved during the Pydantic validation process.\nThis can include fuzzy and semantic searching that raises an exception if the\nprovided value doesn't meet a minimum similarity threshold defined by the\ndeveloper.\n\nErrors discovered via Pydantic can be caught and resubmitted to the LLM for\ncorrection. The error will contain examples, expected patterns, and closest\nmatches to help steer the LLM to provide a better-informed guess.\n\n\n## Creating Custom Types\n\nFuzzTypes provides a set of base types that you can use to create\nyour own custom annotation types. 
These base types offer different\ncapabilities and can be extended to suit your specific data validation\nand normalization needs.\n\n### EntitySource\n\nFuzzTypes provides the `EntitySource` class to manage and load\nentity data from various sources. It supports JSON Lines (`.jsonl`),\nCSV (`.csv`), TSV (`.tsv`), and Text (`.txt`) formats, as well as\nloading entities from a callable.\n\nExample:\n```python\nfrom pathlib import Path\nfrom fuzztypes import EntitySource, NamedEntity\n\n# Load entities from a CSV file\nfruit_source = EntitySource(Path(\"path/to/fruits.csv\"))\n\n# Load entities from a callable function\ndef load_animals():\n    return [\n        NamedEntity(value=\"Dog\", aliases=[\"Canine\"]),\n        NamedEntity(value=\"Cat\", aliases=[\"Feline\"]),\n    ]\n\nanimal_source = EntitySource(load_animals)\n```\n\n### InMemoryValidator Base Type\n\nThe `InMemoryValidator` base type matches entities held in memory using\nexact, alias, fuzzy, or semantic search. It is suitable for small\nto medium-sized datasets that can fit in memory and provides fast\nmatching capabilities.\n\nExample:\n```python\nfrom typing import Annotated\nfrom pydantic import BaseModel\nfrom fuzztypes import InMemoryValidator, flags\n\n# Create a custom annotation type for matching fruits\nfruits = [\"Apple\", \"Banana\", \"Orange\"]\nFruit = Annotated[\n    str, InMemoryValidator(fruits, search_flag=flags.FuzzSearch)\n]\n\nclass MyModel(BaseModel):\n    fruit: Fruit\n\nmodel = MyModel(fruit=\"appel\")\nassert model.fruit == \"Apple\"\n```\n\n### OnDiskValidator Base Type\n\nThe `OnDiskValidator` base type matches entities stored on disk\nusing exact, alias, fuzzy, or semantic search. 
It leverages the\nLanceDB library for efficient storage and retrieval of entities.\n`OnDiskValidator` is recommended for large datasets that cannot fit in memory.\n\nExample:\n```python\nfrom typing import Annotated\nfrom pydantic import BaseModel\nfrom fuzztypes import OnDiskValidator\n\n# Create a custom annotation type for matching countries stored on disk\ncountries = [\n    (\"United States\", \"US\"),\n    (\"United Kingdom\", \"UK\"),\n    (\"Canada\", \"CA\"),\n]\nCountry = Annotated[str, OnDiskValidator(\"Country\", countries)]\n\nclass MyModel(BaseModel):\n    country: Country\n\nassert MyModel(country=\"Canada\").country == \"Canada\"\nassert MyModel(country=\"US\").country == \"United States\"\n```\n\n### DateValidator and DatetimeValidator\n\nThe `DateValidator` and `DatetimeValidator` base types provide fuzzy parsing\ncapabilities for date and datetime objects, respectively. They allow\nyou to define flexible date and time formats and perform parsing\nbased on specified settings such as date order, timezone, and\nrelative base.\n\nExample:\n\n```python\nfrom datetime import date, datetime\nfrom pydantic import BaseModel\nfrom typing import Annotated\nfrom fuzztypes import DateValidator, DatetimeValidator\n\nMyDate = Annotated[date, DateValidator(date_order=\"MDY\")]\nMyTime = Annotated[datetime, DatetimeValidator(timezone=\"UTC\")]\n\nclass MyModel(BaseModel):\n    date: MyDate\n    time: MyTime\n\nmodel = MyModel(date=\"1/1/2023\", time=\"1/1/23 at 10:30 PM\")\nassert model.date.isoformat() == \"2023-01-01\"\nassert model.time.isoformat() == \"2023-01-01T22:30:00+00:00\"\n```\n\n\n### FuzzValidator\n\nThe `FuzzValidator` is the base of the FuzzTypes typing system.\nIt can be used directly to wrap any Python function.\n\nExample:\n```python\nfrom typing import Annotated\nfrom pydantic import BaseModel\nfrom fuzztypes import FuzzValidator\n\n# Create a custom annotation type that converts a value to uppercase\nUpperCase = Annotated[str, 
FuzzValidator(str.upper)]\n\nclass MyModel(BaseModel):\n    name: UpperCase\n\nmodel = MyModel(name=\"john\")\nassert model.name == \"JOHN\"\n```\n\n\n### RegexValidator\n\nThe `RegexValidator` base type matches values using a regular\nexpression pattern. It is useful for creating annotation types that\nvalidate and extract specific patterns from input values.\n\nExample:\n```python\nfrom typing import Annotated\nfrom pydantic import BaseModel\nfrom fuzztypes import RegexValidator\n\n# Create a custom annotation type for matching IPv4 addresses\nIPAddress = Annotated[\n    str, RegexValidator(r\"(?:[0-9]{1,3}\\.){3}[0-9]{1,3}$\")\n]\n\nclass MyModel(BaseModel):\n    ip_address: IPAddress\n\nmodel = MyModel(ip_address=\"My internet IP address is 192.168.127.12\")\nassert model.ip_address == \"192.168.127.12\"\n```\n\n### Languages\n\nLanguages are loaded from the [Debian iso-codes](https://salsa.debian.org/iso-codes-team/iso-codes/) project.\n\nLanguages are resolved using their preferred, common, inverted, or bibliographic name, or their 2- or 3-letter alpha code.\n\nLanguages can be included as a string name (`LanguageName`), string code (`LanguageCode`) or full language object (`Language`).\n\nThe preferred code is the 2-letter version and will be used if available. 
Otherwise, the 3 letter alpha code is used.\n\nExample:\n\n```python\nfrom pydantic import BaseModel\nfrom fuzztypes import (\n    Language,\n    LanguageName,\n    LanguageCode,\n    LanguageScope,\n    LanguageType,\n    LanguageNamedEntity,\n    validate_python,\n)\nclass Model(BaseModel):\n    language_code: LanguageCode\n    language_name: LanguageName\n    language: Language\n\n# Test that Language resolves to the complete language object\ndata = dict(language_code=\"en\", language=\"English\", language_name=\"ENG\")\nobj = validate_python(Model, data)\nassert obj.language_code == \"en\"\nassert obj.language_name == \"English\"\nassert obj.language.scope == LanguageScope.INDIVIDUAL\nassert obj.language.type == LanguageType.LIVING\nassert isinstance(obj.language, LanguageNamedEntity)\nassert obj.model_dump(exclude_defaults=True, mode=\"json\") == {\n    \"language\": {\n        \"aliases\": [\"en\", \"eng\"],\n        \"alpha_2\": \"en\",\n        \"alpha_3\": \"eng\",\n        \"scope\": \"I\",\n        \"type\": \"L\",\n        \"value\": \"English\",\n    },\n    \"language_code\": \"en\",\n    \"language_name\": \"English\",\n}\n```\n\n### Validate Python and JSON functions\n\nFunctional approach to validating python and json are available.\nBelow are examples for the `validate_python` and `validate_json` functions:\n\n```python\nfrom pydantic import BaseModel\nfrom fuzztypes import validate_python, validate_json, Integer, Date\n\n# validate python\nassert validate_python(Integer, \"two hundred\") == 200\n\n# validate json\nclass MyModel(BaseModel):\n    date: Date\n\njson = '{\"date\": \"July 4th 2021\"}'\nobj = validate_json(MyModel, json)\nassert obj.date.isoformat() == \"2021-07-04\"\n```\n\n### Resolve Entities from FuzzValidator or Annotation\n\nEntities can be resolved from the `FuzzValidator` validators such as InMemoryValidator\nor OnDiskValidator or the defined `Annotation` type using the `resolve_entity` function:\n\n```python\nfrom typing 
import Annotated\nfrom fuzztypes import resolve_entity, InMemoryValidator\n\nelements = [\"earth\", \"fire\", \"water\", \"air\"]\nElementValidator = InMemoryValidator(elements)\nElement = Annotated[str, ElementValidator]\n\nassert resolve_entity(ElementValidator, \"EARTH\").model_dump() == {\n    \"aliases\": [],\n    \"label\": None,\n    \"meta\": None,\n    \"priority\": None,\n    \"value\": \"earth\",\n}\n\nassert resolve_entity(Element, \"Air\").model_dump(\n    exclude_defaults=True\n) == {\"value\": \"air\"}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgenomoncology%2FFuzzTypes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgenomoncology%2FFuzzTypes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgenomoncology%2FFuzzTypes/lists"}