{"id":17291049,"url":"https://github.com/decorator-factory/py-simpleparser","last_synced_at":"2026-05-05T05:41:14.271Z","repository":{"id":131171090,"uuid":"603584596","full_name":"decorator-factory/py-simpleparser","owner":"decorator-factory","description":null,"archived":false,"fork":false,"pushed_at":"2024-02-17T12:43:44.000Z","size":41,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-24T00:37:47.120Z","etag":null,"topics":["philosophy","python","shitpost","simple"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/decorator-factory.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-02-19T00:22:43.000Z","updated_at":"2024-02-17T12:41:48.000Z","dependencies_parsed_at":null,"dependency_job_id":"cea12357-eed6-4713-8a41-9f0b3c0f76e8","html_url":"https://github.com/decorator-factory/py-simpleparser","commit_stats":null,"previous_names":["decorator-factory/py-simpleparser"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/decorator-factory%2Fpy-simpleparser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/decorator-factory%2Fpy-simpleparser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/decorator-factory%2Fpy-simpleparser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/decorator-factory%2Fpy-simpleparser/manifests","owner_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/owners/decorator-factory","download_url":"https://codeload.github.com/decorator-factory/py-simpleparser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245717795,"owners_count":20661150,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["philosophy","python","shitpost","simple"],"created_at":"2024-10-15T10:39:42.204Z","updated_at":"2026-05-05T05:41:14.232Z","avatar_url":"https://github.com/decorator-factory.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003e I do not know with what weapons World War III will be fought, but World War IV will be fought with sticks and stones\n\u003e\n\u003e \u0026mdash; someone, probably\n\n# py-simpleparser\n\nThis is a post-modern Python library for parsing/validating unstructured data, such as JSON returned by an HTTP server or a YAML configuration.\n\n## Installation\n\n1. Make sure you're using Python \u003e= 3.9\n2. Copy the `simpleparser.py` file from this repository into your project\n\n## Philosophy and rationale\n\n:bulb: Make sure to read the tutorial first. But I'm not going to stop you :^)\n\n\u003cdetails\u003e\n  \u003csummary\u003eWhat gives?\u003c/summary\u003e\n\nThis library stems from my general dissatisfaction with popular existing Python solutions to the very common\nproblem of parsing unstructured data.\n\n- **Parsing type annotations** is... complicated. Python doesn't provide a nice framework to do that, and\n  it's generally a mess. 
How do you automatically generate a parser for a generic class, respecting its variance?\n  I'll sleep better at night without such knowledge.\n\n- ...but type checkers are kinda nice. Unfortunately, in Python there's no way to make a nice declarative tool like\n  [zod from TypeScript](https://zod.dev/) where types are inferred from the schema, not the other way around.\n\n- Implicit coercions. That's a bad default. The good default is rejecting invalid data.\n\n  \u003e I want a string, but you send me an integer. I am not going to guess what you meant,\n  \u003e there's something wrong on your side.\n\n  If you want the rules to be more relaxed, specify where and how lax you want to be explicitly.\n\n  \u003e Explicit is better than implicit.\n  \u003e\n  \u003e Errors should never pass silently.\n  \u003e\n  \u003e Unless explicitly silenced.\n\n- **Simple cases and complex cases**. It's easy to optimize for the simple case of needing to map a JSON with 5 fields\n  to a `dataclass` of 5 fields with the same names. However, the real world is often more complicated.\n\n  Data can be more complicated.\n\n  Maybe your data uses `camelCase` for names. Or maybe `PascalCase`. If it's using `PascalCase`, should\n  HTTPClient be `h_t_t_p_client` or `http_client`, and what about `IAmAMD`\n  ('I am a [M.D.](https://en.wikipedia.org/wiki/Doctor_of_Medicine)')?\n\n  The data might contain flat data that you want to be nested. It's pretty reasonable to group `{\"x\", \"y\"}` to a single `pos`\n  attribute. (Or vice versa --- flatten something that's nested in the raw representation)\n\n  There's no standard way to represent tagged unions (sum types, variant record, whatever) in JSON/YAML.\n  In fact, Telegram has at least two ways of doing so.\n  [Some developers](https://discord.com/developers/docs/resources/channel#message-object) apparently don't believe in\n  tagged unions, and instead model their data as a record with 30 optional fields :facepalm: . 
The Rust library [`serde`](https://serde.rs/enum-representations.html) has some solutions to this, but I haven't seen anything similar in Python.\n\n\nThis is the kind of philosophy I like:\n\n\u003e Here's the recipe to solve 90% of your problems. It's a bit more wordy than just slapping on a decorator or\n\u003e inheriting from a base class, but it's simple code. If you want something more complicated, use the\n\u003e Turing-complete language we already have to express your custom bits.\n\nSo this project is not much of a library, it's mostly a suggestion to take an alternative approach to parsing\nuntrusted data using simpler tools that you already have.\n\n\u003c/details\u003e\n\n## Tutorial\n\nFor an introduction, we are going to implement a module that works with a small part of [Telegram's Bot API](https://core.telegram.org/bots/api), namely the [`Update` object](https://core.telegram.org/bots/api#update).\n\n### Our model\n\nFirst, we need to decide how to model this thing. For our humble bot, we will only need two update types:\n\n- `message`: \"New incoming message of any kind - text, photo, sticker, etc.\"\n- `edited_message`: \"New version of a message that is known to the bot and was edited\"\n\nWould this be a good model?\n```py\n@dataclass(frozen=True)\nclass Update:\n    update_id: int\n    message: Union[Message, None] = None\n    edited_message: Union[Message, None] = None\n```\nI don't think that's going to serve us well. 
It's going to be hard to work with, because there are\ninvalid and otherwise awkward states this `Update` can be in.\n\nI would use something like this as our model:\n```py\n@dataclass(frozen=True)\nclass NewMessage:\n    message: Message\n\n@dataclass(frozen=True)\nclass MessageEdited:\n    message: Message\n\n@dataclass(frozen=True)\nclass UnsupportedUpdate:\n    raw: object\n\nUpdateBody = Union[\n    NewMessage,\n    MessageEdited,\n    UnsupportedUpdate,\n]\n\n@dataclass(frozen=True)\nclass Update:\n    update_id: int\n    body: UpdateBody\n```\nThis describes our domain pretty well:\n\n- we don't support every possible update (hence `UnsupportedUpdate`)\n- there is exactly one \"event\" in an update\n\n### Parsing a `Message`\n\nFor now, we'll have a very simple model for a message, because we only need a few things from it:\n\n```py\nfrom __future__ import annotations\nfrom typing import Union\nfrom datetime import datetime\nfrom dataclasses import dataclass\n\n\n@dataclass(frozen=True)\nclass Message:\n    message_id: int\n    sent_at: datetime\n    author: Union[User, Chat]\n    text: Union[str, None] = None\n\n\n@dataclass(frozen=True)\nclass User:\n    user_id: int\n    first_name: str\n    username: Union[str, None] = None\n\n\n@dataclass(frozen=True)\nclass Chat:\n    chat_id: int\n    title: str\n```\nAnd here's how you parse a `Message`:\n```py\nfrom simpleparser import (\n    is_any_of,\n    is_int,\n    is_str,\n    has_field,\n    has_optional_field,\n    ParseError,\n    Verbose,\n)\n\n\ndef is_message(source: object) -\u003e Message:\n    return Message(\n        message_id=has_field(\"message_id\", is_int)(source),\n        sent_at=has_field(\"date\", _is_timestamp)(source),\n        author=is_any_of(\n            has_field(\"sender_chat\", _is_chat),\n            has_field(\"from\", _is_user),\n        )(source),\n        text=has_optional_field(\"text\", is_str)(source),\n    )\n\n\ndef _is_chat(source: object) -\u003e Chat:\n    return 
Chat(\n        chat_id=has_field(\"id\", is_int)(source),\n        title=is_any_of(has_field(\"title\", is_str))(source),\n    )\n\n\ndef _is_user(source: object) -\u003e User:\n    return User(\n        user_id=has_field(\"id\", is_int)(source),\n        first_name=has_field(\"first_name\", is_str)(source),\n        username=has_optional_field(\"username\", is_str)(source),\n    )\n\n\ndef _is_timestamp(source: object) -\u003e datetime:\n    timestamp = is_int(source)\n    try:\n        return datetime.fromtimestamp(timestamp)\n    except (ValueError, OverflowError):\n        raise ParseError(Verbose(\"Timestamp is too big\"))\n```\n\nLet's try our parser on some example messages.\n```\nmessage_from_chat = {\n    \"message_id\": 100,\n    \"date\": 1676769964,\n    \"sender_chat\": {\"id\": 666, \"title\": \"Some Chat\"},\n}\nprint(is_message(message_from_chat))\n\n\u003e\u003e\u003e Message(message_id=100, sent_at=datetime.datetime(2023, 2, 19, 4, 26, 4), author=Chat(chat_id=666, title='Some Chat'), text=None)\n```\n```\nmessage_from_user = {\n    \"message_id\": 25045,\n    \"date\": 1676769966,\n    \"from\": {\"id\": 11111, \"first_name\": \"Bob\"},\n    \"text\": \"Hello there!\",\n}\nprint(is_message(message_from_user))\n\n\u003e\u003e\u003e Message(message_id=25045, sent_at=datetime.datetime(2023, 2, 19, 4, 26, 6), author=User(user_id=11111, first_name='Bob', username=None), text='Hello there!')\n```\n```\nbad_message = {\n    \"message_id\": 25045,\n    \"date\": 1676769966,\n    \"from\": {\"id\": 11111, \"first_name\": 42},\n    \"text\": \"Hello there!\",\n}\nis_message(bad_message)\n\n...\nTraceback (most recent call last):\n  File \"/.../tutorial.py\", line 95, in \u003cmodule\u003e\n    is_message(bad_message)\n  File \"/.../tutorial.py\", line 43, in is_message\n    author=is_any_of(\n           ^^^^^^^^^^\n  File \"/.../simpleparser.py\", line 289, in _is_any_of\n    raise ParseError(MultipleErrors(tuple(errors)))\nsimpleparser.ParseError: all 
possibilities failed:\n    - at key 'sender_chat': Key 'sender_chat' not found\n    - at key 'from': at key 'first_name': expected a string, got \u003cclass 'int'\u003e\n```\n\n### Parsing the `UpdateBody`\n\n```py\nfrom simpleparser import map_parser, is_always\n\ndef is_update_body(source: object) -\u003e UpdateBody:\n    return is_any_of(\n        map_parser(NewMessage, has_field(\"message\", is_message)),\n        map_parser(MessageEdited, has_field(\"message_edited\", is_message)),\n        is_always(UnsupportedUpdate(source)),\n    )(source)\n```\nHm... actually, we're not doing anything with the source besides passing it to other parsers.\nLet's refactor our code slightly:\n```py\nfrom simpleparser import is_anything\n\nis_update_body = is_any_of(\n    map_parser(NewMessage, has_field(\"message\", is_message)),\n    map_parser(MessageEdited, has_field(\"message_edited\", is_message)),\n    map_parser(UnsupportedUpdate, is_anything),\n)\n```\n\n\u003cdetails\u003e\n  \u003csummary\u003eBetter error messages\u003c/summary\u003e\n\nThis `is_any_of` is useful when you have few options, but the error message will not be very clear\nwith 10 variants. 
We can give each \"branch\" a name:\n```py\nfrom simpleparser import is_any_of_described\n\nis_update_body = is_any_of_described(\n    (\n        \"New message\",\n        map_parser(NewMessage, has_field(\"message\", is_message)),\n    ),\n    (\n        \"Message edited\",\n        map_parser(MessageEdited, has_field(\"message_edited\", is_message)),\n    ),\n    (\n        \"Unsupported update\",\n        map_parser(UnsupportedUpdate, is_anything),\n    ),\n)\n```\n\n\u003c/details\u003e\n\n### Parsing the `Update`\n\n```py\ndef is_update(source: object) -\u003e Update:\n    return Update(\n        update_id=has_field(\"update_id\", is_int)(source),\n        body=is_update_body(source),\n    )\n```\n\n\u003cdetails\u003e\n  \u003csummary\u003eLet's see our parser in action:\u003c/summary\u003e\n\n```py\n\u003e\u003e\u003e is_update({\n...     \"update_id\": 257,\n...     \"message\": {\n...         \"message_id\": 100,\n...         \"date\": 1676769964,\n...         \"sender_chat\": {\"id\": 666, \"title\": \"Some Chat\"},\n...     },\n... })\n...\nUpdate(\n    update_id=257,\n    body=NewMessage(\n        message=Message(\n            message_id=100,\n            sent_at=datetime.datetime(2023, 2, 19, 4, 26, 4),\n            author=Chat(chat_id=666, title='Some Chat'),\n            text=None,\n        ),\n    ),\n)\n\n\u003e\u003e\u003e is_update({\n...     \"update_id\": 257,\n...     \"unknown_update\": {\n...         \"duckies\": 666,\n...     },\n... 
})\n...\nUpdate(update_id=257, body=UnsupportedUpdate(raw={'update_id': 257, 'unknown_update': {'duckies': 666}}))\n\n\u003e\u003e\u003e is_update({\"update_id\": \"yes!\"})\nTraceback (most recent call last):\n...\nsimpleparser.ParseError: at key 'update_id': expected integer, got \u003cclass 'str'\u003e\n```\n\n\u003c/details\u003e\n\n\n### Making our parser more robust\n\nWhat we ended up with isn't bad, but there are some issues, especially as we're going to scale\nto accept more updates:\n\n- **Performance.** The way `is_any_of` works is: it tries all the given options one by one\n  until it finds an option that matches. This makes it very flexible, but it also means\n  that if there are 100 options, the parser will potentially have to go through all\n  the 100 options on every message.\n\n  In our case, we can optimize this because we know what update we want to parse based\n  on the second key present in the `Update` object.\n\n- **Error handling and unknown updates.** What happens if Telegram gives us a `message_edited`\n  update with a body that doesn't match our expectations? Right now the parser will classify that\n  as an `UnsupportedUpdate`, and we'll probably ignore it. That's very bad! We want to get an\n  error in that case.\n\nHere's one way you can solve the second problem:\n\n```py\nfrom simpleparser import is_dict\n\ndef is_update_body(source: object) -\u003e UpdateBody:\n    raw_dict = is_dict(source)\n\n    if \"message\" in raw_dict:\n        return NewMessage(is_message(raw_dict[\"message\"]))\n    elif \"message_edited\" in raw_dict:\n        return MessageEdited(is_message(raw_dict[\"message_edited\"]))\n    else:\n        return UnsupportedUpdate(raw_dict)\n```\n\nThis is still not perfect: we're going to accept updates which have both a `message` and\n`message_edited`. 
And we still have a time complexity of `O(update_kinds)`.\n\nWe can solve both of these problems with a dictionary lookup:\n\n```py\nfrom simpleparser import Expectation\n\n\n_known_events = {\n    \"message\": map_parser(NewMessage, is_message),\n    \"message_edited\": map_parser(MessageEdited, is_message),\n}\n\n\ndef is_update_body(source: object) -\u003e UpdateBody:\n    raw_dict = is_dict(source)\n    keys = raw_dict.keys() - {\"update_id\"}\n    if len(keys) != 1:\n        raise ParseError(Expectation(expected=\"one key\", actual=str(list(keys))))\n    [event_type] = keys\n\n    if event_type in _known_events:\n        return _known_events[event_type](raw_dict[event_type])\n    else:\n        return UnsupportedUpdate(raw_dict)\n```\n\n\u003cdetails\u003e\n  \u003csummary\u003e\n\n### Advanced topic: Error values\n\n  \u003c/summary\u003e\n\n### Error values\n\nDo we want to raise an exception on an invalid update from Telegram?\n\nWhen we poll Telegram, we must specify what update ID we want the updates to start with.\nWhen we get update `#100`, we tell Telegram to send updates starting with `#101` next time.\nSo our \"main loop\" will look something like this:\n\n```py\nlast_update = 0\n\nwhile True:\n    response = requests.get(f\"{api_root}/getUpdates\", params={\"offset\": last_update, \"timeout\": 2}).json()\n    if not response[\"ok\"]:\n        logger.error(f\"Oh no! We're not OK: {response!r}\")\n        time.sleep(5)\n        continue\n\n    raw_updates = response[\"result\"]\n    for raw_update in raw_updates:\n        try:\n            update = is_update(raw_update)\n        except ParseError as exc:\n            logger.error(\"Wow, Telegram sent us something stupid. \", exc_info=exc)\n        else:\n            last_update = max(last_update, update.update_id + 1)\n            process_update(update)\n```\n\nDo you see the problem? If we get an invalid update, we ignore its ID! 
If that was the\nonly update in a while, on the next iteration we're going to ask for the same update, without a timeout.\nTelegram will be very mad and will put us in the dreaded [429 Jail](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429).\n\nAnother point is that we might want to still process updates that weren't quite right. Perhaps\nwe want to keep track of update statistics in `process_update`, or something else.\n\n```diff\n+ from simpleparser import ErrorValue\n\n+ @dataclass(frozen=True)\n+ class InvalidUpdateReceived:\n+     error: ErrorValue\n+     raw: object\n\n  UpdateBody = Union[\n      NewMessage,\n      MessageEdited,\n      UnsupportedUpdate,\n+     InvalidUpdateReceived,\n  ]\n```\n\nAn `ErrorValue` is a representation of what exactly went wrong during parsing.\nIt contains some clue as to what went wrong and where.\n\n\u003cdetails\u003e\n  \u003csummary\u003eSource code for `ErrorValue`\u003c/summary\u003e\n\n```py\n@dataclass(frozen=True)\nclass Verbose:\n    message: str\n\n\n@dataclass(frozen=True)\nclass Expectation:\n    expected: str\n    actual: str\n\n\n@dataclass(frozen=True)\nclass MultipleErrors:\n    errors: tuple[ErrorValue, ...]\n\n    def __post_init__(self) -\u003e None:\n        if len(self.errors) \u003c 2:\n            raise RuntimeError(\"Expected at least two errors for `MultipleErrors`\")\n\n\n@dataclass(frozen=True)\nclass AtIndex:\n    index: int\n    error: ErrorValue\n\n\n@dataclass(frozen=True)\nclass AtKey:\n    key: str\n    error: ErrorValue\n\n\n@dataclass\nclass Note:\n    note: str\n    original: ErrorValue\n\n\nErrorValue = Union[\n    Verbose,\n    Expectation,\n    MultipleErrors,\n    AtIndex,\n    AtKey,\n    Note,\n]\n```\n\n\u003c/details\u003e\n\nHere's how we can adjust the `is_update_body` parser to accommodate this design:\n```py\n_known_events = {\n    \"message\": map_parser(NewMessage, is_message),\n    \"message_edited\": map_parser(MessageEdited, is_message),\n}\n\n\ndef 
is_update_body(source: object) -\u003e UpdateBody:\n    raw_dict = is_dict(source)\n    keys = raw_dict.keys() - {\"update_id\"}\n    if len(keys) != 1:\n        error = Expectation(expected=\"one key\", actual=str(list(keys)))\n        return InvalidUpdateReceived(error, source)\n    [event_type] = keys\n\n    event_payload = raw_dict[event_type]\n    if event_type in _known_events:\n        try:\n            return _known_events[event_type](event_payload)\n        except ParseError as exc:\n            return InvalidUpdateReceived(exc.error, event_payload)\n    else:\n        return UnsupportedUpdate(raw_dict)\n```\n\n\u003c/details\u003e\n\n### Conclusion\n\nA short recap on `simpleparser`:\n\n- A parser is a function that accepts an object and either returns its parsed version, or raises `ParseError`\n- To parse a dictionary with known fields, use `has_field`\n- If the field can be missing, use `has_optional_field` instead\n- To try several options in order, use `is_any_of`\n- To adjust the output of an already existing parser, use `map_parser`\n- To accept any object at all, use `is_anything`\n- If you don't see how to combine existing parsers together in a nice way, write your own from scratch.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdecorator-factory%2Fpy-simpleparser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdecorator-factory%2Fpy-simpleparser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdecorator-factory%2Fpy-simpleparser/lists"}