{"id":39132073,"url":"https://github.com/y-scope/clp-ffi-py","last_synced_at":"2026-01-17T21:18:53.909Z","repository":{"id":174780808,"uuid":"645082594","full_name":"y-scope/clp-ffi-py","owner":"y-scope","description":"clp-ffi-py is a Python library to encode log messages with CLP, and work with the encoded messages using a foreign function interface (FFI).","archived":false,"fork":false,"pushed_at":"2025-10-21T18:59:54.000Z","size":426,"stargazers_count":11,"open_issues_count":5,"forks_count":6,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-11-27T13:42:11.479Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/y-scope.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-24T22:15:49.000Z","updated_at":"2025-10-21T18:59:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"2d632f54-fb27-45cc-a7a1-c9c5c96f992c","html_url":"https://github.com/y-scope/clp-ffi-py","commit_stats":null,"previous_names":["y-scope/clp-ffi-py"],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/y-scope/clp-ffi-py","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/y-scope%2Fclp-ffi-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/y-scope%2Fclp-ffi-py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/y-scope%2Fclp-ffi-py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/y-scope%2Fclp-ffi-py/manifests
","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/y-scope","download_url":"https://codeload.github.com/y-scope/clp-ffi-py/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/y-scope%2Fclp-ffi-py/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28518618,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T18:55:29.170Z","status":"ssl_error","status_checked_at":"2026-01-17T18:55:03.375Z","response_time":85,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-17T21:18:52.796Z","updated_at":"2026-01-17T21:18:53.900Z","avatar_url":"https://github.com/y-scope.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# clp-ffi-py\n\n[![PyPI platforms][badge_pypi]][7]\n[![Build status][badge_build_status]][clp_ffi_py_gh_actions]\n[![Downloads][badge_total_downloads]][pepy/clp_ffi_py]\n[![Downloads][badge_monthly_downloads]][pepy/clp_ffi_py]\n\nThis module provides Python packages to interface with [CLP Core Features][1]\nthrough CLP's FFI (foreign function interface). 
At present, this library\nsupplies built-in functions for serializing/deserializing log messages using [CLP][2].\n\n## Quick Start\n\n### Install with `pip`:\n\n```bash\n# Install the latest version\npython3 -m pip install --upgrade clp-ffi-py\n```\n\nNote:\n\n- Python 3.7 or higher is required.\n- Tested on Linux, macOS and Windows.\n\nTo install an older version or download the prebuilt `whl` package, check the\nproject homepage on PyPI [here][7].\n\n## Compatibility\n\nTested on Python 3.7, 3.8, 3.11, 3.12, and 3.13, and it should work on any Python\nversion \u003e= 3.7.\n\n## API Reference\n\nThe API reference for this library can be found on our [docs hub][10].\n\n## Building/Packaging\n\nTo manually build a package for distribution, follow the steps below.\n\n### Requirements\n\n* A C++ compiler that supports C++20 and `std::span`, e.g.:\n  * `clang++` \u003e= 7\n  * `g++` \u003e= 10\n  * `MSVC` \u003e= 1930 (included in Visual Studio 2022)\n* python3\n* python3-dev\n* python3-venv\n* [Task][9] \u003e= 3.38.0\n\n### Set up\n* Initialize and update yscope-dev-utils submodules:\n  ```shell\n  git submodule update --init --recursive tools/yscope-dev-utils\n  ```\n\n### Build commands\n\n* Build a Python wheel incrementally:\n\n  ```bash\n  task\n  ```\n  The command above will generate both a `.tar.gz` and `.whl` package under\n  `./build/dist/`.\n\n* Clean up the build:\n\n  ```bash\n  task clean\n  ```\n\n## Using Key-Value Pair IR Streams\nThe CLP key-value pair IR stream, introduced in version 0.0.14, is a new IR stream format that\nenables efficient serialization of key-value pair (kv-pair) log events.\n\nWe categorize the kv-pairs of a log event into two categories:\n\n- **Auto-generated kv-pairs**: KV-pairs (e.g., timestamps, log levels, other metadata) that are\n  automatically generated by the logging library.\n- **User-generated kv-pairs**: Custom kv-pairs (e.g., log messages).\n\n### Requirements\nThe serialization interface requires that kv-pairs are 
passed as [MessagePack][msgpack]-encoded\n**Map** objects, where keys and values are restricted to the MessagePack types described\nbelow.\n\n#### Supported key types\nKeys must be UTF-8-encoded strings.\n\n#### Supported value types\nValues must be one of the following MessagePack types:\n\n- Primitives:\n  - Integer\n  - Float\n  - String\n  - Boolean\n  - Null\n- Maps with keys and values that have the same supported types described here.\n- Arrays containing a sequence of supported primitives, arrays, or maps.\n\n#### Unsupported value types\nMessagePack's `Binary` and `Extension` types are not supported.\n\n### Example Code: Using `Serializer` to serialize key-value pair log events into an IR stream\n```python\nfrom clp_ffi_py.ir import Serializer\nfrom clp_ffi_py.utils import serialize_dict_to_msgpack\n\nwith open(\"example.clp\", \"wb\") as ir_stream, Serializer(ir_stream) as serializer:\n    serializer.serialize_log_event_from_msgpack_map(\n        auto_gen_msgpack_map=serialize_dict_to_msgpack({\"level\": \"INFO\"}),\n        user_gen_msgpack_map=serialize_dict_to_msgpack({\"message\": \"Service started.\"}),\n    )\n    serializer.serialize_log_event_from_msgpack_map(\n        auto_gen_msgpack_map=serialize_dict_to_msgpack({\"level\": \"WARN\"}),\n        user_gen_msgpack_map=serialize_dict_to_msgpack({\"uid\": 12345, \"ip\": \"127.0.0.1\"}),\n    )\n```\n\n`clp_ffi_py.utils.serialize_dict_to_msgpack` can be used to serialize a Python dictionary object\ninto a MessagePack object.\n\n### Example Code: Using `Deserializer` to read `KeyValuePairLogEvent`s from an IR stream\n```python\nfrom clp_ffi_py.ir import Deserializer, KeyValuePairLogEvent\nfrom typing import Optional\n\nwith open(\"example.clp\", \"rb\") as ir_stream:\n    deserializer = Deserializer(ir_stream)\n    while True:\n        log_event: Optional[KeyValuePairLogEvent] = deserializer.deserialize_log_event()\n        if log_event is None:\n            # The entire stream has been 
consumed\n            break\n        auto_gen_kv_pairs, user_gen_kv_pairs = log_event.to_dict()\n        print(auto_gen_kv_pairs)\n        print(user_gen_kv_pairs)\n```\n\n- `Deserializer.deserialize_log_event` can be used to read from the IR stream and output\n  `KeyValuePairLogEvent` objects.\n- `KeyValuePairLogEvent.to_dict` can be used to convert the underlying deserialized results into\n  Python dictionaries.\n\n\u003e [!IMPORTANT]\n\u003e The current `Deserializer` does not support reading the previous IR stream format. Backward\n\u003e compatibility will be added in future releases.\n\n## CLP IR Readers\n\nCLP IR Readers provide a convenient interface for CLP IR deserialization and search\nmethods.\n\n\u003e [!IMPORTANT]\n\u003e The readers below do not support reading or searching CLP *key-value pair IR streams*.\n\n### ClpIrStreamReader\n\n- Reads and deserializes an arbitrary CLP IR stream (as an instance of `IO[bytes]`).\n- Can be used as an iterator that returns each log event as a `LogEvent` object.\n- Can search for target log events with a search query:\n  - Searching log events within a certain time range.\n  - Searching log messages that match certain wildcard queries.\n\n### ClpIrFileReader\n\n- Simple wrapper around `ClpIrStreamReader` that calls `open` with a given local\n  path.\n\n### Example Code: Using ClpIrFileReader to iterate and print log events\n\n```python\nfrom pathlib import Path\nfrom clp_ffi_py.ir import ClpIrFileReader\n\nwith ClpIrFileReader(Path(\"example.clp.zst\")) as clp_reader:\n    for log_event in clp_reader:\n        # Print the log message with its timestamp properly formatted.\n        print(log_event.get_formatted_message())\n```\n\nEach log event is represented by a `LogEvent` object, which offers methods to\nretrieve its underlying details, such as the timestamp and the log message. 
For\nmore information, use the following code to see all the available methods and\ntheir associated docstrings.\n\n```python\nfrom clp_ffi_py.ir import LogEvent\nhelp(LogEvent)\n```\n\n### Example Code: Using Query to search log events by specifying a certain time range\n\n```python\nfrom typing import List\n\nfrom clp_ffi_py.ir import ClpIrStreamReader, LogEvent, Query, QueryBuilder\n\n# Create a QueryBuilder object to build the search query.\nquery_builder: QueryBuilder = QueryBuilder()\n\n# Create a search query that specifies a time range by UNIX epoch timestamp in\n# milliseconds. It will search from 2016.Nov.28 21:00 to 2016.Nov.29 3:00.\ntime_range_query: Query = (\n    query_builder\n    .set_search_time_lower_bound(1480366800000) # 2016.11.28 21:00\n    .set_search_time_upper_bound(1480388400000) # 2016.11.29 03:00\n    .build()\n)\n\n# A list to store all the log events within the search time range\nlog_events: List[LogEvent] = []\n\n# Open the compressed IR stream log file as a binary file stream, then pass it to\n# ClpIrStreamReader.\nwith open(\"example.clp.zst\", \"rb\") as compressed_log_file:\n    with ClpIrStreamReader(compressed_log_file) as clp_reader:\n        for log_event in clp_reader.search(time_range_query):\n            log_events.append(log_event)\n```\n\n### Example Code: Using Query to search log messages of certain pattern(s) specified by wildcard queries\n\n```python\nfrom pathlib import Path\nfrom typing import List, Tuple\n\nfrom clp_ffi_py.ir import ClpIrFileReader, Query, QueryBuilder\nfrom clp_ffi_py.wildcard_query import FullStringWildcardQuery, SubstringWildcardQuery\n\n# Create a QueryBuilder object to build the search query.\nquery_builder: QueryBuilder = QueryBuilder()\n\n# Add wildcard patterns to filter log messages:\nquery_builder.add_wildcard_query(SubstringWildcardQuery(\"uid=*,status=failed\"))\nquery_builder.add_wildcard_query(\n    FullStringWildcardQuery(\"*UID=*,Status=KILLED*\", case_sensitive=True)\n)\n\n# Initialize a 
Query object using the builder:\nwildcard_search_query: Query = query_builder.build()\n# Store the log events that match the criteria in the format:\n# [timestamp, message]\nmatched_log_messages: List[Tuple[int, str]] = []\n\n# A convenience file reader class is also available to interact with a file that\n# represents a CLP IR stream directly.\nwith ClpIrFileReader(Path(\"example.clp.zst\")) as clp_reader:\n    for log_event in clp_reader.search(wildcard_search_query):\n        matched_log_messages.append((log_event.get_timestamp(), log_event.get_log_message()))\n```\n\nA `Query` object may have both the search time range and the wildcard queries\n(`WildcardQuery`) specified to support more complex search scenarios.\n`QueryBuilder` can be used to conveniently construct `Query` objects. For more\ndetails, use the following code to access the related docstrings.\n\n```python\nfrom clp_ffi_py.ir import Query, QueryBuilder\nfrom clp_ffi_py.wildcard_query import FullStringWildcardQuery, SubstringWildcardQuery, WildcardQuery\nhelp(Query)\nhelp(QueryBuilder)\nhelp(WildcardQuery)\nhelp(FullStringWildcardQuery)\nhelp(SubstringWildcardQuery)\n```\n\n### Streaming Deserialize/Search Directly from S3 Remote Storage\n\nWhen working with CLP IR files stored on S3-compatible storage systems,\n[smart_open][8] can be used to open and read the IR stream for the following\nbenefits:\n\n- It only performs streaming operations and does not download the file to disk.\n- It only issues a single `GET` request so that the API access cost is\n  minimized.\n\nHere is an example:\n\n```python\nfrom clp_ffi_py.ir import ClpIrStreamReader\n\nimport boto3\nimport os\nimport smart_open\n\n# Create a boto3 session by reading AWS credentials from environment variables.\nsession = boto3.Session(\n    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],\n    aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],\n)\n\nurl = 's3://clp-example-s3-bucket/example.clp.zst'\n# Using 
`smart_open.open` to stream the CLP IR byte sequence:\nwith smart_open.open(\n    url, mode=\"rb\", compression=\"disable\", transport_params={\"client\": session.client(\"s3\")}\n) as istream:\n    with ClpIrStreamReader(istream, allow_incomplete_stream=True) as clp_reader:\n        for log_event in clp_reader:\n            # Print the log message with its timestamp properly formatted.\n            print(log_event.get_formatted_message())\n```\n\nNote:\n- Setting `compression=\"disable\"` is necessary so that `smart_open` doesn't\nundo the IR file's Zstandard compression (based on the file's extension) before\nstreaming it to `ClpIrStreamReader`; `ClpIrStreamReader` expects the input\nstream to be Zstandard-compressed.\n- When `allow_incomplete_stream` is set to `False` (default), the reader will raise\n`clp_ffi_py.ir.IncompleteStreamError` if the stream is incomplete (it doesn't end\nwith the byte sequence indicating the stream's end). In practice, this can occur\nif you're reading a stream that is still being written or wasn't properly\nclosed.\n\n### Parallel Processing\n\nThe `Query` and `LogEvent` classes can be serialized by [pickle][6]. Therefore,\ndeserializing and searching can be parallelized across streams/files using libraries\nsuch as [multiprocessing][4] and [tqdm][5].\n\n## Testing\n\n```bash\n# 1. Create and enter a virtual environment\npython -m venv venv \u0026\u0026 . ./venv/bin/activate\n\n# 2. Install development dependencies\npip install -r requirements-dev.txt\n\n# 3. Pull all submodules in preparation for building\ngit submodule update --init --recursive\n\n# 4. Install\npip install -e .\n\n# 5. Run unit tests\npython -m unittest -bv\n```\n\nNote: If the package is installed from a `whl` file into the site packages,\nrather than installed locally (`pip install -e .`), the tester cannot be\nlaunched from the project's root directory. 
If `unittest` is run from the root\ndirectory, the local `clp_ffi_py` directory will shadow the installed\n`clp_ffi_py` module. To run the tester with the installed package, try the following:\n\n```bash\ncd tests\npython -m unittest -bv\n```\n\n## Build and Test with cibuildwheel\n\nThis project uses a [cibuildwheel][3] configuration. Whenever changes\nare committed and pushed to GitHub, the cibuildwheel Action starts automatically,\nbuilding this library for several Python environments across multiple\noperating systems and architectures. You can access the build outputs (wheel files) via the\nGitHub Actions page. For instructions on customizing the build targets or running\ncibuildwheel locally, please refer to the official documentation of\ncibuildwheel.\n\n## Adding files\nCertain file types need to be added to our linting rules manually:\n\n- **CMake**. If adding a CMake file, add it (or its parent directory) as an argument to the\n  `gersemi` command in [lint-tasks.yaml](lint-tasks.yaml).\n  - If adding a directory, the file must be named `CMakeLists.txt` or use the `.cmake` extension.\n- **YAML**. 
If adding a YAML file (regardless of its extension), add it as an argument to the\n  `yamllint` command in [lint-tasks.yaml](lint-tasks.yaml).\n## Linting\nBefore submitting a pull request, ensure you’ve run the linting commands below and either fixed any\nviolations or suppressed the warning.\n\nTo run all linting checks:\n```shell\ntask lint:check\n```\n\nTo run all linting checks AND automatically fix any fixable issues:\n```shell\ntask lint:fix\n```\n\n### Running specific linters\nThe commands above run all linting checks, but for performance you may want to run a subset (e.g.,\nif you only changed C++ files, you don't need to run the YAML linting checks) using one of the tasks\nin the table below.\n\n| Task                    | Description                                             |\n|-------------------------|---------------------------------------------------------|\n| `lint:cmake-check`      | Runs the CMake linters.                                 |\n| `lint:cmake-fix`        | Runs the CMake linters and fixes any violations.        |\n| `lint:cpp-check`        | Runs the C++ linters (formatters and static analyzers). |\n| `lint:cpp-fix`          | Runs the C++ linters and fixes some violations.         |\n| `lint:cpp-format-check` | Runs the C++ formatters.                                |\n| `lint:cpp-format-fix`   | Runs the C++ formatters and fixes some violations.      |\n| `lint:cpp-static-check` | Runs the C++ static analyzers.                          |\n| `lint:py-check`         | Runs the Python linters.                                |\n| `lint:py-fix`           | Runs the Python linters and fixes some violations.      |\n| `lint:yml-check`        | Runs the YAML linters.                                  |\n| `lint:yml-fix`          | Runs the YAML linters and fixes some violations.        
|\n\n[1]: https://github.com/y-scope/clp/tree/main/components/core\n[2]: https://github.com/y-scope/clp\n[3]: https://cibuildwheel.readthedocs.io/en/stable/\n[4]: https://docs.python.org/3/library/multiprocessing.html\n[5]: https://tqdm.github.io/\n[6]: https://docs.python.org/3/library/pickle.html\n[7]: https://pypi.org/project/clp-ffi-py/\n[8]: https://github.com/RaRe-Technologies/smart_open\n[9]: https://taskfile.dev/installation/\n[10]: https://docs.yscope.com/clp-ffi-py/main/api/clp_ffi_py.html\n\n[badge_build_status]: https://github.com/y-scope/clp-ffi-py/workflows/Build/badge.svg\n[badge_monthly_downloads]: https://static.pepy.tech/badge/clp-ffi-py/month\n[badge_pypi]: https://badge.fury.io/py/clp-ffi-py.svg\n[badge_total_downloads]: https://static.pepy.tech/badge/clp-ffi-py\n[clp_ffi_py_gh_actions]: https://github.com/y-scope/clp-ffi-py/actions\n[msgpack]: https://github.com/msgpack/msgpack/blob/master/spec.md\n[pepy/clp_ffi_py]: https://pepy.tech/project/clp-ffi-py\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fy-scope%2Fclp-ffi-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fy-scope%2Fclp-ffi-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fy-scope%2Fclp-ffi-py/lists"}