{"id":37427195,"url":"https://github.com/eyecan-ai/pipelime-python","last_synced_at":"2026-01-16T06:21:48.245Z","repository":{"id":60989654,"uuid":"476244561","full_name":"eyecan-ai/pipelime-python","owner":"eyecan-ai","description":"A swiss army knife for data processing!","archived":false,"fork":false,"pushed_at":"2026-01-13T09:33:29.000Z","size":5350,"stargazers_count":19,"open_issues_count":1,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2026-01-13T11:53:27.510Z","etag":null,"topics":["ai","dataops","dataset","deeplearning","mlops","python"],"latest_commit_sha":null,"homepage":"https://pipelime-python.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eyecan-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-03-31T09:49:57.000Z","updated_at":"2026-01-03T20:08:12.000Z","dependencies_parsed_at":"2025-02-25T17:21:00.451Z","dependency_job_id":"676a08cb-8639-4345-adca-2b8b40f6dd66","html_url":"https://github.com/eyecan-ai/pipelime-python","commit_stats":{"total_commits":480,"total_committers":7,"mean_commits":68.57142857142857,"dds":"0.16874999999999996","last_synced_commit":"6ba317cae4ab0dc6beaf28c756146f4f0e30a6a7"},"previous_names":[],"tags_count":26,"template":false,"template_full_name":null,"purl":"pkg:github/eyecan-ai/pipelime-python","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eyecan-ai%2Fpipelime-python","tags_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eyecan-ai%2Fpipelime-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eyecan-ai%2Fpipelime-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eyecan-ai%2Fpipelime-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eyecan-ai","download_url":"https://codeload.github.com/eyecan-ai/pipelime-python/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eyecan-ai%2Fpipelime-python/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28477647,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T03:13:13.607Z","status":"ssl_error","status_checked_at":"2026-01-16T03:11:47.863Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","dataops","dataset","deeplearning","mlops","python"],"created_at":"2026-01-16T06:21:47.726Z","updated_at":"2026-01-16T06:21:48.108Z","avatar_url":"https://github.com/eyecan-ai.png","language":"Python","readme":"\n# 🍋 `pipelime`\n\n[![Documentation Status](https://readthedocs.org/projects/pipelime-python/badge/?version=latest)](https://pipelime-python.readthedocs.io/en/latest/?badge=latest)\n[![PyPI 
version](https://badge.fury.io/py/pipelime-python.svg)](https://badge.fury.io/py/pipelime-python)\n\n\u003cimg src=\"docs/_static/pipelime_banner.png?raw=true\" width=\"100%\"/\u003e\n\n*If life gives you lemons, use `pipelime`.*\n\nWelcome to **pipelime**, a swiss army knife for data processing!\n\n`pipelime` is a full-fledged **framework** for **data science**: read your datasets,\nmanipulate them and write them back to disk.\nThen build up your **dataflow** with Piper and manage the configuration with Choixe.\nFinally, **embed** your custom commands into the `pipelime` workspace, to act both as dataflow nodes and as an advanced command line interface.\n\nMaybe too much for you? No worries, `pipelime` is **modular** and you can just take out what you need:\n- **data processing scripts**: use the powerful `SamplesSequence` and create your own data processing pipelines, with a simple and intuitive API. Parallelization works out-of-the-box and, moreover, you can easily serialize your pipelines to yaml/json. Integrations with popular frameworks, e.g., [pytorch](https://pytorch.org/), are also provided.\n- **easy dataflow**: `Piper` can manage and execute directed acyclic graphs (DAGs), reporting progress through sockets or custom callbacks.\n- **configuration management**: `Choixe` is a simple and intuitive mini scripting language designed to ease the creation of configuration files with the help of variables, symbol importing, for loops, switch statements, parameter sweeps and more.\n- **command line interface**: `pipelime` can remove all the boilerplate code needed to create a beautiful CLI for your scripts and packages. You focus on *what matters* and we provide input parsing, advanced interfaces for complex arguments, automatic help generation and configuration management. 
Also, any `PipelimeCommand` can be used as a node in a dataflow for free!\n- **pydantic tools**: most of the classes in `pipelime` derive from [`pydantic.BaseModel`](https://docs.pydantic.dev/), so we have built some useful tools to, e.g., inspect their structure, auto-generate human-friendly documentation and more (including a TUI to help you write input data to [deserialize](https://docs.pydantic.dev/usage/models/#helper-functions) any pydantic model).\n\n---\n\n## Installation\n\nInstall `pipelime` using pip:\n\n```\npip install pipelime-python\n```\n\nTo be able to *draw* the dataflow graphs, you need the `draw` variant:\n\n```\npip install pipelime-python[draw]\n```\n\n\u003e **Warning**\n\u003e\n\u003e The `draw` variant needs `Graphviz` \u003chttps://www.graphviz.org/\u003e installed on your system.\n\u003e On Linux Ubuntu/Debian, you can install it with:\n\u003e\n\u003e ```\n\u003e sudo apt-get install graphviz graphviz-dev\n\u003e ```\n\u003e\n\u003e Alternatively, you can use `conda`:\n\u003e\n\u003e ```\n\u003e conda install --channel conda-forge pygraphviz\n\u003e ```\n\u003e\n\u003e Please see the full options at https://github.com/pygraphviz/pygraphviz/blob/main/INSTALL.txt\n\n## Basic Usage\n\n### Underfolder Format\n\nThe **Underfolder** format is the preferred `pipelime` dataset format, i.e., a flexible way to\nmodel and store a generic dataset through the **filesystem**.\n\n![](https://github.com/eyecan-ai/pipelime-python/blob/main/docs/images/underfolder.png?raw=true)\n\nAn Underfolder **dataset** is a collection of samples. A **sample** is a collection of items.\nAn **item** is a unitary block of data, e.g., a multi-channel image, a python object,\na dictionary and more.\nAny valid Underfolder dataset must contain a subfolder named `data` with samples\nand items. 
Also, *global shared* items can be stored in the root folder.\n\nItems are named using the following naming convention:\n\n![](https://github.com/eyecan-ai/pipelime-python/blob/main/docs/images/naming.png?raw=true)\n\nWhere:\n\n* `$ID` is the sample index, which must be a unique integer for each sample.\n* `ITEM` is the item name.\n* `EXT` is the item extension.\n\nWe currently support many common file formats and others can be added by users:\n\n  * `.png`, `.jpeg/.jpg/.jfif/.jpe`, `.bmp` for images\n  * `.tiff/.tif` for multi-page images and multi-dimensional numpy arrays\n  * `.yaml/.yml`, `.json` and `.toml/.tml` for metadata\n  * `.txt` for numpy 2D matrix notation\n  * `.npy` for general numpy arrays\n  * `.pkl/.pickle` for picklable python objects\n  * `.bin` for generic binary data\n\nRoot files follow the same convention but they lack the sample identifier part, i.e., `$ITEM.$EXT`.\n\n### Reading an Underfolder Dataset\n\n`pipelime` provides an intuitive interface to read, manipulate and write Underfolder datasets.\nNo complex signatures, weird object iterators, or boilerplate code: you just need a `SamplesSequence`.\n\n```python\n    from pipelime.sequences import SamplesSequence\n\n    # Read an underfolder dataset with a single line of code\n    dataset = SamplesSequence.from_underfolder('tests/sample_data/datasets/underfolder_minimnist')\n\n    # A dataset behaves like a Sequence\n    print(len(dataset))             # the number of samples\n    sample = dataset[4]             # get the fifth sample\n\n    # A sample is a mapping\n    print(len(sample))              # the number of items\n    print(set(sample.keys()))       # the items' keys\n\n    # An item is an object wrapping the actual data\n    image_item = sample[\"image\"]    # get the \"image\" item from the sample\n    print(type(image_item))         # \u003cclass 'pipelime.items.image_item.PngImageItem'\u003e\n    image = image_item()            # actually loads the data from disk (may have been on 
the cloud as well)\n    print(type(image))              # \u003cclass 'numpy.ndarray'\u003e\n```\n\n### Writing an Underfolder Dataset\n\nYou can **write** a dataset by calling the associated operation:\n\n```python\n    # Attach a \"write\" operation to the dataset\n    dataset = dataset.to_underfolder('/tmp/my_output_dataset')\n\n    # Now run over all the samples\n    dataset.run()\n\n    # You can easily spawn multiple processes if needed\n    dataset.run(num_workers=4)\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feyecan-ai%2Fpipelime-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feyecan-ai%2Fpipelime-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feyecan-ai%2Fpipelime-python/lists"}