{"id":26644944,"url":"https://github.com/jhd3197/tukuy","last_synced_at":"2026-03-08T05:01:58.037Z","repository":{"id":284107706,"uuid":"953778882","full_name":"jhd3197/Tukuy","owner":"jhd3197","description":"Tukuy is a robust, extensible data transformation library that leverages a flexible plugin system. It simplifies the manipulation, validation, and extraction of data across multiple formats (text, HTML, JSON, dates, numbers, and more), making it an ideal tool for building data pipelines and cleaning workflows.","archived":false,"fork":false,"pushed_at":"2026-03-07T18:26:12.000Z","size":6317,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-08T00:04:41.958Z","etag":null,"topics":["data-cleaning","data-transformation","date-parsing","plugin","python","text-processing"],"latest_commit_sha":null,"homepage":"https://jhd3197.github.io/Tukuy/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jhd3197.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["jhd3197"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"lfx_crowdfunding":null,"polar":null,"buy_me_a_coffee":"jhd3197","thanks_dev":null,"custom":null}},"created_at":"2025-03-24T04:09:16.000Z","updated_at":"2026-03-07T18:26:12.000Z","dependencies_parsed_at":"2026-02-17T00:02:29.580Z","dependency_job_id":"aa7bfaf8-1535-42d4-b935-5d6b1e2b2429","html_url":"https://github.com/jhd3197/Tukuy","commit_stats":null,"previous_names":["jhd3197/tukuy"],"tags_count":34,"template":false,"template_full_name":null,"purl":"pkg:github/jhd3197/Tukuy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhd3197%2FTukuy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhd3197%2FTukuy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhd3197%2FTukuy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhd3197%2FTukuy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jhd3197","download_url":"https://codeload.github.com/jhd3197/Tukuy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhd3197%2FTukuy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30246626,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-08T00:58:18.660Z","status":"online","status_checked_at":"2026-03-08T02:00:06.215Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-cleaning","data-transformation","date-parsing","plugin","python","text-processing"],"created_at":"2025-03-24T21:21:02.946Z","updated_at":"2026-03-08T05:01:58.023Z","avatar_url":"https://github.com/jhd3197.png","language":"Python","funding_links":["https://github.com/sponsors/jhd3197","https://buymeacoffee.com/jhd3197"],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003ch1 align=\"center\"\u003e🌀 Tukuy\u003c/h1\u003e\n  \u003cp align=\"center\"\u003ePortable agent skills library and data transformation toolkit for Python.\u003c/p\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://pypi.org/project/tukuy/\"\u003e\u003cimg src=\"https://badge.fury.io/py/tukuy.svg\" alt=\"PyPI version\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/tukuy/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/pyversions/tukuy.svg\" alt=\"Python versions\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-MIT-blue.svg\" alt=\"License: MIT\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://pepy.tech/project/tukuy\"\u003e\u003cimg src=\"https://static.pepy.tech/badge/tukuy\" alt=\"Downloads\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/jhd3197/Tukuy\"\u003e\u003cimg src=\"https://img.shields.io/github/stars/jhd3197/Tukuy?style=social\" alt=\"GitHub stars\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n**Tukuy** (meaning \"to transform\" or \"to become\" in Quechua) is a cross-platform skills layer that any Python agent framework can use. It provides typed skill descriptors, agent-framework bridges (OpenAI, Anthropic), async-first execution, smart composition, and runtime safety enforcement — all built on top of a proven plugin-based transformation engine.\n\n```python\nfrom tukuy import skill\n\n@skill(name=\"parse_date\", description=\"Parse a date string into ISO format\")\ndef parse_date(text: str) -\u003e str:\n    from dateutil import parser\n    return parser.parse(text).isoformat()\n\nresult = parse_date.__skill__.invoke(\"January 15, 2025\")\nprint(result.value)  # \"2025-01-15T00:00:00\"\n```\n\n## Installation\n\n```bash\npip install tukuy\n```\n\n## Quick Start\n\n### Define a skill\n\n```python\nfrom tukuy import skill\n\n@skill(\n    name=\"parse_date\",\n    description=\"Parse a date string into ISO format\",\n    category=\"date\",\n    tags=[\"parsing\", \"datetime\"],\n)\ndef parse_date(text: str, format: str = \"auto\") -\u003e str:\n    \"\"\"Parse date from text, return ISO 8601.\"\"\"\n    from dateutil import parser\n    return parser.parse(text).isoformat()\n```\n\nThe `@skill` decorator infers input/output schemas from type hints, detects async functions automatically, and attaches a `Skill` instance as `fn.__skill__`.\n\n### Invoke a skill\n\n```python\nresult = parse_date.__skill__.invoke(\"January 15, 2025\")\nprint(result.value)      # \"2025-01-15T00:00:00\"\nprint(result.success)    # True\nprint(result.duration_ms) # 0.42\n```\n\n### Use with an agent framework\n\n```python\nfrom tukuy import to_openai_tools, to_anthropic_tools\n\nskills = [parse_date, extract_entities, summarize]\n\n# OpenAI function-calling format\ntools = to_openai_tools(skills)\n\n# Anthropic tool_use format\ntools = to_anthropic_tools(skills)\n```\n\nDispatch tool calls back to skills:\n\n```python\nfrom tukuy import dispatch_openai, dispatch_anthropic\n\n# OpenAI\nresult_msg = dispatch_openai(tool_call, skills)\n\n# Anthropic\nresult_block = dispatch_anthropic(tool_use, skills)\n```\n\n---\n\n## Core Concepts\n\n### Skill Descriptors\n\nEvery skill has a declared-upfront contract via `SkillDescriptor`:\n\n```python\nfrom tukuy import SkillDescriptor\n\ndescriptor = SkillDescriptor(\n    name=\"web_scraper\",\n    description=\"Scrape and extract text from a URL\",\n    input_schema=str,\n    output_schema=str,\n    category=\"web\",\n    tags=[\"scraping\", \"extraction\"],\n    is_async=True,\n    requires_network=True,\n    required_imports=[\"aiohttp\", \"beautifulsoup4\"],\n    idempotent=True,\n    side_effects=False,\n)\n```\n\nDescriptors carry identity, typed I/O schemas, discovery metadata, operational hints, and safety declarations — everything an agent framework needs to discover and invoke a skill.\n\n### Async Support\n\nSkills work with both sync and async functions:\n\n```python\n@skill(name=\"fetch_page\", requires_network=True)\nasync def fetch_page(url: str) -\u003e str:\n    async with aiohttp.ClientSession() as session:\n        async with session.get(url) as resp:\n            return await resp.text()\n\n# Async invocation\nresult = await fetch_page.__skill__.ainvoke(\"https://example.com\")\n```\n\nSync skills also work with `ainvoke()` — they're called normally without blocking the event loop.\n\n### Composition\n\nChain, branch, and fan-out skills with `Chain`, `Branch`, and `Parallel`:\n\n```python\nfrom tukuy import Chain, branch, parallel\n\n# Sequential pipeline\nchain = Chain([\"strip\", \"lowercase\", parse_date])\nresult = chain.run(\"  January 15, 2025  \")\n\n# Conditional branching\nchain = Chain([\n    \"strip\",\n    branch(\n        on_match=lambda v: \"@\" in v,\n        true_path=[\"email_validator\"],\n        false_path=[\"url_validator\"],\n    ),\n])\n\n# Parallel fan-out with merge\nchain = Chain([\n    parallel(\n        steps=[\"extract_dates\", \"extract_emails\", \"extract_phones\"],\n        merge=\"dict\",  # {\"extract_dates\": [...], \"extract_emails\": [...], ...}\n    ),\n])\n\n# Async execution with asyncio.gather for parallel steps\nresult = await chain.arun(input_text)\n```\n\nSteps can be transformer names (strings), parametrized transforms (dicts), `Skill` instances, `@skill`-decorated functions, plain callables, or nested `Chain` objects.\n\n### Context\n\nSkills share state through a typed, scoped `SkillContext`:\n\n```python\nfrom tukuy import skill, SkillContext\n\n@skill(name=\"extract_entities\")\ndef extract_entities(text: str, ctx: SkillContext) -\u003e dict:\n    entities = do_extraction(text)\n    ctx.set(\"last_entities\", entities)\n    return entities\n\n@skill(name=\"format_entities\")\ndef format_entities(ctx: SkillContext) -\u003e str:\n    entities = ctx.get(\"last_entities\")\n    return format_them(entities)\n```\n\nContext supports namespaced scoping (for parallel branches), parent-child delegation, snapshot/merge, and bridging to plain dicts.\n\n### Safety Policy\n\nEach skill declares what resources it needs. The runtime enforces these declarations against an active policy:\n\n```python\nfrom tukuy import SafetyPolicy, set_policy\n\n# Define what the environment allows\npolicy = SafetyPolicy(\n    allowed_imports={\"json\", \"re\", \"datetime\"},\n    blocked_imports={\"os\", \"subprocess\"},\n    allow_network=False,\n    allow_filesystem=False,\n)\n\n# Activate globally (async-safe via contextvars)\nset_policy(policy)\n\n# Skills that violate the policy are blocked before execution\n@skill(name=\"web_scraper\", requires_network=True)\nasync def web_scraper(url: str) -\u003e str: ...\n\nresult = web_scraper.__skill__.invoke(\"https://example.com\")\n# result.success == False\n# result.error == \"Safety policy violated: Skill requires network access but policy denies it\"\n```\n\nConvenience constructors for common scenarios:\n\n```python\nSafetyPolicy.restrictive()    # No imports, no network, no filesystem\nSafetyPolicy.permissive()     # Everything allowed\nSafetyPolicy.network_only()   # Network yes, filesystem no\nSafetyPolicy.filesystem_only() # Filesystem yes, network no\n```\n\nPolicies can be exported/imported as sandbox configurations for integration with external sandbox runtimes:\n\n```python\nconfig = policy.to_sandbox_config()\n# {\"allowed_imports\": [\"json\", \"re\"], \"network\": False, \"filesystem\": False}\n\npolicy = SafetyPolicy.from_sandbox_config(config)\n```\n\n---\n\n## Data Transformations\n\nTukuy includes a full transformation engine with six built-in plugins. This is the foundation that the skills layer is built on.\n\n```python\nfrom tukuy import TukuyTransformer\n\nt = TukuyTransformer()\n\n# Text\nt.transform(\"  Hello World!  \", [\"strip\", \"lowercase\"])\n# \"hello world!\"\n\n# HTML\nt.transform(\"\u003cdiv\u003eHello \u003cb\u003eWorld\u003c/b\u003e!\u003c/div\u003e\", [\"strip_html_tags\", \"lowercase\"])\n# \"hello world!\"\n\n# Chained with parameters\nt.transform(\"  Hello World!  \", [\n    \"strip\",\n    \"lowercase\",\n    {\"function\": \"truncate\", \"length\": 5},\n])\n# \"hello...\"\n```\n\n### Built-in Plugins\n\n**Text** — `strip`, `lowercase`, `uppercase`, `truncate`, `replace`, `regex_replace`, `split`, `join`, `normalize`\n\n**HTML** — `strip_html_tags`, `extract_text`, `select`, `extract_links`, `extract_tables`, `clean_html`\n\n**JSON** — `parse_json`, `stringify`, `extract`, `flatten`, `merge`, `validate_schema`\n\n**Date** — `parse_date`, `format_date`, `age_calc`, `add_days`, `diff_days`, `is_weekend`\n\n**Numerical** — `round`, `format_number`, `to_currency`, `percentage`, `math_eval`, `scale`, `statistics`\n\n**Validation** — `email_validator`, `url_validator`, `phone_validator`, `length_validator`, `range_validator`, `regex_validator`, `type_validator`\n\n### Custom Plugins\n\nExtend Tukuy with your own transformer plugins:\n\n```python\nfrom tukuy import TransformerPlugin\nfrom tukuy.base import ChainableTransformer\n\nclass ReverseTransformer(ChainableTransformer):\n    def validate(self, value):\n        return isinstance(value, str)\n\n    def _transform(self, value, context=None):\n        return value[::-1]\n\nclass MyPlugin(TransformerPlugin):\n    def __init__(self):\n        super().__init__(\"my_plugin\")\n\n    @property\n    def transformers(self):\n        return {\"reverse\": lambda _: ReverseTransformer(\"reverse\")}\n\nt = TukuyTransformer()\nt.register_plugin(MyPlugin())\nt.transform(\"hello\", [\"reverse\"])  # \"olleh\"\n```\n\nPlugins support lifecycle hooks (`initialize()` / `cleanup()`) and can expose skills alongside transformers via the `skills` property.\n\n### Dynamic Tool Registration\n\nTukuy makes it easy to add tools at runtime without restarting your application.\n\n**Register a plugin on the fly:**\n\n```python\nfrom tukuy import TukuyTransformer, TransformerPlugin\n\nclass MyPlugin(TransformerPlugin):\n    def __init__(self):\n        super().__init__(\"my_plugin\")\n\n    @property\n    def transformers(self):\n        return {\"reverse\": lambda _: ReverseTransformer(\"reverse\")}\n\nt = TukuyTransformer()\nt.register_plugin(MyPlugin())       # available immediately\nt.transform(\"hello\", [\"reverse\"])   # \"olleh\"\nt.unregister_plugin(\"my_plugin\")    # remove when no longer needed\n```\n\n**Create skills at runtime:**\n\n```python\nfrom tukuy import skill, to_openai_tools\n\n@skill(name=\"sentiment\", description=\"Classify sentiment\", category=\"nlp\")\ndef sentiment(text: str) -\u003e str:\n    return \"positive\" if \"good\" in text.lower() else \"negative\"\n\n# Instantly usable — invoke directly or convert to agent tool format\nresult = sentiment.__skill__.invoke(\"This is good!\")\ntools = to_openai_tools([sentiment])  # ready for OpenAI function-calling\n```\n\n**Use the `@tukuy_plugin` decorator for metadata:**\n\n```python\nfrom tukuy import tukuy_plugin\n\n@tukuy_plugin(\"analytics\", \"Real-time analytics transforms\", \"1.0.0\")\nclass AnalyticsPlugin(TransformerPlugin):\n    @property\n    def transformers(self):\n        return {\"moving_avg\": lambda p: MovingAvgTransformer(\"moving_avg\", **p)}\n```\n\n**Hot-reload plugins without restarting:**\n\n```python\nfrom tukuy import hot_reload\n\nhot_reload(\"my_plugin\")  # reload a specific plugin\nhot_reload()             # reload all plugins\n```\n\n**Discover what's available:**\n\n```python\nfrom tukuy import browse_tools, get_tool_details, search_tools\n\nindex = browse_tools()                      # compact index of all tools\ndetails = get_tool_details(\"reverse\")       # full metadata for a specific tool\nresults = search_tools(\"date\", limit=5)     # keyword search across all tools\n```\n\n---\n\n## Pattern-based Extraction\n\n### HTML\n\n```python\npattern = {\n    \"properties\": [\n        {\"name\": \"title\", \"selector\": \"h1\", \"transform\": [\"strip\", \"lowercase\"]},\n        {\"name\": \"links\", \"selector\": \"a\", \"attribute\": \"href\", \"type\": \"array\"},\n    ]\n}\ndata = t.extract_html_with_pattern(html, pattern)\n```\n\n### JSON\n\n```python\npattern = {\n    \"properties\": [\n        {\n            \"name\": \"user\",\n            \"selector\": \"data.user\",\n            \"properties\": [\n                {\"name\": \"name\", \"selector\": \"fullName\", \"transform\": [\"strip\"]},\n            ],\n        }\n    ]\n}\ndata = t.extract_json_with_pattern(json_str, pattern)\n```\n\n---\n\n## Error Handling\n\n```python\nfrom tukuy.exceptions import ValidationError, TransformationError, ParseError\n\ntry:\n    result = t.transform(data, transformations)\nexcept ValidationError as e:\n    print(f\"Validation failed: {e}\")\nexcept ParseError as e:\n    print(f\"Parsing failed: {e}\")\nexcept TransformationError as e:\n    print(f\"Transformation failed: {e}\")\n```\n\n---\n\n## CLI\n\nTukuy ships with a command-line interface for inspecting plugins, running skills, and applying transformers directly from the terminal.\n\n```bash\npip install tukuy\n```\n\n### Discovery\n\n```bash\n# High-level summary (plugin/skill/transformer/group counts)\ntukuy info\n\n# List everything\ntukuy list plugins\ntukuy list skills\ntukuy list transformers\ntukuy list groups\n\n# Filter lists\ntukuy list skills --plugin country\ntukuy list skills --group Integrations\ntukuy list skills --tag crypto\ntukuy list plugins --group Data\n\n# JSON output for scripting\ntukuy list skills --json\ntukuy list plugins --json\n```\n\n### Inspect\n\n```bash\n# Detailed plugin info (transformers, skills, requirements)\ntukuy show plugin country\n\n# Detailed skill info (parameters, risk level, tags, config)\ntukuy show skill crypto_price\n```\n\n### Run Skills\n\n```bash\n# Run a skill with keyword arguments\ntukuy run word_define --word hello\ntukuy run crypto_price --coins bitcoin\ntukuy run public_holidays --country_code US\n\n# Raw JSON output\ntukuy run crypto_price --coins bitcoin --raw\n```\n\n### Apply Transformers\n\n```bash\n# Transform inline text\ntukuy transform lowercase \"HELLO WORLD\"\ntukuy transform hash_text \"secret\" --algorithm md5\n\n# Pipe input from stdin\necho \"HELLO\" | tukuy transform lowercase\ncat data.json | tukuy transform parse_json\n```\n\n---\n\n## MCP Server\n\nTukuy includes an [MCP](https://modelcontextprotocol.io/) server that exposes all plugins, skills, and transformers as tools for Claude Desktop, Claude Code, and other MCP clients.\n\n```bash\npip install 'tukuy[mcp]'\n```\n\n### Configuration\n\n**Claude Desktop** (`claude_desktop_config.json`):\n\n```json\n{\n  \"mcpServers\": {\n    \"tukuy\": {\n      \"command\": \"python\",\n      \"args\": [\"-m\", \"tukuy.mcp_server\"]\n    }\n  }\n}\n```\n\n**Claude Code** (`.claude/settings.json`):\n\n```json\n{\n  \"mcpServers\": {\n    \"tukuy\": {\n      \"command\": \"python\",\n      \"args\": [\"-m\", \"tukuy.mcp_server\"]\n    }\n  }\n}\n```\n\nThe server provides 6 meta-tools rather than registering hundreds of individual tools:\n\n| Tool | Purpose |\n|------|---------|\n| `tukuy_info` | Summary of all capabilities (counts, groups) |\n| `tukuy_browse` | Browse plugins with their skills and transformers |\n| `tukuy_search` | Keyword search across skills and transformers |\n| `tukuy_show` | Detailed info for a specific skill or transformer |\n| `tukuy_run` | Execute a skill with parameters |\n| `tukuy_transform` | Apply a transformer to input |\n\n### Filtering Plugins\n\nControl which plugins are exposed with `--only` / `--exclude` flags or environment variables. Values can be plugin names or group names.\n\n```bash\n# Only expose math-related plugins\ntukuy-mcp --only numerical\n\n# Only expose an entire group\ntukuy-mcp --only Data\n\n# Exclude dangerous plugins\ntukuy-mcp --exclude shell,file_ops\n\n# Combine: start from a group, then remove some\ntukuy-mcp --only Data --exclude sql\n```\n\nEnvironment variables work the same way, useful for JSON-based MCP config:\n\n```json\n{\n  \"mcpServers\": {\n    \"tukuy-math\": {\n      \"command\": \"python\",\n      \"args\": [\"-m\", \"tukuy.mcp_server\"],\n      \"env\": { \"TUKUY_MCP_ONLY\": \"numerical,date\" }\n    },\n    \"tukuy-safe\": {\n      \"command\": \"python\",\n      \"args\": [\"-m\", \"tukuy.mcp_server\"],\n      \"env\": { \"TUKUY_MCP_EXCLUDE\": \"shell,file_ops,git\" }\n    }\n  }\n}\n```\n\nAvailable groups: `Code`, `Core`, `Data`, `Documents`, `Extensibility`, `Integrations`, `Interaction`, `Media`, `Web`.\n\n---\n\n## Architecture\n\n```\ntukuy/\n    skill.py          Skill descriptors, @skill decorator, invoke/ainvoke\n    context.py        SkillContext with scoping, snapshot, merge\n    safety.py         SafetyPolicy, manifest validation, sandbox integration\n    bridges.py        OpenAI and Anthropic tool format bridges\n    chain.py          Chain, Branch, Parallel composition\n    cli.py            Command-line interface (tukuy info/list/show/run/transform)\n    mcp_server.py     MCP server for Claude Desktop / Claude Code\n    async_base.py     Async transformer base classes\n    base.py           Sync transformer base classes\n    plugins/          Built-in plugins (text, html, json, date, numerical, validation, ...)\n    core/             Registration, introspection, unified registry\n    transformers/     Transformer implementations\n```\n\n## Contributing\n\nContributions are welcome.\n\n1. Fork the repository\n2. Create a feature branch (`git checkout -b feature/my-feature`)\n3. Make your changes\n4. Run tests with `pytest`\n5. Commit and push\n6. Open a Pull Request\n\n## License\n\nSee [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjhd3197%2Ftukuy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjhd3197%2Ftukuy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjhd3197%2Ftukuy/lists"}