{"id":48501191,"url":"https://github.com/freshmag/scarfolder-py","last_synced_at":"2026-04-07T15:00:22.463Z","repository":{"id":349310943,"uuid":"1201846583","full_name":"FreshMag/scarfolder-py","owner":"FreshMag","description":"Data and file scaffolding via configurable YAML pipelines in an ETL fashion","archived":false,"fork":false,"pushed_at":"2026-04-05T10:08:47.000Z","size":180,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-05T10:20:27.754Z","etag":null,"topics":["docker","python","scaffolder","scaffolding","utility"],"latest_commit_sha":null,"homepage":"https://freshmag.github.io/scarfolder-py/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FreshMag.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-05T08:45:21.000Z","updated_at":"2026-04-05T10:08:50.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/FreshMag/scarfolder-py","commit_stats":null,"previous_names":["freshmag/scarfolder-py"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/FreshMag/scarfolder-py","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreshMag%2Fscarfolder-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreshMag%2Fscarfolder-py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreshMag%
2Fscarfolder-py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreshMag%2Fscarfolder-py/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FreshMag","download_url":"https://codeload.github.com/FreshMag/scarfolder-py/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreshMag%2Fscarfolder-py/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31516839,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T03:10:19.677Z","status":"ssl_error","status_checked_at":"2026-04-07T03:10:13.982Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","python","scaffolder","scaffolding","utility"],"created_at":"2026-04-07T15:00:20.511Z","updated_at":"2026-04-07T15:00:22.421Z","avatar_url":"https://github.com/FreshMag.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./images/logo.png\" alt=\"Scarfolder logo\" width=\"480\" /\u003e\n  \u003cbr/\u003e\n  \u003cbr/\u003e\n  \u003cp\u003e\n    \u003cstrong\u003eData and file scaffolding via configurable YAML pipelines.\u003c/strong\u003e\n  \u003c/p\u003e\n  \u003cp\u003e\n    Define generators, transformers, and loaders — wire them 
together in YAML — run anywhere.\n  \u003c/p\u003e\n  \u003cp\u003e\n    \u003ca href=\"https://github.com/FreshMag/scarfolder-py/actions/workflows/ci.yml\"\u003e\n      \u003cimg alt=\"CI\" src=\"https://img.shields.io/github/actions/workflow/status/FreshMag/scarfolder-py/ci.yml?branch=main\u0026label=CI\u0026logo=githubactions\u0026logoColor=white\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/FreshMag/scarfolder-py/releases\"\u003e\n      \u003cimg alt=\"Release\" src=\"https://img.shields.io/github/v/release/FreshMag/scarfolder-py?display_name=tag\u0026logo=github\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/FreshMag/scarfolder-py/blob/main/LICENSE\"\u003e\n      \u003cimg alt=\"License\" src=\"https://img.shields.io/github/license/FreshMag/scarfolder-py\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/FreshMag/scarfolder-py/stargazers\"\u003e\n      \u003cimg alt=\"Stars\" src=\"https://img.shields.io/github/stars/FreshMag/scarfolder-py?style=social\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/FreshMag/scarfolder-py\"\u003e\n      \u003cimg alt=\"Repo Size\" src=\"https://img.shields.io/github/repo-size/FreshMag/scarfolder-py\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/FreshMag/scarfolder-py/issues\"\u003e\n      \u003cimg alt=\"Issues\" src=\"https://img.shields.io/github/issues/FreshMag/scarfolder-py\" /\u003e\n    \u003c/a\u003e\n  \u003c/p\u003e\n  \u003cbr/\u003e\n\u003c/div\u003e\n\n---\n\n## Table of Contents\n\n- [Concepts](#concepts)\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n- [Pipeline Configuration](#pipeline-configuration)\n  - [Structure](#structure)\n  - [Inline Chaining](#inline-chaining)\n  - [Args \u0026 Placeholders](#args--placeholders)\n  - [External Refs](#external-refs)\n- [CLI Reference](#cli-reference)\n- [Built-in Plugins](#built-in-plugins)\n- [Writing Custom Plugins](#writing-custom-plugins)\n- 
[Running with Docker](#running-with-docker)\n\n---\n\n## Concepts\n\nA **Scarf** is a full pipeline defined in a single `.yaml` file. It contains one or more **Steps**. Each step has three plugin roles:\n\n| Plugin | Role |\n|---|---|\n| **Generator** | Produces a list of values |\n| **Transformer** | Receives a list and returns a new list |\n| **Loader** | Consumes a list — writes files, runs queries, prints, etc. |\n\nEach step can be given an `id` so its output can be referenced by downstream steps via `${steps.id}`.\n\nSteps are executed in **topological order** — declaration order in the file does not matter.\n\n---\n\n## Installation\n\n**Requirements:** Python 3.11+\n\n```bash\ngit clone \u003crepo-url\u003e scarfolder-py\ncd scarfolder-py\n\npython3.11 -m venv .venv\nsource .venv/bin/activate      # Windows: .venv\\Scripts\\activate\n\npip install -e .               # add [dev] for pytest\n```\n\nThe `scarfolder` command is now available in your shell.\n\n---\n\n## Quick Start\n\n```bash\n# Run the included hello-world example\nscarfolder run examples/hello_world/scarf.yaml\n\n# Override a config arg at runtime\nscarfolder run examples/hello_world/scarf.yaml -pcount=10 -poutput=out.txt\n\n# Check a config file without running it\nscarfolder validate examples/hello_world/scarf.yaml\n\n# Inspect the steps of a pipeline\nscarfolder list-steps examples/hello_world/scarf.yaml\n```\n\n---\n\n## Pipeline Configuration\n\n### Structure\n\n```yaml\nname: my-pipeline\ndescription: Optional description\n\n# (optional) External YAML files accessible via ${ref_name.key}\nrefs:\n  queries: ./sql/queries.yaml\n\n# Default argument values.\n# Set a value to null to mark it as required — the CLI will prompt for it.\nargs:\n  language: en\n  count: 10\n  output: null       # required — must be supplied via -p or interactive prompt\n\nsteps:\n  - id: names                         # optional; required if referenced downstream\n    generator:\n      name: 
my_pkg.generators.Name\n      args:\n        language: ${args.language}\n        count: ${args.count}\n\n  - generator:\n      name: scarfolder.generators.util.Combine\n      args:\n        streams:\n          - ${steps.names}\n          - ${steps.surnames}\n    transformer: scarfolder.transformers.text.join\n    loader:\n      name: scarfolder.loaders.file.WriteLines\n      args:\n        path: ${args.output}\n```\n\n### Inline Chaining\n\nA step can combine a generator, one or more transformers, and one or more loaders into a single declaration. The pipeline automatically injects the output of each stage as `values` into the next — no intermediate steps or explicit `${steps.*}` references needed.\n\n```yaml\n- id: greetings\n  generator:\n    name: scarfolder.generators.util.Constant\n    args:\n      value: hello\n      count: 5\n  transformers:\n    - name: scarfolder.transformers.text.capitalize_first   # values auto-injected\n    - name: scarfolder.transformers.text.format_template    # values auto-injected\n      args:\n        template: \"Greeting: {value}\"\n  loaders:\n    - name: scarfolder.loaders.console.Print                # values auto-injected\n    - name: scarfolder.loaders.file.WriteLines              # values auto-injected\n      args:\n        path: output.txt\n```\n\nUse `transformer` (singular) and `loader` (singular) for the common single-item case. 
The plural forms accept a YAML list.\n\nWhen a step has **no generator**, the first transformer is the primary producer and must declare its input explicitly:\n\n```yaml\n- id: upper_names\n  transformer:\n    name: scarfolder.transformers.text.upper\n    args:\n      values: ${steps.names}    # explicit — no generator to inject from\n```\n\n### Args \u0026 Placeholders\n\nPlaceholders use `${namespace.key}` syntax and are resolved before each step runs.\n\n| Placeholder | Resolves to |\n|---|---|\n| `${args.key}` | A runtime argument (CLI or config default) |\n| `${key}` | Shorthand for `${args.key}` |\n| `${steps.id}` | The output list of a previously executed step |\n| `${refname.key}` | A value from an external YAML file (see `refs:`) |\n| `${env.VAR}` | An OS environment variable |\n\n**Type preservation:** a value that is entirely a placeholder (e.g. `${steps.names}`) receives the actual Python object — not its string representation. This allows lists to flow between steps.\n\n**Required args** are declared with a `null` default. If not provided via `-p`, the CLI prompts interactively.\n\n### External Refs\n\n```yaml\nrefs:\n  queries: ./sql/queries.yaml\n\nsteps:\n  - generator:\n      name: my_pkg.generators.SqlRows\n      args:\n        query: ${queries.select_users}\n```\n\n---\n\n## CLI Reference\n\n```\nscarfolder [OPTIONS] COMMAND [ARGS]\n```\n\n### `run`\n\n```bash\nscarfolder run SCARF_FILE [OPTIONS]\n\nOptions:\n  -p, --param KEY=VALUE   Override or supply a config arg. Repeatable.\n  --dry-run               Validate config without executing any steps.\n```\n\n### `validate`\n\nParse and validate a Scarf file without running it.\n\n```bash\nscarfolder validate SCARF_FILE\n```\n\n### `list-steps`\n\nPrint a summary of all steps and their plugin chains. 
Each step shows its full chain with role labels — `[G]` Generator, `[T]` Transformer, `[L]` Loader.\n\n```bash\nscarfolder list-steps SCARF_FILE\n```\n\n---\n\n## Built-in Plugins\n\n### Generators\n\n| Path | Description |\n|---|---|\n| `scarfolder.generators.util.Constant` | Repeat a single value `count` times |\n| `scarfolder.generators.util.Range` | Integer sequence (`start`, `stop`, `step`) |\n| `scarfolder.generators.util.Combine` | Zip multiple streams into tuples |\n| `scarfolder.generators.util.Enumerate` | Pair each item with its index |\n\n### Transformers\n\nAll built-in text transformers operate on `list[str]`. When chained to a generator, `values` is auto-injected; when used standalone, declare `values: ${steps.\u003cid\u003e}` in args.\n\n| Path | Description |\n|---|---|\n| `scarfolder.transformers.text.capitalize_first` | Capitalise first letter of each string |\n| `scarfolder.transformers.text.upper` | Upper-case every string |\n| `scarfolder.transformers.text.lower` | Lower-case every string |\n| `scarfolder.transformers.text.strip` | Strip leading/trailing whitespace |\n| `scarfolder.transformers.text.join` | Join each inner sequence into a string |\n| `scarfolder.transformers.text.prefix` | Prepend a fixed string |\n| `scarfolder.transformers.text.suffix` | Append a fixed string |\n| `scarfolder.transformers.text.format_template` | Apply `{value}` format template |\n\n### Loaders\n\nWhen chained to a step, `values` is auto-injected; when used standalone, declare `values: ${steps.\u003cid\u003e}` in args.\n\n| Path | Description |\n|---|---|\n| `scarfolder.loaders.file.WriteLines` | Write one value per line to a text file |\n| `scarfolder.loaders.file.WriteJson` | Serialise values as a JSON array |\n| `scarfolder.loaders.console.Print` | Print values to stdout with optional template/header/footer |\n| `scarfolder.loaders.file.print_values` | Print values to stdout (simple function) |\n| `scarfolder.loaders.sql.ExecuteStatements` | Execute each 
value as a raw SQL statement |\n| `scarfolder.loaders.sql.ExecuteMany` | Execute a parameterised query for each row |\n\n---\n\n## Writing Custom Plugins\n\nAny Python class or plain callable can be a plugin — reference it by its fully qualified dotted path.\n\n### Class-based (recommended for stateful plugins)\n\nAll data arrives through the constructor. Action methods take no positional arguments.\n\n```python\n# my_project/generators.py\nfrom scarfolder.core.base import Generator\n\nclass Name(Generator):\n    def __init__(self, language: str = \"en\", count: int = 5):\n        self.pool = [\"Alice\", \"Bob\"] if language == \"en\" else [\"Luca\", \"Sofia\"]\n        self.count = count\n\n    def generate(self) -\u003e list[str]:\n        import random\n        return [random.choice(self.pool) for _ in range(self.count)]\n```\n\n```python\n# my_project/loaders.py\nimport csv\nfrom pathlib import Path\nfrom scarfolder.core.base import Loader\n\nclass WriteCsv(Loader):\n    def __init__(self, values: list, path: str, headers: list[str] | None = None):\n        self.values = values   # auto-injected when chained; explicit via ${steps.*} otherwise\n        self.path = Path(path)\n        self.headers = headers\n\n    def load(self) -\u003e None:\n        self.path.parent.mkdir(parents=True, exist_ok=True)\n        with self.path.open(\"w\", newline=\"\") as f:\n            writer = csv.writer(f)\n            if self.headers:\n                writer.writerow(self.headers)\n            writer.writerows([[v] for v in self.values])\n```\n\n### Function-based (simpler for stateless transforms)\n\nAll resolved args are passed as keyword arguments. 
The data input is just another named keyword argument.\n\n```python\n# my_project/transforms.py\n\ndef shout(values: list[str], mark: str = \"!\") -\u003e list[str]:\n    return [v.upper() + mark for v in values]\n```\n\n### Referencing in YAML\n\n```yaml\nsteps:\n  - id: names\n    generator:\n      name: my_project.generators.Name\n      args:\n        language: it\n        count: 20\n    transformer:                          # chained — values auto-injected\n      name: my_project.transforms.shout\n      args:\n        mark: \"!!!\"\n    loader:                               # chained — values auto-injected\n      name: my_project.loaders.WriteCsv\n      args:\n        path: output/names.csv\n        headers: [name]\n```\n\nMake sure your project directory is on `PYTHONPATH`:\n\n```bash\nPYTHONPATH=. scarfolder run pipeline.yaml\n```\n\n---\n\n## Running with Docker\n\nA pre-built image is available. Mount your project to `/workspace` — that directory is automatically on `PYTHONPATH`, so your custom plugins are importable with no extra setup.\n\n### One-off run\n\n```bash\ndocker run --rm \\\n  -v ./my_project:/workspace \\\n  ghcr.io/freshmag/scarfolder:latest \\\n  run scarf.yaml -planguage=it\n```\n\n### With Docker Compose\n\n```yaml\n# docker-compose.yml\nservices:\n  scarfolder:\n    image: ghcr.io/freshmag/scarfolder:latest\n    volumes:\n      - .:/workspace\n    command: [\"run\", \"scarf.yaml\", \"-planguage=it\"]\n```\n\n```bash\ndocker compose run --rm scarfolder\n```\n\n### Plugins outside the project directory\n\nUse `SCARFOLDER_PLUGINS_PATH` (colon-separated) to inject additional paths:\n\n```bash\ndocker run --rm \\\n  -v ./my_project:/workspace \\\n  -v ./shared_plugins:/plugins \\\n  -e SCARFOLDER_PLUGINS_PATH=/plugins \\\n  ghcr.io/freshmag/scarfolder:latest \\\n  run scarf.yaml\n```\n\n---\n\n\u003cdiv align=\"center\"\u003e\n  \u003csub\u003eMade with ❤️ and a warm 
scarf.\u003c/sub\u003e\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffreshmag%2Fscarfolder-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffreshmag%2Fscarfolder-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffreshmag%2Fscarfolder-py/lists"}