{"id":50327636,"url":"https://github.com/langfuse/experiment-action","last_synced_at":"2026-05-29T07:32:48.313Z","repository":{"id":356111202,"uuid":"1215919607","full_name":"langfuse/experiment-action","owner":"langfuse","description":"Run your Langfuse experiment with your GitHub action workflow.","archived":false,"fork":false,"pushed_at":"2026-05-21T11:24:19.000Z","size":3063,"stargazers_count":9,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-21T16:23:03.420Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/langfuse.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-04-20T11:38:48.000Z","updated_at":"2026-05-21T08:57:47.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/langfuse/experiment-action","commit_stats":null,"previous_names":["langfuse/experiment-action"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/langfuse/experiment-action","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/langfuse%2Fexperiment-action","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/langfuse%2Fexperiment-action/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/langfuse%2Fexperiment-action/releases","manifests_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/langfuse%2Fexperiment-action/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/langfuse","download_url":"https://codeload.github.com/langfuse/experiment-action/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/langfuse%2Fexperiment-action/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33642312,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-29T07:32:47.530Z","updated_at":"2026-05-29T07:32:48.295Z","avatar_url":"https://github.com/langfuse.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"![GitHub Banner](https://github.com/langfuse/langfuse-js/assets/2834609/d1613347-445f-4e91-9e84-428fda9c3659)\n\n# langfuse/experiment-action\n\n[![MIT License](https://img.shields.io/badge/License-MIT-red.svg?style=flat-square)](https://opensource.org/licenses/MIT)\n[![CI test status](https://img.shields.io/github/actions/workflow/status/langfuse/experiment-action/ci.yml?style=flat-square\u0026label=All%20tests)](https://github.com/langfuse/experiment-action/actions/workflows/ci.yml?query=branch%3Amain)\n[![GitHub Repo 
stars](https://img.shields.io/github/stars/langfuse/langfuse?style=flat-square\u0026logo=GitHub\u0026label=langfuse%2Flangfuse)](https://github.com/langfuse/langfuse)\n[![Discord](https://img.shields.io/discord/1111061815649124414?style=flat-square\u0026logo=Discord\u0026logoColor=white\u0026label=Discord\u0026color=%23434EE4)](https://discord.gg/7NXusRtqYU)\n[![YC W23](https://img.shields.io/badge/Y%20Combinator-W23-orange?style=flat-square)](https://www.ycombinator.com/companies/langfuse)\n\nRun a [Langfuse](https://langfuse.com) experiment in your CI pipeline. The\naction loads your experiment script, runs it against a Langfuse dataset,\ncomments the result on the PR, and optionally fails the job when a regression\nis detected. Learn more in the Langfuse docs on\n[testing experiments in CI environments](https://langfuse.com/docs/evaluation/experiments/experiments-via-sdk#testing-in-ci-environments).\n\n## Contents\n\n- [Quickstart](#quickstart)\n- [Usage](#usage)\n  - [Inputs](#inputs)\n  - [Outputs](#outputs)\n  - [Script contract](#script-contract)\n  - [Consuming the result in later steps](#consuming-the-result-in-later-steps)\n  - [Experiment metadata](#experiment-metadata)\n- [FAQ](#faq)\n  - [Can I run Python and TypeScript experiments in the same step?](#can-i-run-python-and-typescript-experiments-in-the-same-step)\n  - [How do I manage the Langfuse SDK installation myself?](#how-do-i-manage-the-langfuse-sdk-installation-myself)\n  - [How do I pass extra secrets (OpenAI keys, etc.) 
to my experiment?](#how-do-i-pass-extra-secrets-openai-keys-etc-to-my-experiment)\n  - [Can I pin a specific Langfuse SDK version?](#can-i-pin-a-specific-langfuse-sdk-version)\n  - [Why does the action need a `github_token`?](#why-does-the-action-need-a-github_token)\n  - [Does PR commenting work on forked-PR runs?](#does-pr-commenting-work-on-forked-pr-runs)\n  - [Why can't I see my experiment in Langfuse?](#why-cant-i-see-my-experiment-in-langfuse)\n- [Contributing](#contributing)\n- [License](#license)\n\n## Quickstart\n\n```yaml\nname: Langfuse experiment\non:\n  pull_request:\n\npermissions:\n  contents: read\n  pull-requests: write # required for posting the experiment comment\n  actions: read # lets \"View run\" link to the specific job (falls back to the workflow-run URL otherwise)\n\njobs:\n  experiment:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v6\n\n      # For experiments written in Python (`.py` scripts).\n      - uses: actions/setup-python@v6\n        with:\n          python-version: \"3.14\"\n\n      # For experiments written in TypeScript / JavaScript\n      # (`.ts` / `.js` / `.mjs` / `.cjs` scripts). Safe to include both\n      # setups if your `experiment_path` is a directory with a mix of\n      # runtimes.\n      - uses: actions/setup-node@v6\n        with:\n          node-version: \"24\"\n\n      # Pin to a release SHA (Zizmor-friendly, protects you from\n      # tag-moved attacks). The `# v\u003cversion\u003e` comment is what humans\n      # read; the SHA is what GitHub resolves. 
This line is auto-bumped\n      # by `.github/workflows/release-bump-readme.yml` on every release.\n      - uses: langfuse/experiment-action@887e7936bdf64a2197aa7dcfdc8a9e4afd85e229 # v1.0.3\n        with:\n          langfuse_public_key: ${{ secrets.LANGFUSE_PUBLIC_KEY }}\n          langfuse_secret_key: ${{ secrets.LANGFUSE_SECRET_KEY }}\n          experiment_path: experiments/qa_experiment.py\n          dataset_name: qa-set\n          github_token: ${{ github.token }}\n```\n\nOnly include the setup steps you actually need — Python-only projects can\ndrop `actions/setup-node`, TS-only projects can drop `actions/setup-python`.\n\n## Usage\n\n### Inputs\n\n| Input                          | Required | Default                      | Description                                                                                                                                                     |\n| ------------------------------ | -------- | ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| `langfuse_public_key`          | yes      |                              | Langfuse public API key.                                                                                                                                        |\n| `langfuse_secret_key`          | yes      |                              | Langfuse secret API key.                                                                                                                                        |\n| `langfuse_base_url`            | no       | `https://cloud.langfuse.com` | Langfuse base URL.                                                                                                                                              
|\n| `experiment_path`              | yes      |                              | File, directory, or glob pattern pointing at experiment scripts.                                                                                                |\n| `dataset_name`                 | no       |                              | Dataset to run against. If omitted, the user script is expected to select its own dataset.                                                                      |\n| `dataset_version`              | no       |                              | Pin the experiment to a specific dataset version.                                                                                                               |\n| `experiment_metadata`          | no       |                              | Additional metadata as a multiline `key=value` string. Shown under the Metadata column in the Langfuse UI.                                                      |\n| `should_fail_on_regression`    | no       | `true`                       | Fail CI when an experiment raises `RegressionError`.                                                                                                            |\n| `should_fail_on_script_error`  | no       | `true`                       | Fail CI when an experiment script crashes or raises a non-regression error.                                                                                     |\n| `should_comment_on_pr`         | no       | `true`                       | Post the result as a PR comment.                                                                                                                                |\n| `python_sdk_version`           | no       | `4.6.0`                      | Python SDK version to install via `pip` (for `.py` experiments).                                                                                                
|\n| `js_sdk_version`               | no       | `5.3.0`                      | JS SDK version (`@langfuse/client`) to install via `npm` (for `.ts`/`.js`/`.mjs`/`.cjs` experiments).                                                           |\n| `should_skip_sdk_installation` | no       | `false`                      | Skip automatic SDK installation and use the ambient Python/Node environment.                                                                                    |\n| `github_token`                 | no       |                              | Token used to post the PR comment. Pass `${{ github.token }}` and grant `pull-requests: write` (and optionally `actions: read` for job-level \"View run\" links). |\n\n### Outputs\n\n| Output        | Description                                                                                                                              |\n| ------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |\n| `result_json` | JSON output with action metadata and experiment results. See [`schemas/result-json.v1.schema.json`](schemas/result-json.v1.schema.json). |\n| `failed`      | `\"true\"` if any experiment errored or raised a regression, else `\"false\"`.                                                               |\n\nFor the full `result_json` structure, see [`schemas/result-json.v1.schema.json`](schemas/result-json.v1.schema.json).\n\n### Script contract\n\nYour experiment script must define a function named `experiment`. The action\ncreates a Langfuse SDK `RunnerContext` and passes it as the first argument so\nscripts can use action-injected defaults for dataset, dataset version, and\nmetadata. 
Use the access patterns from the [Langfuse experiments\ndocs][docs-experiments] — iterate `run_evaluations` / `runEvaluations` to find\nthe score you want to gate on.\n\n[docs-experiments]: https://langfuse.com/docs/evaluation/experiments/experiments-via-sdk#basic-usage\n\n#### Python\n\n```python\nfrom langfuse import RegressionError, RunnerContext\n\n\n# `my_task`, `my_evaluator` and `avg_accuracy` are your own task and evaluator functions.\ndef experiment(context: RunnerContext):\n    result = context.run_experiment(\n        name=\"My experiment\",\n        task=my_task,\n        evaluators=[my_evaluator],\n        run_evaluators=[avg_accuracy],\n    )\n\n    # Look up the run-level score named `avg_accuracy` (0 if it is missing).\n    accuracy = next(\n        (\n            evaluation.value\n            for evaluation in result.run_evaluations\n            if evaluation.name == \"avg_accuracy\"\n        ),\n        0,\n    )\n    if accuracy \u003c 0.9:\n        raise RegressionError(result=result)\n\n    return result\n```\n\n#### TypeScript / JavaScript\n\n```ts\nimport { RegressionError, type RunnerContext } from \"@langfuse/client\";\n\n// `myTask`, `myEvaluator` and `avgAccuracy` are your own task and evaluator functions.\nexport async function experiment(context: RunnerContext) {\n  const result = await context.runExperiment({\n    name: \"My experiment\",\n    task: myTask,\n    evaluators: [myEvaluator],\n    runEvaluators: [avgAccuracy],\n  });\n\n  // Look up the run-level score named `avg_accuracy` (0 if it is missing).\n  const accuracy =\n    (result.runEvaluations.find((e) =\u003e e.name === \"avg_accuracy\")?.value as number | undefined) ??\n    0;\n  if (accuracy \u003c 0.9) {\n    throw new RegressionError({ result });\n  }\n\n  return result;\n}\n```\n\nThe action serializes the returned value to JSON and exposes it through the\n`result_json` output. If the function raises, the error is captured and the\nCI job fails depending on `should_fail_on_regression` /\n`should_fail_on_script_error`.\n\n### Consuming the result in later steps\n\n```yaml\n- uses: langfuse/experiment-action@887e7936bdf64a2197aa7dcfdc8a9e4afd85e229 # v1.0.3\n  id: experiment\n  with: # ...\n\n- name: Write result to a file\n  # Pass the output through an env var instead of inlining it in the script,\n  # so quotes in the JSON can't break (or inject into) the shell command.\n  env:\n    RESULT_JSON: ${{ steps.experiment.outputs.result_json }}\n  run: printf '%s' \"$RESULT_JSON\" \u003e experiment.json\n\n- uses: actions/upload-artifact@v7\n  with:\n    name: experiment-result\n    path: experiment.json\n```\n\n### Experiment metadata\n\nEvery experiment run carries the following metadata, in addition to anything\nyou pass via `experiment_metadata`. Action-generated keys are namespaced\nunder `langfuse.*` so they're easy to distinguish from your own.\n\n| Key                             | Source                                            | Notes                                                                                                                       |\n| ------------------------------- | ------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- |\n| `langfuse.git_sha`              | `$GITHUB_SHA`                                     | The commit being tested.                                                                                                    |\n| `langfuse.branch`               | `$GITHUB_REF_NAME`                                |                                                                                                                             |\n| `langfuse.event`                | `$GITHUB_EVENT_NAME`                              | E.g. `pull_request`, `push`.                                                                                                
|\n| `langfuse.actor`                | `$GITHUB_TRIGGERING_ACTOR` → `$GITHUB_ACTOR`      | The user who kicked off the current attempt.                                                                                |\n| `langfuse.pr_url`               | derived from `$GITHUB_REF`                        | Only on `pull_request` events.                                                                                              |\n| `langfuse.github_workflow_name` | `$GITHUB_WORKFLOW`                                | E.g. `CI`.                                                                                                                  |\n| `langfuse.github_job_name`      | `$GITHUB_JOB`                                     | The workflow job running the experiment.                                                                                    |\n| `langfuse.github_job_attempt`   | `$GITHUB_RUN_ATTEMPT`                             | `\"1\"` on the initial run, `\"2\"`+ on re-runs.                                                                                |\n| `langfuse.github_job_url`       | resolved via the GitHub API from `$GITHUB_RUN_ID` | Direct link to this job's logs. Requires `github_token` with `actions: read`; falls back to the workflow-run URL otherwise. |\n| _custom_                        | `experiment_metadata`                             | Forwarded verbatim — pick whatever namespace your org prefers.                                                              |\n\n## FAQ\n\n### Can I run Python and TypeScript experiments in the same step?\n\nYes. Point `experiment_path` at a directory (or a glob) that contains a mix\nof `.py` and `.ts` / `.js` / `.mjs` / `.cjs` files. The action detects the\nruntime per-script, installs each SDK once per step, and runs everything\nsequentially. 
Make sure your workflow has `actions/setup-python` **and**\n`actions/setup-node` before the action step.\n\n```yaml\nexperiment_path: experiments/ # contains both python and ts scripts\n```\n\nFiles starting with `.` or `_` are skipped so helper modules (e.g.\n`__init__.py`, `_utils.ts`) don't get executed as experiments.\n\n### How do I manage the Langfuse SDK installation myself?\n\nSet `should_skip_sdk_installation: \"true\"` and install the SDK yourself\nbefore the action runs. Useful when your project has pinned lockfiles\nyou'd rather honour than reinstall against the action's default SDK versions.\n\n#### Python — install from your own `requirements.txt`\n\n```yaml\n- uses: actions/setup-python@v6\n  with:\n    python-version: \"3.14\"\n    cache: pip\n\n- run: pip install -r requirements.txt # must include `langfuse`\n\n- uses: langfuse/experiment-action@887e7936bdf64a2197aa7dcfdc8a9e4afd85e229 # v1.0.3\n  with:\n    experiment_path: experiments/\n    should_skip_sdk_installation: \"true\"\n    langfuse_public_key: ${{ secrets.LANGFUSE_PUBLIC_KEY }}\n    langfuse_secret_key: ${{ secrets.LANGFUSE_SECRET_KEY }}\n```\n\nWorks with any Python installer — swap `pip install -r requirements.txt`\nfor `poetry install`, `uv sync`, `pdm install`, etc.\n\n#### Node — install from your own lockfile\n\nThe action needs `@langfuse/client`, `@langfuse/tracing`, `@langfuse/otel`,\n`@opentelemetry/sdk-node`, and `tsx` reachable from the working directory's\n`node_modules/`.\n\n```yaml\n- uses: actions/setup-node@v6\n  with:\n    node-version: \"24\"\n    cache: npm # or pnpm / yarn\n\n- run: npm ci # or `pnpm install --frozen-lockfile`, `yarn install --frozen-lockfile`\n\n- uses: langfuse/experiment-action@887e7936bdf64a2197aa7dcfdc8a9e4afd85e229 # v1.0.3\n  with:\n    experiment_path: experiments/\n    should_skip_sdk_installation: \"true\"\n    langfuse_public_key: ${{ secrets.LANGFUSE_PUBLIC_KEY }}\n    langfuse_secret_key: ${{ secrets.LANGFUSE_SECRET_KEY }}\n```\n\nMake 
sure `@langfuse/client`, `@langfuse/tracing`, `@langfuse/otel`,\n`@opentelemetry/sdk-node`, and `tsx` are listed in `package.json` (either\n`dependencies` or `devDependencies` — `npm ci` installs both).\n\n### How do I pass extra secrets (OpenAI keys, etc.) to my experiment?\n\nSet them as env vars on the action step — the experiment subprocess\ninherits the parent process's environment.\n\n```yaml\n- uses: langfuse/experiment-action@887e7936bdf64a2197aa7dcfdc8a9e4afd85e229 # v1.0.3\n  env:\n    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}\n    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}\n  with:\n    experiment_path: experiments/\n    # ... usual inputs\n```\n\nYour `experiment()` function can read them with `os.environ[...]` (Python)\nor `process.env[...]` (TS / JS).\n\n### Can I pin a specific Langfuse SDK version?\n\nYes, for each runtime independently:\n\n```yaml\npython_sdk_version: \"4.1.2\"\njs_sdk_version: \"5.0.0-rc.3\"\n```\n\nIf a version is already installed and matches, the action skips the install.\n\n### Why does the action need a `github_token`?\n\nFor two things, both optional:\n\n1. Posting the experiment results comment on the PR (requires\n   `pull-requests: write` permission on the workflow).\n2. Resolving the specific job URL (requires `actions: read`). The action makes one API call to\n   `GET /repos/.../actions/runs/\u003cid\u003e/jobs` so the PR comment and the\n   `langfuse.github_job_url` metadata link to _this_ job's logs instead of the\n   broader workflow run.\n\nIf `github_token` is blank, both features are silently skipped; the\nexperiment still runs and succeeds/fails as usual.\n\n### Does PR commenting work on forked-PR runs?\n\nNo — GitHub restricts the default `GITHUB_TOKEN` to **read-only** on\nworkflows triggered from forks, which blocks comment creation. This is a\nplatform-level constraint, not something the action can work around\ndirectly. Two common mitigations:\n\n- Use `pull_request_target` (carefully — [read GitHub's security\n  guidance][pr-target-security] first; this runs workflows in the context\n  of the base repo with write permissions).\n- Use a separate `workflow_run`-triggered job that downloads the PR run's artifacts and\n  posts the comment with elevated permissions.\n\n[pr-target-security]: https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/\n\n### Why can't I see my experiment in Langfuse?\n\nThe action only renders a `View in Langfuse` link for dataset-backed experiments.\nTo get one, pass `dataset_name` so the experiment runs against a real Langfuse dataset.\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flangfuse%2Fexperiment-action","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flangfuse%2Fexperiment-action","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flangfuse%2Fexperiment-action/lists"}