{"id":50125558,"url":"https://github.com/evaliphy/evaliphy","last_synced_at":"2026-06-09T11:00:41.531Z","repository":{"id":348892297,"uuid":"1200267062","full_name":"Evaliphy/evaliphy","owner":"Evaliphy","description":"The E2E AI testing tool | No ML Overhead","archived":false,"fork":false,"pushed_at":"2026-05-05T10:02:59.000Z","size":34749,"stargazers_count":16,"open_issues_count":16,"forks_count":9,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-05-05T11:32:40.194Z","etag":null,"topics":["ai","ai-test-automation","ai-testing","ai-testing-tool","end-to-end-testing","llm-evaluation","llm-evaluation-framework","llm-evaluation-toolkit","llm-testing","rag","rag-evaluation","rag-pipeline","test-automation","test-automation-framework","testing-tools"],"latest_commit_sha":null,"homepage":"https://evaliphy.com","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Evaliphy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-03T08:04:27.000Z","updated_at":"2026-05-05T10:03:04.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Evaliphy/evaliphy","commit_stats":null,"previous_names":["priyanshus/evaliphy","evaliphy/evaliphy"],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/Evaliphy/evaliphy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Evaliphy%2Fevaliphy","tags_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Evaliphy%2Fevaliphy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Evaliphy%2Fevaliphy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Evaliphy%2Fevaliphy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Evaliphy","download_url":"https://codeload.github.com/Evaliphy/evaliphy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Evaliphy%2Fevaliphy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34103357,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-09T02:00:06.510Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-test-automation","ai-testing","ai-testing-tool","end-to-end-testing","llm-evaluation","llm-evaluation-framework","llm-evaluation-toolkit","llm-testing","rag","rag-evaluation","rag-pipeline","test-automation","test-automation-framework","testing-tools"],"created_at":"2026-05-23T20:00:20.718Z","updated_at":"2026-06-09T11:00:41.525Z","avatar_url":"https://github.com/Evaliphy.png","language":"TypeScript","funding_links":[],"categories":["Software"],"sub_categories":["AI \u0026 LLM Testing"],"readme":"# Evaliphy\n\u003cdiv align=\"center\"\u003e\n  \u003cimg 
src=\"./docs/banner.png\" alt=\"Evaliphy\" width=\"800\"\u003e\n  \u003cbr\u003e\u003cbr\u003e\n\u003c/div\u003e\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eE2E AI system testing tool\u003c/strong\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://www.npmjs.com/package/evaliphy\"\u003e\u003cimg src=\"https://img.shields.io/npm/v/evaliphy/beta.svg\" alt=\"npm version\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-MIT-yellow.svg\" alt=\"License: MIT\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://evaliphy.com\"\u003e\u003cimg src=\"https://img.shields.io/badge/docs-latest-blue.svg\" alt=\"Documentation\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://nodejs.org\"\u003e\u003cimg src=\"https://img.shields.io/badge/node-%3E%3D24.0.0-brightgreen.svg\" alt=\"Node.js version\" /\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n\u003cbr\u003e\n  \u003ca href=\"#quick-start\"\u003eQuick start\u003c/a\u003e · \u003ca href=\"#assertions\"\u003eAssertions\u003c/a\u003e · \u003ca href=\"#supported-llm-providers\"\u003eLLM Providers\u003c/a\u003e · \u003ca href=\"#ci-integration\"\u003eCI Integration\u003c/a\u003e · \u003ca href=\"#project-structure\"\u003eProject Structure\u003c/a\u003e \n\u003c/div\u003e\n\n---\n\u003e ⭐️ Star to stay updated. [Contributions welcome!](#contributing)\n---\n\nEvaliphy is an end-to-end testing tool that treats your AI system as a black box. Write assertions against your real API, get structured results, and catch regressions in CI, without touching your pipeline internals or writing judge prompts from scratch.\n\nBuilt-in LLM-as-Judge assertions handle the hard parts. 
You focus on writing evaluations, not wiring up models.\n\n![Evaliphy Demo](./docs/gif/demo.gif)\n\n---\n\n## Prerequisites\n\n- Node.js 24.0.0 or higher\n- An API key for OpenAI or any OpenAI-compatible provider\n- A running AI application with an HTTP endpoint\n\n---\n\n## Quick start\n\n### 1. Install and initialise\n\n```bash\nnpm install -g @evaliphy/sdk\nevaliphy init my-eval-project\ncd my-eval-project\nnpm install\n```\n\n### 2. Set your environment variables\n\n```bash\ncp .env.example .env\n```\n\nAdd your API key to `.env`:\n\n```\nOPENAI_API_KEY=your-api-key-here\n```\n\n### 3. Configure Evaliphy\n\nOpen `evaliphy.config.ts` and point it at your AI application:\n\n```typescript\nimport { defineConfig } from \"@evaliphy/sdk\";\n\nexport default defineConfig({\n  http: {\n    baseUrl: \"https://api.your-service.com\",\n    timeout: 10_000,\n    headers: {\n      Authorization: `Bearer ${process.env.API_KEY}`,\n    },\n  },\n  llmAsJudgeConfig: {\n    model: \"gpt-4o-mini\",\n    provider: {\n      type: \"openai\",\n      apiKey: process.env.OPENAI_API_KEY,\n    },\n  },\n  reporters: [\"console\", \"html\"],\n});\n```\n\n### 4. Write your first evaluation\n\nCreate `evals/chat.eval.ts`:\n\n```typescript\nimport { evaluate, expect } from \"@evaliphy/sdk\";\n\nconst sample = {\n  query: \"What is the return policy?\",\n  expectedContext: \"Items can be returned within 30 days.\"\n};\n\nevaluate(\"Return Policy Chat\", async ({ httpClient }) =\u003e {\n  // 1. Hit your RAG endpoint\n  const res = await httpClient.post('/api/chat', { message: sample.query });\n  const data = await res.json();\n\n  // 2. Assert in plain English\n  await expect({\n    query: sample.query,\n    context: sample.expectedContext,\n    response: data.answer\n  }).toBeFaithful();\n\n  // Or use positional arguments for simplicity\n  await expect(sample.query, sample.expectedContext, data.answer).toBeRelevant({ threshold: 0.7 });\n});\n```\n\n### 5. 
Run your evaluations\n\n```bash\nevaliphy eval\n```\n\n---\n\n## Assertions\n\n### LLM assertions\n\nScored 0.0 to 1.0 by a configurable judge model. Pass if the score meets or exceeds the threshold.\n\n| Assertion        | What it checks                                |\n| ---------------- | --------------------------------------------- |\n| `toBeFaithful()` | Response is grounded in the retrieved context |\n| `toBeRelevant()` | Response addresses the query                  |\n| `toBeGrounded()` | Claims are supported by source documents      |\n| `toBeCoherent()` | Response is logically consistent              |\n| `toBeHarmless()` | Response contains no harmful or toxic content |\n\nAll LLM assertions accept an optional config object:\n\n```typescript\nawait expect({ query, response, context }).toBeFaithful({\n  threshold: 0.9, // override global threshold for this assertion\n});\n```\n\n### Deterministic assertions\n\nComing in v1. Fast, free, no LLM call required.\n\n---\n\n## Configuration reference\n\n| Field                         | Type   | Default       | Description                     |\n| ----------------------------- | ------ | ------------- | ------------------------------- |\n| `http.baseUrl`                | string | —             | Base URL of your AI application |\n| `http.timeout`                | number | `10000`       | Request timeout in ms           |\n| `http.headers`                | object | `{}`          | Headers sent with every request |\n| `llmAsJudgeConfig.model`      | string | `gpt-4o-mini` | Judge model                     |\n| `llmAsJudgeConfig.threshold`  | number | `0.7`         | Global pass threshold           |\n| `llmAsJudgeConfig.promptsDir` | string | —             | Path to custom prompt directory |\n| `reporters`                   | array  | `['console']` | Output formats                  |\n\n---\n\n## Supported LLM Providers\n\nEvaliphy uses the [Vercel AI SDK](https://sdk.vercel.ai) under the hood, which means it 
supports a wide range of LLM providers out of the box. Configure your provider once in `evaliphy.config.ts` and Evaliphy handles the rest.\n\n| Provider | Type key | Required field |\n|---|---|---|\n| OpenAI | `openai` | `apiKey` |\n| Anthropic | `anthropic` | `apiKey` |\n| Azure OpenAI | `azure` | `apiKey`, `resourceName` |\n| Google Gemini | `google` | `apiKey` |\n| Mistral | `mistral` | `apiKey` |\n| OpenAI-compatible gateway | `gateway` | `apiKey`, `url` |\n\n### OpenAI\n\n```typescript\nllmAsJudgeConfig: {\n  model: 'gpt-4o-mini',\n  provider: {\n    type: 'openai',\n    apiKey: process.env.OPENAI_API_KEY,\n  }\n}\n```\n\n### Anthropic\n\n```typescript\nllmAsJudgeConfig: {\n  model: 'claude-3-5-haiku-20241022',\n  provider: {\n    type: 'anthropic',\n    apiKey: process.env.ANTHROPIC_API_KEY,\n  }\n}\n```\n\n### OpenAI-compatible gateway (OpenRouter, LiteLLM, etc.)\n\n```typescript\nllmAsJudgeConfig: {\n  model: 'gpt-4o-mini',\n  provider: {\n    type: 'gateway',\n    url: 'https://openrouter.ai/api/v1',\n    apiKey: process.env.OPENROUTER_API_KEY,\n  }\n}\n```\n\n### Azure OpenAI\n\n```typescript\nllmAsJudgeConfig: {\n  model: 'gpt-4o-mini',\n  provider: {\n    type: 'azure',\n    resourceName: process.env.AZURE_RESOURCE_NAME,\n    apiKey: process.env.AZURE_API_KEY,\n  }\n}\n```\n\nAny provider supported by the Vercel AI SDK can be used with Evaliphy. See the [Vercel AI SDK provider documentation](https://sdk.vercel.ai/providers/ai-sdk-providers) for the full list.\n\n---\n\n## Custom prompts\n\nEvaliphy ships with built-in prompts for every assertion. 
Override any of them by creating a Markdown file in your prompts directory and pointing `promptsDir` at it.\n\n```\nmy-eval-project/\n  prompts/\n    faithfulness.md    ← overrides built-in faithfulness prompt\n```\n\n```typescript\nllmAsJudgeConfig: {\n  promptsDir: \"./prompts\",\n}\n```\n\nEach prompt file uses frontmatter to declare its input variables:\n\n```markdown\n---\nname: faithfulness\ninput_variables:\n  - question\n  - context\n  - response\n---\n\nYou are evaluating a RAG system for a UK e-commerce company.\nFaithfulness means every claim traces back to the retrieved context.\n\n## Question\n\n{{question}}\n\n## Context\n\n{{context}}\n\n## Response\n\n{{response}}\n```\n\nSee the [custom prompts guide](https://evaliphy.com/docs/llm-as-judge#using-custom-prompts) for full documentation.\n\n---\n\n## CI integration\n\nEvaliphy exits with a non-zero code when any assertion fails, making it compatible with any CI pipeline.\n\n### GitHub Actions\n\n```yaml\nname: Evaliphy\n\non: [push, pull_request]\n\njobs:\n  eval:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n      - uses: actions/setup-node@v4\n        with:\n          node-version: 24\n\n      - run: npm ci\n      - run: evaliphy eval\n        env:\n          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}\n          API_KEY: ${{ secrets.API_KEY }}\n```\n\n---\n\n## Reporters\n\n| Reporter  | Output       | Description                             |\n| --------- | ------------ | --------------------------------------- |\n| `console` | Terminal     | Streams results as tests run            |\n| `json`    | `.json` file | Machine-readable, good for CI pipelines |\n| `html`    | `.html` file | Self-contained visual report            |\n| `csv`     | `.csv` file  | Coming soon                             |\n| `xlsx`    | `.xlsx` file | Coming soon                             |\n\nConfigure reporters through the `reporters` array in `evaliphy.config.ts`, for example `reporters: [\"console\", \"html\"]`.\n\n---\n\n## How it works\n\n1. 
Your eval file makes an HTTP call to your real running API\n2. The response and context are passed to the assertion\n3. The assertion sends a rendered prompt to the judge model\n4. The judge scores the response 0.0 to 1.0\n5. The score is compared against the threshold — pass or fail\n6. Results are written to all configured reporters\n\n---\n\n## Why Evaliphy\n\n**It fits where your tests already live.** Eval files are TypeScript files that sit in your repo alongside your other tests. No Python notebooks, no complex setup, no new workflow to learn.\n\n**You test your real API.** Evaliphy makes HTTP calls to your actual running service — not a mocked response or an offline dataset. If your AI system breaks in production, Evaliphy catches it.\n\n**The judges are built in.** Faithfulness, relevance, groundedness — the assertions that matter are shipped with the framework. No prompt writing or LLM wiring required.\n\n**Configurable when you need it.** Sensible defaults out of the box. Override the judge model globally, per file, or per assertion. Bring your own prompts for domain-specific evaluation.\n\n---\n\n## Project structure\n\nAfter running `evaliphy init`, your project looks like this:\n\n```\nmy-eval-project/\n  evals/\n    example.eval.ts       — sample evaluation to get you started\n  prompts/                — optional custom prompt overrides\n  evaliphy.config.ts      — main configuration file\n  .env.example            — environment variable template\n  package.json\n  tsconfig.json\n```\n\n---\n\n## Beta\n\nEvaliphy is in open beta. The API may change between versions. We are looking for feedback from engineers and teams building AI applications.\n\n- Free for commercial use during beta\n- Influence the v1.0 roadmap directly\n- Contribute to the growing assertion library\n\n[Submit feedback](https://forms.gle/9ztrqUCXUg2YGSJJA)\n\n---\n\n## Contributing\n\nContributions are welcome. 
Please read the [contributing guide](./CONTRIBUTING.md) before opening a pull request.\n\n---\n\n## Built by the community\n\n\u003ca href=\"https://github.com/Evaliphy/evaliphy/graphs/contributors\"\u003e\n  \u003cimg src=\"https://contrib.rocks/image?repo=Evaliphy/evaliphy\" /\u003e\n\u003c/a\u003e\n\n---\n\n## License\n\nMIT © [Evaliphy](https://github.com/evaliphy/evaliphy)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevaliphy%2Fevaliphy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fevaliphy%2Fevaliphy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevaliphy%2Fevaliphy/lists"}