{"id":38171774,"url":"https://github.com/code-sensei/artemiskit","last_synced_at":"2026-03-05T01:11:41.545Z","repository":{"id":332024220,"uuid":"1132489385","full_name":"code-sensei/artemiskit","owner":"code-sensei","description":"Agent Reliability Toolkit for LLMs - Test, evaluate, stress-test, and red-team your AI applications with scenario-based testing, multiple evaluators, and multi-provider support.","archived":false,"fork":false,"pushed_at":"2026-02-19T15:55:16.000Z","size":3663,"stargazers_count":4,"open_issues_count":15,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-27T01:49:31.820Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://artemiskit.vercel.app","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/code-sensei.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-12T03:05:09.000Z","updated_at":"2026-02-09T23:24:10.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/code-sensei/artemiskit","commit_stats":null,"previous_names":["code-sensei/artemiskit"],"tags_count":65,"template":false,"template_full_name":null,"purl":"pkg:github/code-sensei/artemiskit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-sensei%2Fartemiskit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-sensei%2Fartemiskit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-sensei%2Fartemiskit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-sensei%2Fartemiskit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/code-sensei","download_url":"https://codeload.github.com/code-sensei/artemiskit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-sensei%2Fartemiskit/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30104221,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-05T01:06:53.091Z","status":"ssl_error","status_checked_at":"2026-03-05T01:02:35.679Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-16T23:27:46.463Z","updated_at":"2026-03-05T01:11:41.500Z","avatar_url":"https://github.com/code-sensei.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ArtemisKit\n\n ![Artemiskit logo](https://artemiskit.vercel.app/artemiskit-logo.png)\n\n**Open-source LLM evaluation toolkit** - Test, evaluate, stress-test, and red-team your AI applications with scenario-based testing and multi-provider support.\n\n[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)\n[![npm](https://img.shields.io/npm/v/@artemiskit/cli.svg)](https://www.npmjs.com/package/@artemiskit/cli)\n[![Documentation](https://img.shields.io/badge/docs-artemiskit.vercel.app-blue)](https://artemiskit.vercel.app)\n\n📚 **[Documentation](https://artemiskit.vercel.app)** | 🚀 **[Getting Started](https://artemiskit.vercel.app/docs/cli/getting-started/)**\n\n## Features\n\n- **Scenario-Based Testing** - Define test cases in YAML with multi-turn conversation support\n- **Security Red Teaming** - Automatically test for prompt injection, jailbreaks, and data extraction\n- **Stress Testing** - Measure latency, throughput, and reliability under load\n- **Multi-Provider Support** - OpenAI, Azure OpenAI, Vercel AI SDK (20+ providers)\n- **Rich Reports** - Interactive HTML reports with configuration traceability\n- **CI/CD Ready** - Exit codes and JSON output for automation\n\n## Installation\n\n```bash\nnpm install -g @artemiskit/cli\n# or\npnpm add -g @artemiskit/cli\n# or\nbun add -g @artemiskit/cli\n```\n\n## Quick Start (Basic Example)\n\nThis is the simplest way to get started with ArtemisKit.\n\n### 1. Set up your API key\n\n```bash\nexport OPENAI_API_KEY=\"your-api-key\"\n```\n\n### 2. Create a simple scenario\n\n```yaml\n# scenarios/hello.yaml\nname: hello-test\ndescription: My first ArtemisKit test\n\ncases:\n  - id: greeting-test\n    prompt: \"Say hello\"\n    expected:\n      type: contains\n      values:\n        - \"hello\"\n      mode: any\n```\n\n### 3. Run it\n\n```bash\nartemiskit run scenarios/hello.yaml\n# or use the short alias\nakit run scenarios/hello.yaml\n```\n\nThat's it! ArtemisKit will use OpenAI by default. See below for full configuration options.\n\n---\n\n## Configuration\n\n### Config File (Full Reference)\n\nCreate `artemis.config.yaml` in your project root. Here's every available option:\n\n```yaml\n# artemis.config.yaml - Full Reference\n# =====================================\n\n# Project identifier (used in run storage and reports)\nproject: my-project\n\n# Default provider to use when not specified in scenario or CLI\n# Options: openai, azure-openai, vercel-ai\nprovider: openai\n\n# Default model to use\n# NOTE: For azure-openai, this is DISPLAY ONLY - the actual model\n# is determined by your Azure deployment, not this value.\n# See docs/providers/azure-openai.md for details.\nmodel: gpt-4o\n\n# Directory containing scenario files\nscenariosDir: ./scenarios\n\n# Provider-specific configuration\nproviders:\n  openai:\n    # API key (can use environment variable reference)\n    apiKey: ${OPENAI_API_KEY}\n    \n  azure-openai:\n    # API key for Azure OpenAI\n    apiKey: ${AZURE_OPENAI_API_KEY}\n    # Your Azure resource name (the subdomain in your endpoint URL)\n    resourceName: ${AZURE_OPENAI_RESOURCE_NAME}\n    # The deployment name you created in Azure Portal\n    deploymentName: ${AZURE_OPENAI_DEPLOYMENT_NAME}\n    # API version (optional, has sensible default)\n    apiVersion: \"2024-02-15-preview\"\n\n  vercel-ai:\n    # Underlying provider for Vercel AI SDK\n    underlyingProvider: openai\n    apiKey: ${OPENAI_API_KEY}\n\n# Storage configuration for run history\nstorage:\n  # Storage type: \"local\" or \"supabase\"\n  type: local\n  # Base path for local storage (relative to project root)\n  basePath: ./artemis-runs\n\n# Output configuration for reports\noutput:\n  # Output format: \"json\", \"html\", or \"both\"\n  format: html\n  # Directory for generated reports\n  dir: ./artemis-output\n\n# CI-specific settings (optional)\nci:\n  # Fail if regression exceeds threshold\n  failOnRegression: true\n  # Regression threshold (0-1)\n  regressionThreshold: 0.05\n```\n\n### Minimal Config File\n\nIf you just want to set defaults, a minimal config works too:\n\n```yaml\n# artemis.config.yaml - Minimal\nproject: my-project\nprovider: openai\nmodel: gpt-4o\n```\n\n---\n\n## Scenario Format\n\n### Basic Scenario (Simple Prompts)\n\n```yaml\n# scenarios/basic.yaml\nname: basic-test\ndescription: Simple prompt-response tests\n\n# Optional: Override provider/model for this scenario\nprovider: openai\nmodel: gpt-4o\n\ncases:\n  - id: greeting\n    prompt: \"Say hello\"\n    expected:\n      type: contains\n      values:\n        - \"hello\"\n      mode: any\n```\n\n### Full Scenario Reference\n\nHere's every available option for scenarios:\n\n```yaml\n# scenarios/full-reference.yaml - Complete Example\n# =================================================\n\n# Required: Unique name for this scenario\nname: customer-support-eval\n\n# Optional: Human-readable description\ndescription: Evaluate customer support bot responses\n\n# Optional: Scenario version\nversion: \"1.0\"\n\n# Optional: Tags for filtering (use --tags flag)\ntags:\n  - support\n  - production\n\n# Optional: Provider override (defaults to config file, then \"openai\")\n# Options: openai, azure-openai, vercel-ai\nprovider: openai\n\n# Optional: Model override\n# NOTE: For azure-openai, this is DISPLAY ONLY - actual model\n# is determined by your Azure deployment. See docs/providers/azure-openai.md\nmodel: gpt-4o\n\n# Optional: Model parameters\ntemperature: 0.7\nmaxTokens: 1024\nseed: 42\n\n# Optional: System prompt prepended to all cases\nsetup:\n  systemPrompt: |\n    You are a helpful customer support assistant.\n    Always be polite and professional.\n\n# Optional: Scenario-level variables (available to all cases)\n# Case-level variables override these. Use {{var_name}} syntax.\nvariables:\n  company_name: \"Acme Corp\"\n  default_greeting: \"Hello\"\n\n# Required: Test cases to run\ncases:\n  # ---- Simple prompt/response case ----\n  - id: simple-greeting\n    name: Simple greeting test\n    description: Test basic greeting response\n    # The prompt to send to the model\n    prompt: \"Hello, I need help\"\n    # Expected result validation\n    expected:\n      type: contains\n      values:\n        - \"help\"\n        - \"assist\"\n      mode: any\n    # Optional: Tags for this case\n    tags:\n      - basic\n\n  # ---- Case with regex matching ----\n  - id: order-number-check\n    name: Order number extraction\n    prompt: \"My order number is #12345\"\n    expected:\n      type: regex\n      pattern: \"12345\"\n      flags: \"i\"\n\n  # ---- Case with exact match ----\n  - id: yes-no-response\n    name: Binary response test\n    prompt: \"Reply with only 'Yes' or 'No': Is the sky blue?\"\n    expected:\n      type: exact\n      value: \"Yes\"\n      caseSensitive: false\n\n  # ---- Case with fuzzy matching ----\n  - id: fuzzy-match-test\n    name: Fuzzy similarity test\n    prompt: \"What color is grass?\"\n    expected:\n      type: fuzzy\n      value: \"green\"\n      threshold: 0.8\n\n  # ---- Case with LLM grading ----\n  - id: quality-check\n    name: Response quality evaluation\n    prompt: \"Explain quantum computing in simple terms\"\n    expected:\n      type: llm_grader\n      rubric: |\n        Score 1.0 if the explanation is clear and accurate.\n        Score 0.5 if partially correct but confusing.\n        Score 0.0 if incorrect or overly technical.\n      threshold: 0.7\n\n  # ---- Case with JSON schema validation ----\n  - id: json-output-test\n    name: Structured output test\n    prompt: \"Return a JSON object with name and age fields\"\n    expected:\n      type: json_schema\n      schema:\n        type: object\n        properties:\n          name:\n            type: string\n          age:\n            type: number\n        required:\n          - name\n          - age\n\n  # ---- Multi-turn conversation ----\n  - id: multi-turn-support\n    name: Multi-turn conversation\n    # Use array of messages for multi-turn\n    prompt:\n      - role: user\n        content: \"I have a problem with my order\"\n      - role: assistant\n        content: \"I'd be happy to help. What's your order number?\"\n      - role: user\n        content: \"Order number is #99999\"\n    expected:\n      type: contains\n      values:\n        - \"99999\"\n      mode: any\n\n  # ---- Case with variables ----\n  - id: dynamic-content\n    name: Variable substitution test\n    # Case-level variables override scenario-level\n    variables:\n      product_name: \"Widget Pro\"\n      order_id: \"ORD-789\"\n    prompt: \"What's the status of my {{product_name}} order {{order_id}}?\"\n    expected:\n      type: contains\n      values:\n        - \"ORD-789\"\n      mode: any\n\n  # ---- Case with timeout and retries ----\n  - id: slow-response-test\n    name: Timeout handling test\n    prompt: \"Generate a detailed report\"\n    expected:\n      type: contains\n      values:\n        - \"report\"\n      mode: any\n    timeout: 30000\n    retries: 2\n```\n\n### Variables\n\nVariables let you create dynamic, reusable scenarios. Use `{{variable_name}}` syntax in prompts.\n\n```yaml\nname: customer-support\ndescription: Test with dynamic content\n\n# Scenario-level variables - available to all cases\nvariables:\n  company_name: \"Acme Corp\"\n  support_email: \"support@acme.com\"\n\ncases:\n  # Uses scenario-level variables\n  - id: contact-info\n    prompt: \"What is the email for {{company_name}}?\"\n    expected:\n      type: contains\n      values:\n        - \"support@acme.com\"\n      mode: any\n\n  # Case-level variables override scenario-level\n  - id: different-company\n    variables:\n      company_name: \"TechCorp\"  # Overrides \"Acme Corp\"\n      product: \"Widget\"\n    prompt: \"Tell me about {{product}} from {{company_name}}\"\n    expected:\n      type: contains\n      values:\n        - \"TechCorp\"\n      mode: any\n```\n\nVariable precedence: **case-level \u003e scenario-level**\n\n### Expectation Types\n\n| Type | Description | Key Fields |\n|------|-------------|------------|\n| `contains` | Response contains string(s) | `values: [...]`, `mode: all\\|any` |\n| `exact` | Response exactly equals value | `value: \"...\"`, `caseSensitive: bool` |\n| `regex` | Response matches regex pattern | `pattern: \"...\"`, `flags: \"i\"` |\n| `fuzzy` | Fuzzy string similarity | `value: \"...\"`, `threshold: 0.8` |\n| `llm_grader` | LLM-based evaluation | `rubric: \"...\"`, `threshold: 0.7` |\n| `json_schema` | Validate JSON structure | `schema: {...}` |\n\n---\n\n## CLI Commands\n\n| Command | Description |\n|---------|-------------|\n| `artemiskit run \u003cscenario\u003e` | Run scenario-based evaluations |\n| `artemiskit redteam \u003cscenario\u003e` | Run security red team tests |\n| `artemiskit stress \u003cscenario\u003e` | Run load/stress tests |\n| `artemiskit report \u003crun-id\u003e` | Regenerate report from saved run |\n| `artemiskit history` | View run history |\n| `artemiskit compare \u003cid1\u003e \u003cid2\u003e` | Compare two runs |\n| `artemiskit init` | Initialize configuration |\n\nUse `akit` as a shorter alias for `artemiskit`.\n\n### Run Command Options\n\n```bash\nartemiskit run \u003cscenario\u003e [options]\n\nOptions:\n  -p, --provider \u003cprovider\u003e   Provider: openai, azure-openai, vercel-ai\n  -m, --model \u003cmodel\u003e         Model to use\n  -o, --output \u003cdir\u003e          Output directory for results\n  -v, --verbose               Verbose output\n  -t, --tags \u003ctags...\u003e        Filter test cases by tags\n  -c, --concurrency \u003cn\u003e       Number of concurrent test cases (default: 1)\n  --timeout \u003cms\u003e              Timeout per test case in milliseconds\n  --retries \u003cn\u003e               Number of retries per test case\n  --config \u003cpath\u003e             Path to config file\n  --save                      Save results to storage (default: true)\n```\n\n---\n\n## Providers\n\nArtemisKit supports multiple LLM providers. See the [provider documentation](docs/providers/) for detailed setup guides.\n\n| Provider | Use Case | Docs |\n|----------|----------|------|\n| `openai` | Direct OpenAI API | [docs/providers/openai.md](docs/providers/openai.md) |\n| `azure-openai` | Azure OpenAI Service | [docs/providers/azure-openai.md](docs/providers/azure-openai.md) |\n| `vercel-ai` | 20+ providers via Vercel AI SDK | [docs/providers/vercel-ai.md](docs/providers/vercel-ai.md) |\n\n### Quick Setup\n\n**OpenAI:**\n```bash\nexport OPENAI_API_KEY=\"sk-...\"\nakit run scenario.yaml --provider openai --model gpt-4o\n```\n\n**Azure OpenAI:**\n```bash\nexport AZURE_OPENAI_API_KEY=\"...\"\nexport AZURE_OPENAI_RESOURCE_NAME=\"my-resource\"\nexport AZURE_OPENAI_DEPLOYMENT_NAME=\"gpt-4o-deployment\"\nakit run scenario.yaml --provider azure-openai --model gpt-4o\n# Note: --model is for display only; actual model is your deployment\n```\n\n**Vercel AI (any provider):**\n```bash\nexport ANTHROPIC_API_KEY=\"sk-ant-...\"\nakit run scenario.yaml --provider vercel-ai --model anthropic:claude-3-5-sonnet-20241022\n```\n\n---\n\n## Security Testing (Red Team)\n\nTest your LLM for vulnerabilities:\n\n```bash\nakit redteam scenarios/my-bot.yaml --mutations typo,role-spoof,cot-injection\n```\n\n### Available Mutations\n\n| Mutation | Description |\n|----------|-------------|\n| `typo` | Introduce typos to bypass filters |\n| `role-spoof` | Attempt role/identity spoofing |\n| `instruction-flip` | Reverse or negate instructions |\n| `cot-injection` | Chain-of-thought injection attacks |\n\n---\n\n## Packages\n\nArtemisKit is a monorepo with the following packages:\n\n| Package | Description |\n|---------|-------------|\n| `@artemiskit/cli` | Command-line interface |\n| `@artemiskit/core` | Core runner, types, and storage (internal) |\n| `@artemiskit/sdk` | Programmatic SDK for TypeScript/JavaScript (coming soon) |\n| `@artemiskit/reports` | HTML and JSON report generation |\n| `@artemiskit/redteam` | Red team mutation strategies |\n| `@artemiskit/adapter-openai` | OpenAI/Azure provider adapter |\n| `@artemiskit/adapter-vercel-ai` | Vercel AI SDK adapter |\n| `@artemiskit/adapter-anthropic` | Anthropic provider adapter |\n\n---\n\n## Development\n\n```bash\n# Clone the repository\ngit clone https://github.com/artemiskit/artemiskit.git\ncd artemiskit\n\n# Install dependencies\nbun install\n\n# Build all packages\nbun run build\n\n# Run tests\nbun test\n\n# Type check\nbun run typecheck\n\n# Lint\nbun run lint\n```\n\n## Roadmap\n\nSee [ROADMAP.md](ROADMAP.md) for the full development roadmap.\n\n## Contributing\n\nContributions are welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) before submitting a pull request.\n\n## License\n\nApache-2.0 - See [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode-sensei%2Fartemiskit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcode-sensei%2Fartemiskit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode-sensei%2Fartemiskit/lists"}