{"id":14041786,"url":"https://github.com/BCG-X-Official/artkit","last_synced_at":"2025-07-27T15:31:03.974Z","repository":{"id":245239074,"uuid":"812677652","full_name":"BCG-X-Official/artkit","owner":"BCG-X-Official","description":"Automated prompt-based testing and evaluation of Gen AI applications","archived":false,"fork":false,"pushed_at":"2024-10-23T16:52:28.000Z","size":31649,"stargazers_count":130,"open_issues_count":9,"forks_count":22,"subscribers_count":5,"default_branch":"1.0.x","last_synced_at":"2024-11-29T05:56:15.899Z","etag":null,"topics":["asyncio","data-science","gen-ai","genai","python","red-teaming","test-automation"],"latest_commit_sha":null,"homepage":"https://bcg-x-official.github.io/artkit/_generated/home.html","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BCG-X-Official.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-09T15:15:39.000Z","updated_at":"2024-11-26T21:52:53.000Z","dependencies_parsed_at":"2024-09-12T05:49:26.157Z","dependency_job_id":"f7b6ef68-fa40-4c17-a304-305584fee3fc","html_url":"https://github.com/BCG-X-Official/artkit","commit_stats":null,"previous_names":["bcg-x-official/artkit"],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BCG-X-Official%2Fartkit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BCG-X-Official%2Fartkit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BCG-X-O
fficial%2Fartkit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BCG-X-Official%2Fartkit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BCG-X-Official","download_url":"https://codeload.github.com/BCG-X-Official/artkit/tar.gz/refs/heads/1.0.x","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227814212,"owners_count":17823861,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asyncio","data-science","gen-ai","genai","python","red-teaming","test-automation"],"created_at":"2024-08-12T08:00:37.818Z","updated_at":"2025-07-27T15:31:03.966Z","avatar_url":"https://github.com/BCG-X-Official.png","language":"Jupyter Notebook","funding_links":[],"categories":["Open Source Security Tools","AI Red Teaming (Testing AI Targets)"],"sub_categories":[],"readme":".. image:: sphinx/source/_images/ARTKIT_Logo_Light_RGB.png\n   :alt: ARTKIT logo\n   :width: 400px\n\nAutomated Red Teaming (ART) and testing toolkit\n===============================================\n\n**ARTKIT** is a Python framework developed by `BCG X \u003chttps://www.bcg.com/x\u003e`_ for automating prompt-based\ntesting and evaluation of Gen AI applications.\n\n.. Begin-Badges\n\n|pypi| |conda| |python_versions| |code_style| |made_with_sphinx_doc| |license_badge| |github_actions_build_status| |Contributor_Convenant|\n\n.. 
End-Badges\n\nGetting started\n---------------\n\n- See the `ARTKIT Documentation \u003chttps://bcg-x-official.github.io/artkit/_generated/home.html\u003e`_ for our `User Guides \u003chttps://bcg-x-official.github.io/artkit/user_guide/index.html\u003e`_, `Examples \u003chttps://bcg-x-official.github.io/artkit/examples/index.html\u003e`_, `API Reference \u003chttps://bcg-x-official.github.io/artkit/apidoc/artkit.html\u003e`_, and more.\n- See `Contributing \u003chttps://github.com/BCG-X-Official/artkit/blob/HEAD/CONTRIBUTING.md\u003e`_ or visit our `Contributor Guide \u003chttps://bcg-x-official.github.io/artkit/contributor_guide/index.html\u003e`_ for information on contributing.\n- We have an `FAQ \u003chttps://bcg-x-official.github.io/artkit/faq.html\u003e`_ for common questions. For anything else, please reach out to ARTKIT@bcg.com.\n\n.. _Introduction:\n\n\nIntroduction\n------------\n\nARTKIT is a Python framework for developing automated end-to-end testing and evaluation pipelines for Gen AI applications.\nBy leveraging flexible Gen AI models to automate key steps in the testing and evaluation process, ARTKIT pipelines are \nreadily adapted to meet the testing and evaluation needs of a wide variety of Gen AI systems.\n\n.. image:: sphinx/source/_images/artkit_pipeline_schematic.png\n   :alt: ARTKIT pipeline schematic\n\nARTKIT also supports automated `multi-turn conversations \u003chttps://bcg-x-official.github.io/artkit/user_guide/generating_challenges/multi_turn_personas.html\u003e`_\nbetween a challenger bot and a target system. Issues and vulnerabilities are more likely to arise after extended\ninteractions with Gen AI systems, so multi-turn testing is critical for interactive applications. 
\n\nWe recommend starting with our `User Guide \u003chttps://bcg-x-official.github.io/artkit/user_guide/index.html\u003e`_\nto learn the core concepts and functionality of ARTKIT.\nVisit our `Examples \u003chttps://bcg-x-official.github.io/artkit/examples/index.html\u003e`_ to see how\nARTKIT can be used to test and evaluate Gen AI systems for:\n\n1. Q\u0026A Accuracy:\n    - Generate a *Q\u0026A golden dataset* from ground truth documents, augment questions to simulate variation in user inputs,\n      and evaluate system responses for `faithfulness, completeness, and relevancy \u003chttps://bcg-x-official.github.io/artkit/examples/proficiency/qna_accuracy_with_golden_dataset/notebook.html\u003e`_.\n\n2. Upholding Brand Values:\n    - Implement *persona-based testing* to simulate diverse users interacting with your system and evaluate system responses for\n      `brand conformity \u003chttps://bcg-x-official.github.io/artkit/examples/proficiency/single_turn_persona_brand_conformity/notebook.html\u003e`_.\n\n3. Equitability:\n    - Run a *counterfactual experiment* by systematically modifying demographic indicators across a set of documents and statistically\n      evaluate system responses for `undesired demographic bias \u003chttps://bcg-x-official.github.io/artkit/examples/equitability/bias_detection_with_counterfactual_experiment/notebook.html\u003e`_.\n\n4. Safety:\n    - Use *adversarial prompt augmentation* to strengthen adversarial prompts drawn from a prompt library and evaluate system responses for\n      `refusal to engage with adversarial inputs \u003chttps://bcg-x-official.github.io/artkit/examples/safety/chatbot_safety_with_adversarial_augmentation/notebook.html\u003e`_ .\n\n5. 
Security:\n    - Use *multi-turn attackers* to execute multi-turn strategies for extracting the system prompt from a chatbot, challenging the system's \n      `defenses against prompt exfiltration \u003chttps://bcg-x-official.github.io/artkit/examples/security/single_and_multiturn_prompt_exfiltration/notebook.html#Multi-Turn-Attacks\u003e`_.\n\nThese are just a few examples of the many ways ARTKIT can be used to test and evaluate Gen AI systems for proficiency, equitability, safety, and security.\n\nKey Features\n------------\n\nThe beauty of ARTKIT is that it allows you to do a lot with a little: A few simple functions and classes support the development of fast, flexible, fit-for-purpose\npipelines for testing and evaluating your Gen AI system. Key features include:\n\n- **Simple API:** ARTKIT provides a small set of simple but powerful functions that support customized pipelines to test and evaluate virtually any Gen AI system.\n- **Asynchronous:** Leverage asynchronous processing to speed up processes that depend heavily on API calls.\n- **Caching:** Manage development costs by caching API responses to reduce the number of calls to external services.\n- **Model Agnostic:** ARTKIT supports connecting to major Gen AI model providers and allows users to develop new model classes to connect to any Gen AI service.\n- **End-to-End Pipelines:** Build end-to-end flows to generate test prompts, interact with a target system (i.e., system being tested), perform quantitative evaluations, and structure results for reporting.\n- **Multi-Turn Conversations:** Create automated interactive dialogs between a target system and an LLM persona programmed to interact with the target system in pursuit of a specific goal.\n- **Robust Data Flows:** Automatically track the flow of data through testing and evaluation pipelines, facilitating full traceability of data lineage in the results.\n- **Visualizations:** Generate flow diagrams to visualize pipeline structure and verify the flow 
of data through the system.\n\n\n.. note::\n\n    ARTKIT is designed to be customized by data scientists and engineers to enhance human-in-the-loop testing and evaluation. \n    We intentionally do not provide a \"push button\" solution because experience has taught us that effective testing and evaluation\n    must be tailored to each Gen AI use case. Automation is a strategy for scaling and accelerating testing and evaluation, not a \n    substitute for case-specific risk landscape mapping, domain expertise, and critical thinking.\n\n\nSupported Model Providers\n-------------------------\n\nARTKIT provides out-of-the-box support for the following model providers:\n\n- `Anthropic \u003chttps://www.anthropic.com/\u003e`_\n- `AWS Bedrock \u003chttps://aws.amazon.com/bedrock/\u003e`_\n- Google's `Gemini \u003chttps://gemini.google.com/\u003e`_ and `Vertex AI \u003chttps://cloud.google.com/vertex-ai?hl=en\u003e`_\n- `Groq \u003chttps://groq.com/\u003e`_\n- `Hugging Face \u003chttps://huggingface.co/\u003e`_\n- `Microsoft Azure \u003chttps://azure.microsoft.com/en-us/\u003e`_\n- `OpenAI \u003chttps://openai.com/\u003e`_\n\nARTKIT also supports models deployed with the following open-source servers:\n\n- `vLLM \u003chttps://docs.vllm.ai/en/latest/\u003e`_\n- `Ollama \u003chttps://ollama.com/\u003e`_\n\nTo connect to other services, users can develop `new model classes \u003chttps://bcg-x-official.github.io/artkit/user_guide/advanced_tutorials/creating_new_model_classes.html\u003e`_.\n\nInstallation\n-------------\n\nARTKIT supports both PyPI and Conda installations. 
We recommend installing ARTKIT in a dedicated virtual environment.\n\nPip\n^^^^\n\n**MacOS and Linux:**\n\n::\n\n    python -m venv artkit\n    source artkit/bin/activate\n    pip install artkit\n\n**Windows:**\n\n::\n    \n    python -m venv artkit\n    artkit\\Scripts\\activate.bat\n    pip install artkit\n\nConda\n^^^^^\n\n::\n\n    conda install -c conda-forge artkit\n\n\nOptional dependencies\n^^^^^^^^^^^^^^^^^^^^^\n\nTo enable visualizations of pipeline flow diagrams, install `GraphViz \u003chttps://graphviz.org/\u003e`_ and ensure it is in your system's PATH variable:\n\n- For MacOS and Linux users, the instructions provided on `GraphViz Downloads \u003chttps://www.graphviz.org/download/\u003e`_ automatically add GraphViz to your PATH.\n- Windows users may need to manually add GraphViz to their PATH (see `Simplified Windows installation procedure \u003chttps://forum.graphviz.org/t/new-simplified-installation-procedure-on-windows/224\u003e`_).\n- Run ``dot -V`` in Terminal or Command Prompt to verify installation.\n\n\nEnvironment variables\n^^^^^^^^^^^^^^^^^^^^^\n\nMost ARTKIT users will need to access services from external model providers such as OpenAI or Hugging Face. \n\nOur recommended approach is:\n\n1. Install ``python-dotenv`` using ``pip``:\n\n::\n\n    pip install python-dotenv\n\nor ``conda``:\n\n::\n\n    conda install -c conda-forge python-dotenv\n\n2. Create a file named ``.env`` in your project root.\n3. Add ``.env`` to your ``.gitignore`` to ensure it is not committed to your Git repo.\n4. Define environment variables inside ``.env``, for example, ``API_KEY=your_api_key``\n5. In your Python scripts or notebooks, load the environment variables with:\n\n.. 
code-block:: python\n\n    from dotenv import load_dotenv\n    load_dotenv()\n\n    # Verify that the environment variable is loaded\n    import os\n    os.getenv('API_KEY')\n\nThe ARTKIT repository includes an example file called ``.env_example`` in the project root which provides a template for defining environment variables, \nincluding placeholder credentials for supported APIs.\n\nTo encourage secure storage of credentials, ARTKIT model classes do not accept API credentials directly, but instead require environment variables to be defined.\nFor example, if your OpenAI API key is stored in an environment variable called ``OPENAI_API_KEY``, you can initialize an OpenAI model class like this:\n\n.. code-block:: python\n    \n    import artkit.api as ak\n\n    ak.OpenAIChat(\n        model_id=\"gpt-4o\", \n        api_key_env=\"OPENAI_API_KEY\"\n        )\n\nThe ``api_key_env`` parameter accepts the name of the environment variable as a string instead of the API key itself,\nwhich reduces the risk of accidentally exposing API keys in code repositories, since the key is not stored as a Python object which can be printed. \n\nQuick Start\n-----------\n\nThe core ARTKIT functions are:\n\n1. ``run``: Execute one or more pipeline steps\n2. ``step``: A single pipeline step which produces a dictionary or an iterable of dictionaries\n3. ``chain``: A set of steps that run in sequence\n4. ``parallel``: A set of steps that run in parallel\n\nBelow, we develop a simple example pipeline with the following steps:\n\n1. Rephrase input prompts to have a specified tone, either \"polite\" or \"sarcastic\"\n2. Send rephrased prompts to a chatbot named AskChad which is programmed to mirror the user's tone\n3. Evaluate the responses according to a \"sarcasm\" metric\n\nTo begin, import ``artkit.api`` and set up a session with the OpenAI GPT-4o model. 
The code\nbelow assumes you have an OpenAI API key stored in an environment variable called ``OPENAI_API_KEY``\nand that you wish to cache the responses in a database called ``cache/chat_llm.db``.\n\n\n.. code-block:: python\n\n    import artkit.api as ak\n\n    # Set up a chat system with the OpenAI GPT-4o model\n    chat_llm = ak.CachedChatModel(\n        model=ak.OpenAIChat(model_id=\"gpt-4o\"),\n        database=\"cache/chat_llm.db\"\n    )\n\n\nNext, define a few functions that will be used as pipeline steps. \nARTKIT is designed to work with `asynchronous generators \u003chttps://realpython.com/lessons/asynchronous-generators-python/\u003e`_\nto allow for asynchronous processing, so the functions below are defined with ``async``, ``await``, and ``yield`` keywords.\n\n\n.. code-block:: python\n\n    # A function that rephrases input prompts to have a specified tone\n    async def rephrase_tone(prompt: str, tone: str, llm: ak.ChatModel):\n\n        response = await llm.get_response(\n            message = (\n                f\"Your job is to rephrase an input question to have a {tone} tone.\\n\"\n                f\"This is the question you must rephrase:\\n{prompt}\"\n            )\n        )\n\n        yield {\"prompt\": response[0], \"tone\": tone}\n\n\n    # A function that behaves as a chatbot named AskChad who mirrors the user's tone\n    async def ask_chad(prompt: str, llm: ak.ChatModel):\n\n        response = await llm.get_response(\n            message = (\n                \"You are AskChad, a chatbot that mirrors the user's tone. \"\n                \"For example, if the user is rude, you are rude. 
\"\n                \"Your responses contain no more than 10 words.\\n\"\n                f\"Respond to this user input:\\n{prompt}\"\n            )\n        )\n\n        yield {\"response\": response[0]}\n\n\n    # A function that evaluates responses according to a specified metric\n    async def evaluate_metric(response: str, metric: str, llm: ak.ChatModel):\n\n        score = await llm.get_response(\n            message = (\n                f\"Your job is to evaluate prompts according to whether they are {metric}. \"\n                f\"If the input prompt is {metric}, return 1, otherwise return 0.\\n\"\n                f\"Please evaluate the following prompt:\\n{response}\"\n            ) \n        )\n\n        yield {\"evaluation_metric\": metric, \"score\": int(score[0])}\n\n\nNext, define a pipeline which rephrases an input prompt according to two different tones (polite and sarcastic), \nsends the rephrased prompts to AskChad, and finally evaluates the responses for sarcasm.\n\n\n.. code-block:: python\n\n    pipeline = (\n        ak.chain(\n            ak.parallel(\n                ak.step(\"tone_rephraser\", rephrase_tone, tone=\"POLITE\", llm=chat_llm),\n                ak.step(\"tone_rephraser\", rephrase_tone, tone=\"SARCASTIC\", llm=chat_llm),\n            ),\n            ak.step(\"ask_chad\", ask_chad, llm=chat_llm),\n            ak.step(\"evaluation\", evaluate_metric, metric=\"SARCASTIC\", llm=chat_llm)\n        )\n    )\n\n    pipeline.draw()\n\n\n.. image:: sphinx/source/_images/quick_start_flow_diagram.png\n\n\nFinally, run the pipeline with an input prompt and display the results in a table.\n\n\n.. code-block:: python\n\n    # Input to run through the pipeline\n    prompt = {\"prompt\": \"What is a fun activity to do in Boston?\"}\n    \n    # Run pipeline\n    result = ak.run(steps=pipeline, input=prompt)\n\n    # Convert results dictionary into a multi-column dataframe\n    result.to_frame()\n\n\n.. 
image:: sphinx/source/_images/quick_start_results.png\n  \n\nFrom left to right, the results table shows:\n\n1. ``input``: The original input prompt\n2. ``tone_rephraser``: The rephrased prompts, which rephrase the original prompt to have the specified tone\n3. ``ask_chad``: The response from AskChad, which mirrors the tone of the user\n4. ``evaluation``: The evaluation score for the SARCASTIC metric, which flags the sarcastic response with a 1\n\nFor a complete introduction to ARTKIT, please visit our `User Guide \u003chttps://bcg-x-official.github.io/artkit/user_guide/index.html\u003e`_\nand `Examples \u003chttps://bcg-x-official.github.io/artkit/examples/index.html\u003e`_.\n\n\nContributing\n------------\n\nContributions to ARTKIT are welcome and appreciated! Please see the `Contributor Guide \u003chttps://bcg-x-official.github.io/artkit/contributor_guide/index.html\u003e`_ section for information.\n\n\nLicense\n-------\n\nThis project is licensed under Apache 2.0, allowing free use, modification, and distribution with added protections against patent litigation. \nSee the `LICENSE \u003chttps://github.com/BCG-X-Official/artkit/blob/HEAD/LICENSE\u003e`_ file for more details or visit `Apache 2.0 \u003chttps://www.apache.org/licenses/LICENSE-2.0\u003e`_.\n\n\nBCG X\n-----\n\n`BCG X \u003chttps://www.bcg.com/x\u003e`_ is the tech build and design unit of Boston Consulting Group. \n\nWe are always on the lookout for talented data scientists and software engineers to join our team! \nVisit `BCG X Careers \u003chttps://careers.bcg.com/x\u003e`_ to learn more.\n\n.. Begin-Badges\n\n.. |pypi| image:: https://badge.fury.io/py/artkit.svg\n    :target: https://pypi.org/project/artkit/\n\n.. |conda| image:: https://anaconda.org/bcg_gamma/gamma-facet/badges/version.svg\n    :target: https://anaconda.org/BCG_Gamma/artkit\n\n.. 
|python_versions| image:: https://img.shields.io/badge/python-3.10|3.11|3.12-blue.svg\n   :target: https://www.python.org/downloads/release/python-3100/\n\n.. |code_style| image:: https://img.shields.io/badge/code%20style-black-000000.svg\n   :target: https://github.com/psf/black\n\n.. |made_with_sphinx_doc| image:: https://img.shields.io/badge/Made%20with-Sphinx-1f425f.svg\n   :target: https://bcg-x-official.github.io/facet/index.html\n\n.. |license_badge| image:: https://img.shields.io/badge/License-Apache%202.0-olivegreen.svg\n   :target: https://opensource.org/licenses/Apache-2.0\n\n.. |github_actions_build_status| image:: https://github.com/BCG-X-Official/artkit/actions/workflows/artkit-release-pipeline.yml/badge.svg\n    :target: https://github.com/BCG-X-Official/artkit/actions/workflows/artkit-release-pipeline.yml\n    :alt: ARTKIT Release Pipeline\n\n.. |Contributor_Convenant| image:: https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg \n   :target: CODE_OF_CONDUCT.md\n\n.. End-Badges","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBCG-X-Official%2Fartkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FBCG-X-Official%2Fartkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBCG-X-Official%2Fartkit/lists"}