{"id":15037013,"url":"https://github.com/giskard-ai/giskard","last_synced_at":"2025-05-14T22:04:15.815Z","repository":{"id":37084338,"uuid":"466864356","full_name":"Giskard-AI/giskard","owner":"Giskard-AI","description":"🐢 Open-Source Evaluation \u0026 Testing for AI \u0026 LLM systems","archived":false,"fork":false,"pushed_at":"2025-04-22T05:49:17.000Z","size":183592,"stargazers_count":4486,"open_issues_count":26,"forks_count":318,"subscribers_count":35,"default_branch":"main","last_synced_at":"2025-04-22T20:11:21.568Z","etag":null,"topics":["agent-evaluation","ai-red-team","ai-security","ai-testing","fairness-ai","llm","llm-eval","llm-evaluation","llm-security","llmops","ml-testing","ml-validation","mlops","rag-evaluation","red-team-tools","responsible-ai","trustworthy-ai"],"latest_commit_sha":null,"homepage":"https://docs.giskard.ai","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Giskard-AI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"Giskard-AI"}},"created_at":"2022-03-06T21:45:37.000Z","updated_at":"2025-04-22T15:25:29.000Z","dependencies_parsed_at":"2024-03-10T14:29:56.013Z","dependency_job_id":"7003b936-6273-498a-8fbe-35c53a70e427","html_url":"https://github.com/Giskard-AI/giskard","commit_stats":null,"previous_names":[],"tags_count":93,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Giskard-AI%2Fgiskard","tags_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/Giskard-AI%2Fgiskard/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Giskard-AI%2Fgiskard/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Giskard-AI%2Fgiskard/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Giskard-AI","download_url":"https://codeload.github.com/Giskard-AI/giskard/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252841433,"owners_count":21812481,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-evaluation","ai-red-team","ai-security","ai-testing","fairness-ai","llm","llm-eval","llm-evaluation","llm-security","llmops","ml-testing","ml-validation","mlops","rag-evaluation","red-team-tools","responsible-ai","trustworthy-ai"],"created_at":"2024-09-24T20:33:02.468Z","updated_at":"2025-05-07T08:26:53.146Z","avatar_url":"https://github.com/Giskard-AI.png","language":"Python","funding_links":["https://github.com/sponsors/Giskard-AI"],"categories":["🛠️ General ML Testing Frameworks"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg alt=\"giskardlogo\" src=\"https://raw.githubusercontent.com/giskard-ai/giskard/main/readme/giskard_logo.png#gh-light-mode-only\"\u003e\n  \u003cimg alt=\"giskardlogo\" src=\"https://raw.githubusercontent.com/giskard-ai/giskard/main/readme/giskard_logo_green.png#gh-dark-mode-only\"\u003e\n\u003c/p\u003e\n\u003ch1 align=\"center\" weight='300' \u003eThe Evaluation \u0026 Testing framework for AI 
systems\u003c/h1\u003e\n\u003ch3 align=\"center\" weight='300' \u003eControl risks of performance, bias and security issues in AI systems\u003c/h3\u003e\n\u003cdiv align=\"center\"\u003e\n\n  [![GitHub release](https://img.shields.io/github/v/release/Giskard-AI/giskard)](https://github.com/Giskard-AI/giskard/releases)\n  [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/Giskard-AI/giskard/blob/main/LICENSE)\n  [![Downloads](https://static.pepy.tech/badge/giskard/month)](https://pepy.tech/project/giskard)\n  [![CI](https://github.com/Giskard-AI/giskard/actions/workflows/build-python.yml/badge.svg?branch=main)](https://github.com/Giskard-AI/giskard/actions/workflows/build-python.yml?query=branch%3Amain)\n  [![Giskard on Discord](https://img.shields.io/discord/939190303397666868?label=Discord)](https://gisk.ar/discord)\n\n  \u003ca rel=\"me\" href=\"https://fosstodon.org/@Giskard\"\u003e\u003c/a\u003e\n\n\u003c/div\u003e\n\u003ch3 align=\"center\"\u003e\n   \u003ca href=\"https://docs.giskard.ai/en/stable/getting_started/index.html\"\u003e\u003cb\u003eDocs\u003c/b\u003e\u003c/a\u003e \u0026bull;\n  \u003ca href=\"https://www.giskard.ai/?utm_source=github\u0026utm_medium=github\u0026utm_campaign=github_readme\u0026utm_id=readmeblog\"\u003e\u003cb\u003eWebsite\u003c/b\u003e\u003c/a\u003e \u0026bull;\n  \u003ca href=\"https://gisk.ar/discord\"\u003e\u003cb\u003eCommunity\u003c/b\u003e\u003c/a\u003e\n \u003c/h3\u003e\n\u003cbr /\u003e\n\n## Install Giskard 🐢\nInstall the latest version of Giskard from PyPi using pip:\n```sh\npip install \"giskard[llm]\" -U\n```\nWe officially support Python 3.9, 3.10 and 3.11.\n## Try in Colab 📙\n[Open Colab notebook](https://colab.research.google.com/github/giskard-ai/giskard/blob/main/docs/getting_started/quickstart/quickstart_llm.ipynb)\n\n______________________________________________________________________\n\nGiskard is an open-source Python library that **automatically detects performance, 
bias \u0026 security issues in AI applications**. The library covers LLM-based applications such as RAG agents, all the way to traditional ML models for tabular data.\n\n## Scan: Automatically assess your LLM-based agents for performance, bias \u0026 security issues ⤵️\n\nIssues detected include:\n- Hallucinations\n- Harmful content generation\n- Prompt injection\n- Robustness issues\n- Sensitive information disclosure\n- Stereotypes \u0026 discrimination\n- many more...\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/giskard-ai/giskard/main/readme/scan_updates.gif\" alt=\"Scan Example\" width=\"800\"\u003e\n\u003c/p\u003e\n\n## RAG Evaluation Toolkit (RAGET): Automatically generate evaluation datasets \u0026 evaluate RAG application answers ⤵️\n\nIf you're testing a RAG application, you can get an even more in-depth assessment using **RAGET**, Giskard's RAG Evaluation Toolkit.\n\n- **RAGET** can generate automatically a list of `question`, `reference_answer` and `reference_context` from the knowledge base of the RAG. You can then use this generated test set to evaluate your RAG agent.\n- **RAGET** computes scores *for each component of the RAG agent*. 
The scores are computed by aggregating the correctness of the agent’s answers on different question types.\n\n  - Here is the list of components evaluated with **RAGET**:\n    - `Generator`: the LLM used inside the RAG to generate the answers\n    - `Retriever`: fetch relevant documents from the knowledge base according to a user query\n    - `Rewriter`: rewrite the user query to make it more relevant to the knowledge base or to account for chat history\n    - `Router`: filter the query of the user based on his intentions\n    - `Knowledge Base`: the set of documents given to the RAG to generate the answers\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/giskard-ai/giskard/main/readme/RAGET_updated.gif\" alt=\"Test Suite Example\" width=\"800\"\u003e\n\u003c/p\u003e\n\n\nGiskard works with any model, in any environment and integrates seamlessly with your favorite tools ⤵️ \u003cbr/\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width='600' src=\"https://raw.githubusercontent.com/giskard-ai/giskard/main/readme/tools_updated.png\"\u003e\n\u003c/p\u003e\n\u003cbr/\u003e\n\nLooking for solutions to evaluate computer vision models? Check out [giskard-vision](https://github.com/Giskard-AI/giskard-vision), a library dedicated for computer vision tasks.\n\n# Contents\n\n- 🤸‍♀️ **[Quickstart](#quickstart)**\n    - **1**. 🏗️ [Build a LLM agent](#build-a-llm-agent)\n    - **2**. 🔎 [Scan your model for issues](#scan-your-model-for-issues)\n    - **3**. 🪄 [Automatically generate an evaluation dataset for your RAG applications](#automatically-generate-an-evaluation-dataset-for-your-rag-applications)\n- 👋 **[Community](#community)**\n\n\u003ch1 id=\"quickstart\"\u003e🤸‍♀️ Quickstart\u003c/h1\u003e\n\n\u003ch2 id=\"build-a-llm-agent\"\u003e1. 
🏗️ Build a LLM agent\u003c/h2\u003e\n\nLet's build an agent that answers questions about climate change, based on the 2023 Climate Change Synthesis Report by the IPCC.\n\nBefore starting, let's install the required libraries:\n```sh\npip install langchain langchain-community langchain-openai tiktoken \"pypdf\u003c=3.17.0\"\n```\n\n\n```python\nfrom langchain.prompts import PromptTemplate\nfrom langchain_community.vectorstores import FAISS\nfrom langchain_openai import OpenAIEmbeddings, OpenAI\nfrom langchain_community.document_loaders import PyPDFLoader\nfrom langchain.chains import RetrievalQA\nfrom langchain.text_splitter import RecursiveCharacterTextSplitter\n\n# Prepare vector store (FAISS) with IPCC report\ntext_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100, add_start_index=True)\nloader = PyPDFLoader(\"https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_LongerReport.pdf\")\ndb = FAISS.from_documents(loader.load_and_split(text_splitter), OpenAIEmbeddings())\n\n# Prepare QA chain\nPROMPT_TEMPLATE = \"\"\"You are the Climate Assistant, a helpful AI assistant made by Giskard.\nYour task is to answer common questions on climate change.\nYou will be given a question and relevant excerpts from the IPCC Climate Change Synthesis Report (2023).\nPlease provide short and clear answers based on the provided context. Be polite and helpful.\n\nContext:\n{context}\n\nQuestion:\n{question}\n\nYour answer:\n\"\"\"\n\nllm = OpenAI(model=\"gpt-3.5-turbo-instruct\", temperature=0)\nprompt = PromptTemplate(template=PROMPT_TEMPLATE, input_variables=[\"question\", \"context\"])\nclimate_qa_chain = RetrievalQA.from_llm(llm=llm, retriever=db.as_retriever(), prompt=prompt)\n```\n\n\u003ch2 id=\"scan-your-model-for-issues\"\u003e2. 
🔎 Scan your model for issues\u003c/h2\u003e\n\nNext, wrap your agent to prepare it for Giskard's scan:\n\n```python\nimport giskard\nimport pandas as pd\n\ndef model_predict(df: pd.DataFrame):\n    \"\"\"Wraps the LLM call in a simple Python function.\n\n    The function takes a pandas.DataFrame containing the input variables needed\n    by your model, and must return a list of the outputs (one for each row).\n    \"\"\"\n    return [climate_qa_chain.run({\"query\": question}) for question in df[\"question\"]]\n\n# Don’t forget to fill the `name` and `description`: they are used by Giskard\n# to generate domain-specific tests.\ngiskard_model = giskard.Model(\n    model=model_predict,\n    model_type=\"text_generation\",\n    name=\"Climate Change Question Answering\",\n    description=\"This model answers any question about climate change based on IPCC reports\",\n    feature_names=[\"question\"],\n)\n```\n\n✨✨✨Then run Giskard's magical scan✨✨✨\n```python\nscan_results = giskard.scan(giskard_model)\n```\nOnce the scan completes, you can display the results directly in your notebook:\n\n```python\ndisplay(scan_results)\n\n# Or save it to a file\nscan_results.to_html(\"scan_results.html\")\n```\n\n*If you're facing issues, check out our [docs](https://docs.giskard.ai/en/stable/open_source/scan/scan_llm/index.html) for more information.*\n\n\u003ch2 id=\"automatically-generate-an-evaluation-dataset-for-your-rag-applications\"\u003e3. 🪄 Automatically generate an evaluation dataset for your RAG applications\u003c/h2\u003e\n\nIf the scan found issues in your model, you can automatically extract an evaluation dataset based on the issues found:\n\n```python\ntest_suite = scan_results.generate_test_suite(\"My first test suite\")\n```\n\nBy default, RAGET automatically generates 6 different question types (these can be selected if needed, see advanced question generation). The total number of questions is divided equally between each question type. 
To make the question generation more relevant and accurate, you can also provide a description of your agent.\n\n```python\n\nfrom giskard.rag import generate_testset, KnowledgeBase\n\n# Load your data and initialize the KnowledgeBase\ndf = pd.read_csv(\"path/to/your/knowledge_base.csv\")\n\nknowledge_base = KnowledgeBase.from_pandas(df, columns=[\"column_1\", \"column_2\"])\n\n# Generate a testset with 10 questions \u0026 answers for each question types (this will take a while)\ntestset = generate_testset(\n    knowledge_base,\n    num_questions=60,\n    language='en',  # optional, we'll auto detect if not provided\n    agent_description=\"A customer support chatbot for company X\", # helps generating better questions\n)\n```\n\nDepending on how many questions you generate, this can take a while. Once you’re done, you can save this generated test set for future use:\n\n```python\n# Save the generated testset\ntestset.save(\"my_testset.jsonl\")\n```\nYou can easily load it back\n\n```python\nfrom giskard.rag import QATestset\n\nloaded_testset = QATestset.load(\"my_testset.jsonl\")\n\n# Convert it to a pandas dataframe\ndf = loaded_testset.to_pandas()\n```\n\nHere’s an example of a generated question:\n\n| question                               | reference_context                                                                                                                                                     | reference_answer                                             | metadata                                               |\n|----------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------|-------------------------------------------------------|\n| For which countries can I track my shipping? | Document 1: We offer free shipping on all orders over $50. 
For orders below $50, we charge a flat rate of $5.99. We offer shipping services to customers residing in all 50 states of the US, in addition to providing delivery options to Canada and Mexico. Document 2: Once your purchase has been successfully confirmed and shipped, you will receive a confirmation email containing your tracking number. You can simply click on the link provided in the email or visit our website’s order tracking page. | We ship to all 50 states in the US, as well as to Canada and Mexico. We offer tracking for all our shippings. | `{\"question_type\": \"simple\", \"seed_document_id\": 1, \"topic\": \"Shipping policy\"}` |\n\nEach row of the test set contains 5 columns:\n\n- `question`: the generated question\n- `reference_context`: the context that can be used to answer the question\n- `reference_answer`: the answer to the question (generated with GPT-4)\n- `conversation_history`: not shown in the table above, contain the history of the conversation with the agent as a list, only relevant for conversational question, otherwise it contains an empty list.\n- `metadata`: a dictionary with various metadata about the question, this includes the question_type, seed_document_id the id of the document used to generate the question and the topic of the question\n\n\u003ch1 id=\"community\"\u003e👋 Community\u003c/h1\u003e\n\nWe welcome contributions from the AI community! Read this [guide](./CONTRIBUTING.md) to get started, and join our thriving community on [Discord](https://gisk.ar/discord).\n\n🌟 [Leave us a star](https://github.com/Giskard-AI/giskard), it helps the project to get discovered by others and keeps us motivated to build awesome open-source tools! 🌟\n\n❤️ If you find our work useful, please consider [sponsoring us](https://github.com/sponsors/Giskard-AI) on GitHub. With a monthly sponsoring, you can get a sponsor badge, display your company in this readme, and get your bug reports prioritized. 
We also offer one-time sponsoring if you want us to get involved in a consulting project, run a workshop, or give a talk at your company.\n\n\u003ch2 id=\"sponsors\"\u003e💚 Current sponsors\u003c/h2\u003e\n\nWe thank the following companies that sponsor our project with monthly donations:\n\n**[Lunary](https://lunary.ai/)**\n\n\u003cimg src=\"https://lunary.ai/logo-blue-bg.svg\" alt=\"Lunary logo\" width=\"100\"/\u003e\n\n**[Biolevate](https://www.biolevate.com/)**\n\n\u003cimg src=\"https://awsmp-logos.s3.amazonaws.com/seller-wgamx5z6umune/2d10badd2ccac49699096ea7fb986b98.png\" alt=\"Biolevate logo\" width=\"400\"/\u003e\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgiskard-ai%2Fgiskard","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgiskard-ai%2Fgiskard","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgiskard-ai%2Fgiskard/lists"}