{"id":13490304,"url":"https://github.com/IngestAI/deepmark","last_synced_at":"2025-03-28T06:30:45.796Z","repository":{"id":207649787,"uuid":"703804017","full_name":"IngestAI/deepmark","owner":"IngestAI","description":"Deepmark AI enables a unique testing environment for language models (LLM) assessment on task-specific metrics and on your own data so your GenAI-powered solution has predictable and reliable performance.","archived":false,"fork":false,"pushed_at":"2023-11-24T09:35:09.000Z","size":2085,"stargazers_count":104,"open_issues_count":0,"forks_count":2,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-10-31T03:35:34.144Z","etag":null,"topics":["assessment-tool","benchmark","benchmarking","benchmarks","extrinsic-parameters","extrinsic-quality-measures","generative-ai","laravel","llm","llms","php","reliability-benchmarking"],"latest_commit_sha":null,"homepage":"https://ingestai.io/deepmark-ai","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IngestAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-10-12T00:37:47.000Z","updated_at":"2024-10-21T11:06:24.000Z","dependencies_parsed_at":"2024-01-12T09:46:06.122Z","dependency_job_id":null,"html_url":"https://github.com/IngestAI/deepmark","commit_stats":{"total_commits":100,"total_committers":8,"mean_commits":12.5,"dds":"0.43000000000000005","last_synced_commit":"cfcd8a36d36f797bd39a0d7856f32e042e84177c"},"previous_names":["ingestai/deepmark"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IngestA
I%2Fdeepmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IngestAI%2Fdeepmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IngestAI%2Fdeepmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IngestAI%2Fdeepmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IngestAI","download_url":"https://codeload.github.com/IngestAI/deepmark/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245984250,"owners_count":20704787,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assessment-tool","benchmark","benchmarking","benchmarks","extrinsic-parameters","extrinsic-quality-measures","generative-ai","laravel","llm","llms","php","reliability-benchmarking"],"created_at":"2024-07-31T19:00:44.513Z","updated_at":"2025-03-28T06:30:44.350Z","avatar_url":"https://github.com/IngestAI.png","language":"PHP","readme":"\u003ch1 align=\"center\" style=\"border-bottom: none\"\u003e\n    \u003cdiv\u003e\n        \u003ca href=\"https://deepmark.ai\"\u003e\n            \u003cimg src=\"https://deepmark.ai/deepmark.jpg\" width=\"80\" /\u003e\n            \u003cbr\u003e\n            DeepMark\n        \u003c/a\u003e\n    \u003c/div\u003e\n\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003eDeepmark AI empowers Generative AI builders to make informed decisions when choosing among Large Language Models (LLM), enabling seamless assessment of various LLM on your own data, so your AI applications have predictable and reliable 
performance.\u003c/p\u003e\n\n# Introduction\n\nArtificial Intelligence (AI) is expected to contribute approximately $15.7 trillion to the global economy by 2030, according to a \u003ca href=\"https://www.pwc.com/gx/en/issues/data-and-analytics/publications/artificial-intelligence-study.html\" target=\"_blank\"\u003erecent study by PwC\u003c/a\u003e. As AI continues to play a crucial role across various domains, Generative AI and Large Language Models (LLM) have emerged as powerful building blocks for creating AI-powered applications capable of generating enormous business value.\n\n# Why are We Doing This? - Problem Statement\n\nAI sparked a revolution in the last decade, and AI Subject Matter Experts at MIT (\u003ca href=\"https://horizon.mit.edu/about-us\" target=\"_blank\"\u003ehttps://horizon.mit.edu/about-us\u003c/a\u003e) believe that Generative AI will further transform several domains such as code development, chatbots, and audio/video, among many others. With the advancement of Generative AI companies such as OpenAI and their products such as ChatGPT, legal, ethical, and trust issues have emerged around GenAI. These challenges call for rigorous assessment of these products, including metrics that can rank the various models driving the technology and guide their improvement. 
This is also a roadblock for adoption of GenAI in several companies today.\n\nAccording to a \u003ca href=\"https://hbr.org/2023/06/managing-the-risks-of-generative-ai\" target=\"_blank\"\u003erecent HBR report\u003c/a\u003e: Generative AI cannot operate on a set-it-and-forget-it basis — the tools need constant oversight.\n\nAlthough assessment metrics are clearly defined and intrinsic metrics are normally assessed almost instantly when an LLM model is released, there are no available tools (open-source or proprietary) that enable developers to seamlessly make task-specific (extrinsic) assessments on their unique data. The closest existing solution is LangChain's LangSmith, which is still in closed beta and is not mature enough to provide the comprehensive extrinsic metrics that are essential for adoption.\n\nIn summary, organizations need to be able to assess LLM models on their own data to deliver verifiable results that balance accuracy, precision, recall (the model’s ability to correctly identify positive cases within a given dataset), and reliability, as models can produce different answers to the same prompts, impeding the user’s ability to assess the accuracy of outputs.\n\n# Our Solution\n\nTo address this challenge of reliability, we (IngestAI Labs) have developed Deepmark AI - a benchmarking tool that enables assessment of large language models (LLM) on various extrinsic (task-specific) metrics on your own data. It has pre-built integrations with leading Generative AI APIs such as GPT-4, Anthropic, GPT-3.5 Turbo, Cohere, AI21, and others. 
\n\n# Current GenAI (LLM) Assessment Metrics\n\nWhen it comes to assessing the performance of LLMs, there are two main types of metrics that can be used: intrinsic and extrinsic.\n\nExamples of intrinsic metrics include (but are not limited to):\n- Entropy,\n- Perplexity,\n- Coherence, etc.\n\nExtrinsic metrics, also called task-specific metrics, may include:\n- Accuracy,\n- Latency,\n- Cost, etc.\n\nThese assessment metrics are not exhaustive, and specific applications may have additional or alternative metrics depending on the context and requirements, but task-specific metrics like latency, accuracy, and cost are among the most commonly used.\n\nDeepmark AI enables a unique testing environment for language models (LLM), allowing GenAI developers to easily diagnose inaccuracies and performance issues in a matter of seconds. By using Deepmark AI, Generative AI application developers can run multiple LLM models over hundreds or thousands of iterations on specific tasks (question answering, sentiment analysis, NER, etc.) and get exact assessment results in seconds.\n\n\u003cimg src=\"https://ingestai.io/storage/files/6/Screenshot%202023-10-17%20at%2000.29.37.png\"\u003e\n\nDeepMark AI is a tool specifically designed for Generative AI builders. This solution focuses on iterative assessment of extrinsic (task-specific) metrics to identify the most predictable, reliable, and cost-effective Generative AI models based on the unique needs of a particular use case. 
Deepmark AI offers capabilities for comprehensive assessment of various important GenAI performance metrics, such as:\n\n- Question answering accuracy\n- Text classification accuracy\n- PII recognition accuracy\n- Named entity recognition (NER) accuracy\n- Summarization quality (Relevance)\n- Sentiment analysis accuracy\n- Cost analysis\n- Failure rate\n- Accuracy\n- Latency\n\nDeepmark AI empowers developers and organizations to make informed decisions when navigating the most important performance metrics of Large Language Models.\n\n**User Adoption:**\n\nSince its launch in February 2023, the IngestAI Labs platform (Playground, AI Aggregator, App Builder) has quickly gained popularity as a community-driven platform for exploration, experimentation, and rapid prototyping of various AI use cases.\n\nThe platform has gained significant industry recognition:\n- StartX AI Series,\n- ProductHunt Product #1 of the Day,\n- Accelerated by the PLUGandPLAY Silicon Valley program, and\n- Backed by the Cohere Acceleration Program.\n\nIn less than one year, IngestAI has amassed an impressive user base of over 40,000 individuals, with nearly 15,000 monthly active users and a few NASDAQ-traded companies among its customers and in the pipeline. This level of traction speaks to the platform's ability to attract and engage users and generate business value.\n\n# Key features of Deepmark AI include\n\n## Reliability Assessment\n\nReliability is a critical factor in determining the effectiveness of Generative AI models. DeepMark.AI offers comprehensive reliability assessments by evaluating model performance under various conditions and capturing potential failure points. This enables developers to identify areas for improvement and enhance the overall reliability of their AI applications.\n\n## Accuracy Evaluation\n\nEnsuring the accuracy of Generative AI models is essential for generating high-quality outputs. 
DeepMark.AI provides developers with tools to rigorously evaluate the accuracy of their models through extensive testing and validation procedures. By leveraging advanced statistical techniques and comparison methodologies, developers can derive meaningful insights into the accuracy of their Generative AI applications.\n\n## Cost Analysis\n\nUnderstanding the cost implications before deploying Generative AI models is vital for optimizing resource allocation and maximizing return on investment. DeepMark.AI incorporates cost analysis, enabling developers to make precise estimations of the financial requirements associated with running their AI applications on different GenAI models. By providing cost projections, DeepMark.AI helps developers make informed decisions to achieve cost-effective solutions.\n\n## Relevance Assessment\n\nEnsuring the relevance of generated outputs is critical, especially in applications where Generative AI is employed to address specific use cases. DeepMark.AI facilitates relevance assessment by providing developers with tools to compare generated outputs against desired criteria. This allows developers to fine-tune their models and ensure the generated content aligns with the intended goals and requirements.\n\n## Latency Assessment\n\nThe assessment of latency in APIs for Generative AI models is critically important for delivering high-quality, efficient AI-powered applications. Latency denotes the time taken to get a response after a request is made and is a key indicator of performance. By evaluating latency, AI developers can identify inefficiencies and ensure that AI applications perform at an optimal speed. This contributes to overall user satisfaction and impacts the reliability and credibility of AI applications.\n\n## Failure Rate Assessment\n\nAssessing and monitoring failure rates across hundreds or thousands of requests is an essential aspect of assessing the robustness of Generative AI applications. 
DeepMark.AI offers failure-rate assessment capabilities, allowing developers to seamlessly track failure rates at various scales, from hundreds to thousands of requests per second. By providing insights into potential failure patterns, DeepMark.AI enables developers to proactively address issues and maintain optimal performance.\n\n# Key Benefits of Deepmark AI\n\nIncorporating the DeepMark.AI technology developed by IngestAI Labs within an AI development workflow can yield numerous advantages, including:\n\n## Predictability and Cost-effectiveness\n\nDeepMark.AI prioritizes predictability and cost-effectiveness by providing developers with reliable assessment metrics, cost estimations, and optimization recommendations. This empowers developers to make informed decisions, reducing the risks associated with designing and deploying Generative AI applications.\n\n## Data-driven Decision-making\n\nBy leveraging data and rigor, DeepMark.AI enables organizations to move away from relying solely on intuition when assessing Generative AI models. This data-driven approach instills confidence in the decision-making process, allowing for greater precision and accuracy in AI application development.\n\n## Enhanced Application Quality\n\nThe ability of DeepMark.AI to comprehensively assess reliability, accuracy, relevance, and cost-efficiency contributes to enhancing the overall quality of AI applications. Through continuous monitoring or periodic assessment, developers can iteratively improve their models’ performance (e.g. 
by improving metaprompts or fine-tuning), ensuring optimal performance and user satisfaction.\n\n# Path Forward\n\nIngestAI is building its own bias detection model based on a proprietary comparative dataset of 7.5+ million varied requests and responses from different large language models, which are labeled and used for training, testing, and refining the identification of bias-related contexts, as well as real-time detection and resolution of biases and unsafe prompts or responses. Deepmark AI is a tool for AI application developers built on top of proprietary ML models, providing reliable assessments of predictability, accuracy, cost-efficiency, and other benchmark metrics. By prioritizing safety, truthfulness, predictability, and cost-effectiveness, while leveraging data and rigor, Deepmark AI empowers developers to build high-quality, reliable Generative AI-powered applications. With its comprehensive features and benefits, Deepmark AI opens up new possibilities for organizations seeking to harness the true potential of Generative AI.\n\n# IngestAI DeepMark Setup via Docker Image\n\nDocker Image: \u003ca href=\"https://hub.docker.com/r/embedditor/deepmark\" target=\"_blank\"\u003ehttps://hub.docker.com/r/embedditor/deepmark\u003c/a\u003e\n\nYou can find detailed instructions on the Docker Hub page.\n\n# IngestAI DeepMark Setup via GitHub\n\n1) Install Laravel\n\n2) Run `php artisan storage:link`\n\n3) Run `php artisan queue:table`\n\n4) Run `php artisan migrate`\n\n5) Set `BEARER_TOKEN` in the `.env` file\n\n6) Use the token from step 5 as the \"X-Bearer-Token\" HTTP header\n\nInstall the frontend\n\n1) You should have Node.js and npm installed on your local machine; see the documentation at https://nodejs.org/\n2) The recommended stable Node.js version is 16.16.0; you can use https://github.com/nvm-sh/nvm to install several Node versions on one machine\n3) Go to the project root directory and run `npm i` in your terminal\n4) If you want to build the project in the dev version, you 
should run `npm run dev`, or `npm run build` for the production version\n5) For the local version, follow the link you will find in the terminal\n","funding_links":[],"categories":["PHP"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIngestAI%2Fdeepmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FIngestAI%2Fdeepmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIngestAI%2Fdeepmark/lists"}