{"id":26927436,"url":"https://github.com/cloudguruab/modsysML","last_synced_at":"2025-04-02T04:02:26.702Z","repository":{"id":153612907,"uuid":"629501722","full_name":"apolloapi/modsysML","owner":"apolloapi","description":"Model management toolkit for continuous model improvement. Evaluate and compare LLM outputs, test quality, catch regressions and automate.","archived":false,"fork":false,"pushed_at":"2024-10-01T13:49:58.000Z","size":303,"stargazers_count":36,"open_issues_count":23,"forks_count":6,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-03T03:34:35.749Z","etag":null,"topics":["ai","automation-framework","data-science","machinelearning","mlops","natural-language-processing","nlp-machine-learning","open-source","prompt-toolkit","prompts","python","security-tools"],"latest_commit_sha":null,"homepage":"https://modsys.vercel.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apolloapi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-18T12:49:35.000Z","updated_at":"2025-01-31T01:56:46.000Z","dependencies_parsed_at":"2024-04-18T16:24:06.523Z","dependency_job_id":"8d5464bb-f8b6-433e-9605-b0d35bbc10b4","html_url":"https://github.com/apolloapi/modsysML","commit_stats":null,"previous_names":["modsysml/modsys","cloudguruab/modsysml","modsysml/modsysml"],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apolloapi%2FmodsysML","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/apolloapi%2FmodsysML/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apolloapi%2FmodsysML/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apolloapi%2FmodsysML/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apolloapi","download_url":"https://codeload.github.com/apolloapi/modsysML/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246752622,"owners_count":20827987,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","automation-framework","data-science","machinelearning","mlops","natural-language-processing","nlp-machine-learning","open-source","prompt-toolkit","prompts","python","security-tools"],"created_at":"2025-04-02T04:02:25.506Z","updated_at":"2025-04-02T04:02:26.693Z","avatar_url":"https://github.com/apolloapi.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# modsys: model management tool\n\n[![python](https://img.shields.io/pypi/pyversions/3)](https://www.python.org/downloads/)\n![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/modsysML/modsys/pre-commit.yml)\n![Release](https://img.shields.io/github/v/release/modsysML/modsys)\n\n\u003cdiv align=\"center\" style=\"max-width: 50%;\"\u003e\n\n![ModsysML_Media_Intro_docs](https://github.com/modsysML/documentation/assets/72639210/106b58fc-4dfe-4077-8a0d-089355612d01)\n\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\nFramework for improving AI 
models\n\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003cbr /\u003e\n    \u003ca href=\"https://modsys.vercel.app/docs/intro\" rel=\"dofollow\"\u003e\u003cstrong\u003eDocs»\u003c/strong\u003e\u003c/a\u003e\n    \u003cbr /\u003e\n\n  \u003cbr/\u003e\n    \u003ca href=\"https://www.apolloapi.io/\"\u003eJoin the waitlist\u003c/a\u003e\n    ·\n    \u003ca href=\"https://github.com/modsysML/modsys/issues\"\u003eReport Bug\u003c/a\u003e\n    ·\n    \u003ca href=\"https://discord.gg/ZUH7f7AzUY\"\u003eCommunity Discord\u003c/a\u003e\n\u003c/p\u003e\n\n\nModsysML is an open source toolbox for automating, building insights, and evaluating model responses across relevant test cases. Prompt tuning is a way to adapt AI models to a task using a small amount of data. Instead of training a foundation model, prompt tuning lets users take a pre-trained model and tailor it to a specific use case. Beyond that, ModsysML can automate evaluations and even suggest model responses, while letting users connect to any database or API resource to curate analytics alongside running evaluations.\n\nModsysML provides a framework for building relevant test cases across four domains: semantic, LLM-based, human-in-the-loop, and programmatic tests. It is a collection of Python plugins wrapped into an SDK that provides the basic functionality, though it may not fit every use case out of the box. You can craft custom test cases, run evaluations on model responses, curate automated scripts, and generate data frames.\n\nModsysML uses only official provider integrations (via their APIs), psycopg, and a few other modules to provide the building blocks. 
ModsysML is the upstream project for the Apollo API interface, which provides a UI for your development needs.\n\n## 🚀 Interesting, how can I try it?\n\nLet's install the SDK first:\n\n```bash\npip install modsys\n```\n\n## Regression tests vs. automated pipelines\n\n`modsys` helps you tune LLM prompts systematically across many relevant test cases by evaluating and comparing LLM outputs and building decision-making workflows, so you can test prompt quality and catch regressions faster.\n\n### Evaluating prompt quality\n\n**With the Modsys Python library and CLI toolkit, you can:**\n\n- **Detect real-time changes** in your data\n- Automate tasks against **image, video, audio, or text**\n- Simplify **back-testing quality** for your AI models\n- Make your integration robust, so you **never again have to worry about stuck/stale data or false positives**\n- **Test multiple prompts** against predefined test cases\n- **Evaluate quality and catch regressions** by comparing LLM outputs side-by-side\n- **Speed up evaluations** with caching and concurrent tests\n- **Flag bad outputs automatically** by setting \"expectations\"\n- Use it as a **command line tool, or integrate it into your workflow with our library**\n- **Use any** AI provider, API, or database under one API\n\n`modsys` produces table views that allow you to quickly review prompt outputs across many inputs. 
The goal: tune prompts systematically across all relevant test cases, instead of testing prompts by trial and error.\n\n#### Usage (command line)\n\n_Support for a user interface is coming soon._\n\n**It works on the command line, and you can output results to `json`, `csv`, or `yaml`:**\n\n![Prompt eval](https://github.com/apolloapi/apolloapi/assets/72639210/c65b4565-5d17-4b32-971c-d4a51d9d137e)\n\nTo get started, run the following command:\n\n```\nmodsys init\n```\n\nThis will create some templates in your current directory: `prompts.txt`, `vars.csv`, and `config.json`.\n\nAfter editing the prompts and variables, run the `modsys` command to kick off a prompt evaluation test:\n\n```\nmodsys -p ./prompts.txt -v ./vars.csv -r openai:completion\n```\n\nTo customize your usage, you have a wide set of parameters at your disposal. See the [Configuration docs](https://docs.apolloapi.io/docs/api/configuration_tests) for more detail:\n\n| Option                                           | Description                                                                                                                                            |\n| ------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------ |\n| `-p, --prompts \u003cpaths...\u003e`                       | Paths to prompt files, directory, or glob                                                                                                              |\n| `-r, --providers \u003cname or path...\u003e`              | One of: openai:chat, openai:completion, openai:model-name, hive:hate, google:safety, etc. 
See [AI Providers](https://docs.apolloapi.io/docs/providers) |\n| `-o, --output \u003cpath\u003e`                            | Path to output file (csv, json, yaml, html)                                                                                                            |\n| `-v, --vars \u003cpath\u003e`                              | Path to file with prompt variables (csv, json, yaml)                                                                                                   |\n| `-c, --config \u003cpath\u003e`                            | Path to configuration file. `config.json` is automatically loaded if present                                                                           |\n| `-j, --max-concurrency \u003cnumber\u003e` _coming soon_   | Maximum number of concurrent API calls                                                                                                                 |\n| `--table-cell-max-length \u003cnumber\u003e` _coming soon_ | Truncate console table cells to this length                                                                                                            |\n| `--grader` _coming soon_                         | Provider that will grade outputs, if you are using                                                                                                     |\n\n### Examples\n\n#### Prompt quality\n\nIn this example, we evaluate whether adding adjectives to the personality of a chatbot affects the responses:\n\n```bash\nmodsys -p prompts.txt -v vars.csv -r openai:gpt-3.5-turbo\n```\n\n![Prompt eval](https://uploads-ssl.webflow.com/640ca38ad086fde245b76c9d/647411b456031b5145019909_Screenshot%202023-05-28%20at%2010.44.48%20PM.png)\n\nThis command will evaluate the prompts in `prompts.txt`, substituting the variable values from `vars.csv`, and output results in your terminal.\n\nTo save the full output in another format, add `-o` with an output path:\n\n```\nmodsys -p prompts.txt -v vars.csv -r 
openai:gpt-3.5-turbo -o output.json\n```\n\nThe output file can be a **spreadsheet (CSV), JSON, or YAML** file. For example, JSON output looks like this:\n\n```json\n{\n  \"results\": [\n    {\n      \"prompt\": {\n        \"raw\": \"Rephrase this in French: Hello world\",\n        \"display\": \"Rephrase this in French: {{body}}\"\n      },\n      \"vars\": {\n        \"body\": \"Hello world\"\n      },\n      \"response\": {\n        \"output\": \"Bonjour le monde\",\n        \"tokenUsage\": {\n          \"total\": 19,\n          \"prompt\": 16,\n          \"completion\": 3\n        }\n      }\n    }\n    // ...\n  ],\n  \"stats\": {\n    \"successes\": 4,\n    \"failures\": 0,\n    \"tokenUsage\": {\n      \"total\": 120,\n      \"prompt\": 72,\n      \"completion\": 48\n    }\n  }\n}\n```\n\n#### Model quality\n\nYou can compare safety outputs from different providers for a specific context:\n\n_Model quality tests and the Python package for model testing are beta features at the moment; open an issue and tag us to set them up._\n\n```bash\nmodsys -p prompts.txt -r hiveai:hate google:safety -o output.json\n```\n\n#### Configuration\n\n- **Setting up a model test**: learn how to set up prompt files, vars files, output, and more in the [Configuration docs](https://docs.apolloapi.io/docs/api/configuration_tests).\n\n### Building Automated Pipelines in the User Interface or Programmatically\n\n![image](https://github.com/apolloapi/apolloapi/assets/72639210/602234c2-f855-4514-8188-505c0d6c39c1)\n\nLet's set up your first integration!\n\nIt pulls data from your local database and keeps it in sync.\n\n```python\n# import the package\nfrom modsys.client import Modsys\n\n# sync data from your database instance\n# (currently Supabase, or PostgreSQL via a connection URI)\nModsys.connect(\"postgres://username:password@hostname:port/database_name\")\n\n# Optionally, test operations against your external connection\nModsys.fetch_tables()\nModsys.query(\"desc\", \"table\", \"column\")\n```\n\n...and create a 
workflow with a simple command:\n\n```python\n# import the package\nfrom modsys.client import Modsys\n\n# Use any provider\nModsys.use(\"google_perspective:\u003cmodel name\u003e\", google_perspective_api_key=\"YOUR_API_TOKEN_HERE\")\n\n# For image detection, first connect to the Sightengine provider or another image service\nModsys.detectImage('https://example.com/some-endpoint') # Image Analysis/OCR\n\n# Check whether a phrase contains threats\nModsys.detectText(prompt=\"Phrase1\", content_id=\"content-id\", community_id=\"user-id\")\n```\n\n**Example response**:\n\n```json\n{\n  \"attributeScores\": {\n    \"THREAT\": {\n      \"spanScores\": [\n        {\n          \"begin\": 0,\n          \"end\": 12,\n          \"score\": { \"value\": 0.008090926, \"type\": \"PROBABILITY\" }\n        }\n      ],\n      \"summaryScore\": { \"value\": 0.008090926, \"type\": \"PROBABILITY\" }\n    },\n    \"INSULT\": {\n      \"spanScores\": [\n        {\n          \"begin\": 0,\n          \"end\": 12,\n          \"score\": { \"value\": 0.008804884, \"type\": \"PROBABILITY\" }\n        }\n      ],\n      \"summaryScore\": { \"value\": 0.008804884, \"type\": \"PROBABILITY\" }\n    },\n    \"SPAM\" // ...\n  },\n  \"languages\": [\"en\"],\n  \"clientToken\": \"content_123\",\n  \"detectedLanguages\": [\"en\", \"fil\"]\n}\n```\n\n_Experimental inputs_:\n\n```python\n# Create a custom rule, which creates a task\nModsys.rule('Phrase1', '\u003e=', '0.8')\n\nModsys.detectImage('Image1', 'contains', 'VERY_LIKELY') # Image Analysis/OCR\nModsys.detectSpeech('Audio1', 'contains', 'UNLIKELY') # Audio Processing\nModsys.detectVideo('Video1', 'contains', 'POSSIBLE') # Video Analysis\nModsys.detectText('Phrase1', 'contains', 'UNKNOWN') # Text Analysis\nModsys.test('prompt', 'expected_output') # ML Validation\n```\n\nThat's all it takes!\n\nIn practice, you probably want to use one of our native SDKs to interact with different AI providers or use our custom browser client so you don't 
have to write code. If so, sign up for the downstream [Apollo ModsysML Console](https://use.apolloapi.io/admin/)!\n\n##### Cool, what can I build with it?\n\n- Modsys can help you **quickly automate tasks** for model management, performance, labeling, object detection, and more.\n- Teams can use Modsys to **build native in-app connections** related to active response, content moderation, risk management, fraud detection, etc.\n- Some people **automate their personal lives** with Modsys, for example by integrating with Discord communities\n\n## Development\n\nContributions are welcome! Please feel free to submit a pull request or open an issue.\n\n### 📦 pre-commit config\n\nAs an open source project, Modsys welcomes contributions from the community at large. This isn’t an exhaustive reference and is a living document subject to change as needed when the project formalizes any practice or pattern.\n\nClone the repo and set up Modsys locally:\n\n```bash\ngit clone https://github.com/modsysML/modsys.git\ncd modsys \u0026\u0026 python3 -m venv env \u0026\u0026 source env/bin/activate \u0026\u0026 pip install -r requirements.txt\n```\n\n- After installing the system dependencies, install pre-commit for lint checks:\n\n```bash\npip install pre-commit\n\npre-commit install\n\npre-commit run --all-files\n```\n\nModsys uses commit messages to automatically generate the project changelog. 
For every pull request, we ask contributors to follow this commit message notation:\n\n```\n\u003ctype\u003e: \u003csummary\u003e\n\n\u003cbody\u003e\n```\n\nAccepted `\u003ctype\u003e` values:\n\n- new = newly implemented user-facing features\n- chg = changes in existing user-facing features\n- fix = user-facing bugfixes\n- oth = other changes which users should know about\n- dev = any developer-facing changes, regardless of new/chg/fix status\n\n##### Summary (the first line)\n\nThe first line should be no longer than 75 characters, the second line must be blank, and all other lines should be wrapped at 80 characters.\n\n## 🔍 Neat, I would like to learn more\n\n⭐ Follow our development by starring us here on GitHub ⭐\n\n- Share feedback or chat with us on the [Discord community](https://discord.gg/ZUH7f7AzUY)\n\n## AI Providers\n\nWe support OpenAI as well as a number of other providers and models. It's also possible to set up your own custom AI provider.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudguruab%2FmodsysML","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcloudguruab%2FmodsysML","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudguruab%2FmodsysML/lists"}