{"id":19185827,"url":"https://github.com/phospho-app/fastassert","last_synced_at":"2025-07-30T23:02:48.084Z","repository":{"id":222241433,"uuid":"756670424","full_name":"phospho-app/fastassert","owner":"phospho-app","description":"Dockerized LLM inference server with constrained output (JSON mode), built on top of vLLM and outlines. Faster, cheaper and without rate limits. Compare the quality and latency to your current LLM API provider.","archived":false,"fork":false,"pushed_at":"2024-02-17T02:50:15.000Z","size":180,"stargazers_count":27,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-08T01:17:03.258Z","etag":null,"topics":["docker","llm","llm-inference","outlines","vllm"],"latest_commit_sha":null,"homepage":"https://phospho.ai","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/phospho-app.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-13T04:23:28.000Z","updated_at":"2024-11-15T10:28:47.000Z","dependencies_parsed_at":null,"dependency_job_id":"876fe855-d7a2-4383-a8c1-a6707c99bec4","html_url":"https://github.com/phospho-app/fastassert","commit_stats":null,"previous_names":["phospho-app/fastassert"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/phospho-app/fastassert","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phospho-app%2Ffastassert","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phospho-app%2Ffastassert/tags","releases_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/phospho-app%2Ffastassert/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phospho-app%2Ffastassert/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/phospho-app","download_url":"https://codeload.github.com/phospho-app/fastassert/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phospho-app%2Ffastassert/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267959714,"owners_count":24172469,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-30T02:00:09.044Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","llm","llm-inference","outlines","vllm"],"created_at":"2024-11-09T11:12:03.240Z","updated_at":"2025-07-30T23:02:48.036Z","avatar_url":"https://github.com/phospho-app.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FastAssert \n\n\u003e **TL;DR :** FastAssert is a dockerized LLM inference server with constrained output (JSON models), built on top of vLLM and outlines. It’s faster, cheaper and doesn’t impact your LLM provider API rate limit. 
Get a script to compare quality and latency against your current LLM API provider.\n\n## What is FastAssert?\n\nFastAssert is a dockerized LLM inference server with constrained output (JSON models). It's built on top of vLLM, the leading LLM inference server.\n\nThis means, with the same level of accuracy :\n\n- Same quality as GPT 3.5 but 3 times faster\n- No validation errors thanks to **outlines** (guaranteed JSON or regex output)\n- Reduced cost (especially compared to Function Calling)\n- Lower chance of hitting the API rate limit of your LLM provider\n\n![Completion Time Comparison](images/completion_times.png \"Completion Time Comparison\")\n\nAccuracy on the task:\n```\nGPT3-5 turbo accuracy 0.7857\nGPT4 accuracy 0.9285\nFastAssert accuracy 0.7857\n```\n\nSee the notebook `performance.ipynb` for a full analysis.\n\n## How does it work?\n\nFastAssert provides you with :\n\n- a server to run locally, using vLLM, fastapi and outlines : it takes a text prompt and a JSON format or regex as the request, and returns the desired generation\n- a notebook and script to assess the performance of the server in terms of accuracy and latency compared to your current implementation, and to find the optimal model to use\n\n## Common use cases\n\n- Complex chaining : to enforce a given output JSON format\n- Tool use : generate the tool parameters using the JSON output\n- Intent Classification : check if the user input is a question, a command, a greeting, etc.\n- Sentiment Analysis : detect angry or negative user input\n- Detect forbidden content : NSFW, hate speech, violence, etc.\n- Detect specific content : competitor mention, personal data, etc.\n- And more!\n\n## Requirements\n\n- OS: Linux\n- CUDA 12.1\n- Min. 
GPU RAM for inference : 16 GB (we used an NVIDIA A100 40GB)\n\n## Installation\n\n### Server\n\nBuild the container:\n```shell\nsudo docker build -t fastassert .\n```\n\nRun the container:\n```shell\nsudo docker run --gpus all -p 8000:8000 fastassert\n```\n\nYour server is now running on port 8000.\n\nYou can call it through an OpenAI-compatible API with a prompt and a JSON schema:\n```\ncurl http://127.0.0.1:8000/generate \\\n    -d '{\n        \"prompt\": \"What is the capital of France?\",\n        \"schema\": {\"type\": \"string\", \"maxLength\": 5}\n        }'\n```\nor a regex:\n```\ncurl http://127.0.0.1:8000/generate \\\n    -d '{\n        \"prompt\": \"What is Pi? Give me the first 15 digits: \",\n        \"regex\": \"(-)?(0|[1-9][0-9]*)(\\\\.[0-9]+)?([eE][+-][0-9]+)?\"\n        }'\n```\n\n### Notebook\n\nInstall the required dependencies to run a Jupyter Notebook:\n```\nconda create --name notebookenv python=3.11\nconda activate notebookenv\nconda install ipykernel\npython -m ipykernel install --user --name=notebookenv --display-name=\"Python (notebookenv)\"\nconda install notebook\n```\n\n(Optional) Add a password to secure your notebook:\n```\njupyter notebook password\n```\n\nIf you want to compare the performance of the FastAssert server to your current OpenAI implementation, don't forget to export your API key as an environment variable before starting the Jupyter notebook:\n```\nexport OPENAI_API_KEY=\"\"\n```\n\nLaunch the notebook in remote access mode on the remote machine:\n```\njupyter notebook --no-browser --port=8888 --ip=0.0.0.0\n```\n\nOn your local machine:\n```\nssh -L localhost:8888:localhost:8888 remote_user@remote_host\n```\n\nOpen the notebook in your local web browser and enjoy your notebook:\n```\nhttp://localhost:8888\n```\n\nMake sure you use the right kernel, `Python (notebookenv)` in our case.\n\n## Experimental results \n\nFor a JSON constrained output completion task:\n```\nOpenAI GPT-3.5 Mean: 0.92350 s, Standard Deviation: 
0.33619\nOpenAI GPT-4 Mean: 1.44287 s, Standard Deviation: 0.375827\nFastAssert Mean: 0.30335 s, Standard Deviation: 0.0055845\n```\n\n## Wanna try it through a hosted API?\n\nFill in this [quick form](https://docs.google.com/forms/d/e/1FAIpQLSc8TaSb90r4CMFBbbpnF-6CWSIhvQlfvAY62eeu-GV6X2eA8Q/viewform?usp=sf_link) to get access to the hosted API.\n\n## What's next\n\nTry it out on your use case to see if you can keep the same accuracy while improving cost and latency.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphospho-app%2Ffastassert","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphospho-app%2Ffastassert","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphospho-app%2Ffastassert/lists"}