{"id":13676570,"url":"https://github.com/patil-suraj/onnx_transformers","last_synced_at":"2025-04-15T01:04:40.619Z","repository":{"id":53502066,"uuid":"289531256","full_name":"patil-suraj/onnx_transformers","owner":"patil-suraj","description":"Accelerated NLP pipelines for fast inference on CPU. Built with Transformers and ONNX runtime.","archived":false,"fork":false,"pushed_at":"2020-12-05T23:16:40.000Z","size":486,"stargazers_count":126,"open_issues_count":7,"forks_count":27,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-15T01:04:23.334Z","etag":null,"topics":["inference","nlp","onnx","onnxruntime","transformers"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/patil-suraj.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-08-22T17:05:09.000Z","updated_at":"2024-09-19T23:52:53.000Z","dependencies_parsed_at":"2022-09-19T21:09:52.359Z","dependency_job_id":null,"html_url":"https://github.com/patil-suraj/onnx_transformers","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patil-suraj%2Fonnx_transformers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patil-suraj%2Fonnx_transformers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patil-suraj%2Fonnx_transformers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patil-suraj%2Fonnx_transformers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/owners/patil-suraj","download_url":"https://codeload.github.com/patil-suraj/onnx_transformers/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248986313,"owners_count":21194025,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["inference","nlp","onnx","onnxruntime","transformers"],"created_at":"2024-08-02T13:00:29.812Z","updated_at":"2025-04-15T01:04:40.590Z","avatar_url":"https://github.com/patil-suraj.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"# onnx_transformers\n\n![onnx_transformers](https://github.com/patil-suraj/onnx_transformers/blob/master/data/social_preview.jpeg?raw=True)\n\nAccelerated NLP pipelines for fast inference 🚀 on CPU. 
Built with 🤗Transformers and ONNX runtime.\n\n## Installation:\n\n```bash\npip install git+https://github.com/patil-suraj/onnx_transformers\n```\n\n## Usage:\n\n\u003e *NOTE*: This is an experimental project and has only been tested with PyTorch.\n\nThe pipeline API is similar to the transformers [pipeline](https://huggingface.co/transformers/main_classes/pipelines.html), with a few differences that are explained below.\n\nJust provide the path or URL of the model. If needed, the model is downloaded from the [hub](https://huggingface.co/models); the ONNX graph is then created automatically and used for inference.\n\n```python\nfrom onnx_transformers import pipeline\n\n# Initialize a pipeline by passing the task name and \n# set onnx to True (default value is also True)\n\u003e\u003e\u003e nlp = pipeline(\"sentiment-analysis\", onnx=True)\n\u003e\u003e\u003e nlp(\"Transformers and onnx runtime is an awesome combo!\")\n[{'label': 'POSITIVE', 'score': 0.999721109867096}]  \n```\n\nOr provide a different model using the `model` argument.\n\n```python\nfrom onnx_transformers import pipeline\n\n\u003e\u003e\u003e nlp = pipeline(\"question-answering\", model=\"deepset/roberta-base-squad2\", onnx=True)\n\u003e\u003e\u003e nlp({\n  \"question\": \"What is ONNX Runtime ?\", \n  \"context\": \"ONNX Runtime is a highly performant single inference engine for multiple platforms and hardware\"\n})\n{'answer': 'highly performant single inference engine for multiple platforms and hardware', 'end': 94, 'score': 0.751201868057251, 'start': 18}\n```\n\nSet `onnx` to `False` for standard PyTorch inference.\n\nYou can create `Pipeline` objects for the following downstream tasks:\n\n - `feature-extraction`: Generates a tensor representation of the input sequence.\n - `ner`: Generates a named-entity mapping for each word in the input sequence.\n - `sentiment-analysis`: Gives the polarity (positive / negative) of the whole input sequence. 
Can be used for any text classification model.\n - `question-answering`: Given a context and a question about that context, extracts the answer from the context.\n - `zero-shot-classification`:\n  \n\nCalling the pipeline for the first time loads the model, creates the ONNX graph, and caches it for future use, so the first load takes some time. Subsequent calls with the same model load the ONNX graph from the cache automatically.\n\nThe key difference between the HF pipeline and onnx_transformers is that the `model` parameter must always be a string (path or URL to the saved model). Also, the `zero-shot-classification` pipeline here uses `roberta-large-mnli` as the default model instead of `facebook/bart-large-mnli`, because BART is not yet tested with ONNX Runtime.\n\n\n## Benchmarks\n\n\u003e Note: For some reason, ONNX is slow in Colab notebooks, so you won't notice any speed-up there. Benchmark it on your own hardware.\n\nFor detailed benchmarks and other information, refer to this blog post and notebook.\n- [Accelerate your NLP pipelines using Hugging Face Transformers and ONNX Runtime](https://medium.com/microsoftazure/accelerate-your-nlp-pipelines-using-hugging-face-transformers-and-onnx-runtime-2443578f4333)\n- [Exporting 🤗 transformers model to ONNX](https://github.com/huggingface/transformers/blob/master/notebooks/04-onnx-export.ipynb)\n\nTo benchmark the pipelines in this repo, see the [benchmark_pipelines](https://github.com/patil-suraj/onnx_transformers/blob/master/notebooks/benchmark_pipelines.ipynb) notebook. 
\n\u003e(Note: These are not yet comprehensive benchmarks.)\n\n**Benchmark `feature-extraction` pipeline** \n\n![](https://github.com/patil-suraj/onnx_transformers/blob/master/data/feature_extraction_pipeline_benchmark.png?raw=True)\n\n\n**Benchmark `question-answering` pipeline**\n\n![](https://github.com/patil-suraj/onnx_transformers/blob/master/data/qa_pipeline_benchmark.png?raw=True)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpatil-suraj%2Fonnx_transformers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpatil-suraj%2Fonnx_transformers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpatil-suraj%2Fonnx_transformers/lists"}