{"id":14961312,"url":"https://github.com/ofa-sys/ofasys","last_synced_at":"2025-10-24T20:31:39.478Z","repository":{"id":64478227,"uuid":"575711048","full_name":"OFA-Sys/OFASys","owner":"OFA-Sys","description":"OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models","archived":false,"fork":false,"pushed_at":"2023-01-07T09:02:22.000Z","size":21334,"stargazers_count":146,"open_issues_count":9,"forks_count":12,"subscribers_count":13,"default_branch":"main","last_synced_at":"2025-01-31T04:23:47.031Z","etag":null,"topics":["audio","computer-vision","deep-learning","motion","multimodal-learning","multitask-learning","nlp","pretrained-models","pytorch","transformers","vision-and-language"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OFA-Sys.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-12-08T05:55:45.000Z","updated_at":"2025-01-21T01:42:50.000Z","dependencies_parsed_at":"2023-02-06T18:01:50.573Z","dependency_job_id":null,"html_url":"https://github.com/OFA-Sys/OFASys","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OFA-Sys%2FOFASys","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OFA-Sys%2FOFASys/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OFA-Sys%2FOFASys/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OFA-Sys%2FOFASys/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OFA-Sys","download_url":"https://codeload.github.com/OFA-Sys/OFASys/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238035385,"owners_count":19405682,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","computer-vision","deep-learning","motion","multimodal-learning","multitask-learning","nlp","pretrained-models","pytorch","transformers","vision-and-language"],"created_at":"2024-09-24T13:24:45.820Z","updated_at":"2025-10-24T20:31:35.638Z","avatar_url":"https://github.com/OFA-Sys.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003cbr\u003e\n    \u003cimg src=\"images/ofasys_logo.svg\" width=\"250\" /\u003e\n  \u003cbr\u003e\n  \n   \u003cbr\u003e\n  \n  \u003ca href='https://ofasys-doc.readthedocs.io/en/latest/?badge=latest'\u003e\n    \u003cimg src='https://readthedocs.org/projects/ofasys-doc/badge/?version=latest' alt='Documentation Status' /\u003e\n\u003c/a\u003e\n \u003ca href=\"https://github.com/OFA-Sys/OFASys/blob/main/LICENSE\"\u003e\u003cimg alt=\"License\" src=\"https://img.shields.io/badge/license-Apache--2.0-blue\"/\u003e\u003c/a\u003e\n\n\u003cp 
align=\"center\"\u003e\n         \u0026nbsp\u003ca href=\"https://ofasys-doc.readthedocs.io/en/latest/start/whatis.html\"\u003eDocumentation \u003c/a\u003e\u0026nbsp| \u0026nbsp\u003ca href=\"https://arxiv.org/abs/2212.04408\"\u003ePaper\u003c/a\u003e\u0026nbsp｜\u0026nbsp Blog \u0026nbsp |\u0026nbsp ModelScope \u0026nbsp \n\u003c/p\u003e\n\n\u003c/p\u003e\n\n# What is OFASys?\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/task7.gif\" width = \"700\" alt=\"\" align=center /\u003e\n\u003c/p\u003e\nOFASys is a multi-modal multi-task learning system designed to make multi-modal tasks declarative, modular and task-scalable. With OFASys, it is easy to:\n\n- Rapidly introduce new multi-modal tasks/datasets by defining a declarative one-line instruction.\n- Develop new or reuse existing modality-specific components.\n- Jointly train multiple multi-modal tasks together without manual processing of multi-modal data collating.\n\nFor now, OFASys supports 7 modalities and more than 20 classes of multi-modal tasks, including:\n* Text: for tasks like Natural language Understanding, Text Summarization and Text Infilling.\n* Image: for tasks like Image Classification, Visual Entailment, Image Captioning, Visual Question Answering, Text-to-Image Generation and Image Infilling.\n* Box: for tasks like Visual Grounding, Grounded Caption, Object Detection\n* Video: for tasks like Video Classification, Video Captioning and Video Question Answering.\n* Audio: for tasks like Automatic Speech Recognition, and Text to Speech.\n* Structural Language: for tasks like Text-to-SQL, Table-to-Text, Table question answering, and Sudoku.\n* Motion: for tasks like Text-to-Motion.\n\n# News\n* 2022.12.23 v0.1.0-patch1:\n  - Refactored and released diffusion-based `Text-to-Motion` task (v0.1), see [doc](https://ofasys-doc.readthedocs.io/en/latest/task/motion.html) for usage.\n  - Refactored TextPreprocess: BOS and EOS no longer required when writing an instruction.\n  - Added DatabasePreprocess for the `Text-to-SQL` task.\n\n# Requirements\n\n- PyTorch version \u003e= 1.8.0\n- Python version \u003e= 3.7\n- Torchaudio \u003e= 0.8.0\n\n# Installation\n\n## Install with pip\n\nThrough the pip installation, users can experience the basic multi-task training and inference functions of OFASys.\n\n```\npip install http://ofasys.oss-cn-zhangjiakou.aliyuncs.com/pkg/ofasys-0.1.0-py3-none-any.whl\n```\n\nTest your installation.\n\n```\npython -c \"import ofasys\"\n```\n\nUsing the audio feature in OFASys requires the [soundfile](https://github.com/bastibe/python-soundfile#installation) library to be installed.\nIn the Ubuntu OS, run the following command:\n\n```\nsudo apt-get update\nsudo apt-get install libsndfile1\n```\n\n## Install with Source (Optional)\n\nUsers can install OFASys from the source code to customize their training tasks and full functions.\n\n```\ngit clone https://github.com/OFA-Sys/OFASys.git\ncd OFASys\npython setup.py develop\n```\n\n# Getting Started\n\nThe [documents](https://ofasys-doc.readthedocs.io/en/latest/start/quickstart.html) contains more instructions for getting started.\n\n## Training One Model for All Tasks\n\n### Define the Tasks\n\nOFASys can co-train multiple multi-modal tasks flexibly.\n\n```python\nfrom ofasys import Task, Trainer, GeneralistModel\ntask1 = Task(\n     name='caption',\n     instruction='[IMAGE:image_url] what does the image describe? 
# Installation

## Install with pip

With the pip installation, users can try the basic multi-task training and inference functions of OFASys.

```
pip install http://ofasys.oss-cn-zhangjiakou.aliyuncs.com/pkg/ofasys-0.1.0-py3-none-any.whl
```

Test your installation:

```
python -c "import ofasys"
```

Using the audio features in OFASys requires the [soundfile](https://github.com/bastibe/python-soundfile#installation) library to be installed. On Ubuntu, run:

```
sudo apt-get update
sudo apt-get install libsndfile1
```

## Install with Source (Optional)

Users can install OFASys from the source code to customize their training tasks and use the full functionality:

```
git clone https://github.com/OFA-Sys/OFASys.git
cd OFASys
python setup.py develop
```

# Getting Started

The [documentation](https://ofasys-doc.readthedocs.io/en/latest/start/quickstart.html) contains more detailed instructions for getting started.

## Training One Model for All Tasks

### Define the Tasks

OFASys can co-train multiple multi-modal tasks flexibly.

```python
from ofasys import Task, Trainer, GeneralistModel

task1 = Task(
    name='caption',
    instruction='[IMAGE:image_url] what does the image describe? -> [TEXT:caption]',
    micro_batch_size=4,
)
task2 = Task(
    name='text_infilling',
    instruction='what is the complete text of " [TEXT:sentence,mask_ratio=0.3] "? -> [TEXT:sentence]',
    micro_batch_size=2,
)
```

In the simplest scenario, you only need to specify an instruction to define your task, plus a task name as an identifier.

### Set the Dataset

A Task can use a regular PyTorch dataloader, which can be constructed from a Hugging Face dataset or a customized PyTorch dataset (see the sketch below).

```python
from datasets import load_dataset

task1.add_dataset(load_dataset('TheFusion21/PokemonCards')['train'], 'train')
task2.add_dataset(load_dataset('glue', 'cola')['train'], 'train')
```
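For the customized PyTorch dataset route, a minimal sketch continuing the example above could look like the following. The `SentenceDataset` class is hypothetical, and it assumes (as with the Hugging Face datasets above) that each item is a dict whose keys match the slot names used in the task instruction:

```python
import torch.utils.data


class SentenceDataset(torch.utils.data.Dataset):
    """Hypothetical in-memory dataset; each item is a dict whose keys
    match the slot names in the task instruction ('sentence')."""

    def __init__(self, sentences):
        self.sentences = sentences

    def __len__(self):
        return len(self.sentences)

    def __getitem__(self, idx):
        return {'sentence': self.sentences[idx]}


task2.add_dataset(SentenceDataset(['a minimal example sentence.']), 'train')
```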
-\u003e [TEXT:tgt]'\ndata = {\n     'database': [['Atlanta', 'OFFICIAL_POPULATION', '5,457,831'],\n                  ['[TABLECONTEXT]', 'METROPOLITAN_AREA', 'Atlanta'],\n                  ['5,457,831', 'YEAR', '2012'],\n                  ['[TABLECONTEXT]', '[TITLE]', 'List of metropolitan areas by population'],\n                  ['Atlanta', 'COUNTRY', 'United States'],\n     ]\n }\noutput = model.inference(instruction, data=data, beam_size=1)\nprint(output.text)\n# \"atlanta, united states has a population of 5,457,831 in 2012.\"\n```\n\n### Text-to-SQL Generation\n\n```python\ninstruction = ' \" [TEXT:src] \" ; structured knowledge: \" [STRUCT:database,max_length=876] \" . generating sql code. -\u003e [TEXT:tgt]'\ndatabase = [\n             ['concert_singer'],\n             ['stadium', 'stadium_id , location , name , capacity , highest , lowest , average'],\n             ['singer', 'singer_id , name , country , song_name , song_release_year , age , is_male'],\n             ['concert', 'concert_id , concert_name , theme , stadium_id , year'],\n             ['singer_in_concert', 'concert_id , singer_id']\n ]\ndata = [\n     {'src': 'What are the names, countries, and ages for every singer in descending order of age?', 'database': database},\n     {'src': 'What are all distinct countries where singers above age 20 are from?', 'database': database},\n     {'src': 'Show the name and the release year of the song by the youngest singer.', 'database': database}\n ]\noutput = model.inference(instruction, data=data)\nprint('\\n'.join([o.text for o in output]))\n# \"select name, country, age from singer order by age desc\"\n# \"select distinct country from singer where age \u003e 20\"\n# \"select song_name, song_release_year from singer order by age limit 1\"\n``` \n\n### Video Captioning\n  \n\u003cimg src=\"https://ofasys.oss-cn-zhangjiakou.aliyuncs.com/examples/video.png\" width=\"400\"\u003e\n\n```python\ninstruction = '[VIDEO:video] what does the video describe? -\u003e [TEXT:cap]'\ndata = {'video': './video7021.mp4'}\noutput = model.inference(instruction, data=data)\nprint(output.text)\n# \"a baseball player is hitting a ball\"\n```\n\n### Speech-to-Text Generation\n\n\u003caudio controls=\"controls\"\u003e\n  \u003csource src=\"http://ofasys.oss-cn-zhangjiakou.aliyuncs.com/data/librispeech/dev-clean/1272/128104/1272-128104-0001.flac\" type=\"audio/wav\"\u003e\n  Your browser does not support the \u003ccode\u003eaudio\u003c/code\u003e element.\n\u003c/audio\u003e\n\n```python    \ninstruction = '[AUDIO:wav] what is the text corresponding to the voice? -\u003e [TEXT:text,preprocess=text_phone]'\ndata = {'wav': './1272-128104-0001.flac'}\noutput = model.inference(instruction, data=data)\nprint(output.text)\n# \"nor is mister klohs manner less interesting than his manner\"\n```\n\n### Text-to-Image Generation\n\n```python   \ninstruction = 'what is the complete image? caption: [TEXT:text]\"? 
### Image Captioning

<img src="https://ofasys.oss-cn-zhangjiakou.aliyuncs.com/data/coco/2014/val2014/COCO_val2014_000000222628.jpg" width="400">

```python
instruction = '[IMAGE:img] what does the image describe? -> [TEXT:cap]'
data = {'img': "./COCO_val2014_000000222628.jpg"}
output = model.inference(instruction, data=data)
print(output.text)
# "a man and woman sitting in front of a laptop computer"
```

### Visual Grounding

<img src="https://www.2008php.com/2014_Website_appreciate/2015-06-22/20150622131649.jpg" width="400">

```python
instruction = '[IMAGE:img] which region does the text " [TEXT:cap] " describe? -> [BOX:patch_boxes]'
data = {'img': "https://www.2008php.com/2014_Website_appreciate/2015-06-22/20150622131649.jpg", "cap": "hand"}
output = model.inference(instruction, data=data)
output.save_box("output.jpg")
```

<img src="http://ofasys.oss-cn-zhangjiakou.aliyuncs.com/examples/inference_caption_0.jpg" width="400">

### Text Summarization

```python
instruction = 'what is the summary of article " [TEXT:src] "? -> [TEXT:tgt]'
data = {'src': "poland 's main opposition party tuesday endorsed president lech walesa in an upcoming "
               "presidential run-off election after a reformed communist won the first round of voting ."}
output = model.inference(instruction, data=data)
print(output.text)
# "polish opposition endorses walesa in presidential run-off"
```

### Table-to-Text Generation

```python
instruction = 'structured knowledge: " [STRUCT:database,uncased] "  . how to describe the tripleset ? -> [TEXT:tgt]'
data = {
    'database': [
        ['Atlanta', 'OFFICIAL_POPULATION', '5,457,831'],
        ['[TABLECONTEXT]', 'METROPOLITAN_AREA', 'Atlanta'],
        ['5,457,831', 'YEAR', '2012'],
        ['[TABLECONTEXT]', '[TITLE]', 'List of metropolitan areas by population'],
        ['Atlanta', 'COUNTRY', 'United States'],
    ]
}
output = model.inference(instruction, data=data, beam_size=1)
print(output.text)
# "atlanta, united states has a population of 5,457,831 in 2012."
```

### Text-to-SQL Generation

```python
instruction = ' " [TEXT:src] " ; structured knowledge: " [STRUCT:database,max_length=876] " . generating sql code. -> [TEXT:tgt]'
database = [
    ['concert_singer'],
    ['stadium', 'stadium_id , location , name , capacity , highest , lowest , average'],
    ['singer', 'singer_id , name , country , song_name , song_release_year , age , is_male'],
    ['concert', 'concert_id , concert_name , theme , stadium_id , year'],
    ['singer_in_concert', 'concert_id , singer_id'],
]
data = [
    {'src': 'What are the names, countries, and ages for every singer in descending order of age?', 'database': database},
    {'src': 'What are all distinct countries where singers above age 20 are from?', 'database': database},
    {'src': 'Show the name and the release year of the song by the youngest singer.', 'database': database},
]
output = model.inference(instruction, data=data)
print('\n'.join([o.text for o in output]))
# "select name, country, age from singer order by age desc"
# "select distinct country from singer where age > 20"
# "select song_name, song_release_year from singer order by age limit 1"
```

### Video Captioning

<img src="https://ofasys.oss-cn-zhangjiakou.aliyuncs.com/examples/video.png" width="400">

```python
instruction = '[VIDEO:video] what does the video describe? -> [TEXT:cap]'
data = {'video': './video7021.mp4'}
output = model.inference(instruction, data=data)
print(output.text)
# "a baseball player is hitting a ball"
```

### Speech-to-Text Generation

<audio controls="controls">
  <source src="http://ofasys.oss-cn-zhangjiakou.aliyuncs.com/data/librispeech/dev-clean/1272/128104/1272-128104-0001.flac" type="audio/wav">
  Your browser does not support the <code>audio</code> element.
</audio>

```python
instruction = '[AUDIO:wav] what is the text corresponding to the voice? -> [TEXT:text,preprocess=text_phone]'
data = {'wav': './1272-128104-0001.flac'}
output = model.inference(instruction, data=data)
print(output.text)
# "nor is mister klohs manner less interesting than his manner"
```

### Text-to-Image Generation

```python
instruction = 'what is the complete image? caption: [TEXT:text]"? -> [IMAGE,preprocess=image_vqgan,adaptor=image_vqgan]'
data = {'text': "a city with tall buildings and a large green park."}
output = model.inference(instruction, data=data)
output[0].save_image('0.png')
```

<img src="https://ofasys.oss-cn-zhangjiakou.aliyuncs.com/examples/image-gen_example.png" width="400">

### Text-to-Motion Generation

```python
model = OFASys.from_pretrained('single_task_motion.pt')
instruction = 'motion capture: [TEXT:text] -> [MOTION:bvh_frames,preprocess=motion_6d,adaptor=motion_6d]'
guided_prompts = [
    {'text': 'run then jump'},  # The positive prompt.
    {'text': ''},  # The negative prompt; an empty string enables classifier-free guidance.
]
# This API requires the positive and negative prompts to be in the same batch, so ensure batch_size % 2 == 0.
output = model.inference(instruction, data=guided_prompts, guidance_weight=3.0, batch_size=2)
output[0].save_as_gif('run_then_jump__guided.gif')
```

<img src="https://ofasys.oss-cn-zhangjiakou.aliyuncs.com/examples/run_then_jump_guided.gif" width="400">

The checkpoint for the single-task motion model, along with more motion examples, can be found [here](https://ofasys-doc.readthedocs.io/en/latest/task/motion.html).

# Learn More

| Section | Description |
|-|-|
| [Documentation](https://ofasys-doc.readthedocs.io/en/latest/index.html) | Full API documentation and tutorials |
| [Quick tour](https://ofasys-doc.readthedocs.io/en/latest/start/quickstart.html) | Usage in 15 minutes, including training and inference |
| [How to define a task](https://ofasys-doc.readthedocs.io/en/latest/howto/add_task.html) | How to define a task using an instruction |
| [Task summary](https://ofasys-doc.readthedocs.io/en/latest/task/text.html) | Tasks supported by OFASys |

# Getting Involved

Feel free to submit GitHub issues or pull requests. You are welcome to contribute to our project!

To contact us, never hesitate to send an email to `jinze.bjz@alibaba-inc.com` or `menrui.mr@alibaba-inc.com`!
<br></br>

# Citation

Please cite our [paper](https://arxiv.org/abs/2212.04408) if you find it helpful :)

```
@article{bai2022ofasys,
  author    = {
      Jinze Bai and
      Rui Men and
      Hao Yang and
      Xuancheng Ren and
      Kai Dang and
      Yichang Zhang and
      Xiaohuan Zhou and
      Peng Wang and
      Sinan Tan and
      An Yang and
      Zeyu Cui and
      Yu Han and
      Shuai Bai and
      Wenbin Ge and
      Jianxin Ma and
      Junyang Lin and
      Jingren Zhou and
      Chang Zhou},
  title     = {OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models},
  journal   = {CoRR},
  volume    = {abs/2212.04408},
  year      = {2022}
}
```