{"id":23327785,"url":"https://github.com/apple/ml-interactive-data-augmentation","last_synced_at":"2025-08-22T21:32:13.639Z","repository":{"id":268311289,"uuid":"902013173","full_name":"apple/ml-interactive-data-augmentation","owner":"apple","description":"Interactive Data Augmentation (CHI 2025)","archived":false,"fork":false,"pushed_at":"2025-03-20T19:46:13.000Z","size":76555,"stargazers_count":22,"open_issues_count":0,"forks_count":2,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-08-22T02:57:02.650Z","etag":null,"topics":["data-visualization","large-language-models","machine-learning","synthetic-data-generation"],"latest_commit_sha":null,"homepage":"","language":"Svelte","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apple.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-11T18:28:46.000Z","updated_at":"2025-07-02T19:19:59.000Z","dependencies_parsed_at":"2025-04-11T15:29:16.540Z","dependency_job_id":null,"html_url":"https://github.com/apple/ml-interactive-data-augmentation","commit_stats":null,"previous_names":["apple/ml-interactive-data-augmentation"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/apple/ml-interactive-data-augmentation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-interactive-data-augmentation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-interactive-data-augmentation/tags","releases_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/apple%2Fml-interactive-data-augmentation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-interactive-data-augmentation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apple","download_url":"https://codeload.github.com/apple/ml-interactive-data-augmentation/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-interactive-data-augmentation/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271703775,"owners_count":24806527,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-22T02:00:08.480Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-visualization","large-language-models","machine-learning","synthetic-data-generation"],"created_at":"2024-12-20T20:23:42.755Z","updated_at":"2025-08-22T21:32:08.598Z","avatar_url":"https://github.com/apple.png","language":"Svelte","readme":"# Interactive Data Augmentation\n\nAmplio is an interactive research tool for data augmentation. The system visualizes the embeddings of input sentences and helps users systematically explore and fill in \"empty data spaces,\" i.e., parts of the desired dataset distribution with few or no data points. 
To do this, Amplio includes a suite of three human-in-the-loop methods for augmenting unstructured text datasets: *Augment with LLM*, *Augment by Interpolation*, and *Augment with Concepts*. \n\n![Overview of Amplio](img/teaser.png)\n\nThis code accompanies the research paper:\n\n**Exploring Empty Spaces: Human-in-the-Loop Data Augmentation**  \nCatherine Yeh, Donghao Ren, Yannick Assogba, Dominik Moritz, Fred Hohman  \n*arXiv, 2024.*  \nPaper: https://arxiv.org/abs/2410.01088\n\n## Demo and Development Setup\n\nThe system setup requires running two main components: (1) the backend server and (2) the frontend interface. The backend and frontend run on separate servers. \n\nFirst, create a secrets file and install the pipenv environment.\n\n### Secrets File\n\nTo add your API keys, duplicate `secrets_example.json` and rename the copy to `secrets.json`. Then paste your own API keys into `secrets.json`.\n\n### Install Requirements\n\nInstall pipenv:\n\n```\npip install pipenv\n```\n\nInstall the project requirements:\n\n```\npipenv install\n```\n\nStart the virtual environment:\n\n```\npipenv shell\n```\n\n### Backend\n\nNavigate into the [backend](backend) folder:\n\n```\ncd backend\n```\n\nStart the backend server:\n\n```\npython server.py\n```\n\nThe server should now be running at [`127.0.0.1:5000`](http://127.0.0.1:5000).\n\n### Frontend\n\nAfter the backend server is running, in a separate terminal window, navigate into the [frontend](frontend) folder:\n\n```\ncd frontend\n```\n\nInstall dependencies:\n\n```\nnpm install\n```\n\nStart the frontend development server:\n\n```\nnpm run dev\n```\n\nThe interface should now be live at [`localhost:5173`](http://localhost:5173).\n\n## Data and Models\n\nAll data needed to run the system is available in the [data](data) folder. This data was generated using **Python 3.11**.\n\nSimilarly, all models needed to run the system are available in the [models](models) folder.\n\n**Note:** you may run into issues if your Python version is not 3.11. 
In this case, please run the [data/generate_data.ipynb](data/generate_data.ipynb) notebook to regenerate the data and model files needed to run the demo. You can also use this notebook to add new datasets.\n\n### Adding a New Dataset\n\nIf you add a new dataset you will need to update these files:\n\n* [backend/server.py](backend/server.py)\n* [frontend/src/routes/components/LeftSidebar.svelte](frontend/src/routes/components/LeftSidebar.svelte)\n\nLook for the sections marked with `UPDATE HERE IF YOU ADD A NEW DATASET`.\n\nSimilarly, if you want to remove a dataset from the system, you will need to edit the files above.\n\n## Contributing\n\nWhen making contributions, refer to the [`CONTRIBUTING`](CONTRIBUTING.md) guidelines and read the [`CODE OF CONDUCT`](CODE_OF_CONDUCT.md).\n\n## BibTeX\n\nTo cite our paper, please use:\n\n```bibtex\n@article{yeh2024exploring,\n    title={{Exploring Empty Spaces: Human-in-the-Loop Data Augmentation}},\n    author={Yeh, Catherine and Ren, Donghao and Assogba, Yannick and Moritz, Dominik and Hohman, Fred},\n    journal={arXiv preprint arXiv:2410.01088},\n    year={2024},\n    doi={10.48550/arXiv.2410.01088}\n}\n```\n\n## License\n\nThis code is released under the [`LICENSE`](LICENSE) terms.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapple%2Fml-interactive-data-augmentation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapple%2Fml-interactive-data-augmentation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapple%2Fml-interactive-data-augmentation/lists"}