{"id":13566057,"url":"https://github.com/victordibia/data2vis","last_synced_at":"2025-07-23T18:34:01.764Z","repository":{"id":77574418,"uuid":"135771747","full_name":"victordibia/data2vis","owner":"victordibia","description":"Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks","archived":false,"fork":false,"pushed_at":"2023-12-09T17:13:12.000Z","size":118282,"stargazers_count":147,"open_issues_count":8,"forks_count":32,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-05T12:42:11.401Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/victordibia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-01T23:26:06.000Z","updated_at":"2025-03-23T23:56:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"cfeba89c-a253-4299-8d4d-f6ea1522db2e","html_url":"https://github.com/victordibia/data2vis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/victordibia/data2vis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victordibia%2Fdata2vis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victordibia%2Fdata2vis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victordibia%2Fdata2vis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victordibia%2Fdata2vis/manifests","owner_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/victordibia","download_url":"https://codeload.github.com/victordibia/data2vis/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victordibia%2Fdata2vis/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266732508,"owners_count":23976043,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-23T02:00:09.312Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T13:02:01.127Z","updated_at":"2025-07-23T18:34:01.699Z","avatar_url":"https://github.com/victordibia.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"## Data2Vis: Automatic Generation of Data Visualizations Using Sequence-to-Sequence Recurrent Neural Networks\n\n\u003e Experiments in generating visualizations using sequence to sequence models.\n\nThis repository contains source code used for experiments in the [Data2Vis](https://arxiv.org/abs/1804.03126) paper. Note that the code was tested with Python 3. Please use Python 3 as your test environment. The repository also contains large model files (vizmodel) stored with git-lfs; please install git-lfs and ensure the downloaded files have the right file size.\n\n\u003e Update: An update on this work, based on large language models, has been published. Dibia, Victor. 
\"LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models.\" arXiv preprint arXiv:2303.02927 (2023). Paper and code available on GitHub https://github.com/microsoft/lida\n\nSlides: [Data2Vis Slides](https://docs.google.com/presentation/d/e/2PACX-1vSKaGElY3kNozGvIhINyIuwtsJ3AmBxhXtHQmRaQqasyGu5lw3YJxhCHSdRmq3UVAot_2c3F0NJC2Hg/pub?start=false\u0026loop=false\u0026delayms=10000)\n\nPaper: [Data2Vis](https://arxiv.org/abs/1804.03126) paper\n\nDemo: View [sample results here](http://hci.stanford.edu/~cagatay/data2vis/).\n\n\u003e The models in this repo were exported and tested using **TensorFlow version 1.10**. More recent versions might fail to load the saved models due to differences in ops specification. Please use **TensorFlow version 1.10** or earlier.  \n\n\u003cimg src=\"static/assets/blogheader.jpg\" width=\"100%\"\u003e\n\nRapidly creating effective visualizations using expressive grammars is challenging for users who have limited time and limited skills in statistics and data visualization. Even high-level, dedicated visualization tools often require users to manually select among data attributes, decide which transformations to apply, and specify mappings between visual encoding variables and raw or transformed attributes. \nIn this paper we introduce Data2Vis, a neural translation model for automatically generating visualizations from given datasets. We formulate visualization generation as a sequence to sequence translation problem where data specifications are mapped to visualization specifications in a declarative language (Vega-Lite). To this end, we train a multilayered attention-based recurrent neural network (RNN) with long short-term memory (LSTM) units on a corpus of visualization specifications. 
\nQualitative results show that our model learns the vocabulary and syntax for a valid visualization specification, appropriate transformations (count, bins, mean) and how to use common data selection patterns that occur within data visualizations. Data2Vis generates visualizations that are comparable to manually-created visualizations in a fraction of the time, with potential to learn more complex visualization strategies at scale.\n\n\n\n## Data Generation and Model\n\u003cimg src=\"static/assets/datatransform.jpg\" width=\"100%\"\u003e\n\n\u003e Data2Vis is based on the [seq2seq](https://github.com/google/seq2seq) code repository. Readers are strongly encouraged to read the [seq2seq code documentation](https://google.github.io/seq2seq/) to learn more about how training and inference are implemented. This repo only adds our [training data](examples), [data preparation scripts](utils), and a script to [serve](webserver.py) seq2seq inference results over a web API used in the [web demo](static).\n\nSequence to sequence models are trained using source and target pairs. In this experiment the source is a line of `json data` and the target is a valid Vega-Lite visualization specification for that `json data`. The [examples](code/examples) folder contains 4300 Vega-Lite examples from which 215k pairs are generated (see the [sourcedata](code/sourcedata) folder) and subsequently used to train a seq2seq model. \n\nFor convenience we include a data generation [script](utils/data_gen.py) which is used to generate source and target pairs from a folder containing Vega-Lite visualization examples. 
Additional details on the content of the repo are given below.\n\n| Folder | Content |\n|----------|----------|\n| [examples](examples)      | Directory containing 4300 Vega-Lite example visualization specifications      |\n| [examplesdata](examplesdata)     | Directory containing `json data` used in the visualization specifications above    | \n| [sourcedata](sourcedata)     | Directory containing `training data` (source, target pairs split into train/dev/test sets) used to train the seq2seq model. You can take a look at the [data_gen.py](utils/data_gen.py) script to see how this training data is generated from the examples.| \n| [static](static)     | Directory containing web demo css and js files   | \n| [code/vizmodel](vizmodel)     | Directory containing the `trained model` generated in our training runs     | \n\n## Install Dependencies\n\nThe seq2seq code has a few dependencies that can be installed using the `requirements.txt` file.\n\n```bash\nsudo pip3 install -r requirements.txt\n```\n\n## Training a Model\n\nFollowing the directions in the seq2seq repository, you can initiate a training run by first specifying your model configuration yaml file(s) and then using the train script in the [bin](bin) folder.\n\nYou can find several example configurations in the [example_configs](example_configs) folder.\n\n```shell\npython3 -m bin.train --config_paths=\"example_configs/nmt_bi.yml,example_configs/train_seq2seq.yml,example_configs/text_metrics_bpe.yml\" \n```\n\nNote: we used no delimiters, which indicates that we are training a character model.\n\n## Inference\n\nTo run inference, use the infer script in the [bin](bin) folder. 
\n\n```bash\npython3 -m bin.infer \\\n  --tasks \"\n    - class: DecodeText\n      params:\n        delimiter: '' \" \\\n  --model_dir vizmodel \\\n  --model_params \"\n    inference.beam_search.beam_width: 2\" \\\n  --input_pipeline \"\n    class: ParallelTextInputPipeline\n    params:\n      source_delimiter: ''\n      target_delimiter: ''\n      source_files:\n        -  test.txt \" \n\n``` \n\nNote: The above prints out an array containing predictions (array size = beam width).\n\n- `model_dir` is the directory containing the trained model.\n- `source_files` is the path to a file containing text (data) to be translated.\n- `inference.beam_search.beam_width` sets the beam width used in beam search. \n\nAlso note that the input text in `test.txt` must be in the transformed input format (i.e., string, numeric and date column names are replaced with a short form).\n\n```\n{\"num0\": \"0\", \"num1\": \"4\", \"num2\": \"80\", \"str0\": \"female\"}\n```\n\nThe `forward_norm` method in the `utils/data_utils.py` file can be used to generate this normalized version of any data input. Alternatively, the user is encouraged to use the web demo interface.\n\n\n## Web Demo\n\n\u003cimg src=\"static/assets/screen.jpg\" width=\"100%\"\u003e\n\nFor convenience, we provide a wrapper ([webserver.py](webserver.py)) that runs a web application with POST endpoints which return translations in JSON format.\n\nCode for the [web demo](http://hci.stanford.edu/~cagatay/data2vis/) can be run with the following command.\n\n```\npython3 webserver.py\n```\n\n\n\u003e Note that this demo uses a saved model from the vizmodel directory (187 MB). The model is stored on GitHub using git-lfs. Please use git-lfs to clone the repository and ensure you have the entire 187 MB saved model.\n\nAlso note that the parameters for the model are stored in the [vizmodel/train_options.json](vizmodel/train_options.json) file. 
If you trained the model from scratch, add the `max_sequence_length` parameter and set it to a large value (e.g., 2000); otherwise the model generates a short sequence by default.\n\n## Citing this work\n\nThe Data2Vis paper can be cited as follows:\n\n```\n@article{DBLP:journals/corr/abs-1804-03126,\n  author    = {Victor Dibia and\n               {\\c{C}}agatay Demiralp},\n  title     = {Data2Vis: Automatic Generation of Data Visualizations Using Sequence\n               to Sequence Recurrent Neural Networks},\n  journal   = {CoRR},\n  volume    = {abs/1804.03126},\n  year      = {2018},\n  url       = {http://arxiv.org/abs/1804.03126},\n  archivePrefix = {arXiv},\n  eprint    = {1804.03126},\n  timestamp = {Tue, 01 May 2018 19:46:29 +0200},\n  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1804-03126},\n  bibsource = {dblp computer science bibliography, https://dblp.org}\n}\n```\n\n\n## Acknowledgement\nThis work was enabled by the contributions of many individuals. Thanks to the authors of the Vega-Lite and Voyager libraries, and for sharing the example data used in our experiments. Many thanks to the authors of the TensorFlow [seq2seq](https://github.com/google/seq2seq) model implementation and the TensorFlow library team. Their work enabled us to learn about sequence models and rapidly prototype our experiments with little previous experience.\n\n \n \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvictordibia%2Fdata2vis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvictordibia%2Fdata2vis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvictordibia%2Fdata2vis/lists"}