{"id":30069186,"url":"https://github.com/apache/texera","last_synced_at":"2026-03-03T04:17:42.432Z","repository":{"id":37285069,"uuid":"53976910","full_name":"apache/texera","owner":"apache","description":"Collaborative Machine-Learning-Centric Data Analytics Using Workflows","archived":false,"fork":false,"pushed_at":"2025-08-06T23:29:42.000Z","size":99060,"stargazers_count":188,"open_issues_count":136,"forks_count":93,"subscribers_count":28,"default_branch":"master","last_synced_at":"2025-08-06T23:32:30.254Z","etag":null,"topics":["artificial-intelligence","data","data-analytics","data-science","machine-learning","texera","workflow"],"latest_commit_sha":null,"homepage":"https://texera.io","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apache.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2016-03-15T20:38:46.000Z","updated_at":"2025-08-06T23:13:46.000Z","dependencies_parsed_at":"2024-05-03T13:33:39.397Z","dependency_job_id":"5b805657-b05c-4e22-85c5-1fcab1009821","html_url":"https://github.com/apache/texera","commit_stats":null,"previous_names":["apache/texera"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/apache/texera","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Ftexera","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Ftexera/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Ftexera/releases","manifests_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/apache%2Ftexera/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apache","download_url":"https://codeload.github.com/apache/texera/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Ftexera/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269410216,"owners_count":24412155,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-08T02:00:09.200Z","response_time":72,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","data","data-analytics","data-science","machine-learning","texera","workflow"],"created_at":"2025-08-08T11:01:57.519Z","updated_at":"2025-12-15T20:14:02.985Z","avatar_url":"https://github.com/apache.png","language":"Scala","readme":"\u003ch1 align=\"center\"\u003eTexera - Collaborative Data Science and AI/ML Using Workflows\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://texera.io\"\u003e \u003cimg src=\"core/gui/src/assets/logos/full_logo_small.png\" alt=\"texera-logo\" height=\"150px\"/\u003e \u003c/a\u003e\n  \u003cbr\u003e\n  \u003ci\u003eTexera supports scalable data computation and enables advanced AI/ML techniques.\u003c/i\u003e\n  \u003cbr\u003e\n  \u003ci\u003e\"Collaboration\" is a key focus, and we enable an experience similar to Google Docs, but for data science. 
\u003c/i\u003e\n  \u003cbr\u003e\n  \n  \u003ch4 align=\"center\"\u003e\n    \u003ca href=\"https://texera.io\"\u003eOfficial Site\u003c/a\u003e\n    |\n    \u003ca href=\"https://texera.io/publications/\"\u003ePublications\u003c/a\u003e\n    |\n    \u003ca href=\"https://texera.io/category/video/\"\u003eVideo\u003c/a\u003e\n    | \n    \u003ca href=\"https://texera.io/category/blog/\"\u003eBlog\u003c/a\u003e\n    |\n    \u003ca href=\"https://github.com/Texera/texera/wiki/Getting-Started\"\u003eGetting Started\u003c/a\u003e\n    \u003cbr\u003e\n  \u003c/h4\u003e\n  \n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003cimg alt=\"Static Badge\" src=\"https://img.shields.io/badge/Users-332-blue\"\u003e\n  \u003cimg alt=\"Static Badge\" src=\"https://img.shields.io/badge/Projects-86-blue\"\u003e\n  \u003cimg alt=\"Static Badge\" src=\"https://img.shields.io/badge/Workflows-2,481-blue\"\u003e\n  \u003cimg alt=\"Static Badge\" src=\"https://img.shields.io/badge/Executions-51K-blue\"\u003e\n  \u003cimg alt=\"Static Badge\" src=\"https://img.shields.io/badge/Workflow_Versions-357K-blue\"\u003e\n  \u003cimg alt=\"Static Badge\" src=\"https://img.shields.io/badge/Deployments-7-blue\"\u003e\n  \u003cimg alt=\"Static Badge\" src=\"https://img.shields.io/badge/Largest_Deployment-100_nodes,_400_cores-green\"\u003e\n\u003c/p\u003e\n\n# Goals\n\n* Provide data science as cloud services;\n* Provide a browser-based GUI to form a workflow without writing code;\n* Allow non-IT people to access data science;\n* Support collaborative data science;\n* Allow users to interact with the execution of a job;\n* Support huge volumes of data efficiently.\n\n# Workflow GUI\nThe Texera interface supports real-time collaboration on data science projects, allowing seamless sharing of data and workflows with easy access to AI/ML techniques and efficient management of public and private resources. 
\nThe workflow in the use case shown below includes data cleaning, ML model training, and validation.\n![texera-screenshot](https://github.com/user-attachments/assets/4384b8f5-3a9a-4bbc-a804-1dadd156ebb3)\n\n# Publications (Computer Science)\n*  (5/2025) **Responsive Retrieval of Consistent States in Pipelined Executions of Dataflows**  \n   Shengquan Ni and Chen Li  \n   _To appear in HILDA Workshop at SIGMOD 2025_\n*  (11/2024) **IcedTea: Efficient and Responsive Time-Travel Debugging in Dataflow Systems**  \n   Shengquan Ni, Yicong Huang, Zuozhi Wang, and Chen Li  \n   _To appear in VLDB 2025_\n*  (8/2024) **Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs**  \n   Xiaozhen Liu, Yicong Huang, Xinyuan Lin, Avinash Kumar, Sadeem Alsudais, and Chen Li  \n   _To appear in SIGMOD 2025_\n*  (7/2024) **Texera: A System for Collaborative and Interactive Data Analytics Using Workflows**  \n   Zuozhi Wang, Yicong Huang, Shengquan Ni, Avinash Kumar, Sadeem Alsudais, Xiaozhen Liu, Xinyuan Lin, Yunyan Ding, and Chen Li  \n   _In VLDB 2024, Scalable Data Science track_ | [PDF](https://www.vldb.org/pvldb/vol17/p3580-wang.pdf) | [Slides](https://chenli.ics.uci.edu/files/vldb2024-texera-presentation.pdf)\n*  (3/2024) **Demonstration of Udon: Line-by-line Debugging of User-Defined Functions in Data Workflows**  \n   Yicong Huang, Zuozhi Wang, and Chen Li  \n   _In SIGMOD 2024 **Best Demo Runner-Up Award🏆**_ | [PDF](https://dl.acm.org/doi/10.1145/3626246.3654756)\n*  (2/2024) **Data Science Tasks Implemented with Scripts versus GUI-Based Workflows: The Good, the Bad, and the Ugly**  \n   Alexander K Taylor, Yicong Huang, Junheng Hao, Xinyuan Lin, Xiusi Chen, Wei Wang, and Chen Li  \n   _In DataPlat Workshop at ICDE 2024_ | [PDF](https://ieeexplore.ieee.org/abstract/document/10555112) | [Slides](https://chenli.ics.uci.edu/files/icde2024-dataplat-workshop.pdf)\n\u003cdetails\u003e\n\u003csummary\u003eExpand All\u003c/summary\u003e\n  \n* (8/2023) 
**Building a Collaborative Data Analytics System: Opportunities and Challenges**  \n   Zuozhi Wang and Chen Li  \n   _In Tutorial at VLDB 2023_ | [PDF](https://www.vldb.org/pvldb/vol16/p3898-wang.pdf) | [Slides](https://chenli.ics.uci.edu/files/vldb2023-texera-tutorial.pdf)\n* (8/2023) **Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control**  \n   Yicong Huang, Zuozhi Wang, and Chen Li  \n   _In SIGMOD 2024_ | [PDF](https://dl.acm.org/doi/10.1145/3626712) | [Slides](https://chenli.ics.uci.edu/files/sigmod2024-udon-presentation.pdf)\n* (8/2023) **Improving Iterative Analytics in GUI-Based Data-Processing Systems with Visualization, Version Control, and Result Reuse**  \n   Sadeem Alsudais Ph.D. Thesis | [PDF](https://sadeemsaleh.github.io/Sadeem_phd_thesis.pdf)\n* (7/2023) **Using Texera to Characterize Climate Change Discussions on Twitter During Wildfires**  \n   Shengquan Ni, Yicong Huang, Jessie W. Y. Ko, Alexander Taylor, Xiusi Chen, Avinash Kumar, Sadeem Alsudais, Zuozhi Wang, Xiaozhen Liu, Wei Wang, Suellen Hopfer, and Chen Li  \n   _In Data Science Day at KDD 2023_\n* (7/2023) **Raven: Accelerating Execution of Iterative Data Analytics by Reusing Results of Previous Equivalent Versions**  \n   Sadeem Alsudais, Avinash Kumar, and Chen Li  \n   _In HILDA Workshop at SIGMOD 2023_ | [PDF](https://dl.acm.org/doi/10.1145/3597465.3605219)\n* (6/2023) **Texera: A System for Collaborative and Interactive Data Analytics Using Workflows**  \n   Zuozhi Wang Ph.D. Thesis | [PDF](https://zuozhiw.github.io/Zuozhi_Wang_UCI_PhD_Thesis.pdf)\n* (12/2022) **Towards Interactive, Adaptive and Result-aware Big Data Analytics**  \n   Avinash Kumar Ph.D. 
Thesis | [PDF](https://arxiv.org/abs/2212.07096)\n* (9/2022) **Fries: Fast and Consistent Runtime Reconfiguration in Dataflow Systems with Transactional Guarantees**  \n   Zuozhi Wang, Shengquan Ni, Avinash Kumar, and Chen Li  \n   _In VLDB 2023_ | [PDF](https://www.vldb.org/pvldb/vol16/p256-wang.pdf) | [Slides](https://chenli.ics.uci.edu/files/vldb2023-fries.pdf)\n* (7/2022) **Drove: Tracking Execution Results of Workflows on Large Datasets**  \n   Sadeem Alsudais  \n   _In the Ph.D. Workshop at VLDB 2022_ | [PDF](http://ceur-ws.org/Vol-3186/paper_10.pdf)\n* (6/2022) **Demonstration of Accelerating Machine Learning Inference Queries with Correlative Proxy Models**  \n   Zhihui Yang, Yicong Huang, Zuozhi Wang, Feng Gao, Yao Lu, Chen Li, and X. Sean Wang  \n   _In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p3734-yang.pdf)\n* (6/2022) **Demonstration of Collaborative and Interactive Workflow-Based Data Analytics in Texera**  \n   Xiaozhen Liu, Zuozhi Wang, Shengquan Ni, Sadeem Alsudais, Yicong Huang, Avinash Kumar, and Chen Li  \n  _In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p3738-liu.pdf) | [Demo Video](https://youtu.be/2gfPUZNsoBs)\n* (4/2022) **Optimizing Machine Learning Inference Queries with Correlative Proxy Models**  \n   Zhihui Yang, Zuozhi Wang, Yicong Huang, Yao Lu, Chen Li, and X. 
Sean Wang  \n   _In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p2032-yang.pdf)\n* (7/2020) **Demonstration of Interactive Runtime Debugging of Distributed Dataflows in Texera**  \n   Zuozhi Wang, Avinash Kumar, Shengquan Ni, and Chen Li  \n   _In VLDB 2020_ | [PDF](http://www.vldb.org/pvldb/vol13/p2953-wang.pdf) | [Video](https://www.youtube.com/watch?v=SP-XiDADbw0) | [Slides](https://docs.google.com/presentation/d/14U6RPZfeb8Ho0aO2HsCSc8lRs6ul6AxEIm5gpjeVUYA/edit?usp=sharing)\n* (1/2020) **Amber: A Debuggable Dataflow system based on the Actor Model**  \n   Avinash Kumar, Zuozhi Wang, Shengquan Ni, and Chen Li  \n   _In VLDB 2020_ | [PDF](http://www.vldb.org/pvldb/vol13/p740-kumar.pdf) | [Video](https://www.youtube.com/watch?v=T5ShFRfHmgI) | [Slides](https://docs.google.com/presentation/d/1v8G9lDmfv4Ff2YWyrGfo_9iMQVF4N8a-4gO4H-K6rCk/edit?usp=sharing)\n* (4/2017) **A Demonstration of TextDB: Declarative and Scalable Text Analytics on Large Data Sets**  \n   Zuozhi Wang, Flavio Bayer, Seungjin Lee, Kishore Narendran, Xuxi Pan, Qing Tang, Jimmy Wang, and Chen Li  \n   _In ICDE 2017_ **Best Demo award** | [PDF](https://chenli.ics.uci.edu/files/icde2017-textdb-demo.pdf) | [Video](https://github.com/Texera/texera/wiki/Video)\n\n\u003c/details\u003e\n\n# Publications (Interdisciplinary):\n* (2/2025) **DS4ALL: Teaching High-School Students Data Science and AI/ML Using the Texera Workflow Platform as a Service**  \n  Jiadong Bai, Xiaozhen Liu, Anthony Cuturrufo, Alexander Kundu Taylor, Jeehyun Hwang, Mingyu Derek Ma, Xinyuan Lin, Yanqiao Zhu, Yicong Huang, Yunyan Ding, Wei Wang, and Chen Li  \n  _To appear in [Data Science Education K-12: Research to Practice Annual Conference 2025](https://web.cvent.com/event/d641bd9f-6c99-4cbc-951b-33b1ca05d4ed/summary)_\n* (7/2024) **Brain Image Data Processing Using Collaborative Data Workflows on Texera**  \n  Yunyan Ding, Yicong Huang, Pan Gao, Andy Thai, Atchuth Naveen Chilaparasetti, M. 
Gopi, Xiangmin Xu, and Chen Li  \n  _In Frontiers in Neural Circuits_ | [PDF](https://doi.org/10.3389/fncir.2024.1398884)\n* (1/2024) **Wording Matters: The Effect of Linguistic Characteristics and Political Ideology on Resharing of COVID-19 Vaccine Tweets**  \n  Judith Borghouts, Yicong Huang, Suellen Hopfer, Chen Li, and Gloria Mark  \n  _In TOCHI 2024_ | [PDF](https://dl.acm.org/doi/pdf/10.1145/3637876)\n* (1/2024) **How the Experience of California Wildfires Shape Twitter Climate Change Framings**  \n  Jessie W. Y. Ko, Shengquan Ni, Alexander Taylor, Xiusi Chen, Yicong Huang, Avinash Kumar, Sadeem Alsudais, Zuozhi Wang, Xiaozhen Liu, Wei Wang, Chen Li, and Suellen Hopfer  \n  _In Climatic Change 2024_ | [PDF](https://link.springer.com/content/pdf/10.1007/s10584-023-03668-0.pdf)\n* (11/2023) **The Marketing and Perceptions of Non-Tobacco Blunt Wraps on Twitter**  \n  Joshua U. Rhee, Yicong Huang, Aurash J. Soroosh, Sadeem Alsudais, Shengquan Ni, Avinash Kumar, Jacob Paredes, Chen Li, and David S. Timberlake  \n  _In Substance Use \u0026 Misuse 2023_ | [PDF](https://www.tandfonline.com/doi/epdf/10.1080/10826084.2023.2280572?needAccess=true)\n\n\u003cdetails\u003e\n\u003csummary\u003eExpand All\u003c/summary\u003e\n\n* (3/2023) **Understanding Underlying Moral Values and Language Use of COVID-19 Vaccine Attitudes on Twitter**  \n  Judith Borghouts, Yicong Huang, Sydney Gibbs, Suellen Hopfer, Chen Li, and Gloria Mark  \n  _In PNAS Nexus 2023_ | [PDF](https://academic.oup.com/pnasnexus/article-pdf/2/3/pgad013/49435858/pgad013.pdf)\n* (10/2022) **Public Opinions Toward COVID-19 Vaccine Mandates: A Machine Learning-Based Analysis of U.S. 
Tweets**  \n  Yawen Guo, Jun Zhu, Yicong Huang, Lu He, Changyang He, Chen Li, and Kai Zheng  \n  _In AMIA 2022_ | [PDF](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10148373/pdf/1066.pdf)\n* (9/2021) **The Social Amplification and Attenuation of COVID-19 Risk Perception Shaping Mask-Wearing Behavior: A Longitudinal Twitter Analysis**  \n  Suellen Hopfer, Emilia J. Fields, Yuwen Lu, Ganesh Ramakrishnan, Ted Grover, Quishi Bai, Yicong Huang, Chen Li, and Gloria Mark  \n  _In PLOS ONE 2021_ | [PDF](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0257428)\n* (4/2021) **Why Do People Oppose Mask Wearing? A Comprehensive Analysis of U.S. Tweets During the COVID-19 Pandemic**  \n  Lu He, Changyang He, Tera Leigh Reynolds, Qiushi Bai, Yicong Huang, Chen Li, Kai Zheng, and Yunan Chen  \n  _In JAMIA 2021_ | [PDF](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7989302/pdf/ocab047.pdf)\n\u003c/details\u003e\n\n# Getting Started\n\n* For users, visit [Guide to Use Texera](https://github.com/Texera/texera/wiki/Getting-Started).\n* For developers, visit [Guide to Develop Texera](https://github.com/Texera/texera/wiki/Guide-for-Developers).\n\nBefore August 28, 2017, Texera was known as \"TextDB\".\n\n# Acknowledgements\n\nThis project is supported by the \u003ca href=\"http://www.nsf.gov\"\u003eNational Science Foundation\u003c/a\u003e under awards [IIS-1745673](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1745673) and [IIS-2107150](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2107150), as well as by AWS Research Credits and Google Cloud Platform Education Programs.\n\n* \u003ca href=\"https://www.niddk.nih.gov/\"\u003e\u003cimg src=\"https://github.com/Texera/texera/assets/17627829/d279897a-3efb-41c1-b2d3-8fd20c800ad7\" alt=\"NIH NIDDK\" height=\"30\"/\u003e\u003c/a\u003e This project is supported by an \u003ca href=\"https://reporter.nih.gov/project-details/10818244\"\u003eNIH NIDDK\u003c/a\u003e award.\n\n\n* \u003ca 
href=\"http://www.yourkit.com\"\u003e\u003cimg src=\"https://www.yourkit.com/images/yklogo.png\" alt=\"Yourkit\" height=\"30\"/\u003e\u003c/a\u003e  [Yourkit](https://www.yourkit.com/) has given an open source license to use their profiler in this project.\n\n# Citation\nPlease cite Texera as \n```\n\n@article{DBLP:journals/pvldb/WangHNKALLDL24,\n  author       = {Zuozhi Wang and\n                  Yicong Huang and\n                  Shengquan Ni and\n                  Avinash Kumar and\n                  Sadeem Alsudais and\n                  Xiaozhen Liu and\n                  Xinyuan Lin and\n                  Yunyan Ding and\n                  Chen Li},\n  title        = {Texera: {A} System for Collaborative and Interactive Data Analytics\n                  Using Workflows},\n  journal      = {Proc. {VLDB} Endow.},\n  volume       = {17},\n  number       = {11},\n  pages        = {3580--3588},\n  year         = {2024},\n  url          = {https://www.vldb.org/pvldb/vol17/p3580-wang.pdf},\n  timestamp    = {Thu, 19 Sep 2024 13:09:37 +0200},\n  biburl       = {https://dblp.org/rec/journals/pvldb/WangHNKALLDL24.bib},\n  bibsource    = {dblp computer science bibliography, https://dblp.org}\n}\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Ftexera","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapache%2Ftexera","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Ftexera/lists"}