Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Texera/texera
Collaborative Machine-Learning-Centric Data Analytics Using Workflows
data-analytics declarative-ui machine-learning nlp texera workflow
- Host: GitHub
- URL: https://github.com/Texera/texera
- Owner: Texera
- License: apache-2.0
- Created: 2016-03-15T20:38:46.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2024-10-29T19:13:30.000Z (2 months ago)
- Last Synced: 2024-10-29T21:27:47.649Z (2 months ago)
- Topics: data-analytics, declarative-ui, machine-learning, nlp, texera, workflow
- Language: Scala
- Homepage: https://texera.github.io
- Size: 75 MB
- Stars: 163
- Watchers: 29
- Forks: 72
- Open Issues: 65
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome_ai_agents - Texera - Collaborative Machine-Learning-Centric Data Analytics Using Workflows (Building / Workflows)
README
Texera - Collaborative Data Science and AI/ML Using Workflows
Texera supports scalable data computation and enables advanced AI/ML techniques.
"Collaboration" is a key focus, and we enable an experience similar to Google Docs, but for data science.
Official Site | Publications | Video | Blog | Getting Started
# Goals
* Provide data science as cloud services;
* Provide a browser-based GUI to form a workflow without writing code;
* Allow non-IT people to access data science;
* Support collaborative data science;
* Allow users to interact with the execution of a job;
* Support huge volumes of data efficiently.

# Workflow GUI
The Texera interface supports real-time collaboration on data science projects, allowing seamless sharing of data and workflows with easy access to AI/ML techniques and efficient management of public and private resources.
The workflow in the use case shown below includes data cleaning, ML model training, and validation.
![texera-screenshot](https://github.com/user-attachments/assets/4384b8f5-3a9a-4bbc-a804-1dadd156ebb3)

# Publications (Computer Science)
* (11/2024) **IcedTea: Efficient and Responsive Time-Travel Debugging in Dataflow Systems**
Shengquan Ni, Yicong Huang, Zuozhi Wang, and Chen Li
_To appear in VLDB 2025_
* (8/2024) **Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs**
Xiaozhen Liu, Yicong Huang, Xinyuan Lin, Avinash Kumar, Sadeem Alsudais, and Chen Li
_To appear in SIGMOD 2025_
* (7/2024) **Texera: A System for Collaborative and Interactive Data Analytics Using Workflows**
Zuozhi Wang, Yicong Huang, Shengquan Ni, Avinash Kumar, Sadeem Alsudais, Xiaozhen Liu, Xinyuan Lin, Yunyan Ding, and Chen Li
_In VLDB 2024, Scalable Data Science track_ | [PDF](https://www.vldb.org/pvldb/vol17/p3580-wang.pdf) | [Slides](https://chenli.ics.uci.edu/files/vldb2024-texera-presentation.pdf)
* (3/2024) **Demonstration of Udon: Line-by-line Debugging of User-Defined Functions in Data Workflows**
Yicong Huang, Zuozhi Wang, and Chen Li
_In SIGMOD 2024 **Best Demo Runner-Up Award**_ | [PDF](https://dl.acm.org/doi/10.1145/3626246.3654756)
* (2/2024) **Data Science Tasks Implemented with Scripts versus GUI-Based Workflows: The Good, the Bad, and the Ugly**
Alexander K Taylor, Yicong Huang, Junheng Hao, Xinyuan Lin, Xiusi Chen, Wei Wang, and Chen Li
_In DataPlat Workshop at ICDE 2024_ | [PDF](https://ieeexplore.ieee.org/abstract/document/10555112) | [Slides](https://chenli.ics.uci.edu/files/icde2024-dataplat-workshop.pdf)
* (8/2023) **Building a Collaborative Data Analytics System: Opportunities and Challenges**
Zuozhi Wang, Chen Li
_In Tutorial at VLDB 2023_ | [PDF](https://www.vldb.org/pvldb/vol16/p3898-wang.pdf) | [Slides](https://chenli.ics.uci.edu/files/vldb2023-texera-tutorial.pdf)
* (8/2023) **Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control**
Yicong Huang, Zuozhi Wang, and Chen Li
_In SIGMOD 2024_ | [PDF](https://dl.acm.org/doi/10.1145/3626712) | [Slides](https://chenli.ics.uci.edu/files/sigmod2024-udon-presentation.pdf)
* (8/2023) **Improving Iterative Analytics in GUI-Based Data-Processing Systems with Visualization, Version Control, and Result Reuse**
Sadeem Alsudais Ph.D. Thesis | [PDF](https://sadeemsaleh.github.io/Sadeem_phd_thesis.pdf)
* (7/2023) **Using Texera to Characterize Climate Change Discussions on Twitter During Wildfires**
Shengquan Ni, Yicong Huang, Jessie W. Y. Ko, Alexander Taylor, Xiusi Chen, Avinash Kumar, Sadeem Alsudais, Zuozhi Wang, Xiaozhen Liu, Wei Wang, Suellen Hopfer, and Chen Li
_In Data Science Day at KDD 2023_
* (7/2023) **Raven: Accelerating Execution of Iterative Data Analytics by Reusing Results of Previous Equivalent Versions**
Sadeem Alsudais, Avinash Kumar, and Chen Li
_In HILDA Workshop at SIGMOD 2023_ | [PDF](https://dl.acm.org/doi/10.1145/3597465.3605219)
* (6/2023) **Texera: A System for Collaborative and Interactive Data Analytics Using Workflows**
Zuozhi Wang Ph.D. Thesis | [PDF](https://zuozhiw.github.io/Zuozhi_Wang_UCI_PhD_Thesis.pdf)
* (12/2022) **Towards Interactive, Adaptive and Result-aware Big Data Analytics**
Avinash Kumar Ph.D. Thesis | [PDF](https://arxiv.org/abs/2212.07096)
* (9/2022) **Fries: Fast and Consistent Runtime Reconfiguration in Dataflow Systems with Transactional Guarantees**
Zuozhi Wang, Shengquan Ni, Avinash Kumar, and Chen Li
_In VLDB 2023_ | [PDF](https://www.vldb.org/pvldb/vol16/p256-wang.pdf) | [Slides](https://chenli.ics.uci.edu/files/vldb2023-fries.pdf)
* (7/2022) **Drove: Tracking Execution Results of Workflows on Large Datasets**
Sadeem Alsudais
_In the Ph.D. Workshop at VLDB 2022_ | [PDF](http://ceur-ws.org/Vol-3186/paper_10.pdf)
* (6/2022) **Demonstration of Accelerating Machine Learning Inference Queries with Correlative Proxy Models**
Zhihui Yang, Yicong Huang, Zuozhi Wang, Feng Gao, Yao Lu, Chen Li, and X. Sean Wang
_In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p3734-yang.pdf)
* (6/2022) **Demonstration of Collaborative and Interactive Workflow-Based Data Analytics in Texera**
Xiaozhen Liu, Zuozhi Wang, Shengquan Ni, Sadeem Alsudais, Yicong Huang, Avinash Kumar, and Chen Li
_In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p3738-liu.pdf) | [Demo Video](https://youtu.be/2gfPUZNsoBs)
* (4/2022) **Optimizing Machine Learning Inference Queries with Correlative Proxy Models**
Zhihui Yang, Zuozhi Wang, Yicong Huang, Yao Lu, Chen Li, and X. Sean Wang
_In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p2032-yang.pdf)
* (7/2020) **Demonstration of Interactive Runtime Debugging of Distributed Dataflows in Texera**
Zuozhi Wang, Avinash Kumar, Shengquan Ni, and Chen Li
_In VLDB 2020_ | [PDF](http://www.vldb.org/pvldb/vol13/p2953-wang.pdf) | [Video](https://www.youtube.com/watch?v=SP-XiDADbw0) | [Slides](https://docs.google.com/presentation/d/14U6RPZfeb8Ho0aO2HsCSc8lRs6ul6AxEIm5gpjeVUYA/edit?usp=sharing)
* (1/2020) **Amber: A Debuggable Dataflow system based on the Actor Model**
Avinash Kumar, Zuozhi Wang, Shengquan Ni, and Chen Li
_In VLDB 2020_ | [PDF](http://www.vldb.org/pvldb/vol13/p740-kumar.pdf) | [Video](https://www.youtube.com/watch?v=T5ShFRfHmgI) | [Slides](https://docs.google.com/presentation/d/1v8G9lDmfv4Ff2YWyrGfo_9iMQVF4N8a-4gO4H-K6rCk/edit?usp=sharing)
* (4/2017) **A Demonstration of TextDB: Declarative and Scalable Text Analytics on Large Data Sets**
Zuozhi Wang, Flavio Bayer, Seungjin Lee, Kishore Narendran, Xuxi Pan, Qing Tang, Jimmy Wang, and Chen Li
_In ICDE 2017_ **Best Demo award** | [PDF](https://chenli.ics.uci.edu/files/icde2017-textdb-demo.pdf) | [Video](https://github.com/Texera/texera/wiki/Video)

# Publications (Interdisciplinary)
* (2/2025) **DS4ALL: Teaching High-School Students Data Science and AI/ML Using the Texera Workflow Platform as a Service**
Jiadong Bai, Xiaozhen Liu, Anthony Cuturrufo, Alexander Kundu Taylor, Jeehyun Hwang, Mingyu Derek Ma, Xinyuan Lin, Yanqiao Zhu, Yicong Huang, Yunyan Ding, Wei Wang, and Chen Li
_To appear in [Data Science Education K-12: Research to Practice Annual Conference 2025](https://web.cvent.com/event/d641bd9f-6c99-4cbc-951b-33b1ca05d4ed/summary)_
* (7/2024) **Brain Image Data Processing Using Collaborative Data Workflows on Texera**
Yunyan Ding, Yicong Huang, Pan Gao, Andy Thai, Atchuth Naveen Chilaparasetti, M. Gopi, Xiangmin Xu, and Chen Li
_In Frontiers in Neural Circuits_ | [PDF](https://doi.org/10.3389/fncir.2024.1398884)
* (1/2024) **Wording Matters: The Effect of Linguistic Characteristics and Political Ideology on Resharing of COVID-19 Vaccine Tweets**
Judith Borghouts, Yicong Huang, Suellen Hopfer, Chen Li, and Gloria Mark
_In TOCHI 2024_ | [PDF](https://dl.acm.org/doi/pdf/10.1145/3637876)
* (1/2024) **How the Experience of California Wildfires Shape Twitter Climate Change Framings**
Jessie W. Y. Ko, Shengquan Ni, Alexander Taylor, Xiusi Chen, Yicong Huang, Avinash Kumar, Sadeem Alsudais, Zuozhi Wang, Xiaozhen Liu, Wei Wang, Chen Li, and Suellen Hopfer
_In Climatic Change 2024_ | [PDF](https://link.springer.com/content/pdf/10.1007/s10584-023-03668-0.pdf)
* (11/2023) **The Marketing and Perceptions of Non-Tobacco Blunt Wraps on Twitter**
Joshua U. Rhee, Yicong Huang, Aurash J. Soroosh, Sadeem Alsudais, Shengquan Ni, Avinash Kumar, Jacob Paredes, Chen Li, and David S. Timberlake
_In Substance Use & Misuse 2023_ | [PDF](https://www.tandfonline.com/doi/epdf/10.1080/10826084.2023.2280572?needAccess=true)
* (3/2023) **Understanding Underlying Moral Values and Language Use of COVID-19 Vaccine Attitudes on Twitter**
Judith Borghouts, Yicong Huang, Sydney Gibbs, Suellen Hopfer, Chen Li, and Gloria Mark
_In PNAS Nexus 2023_ | [PDF](https://academic.oup.com/pnasnexus/article-pdf/2/3/pgad013/49435858/pgad013.pdf)
* (10/2022) **Public Opinions Toward COVID-19 Vaccine Mandates: A Machine Learning-Based Analysis of U.S. Tweets**
Yawen Guo, Jun Zhu, Yicong Huang, Lu He, Changyang He, Chen Li, and Kai Zheng
_In AMIA 2022_ | [PDF](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10148373/pdf/1066.pdf)
* (9/2021) **The Social Amplification and Attenuation of COVID-19 Risk Perception Shaping Mask-Wearing Behavior: A Longitudinal Twitter Analysis**
Suellen Hopfer, Emilia J. Fields, Yuwen Lu, Ganesh Ramakrishnan, Ted Grover, Qiushi Bai, Yicong Huang, Chen Li, and Gloria Mark
_In PLOS ONE 2021_ | [PDF](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0257428)
* (4/2021) **Why Do People Oppose Mask Wearing? A Comprehensive Analysis of U.S. Tweets During the COVID-19 Pandemic**
Lu He, Changyang He, Tera Leigh Reynolds, Qiushi Bai, Yicong Huang, Chen Li, Kai Zheng, and Yunan Chen
_In JAMIA 2021_ | [PDF](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7989302/pdf/ocab047.pdf)

# Getting Started
* For users, visit [Guide to Use Texera](https://github.com/Texera/texera/wiki/Getting-Started).
* For developers, visit [Guide to Develop Texera](https://github.com/Texera/texera/wiki/Guide-for-Developers).

Texera was known as "TextDB" before August 28, 2017.
# Acknowledgements
* This project is supported by the National Science Foundation under awards [IIS-1745673](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1745673) and [IIS-2107150](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2107150), as well as by AWS Research Credits and Google Cloud Platform Education Programs.
* This project is supported by an NIH NIDDK award.
* [YourKit](https://www.yourkit.com/) has given an open source license to use their profiler in this project.
# Citation
Please cite Texera as:
```
@article{DBLP:journals/pvldb/WangHNKALLDL24,
author = {Zuozhi Wang and
Yicong Huang and
Shengquan Ni and
Avinash Kumar and
Sadeem Alsudais and
Xiaozhen Liu and
Xinyuan Lin and
Yunyan Ding and
Chen Li},
title = {Texera: {A} System for Collaborative and Interactive Data Analytics
Using Workflows},
journal = {Proc. {VLDB} Endow.},
volume = {17},
number = {11},
pages = {3580--3588},
year = {2024},
url = {https://www.vldb.org/pvldb/vol17/p3580-wang.pdf},
timestamp = {Thu, 19 Sep 2024 13:09:37 +0200},
biburl = {https://dblp.org/rec/journals/pvldb/WangHNKALLDL24.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```