https://github.com/thu-keg/vta
Code, APIs and data for the CIKM23 paper "LittleMu: Deploying an Online Virtual Teaching Assistant via Heterogeneous Sources Integration and Chain-of-Teach Prompts"
https://github.com/thu-keg/vta
dialogue-system education mooc
Last synced: 3 months ago
JSON representation
Code, APIs and data for the CIKM23 paper "LittleMu: Deploying an Online Virtual Teaching Assistant via Heterogeneous Sources Integration and Chain-of-Teach Prompts"
- Host: GitHub
- URL: https://github.com/thu-keg/vta
- Owner: THU-KEG
- Created: 2023-02-03T01:34:38.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-02-08T03:17:55.000Z (almost 3 years ago)
- Last Synced: 2025-05-13T18:02:32.660Z (8 months ago)
- Topics: dialogue-system, education, mooc
- Homepage: https://arxiv.org/abs/2308.05935
- Size: 1.65 MB
- Stars: 3
- Watchers: 6
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# VTA (Virtual Teaching Assistant)
Data and code for LittleMu: Deploying an Online Virtual Teaching Assistant via Heterogeneous Sources Integration and Chain-of-Teach Prompts.
# Data
To facilitate the development and evaluation for future VTA systems, we present the **VTA** dataset, which consists of 3 kinds of data:
- human preference scores for different models
- coarse-to-fine intention for history dialogues
- reference answers for real questions.
| Task | Courses | Participants | Label |
| ------------------------ | ------- | ------------ | ------- |
| Intention Classification | 139 | 10 | 4,000 |
| Question Answering | 139 | 10 | 1,767 |
| General Dialog | 20 | 20 | 124,512 |
You can download the VTA dataset at [this repository](https://github.com/THU-KEG/VTA/tree/main/VTA_dataset)
- The data access permission is granted by XuetangX. We follow the privacy policy and protect the users' data by deploying our system on the company's intranet to prevent data leaking. Therefore, we anonymize the user's name for the intention and QA data using real history dialogues.
- See more details in our [README](https://github.com/THU-KEG/VTA/tree/main/VTA_dataset) for VTA dataset
Here we share the collection process for VTA dataset:
- Intention Classification dataset's [annotation guideline](https://kvbpkpddff.feishu.cn/docx/LsSkdZwOEoIBZCxeU0Wc414qnLd)
- Question Answering dataset's [annotation guideline](https://kvbpkpddff.feishu.cn/docx/EOsJdZbIdoWrcqx0MwCcYRdtnHb)
- General Dialog dataset's [annotation guideline](https://kvbpkpddff.feishu.cn/docx/doxcn3OhKgFU0zPxDT0yiszULQh)
# Code
For more deployment details, you can view this [document](https://kvbpkpddff.feishu.cn/docx/doxcnCgjYpEK0ms7KCrEmBx2Iuc)
- The retrieval module is implemented in this [repository](http://xiaomu-student.xuetangx.com/gitlab/xuwei/xiaomu)
- As the retrieval module utilized course-specific data on XuetangX MOOC platform, we will check the privacy policy and release the visible database in next a few weeks.
- The generation module is implemented in this [repository](https://github.com/lcy2723/XDAI)
- The generation module is implemented based on some code from [XDAI](https://github.com/THUDM/XDAI)
# Milestones
We record our discussions, progresses and weekly plans in this [document](https://kvbpkpddff.feishu.cn/docx/doxcnbT70XKGLC1uTV19243bY0e)