https://github.com/amazon-science/multiatis
Data and code for the paper "End-to-End Slot Alignment and Recognition for Cross-Lingual NLU" (Accepted to EMNLP 2020)
- Host: GitHub
- URL: https://github.com/amazon-science/multiatis
- Owner: amazon-science
- License: apache-2.0
- Created: 2020-10-05T21:01:59.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2022-01-13T20:10:08.000Z (over 4 years ago)
- Last Synced: 2025-04-07T15:11:11.667Z (about 1 year ago)
- Language: Python
- Size: 33.2 KB
- Stars: 24
- Watchers: 1
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
## MultiATIS++ Corpus
### Description
The ATIS (Air Travel Information Services) collection was developed to support the research and development of speech understanding systems [1]. The original English data includes intent and slot annotations, and was later extended to Hindi and Turkish [2]. MultiATIS++ further extends ATIS to 6 more languages and hence covers a total of 9 languages: English, Spanish, German, French, Portuguese, Chinese, Japanese, Hindi and Turkish. These languages belong to a diverse set of language families: Indo-European, Sino-Tibetan, Japonic and Altaic.
The MultiATIS++ corpus has been released to foster further research in multilingual and cross-lingual natural language understanding.
For more details, please check the paper:
Xu, W., Haider, B. and Mansour, S., 2020. End-to-End Slot Alignment and Recognition for Cross-Lingual NLU. arXiv preprint arXiv:2004.14353 (https://arxiv.org/abs/2004.14353)
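To make "intent and slot annotations" concrete, here is a minimal sketch of how an ATIS-style annotated utterance could be represented and decoded. The utterance, the field names (`utterance`, `slot_labels`, `intent`) and the helper `extract_slots` are illustrative assumptions, not taken from the corpus files or from the paper's code; consult the LDC release for the actual file layout.

```python
from typing import List, Tuple


def extract_slots(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (slot_type, text) spans.

    Hypothetical helper for illustration only. Tokens are joined with a
    space, which is a simplification for languages written without spaces.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the span that is currently open, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, label in zip(tokens, labels):
        if label.startswith("I-") and label[2:] == current_type:
            # Continuation of the open span.
            current_tokens.append(token)
        elif label.startswith(("B-", "I-")):
            # A new span starts (a stray I- tag is treated as a start).
            flush()
            current_type = label[2:]
            current_tokens = [token]
        else:
            # "O": token is outside any slot.
            flush()
            current_type = None
            current_tokens = []
    flush()
    return spans


# Hypothetical example in the style of ATIS annotations (not a corpus entry).
example = {
    "utterance": "show me flights from boston to denver on monday",
    "slot_labels": "O O O O B-fromloc.city_name O B-toloc.city_name O B-depart_date.day_name",
    "intent": "atis_flight",
}

slots = extract_slots(example["utterance"].split(), example["slot_labels"].split())
for slot_type, text in slots:
    print(f"{slot_type}: {text}")
```

Each utterance carries one intent label for the whole sentence and one slot label per token; the sketch only shows how the per-token labels map back to labeled spans.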
### Accessing MultiATIS++
To obtain a copy of the *MultiATIS++* data, please visit:
https://catalog.ldc.upenn.edu/LDC2021T04
Please send your queries/comments to multiatis@amazon.com.
### Citation
Please cite [3] when referring to the MultiATIS++ dataset.
## Soft-Align Implementation
An implementation of the *soft-align* method introduced in [3] will be made available here soon.
## Security
See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
## License
This project is licensed under the Apache-2.0 License.
## References
[1] LDC93S5 ATIS2, LDC94S19 ATIS3 Training Data, LDC95S26 ATIS3 Test Data
[2] Shyam Upadhyay, Manaal Faruqui, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck. (Almost) Zero-Shot Cross-Lingual Spoken Language Understanding. IEEE ICASSP 2018.
[3] Weijia Xu, Batool Haider, Saab Mansour. 2020. End-to-End Slot Alignment and Recognition for Cross-Lingual NLU. arXiv preprint arXiv:2004.14353.