https://github.com/shunk031/huggingface-datasets_stair-captions
STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset for huggingface datasets
- Host: GitHub
- URL: https://github.com/shunk031/huggingface-datasets_stair-captions
- Owner: shunk031
- Created: 2024-06-20T05:19:56.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-22T06:33:45.000Z (over 1 year ago)
- Last Synced: 2025-01-10T17:53:45.432Z (9 months ago)
- Language: Python
- Homepage:
- Size: 6.84 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
---
annotations_creators:
- crowdsourced
language:
- ja
language_creators:
- found
license:
- cc-by-4.0
multilinguality:
- monolingual
pretty_name: STAIR Captions is a large-scale dataset containing 820,310 Japanese captions.
size_categories:
- 100K<n<1M
---
### Languages
The language data in STAIR Captions is in Japanese ([BCP-47 ja-JP](https://www.rfc-editor.org/info/bcp47)).
## Dataset Structure
### Data Instances
[More Information Needed]
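The card leaves this section unfilled. As a hedged illustration only: the upstream STAIR Captions annotations are assumed here to follow the MS-COCO captions JSON layout (an `images` list plus an `annotations` list whose entries carry `image_id` and `caption`). This card does not confirm that layout. The sketch uses a hypothetical `group_captions` helper and a made-up toy record to show how such annotations could be grouped into one instance per image with only the Python standard library.

```python
import json
from collections import defaultdict

# Hypothetical toy annotation in an ASSUMED MS-COCO-style layout.
# The ids, file name, and captions are made up for illustration only.
TOY_ANNOTATION = """
{
  "images": [{"id": 1, "file_name": "example.jpg"}],
  "annotations": [
    {"id": 10, "image_id": 1, "caption": "猫がソファの上で寝ている。"},
    {"id": 11, "image_id": 1, "caption": "ソファで猫が眠っている。"}
  ]
}
"""


def group_captions(coco_style: dict) -> list:
    """Hypothetical helper: return one record per image with all of its captions."""
    captions_by_image = defaultdict(list)
    # Collect every caption under the image it describes.
    for ann in coco_style["annotations"]:
        captions_by_image[ann["image_id"]].append(ann["caption"])
    # Emit one instance per image; images without captions get an empty list.
    return [
        {
            "image_id": image["id"],
            "file_name": image["file_name"],
            "captions": captions_by_image.get(image["id"], []),
        }
        for image in coco_style["images"]
    ]


records = group_captions(json.loads(TOY_ANNOTATION))
print(records[0]["captions"])
```

Grouping by `image_id` reflects that each image carries several captions; the loading script in this repository may structure its instances differently.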
### Data Fields
[More Information Needed]
### Data Splits
[More Information Needed]
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
[More Information Needed]
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
[More Information Needed]
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[Creative Commons Attribution 4.0 License.](https://creativecommons.org/licenses/by/4.0/legalcode)
### Citation Information
```bibtex
@inproceedings{yoshikawa2017stair,
title={STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset},
author={Yoshikawa, Yuya and Shigeto, Yutaro and Takeuchi, Akikazu},
booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
pages={417--421},
year={2017}
}
```
### Contributions
Thanks to [@yuyay](https://github.com/yuyay) for creating this dataset.