https://github.com/shunk031/huggingface-datasets_stair-captions

STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset for huggingface datasets

Host: GitHub
URL: https://github.com/shunk031/huggingface-datasets_stair-captions
Owner: shunk031
Created: 2024-06-20T05:19:56.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-06-22T06:33:45.000Z (over 1 year ago)
Last Synced: 2025-01-10T17:53:45.432Z (9 months ago)
Language: Python
Homepage:
Size: 6.84 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

---
annotations_creators:
- crowdsourced
language:
- ja
language_creators:
- found
license:
- cc-by-4.0
multilinguality:
- monolingual
pretty_name: STAIR Captions is a large-scale dataset containing 820,310 Japanese captions.
size_categories:
- 100K
---

### Languages

The language data in STAIR Captions is in Japanese ([BCP-47 ja-JP](https://www.rfc-editor.org/info/bcp47)).

## Dataset Structure

### Data Instances

[More Information Needed]
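Until this section is filled in, you can inspect instances yourself by loading the data through the Hugging Face `datasets` library. The sketch below is hedged and rests on several assumptions that this README does not state. It assumes this repository has been cloned locally, and the script file name in `SCRIPT_PATH` is hypothetical. It also assumes the installed `datasets` release still supports loading scripts and accepts `trust_remote_code`, such as a recent 2.x or 3.x release. `show_first_examples` is an illustrative helper, not part of this repository.

```python
# Hedged sketch: inspect a few STAIR Captions instances with the Hugging Face
# `datasets` library. Assumptions (not stated in this README):
#   * this repository has been cloned and SCRIPT_PATH points at its loading
#     script; the file name used below is hypothetical;
#   * the installed `datasets` release still supports loading scripts and
#     accepts the `trust_remote_code` argument.
from typing import Optional

# Hypothetical path to the cloned loading script.
SCRIPT_PATH = "huggingface-datasets_stair-captions/stair_captions.py"


def show_first_examples(script_path: str = SCRIPT_PATH,
                        config_name: Optional[str] = None) -> None:
    """Load every split the script defines and print its size and first record."""
    # Imported inside the function so this module can be imported without `datasets`.
    from datasets import load_dataset

    # Without a `split` argument, load_dataset returns a DatasetDict keyed by split name.
    dataset_dict = load_dataset(script_path, config_name, trust_remote_code=True)
    for split_name, split in dataset_dict.items():
        print(f"{split_name}: {split.num_rows} rows")
        print(split[0])
```

For example, call `show_first_examples()` after pointing `SCRIPT_PATH` at the cloned script. If the script defines more than one configuration, pass the one you want as `config_name`.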

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

[More Information Needed]

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

[More Information Needed]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[Creative Commons Attribution 4.0 License.](https://creativecommons.org/licenses/by/4.0/legalcode)

### Citation Information

```bibtex
@inproceedings{yoshikawa2017stair,
  title={STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset},
  author={Yoshikawa, Yuya and Shigeto, Yutaro and Takeuchi, Akikazu},
  booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
  pages={417--421},
  year={2017}
}
```

### Contributions

Thanks to [@yuyay](https://github.com/yuyay) for creating this dataset.
