https://github.com/uncomputable/frequency-data

# Raw Japanese frequency data

This repository hosts data from [NINJAL](https://www.ninjal.ac.jp/).

The data is already public. I mirror it here to prevent link rot.
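A minimal sketch of how one of these frequency lists could be read, assuming the file is tab-separated with a header row. The column names `lemma` and `frequency` are assumptions for illustration, not confirmed for any particular file, so check the header of the file you download and adjust them.

```python
import csv
import io
from typing import TextIO


def top_words(
    tsv: TextIO,
    word_col: str = "lemma",      # assumed column name, adjust to the real header
    freq_col: str = "frequency",  # assumed column name, adjust to the real header
    n: int = 10,
) -> list[tuple[str, int]]:
    """Return the n most frequent (word, count) pairs from a tab-separated list."""
    # QUOTE_NONE keeps quote characters inside fields as literal text.
    reader = csv.DictReader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [(row[word_col], int(row[freq_col])) for row in reader]
    # Sort by count, highest first.
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:n]


# Tiny in-memory sample standing in for a real file (the counts are made up).
sample = io.StringIO("lemma\tfrequency\nの\t300\n猫\t5\nする\t120\n")
print(top_words(sample, n=2))
```

For a real file, pass a handle opened with `open(path, encoding="utf-8", newline="")` instead of the in-memory sample. UTF-8 is also an assumption here.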

## Balanced Corpus of Contemporary Written Japanese (BCCWJ)

Creative Commons License

One of the largest and most popular corpora out there. It focuses on written language.

[See the university website](https://clrd.ninjal.ac.jp/bccwj/bcc-chu.html).

## Corpus of Spontaneous Japanese (CSJ)

Creative Commons License

Another popular corpus with a focus on spoken language.

[See the university website](https://clrd.ninjal.ac.jp/csj/chunagon.html).

## NINJAL Web Japanese Corpus (NWJC)

Creative Commons License

A corpus which was created by crawling the web.

[The official website](https://masayu-a.github.io/NWJC/) doesn't seem to host any data.

[See NINJAL's repository](https://repository.ninjal.ac.jp/) and navigate like so:

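1. 言語資源 (Language Resources)
2. 国語研日本語ウェブコーパス (NINJAL Web Japanese Corpus)
3. 『国語研日本語ウェブコーパス』中納言搭載データ語彙表 (Vocabulary table for the NINJAL Web Japanese Corpus data available in Chunagon)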

## Corpus of Historical Japanese (CHJ)

Creative Commons License

A corpus that covers different eras of Japanese history.

[See the university website](https://clrd.ninjal.ac.jp/chj/chj-wc.html).

## Showa-Heisei corpus of written Japanese (SHC)

Creative Commons License

A corpus that covers the Showa and Heisei eras of Japanese history.

[See the university website](https://clrd.ninjal.ac.jp/shc/stats.html).