https://github.com/chriskempson/japanese-subtitles-word-kanji-frequency-lists

A word frequency list derived from subtitles from Japanese drama, anime and films.

# Japanese Subtitles Word & Kanji Frequency Lists
Word frequency and kanji frequency lists derived from subtitles of Japanese drama, anime and films.

The data set comprised 12,277 subtitle files taken from https://github.com/Matchoo95/JP-Subtitles. The frequency lists were generated with JParser and cb's Japanese Text Analysis Tool.

## Format of Word Frequency Report:
- Field 1: Number of times word was encountered
- Field 2: Word
- Field 3: Frequency Group
- Field 4: Frequency Rank
- Field 5: Percentage (Field 1 / Total number of words)
- Field 6: Cumulative percentage
- Field 7: Part-of-speech
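
As a rough illustration of reading this format, the sketch below parses one line of the word frequency report into a typed record. It assumes the fields are tab-separated and that the percentages are plain decimal numbers; neither is stated above, so adjust `sep` and the conversions to match the actual files. The names `WordEntry` and `parse_word_line` are hypothetical, and the sample line in the usage note is made up.

```python
from dataclasses import dataclass


@dataclass
class WordEntry:
    count: int                    # Field 1: number of times the word was encountered
    word: str                     # Field 2: the word itself
    group: str                    # Field 3: frequency group (kept as text; its type is not documented)
    rank: int                     # Field 4: frequency rank
    percentage: float             # Field 5: Field 1 / total number of words
    cumulative_percentage: float  # Field 6: running total of Field 5
    part_of_speech: str           # Field 7: part-of-speech label


def parse_word_line(line: str, sep: str = "\t") -> WordEntry:
    """Parse one report line. The tab separator is an assumption."""
    fields = line.rstrip("\r\n").split(sep)
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    count, word, group, rank, pct, cum_pct, pos = fields
    return WordEntry(
        count=int(count),
        word=word,
        group=group,
        rank=int(rank),
        percentage=float(pct),
        cumulative_percentage=float(cum_pct),
        part_of_speech=pos,
    )
```

For example, `parse_word_line("120\tする\t1\t3\t1.5\t4.25\t動詞")` yields a `WordEntry` whose `word` is `する` and whose `rank` is `3`. The kanji report has the same layout without Field 7, so the same approach works with six fields.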

## Format of Kanji Frequency Report:
- Field 1: Number of times kanji was encountered
- Field 2: Kanji
- Field 3: Frequency Group
- Field 4: Frequency Rank
- Field 5: Percentage (Field 1 / Total number of kanji)
- Field 6: Cumulative percentage
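
To make the relationship between the fields concrete, here is a minimal sketch of how rank, percentage and cumulative percentage could be derived from raw kanji counts. It is not the code used by JParser or cb's Japanese Text Analysis Tool. It assumes that rank is the position after sorting by descending count and that percentages are expressed on a 0 to 100 scale; the frequency group is left out because its definition is not given above. The function name `kanji_frequency_rows` is hypothetical.

```python
from typing import Dict, List, Tuple


def kanji_frequency_rows(counts: Dict[str, int]) -> List[Tuple[int, str, int, float, float]]:
    """Return (count, kanji, rank, percentage, cumulative percentage) rows."""
    total = sum(counts.values())
    # Assumption: rank 1 is the most frequent kanji.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    cumulative = 0.0
    for rank, (kanji, count) in enumerate(ordered, start=1):
        percentage = 100.0 * count / total  # Field 5: Field 1 / total number of kanji
        cumulative += percentage            # Field 6: running sum of Field 5
        rows.append((count, kanji, rank, percentage, cumulative))
    return rows
```

Called with made-up counts such as `{"日": 5, "本": 3, "語": 2}`, the first row describes `日` with rank 1, and the cumulative percentage of the last row reaches the full total.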