Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/alicewriteswrongs/hindi-sentences
~2800 sentences, naively sorted
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/alicewriteswrongs/hindi-sentences
- Owner: alicewriteswrongs
- Created: 2020-05-25T14:32:30.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-05-25T14:41:50.000Z (over 4 years ago)
- Last Synced: 2024-10-30T06:59:35.407Z (3 months ago)
- Language: JavaScript
- Size: 170 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Hindi sentences
This is just a quick script to sort a corpus of Hindi/English sentences by 'frequency'.
The original sentences come from a [shared Anki deck
here](https://ankiweb.net/shared/info/707994711). I exported the deck to `.tsv`
from Anki (checked in here as `sentences.tsv`). Then I sorted them by
'frequency rank' by counting up word frequencies in this corpus and assigning
an average frequency rank to each sentence. Then the sentences are sorted in
descending order, so the highest-scoring sentences (meaning sentences which
contain more frequent words) appear earlier.

I made no effort to do anything fancy with word counting, so for instance करता,
करती, and करना are all treated as separate words even though they are different
forms of the same verb. I think it works well enough without bothering to get that
complicated.

The results are checked in as `sorted_sentences.tsv`. This should be suitable for
importing into Anki or for using in any other language learning software.
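
For illustration, the sorting idea described above could look roughly like the sketch below. It is not the script from this repository. The function names, the assumption that the Hindi sentence is in the first TSV column, the punctuation handling, and the use of a mean word count as the sentence score are all assumptions made for the example.

```javascript
// Sketch only: column layout, tokenizer, and scoring details are assumptions,
// not taken from the repository's script.

// Split a sentence into words on whitespace, dropping common punctuation.
// Inflected forms (करता, करती, करना) stay separate words, as in the README.
function tokenize(sentence) {
  return sentence
    .replace(/[।,.?!"']/g, " ")
    .split(/\s+/)
    .filter((word) => word.length > 0);
}

// Count how often each word occurs across all sentences.
function countWords(sentences) {
  const counts = new Map();
  for (const sentence of sentences) {
    for (const word of tokenize(sentence)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return counts;
}

// Score a sentence as the mean corpus count of its words.
function scoreSentence(sentence, counts) {
  const words = tokenize(sentence);
  if (words.length === 0) return 0;
  const total = words.reduce((sum, word) => sum + counts.get(word), 0);
  return total / words.length;
}

// Sort TSV rows so that higher-scoring sentences come first.
function sortRows(rows, hindiColumn = 0) {
  const counts = countWords(rows.map((row) => row[hindiColumn]));
  return rows
    .map((row) => ({ row, score: scoreSentence(row[hindiColumn], counts) }))
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.row);
}

// Tiny made-up corpus standing in for the rows of sentences.tsv.
const rows = [
  ["वह खाना बनाती है", "She cooks food"],
  ["मैं जाता हूँ", "I go"],
  ["वह जाता है", "He goes"],
];
console.log(sortRows(rows).map((row) => row.join("\t")).join("\n"));
```

In this toy corpus the third sentence moves to the top because all of its words occur twice. Averaging over the sentence length keeps long sentences from winning just by containing more words.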