https://github.com/urduhack/urdu-words

📝 A text file containing 150,000 Urdu words for dictionary and word-based projects, e.g. auto-completion and autosuggestion.

autosuggestion backer bigram dictionary ner ner-labels sponsors trigram urdu urdu-words words-collection

# 150k+ unique Urdu words collections

[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/urduhack/urdu-words/blob/master/LICENSE)
![Last commit](https://img.shields.io/github/last-commit/urduhack/urdu-words.svg)
[![Build Status](https://travis-ci.org/urduhack/urdu-words.svg?branch=master)](https://travis-ci.org/urduhack/urdu-words)
[![image](https://img.shields.io/github/contributors/urduhack/urdu-words.svg)](https://github.com/urduhack/urdu-words/graphs/contributors)
[![Join Slack](https://img.shields.io/badge/join-us%20on%20slack-gray.svg?longCache=true&logo=slack&colorB=red)](https://join.slack.com/t/urduhack/shared_invite/zt-5cpkrvz8-Zu_tOyR5AEcspCBCyqhSZQ)
[![Say Thanks!](https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg)](https://saythanks.io/to/akkefa)

This repository consists of text files containing more than 150,000 Urdu words for dictionary and word-based projects, e.g. auto-completion, autosuggestion, embedding networks, and tagging.

## Files you may be interested in

I pulled the words out into simple newline-delimited text files,
which are easier to work with when building apps or importing into databases.

- [words.txt](words.txt) Contains all Urdu words.
- [bigram_words.txt](bigram_words.txt) Contains all Urdu bigram words.
- [trigram_words.txt](trigram_words.txt) Contains all Urdu trigram words.
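
As a rough illustration of the auto-completion use case, the sketch below loads a newline-delimited word file and looks up prefix matches with a binary search. It assumes one word per line in UTF-8; the helper names `load_words` and `complete` are hypothetical and not part of this repository.

```python
from bisect import bisect_left
from pathlib import Path


def load_words(path):
    """Read a newline-delimited word file into a sorted list of unique words.

    Assumes one word per line, UTF-8 encoded; blank lines are skipped.
    """
    text = Path(path).read_text(encoding="utf-8")
    return sorted({line.strip() for line in text.splitlines() if line.strip()})


def complete(prefix, words, limit=5):
    """Return up to `limit` words from the sorted list that start with `prefix`."""
    # In a sorted list, all words sharing a prefix sit next to each other,
    # and the first of them is at the insertion point of the prefix itself.
    i = bisect_left(words, prefix)
    matches = []
    while i < len(words) and len(matches) < limit and words[i].startswith(prefix):
        matches.append(words[i])
        i += 1
    return matches
```

With a local copy of the repository, `complete("کتا", load_words("words.txt"))` would return the first few words beginning with that prefix. Sorting here is by Unicode code point, which keeps the lookup simple but is not necessarily the order an Urdu dictionary would use.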

## NER Labels
I have added word lists for labelling Named Entity Recognition (NER) data. Each list contains words belonging to one category,
such as _Persons_, _Locations_, _Organizations_ and _Dates_. These words give a good starting point for labelling NER data.
The files below contain the words for each label.

- [locations.txt](ner/locations.txt) Contains locations from across the world.
- [persons.txt](ner/persons.txt) Contains person names.
- [organizations.txt](ner/organizations.txt) Contains organization names.
- [dates.txt](ner/dates.txt) Contains time and date related words.
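
One way to use these lists as a starting point is a simple gazetteer lookup, sketched below. It assumes each file holds one entry per line in UTF-8. The label names (`LOCATION`, `PERSON`, and so on), the `O` default tag, and the helper functions are all hypothetical choices for illustration. The sketch only matches single tokens, so multi-word entries would need extra handling.

```python
from pathlib import Path

# Hypothetical label names mapped to the files listed above.
LABEL_FILES = {
    "LOCATION": "ner/locations.txt",
    "PERSON": "ner/persons.txt",
    "ORGANIZATION": "ner/organizations.txt",
    "DATE": "ner/dates.txt",
}


def load_gazetteer(label_files, root="."):
    """Map each entry in the label files to its label.

    If an entry appears in more than one file, the first label wins.
    """
    gazetteer = {}
    for label, rel_path in label_files.items():
        text = (Path(root) / rel_path).read_text(encoding="utf-8")
        for line in text.splitlines():
            entry = line.strip()
            if entry:
                gazetteer.setdefault(entry, label)
    return gazetteer


def tag_tokens(tokens, gazetteer, default="O"):
    """Pair each token with its gazetteer label, or `default` if it is unknown."""
    return [(token, gazetteer.get(token, default)) for token in tokens]
```

Tags produced this way are only candidates: a word that appears in a list can still be used in a different sense in context, so the output is best treated as a pre-labelling pass to be reviewed by hand.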

## Table of contents

- [Contributing](#contributing)
- [Bugs and feature requests](#bugs-and-feature-requests)
- [Contributors](#contributors)
- [Copyright and license](#copyright-and-license)

## Contributing

All contributions are more than welcome. A contribution may close an issue, fix a bug (reported or not), improve the existing code, and so on.
If you would like to add a word or a new set of words, send a PR.

## Bugs and feature requests

Have a bug or a feature request? If you wish to remove or update some of the words, please [open a new issue](https://github.com/urduhack/urdu-words/issues/new) before sending a PR.

## Contributors

Special thanks to everyone who contributed to getting Urduhack to its current state.
Thanks to the Center for Language Engineering for providing the word list.

## Backers [![Backers on Open Collective](https://opencollective.com/urduhack/backers/badge.svg)](#backers)
Thank you to all our backers! 🙏 [[Become a backer](https://opencollective.com/urduhack#backer)]

## Sponsors [![Sponsors on Open Collective](https://opencollective.com/urduhack/sponsors/badge.svg)](#sponsors)
Support this project by becoming a sponsor. [[Become a sponsor](https://opencollective.com/urduhack#sponsor)]


## Copyright and license

Code released under the [MIT License](https://github.com/urduhack/urdu-words/blob/master/LICENSE).