https://github.com/yash1994/distil-lang-detect
Language Detection using DistilBERT
- Host: GitHub
- URL: https://github.com/yash1994/distil-lang-detect
- Owner: yash1994
- Created: 2020-04-02T12:03:41.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-04-13T10:43:58.000Z (almost 5 years ago)
- Last Synced: 2024-10-20T04:17:01.830Z (3 months ago)
- Topics: bert, distilbert, huggingface-transformer, language-detection, transformer
- Language: Python
- Homepage:
- Size: 37.1 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Distil-Lang-Detect
[![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://www.python.org/downloads/release/python-360/) [![Build Status](https://travis-ci.org/yash1994/distil-lang-detect.svg?branch=master)](https://travis-ci.org/yash1994/distil-lang-detect)
Distil-lang-detect is a text language detection module that treats language identification as a sequence classification task, built on [DistilBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) by 🤗 [Huggingface Transformers](https://github.com/huggingface/transformers).
## Getting Started
Distil-Lang-Detect is easy to set up. You just need to do the following.
### Requirements
* python 3.5
* torch >= 1.2.0
* transformers >= 2.2.2
### Installation
```bash
git clone https://github.com/yash1994/distil-lang-detect.git
cd distil-lang-detect
python setup.py install
```
## Usage
```python
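from distillangdetect.detector import Detector

# Create a detector that runs on the CPU
dct = Detector(device="cpu")
# detect() returns the detected language for the given text
det = dct.detect("I love retro computing.")
print(det)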
```
>'English'
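To run detection over several texts, one option is a small loop around the single-text call. The sketch below is a minimal example under stated assumptions: `detect_many` is a hypothetical helper, not part of distil-lang-detect, and it relies only on the `detect(text)` method shown in the usage example. It calls the model once per text, so it is not true batching.

```python
from typing import Iterable, List, Tuple


def detect_many(detector, texts: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each input text with the language reported by detector.detect.

    `detector` is any object exposing detect(text) -> str, such as the
    Detector instance from the usage example. This helper is hypothetical
    and not part of distil-lang-detect.
    """
    results = []
    for text in texts:
        # One detect() call per text; inputs are processed sequentially.
        results.append((text, detector.detect(text)))
    return results


# Usage with the real detector (requires the package to be installed):
#   from distillangdetect.detector import Detector
#   dct = Detector(device="cpu")
#   for text, language in detect_many(dct, ["I love retro computing."]):
#       print(language, "<-", text)
```

Keeping each input next to its result makes it easier to spot misclassified texts when checking the output by hand.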
## Todos
* [ ] Extensive testing.
* [ ] Add training and evaluation scripts.
* [ ] Output format options.
* [ ] Batch Processing.
* [ ] Benchmarking on different datasets.