https://github.com/tikquuss/eulapp
Django application for https://github.com/Tikquuss/eulascript, a machine learning (ML) solution that reviews end-user license agreements (EULAs) for terms and conditions that are unacceptable to the government
bert css django fine-tuning html linear-regression nlp-machine-learning nltk sklearn text-classification transformer xlnet
- Host: GitHub
- URL: https://github.com/tikquuss/eulapp
- Owner: Tikquuss
- Created: 2020-08-05T12:07:50.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2021-11-20T22:54:15.000Z (almost 4 years ago)
- Last Synced: 2025-01-18T13:41:16.467Z (9 months ago)
- Topics: bert, css, django, fine-tuning, html, linear-regression, nlp-machine-learning, nltk, sklearn, text-classification, transformer, xlnet
- Language: JavaScript
- Homepage: https://eulapp.herokuapp.com/
- Size: 7.63 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
You can use the application directly at https://whispering-cove-26674.herokuapp.com/
# Dependencies
* django
* torch
* numpy
* nltk
* transformers

# User's Guide
## Setting up dependencies
```
pip install -r requirements.txt
```

All the pre-trained models, dictionaries, and helper objects have been serialized and stored in [production.pth](prediction/production.pth).
It is a dictionary containing the following elements:
- **WORDS_TO_INDEX**: dictionary mapping each word to its index in the vocabulary (bag of words)
- **DICT_SIZE**: size of the vocabulary (bag of words)
- **classifier_mybag**: logistic regression model based on the bag-of-words representation
- **tfidf_vectorizer**: vectorizer that transforms sentences into TF-IDF vectors
- **classifier_tfidf**: logistic regression model based on TF-IDF
- **classifier_bert**: logistic regression model based on BERT
- **max_input_length**: maximum sentence length accepted by BERT

More details can be found in [utils.py](prediction/utils.py).
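
As a rough illustration of how the serialized dictionary might be loaded and checked, here is a minimal sketch. It assumes the file was written with `torch.save`, which the `.pth` extension and the `torch` dependency suggest but which this README does not state. The names `load_bundle`, `missing_keys`, and `EXPECTED_KEYS` are hypothetical and are not taken from `utils.py`.

```python
from typing import Any, Dict, List

# Keys listed in this README; any other entries in the file are not checked.
EXPECTED_KEYS = [
    "WORDS_TO_INDEX",
    "DICT_SIZE",
    "classifier_mybag",
    "tfidf_vectorizer",
    "classifier_tfidf",
    "classifier_bert",
    "max_input_length",
]


def load_bundle(path: str = "prediction/production.pth") -> Dict[str, Any]:
    """Load the serialized dictionary (assumption: it was saved with torch.save)."""
    import torch  # imported lazily so missing_keys() works without PyTorch

    # Recent PyTorch releases may additionally need weights_only=False
    # to unpickle non-tensor objects such as scikit-learn models.
    return torch.load(path, map_location="cpu")


def missing_keys(bundle: Dict[str, Any]) -> List[str]:
    """Return the expected keys that are absent from the loaded dictionary."""
    return [key for key in EXPECTED_KEYS if key not in bundle]
```

Calling `missing_keys(load_bundle())` from the project root should return an empty list if the file matches the description above.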
The first launch of the application takes a little while because the pre-trained BERT model and its tokenizer are loaded: see [utils.py](prediction/utils.py).
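
One common way to pay this loading cost only once is to wrap the loader in a small lazy holder, as in the sketch below. This is an assumption about how such loading could be organised, not a description of what `utils.py` does. `LazyResource` and `load_bert` are hypothetical names, and the checkpoint name `bert-base-uncased` is a placeholder, since this README does not say which BERT checkpoint is used.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Run an expensive loader on first access and reuse the result afterwards."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._value: Optional[T] = None
        self._loaded = False
        self._lock = threading.Lock()

    def get(self) -> T:
        # The lock prevents two concurrent requests from both running the loader.
        with self._lock:
            if not self._loaded:
                self._value = self._loader()
                self._loaded = True
        return self._value


def load_bert():
    """Hypothetical loader; the checkpoint name is an assumption."""
    from transformers import BertModel, BertTokenizer  # lazy import

    name = "bert-base-uncased"
    return BertTokenizer.from_pretrained(name), BertModel.from_pretrained(name)


# Nothing is loaded here; the first bert.get() call triggers the slow load.
bert = LazyResource(load_bert)
```

With this pattern, only the first request that calls `bert.get()` is slow. Later calls return the cached tokenizer and model.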
You can also regenerate or adapt the parameters above by following the steps in this [notebook](https://colab.research.google.com/drive/1Ptq1A27ENcqqtcq2WmB6aztBc-qzBq_7#scrollTo=3QFCqOZ9hCb1).
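
To illustrate how `WORDS_TO_INDEX` and `DICT_SIZE` are typically used together, the sketch below builds a bag-of-words count vector from a sentence. It is a simplified stand-in with two assumptions: it tokenizes by whitespace (the project may well use `nltk`), and it counts occurrences (the project may use binary indicators). The function name `bag_of_words` and the toy vocabulary are hypothetical.

```python
from typing import Dict, List


def bag_of_words(text: str, words_to_index: Dict[str, int], dict_size: int) -> List[int]:
    """Count how often each vocabulary word occurs in the text."""
    vector = [0] * dict_size
    # Assumption: lowercase whitespace tokenization stands in for the real tokenizer.
    for word in text.lower().split():
        index = words_to_index.get(word)
        # Words outside the vocabulary are ignored.
        if index is not None and 0 <= index < dict_size:
            vector[index] += 1
    return vector


# Toy vocabulary for illustration only; the real one lives in production.pth.
toy_index = {"license": 0, "terminate": 1, "government": 2}
vec = bag_of_words("The license may terminate the license", toy_index, 3)
# vec == [2, 1, 0]
```

A vector of this shape is what a classifier such as `classifier_mybag` would take as input, assuming it was trained on the same vocabulary.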
## Launch the application
```
python manage.py runserver
```
Then open http://localhost:8000/ in your browser.

## Home