https://github.com/tikquuss/eulapp

Django application for https://github.com/Tikquuss/eulascript, a machine learning (ML) solution that reviews end-user license agreements (EULAs) for terms and conditions that are unacceptable to the government.

Host: GitHub
URL: https://github.com/tikquuss/eulapp
Owner: Tikquuss
Created: 2020-08-05T12:07:50.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2021-11-20T22:54:15.000Z (almost 4 years ago)
Last Synced: 2025-01-18T13:41:16.467Z (9 months ago)
Topics: bert, css, django, fine-tuning, html, linear-regression, nlp-machine-learning, nltk, sklearn, text-classification, transformer, xlnet
Language: JavaScript
Homepage: https://eulapp.herokuapp.com/
Size: 7.63 MB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

You can use the application directly at this link: https://whispering-cove-26674.herokuapp.com/

# Dependencies
* django
* torch
* numpy
* nltk
* transformers

# User's Guide
## Setting up dependencies
```
pip install -r requirements.txt
```

All pre-trained models, dictionaries, and helper methods have been serialized and saved in [production.pth](prediction/production.pth).

The file contains a dictionary with the following elements:
- **WORDS_TO_INDEX**: dictionary mapping each word to its index in the vocabulary (bag of words)
- **DICT_SIZE**: size of the vocabulary (bag of words)
- **classifier_mybag**: logistic regression model based on the bag-of-words representation
- **tfidf_vectorizer**: vectorizer that transforms sentences into TF-IDF vectors
- **classifier_tfidf**: logistic regression model based on TF-IDF
- **classifier_bert**: logistic regression model based on BERT
- **max_input_length**: maximum sentence length accepted by BERT

More details can be found in [utils.py](prediction/utils.py).
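
To make the roles of `WORDS_TO_INDEX` and `DICT_SIZE` more concrete, here is a minimal sketch of bag-of-words vectorization. The toy vocabulary, the tokenization rule, and the function name `bag_of_words` are assumptions made for illustration; the actual implementation in utils.py may differ.

```python
import re

# Toy stand-ins for the WORDS_TO_INDEX and DICT_SIZE entries.
# These values are hypothetical; the real ones come from production.pth.
WORDS_TO_INDEX = {"license": 0, "terminate": 1, "liability": 2, "governing": 3, "law": 4}
DICT_SIZE = len(WORDS_TO_INDEX)


def bag_of_words(text, words_to_index, dict_size):
    """Return a count vector of length dict_size; out-of-vocabulary words are ignored."""
    vector = [0] * dict_size
    # Assumed tokenization: lowercase alphabetic runs only.
    for token in re.findall(r"[a-z]+", text.lower()):
        index = words_to_index.get(token)
        if index is not None:
            vector[index] += 1
    return vector


clause = "The governing law of this license is the law of Delaware."
features = bag_of_words(clause, WORDS_TO_INDEX, DICT_SIZE)
print(features)
```

A vector like `features` is the kind of input a bag-of-words classifier such as `classifier_mybag` would consume, assuming it follows the usual scikit-learn estimator interface.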

The first launch of the application takes a little time because the pre-trained BERT model and its tokenizer have to be loaded: see [utils.py](prediction/utils.py).

You can also adapt the parameters listed above by following the steps in this [notebook](https://colab.research.google.com/drive/1Ptq1A27ENcqqtcq2WmB6aztBc-qzBq_7#scrollTo=3QFCqOZ9hCb1).

## Launch the application
```
python manage.py runserver
```
Then open http://localhost:8000/ in your browser.

## Home
