{"id":20807448,"url":"https://github.com/tikquuss/eulascript","last_synced_at":"2026-04-28T13:38:21.228Z","repository":{"id":76993613,"uuid":"288703474","full_name":"Tikquuss/eulascript","owner":"Tikquuss","description":"Machine learning (ML) solution that review end-user license agreements (EULA) for terms and conditions that are unacceptable to the government","archived":false,"fork":false,"pushed_at":"2021-11-14T01:59:24.000Z","size":10949,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-20T06:06:00.036Z","etag":null,"topics":["albert","bert","distilbert","eula","huggingface","ktrain","pandas","roberta","xlnet"],"latest_commit_sha":null,"homepage":"https://eulapp.herokuapp.com/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Tikquuss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-19T10:36:07.000Z","updated_at":"2021-11-15T07:55:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"51b83bad-72d2-46b3-8ced-9c966d9f089a","html_url":"https://github.com/Tikquuss/eulascript","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/Tikquuss/eulascript","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tikquuss%2Feulascript","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tikquuss%2Feulascript/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tikquuss%2Feulascript/releases","manifests_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tikquuss%2Feulascript/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Tikquuss","download_url":"https://codeload.github.com/Tikquuss/eulascript/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tikquuss%2Feulascript/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32383781,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T11:25:28.583Z","status":"ssl_error","status_checked_at":"2026-04-28T11:25:05.435Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["albert","bert","distilbert","eula","huggingface","ktrain","pandas","roberta","xlnet"],"created_at":"2024-11-17T19:37:47.147Z","updated_at":"2026-04-28T13:38:21.210Z","avatar_url":"https://github.com/Tikquuss.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 1 - Cloning the repository\n```\ngit clone https://github.com/Tikquuss/eulascript\n```\n\n# 2 - Installing the dependencies\n\n* [PyPDF2](https://pypi.org/project/PyPDF2/) and [PyMuPDF](https://pypi.org/project/PyMuPDF/): for reading pdf files\n* [python-docx](https://pypi.org/project/python-docx/) : for reading docx files \n* [wget](https://pypi.org/project/wget/) : for model 
downloading \n* [pandas](https://pandas.pydata.org/) : to write the results to CSV files\n* [validators](https://pypi.org/project/validators/) : to check the validity of the URLs\n* [ktrain](https://github.com/Tikquuss/ktrain) : for loading models. It is a copy of [amaiya/ktrain](https://github.com/amaiya/ktrain) modified to install `tensorflow-cpu` (instead of `tensorflow-2.1.0-cp36-cp36m-manylinux2010_x86_64.whl`) and `tqdm\u003e=4.29.1`.\n```\npip install -r eulascript/requirements.txt\n```\n\n# 3 - Try\n\n* **model_folder** : directory (or URL of the directory) where the model is located (must contain the following three files: `tf_model.preproc`, `config.json` and `tf_model.h5`). If a URL is given, these three files are downloaded automatically. You can use the pre-trained models directly from [huggingface](https://huggingface.co/transformers/), but [this notebook](samples/public_transformers_in_ktrain.ipynb) illustrates how to fine-tune these models (**bert, distilbert, albert, roberta, xlnet**) on our [dataset](https://drive.google.com/file/d/1eyGBYLpOPsvif0iomTBxjHtXoiY8gnLE/view?usp=sharing) with the [ktrain](https://pypi.org/project/ktrain/) library.\n* **output_dir** : folder in which the CSV file(s) containing the results (in the format: `clause, label, probability`) will be stored (the name of each created file starts with the name, without extension, of the original file containing the license, optionally followed by a number to avoid file collisions)\n* **path_to_eula** : comma-separated list of documents (`txt, md, pdf and docx`) containing the licenses to be analyzed\n* **logistic_regression** :  this parameter can be provided instead of **model_folder** in order to use one of the [pre-trained logistic regression models](production.pth) (must be one of: **bag_of_word, tf_idf, bert or distilbert**). This parameter is ignored if it is passed at the same time as **model_folder**. 
This [notebook](samples/logistic_regression.ipynb) illustrates the process of obtaining the [production.pth](production.pth) file.\n\n```\nmodel_folder=my/model_dir_or_url\noutput_dir=my/output_folder\npath_to_eula=my/eula.txt,my/eula.md,my/eula.pdf,my/eula.docx\n\npython eulascript/eula.py --model_folder $model_folder --path_to_eula $path_to_eula --output_dir $output_dir\n```\n\n```\nlogistic_regression=bag_of_word\noutput_dir=my/output_folder\npath_to_eula=my/eula.txt,my/eula.md,my/eula.pdf,my/eula.docx\n\npython eulascript/eula.py --logistic_regression $logistic_regression --path_to_eula $path_to_eula --output_dir $output_dir\n```\n\n\n**Note**: \n* The [samples](samples) folder contains some user licenses and a [notebook](samples/notebook.ipynb) illustrating all of the steps above. \n* The associated web application is available [here](https://eulapp.herokuapp.com/). \n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftikquuss%2Feulascript","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftikquuss%2Feulascript","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftikquuss%2Feulascript/lists"}