https://github.com/cmdoret/jigsaw-toxic-comment-classification-challenge-2018
Improving best solutions from an old competition for educational purposes
- Host: GitHub
- URL: https://github.com/cmdoret/jigsaw-toxic-comment-classification-challenge-2018
- Owner: cmdoret
- License: gpl-3.0
- Created: 2022-06-05T14:04:48.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2021-04-12T20:23:06.000Z (about 5 years ago)
- Last Synced: 2025-05-16T14:11:24.735Z (about 1 year ago)
- Size: 64.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# jigsaw-toxic-comment-classification-challenge-2018
Improving the best solutions from an old competition, for educational purposes.
## Project structure:
* `src`: Code for the main pipeline (data extraction, processing, training, evaluation, ...).
* `notebooks`: Exploratory analyses in the form of Jupyter notebooks.
* `toxic_comments`: Boilerplate code and utilities, meant to be imported as a Python package and reused in other projects.
* `build`: Everything generated by the project, such as temporary files, model weights, and predictions.
* `input`: Input files, such as training data or embedding vectors.
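As a quick sanity check of the layout above, the following is a minimal sketch of a shell function that reports which of the listed directories are missing. The function name `check_layout` is hypothetical and not part of this repository. Since input and output data are managed via `dvc`, `build` and `input` may well be absent on a fresh clone until the data has been pulled.

```bash
# Hypothetical helper (not part of this repository): report which of the
# top-level directories listed above are missing under a repository root.
# The root defaults to the current directory.
check_layout() {
    local root="${1:-.}"
    local missing=0
    local dir
    for dir in src notebooks toxic_comments build input; do
        if [ ! -d "$root/$dir" ]; then
            echo "missing: $dir"
            missing=1
        fi
    done
    # Exit status 0 if every directory exists, 1 otherwise.
    return "$missing"
}
```

It could be used as `check_layout . && echo "layout looks complete"` from the repository root.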
## Setup:
All dependencies can be installed using:
```bash
make deps
```
To make `toxic_comments` importable in Python scripts and notebooks, run `make setup`.
All input and output data are managed via `dvc`. To download them, run:
```bash
pip install 'dvc[gdrive]'  # quoted so shells such as zsh do not expand the brackets
dvc pull
```
## Workflow
Code changes are managed via `git`. Data changes are managed via `dvc`, which is connected to a Google Drive folder.
When modifying or adding data files, the changes must be uploaded to the dvc remote (the Google Drive folder).
The updated tracker file (`.dvc`), which is small, must then be committed to git to keep track of the changes.
The standard process is as follows:
```bash
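dvc add data                     # track the new or modified data, updating data.dvc
dvc push                         # upload the data to the dvc remote
git add data.dvc                 # stage the updated tracker file
git commit -m 'added new file'
git push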
```
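The reverse direction, picking up data changes that someone else has pushed, is not spelled out above. A minimal sketch, assuming the usual `git` plus `dvc` pairing: pull the code first so the `.dvc` tracker files are current, then let `dvc pull` download the data they point to. The function name `sync_project` is hypothetical.

```bash
# Hypothetical helper (not part of this repository): bring both code and
# data up to date with what collaborators have pushed.
sync_project() {
    # Fetch code changes, including any updated .dvc tracker files.
    git pull || return 1
    # Download the data versions referenced by those tracker files.
    dvc pull
}
```

Running `sync_project` from the repository root is equivalent to running `git pull` followed by `dvc pull`, except that it stops early if the `git pull` step fails.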