# Transformers for Data Scientists in a rush
Low-code pre-built pipelines for experiments with huggingface/transformers for Data Scientists in a rush.

---
This repository contains low-code, easy-to-understand, pre-built pipelines for fast experimentation on NLP tasks using [huggingface/transformers](https://github.com/huggingface/transformers) pre-trained language models. The pipelines are explained and explored in a series of Medium posts on the topic.

This was inspired by a LinkedIn post by Thomas Wolf, Hugging Face's CSO, showing an image of a low-code pipeline for fast experimentation with their Transformers library. Since I could not find anything like it implemented anywhere, I decided to build it myself.

# Index
As of now, we have:
* [classification](#classification), with a text classification example.

## Classification
The classification example fine-tunes a pre-trained model on a [spam text message classification dataset](https://www.kaggle.com/team-ai/spam-text-message-classification) from Kaggle and uses [optuna](https://optuna.org/) for hyperparameter tuning.
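To make the moving parts concrete, here is a minimal sketch of the general optuna-plus-transformers pattern the script follows. This is not the repository's code: the column names (`Message`, `Category`), the search ranges, and the full-batch training loop are illustrative assumptions only.

```python
# Sketch of tuning a transformers classifier with optuna.
# NOT the repository's script; file layout, column names, and
# hyperparameter ranges are assumptions for illustration.
import optuna
import pandas as pd
import torch
from sklearn.metrics import f1_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

def encode(df, max_len=25):
    # Assumes the Kaggle CSV layout: a "Message" text column and a
    # "Category" label column ("ham"/"spam").
    enc = tokenizer(list(df["Message"]), truncation=True,
                    padding="max_length", max_length=max_len,
                    return_tensors="pt")
    labels = torch.tensor((df["Category"] == "spam").astype(int).values)
    return enc, labels

train_enc, train_y = encode(pd.read_csv("train.csv"))
test_enc, test_y = encode(pd.read_csv("test.csv"))

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 5e-5, log=True)
    epochs = trial.suggest_int("epochs", 1, 3)
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-multilingual-cased", num_labels=2)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        # Full-batch step only to keep the sketch short; real code
        # would iterate over mini-batches with a DataLoader.
        opt.zero_grad()
        out = model(**train_enc, labels=train_y)
        out.loss.backward()
        opt.step()
    model.eval()
    with torch.no_grad():
        preds = model(**test_enc).logits.argmax(dim=-1)
    return f1_score(test_y, preds)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10)
print(study.best_params, study.best_value)
```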

You can run it from the `classification` directory with:

```bash
python classification-experiment.py \
    --model-name bert-base-multilingual-cased \
    --metric f1_score \
    --train-data-path train.csv \
    --test-data-path test.csv \
    --max-sequence-length 25 \
    --label-nbr 2
```
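The command above expects pre-split `train.csv` and `test.csv` files. If you only have the single Kaggle download, a split along these lines should work; the input file name `spam.csv` and the `Category` column are assumptions about your local copy:

```python
# Hypothetical split of the Kaggle download into train/test CSVs.
# "spam.csv" and the "Category" column are assumptions; adjust them
# to match the file you actually downloaded.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("spam.csv")
train, test = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["Category"]
)
train.to_csv("train.csv", index=False)
test.to_csv("test.csv", index=False)
```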

It should yield an F1 score higher than 0.9.

---
###### Made by Pi Esposito