https://github.com/skrub-data/skrub-tutorials

This repository contains material used for tutorials, courses and MOOCs on skrub

# Introduction to the course
This is the website for the
[Inria Academy course](https://www.inria-academy.fr/formation/skrub-like-a-pro-clean-prepare-and-transform-your-data-faster/)
on the [skrub package](https://skrub-data.org/stable/): it contains all the material
used for the course, including the datasets and exercises used during the session.

## Beta warning
If you are reading this, you will be attending the **Beta version** of this
course. This is not the final version of the course: it will be revised
according to the feedback provided after the session.

Both the presentation and the content of the book are liable to be changed based
on feedback.

## Structure of the course
The course covers the main features of skrub, from data exploration to pipeline
construction, with the notable exception of the Data Ops, which are not covered.

Each chapter includes a section that describes how a specific feature may assist
in building a machine learning pipeline, along with practical code examples.

Some chapters include exercises for participants to work with the explained features.
These exercises are made available in `content/exercises`, as well as at the end
of the respective lesson in `content/notebooks`.

The content of the book is split into sections, and each section ends with a
"final quiz" on the subjects covered up to that point.

# Preparation and setup
First of all, clone the [GitHub repo](https://github.com/skrub-data/skrub-tutorials/tree/main)
of this book to have access to the exercises. In a future version, JupyterLite
will be made available.
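
As a minimal sketch, the clone step could look like the following. It assumes
that `git` is installed and that the default folder name `skrub-tutorials` is
acceptable; the URL is the repository linked above.

```sh
# URL of the course repository (same repo as the link above).
REPO_URL="https://github.com/skrub-data/skrub-tutorials.git"

# Clone into ./skrub-tutorials, and enter the folder only if the clone succeeded.
git clone "$REPO_URL" && cd skrub-tutorials
```

All the commands in the rest of this page are meant to be run from inside that
folder.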

## Setting up a local environment

### Finding the material
Any of the setup methods described below should let you open a Jupyter lab or
notebook instance in the root of the folder. You will then find all the course
material as notebooks in `content/notebooks`, and the exercises alone in
`content/exercises`.

Cloning the repo also makes all the datasets available to the notebooks.
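
If you are unsure whether your terminal is in the right place, a quick check
along these lines may help. This is only a sketch: `check_course_material` is a
hypothetical helper name, not something shipped with the repo, and it only
tests that the two folders mentioned above exist under the current directory.

```sh
# Report whether the two course folders exist under the current directory.
# Returns 0 only if both are present.
check_course_material() {
    missing=0
    for dir in content/notebooks content/exercises; do
        if [ -d "$dir" ]; then
            echo "found:   $dir"
        else
            echo "missing: $dir"
            missing=1
        fi
    done
    return "$missing"
}

check_course_material || echo "Run this from the root of the cloned repository."
```

If either folder is reported as missing, `cd` into the cloned repository before
starting Jupyter.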

### Using pixi
The easiest way to set up the environment is by installing and
using [pixi](https://pixi.sh/latest/installation/). Follow the platform-specific
instructions in the link to install pixi, then open a terminal window.

Run
```sh
pixi install
```
to create the environment, followed by

```sh
pixi run lab
```
to start a Jupyter lab instance.

### Using `pip`
Create and activate the environment:

```sh
python -m venv skrub-tutorial
source skrub-tutorial/bin/activate
```

Install the required dependencies using the `requirements.txt` file:
```sh
pip install -r requirements.txt
```

Start the Jupyter lab instance:
```sh
jupyter lab
```

### Using conda
An `environment.yaml` file is provided to create a conda environment.

Create and activate the environment with

```sh
conda env create -f environment.yaml
conda activate skrub-tutorial
```

Then, start a Jupyter lab instance:

```sh
jupyter lab
```

### Using `uv`
Create the environment using `pyproject.toml` as the requirements file:

```sh
uv venv
uv pip install -r pyproject.toml
```

Activate the environment that `uv` created in the `.venv` folder:
```sh
source .venv/bin/activate
```

Start the Jupyter lab instance:
```sh
jupyter lab
```
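
Whichever of the methods above you used, a quick import check can confirm that
skrub is installed before you open the notebooks. The sketch below assumes that
`python` on your `PATH` is the interpreter of the environment you just created,
and it uses the standard-library `importlib.metadata` to read the installed
version.

```sh
# Print the installed skrub version, or a hint if the package cannot be found.
python -c "from importlib.metadata import version; print('skrub', version('skrub'))" \
    || echo "skrub could not be found: is the environment activated?"
```

With pixi, the environment is not activated in your shell, so the `python ...`
command would need to be run through `pixi run` instead.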