An open API service indexing awesome lists of open source software.

https://github.com/gyrdym/ml_dataframe

A way to store and manipulate data
https://github.com/gyrdym/ml_dataframe

data-science dataframe datascience dataset toy-dataset toy-datasets

Last synced: over 1 year ago
JSON representation

A way to store and manipulate data

Awesome Lists containing this project

README

          

[![Build Status](https://github.com/gyrdym/ml_dataframe/workflows/CI%20pipeline/badge.svg)](https://github.com/gyrdym/ml_dataframe/actions?query=branch%3Amaster+)
[![Coverage Status](https://coveralls.io/repos/github/gyrdym/ml_dataframe/badge.svg?branch=master)](https://coveralls.io/github/gyrdym/ml_dataframe?branch=master)
[![pub package](https://img.shields.io/pub/v/ml_dataframe.svg)](https://pub.dartlang.org/packages/ml_dataframe)
[![Gitter Chat](https://badges.gitter.im/gyrdym/gyrdym.svg)](https://gitter.im/gyrdym/)

# ml_dataframe
A way to store and manipulate data

The library exposes in-memory storage for dynamically typed data. The storage is represented by [DataFrame](https://github.com/gyrdym/ml_dataframe/blob/master/lib/src/data_frame/data_frame.dart) class.

## Table of contents

- [Usage example](#usage-example)
- [DataFrame API](#dataframe-api-with-examples)
- [Get the header](#get-the-header-of-the-data)
- [Get the rows](#get-the-rows-of-the-data)
- [Get the series](#get-the-series-collection-columns-of-the-data)
- [Get the shape](#get-the-shape-of-the-data)
- [Add a series](#add-a-series)
- [Drop a series by a name](#drop-a-series-by-a-series-name)
- [Drop a series by an index](#drop-a-series-by-a-series-index)
- [Sample a dataframe from rows](#sample-a-new-dataframe-from-rows-of-an-existing-dataframe)
- [Sample a dataframe from series indices](#sample-a-new-dataframe-from-series-indices-of-an-existing-dataframe)
- [Sample a dataframe from series names](#sample-a-new-dataframe-from-series-names-of-an-existing-dataframe)
- [Save a dataframe](#save-a-dataframe-to-a-json-file)
- [Shuffle rows of a dataframe](#shuffle-rows-in-a-dataframe)
- [Get a JSON representation](#get-a-json-serializable-representation)
- [Convert to Matrix](#convert-a-dataframe-to-a-matrix)
- [Get a series by name](#get-a-series-by-its-name)
- [Get a series by index](#get-a-series-by-its-index)
- [Map values](#map-values-of-a-dataframe)
- [Map values of a series](#map-values-of-a-specific-dataframe-series)
- [Ways to create a dataframe](#ways-to-create-a-dataframe)
- [DataFrame constructor](#dataframe-constructor)
- [Create a dataframe from a CSV file](#fromcsv-function)
- [Restore a dataframe from JSON](#restore-a-dataframe-previously-persisted-as-a-json-file----fromjson-function)
- [Prefilled dataframes](#dataframes-with-prefilled-data)
- [Iris dataset](#iris-dataset---function-getirisdataframe)
- [Pima Indians diabetes dataset](#pima-indians-diabetes-dataset---function-getpimaindiansdiabetesdataframe)
- [Red wine quality dataset](#red-wine-quality-dataset---function-getwinequalitydataframe)
- [Boston housing dataset](#boston-housing-dataset---function-gethousingdataframe)
- [Contacts](#contacts)

## Usage example:

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final data = [
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
];

final dataframe = DataFrame(data);

print(dataframe);
// DataFrame (5 x 6)
// Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
// 1 5.1 3.5 1.4 0.2 Iris-setosa
// 2 4.9 3.0 1.4 0.2 Iris-setosa
// 89 5.6 3.0 4.1 1.3 Iris-versicolor
// 90 5.5 2.5 4.0 1.3 Iris-versicolor
// 91 5.5 2.6 4.4 1.2 Iris-versicolor
}
```

## `DataFrame` API with examples:

### Get the header of the data

By default, the very first row is considered a header, unless one specify their own header or autogenerated one. More on
this is [here](#dataframe-constructor)

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final header = dataframe.header;

print(header);
// ['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species']
}
```

### Get the rows of the data

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final rows = dataframe.rows;

print(rows);
// [
// [1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
// [2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
// [89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
// [90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
// [91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
// ],
}
```

### Get the series collection (columns) of the data

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final series = dataframe.series;

print(series);
// [
// 'Id': [1, 2, 89, 90, 91],
// 'SepalLengthCm': [5.1, 4.9, 5.6, 5.5, 5.5],
// 'SepalWidthCm': [3.5, 3.0, 3.0, 2.5, 2.6],
// 'PetalLengthCm': [1.4, 1.4, 4.1, 4.0, 4.4],
// 'PetalWidthCm': [0.2, 0.2, 1.3, 1.3, 1.2],
// 'Species': ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor'],
// ],
}
```

### Get the shape of the data

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final shape = dataframe.shape;

print(shape);
// [5, 6] - 5 rows, 6 columns
}
```

### Add a series

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final firstSeries = Series('super_series', [1, 2, 3, 4, 5, 6]);
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);

final modifiedDataframe = dataframe.addSeries([firstSeries]); // The method doesn't mutate the original dataframe

print(modifiedDataframe.series.first);
// 'super_series': [1, 2, 3, 4, 5, 6]
}
```

### Drop a series by a series name

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);

print(dataframe.shape);
// [5, 6] - 6 rows, 6 columns

final modifiedDataframe = dataframe.dropSeries(names: ['Id']); // The method doesn't mutate the original dataframe

print(modifiedDataframe.shape);
// [5, 5] - after a series had been dropped, the number of columns became one lesser
}
````

### Drop a series by a series index

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
print(dataframe.shape);
// [5, 6] - 5 rows, 6 columns

final modifiedDataframe = dataframe.dropSeries(indices: [0]); // The method doesn't mutate the original dataframe

print(modifiedDataframe.shape);
// [5, 5] - after a series had been dropped, the number of columns became one lesser
}
````

### Sample a new dataframe from rows of an existing dataframe

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final sampled = dataframe.sampleFromRows([0, 5]);

print(sampled);
// DataFrame (2 x 6)
// Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
// 1 5.1 3.5 1.4 0.2 Iris-setosa
// 91 5.5 2.6 4.4 1.2 Iris-versicolor
}
````

### Sample a new dataframe from series indices of an existing dataframe

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final sampled = dataframe.sampleFromSeries(indices: [0, 1]);

print(sampled);
// DataFrame (5 x 2)
// Id SepalLengthCm
// 1 5.1
// 2 4.9
// 89 5.6
// 90 5.5
// 91 5.5
}
````

### Sample a new dataframe from series names of an existing dataframe

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final sampled = dataframe.sampleFromSeries(names: ['Id', 'SepalLengthCm']);

print(sampled);
// DataFrame (5 x 2)
// Id SepalLengthCm
// 1 5.1
// 2 4.9
// 89 5.6
// 90 5.5
// 91 5.5
}
````

### Save a dataframe to a JSON file

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() async {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);

await dataframe.saveAsJson('path/to/json/file.json');
}
````

### Shuffle rows in a dataframe

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);

print(dataframe);
// DataFrame (5 x 6)
// Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
// 1 5.1 3.5 1.4 0.2 Iris-setosa
// 2 4.9 3.0 1.4 0.2 Iris-setosa
// 89 5.6 3.0 4.1 1.3 Iris-versicolor
// 90 5.5 2.5 4.0 1.3 Iris-versicolor
// 91 5.5 2.6 4.4 1.2 Iris-versicolor

final shuffled = dataframe.shuffle(); // keep in mind that `shuffle` like other methods returns a new dataframe, the method doesn't mutate the source dataframe

print(shuffled);
// DataFrame (5 x 6)
// Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
// 89 5.6 3.0 4.1 1.3 Iris-versicolor
// 1 5.1 3.5 1.4 0.2 Iris-setosa
// 91 5.5 2.6 4.4 1.2 Iris-versicolor
// 2 4.9 3.0 1.4 0.2 Iris-setosa
// 90 5.5 2.5 4.0 1.3 Iris-versicolor
}
````

One can use `seed` parameter to keep the order of rows disregard the number of `shuffle` calls:

```dart
dataframe.shuffle(seed: 10);
```

### Get a json-serializable representation

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final json = dataframe.toJson(); // json contains a serializable map
}
```

### Convert a dataframe to a matrix:

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm'],
[ 1, 5.1, 3.5, 1.4, 0.2],
[ 2, 4.9, 3.0, 1.4, 0.2],
[ 89, 5.6, 3.0, 4.1, 1.3],
[ 90, 5.5, 2.5, 4.0, 1.3],
[ 91, 5.5, 2.6, 4.4, 1.2],
]);

final matrix = dataframe.toMatrix();

print(matrix); // because of internal representation of Float32 numbers there are some round-off errors in the output
// Matrix 5 x 5:
// (1.0, 5.099999904632568, 3.5, 1.399999976158142, 0.20000000298023224)
// (2.0, 4.900000095367432, 3.0, 1.399999976158142, 0.20000000298023224)
// (89.0, 5.599999904632568, 3.0, 4.099999904632568, 1.2999999523162842)
// (90.0, 5.5, 2.5, 4.0, 1.2999999523162842)
// (91.0, 5.5, 2.5999999046325684, 4.400000095367432, 1.2000000476837158)
}
```

the method throws an error if there are inconvertible to a number values in the dataframe.

### Get a series by its index

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final series = dataframe[0];

print(series);
// Id: [1, 2, 89, 90, 91]
}
```

### Get a series by its name

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final series = dataframe['Id'];

print(series);
// Id: [1, 2, 89, 90, 91]
}
```

### Map values of a dataframe

```dart
import 'package:ml_dataframe/ml_dataframe';

void main() {
final data = DataFrame([
['col_1', 'col_2', 'col_3'],
[ 2, 20, 200],
[ 3, 30, 300],
[ 4, 40, 400],
]);
// the first generic type ia a type of the source value, the second generic type is a type of the mapped value
final modifiedData = data.map((value) => value * 2);

print(modifiedData);
// DataFrame (3 x 3)
// col_1 col_2 col_3
// 4 40 400
// 6 60 600
// 8 80 800
}
```

### Map values of a specific dataframe series

```dart
import 'package:ml_dataframe/ml_dataframe';

void main() {
final data = DataFrame([
['col_1', 'col_2', 'col_3'],
[ 2, 20, 200],
[ 3, 30, 300],
[ 4, 40, 400],
]);
// the first generic type ia a type of the source value, the second generic type is a type of the mapped value
final modifiedData = data.mapSeries((value) => value * 2, name: 'col_2');

print(modifiedData);
// DataFrame (3 x 3)
// col_1 col_2 col_3
// 2 40 200
// 3 60 300
// 4 80 400
}
```

## Ways to create a dataframe

### `DataFrame` constructor

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final data = [
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
];

final dataframe = DataFrame(data);
}
```

By default, the very first row is considered a header. If the data does not have a header, one can use autogenerated
header by providing `headerExists: false` config to the constructor:

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final data = [
[1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
];

final dataframe = DataFrame(data, headerExists: false);

print(dataframe.header);
}
```

It outputs `['col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6']`. `col_` is a default prefix for the autogenerated
columns.

Also, if there are no header row in the data, one can use a predefined header:

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final data = [
[1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
];

final dataframe = DataFrame(data, header: ['feature_1', 'feature_2', 'feature_3', 'feature_4', 'feature_5', 'feature_6']);
}
```

### `fromCsv` function

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() async {
final data = await fromCsv('path/to/csv/file.csv');
}
```

If the `csv` file does not have a header row, it's needed to provide the corresponding flag:

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() async {
final data = await fromCsv('path/to/csv/file.csv', headerExists: false);
}
```

### Restore a dataframe previously persisted as a json file - `fromJson` function

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() async {
final data = await fromJson('path/to/json/file.json');
}
```

This function works in conjunction with DataFrame `saveAsJson` method.

## Dataframes with prefilled data

In order to test data processing algorithms, one can use "toy" datasets. The library exposes several of them:

### Iris dataset - function `getIrisDataFrame`

One can create a dataframe filled with [Iris](https://www.kaggle.com/datasets/uciml/iris) data:

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final data = getIrisDataFrame();

print(data);
// DataFrame (150 x 6)
// Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
// ...
}
```

### Pima Indians diabetes dataset - function `getPimaIndiansDiabetesDataFrame`

One can create a dataframe filled with [Pima Indians diabetes](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database) data:

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final data = getPimaIndiansDiabetesDataFrame();

print(data);
// DataFrame (768 x 9)
// Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
// ...
}
```

### Red wine quality dataset - function `getWineQualityDataframe`

One can create a dataframe filled with [Red wine quality](https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009) data:

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final data = getWineQualityDataframe();

print(data);
// DataFrame (1599 x 12)
// fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
// ...
}
```

### Boston housing dataset - function `getHousingDataframe`

One can create a dataframe filled with [Boston housing](http://lib.stat.cmu.edu/datasets/boston) data:

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
final data = getHousingDataframe();

print(data);
// DataFrame (506 x 14)
// CRIM ZN INDUS CHAS NOX RM ... MEDV
// 0.00632 18.0 2.31 0 0.538 6.575 ... 24.0
// 0.02731 0.0 7.07 0 0.469 6.421 ... 21.6
// 0.02729 0.0 7.07 0 0.469 7.185 ... 34.7
// 0.03237 0.0 2.18 0 0.458 6.998 ... 33.4
// 0.06905 0.0 2.18 0 0.458 7.147 ... 36.2
// ... ... ... ... ... ... ... ...
// 0.06263 0.0 11.93 0 0.573 6.593 ... 22.4
// 0.04527 0.0 11.93 0 0.573 6.12 ... 20.6
// 0.06076 0.0 11.93 0 0.573 6.976 ... 23.9
// 0.10959 0.0 11.93 0 0.573 6.794 ... 22.0
// 0.04741 0.0 11.93 0 0.573 6.03 ... 11.9
}
```

## Contacts
If you have questions, feel free to text me on
- [Twitter](https://twitter.com/ilgyrd)
- [Facebook](https://www.facebook.com/ilya.gyrdymov)
- [Linkedin](https://www.linkedin.com/in/gyrdym/)