https://github.com/gyrdym/ml_dataframe
A way to store and manipulate data
- Host: GitHub
- URL: https://github.com/gyrdym/ml_dataframe
- Owner: gyrdym
- License: bsd-2-clause
- Created: 2019-07-23T22:21:25.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2022-08-07T21:22:39.000Z (almost 4 years ago)
- Last Synced: 2025-03-18T03:02:37.354Z (over 1 year ago)
- Topics: data-science, dataframe, datascience, dataset, toy-dataset, toy-datasets
- Language: Dart
- Homepage:
- Size: 239 KB
- Stars: 18
- Watchers: 1
- Forks: 3
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# ml_dataframe
A way to store and manipulate data
The library exposes in-memory storage for dynamically typed data. The storage is represented by the [DataFrame](https://github.com/gyrdym/ml_dataframe/blob/master/lib/src/data_frame/data_frame.dart) class.
## Table of contents
- [Usage example](#usage-example)
- [DataFrame API](#dataframe-api-with-examples)
  - [Get the header](#get-the-header-of-the-data)
  - [Get the rows](#get-the-rows-of-the-data)
  - [Get the series](#get-the-series-collection-columns-of-the-data)
  - [Get the shape](#get-the-shape-of-the-data)
  - [Add a series](#add-a-series)
  - [Drop a series by a name](#drop-a-series-by-a-series-name)
  - [Drop a series by an index](#drop-a-series-by-a-series-index)
  - [Sample a dataframe from rows](#sample-a-new-dataframe-from-rows-of-an-existing-dataframe)
  - [Sample a dataframe from series indices](#sample-a-new-dataframe-from-series-indices-of-an-existing-dataframe)
  - [Sample a dataframe from series names](#sample-a-new-dataframe-from-series-names-of-an-existing-dataframe)
  - [Save a dataframe](#save-a-dataframe-to-a-json-file)
  - [Shuffle rows of a dataframe](#shuffle-rows-in-a-dataframe)
  - [Get a JSON representation](#get-a-json-serializable-representation)
  - [Convert to Matrix](#convert-a-dataframe-to-a-matrix)
  - [Get a series by name](#get-a-series-by-its-name)
  - [Get a series by index](#get-a-series-by-its-index)
  - [Map values](#map-values-of-a-dataframe)
  - [Map values of a series](#map-values-of-a-specific-dataframe-series)
- [Ways to create a dataframe](#ways-to-create-a-dataframe)
  - [DataFrame constructor](#dataframe-constructor)
  - [Create a dataframe from a CSV file](#fromcsv-function)
  - [Restore a dataframe from JSON](#restore-a-dataframe-previously-persisted-as-a-json-file----fromjson-function)
- [Prefilled dataframes](#dataframes-with-prefilled-data)
  - [Iris dataset](#iris-dataset---function-getirisdataframe)
  - [Pima Indians diabetes dataset](#pima-indians-diabetes-dataset---function-getpimaindiansdiabetesdataframe)
  - [Red wine quality dataset](#red-wine-quality-dataset---function-getwinequalitydataframe)
  - [Boston housing dataset](#boston-housing-dataset---function-gethousingdataframe)
- [Contacts](#contacts)
## Usage example:
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final data = [
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
];
final dataframe = DataFrame(data);
print(dataframe);
// DataFrame (5 x 6)
// Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
// 1 5.1 3.5 1.4 0.2 Iris-setosa
// 2 4.9 3.0 1.4 0.2 Iris-setosa
// 89 5.6 3.0 4.1 1.3 Iris-versicolor
// 90 5.5 2.5 4.0 1.3 Iris-versicolor
// 91 5.5 2.6 4.4 1.2 Iris-versicolor
}
```
## `DataFrame` API with examples:
### Get the header of the data
By default, the very first row is considered a header, unless one specifies a custom header or asks for an autogenerated one. More on
this [here](#dataframe-constructor).
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final header = dataframe.header;
print(header);
// ['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species']
}
```
### Get the rows of the data
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final rows = dataframe.rows;
print(rows);
// [
// [1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
// [2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
// [89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
// [90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
// [91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
// ],
}
```
### Get the series collection (columns) of the data
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final series = dataframe.series;
print(series);
// [
// 'Id': [1, 2, 89, 90, 91],
// 'SepalLengthCm': [5.1, 4.9, 5.6, 5.5, 5.5],
// 'SepalWidthCm': [3.5, 3.0, 3.0, 2.5, 2.6],
// 'PetalLengthCm': [1.4, 1.4, 4.1, 4.0, 4.4],
// 'PetalWidthCm': [0.2, 0.2, 1.3, 1.3, 1.2],
// 'Species': ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor'],
// ],
}
```
### Get the shape of the data
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final shape = dataframe.shape;
print(shape);
// [5, 6] - 5 rows, 6 columns
}
```
### Add a series
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final firstSeries = Series('super_series', [1, 2, 3, 4, 5]); // one value per dataframe row
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final modifiedDataframe = dataframe.addSeries([firstSeries]); // The method doesn't mutate the original dataframe
print(modifiedDataframe.series.last); // the added series is expected after the existing ones
// 'super_series': [1, 2, 3, 4, 5]
}
```
### Drop a series by a series name
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
print(dataframe.shape);
// [5, 6] - 5 rows, 6 columns
final modifiedDataframe = dataframe.dropSeries(names: ['Id']); // The method doesn't mutate the original dataframe
print(modifiedDataframe.shape);
// [5, 5] - after the series has been dropped, the number of columns decreased by one
}
```
### Drop a series by a series index
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
print(dataframe.shape);
// [5, 6] - 5 rows, 6 columns
final modifiedDataframe = dataframe.dropSeries(indices: [0]); // The method doesn't mutate the original dataframe
print(modifiedDataframe.shape);
// [5, 5] - after the series has been dropped, the number of columns decreased by one
}
```
### Sample a new dataframe from rows of an existing dataframe
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final sampled = dataframe.sampleFromRows([0, 4]); // zero-based row indices: the first and the last row
print(sampled);
// DataFrame (2 x 6)
// Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
// 1 5.1 3.5 1.4 0.2 Iris-setosa
// 91 5.5 2.6 4.4 1.2 Iris-versicolor
}
```
### Sample a new dataframe from series indices of an existing dataframe
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final sampled = dataframe.sampleFromSeries(indices: [0, 1]);
print(sampled);
// DataFrame (5 x 2)
// Id SepalLengthCm
// 1 5.1
// 2 4.9
// 89 5.6
// 90 5.5
// 91 5.5
}
```
### Sample a new dataframe from series names of an existing dataframe
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final sampled = dataframe.sampleFromSeries(names: ['Id', 'SepalLengthCm']);
print(sampled);
// DataFrame (5 x 2)
// Id SepalLengthCm
// 1 5.1
// 2 4.9
// 89 5.6
// 90 5.5
// 91 5.5
}
```
### Save a dataframe to a JSON file
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() async {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
await dataframe.saveAsJson('path/to/json/file.json');
}
```
### Shuffle rows in a dataframe
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
print(dataframe);
// DataFrame (5 x 6)
// Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
// 1 5.1 3.5 1.4 0.2 Iris-setosa
// 2 4.9 3.0 1.4 0.2 Iris-setosa
// 89 5.6 3.0 4.1 1.3 Iris-versicolor
// 90 5.5 2.5 4.0 1.3 Iris-versicolor
// 91 5.5 2.6 4.4 1.2 Iris-versicolor
final shuffled = dataframe.shuffle(); // like other methods, `shuffle` returns a new dataframe and doesn't mutate the source one
print(shuffled);
// DataFrame (5 x 6)
// Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
// 89 5.6 3.0 4.1 1.3 Iris-versicolor
// 1 5.1 3.5 1.4 0.2 Iris-setosa
// 91 5.5 2.6 4.4 1.2 Iris-versicolor
// 2 4.9 3.0 1.4 0.2 Iris-setosa
// 90 5.5 2.5 4.0 1.3 Iris-versicolor
}
```
One can pass the `seed` parameter to get the same order of rows regardless of the number of `shuffle` calls:
```dart
dataframe.shuffle(seed: 10);
```
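To illustrate why a fixed seed gives a reproducible order, here is a minimal sketch that uses only `dart:math`. It is not the library's implementation: `seededShuffle` is a hypothetical helper, and the assumption is merely that a seeded shuffle behaves like a shuffle driven by `Random(seed)`.

```dart
import 'dart:math';

/// Returns a shuffled copy of [items]; the source list is left untouched.
/// Illustrative helper only, not part of ml_dataframe.
List<int> seededShuffle(List<int> items, int seed) {
  final copy = List<int>.of(items);
  copy.shuffle(Random(seed)); // same seed -> same pseudo-random sequence
  return copy;
}

void main() {
  final ids = [1, 2, 89, 90, 91];
  final first = seededShuffle(ids, 10);
  final second = seededShuffle(ids, 10);

  print(first.join(',') == second.join(',')); // true
  print(ids); // [1, 2, 89, 90, 91]
}
```

Both calls produce the same order because each one starts from a fresh generator created with the same seed, and the source list keeps its original order.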
### Get a json-serializable representation
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final json = dataframe.toJson(); // json contains a serializable map
}
```
### Convert a dataframe to a matrix:
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm'],
[ 1, 5.1, 3.5, 1.4, 0.2],
[ 2, 4.9, 3.0, 1.4, 0.2],
[ 89, 5.6, 3.0, 4.1, 1.3],
[ 90, 5.5, 2.5, 4.0, 1.3],
[ 91, 5.5, 2.6, 4.4, 1.2],
]);
final matrix = dataframe.toMatrix();
print(matrix); // because of internal representation of Float32 numbers there are some round-off errors in the output
// Matrix 5 x 5:
// (1.0, 5.099999904632568, 3.5, 1.399999976158142, 0.20000000298023224)
// (2.0, 4.900000095367432, 3.0, 1.399999976158142, 0.20000000298023224)
// (89.0, 5.599999904632568, 3.0, 4.099999904632568, 1.2999999523162842)
// (90.0, 5.5, 2.5, 4.0, 1.2999999523162842)
// (91.0, 5.5, 2.5999999046325684, 4.400000095367432, 1.2000000476837158)
}
```
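The round-off in the `toMatrix` output above can be reproduced without the library. The sketch below assumes, as the comment in the example states, that the values are stored as 32-bit floats; `toFloat32` is a hypothetical helper built on `Float32List` from `dart:typed_data`.

```dart
import 'dart:typed_data';

/// Rounds [value] to the nearest 32-bit float and widens it back to a double.
/// Illustrative helper only, not part of ml_dataframe.
double toFloat32(double value) => Float32List.fromList([value])[0];

void main() {
  print(toFloat32(5.1)); // 5.099999904632568
  print(toFloat32(5.5)); // 5.5
}
```

Values such as `5.5` have an exact binary representation and survive the conversion unchanged, while `5.1` is replaced by the nearest 32-bit float.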
The `toMatrix` method throws an error if the dataframe contains values that cannot be converted to a number.
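One possible way to avoid the conversion error is to drop non-numeric series first. The sketch below combines `dropSeries` and `toMatrix` as they are shown in this README; `withoutSpecies` is a hypothetical helper, and the shortened dataset is made up for illustration.

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

/// Drops the string-valued 'Species' series (hypothetical helper,
/// not a library function).
DataFrame withoutSpecies(DataFrame source) =>
    source.dropSeries(names: ['Species']);

void main() {
  final dataframe = DataFrame([
    ['Id', 'SepalLengthCm', 'Species'],
    [1, 5.1, 'Iris-setosa'],
    [2, 4.9, 'Iris-setosa'],
  ]);

  // Only numeric series remain, so the conversion is expected to succeed
  final numericOnly = withoutSpecies(dataframe);
  print(numericOnly.header);

  final matrix = numericOnly.toMatrix();
  print(matrix);
}
```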
### Get a series by its index
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final series = dataframe[0];
print(series);
// Id: [1, 2, 89, 90, 91]
}
```
### Get a series by its name
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final dataframe = DataFrame([
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
]);
final series = dataframe['Id'];
print(series);
// Id: [1, 2, 89, 90, 91]
}
```
### Map values of a dataframe
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final data = DataFrame([
['col_1', 'col_2', 'col_3'],
[ 2, 20, 200],
[ 3, 30, 300],
[ 4, 40, 400],
]);
// the first generic type is the type of the source value, the second generic type is the type of the mapped value
final modifiedData = data.map<num, num>((value) => value * 2);
print(modifiedData);
// DataFrame (3 x 3)
// col_1 col_2 col_3
// 4 40 400
// 6 60 600
// 8 80 800
}
```
### Map values of a specific dataframe series
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final data = DataFrame([
['col_1', 'col_2', 'col_3'],
[ 2, 20, 200],
[ 3, 30, 300],
[ 4, 40, 400],
]);
// the first generic type is the type of the source value, the second generic type is the type of the mapped value
final modifiedData = data.mapSeries<num, num>((value) => value * 2, name: 'col_2');
print(modifiedData);
// DataFrame (3 x 3)
// col_1 col_2 col_3
// 2 40 200
// 3 60 300
// 4 80 400
}
```
## Ways to create a dataframe
### `DataFrame` constructor
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final data = [
['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species'],
[ 1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[ 2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[ 89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[ 90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[ 91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
];
final dataframe = DataFrame(data);
}
```
By default, the very first row is considered a header. If the data does not have a header, one can use an autogenerated
header by passing `headerExists: false` to the constructor:
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final data = [
[1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
];
final dataframe = DataFrame(data, headerExists: false);
print(dataframe.header);
}
```
It outputs `['col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6']`. `col_` is a default prefix for the autogenerated
columns.
Also, if there is no header row in the data, one can use a predefined header:
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final data = [
[1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[89, 5.6, 3.0, 4.1, 1.3, 'Iris-versicolor'],
[90, 5.5, 2.5, 4.0, 1.3, 'Iris-versicolor'],
[91, 5.5, 2.6, 4.4, 1.2, 'Iris-versicolor'],
];
final dataframe = DataFrame(data, header: ['feature_1', 'feature_2', 'feature_3', 'feature_4', 'feature_5', 'feature_6']);
}
```
### `fromCsv` function
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() async {
final data = await fromCsv('path/to/csv/file.csv');
}
```
If the `csv` file does not have a header row, one needs to pass the `headerExists: false` flag:
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() async {
final data = await fromCsv('path/to/csv/file.csv', headerExists: false);
}
```
### Restore a dataframe previously persisted as a json file - `fromJson` function
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() async {
final data = await fromJson('path/to/json/file.json');
}
```
This function works in conjunction with the `saveAsJson` method of `DataFrame`.
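A save-and-restore round trip can be sketched as follows. It relies only on `saveAsJson` and `fromJson` as they are used in this README; `roundTrip` is a hypothetical helper, and the file path is a placeholder that has to point to a writable location.

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

/// Persists [source] to [path] and reads it back (hypothetical helper,
/// not a library function).
Future<DataFrame> roundTrip(DataFrame source, String path) async {
  await source.saveAsJson(path);
  return fromJson(path);
}

void main() async {
  final original = DataFrame([
    ['Id', 'Species'],
    [1, 'Iris-setosa'],
    [2, 'Iris-versicolor'],
  ]);

  // Placeholder path, replace it with a real writable location
  final restored = await roundTrip(original, 'path/to/json/file.json');

  // The restored dataframe is expected to carry the same header and rows
  print(restored.header);
  print(restored.rows);
}
```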
## Dataframes with prefilled data
In order to test data processing algorithms, one can use "toy" datasets. The library exposes several of them:
### Iris dataset - function `getIrisDataFrame`
One can create a dataframe filled with [Iris](https://www.kaggle.com/datasets/uciml/iris) data:
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final data = getIrisDataFrame();
print(data);
// DataFrame (150 x 6)
// Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
// ...
}
```
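A toy dataset is handy for trying out a train/test split. The sketch below combines `shuffle` and `sampleFromRows` from the API section above; `splitTrainTest`, the 80% share and the seed value are assumptions made for illustration and are not part of the library.

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

/// Splits [data] into a train part and a test part (hypothetical helper,
/// not a library function).
List<DataFrame> splitTrainTest(DataFrame data,
    {double trainShare = 0.8, int seed = 10}) {
  // A fixed seed keeps the split reproducible between runs
  final shuffled = data.shuffle(seed: seed);
  final rowCount = shuffled.rows.length;
  final trainCount = (rowCount * trainShare).round();

  // Zero-based row indices: the first part goes to train, the rest to test
  final trainIndices = List.generate(trainCount, (i) => i);
  final testIndices =
      List.generate(rowCount - trainCount, (i) => trainCount + i);

  return [
    shuffled.sampleFromRows(trainIndices),
    shuffled.sampleFromRows(testIndices),
  ];
}

void main() {
  final parts = splitTrainTest(getIrisDataFrame());

  print(parts[0].rows.length); // number of train rows
  print(parts[1].rows.length); // number of test rows
}
```

With the 150-row Iris data and an 80% share, the two parts should contain 120 and 30 rows respectively.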
### Pima Indians diabetes dataset - function `getPimaIndiansDiabetesDataFrame`
One can create a dataframe filled with [Pima Indians diabetes](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database) data:
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final data = getPimaIndiansDiabetesDataFrame();
print(data);
// DataFrame (768 x 9)
// Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
// ...
}
```
### Red wine quality dataset - function `getWineQualityDataframe`
One can create a dataframe filled with [Red wine quality](https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009) data:
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final data = getWineQualityDataframe();
print(data);
// DataFrame (1599 x 12)
// fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
// ...
}
```
### Boston housing dataset - function `getHousingDataframe`
One can create a dataframe filled with [Boston housing](http://lib.stat.cmu.edu/datasets/boston) data:
```dart
import 'package:ml_dataframe/ml_dataframe.dart';
void main() {
final data = getHousingDataframe();
print(data);
// DataFrame (506 x 14)
// CRIM ZN INDUS CHAS NOX RM ... MEDV
// 0.00632 18.0 2.31 0 0.538 6.575 ... 24.0
// 0.02731 0.0 7.07 0 0.469 6.421 ... 21.6
// 0.02729 0.0 7.07 0 0.469 7.185 ... 34.7
// 0.03237 0.0 2.18 0 0.458 6.998 ... 33.4
// 0.06905 0.0 2.18 0 0.458 7.147 ... 36.2
// ... ... ... ... ... ... ... ...
// 0.06263 0.0 11.93 0 0.573 6.593 ... 22.4
// 0.04527 0.0 11.93 0 0.573 6.12 ... 20.6
// 0.06076 0.0 11.93 0 0.573 6.976 ... 23.9
// 0.10959 0.0 11.93 0 0.573 6.794 ... 22.0
// 0.04741 0.0 11.93 0 0.573 6.03 ... 11.9
}
```
## Contacts
If you have questions, feel free to message me on:
- [Twitter](https://twitter.com/ilgyrd)
- [Facebook](https://www.facebook.com/ilya.gyrdymov)
- [Linkedin](https://www.linkedin.com/in/gyrdym/)