https://github.com/pharo-ai/data-imputers
This project contains transformers for missing value imputation
ai data data-science imputer pharo pharo-smalltalk smalltalk
- Host: GitHub
- URL: https://github.com/pharo-ai/data-imputers
- Owner: pharo-ai
- License: mit
- Created: 2023-03-15T17:26:26.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2023-12-19T14:41:48.000Z (over 2 years ago)
- Last Synced: 2025-04-04T04:41:27.376Z (about 1 year ago)
- Topics: ai, data, data-science, imputer, pharo, pharo-smalltalk, smalltalk
- Language: Smalltalk
- Homepage:
- Size: 43.9 KB
- Stars: 1
- Watchers: 5
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Data Imputers
This is a Pharo library for transforming data to manage missing values.
## How to install it?
To install the project, open the Playground (Ctrl+OW) in your [Pharo](https://pharo.org/) image and execute the following Metacello script (select it and press the Do-it button or Ctrl+D):
```Smalltalk
Metacello new
	baseline: 'AIDataImputers';
	repository: 'github://pharo-ai/data-imputers/src';
	load.
```
## How to depend on it?
If you want to add a dependency on this project to your own project, include the following lines in your baseline method:
```Smalltalk
spec
	baseline: 'AIDataImputers'
	with: [ spec repository: 'github://pharo-ai/data-imputers/src' ].
```
If you are new to baselines and Metacello, check out this wonderful [Baselines](https://github.com/pharo-open-documentation/pharo-wiki/blob/master/General/Baselines.md) tutorial on Pharo Wiki.
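For context, the dependency lines above usually sit inside a complete baseline method. The following is a minimal sketch using the standard Metacello baseline layout; the class `BaselineOfMyProject` and the package `MyProject` are hypothetical placeholders for your own names, not part of this library.

```Smalltalk
BaselineOfMyProject >> baseline: spec
	<baseline>
	spec for: #common do: [
		"Declare the external dependency on the imputers"
		spec
			baseline: 'AIDataImputers'
			with: [ spec repository: 'github://pharo-ai/data-imputers/src' ].
		"MyProject is a hypothetical package that requires the dependency"
		spec package: 'MyProject' with: [ spec requires: #( 'AIDataImputers' ) ] ]
```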
## Quick Start
I can be used to fill the missing values of a collection like this:
```st
| collection |
collection := #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ).
AISimpleImputer new
	useMostFrequent;
	fit: collection;
	transform: collection "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"
```
I can also be used to fill missing values of a [`DataFrame`](https://github.com/PolyMathOrg/DataFrame):
```st
AISimpleImputer mostFrequent fitAndTransform: (DataFrame withRows: #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ))
```
## Simple Imputer
I am a simple imputer whose goal is to fill missing values in 2D collections.
Using me takes three steps. The first is to choose the value that will replace the missing values:
- `#useAverage` (Default value)
- `#useMedian`
- `#useMostFrequent`
- `#useConstant:`
Then use `#fit:` to let me compute the replacement values. Once that is done, `#statistics` returns those values.
Finally you can use `#transform:` to fill the missing values of a 2D collection.
Alternatively, use `#fitAndTransform:` to compute the replacement values from a collection and fill the missing values of that same collection in one call.
Example:
```st
| collection |
collection := #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ).
AISimpleImputer new
	useMostFrequent;
	fit: collection;
	statistics; "Returns the replacement values once the imputer is fitted, in this case #( 7 2 5 6 )"
	transform: collection "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"
```
or
```st
AISimpleImputer new
	useMostFrequent;
	fitAndTransform: #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ) "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"
```
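Because `#fit:` and `#transform:` are separate steps, the replacement values can be computed on one collection and applied to another. The following is a minimal sketch, assuming `#transform:` accepts any 2D collection with the same number of columns as the fitted one; the variable names `training` and `incoming` are illustrative only.

```st
| imputer training incoming |
training := #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ).
incoming := #( #( nil 4 nil 1 ) ).
imputer := AISimpleImputer new.
imputer
	useMostFrequent;
	fit: training.
"Fill the nils of incoming with the values computed from training"
imputer transform: incoming
```

Given the statistics `#( 7 2 5 6 )` reported for this training collection in the example above, the two `nil` cells of `incoming` would be expected to become 7 and 5.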
I can also be used with a [`DataFrame`](https://github.com/PolyMathOrg/DataFrame):
```st
AISimpleImputer new
	useMostFrequent;
	fitAndTransform: (DataFrame withRows: #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ))
```
It is also possible to change which value is treated as missing, in case you want to replace something other than `nil`:
```st
AISimpleImputer new
	useMostFrequent;
	missingValue: false;
	fitAndTransform: #( #( 7 2 5 6 ) #( 7 false 5 9 ) #( 10 2 false 6 ) ) "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"
```
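For intuition about what `#statistics` holds after fitting with `#useMostFrequent`, the per-column computation can be sketched in plain Pharo. This is only an illustration built on standard collection messages, not the library's actual implementation; ties are resolved however `Bag>>sortedCounts` happens to order them.

```st
| collection statistics |
collection := #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ).
statistics := (1 to: collection first size) collect: [ :index |
	| values |
	"Take one column and drop its missing values"
	values := (collection collect: [ :row | row at: index ])
		reject: [ :each | each isNil ].
	"sortedCounts answers count -> element associations, highest count first"
	values asBag sortedCounts first value ].
statistics "#( 7 2 5 6 )"
```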