https://github.com/aporia-ai/aporia-importer
🏋️♀️ Import inference data from Amazon S3, Azure Blob Storage, Google Cloud Storage and others to Aporia
- Host: GitHub
- URL: https://github.com/aporia-ai/aporia-importer
- Owner: aporia-ai
- License: MIT
- Created: 2021-06-13T13:32:06.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2022-09-07T15:07:59.000Z (about 3 years ago)
- Last Synced: 2025-01-31T06:11:17.709Z (9 months ago)
- Topics: amazon-s3, azure-blob-storage, csv, dask, google-cloud-storage, importer, parquet
- Language: Python
- Size: 282 KB
- Stars: 7
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# 🏋️♀️ Aporia Importer

A small utility to import ML production data from your cloud storage provider and monitor it using [Aporia's monitoring platform](https://www.aporia.com/).
## Installation
```
pip install "aporia-importer[all]"
```

If you only wish to install the dependencies for a specific cloud provider, you can use:
```
pip install "aporia-importer[s3]"
```

## Usage
```
aporia-importer /path/to/config.yaml
```

`aporia-importer` requires a config file as its parameter; see [Configuration](#configuration).
## Configuration
`aporia-importer` uses a YAML configuration file.
There are sample configurations in the [examples](./examples) directory.

Currently, the configuration requires defining a model version schema manually. The schema is a mapping of field names to field types (see [here](https://app.aporia.com/docs/getting-started/concepts/#field-types)). You can find more details [in our docs](https://app.aporia.com/docs/getting-started/integrate-your-ml-model/#step-3-create-model-version).
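As an illustration, the schema sections are plain YAML mappings. The sketch below uses hypothetical field names, with `numeric` and `categorical` as examples of the field types documented at the link above:

```yaml
features:
  age: numeric
  country: categorical
predictions:
  churn_probability: numeric
```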
The following table describes all of the configuration fields in detail:
| Field | Required | Description |
| ----- | -------- | ----------- |
| source | True | The path to the files you wish to upload, e.g. `s3://my-bucket/my_file.csv`. Glob patterns are supported. |
| format | True | The format of the files you wish to upload, see [here](#supported-data-formats). |
| token | True | Your Aporia authentication token. |
| environment | True | The environment in which Aporia will be initialized (e.g. production, staging). |
| model_id | True | The ID of the [model](https://app.aporia.com/docs/getting-started/concepts/#models) that the data is associated with. |
| model_version.name | True | A name for the [model version](https://app.aporia.com/docs/getting-started/concepts/#model-version-schema) to create. |
| model_version.type | True | The [type](https://app.aporia.com/docs/getting-started/concepts/#model-types) of the model (regression, binary, multiclass). |
| predictions | True | A mapping of [prediction fields](https://app.aporia.com/docs/getting-started/concepts/#predictions) to their field types. |
| features | True | A mapping of [feature fields](https://app.aporia.com/docs/getting-started/concepts/#features) to their field types. |
| raw_inputs | False | A mapping of [raw input fields](https://app.aporia.com/docs/getting-started/concepts/#raw-inputs) to their field types. |
| aporia_host | False | Aporia server URL. Defaults to `app.aporia.com`. |
| aporia_port | False | Aporia server port. Defaults to 443. |
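Putting the fields together, a complete config might look like the sketch below. All values are placeholders, and the nesting of `model_version` is inferred from the dotted field names in the table; the sample configurations in [examples](./examples) are the authoritative reference:

```yaml
source: s3://my-bucket/inferences/*.parquet
format: parquet
token: <your-aporia-token>
environment: production
model_id: my-model
model_version:
  name: v1
  type: binary
predictions:
  churn_probability: numeric
features:
  age: numeric
  country: categorical
```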
## Supported Data Sources

* Local files
* S3

## Supported Data Formats
* csv
* parquet

## How does it work?
`aporia-importer` uses [dask](https://github.com/dask/dask) to load data from various cloud providers, and the [Aporia SDK](https://app.aporia.com/docs/getting-started/integrate-your-ml-model/#step-2-initialize-the-aporia-sdk) to report the data to Aporia.
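As a rough sketch of that flow (a simplified illustration, not the project's actual code; the bucket path is hypothetical and the Aporia reporting call is elided because the SDK call signatures vary by version):

```python
import dask.dataframe as dd

# dask resolves cloud paths through fsspec, so s3://, gs:// and az://
# URLs all work when the matching extra (e.g. s3fs for S3) is installed.
# Glob patterns expand to every matching file.
df = dd.read_csv("s3://my-bucket/inferences/*.csv")  # or dd.read_parquet(...)

# Materialize one partition at a time instead of loading the whole
# dataset into memory, then hand each batch to the Aporia SDK.
for delayed_partition in df.to_delayed():
    batch = delayed_partition.compute()  # a regular pandas DataFrame
    # ... report `batch` to Aporia here via the SDK ...
```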