https://github.com/bigmlcom/python

Python bindings for BigML.io
Host: GitHub
URL: https://github.com/bigmlcom/python
Owner: bigmlcom
License: apache-2.0
Created: 2012-05-01T20:42:24.000Z (about 12 years ago)
Default Branch: master
Last Pushed: 2024-02-28T23:52:48.000Z (4 months ago)
Last Synced: 2024-02-29T14:26:28.110Z (4 months ago)
Topics: api, bigml, machine-learning, ml, python
Language: Python
Homepage: http://bigml.com/api
Size: 24.9 MB
Stars: 274
Watchers: 41
Forks: 225
Open Issues: 0
Metadata Files:
- Readme: README.rst
- Changelog: HISTORY.rst
- License: LICENSE
Lists

awesome-stars - bigmlcom/python - Python bindings for BigML.io (Python)
alex-mikhalev-awesome-stars - python - Simple Python bindings for BigML.io (Python)
awesome-python-machine-learning - BigML Python Bindings - These BigML Python bindings allow you to interact with BigML.io, the API for BigML. You can use it to easily create, retrieve, list, update, and delete BigML resources (i.e., sources, datasets, models and, predictions). (Uncategorized / Uncategorized)
my-awesome-stars - bigmlcom/python - Python bindings for BigML.io (Python)
README

BigML Python Bindings
=====================

`BigML `_ makes machine learning easy by taking care

of the details required to add data-driven decisions and predictive

power to your company. Unlike other machine learning services, BigML

creates

`beautiful predictive models `_ that

can be easily understood and interacted with.

These BigML Python bindings allow you to interact with
`BigML.io `_, the API
for BigML. You can use them to easily create, retrieve, list, update, and
delete BigML resources (i.e., sources, datasets, models, and
predictions). For additional information, see
the `full documentation for the Python
bindings on Read the Docs `_.

This module is licensed under the `Apache License, Version

2.0 `_.

Support

-------

Please report problems and bugs to our `BigML.io issue

tracker `_.

Discussions about the different bindings take place in the general

`BigML mailing list `_. Or join us

in our `Campfire chatroom `_.

Requirements

------------

Only ``Python 3`` versions are currently supported by these bindings.

Support for Python 2.7.X ended in version ``4.32.3``.

The basic third-party dependencies are the

`requests `_,

`unidecode `_,

`requests-toolbelt `_,

`bigml-chronos `_,

`msgpack `_,

`numpy `_ and

`scipy `_ libraries. These

libraries are automatically installed during the basic setup.

Support for Google App Engine has been added as of version 3.0.0,
using the ``urlfetch`` package instead of ``requests``.

The bindings will also use ``simplejson`` if you happen to have it
installed, but that is optional: we fall back to Python's built-in JSON
libraries if ``simplejson`` is not found.
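A fallback like the one described above is commonly written as a guarded
import. The following is a minimal sketch of that pattern, not the bindings'
actual code; the ``json`` alias is just a convention chosen for this example.

```python
# Prefer simplejson when it is installed; otherwise use the standard library.
try:
    import simplejson as json
except ImportError:
    import json

# Both modules expose the same basic API, so the calling code does not change.
decoded = json.loads('{"name": "my source"}')
print(decoded["name"])
```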

The bindings let you use the ``BigML`` platform to create, update,
get and delete resources, and also to produce local predictions using the
models created in ``BigML``. Most local models work with the basic
installation, but some additional dependencies are needed to use local
``Topic Models`` and Image Processing models. Please refer to the
`Installation <#installation>`_ section for details.

OS Requirements

~~~~~~~~~~~~~~~

The basic installation of the bindings is compatible with
Linux and Windows based operating systems.

However, the extra options that allow working with
image processing models (``[images]`` and ``[full]``) are only supported
and tested on Linux-based operating systems.
For image models, Windows is not recommended and cannot be supported out of
the box, because the specific compiler versions or DLLs required are
generally unavailable.

Installation

------------

To install the basic latest stable release with

`pip `_, please use:

.. code-block:: bash

    $ pip install bigml

Support for local Topic Distributions (Topic Models' predictions)
and local predictions for datasets that include images is only
available as extras, because the libraries they rely on are not
available on every operating system. If you need to support those,
please check the `Installation Extras <#installation-extras>`_ section.

Installation Extras

-------------------

Local Topic Distributions support can be installed using:

.. code-block:: bash

    pip install bigml[topics]

Images local predictions support can be installed using:

.. code-block:: bash

    pip install bigml[images]

The full set of features can be installed using:

.. code-block:: bash

    pip install bigml[full]

WARNING: Mind that installing these extras can require some extra work, as

explained in the `Requirements <#requirements>`_ section.

You can also install the development version of the bindings directly
from the Git repository:

.. code-block:: bash

    $ pip install -e git+https://github.com/bigmlcom/python.git#egg=bigml_python

Running the Tests

-----------------

The tests will be run using `pytest `_.

You'll need to set up your authentication

via environment variables, as explained

in the authentication section. Also some of the tests need other environment

variables like ``BIGML_ORGANIZATION`` to test calls when used by Organization

members and ``BIGML_EXTERNAL_CONN_HOST``, ``BIGML_EXTERNAL_CONN_PORT``,

``BIGML_EXTERNAL_CONN_DB``, ``BIGML_EXTERNAL_CONN_USER``,

``BIGML_EXTERNAL_CONN_PWD`` and ``BIGML_EXTERNAL_CONN_SOURCE``

in order to test external data connectors.
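Before launching the suite, it can help to check which of these variables are
actually set. The helper below is a hypothetical sketch and is not part of the
bindings or their tests; treating the two credential variables as required and
the rest as optional is an assumption made for this example.

```python
import os

# Variables named in the text above. Splitting them into required and
# optional groups is an assumption made for this example.
REQUIRED_VARS = ("BIGML_USERNAME", "BIGML_API_KEY")
OPTIONAL_VARS = (
    "BIGML_ORGANIZATION",
    "BIGML_EXTERNAL_CONN_HOST",
    "BIGML_EXTERNAL_CONN_PORT",
    "BIGML_EXTERNAL_CONN_DB",
    "BIGML_EXTERNAL_CONN_USER",
    "BIGML_EXTERNAL_CONN_PWD",
    "BIGML_EXTERNAL_CONN_SOURCE",
)


def missing_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


if __name__ == "__main__":
    print("Missing required variables:", missing_vars(REQUIRED_VARS))
    print("Missing optional variables:", missing_vars(OPTIONAL_VARS))
```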

With that in place, you can run the test suite simply by issuing

.. code-block:: bash

    $ pytest

Additionally, `Tox `_ can be used to

automatically run the test suite in virtual environments for all

supported Python versions.  To install Tox:

.. code-block:: bash

    $ pip install tox

Then run the tests from the top-level project directory:

.. code-block:: bash

    $ tox

Importing the module

--------------------

To import the module:

.. code-block:: python

    import bigml.api

Alternatively you can just import the BigML class:

.. code-block:: python

    from bigml.api import BigML

Authentication

--------------

All the requests to BigML.io must be authenticated using your username

and `API key `_ and are always

transmitted over HTTPS.

This module will look for your username and API key in the environment

variables ``BIGML_USERNAME`` and ``BIGML_API_KEY`` respectively.

Authentication on Unix and macOS
--------------------------------

You can

add the following lines to your ``.bashrc`` or ``.bash_profile`` to set

those variables automatically when you log in:

.. code-block:: bash

    export BIGML_USERNAME=myusername

    export BIGML_API_KEY=ae579e7e53fb9abd646a6ff8aa99d4afe83ac291

Refer to the following sections to learn how to do that on other operating systems.

With that environment set up, connecting to BigML is a breeze:

.. code-block:: python

    from bigml.api import BigML

    api = BigML()

Otherwise, you can pass your credentials directly when instantiating the BigML
class as follows:

.. code-block:: python

    api = BigML('myusername', 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291')

These credentials will allow you to manage any resource in your user

environment.
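The two ways of supplying credentials can be pictured as a simple lookup:
explicit arguments first, environment variables otherwise. The function below
is a hypothetical illustration of that idea, not the bindings' implementation.
Both ``resolve_credentials`` and the precedence it applies are assumptions.

```python
import os


def resolve_credentials(username=None, api_key=None, environ=None):
    """Return (username, api_key), preferring explicit arguments.

    Hypothetical helper: it mirrors the behaviour described above but is
    not part of the bindings.
    """
    environ = os.environ if environ is None else environ
    username = username or environ.get("BIGML_USERNAME")
    api_key = api_key or environ.get("BIGML_API_KEY")
    if not username or not api_key:
        raise ValueError(
            "Set BIGML_USERNAME and BIGML_API_KEY or pass both values explicitly."
        )
    return username, api_key
```

Failing early with a clear message, as this sketch does, is usually easier to
debug than a rejected request later on.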

In BigML a user can also work for an ``organization``.
In this case, the organization administrator must first assign
permissions for the user to access one or several particular projects
in the organization.
Once permissions are granted, the user can work with resources in a project
according to their permission level by creating a separate connection for
each project. The connection constructor in this case
should include the ``project ID``:

.. code-block:: python

    api = BigML('myusername', 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291',

                project='project/53739b98d994972da7001d4a')

If the project used in a connection object

does not belong to an existing organization but is one of the

projects under the user's account, all the resources

created or updated with that connection will also be assigned to the

specified project.

When the resource to be managed is a ``project`` itself, the connection
needs to include the corresponding ``organization ID``:

.. code-block:: python

    api = BigML('myusername', 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291',

                organization='organization/53739b98d994972da7025d4a')
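A mistyped project or organization ID is easy to miss, so a quick format
check before building the connection can help. The sketch below assumes that
IDs follow the ``<type>/<24 hexadecimal digits>`` shape seen in the examples
above. The ``resource_type`` helper is hypothetical and is not part of the
bindings.

```python
import re

# Assumed shape, inferred from the examples above: "<type>/<24 hex digits>".
RESOURCE_ID = re.compile(r"^(?P<type>[a-z_]+)/(?P<id>[a-f0-9]{24})$")


def resource_type(resource_id):
    """Return the type prefix of a well-formed ID, or None otherwise."""
    match = RESOURCE_ID.match(resource_id)
    return match.group("type") if match else None


print(resource_type("project/53739b98d994972da7001d4a"))
```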

Authentication on Windows

-------------------------

The credentials should be permanently stored in your system using

.. code-block:: bash

    setx BIGML_USERNAME myusername

    setx BIGML_API_KEY ae579e7e53fb9abd646a6ff8aa99d4afe83ac291

Note that ``setx`` will not change the environment variables of your current
console, so you will need to open a new one to start using them.

Authentication on Jupyter Notebook

----------------------------------

You can set the environment variables using the ``%env`` command in your

cells:

.. code-block:: bash

    %env BIGML_USERNAME=myusername

    %env BIGML_API_KEY=ae579e7e53fb9abd646a6ff8aa99d4afe83ac291
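In a plain Python script, where the ``%env`` magic is not available, the same
variables can be set through the standard library before the connection is
created. This is a minimal sketch using the placeholder credentials from the
examples above. The change only affects the current process and its children.

```python
import os

# Placeholder credentials, the same ones used in the examples above.
os.environ["BIGML_USERNAME"] = "myusername"
os.environ["BIGML_API_KEY"] = "ae579e7e53fb9abd646a6ff8aa99d4afe83ac291"

# Any connection created later in the same process can read these values.
print(os.environ["BIGML_USERNAME"])
```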

Alternative domains

-------------------

The main public domain for the API service is ``bigml.io``, but there are some
alternative domains, either for Virtual Private Cloud (VPC) setups or
the Australian subdomain (``au.bigml.io``). You can point the bindings
to your particular VPC domain either by setting the ``BIGML_DOMAIN`` environment
variable to your VPC subdomain:

.. code-block:: bash

    export BIGML_DOMAIN=my_VPC.bigml.io

or setting it when instantiating your connection:

.. code-block:: python

    api = BigML(domain="my_VPC.bigml.io")

The corresponding SSL REST calls will be directed to your private domain

henceforth.
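The domain selection described above can be pictured as a lookup with a
default. The function below is a hypothetical sketch, not the bindings' code.
In particular, giving the explicit argument precedence over ``BIGML_DOMAIN``
is an assumption.

```python
import os

# Main public domain, as stated above.
DEFAULT_DOMAIN = "bigml.io"


def resolve_domain(domain=None, environ=None):
    """Pick the explicit domain, then BIGML_DOMAIN, then the public default."""
    environ = os.environ if environ is None else environ
    return domain or environ.get("BIGML_DOMAIN") or DEFAULT_DOMAIN


print(resolve_domain("my_VPC.bigml.io"))
```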

You can also set up your connection to use a particular PredictServer

only for predictions. In order to do so, you'll need to specify a ``Domain``

object, where you can set up the general domain name as well as the

particular prediction domain name.

.. code-block:: python

    from bigml.domain import Domain

    from bigml.api import BigML

    domain_info = Domain(prediction_domain="my_prediction_server.bigml.com",

                         prediction_protocol="http")

    api = BigML(domain=domain_info)

Finally, you can combine all the options and change both the general domain

server, and the prediction domain server.

.. code-block:: python

    from bigml.domain import Domain

    from bigml.api import BigML

    domain_info = Domain(domain="my_VPC.bigml.io",

                         prediction_domain="my_prediction_server.bigml.com",

                         prediction_protocol="https")

    api = BigML(domain=domain_info)

Some arguments for the ``Domain`` constructor are less common, but they can also
be used to set your special service endpoints:

- ``protocol`` (string): protocol for the service
  (when different from HTTPS)
- ``verify`` (boolean): turns SSL verification on or off
- ``prediction_verify`` (boolean): turns SSL verification on or off
  for the prediction server (when different from the general
  SSL verification)
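As an illustration, the arguments listed above can be collected in a dictionary
and passed to ``Domain`` in one go. This is only a sketch: the keyword names
come from the list and the earlier examples, the values are placeholders, and
the ``connect`` helper is hypothetical. The import is deferred so that the
options can be inspected without the bindings installed.

```python
# Placeholder values; only the keyword names come from the text above.
domain_options = {
    "domain": "my_VPC.bigml.io",
    "protocol": "https",
    "verify": True,
    "prediction_domain": "my_prediction_server.bigml.com",
    "prediction_protocol": "https",
    "prediction_verify": False,
}


def connect(options):
    """Build a connection from a dictionary of Domain arguments."""
    # Imported here so the options above can be defined without bigml installed.
    from bigml.api import BigML
    from bigml.domain import Domain

    return BigML(domain=Domain(**options))
```

With credentials available in the environment, ``api = connect(domain_options)``
would then create the connection.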

**Note** that the previously existing ``dev_mode`` flag:

.. code-block:: python

    api = BigML(dev_mode=True)

that caused the connection to work with the Sandbox ``Development Environment``
has been **deprecated** because this environment no longer exists.

The existing resources that were previously

created in this environment have been moved

to a special project in the now unique ``Production Environment``, so this

flag is no longer needed to work with them.

Quick Start

-----------

Imagine that you want to use `this csv

file `_ containing the `Iris

flower dataset `_ to

predict the species of a flower whose ``petal length`` is ``2.45`` and

whose ``petal width`` is ``1.75``. A preview of the dataset is shown

below. It has 4 numeric fields: ``sepal length``, ``sepal width``,

``petal length``, ``petal width`` and a categorical field: ``species``.

By default, BigML considers the last field in the dataset as the

objective field (i.e., the field that you want to generate predictions

for).

::

    sepal length,sepal width,petal length,petal width,species

    5.1,3.5,1.4,0.2,Iris-setosa

    4.9,3.0,1.4,0.2,Iris-setosa

    4.7,3.2,1.3,0.2,Iris-setosa

    ...

    5.8,2.7,3.9,1.2,Iris-versicolor

    6.0,2.7,5.1,1.6,Iris-versicolor

    5.4,3.0,4.5,1.5,Iris-versicolor

    ...

    6.8,3.0,5.5,2.1,Iris-virginica

    5.7,2.5,5.0,2.0,Iris-virginica

    5.8,2.8,5.1,2.4,Iris-virginica

You can easily generate a prediction following these steps:

.. code-block:: python

    from bigml.api import BigML

    api = BigML()

    source = api.create_source('./data/iris.csv')

    dataset = api.create_dataset(source)

    model = api.create_model(dataset)

    prediction = api.create_prediction(
        model, {"petal width": 1.75, "petal length": 2.45})

You can then print the prediction using the ``pprint`` method:

.. code-block:: python

    >>> api.pprint(prediction)

    species for {"petal width": 1.75, "petal length": 2.45} is Iris-setosa

Any of the resources created in BigML can be configured using
several arguments described in the `API documentation `_.
These configuration arguments can be passed to the ``create`` methods
as a dictionary in the last, optional argument of each call:

.. code-block:: python

    from bigml.api import BigML

    api = BigML()

    source_args = {"name": "my source",

         "source_parser": {"missing_tokens": ["NULL"]}}

    source = api.create_source('./data/iris.csv', source_args)

    dataset_args = {"name": "my dataset"}

    dataset = api.create_dataset(source, dataset_args)

    model_args = {"objective_field": "species"}

    model = api.create_model(dataset, model_args)

    prediction_args = {"name": "my prediction"}

    prediction = api.create_prediction(
        model,
        {"petal width": 1.75, "petal length": 2.45},
        prediction_args)

The ``iris`` dataset has a small number of instances and is usually
created almost instantly, so the ``api.create_`` calls will probably return the
finished resources right away. However, BigML's API is asynchronous, so
in general you will need to ensure
that resources are finished before using them by calling ``api.ok``.

.. code-block:: python

    from bigml.api import BigML

    api = BigML()

    source = api.create_source('./data/iris.csv')

    api.ok(source)

    dataset = api.create_dataset(source)

    api.ok(dataset)

    model = api.create_model(dataset)

    api.ok(model)

    prediction = api.create_prediction(
        model, {"petal width": 1.75, "petal length": 2.45})

Note that the prediction
call is not followed by the ``api.ok`` method. Predictions are generated
so quickly that, unlike the
rest of the resources, they are created synchronously as finished objects.
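The create-then-wait steps above can be wrapped in a small helper so that a
resource that fails to finish is noticed immediately. This is a sketch under
one assumption: that ``api.ok`` returns a truthy value when the resource
finishes correctly and a falsy one otherwise. ``create_finished`` and
``train_model`` are hypothetical names, not part of the bindings.

```python
def create_finished(api, create_method, *args):
    """Create a resource and wait for it, raising if it does not finish.

    Assumption: ``api.ok`` returns a truthy value when the resource
    finishes correctly and a falsy one otherwise.
    """
    resource = create_method(*args)
    if not api.ok(resource):
        raise RuntimeError("Resource could not be finished: %r" % (resource,))
    return resource


def train_model(api, csv_path):
    """Run the source, dataset and model steps, waiting after each one."""
    source = create_finished(api, api.create_source, csv_path)
    dataset = create_finished(api, api.create_dataset, source)
    return create_finished(api, api.create_model, dataset)
```

With a connection such as ``api = BigML()``, calling
``train_model(api, './data/iris.csv')`` would return the finished model, ready
to be passed to ``api.create_prediction``.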

The example assumes that your objective field (the one you want to predict)
is the last field in the dataset. If that's not the case, you can explicitly
set the name of this field in the creation call using the ``objective_field``
argument:

.. code-block:: python

    from bigml.api import BigML

    api = BigML()

    source = api.create_source('./data/iris.csv')

    api.ok(source)

    dataset = api.create_dataset(source)

    api.ok(dataset)

    model = api.create_model(dataset, {"objective_field": "species"})

    api.ok(model)

    prediction = api.create_prediction(
        model, {'sepal length': 5, 'sepal width': 2.5})

You can also generate an evaluation for the model by using:

.. code-block:: python

    test_source = api.create_source('./data/test_iris.csv')

    api.ok(test_source)

    test_dataset = api.create_dataset(test_source)

    api.ok(test_dataset)

    evaluation = api.create_evaluation(model, test_dataset)

    api.ok(evaluation)

If you set the ``storage`` argument in the ``api`` instantiation:

.. code-block:: python

    api = BigML(storage='./storage')

all the generated, updated or retrieved resources will be automatically

saved to the chosen directory.

Alternatively, you can use the ``export`` method to explicitly

download the JSON information

that describes any of your resources in BigML to a particular file:

.. code-block:: python

    api.export('model/5acea49a08b07e14b9001068',

               filename="my_dir/my_model.json")

This example downloads the JSON for the model and stores it in

the ``my_dir/my_model.json`` file.
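Once exported, the file is plain JSON and can be read back with the standard
library. The helper below is a minimal sketch. ``load_exported`` is a
hypothetical name, and the example makes no assumption about the keys
stored in the file.

```python
import json


def load_exported(path):
    """Read a JSON file such as the one written by ``api.export``."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For instance, ``model_info = load_exported("my_dir/my_model.json")`` would load
the description downloaded in the previous example.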

For models that can be represented in ``PMML`` syntax, the
export method can be used to produce the corresponding ``PMML`` file.

.. code-block:: python

    api.export('model/5acea49a08b07e14b9001068',

               filename="my_dir/my_model.pmml",

               pmml=True)

You can also retrieve the last resource with some previously given tag:

.. code-block:: python

    api.export_last("foo",
                    resource_type="ensemble",
                    filename="my_dir/my_ensemble.json")

which selects the last ensemble that has a ``foo`` tag. This mechanism can
be especially useful when retrieving retrained models that have been created
with a shared unique keyword as a tag.

For a descriptive overview of the steps that you will usually need to

follow to model

your data and obtain predictions, please see the `basic Workflow sketch

`_

document. You can also check other simple examples in the following documents:

- `model 101 <101_model.html>`_

- `logistic regression 101 <101_logistic_regression.html>`_

- `linear regression 101 <101_linear_regression.html>`_

- `ensemble 101 <101_ensemble.html>`_

- `cluster 101 <101_cluster.html>`_

- `anomaly detector 101 <101_anomaly.html>`_

- `association 101 <101_association.html>`_

- `topic model 101 <101_topic_model.html>`_

- `deepnet 101 <101_deepnet.html>`_

- `time series 101 <101_ts.html>`_

- `fusion 101 <101_fusion.html>`_

- `scripting 101 <101_scripting.html>`_

Additional Information

----------------------

We've just barely scratched the surface. For additional information, see

the `full documentation for the Python

bindings on Read the Docs `_.

Alternatively, the same documentation can be built from a local checkout

of the source by installing `Sphinx `_

(``$ pip install sphinx``) and then running

.. code-block:: bash

    $ cd docs

    $ make html

Then launch ``docs/_build/html/index.html`` in your browser.

How to Contribute

-----------------

Please follow these steps:

  1. Fork the project on github.com.

  2. Create a new branch.

  3. Commit changes to the new branch.

  4. Send a `pull request `_.

For details on the underlying API, see the

`BigML API documentation `_.