https://github.com/blaze/blaze

NumPy and Pandas interface to Big Data

Host: GitHub
URL: https://github.com/blaze/blaze
Owner: blaze
License: bsd-3-clause
Created: 2012-10-26T14:25:22.000Z (over 13 years ago)
Default Branch: master
Last Pushed: 2023-09-29T10:03:58.000Z (over 2 years ago)
Last Synced: 2024-10-29T15:32:43.792Z (over 1 year ago)
Language: Python
Homepage: blaze.pydata.org
Size: 21.9 MB
Stars: 3,188
Watchers: 195
Forks: 392
Open Issues: 267
Metadata Files:
- Readme: README.rst
- License: LICENSE.txt

Awesome Lists containing this project

fintech-awesome-libraries - Blaze - NumPy and Pandas interface to Big Data. (Data Analysis)
favorite-link - NumPy and Pandas interface to Big Data.
awesome-python-data-science - blaze - NumPy and pandas interface to Big Data. (Data Manipulation / Data Frames)
awesome-python - Blaze - NumPy and Pandas interface to Big Data. (Data Analysis)
awesome-python-machine-learning-resources - GitHub - 33% open · ⏱️ 15.08.2019): (Data containers and structures)
awesome-vector-databases - Blaze - An emerging solution diversifying the options available to data engineers in the vector database landscape. `vector database` `emerging` `data engineering` (Vector Database Engines)
awesome-python-resources - GitHub - 33% open · ⏱️ 15.08.2019): (Scientific computing and data analysis)
awesome-machine-learning - Blaze - NumPy and Pandas interface to Big Data. (Python / General-Purpose Machine Learning)
python-awesome - Blaze - NumPy and Pandas interface to Big Data. (Data Analysis)
my-awesome-starred - blaze - NumPy and Pandas interface to Big Data (Python)
awesome-python - Blaze - NumPy and Pandas interface to Big Data (Data Analysis)
awesome-python-machine-learning - Blaze - Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. (Uncategorized / Uncategorized)
fucking-awesome-machine-learning - Blaze - NumPy and Pandas interface to Big Data. (Python / General-Purpose Machine Learning)
fucking-awesome-python - :octocat: Blaze - :star: 3154 :fork_and_knife: 390 - NumPy and Pandas interface to Big Data. (Data Analysis)
fucking_awesome_python - blaze - NumPy and Pandas interface to Big Data. (Science and Data Analysis)
awesome-etl - Blaze - "translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems." (Python / Libraries)
Awesome-Python - Blaze - NumPy and Pandas interface to Big Data. (Data Analysis)
Python-Awesome - Blaze - NumPy and Pandas interface to Big Data. (Data Analysis)
awesome-python - blaze - NumPy and Pandas interface to Big Data (Awesome Python / Data Analysis)
awesome-advanced-metering-infrastructure - Blaze - NumPy and Pandas interface to Big Data. (Python / General-Purpose Machine Learning)
awesome-python-data-science - blaze - NumPy and Pandas for databases. (Scientific)
git-github.com-vinta-awesome-python - Blaze - NumPy and Pandas interface to Big Data. (Data Analysis)

README

.. image:: https://raw.github.com/blaze/blaze/master/docs/source/svg/blaze_med.png
   :align: center

|Build Status| |Coverage Status| |Join the chat at https://gitter.im/blaze/blaze|

**Blaze** translates a subset of modified NumPy and Pandas-like syntax
to databases and other computing systems. Blaze gives Python users a
familiar interface to query data living in other data storage systems.

Example
=======

We point Blaze to a simple dataset in a foreign database (PostgreSQL).
Instantly we see results as we would see them in a Pandas DataFrame.

.. code:: python

    >>> import blaze as bz
    >>> iris = bz.Data('postgresql://localhost::iris')
    >>> iris
        sepal_length  sepal_width  petal_length  petal_width      species
    0            5.1          3.5           1.4          0.2  Iris-setosa
    1            4.9          3.0           1.4          0.2  Iris-setosa
    2            4.7          3.2           1.3          0.2  Iris-setosa
    3            4.6          3.1           1.5          0.2  Iris-setosa

These results occur immediately. Blaze does not pull data out of
Postgres; instead it translates your Python commands into SQL (or into
the language of another backend).
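
The translation step can be illustrated without Blaze. The following is
a minimal sketch, not Blaze's implementation: the helper
``distinct_sql``, the in-memory SQLite table, and its rows are all
assumptions made for illustration. It shows a column-level request being
turned into SQL text that the database itself executes.

.. code:: python

    import sqlite3


    def distinct_sql(table, column):
        """Return SQL text selecting the distinct values of one column."""
        # Hypothetical helper; assumes trusted table and column names.
        return 'SELECT DISTINCT "{0}" FROM "{1}" ORDER BY "{0}"'.format(column, table)


    # Made-up sample rows standing in for a table in a remote database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE iris (petal_length REAL, species TEXT)")
    conn.executemany(
        "INSERT INTO iris VALUES (?, ?)",
        [(1.4, "Iris-setosa"), (4.7, "Iris-versicolor"),
         (6.0, "Iris-virginica"), (1.3, "Iris-setosa")],
    )

    query = distinct_sql("iris", "species")
    # Only the distinct values travel back to Python, not the whole table.
    species = [row[0] for row in conn.execute(query)]
    print(query)
    print(species)

The point of the sketch is that the filtering work happens inside the
database; Python only builds the query and receives the small result.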

.. code:: python

    >>> iris.species.distinct()
               species
    0      Iris-setosa
    1  Iris-versicolor
    2   Iris-virginica

    >>> bz.by(iris.species, smallest=iris.petal_length.min(),
    ...                      largest=iris.petal_length.max())
               species  largest  smallest
    0      Iris-setosa      1.9       1.0
    1  Iris-versicolor      5.1       3.0
    2   Iris-virginica      6.9       4.5

This same example would have worked with a wide range of databases,
on-disk text or binary files, or remote data.
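
As a rough illustration of the same grouping running against a text
source, here is a sketch in pure Python using only the standard library.
The helper ``min_max_by`` and the inline CSV rows are assumptions for
illustration (only the extreme values from the output above are
included), and this is not how Blaze reads files.

.. code:: python

    import csv
    import io

    # Illustrative CSV text; a real file would be opened with open(path).
    CSV_TEXT = """petal_length,species
    1.0,Iris-setosa
    1.9,Iris-setosa
    3.0,Iris-versicolor
    5.1,Iris-versicolor
    4.5,Iris-virginica
    6.9,Iris-virginica
    """.replace("    ", "")


    def min_max_by(rows, key, value):
        """Group rows by `key` and track the (min, max) of `value` per group."""
        groups = {}
        for row in rows:
            v = float(row[value])
            lo, hi = groups.get(row[key], (v, v))
            groups[row[key]] = (min(lo, v), max(hi, v))
        return groups


    reader = csv.DictReader(io.StringIO(CSV_TEXT))
    summary = min_max_by(reader, "species", "petal_length")
    for name, (smallest, largest) in sorted(summary.items()):
        print(name, smallest, largest)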

What Blaze is not
=================

Blaze does not perform computation. It relies on other systems like SQL,
Spark, or Pandas to do the actual number crunching. It is not a
replacement for any of these systems.

Blaze does not implement the entire NumPy/Pandas API, nor does it
interact with libraries intended to work with NumPy/Pandas. This is the
cost of using more and larger data systems.

Blaze is a good way to inspect data living in a large database, perform
a small but powerful set of operations to query that data, and then
transform your results into a format suitable for your favorite Python
tools.

In the Abstract
===============

Blaze separates the computations that we want to perform:

.. code:: python

    >>> from blaze import Symbol, compute
    >>> accounts = Symbol('accounts', 'var * {id: int, name: string, amount: int}')
    >>> deadbeats = accounts[accounts.amount < 0].name

From the representation of data

.. code:: python

    >>> L = [[1, 'Alice',   100],
    ...      [2, 'Bob',    -200],
    ...      [3, 'Charlie', 300],
    ...      [4, 'Denis',   400],
    ...      [5, 'Edith',  -500]]

Blaze enables users to solve data-oriented problems

.. code:: python

    >>> list(compute(deadbeats, L))
    ['Bob', 'Edith']

But the separation of expression from data allows us to switch between
different backends.

Here we solve the same problem using Pandas instead of Pure Python.

.. code:: python

    >>> from pandas import DataFrame
    >>> df = DataFrame(L, columns=['id', 'name', 'amount'])
    >>> compute(deadbeats, df)
    1      Bob
    4    Edith
    Name: name, dtype: object

Blaze doesn't compute these results; it intelligently drives other
projects to compute them instead. These projects range from simple Pure
Python iterators to powerful distributed Spark clusters. Blaze is built
to be extended to new systems as they evolve.
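
The idea of one expression driving several backends can be sketched
with the standard library alone. This is an illustrative toy, not
Blaze's dispatch machinery: the ``NamesBelow`` expression type and the
``run`` function are hypothetical names, and ``functools.singledispatch``
is used here simply because it selects an implementation from the type
of the data.

.. code:: python

    import sqlite3
    from collections import namedtuple
    from functools import singledispatch

    # Hypothetical expression: "names of rows in `table` with amount < limit".
    NamesBelow = namedtuple("NamesBelow", ["table", "limit"])


    @singledispatch
    def run(data, expr):
        """Evaluate `expr` against `data`; the backend is chosen by type."""
        raise NotImplementedError("no backend for %r" % type(data))


    @run.register(list)
    def _run_list(data, expr):
        # Pure Python backend: iterate over the rows directly.
        return [name for _id, name, amount in data if amount < expr.limit]


    @run.register(sqlite3.Connection)
    def _run_sqlite(data, expr):
        # SQL backend: hand the filtering to the database.
        sql = 'SELECT name FROM "{}" WHERE amount < ? ORDER BY id'.format(expr.table)
        return [row[0] for row in data.execute(sql, (expr.limit,))]


    L = [[1, 'Alice', 100], [2, 'Bob', -200], [3, 'Charlie', 300],
         [4, 'Denis', 400], [5, 'Edith', -500]]

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER, name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", L)

    deadbeats = NamesBelow(table="accounts", limit=0)
    from_list = run(L, deadbeats)
    from_sql = run(conn, deadbeats)
    print(from_list, from_sql)

Supporting a new system in this toy means registering one more function
for its data type; the expression itself does not change.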

Getting Started
===============

Blaze is available on conda or on PyPI

::

    conda install blaze
    pip install blaze

Development builds are accessible

::

    conda install blaze -c blaze
    pip install git+https://github.com/blaze/blaze --upgrade
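
One way to confirm that an installation succeeded is to ask Python's
packaging metadata for the installed version. This is a small sketch
using the standard library's ``importlib.metadata``; the helper name
``installed_version`` is made up for illustration, and it assumes the
distribution is registered under the name ``blaze``.

.. code:: python

    from importlib import metadata


    def installed_version(package):
        """Return the installed version string, or None if not installed."""
        try:
            return metadata.version(package)
        except metadata.PackageNotFoundError:
            return None


    version = installed_version("blaze")
    print("blaze", version if version else "is not installed")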

You may want to view the docs, the tutorial, some blogposts, or the
mailing list archives.

Development setup
=================

The quickest way to install all Blaze dependencies with ``conda`` is as
follows

::

    conda install blaze spark -c blaze -c anaconda-cluster -y
    conda remove odo blaze blaze-core datashape -y

After running these commands, clone ``odo``, ``blaze``, and ``datashape`` from
GitHub directly.  These three projects release together.  Run
``python setup.py develop`` to make development installations of each.

License
=======

Released under BSD license. See ``LICENSE.txt`` for details.

Blaze development is sponsored by Continuum Analytics.

.. |Build Status| image:: https://travis-ci.org/blaze/blaze.png
   :target: https://travis-ci.org/blaze/blaze
.. |Coverage Status| image:: https://coveralls.io/repos/blaze/blaze/badge.png
   :target: https://coveralls.io/r/blaze/blaze
.. |Join the chat at https://gitter.im/blaze/blaze| image:: https://badges.gitter.im/Join%20Chat.svg
   :target: https://gitter.im/blaze/blaze?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge
