https://github.com/bjpop/gurita
A convenient and expressive tool for data analytics and plotting on the command line
command-line data-analysis data-science pandas plotting python
- Host: GitHub
- URL: https://github.com/bjpop/gurita
- Owner: bjpop
- License: mit
- Created: 2019-12-20T06:54:10.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2024-01-11T07:19:59.000Z (about 2 years ago)
- Last Synced: 2024-01-12T16:51:59.107Z (about 2 years ago)
- Topics: command-line, data-analysis, data-science, pandas, plotting, python
- Language: Python
- Homepage:
- Size: 19.8 MB
- Stars: 4
- Watchers: 5
- Forks: 3
- Open Issues: 24
Metadata Files:
- Readme: README.md
- Contributing: docs/contributing.html
- License: LICENSE
README
# Gurita: a command line data analytics and plotting tool
Gurita is a command line tool for analysing and visualising tabular data in CSV or TSV format.
At its core Gurita provides a suite of commands, each of which carries out a common data analytics or plotting task.
**A unique and powerful feature of Gurita** is that commands can be chained together into flexible analysis pipelines. See the advanced example below.
It is designed to be fast and convenient, and is particularly suited to data exploration tasks. Input files with large numbers of rows (millions or more) are readily supported.
Gurita commands are highly customisable, but sensible defaults are applied, so simple tasks are easy to express and complex tasks are possible.
Gurita is implemented in [Python](http://www.python.org/) and makes extensive use of the [Pandas](https://pandas.pydata.org/), [Seaborn](https://seaborn.pydata.org/), and [Scikit-learn](https://scikit-learn.org/) libraries for data processing and plot generation.
# Documentation
Please consult the [Gurita Documentation](https://bjpop.github.io/gurita/index.html) for detailed information about installation and usage.
# Examples
### Simple example
Box plot of `sepal_length` for each species in the classic [iris dataset](https://github.com/mwaskom/seaborn-data/blob/master/iris.csv/):
```bash
cat iris.csv | gurita box -x species -y sepal_length
```
### Advanced example
The following example illustrates Gurita's ability to chain commands together.
Commands in a chain are separated by the plus sign (+) and data flows from left to right in the chain.
```bash
cat iris.csv | gurita filter 'species != "virginica"' \
+ sample 0.9 \
+ pca \
+ scatter -x pc1 -y pc2 --hue species
```
In this example there are 4 commands that are executed in the following order:
1. The ``filter`` command selects all rows where ``species`` is not equal to ``virginica``.
2. The filtered rows are then passed to the ``sample`` command which randomly selects 90% of the remaining rows.
3. The sampled rows are then passed to the ``pca`` command which performs principal component analysis (PCA) as a data reduction step, yielding two extra columns in the data called ``pc1`` and ``pc2``.
4. Finally, the PCA-transformed data is passed to the ``scatter`` command, which generates a scatter plot of ``pc1`` and ``pc2`` (the first two principal components), coloured by ``species``.
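To make the left-to-right data flow concrete, here is a minimal Python sketch of the chaining idea. It is not Gurita's actual implementation: the names `split_chain`, `run_chain` and `STAGES`, the toy `drop` and `first` stages, and the hand-written rows are all hypothetical and exist only for illustration. The sketch splits an argument list on `+` and feeds the output of each stage into the next.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Table = List[Row]


def split_chain(args: List[str]) -> List[List[str]]:
    """Split a flat argument list into one list per command, using '+' as separator."""
    commands: List[List[str]] = [[]]
    for arg in args:
        if arg == "+":
            commands.append([])
        else:
            commands[-1].append(arg)
    # Ignore empty commands, e.g. from a leading or trailing '+'.
    return [cmd for cmd in commands if cmd]


# Toy stages (hypothetical, not Gurita commands): each takes a table and
# its own arguments, and returns a new table.
def drop_value(table: Table, args: List[str]) -> Table:
    column, value = args
    return [row for row in table if row[column] != value]


def take_first(table: Table, args: List[str]) -> Table:
    return table[: int(args[0])]


STAGES: Dict[str, Callable[[Table, List[str]], Table]] = {
    "drop": drop_value,
    "first": take_first,
}


def run_chain(table: Table, args: List[str]) -> Table:
    """Apply each command in order, passing one stage's output to the next."""
    for name, *stage_args in split_chain(args):
        table = STAGES[name](table, stage_args)
    return table


# Hand-written rows in the style of the iris data (illustrative only).
rows = [
    {"species": "setosa", "sepal_length": "5.1"},
    {"species": "virginica", "sepal_length": "6.3"},
    {"species": "versicolor", "sepal_length": "7.0"},
    {"species": "setosa", "sepal_length": "4.9"},
]
result = run_chain(rows, ["drop", "species", "virginica", "+", "first", "2"])
print([row["species"] for row in result])
```

Because every stage has the same shape (table in, table out), stages can be combined in any order, which is the property that makes chained pipelines flexible.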
# Licence
This program is released as open source software under the terms of the [MIT License](https://raw.githubusercontent.com/bjpop/gurita/master/LICENSE).
# Authors
* [Bernie Pope](http://www.berniepope.id.au/)