https://github.com/osl-pocs/skdata
Python tools for data analysis
- Host: GitHub
- URL: https://github.com/osl-pocs/skdata
- Owner: osl-pocs
- License: mit
- Created: 2016-08-14T20:28:05.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2019-05-21T14:03:59.000Z (over 6 years ago)
- Last Synced: 2025-04-23T06:49:22.472Z (9 months ago)
- Topics: data, data-analysis, data-science, open-data, python
- Language: Jupyter Notebook
- Size: 387 KB
- Stars: 19
- Watchers: 4
- Forks: 2
- Open Issues: 5
Metadata Files:
- Readme: README.rst
- Changelog: HISTORY.rst
- Contributing: CONTRIBUTING.rst
- License: LICENSE
===============================
SciKit Data
===============================
.. image:: https://img.shields.io/pypi/v/scikit-data.svg
    :target: https://pypi.python.org/pypi/scikit-data

.. image:: https://img.shields.io/travis/OpenDataScienceLab/skdata.svg
    :target: https://travis-ci.org/OpenDataScienceLab/skdata

.. image:: https://readthedocs.org/projects/skdata/badge/?version=latest
    :target: https://skdata.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

Conda package current release info
==================================
.. image:: https://anaconda.org/conda-forge/scikit-data/badges/version.svg
    :target: https://anaconda.org/conda-forge/scikit-data
    :alt: Anaconda-Server Badge

.. image:: https://anaconda.org/conda-forge/scikit-data/badges/downloads.svg
    :target: https://anaconda.org/conda-forge/scikit-data
    :alt: Anaconda-Server Badge

About SciKit Data
=================
The purpose of this library is to make the data analysis process easier and more automated.
General objectives:
* reduce boilerplate code;
* reduce time spent on data analysis tasks; and
* offer a reproducible data analysis workflow.
Data analysis tasks generally involve a lot of boilerplate code that could be replaced by reproducible mechanisms and simple data visualization methods. Another concern is data publishing: many data analysts are unaware of open data repositories or do not consider them when communicating their scientific workflow.
Specific objectives:
* optimize data visualization;
* integrate with open data repositories to publish data; and
* support reproducible data analysis tasks through storing and recovery operations.
SkData should integrate with the pandas library (Python).
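One possible reading of the reproducibility objective is to record each analysis step as data, store that record, and replay it later. The sketch below illustrates the idea only; it is not SkData's API. The function names, the ``STEPS`` registry, and the log format are all hypothetical, and plain lists of dictionaries stand in for a pandas DataFrame to keep the example dependency-free.

```python
import json


# Hypothetical cleaning steps; these names are illustrative, not SkData's API.
def drop_missing(rows, column):
    """Keep only rows where `column` is present and not None."""
    return [r for r in rows if r.get(column) is not None]


def rename(rows, old, new):
    """Return copies of the rows with key `old` renamed to `new`."""
    return [{(new if k == old else k): v for k, v in r.items()} for r in rows]


# Registry mapping step names (as stored in the log) to functions.
STEPS = {"drop_missing": drop_missing, "rename": rename}


def replay(rows, log_json):
    """Apply every step recorded in a JSON log, in order."""
    for entry in json.loads(log_json):
        rows = STEPS[entry["step"]](rows, **entry["params"])
    return rows


# Record the analysis as data so it can be stored and re-run later.
log_json = json.dumps([
    {"step": "drop_missing", "params": {"column": "age"}},
    {"step": "rename", "params": {"old": "age", "new": "age_years"}},
])

raw = [{"name": "Ann", "age": 31}, {"name": "Bob", "age": None}]
cleaned = replay(raw, log_json)
print(cleaned)
```

Because the log is plain JSON, it can be saved next to the data and replayed on a fresh copy of the raw input, which yields the same result each time.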
Books used as reference to guide this project:
----------------------------------------------
- https://www.packtpub.com/big-data-and-business-intelligence/clean-data
- https://www.packtpub.com/big-data-and-business-intelligence/python-data-analysis
- https://www.packtpub.com/big-data-and-business-intelligence/mastering-machine-learning-scikit-learn
- https://www.packtpub.com/big-data-and-business-intelligence/practical-data-analysis-second-edition
Some other materials used as reference:
---------------------------------------
- https://github.com/rsouza/MMD/blob/master/notebooks/3.1_Kaggle_Titanic.ipynb
- https://github.com/agconti/kaggle-titanic/blob/master/Titanic.ipynb
- https://github.com/donnemartin/data-science-ipython-notebooks/blob/master/kaggle/titanic.ipynb
Installing scikit-data
======================
Using conda
-----------
To install `scikit-data` from the `conda-forge` channel, first add `conda-forge` to your channels:

.. code-block:: console

    $ conda config --add channels conda-forge

Once the `conda-forge` channel has been enabled, `scikit-data` can be installed with:

.. code-block:: console

    $ conda install scikit-data

To list all of the versions of `scikit-data` available on your platform, run:

.. code-block:: console

    $ conda search scikit-data --channel conda-forge

Using pip
---------
To install scikit-data, run this command in your terminal:

.. code-block:: console

    $ pip install scikit-data

If you don't have `pip`_ installed, this `Python installation guide`_ can guide
you through the process.
.. _pip: https://pip.pypa.io
.. _Python installation guide: http://docs.python-guide.org/en/latest/starting/installation/
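After installing, a quick way to confirm that the package is visible to the current Python environment is to ask the standard library for its installed version. This is a minimal sketch: it assumes the distribution is registered under the name ``scikit-data`` (the name used by the PyPI and conda-forge badges above) and requires Python 3.8 or later. The helper ``installed_version`` is hypothetical and not part of the library.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "scikit-data" is assumed to be the distribution name; adjust if it differs.
print(installed_version("scikit-data"))
```

If this prints ``None``, the package is not installed in the environment that is running the script.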
More Information
----------------
* License: MIT
* Documentation: https://skdata.readthedocs.io
References
----------
* CUESTA, Hector; KUMAR, Sampath. Practical Data Analysis. Packt Publishing Ltd, 2016.
**Electronic materials**
* [1] http://www.datasciencecentral.com/profiles/blogs/introduction-to-outlier-detection-methods