https://github.com/camara94/data-science-tools

# Data Science Tools
In this course you’ll receive one of the most comprehensive overviews of the open source and commercial tooling available for data science, along with the skills to use them.

## Introduction to Tools for Data Science
### Key Points
![keypoints](images/keypoint.png)
### Data is central to data science
![data](images/data.png)
### Data science requires programming
![datascience](images/datasciencerequire.png)
### Automation with Data Science Tooling
![automation](images/automation.png)
### Visual Programming & Modeling
![visual](images/visualprogramming.png)
### Open Source & Commercial Tools
![opensource](images/opensourceandcommercialtools.png)
### Data Science in the Cloud
![datascience](images/datascienceoncloud.png)
### Overview
![overview](images/overview.png)
## Course Overview
What are some of the most popular data science tools, how do you use them, and what are their features?

In this course, you'll learn about the day-to-day experiences of Data Scientists. You’ll be introduced to some of the programming languages commonly used, including Python, R, Scala, and SQL. You’ll work with the tools that professional Data Scientists work with, like Jupyter Notebooks, RStudio IDE, and others. You will learn about what each tool is used for, what languages they can execute, and their features and limitations. With the tools hosted in the cloud on Cognitive Class Labs, you will be able to use each tool and follow instructions to run simple code in Python, R, or Scala.
## Prerequisite
We have created this course so that anyone with basic computer skills can learn about the tools for data science. The only prerequisite for this course is your desire to learn.
## Changelog
* 08 Oct 2020 (Aije Egwaikhide): Re-ordered the course and divided the modules into 7 parts

* 01 Sept 2020: Updated version of the course published on edX.org.

* 01 Sept 2020 (Sonia Gupta): Replaced links to labs with links from SN Asset Library.

* 23 Mar 2020: Initial version of the course published on edX.org.

## Syllabus
* Module 0 - Welcome and Course Introduction
* Module 1 - Languages of Data Science
* Module 2 - Data Science Tools
* Module 3 - Packages, APIs, Data Sets and Models
* Module 4 - GitHub
* Module 5 - Jupyter Notebooks and JupyterLab
* Module 6 - RStudio IDE
* Module 7 - Watson Studio
## Module 1 - Languages of Data Science
### Languages of Data Science
### Which language should I learn?
![whichlanguage](images/whichlanguage.png)
### So many languages recommended in Data Science!
![manylanguage](images/manylanguage.png)
### So many popular languages!
![popularlanguage](images/popularlanguage.png)
### Roles in Data Science
![roleindatascience](images/roleindatascience.png)
### Lesson 1: Outline
![lesson1](images/lesson1.png)
### Introduction to Python
#### Diversity and Inclusion Efforts
![diversity](images/diversity.png)
![diversity](images/diversity2.png)
#### Who is Python for?
![whoispythonfor](images/whoispythonfor.png)
#### What makes Python great:
![whatmakespythongreat](images/whatmakespythongreat.png)
### Introduction to R Language
#### Open Source vs. Free Software
![opensourcevsfreesoftware](images/opensourcevsfreesoftware.png)
#### Back to the joys of R...
![rstudio](images/rstudio.png)
#### Who is R for?
![whoisrfor](images/whoisrfor.png)
#### What makes R great:
![what makes R great](images/whatmakesrgreat.png)
#### Global Communities
![globalcommunities](images/globalcommunities.png)
### Introduction to SQL
#### What is SQL?
![whatissql](images/whatissql.png)
#### Relational Databases
![relational databases](images/relationaldatabases.png)
#### SQL Elements
![sql elements](images/sqlelements.png)
#### What makes SQL great:
![what makes sql great](images/whatmakessqlgreat.png)
#### Many SQL Databases Available
![many sql database](images/manysqldatabases.png)
### Other Languages
![other languages](images/otherlanguages.png)
#### Java
![java](images/java1.png)
#### Scala
![scala](images/scala.png)
#### C++
![c++](images/cplusplus.png)
#### JS
![js](images/js.png)
#### Julia
![julia](images/julia.png)
## Module 2 - Data Science Tools
### Categories of Data Science Tools
#### Data Management
![datamanagement](images/datamanagement.png)
#### Data Integration and Transformation
![data integration and transformation](images/dataintegrationandtransformation.png)
#### Data Visualization
![data visualization](images/datavisualization.png)
#### Data Modeling
![data modeling](images/datamodeling.png)
#### Model Deployment
![modeldeployement](images/modeldeployement.png)
#### Model Monitoring and Assessment
![model monitoring](images/modelmonitoring.png)
#### Code Asset Management
![code asset](images/codeasset.png)
#### Data Asset Management
![dataassetmanagement](images/dataassetmanagement.png)
#### Development Environments
![development environment](images/developmentenvironment.png)
#### Execution Environments
![executionenvironment](images/executionenvironment.png)
#### Fully Integrated Visual Tools
![fully integrated visual tools](images/fullyintegratedvisualtools.png)
### Open Source Tools for Data Science - Part 1
#### Data Management Tools
![data management](images/dataassetmanagement2.png)
#### Data Integration and Transformation Tools
![data](images/dataintegrationandtransformation2.png)
#### Data Visualization Tools
![data visualization](images/datavisualization2.png)
#### Model Deployment Tools
![model deployement tools](images/modeldeployement2.png)
#### Model Monitoring and Assessment Tools
![model monitoring](images/modelmonitoring2.png)
#### Code Asset Management Tools
![code asset management tools](images/codeasset2.png)
#### Data Asset Management (or Data Governance) Tools
![data asset management tools](images/dataassetmanagementtools.png)
### Open Source Tools for Data Science - Part 2
#### Development Environment Tools
![env](images/allenvironment.png)
![other tools](images/othertools.png)
##### Jupyter Tools
![jupyter](images/jupyter.png)
![jupyter](images/jupyter2.png)
##### JupyterLab Tools
![jupyterlab](images/jupyterlab.png)
![jupyterlab](images/jupyterlab2.png)
#### Apache Zeppelin Notebook Tools
![apachezepplin](images/apachezepplin.png)
#### RStudio Tools
![rstudio](images/rstudio2.png)
#### Spyder Tools
![spider](images/spider.png)
#### Apache Spark
![apachespark](images/apachespark.png)
#### Apache Flink
![apacheflink](images/apacheflink.png)
* Flink is a stream processing engine, with its main focus on processing real-time data streams.
#### Ray
![ray](images/ray.png)
* Ray focuses on large-scale deep learning model training.
### Tools Requiring No Programming
![non-prog](images/noprogrammingtools.png)
#### KNIME
![kanime](images/kanime.png)
![knime](images/knime.png)
#### Orange
![orange](images/orange.png)
![orange](images/orange2.png)
### Commercial Tools for Data Science
#### Data Management Tools
![data management com](images/datamanagementcom.png)
#### Data Integration and Transformation Tools
![data inte](images/dataintegrationandtransformationcom.png)
#### Data Visualization Tools
![datavisualization](images/datavisualizationcom.png)
#### Model Building Tools
![model building](images/modelbuildingcom.png)

#### Model Deployment
![model deployement](images/modeldeployementcom.png)
#### Data Asset Tools
![dataassettools](images/dataassettoolscom.png)
#### Fully Integrated Visual Tools
![data](images/comtools.png)
##### Watson Studio and Watson OpenScale
![watson](images/watsonstudion.png)
![wat](images/watsonstudionandopenscale.png)
* Watson Studio combines Jupyter Notebooks with graphical tools to maximize data scientists' performance.
* Together with Watson OpenScale, it is fully integrated and covers the complete data science life cycle and all the tasks discussed previously.
#### Other Commercial Data Science Tools
##### H2O.ai
![h2oia](images/h2oia.png)
### Cloud-Based Tools for Data Science
![cloud](images/cloudtools1.png)
Cloud-based tools run on clusters composed of multiple server machines, which operate in the background, transparently to the user.
### Watson Studio, together with Watson OpenScale
![wos](images/wswos.png)
Watson Studio, together with Watson OpenScale, covers the complete development life cycle of all data science, machine learning, and AI tasks.
### Azure Machine Learning
![azure](images/azuremachinelearning.png)
Azure Machine Learning is a cloud-hosted offering supporting the complete development life cycle of all data science, machine learning, and AI tasks.
### H2O.ai
![h2oai](images/h2oiacloud.png)
### Data Management
![data](images/datam.png)
![cloudapp](images/clouapp.png)
#### Amazon DynamoDB
![d](images/dynamodb.png)
Amazon Web Services DynamoDB is a NoSQL database that allows storage and retrieval of data in key-value or document store format.
#### Cloudant
![cloudant](images/cloudant.png)
Cloudant is a database-as-a-service offering.
#### CouchDB
![couchdb](images/couchdb.png)
Cloudant is based on open source Apache CouchDB. Its advantage is that complex operational tasks such as updating, backup, restore, and scaling are done by the cloud provider under the hood. Because this offering is compatible with CouchDB, an application can be migrated to another CouchDB server without changing the application.
#### IBM Db2 as a Service
![db2](images/db2.png)
### Data Integration and Transformation
![etl](images/etlel.png)
When it comes to commercial data integration tools, we talk not only about "Extract, Transform, and Load", or "ETL" tools, but also about "Extract, Load, and Transform", or "ELT" tools.
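To make the ETL/ELT distinction concrete, here is a minimal sketch that uses Python's built-in `sqlite3` and `csv` modules as a stand-in for a real warehouse (an assumption; no commercial tool is involved). In the ETL path the rows are cleaned in Python before loading; in the ELT path the raw rows are loaded first and then transformed with SQL inside the target database. The sample data and the table names (`sales_etl`, `sales_raw`, `sales_elt`) are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data: amounts stored as text, names with stray whitespace.
RAW_CSV = """name,amount
  alice ,10.5
bob,  4
  carol,7.25
"""


def extract(text):
    """Extract: read raw rows (as dicts of strings) from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def etl(conn, text):
    """ETL: transform in Python first, then load only clean rows."""
    rows = [(r["name"].strip().title(), float(r["amount"])) for r in extract(text)]
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", rows)


def elt(conn, text):
    """ELT: load raw rows as-is, then transform inside the database with SQL."""
    raw = [(r["name"], r["amount"]) for r in extract(text)]
    conn.execute("CREATE TABLE sales_raw (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", raw)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT upper(substr(trim(name), 1, 1)) || lower(substr(trim(name), 2)) AS name,
               CAST(trim(amount) AS REAL) AS amount
        FROM sales_raw
        """
    )


conn = sqlite3.connect(":memory:")
etl(conn, RAW_CSV)
elt(conn, RAW_CSV)

etl_rows = conn.execute("SELECT name, amount FROM sales_etl ORDER BY name").fetchall()
elt_rows = conn.execute("SELECT name, amount FROM sales_elt ORDER BY name").fetchall()
print(etl_rows)  # [('Alice', 10.5), ('Bob', 4.0), ('Carol', 7.25)]
print(elt_rows)  # same rows, but the cleanup ran in SQL after loading
```

Both paths end with the same clean table; the difference is where the transformation runs. ELT keeps the untouched `sales_raw` table in the target system, which is why it suits warehouses with enough compute to transform data in place.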
#### IBM Data Refinery
![data data refinery](images/datarefinery.png)
Data Refinery enables transformation of large amounts of raw data into consumable, quality information in a spreadsheet-like user interface.
### Data Visualization
![dv](images/dv.png)
#### Watson Studio
![dvwa](images/dvwatson.png)
![dvwa](images/dvwatson2.png)
![dvwa](images/dvwatson3.png)
![dvwa](images/dvwatson4.png)
![dvwa](images/dvwatson5.png)
In Watson Studio, many different visualizations can be used to better understand data.
### Model building
![modelbuilding](images/modelbuiling.png)
### Model deployment
![deployement](images/modeldeployementcloud.png)
### Model Monitoring and Assessment
![monitoring](images/monitoring.png)
## Module 3 - Packages, APIs, Data Sets and Models
### Libraries for Data Science
![labrary](images/library.png)
#### Outline
![outline](images/outline.png)
#### Scientific Computing Libraries in Python

![scientifiq](images/scientifiquelib.png)
* The primary instrument of Pandas is a two-dimensional table consisting of columns and rows. This table is called a "DataFrame" and is designed to provide easy indexing so you can work with your data.
* NumPy is based on arrays, enabling you to apply mathematical functions to them. Pandas is actually built on top of NumPy.
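A short sketch of both ideas: a NumPy array with element-wise math, and a pandas DataFrame built from it that supports label-based, position-based, and boolean indexing. The station names and temperature values are made-up sample data.

```python
import numpy as np
import pandas as pd

# NumPy: a homogeneous n-dimensional array; math functions apply element-wise.
temps_c = np.array([12.0, 18.5, 21.0, 9.5])   # made-up sample values
temps_f = temps_c * 9 / 5 + 32                 # vectorized: no explicit Python loop
print(np.mean(temps_c))                        # 15.25

# pandas: a DataFrame is a 2-D table of rows and columns with labeled indexing.
df = pd.DataFrame(
    {"station": ["s1", "s2", "s3", "s4"], "temp_c": temps_c},
    index=["a", "b", "c", "d"],
)
df["temp_f"] = temps_f                         # add a column from a NumPy array

print(df.loc["b", "temp_f"])                   # label-based indexing (row "b")
print(df.iloc[0])                              # position-based indexing (first row)
print(df[df["temp_c"] > 15])                   # boolean filtering of rows
print(type(df["temp_c"].to_numpy()))           # the column's data as a NumPy array
```

The last line reflects the point above that pandas is built on NumPy: each numeric column can be handed back as a plain `ndarray` for further numerical work.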
### Visualization Libraries
Data visualization methods are a great way to communicate with others and show the meaningful results of an analysis. These libraries enable you to create graphs, charts and maps.
![visualizationlib](images/visualizationlibraries.png)
* Matplotlib (plots; most popular): the Matplotlib package is the most well-known data visualization library, and it's excellent for making graphs and plots.
* Seaborn is based on Matplotlib. Seaborn makes it easy to generate plots like heat maps, time series, and violin plots.
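A minimal Matplotlib sketch, assuming a headless environment (hence the non-interactive `Agg` backend): it draws a labeled line plot and saves it as PNG bytes in memory. The data is synthetic.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to a file/buffer, no display needed
import matplotlib.pyplot as plt

x = list(range(10))
y = [v ** 2 for v in x]

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(x, y, marker="o", label="y = x squared")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line plot")
ax.legend()

# Save to an in-memory PNG; use fig.savefig("plot.png") to write a file instead.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
png_bytes = buf.getvalue()
```

Because Seaborn builds on Matplotlib, its plotting functions draw onto the same Figure/Axes objects; for example, `seaborn.heatmap(data, ax=ax)` renders a heat map into an existing Matplotlib axes.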
### Machine Learning and Deep Learning Libraries in Python
![machinelearning libraries](images/machinelibraries.png)
* For machine learning, the Scikit-learn library contains tools for statistical modeling, including regression, classification, clustering and others. It is built on NumPy, SciPy and Matplotlib, and it's relatively simple to get started.
* For deep learning, Keras enables you to build standard deep learning models. Like Scikit-learn, its high-level interface enables you to build models quickly and simply. It can function using graphics processing units (GPUs), but for many deep learning cases a lower-level environment is required.
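As a sketch of how simple it is to get started with Scikit-learn, the example below fits a linear regression to synthetic data (generated here as y = 3x + 2 plus noise, an assumption for illustration) and scores it on a held-out split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (hypothetical): y = 3x + 2 plus a little Gaussian noise.
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(100, 1))        # scikit-learn expects a 2-D feature matrix
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=100)

# Hold out 20% of the rows to evaluate the model on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)                  # learn coefficient and intercept
r2 = model.score(X_test, y_test)             # R^2 on the held-out data

print(model.coef_[0], model.intercept_, r2)
```

The learned coefficient and intercept should land close to 3 and 2. Most Scikit-learn estimators share this `fit`/`predict`/`score` interface, so swapping in, say, `DecisionTreeRegressor` requires no other changes.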

### Deep Learning Libraries in Python
![deeplearning](images/deeplearninglibraries.png)
* TensorFlow is a low-level framework used in the large-scale production of deep learning models. It is designed for production but can be unwieldy for experimentation.
* PyTorch is used for experimentation, making it simple for researchers to test their ideas.
### Apache Spark
Apache Spark is a general-purpose cluster-computing framework that enables you to process data using compute clusters. This means that you process data in parallel, using multiple computers simultaneously.
![spark](images/sparklibrary.png)
* Spark offers functionality similar to that of Pandas, NumPy, and Scikit-learn.
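The core idea behind Spark's parallelism (split the data into partitions, process each partition independently, then combine the partial results) can be sketched on a single machine with Python's standard-library `concurrent.futures`. This is only an analogy and not Spark's API: the worker threads below stand in for Spark executors running on separate cluster machines, and the word-count task and input lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical input data.
lines = [
    "spark processes data in parallel",
    "data is split into partitions",
    "each partition is processed in parallel",
    "partial results are combined",
]


def partition(items, n):
    """Split items into n roughly equal chunks (analogous to Spark partitions)."""
    return [items[i::n] for i in range(n)]


def count_words(chunk):
    """Map step: count words within one partition, independently of the others."""
    return Counter(word for line in chunk for word in line.split())


def word_count(items, workers=2):
    """Run the map step on each partition concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, partition(items, workers)))
    # Reduce step: merge the per-partition counts into one total.
    return reduce(lambda a, b: a + b, partials, Counter())


counts = word_count(lines)
print(counts.most_common(3))
```

The result does not depend on how many partitions are used. In PySpark the same job would typically be written with RDD operations such as `flatMap` and `reduceByKey`, or with a DataFrame `groupBy`, and Spark would take care of partitioning and distributing the work across the cluster.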
### Spark Data Processing
For Spark data processing, you can use Python, R, Scala, or SQL.
![spark](images/sparkdataprocessing.png)
### Scala-Libraries
![scalalib](images/scalalibraries.png)
* Vegas is a Scala library for statistical data visualizations. With Vegas, you can work with data files as well as Spark DataFrames.
* For deep learning, you can use BigDL.
### R-Libraries
![rlib](images/rlibraries.png)
R has built-in functions for machine learning and data visualization. There are also several complementary libraries:
* ggplot2 is a popular library for data visualization.
* Interfaces to Keras and TensorFlow are available.
* R has been the de facto standard for open source data science, but it is now being superseded by Python.
### Application Programming Interfaces (API)
#### Outline
![outline](images/outline2.png)
#### API ?
![api](images/api.png)
![api](images/api2.png)
![api](images/api3.png)
#### REST APIs
![restapi](images/resapi.png)
![restapi](images/resapi2.png)
#### REST APIs Interaction
![restapiinter](images/resapiinteract.png)
### Data Sets - Powering Data Science
#### What's a data set
![dataset](images/dataset.png)
#### Data Ownership
![ownership](images/ownership.png)
#### Where to find open data
![finddata](images/wheretofinddata.png)
#### Community Data License Agreement
![dataagreement](images/dataagreement.png)
### Data Asset Exchange
![dax](images/dax.png)
### Getting started with data sets
![data](images/starteddata.png)
### Exploring a data set in Watson Studio
![exploredata](images/exploredata.png)
### Machine Learning Models
#### What is a model?
![mlmodel](images/mlmodel.png)
#### Supervised Learning
![supervised](images/supervisedlearning.png)
#### Unsupervised Learning
![unsupervised](images/unsupervisedlearning.png)
#### Reinforcement Learning
![reinforcement](images/reinforcementlearning.png)
### Deep Learning
![deeplearning](images/deeplearning.png)
### Deep Learning Models
![deeplearning](images/deeplearningmodel.png)
### Using models to solve a problem
![problemsolve](images/problemsolve.png)
### The Model Asset Exchange
#### MAX reduces time to value
![maxreducetime](images/maxreducestime.png)
#### MAX model-serving microservice
![model-serving](images/model-serving.png)
#### MAX model-serving microservice API
![model-serving](images/model-servingapi.png)
#### Prediction request handling
![requesthandling](images/requesthandling.png)
#### Summary
![summary](images/summary.png)
## Module 4 - GitHub
### Overview of Git/GitHub
#### Version Control
A version control system allows you to keep track of changes to your documents.
This makes it easy for you to recover older versions of your documents if you make a mistake, and it makes collaboration with others much easier.
#### Working without Version Control
![working without version](images/workingwithoutversioncontrols.png)
#### Working with Version Control
![working with version](images/workingwithversioncontrols.png)
#### Git
Git is free and open source software distributed under the GNU General Public License.
![git](images/git.png)
#### GitHub
GitHub is one of the most popular web-hosted services for Git repositories.
![github](images/github.png)
#### Short Glossary of Terms
![terms](images/terms.png)
#### Basic Git Commands
![basic](images/basicsofgithub.png)
#### https://try.github.io
![try](images/trygit.png)
### GitHub - Part 1
#### Repository
![repository](images/repository.png)
#### Staging
![staging](images/staging.png)
#### Remote Repository
![remote](images/remote.png)
## Module 5 - Jupyter Notebooks and JupyterLab
### Getting Started with Jupyter Notebooks
![jupy](images/jup1.png)
![jupy](images/jup2.png)
### Jupyter Architecture
![juparch](images/juparch.png)
### Limitations of Jupyter
![lim](images/limitationofjup.png)
### Solution of Jupyter
![solution](images/solutionjup.png)
### Architecture Diagram
![arc](images/architecturediagram.png)
### Resource
[Reading: Jupyter Notebooks on the Internet](https://courses.cognitiveclass.ai/courses/course-v1:IBMDeveloperSkillsNetwork+DS0105EN+v2/courseware/52427e36f14d4f4a9801c5f741e6c9c8/e834452ac5bd4b5e9ddd4e96c41f163f/?child=first)