Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/siboehm/awesome-learn-datascience
:chart_with_upwards_trend: Curated list of resources to help you get started with Data Science
List: awesome-learn-datascience
awesome awesome-list data-science lists machine-learning
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/siboehm/awesome-learn-datascience
- Owner: siboehm
- Created: 2017-07-02T19:52:42.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2022-11-29T12:31:06.000Z (about 2 years ago)
- Last Synced: 2024-05-20T04:39:40.978Z (7 months ago)
- Topics: awesome, awesome-list, data-science, lists, machine-learning
- Homepage:
- Size: 20.5 KB
- Stars: 634
- Watchers: 29
- Forks: 76
- Open Issues: 7
Metadata Files:
- Readme: README.md
- Contributing: contributing.md
- Code of conduct: code-of-conduct.md
Awesome Lists containing this project
- awesome - Tutorials
- awesome-projects - Tutorials
- more-awesome - Tutorials - Resources to help you get started with Data Science. (Data Science)
- lists - awesome-learn-datascience
- awesome-ai-ml-dl - Awesome Learn Datascience
- python_more_awesome - Awesome Learn Data Science
- awesome-cn - Experiments
- collection - awesome-learn-datascience
- Awesome-Web3 - Tutorials
- fucking-awesome - Tutorials
- awesomelist - awesome-learn-datascience
- fucking-lists - awesome-learn-datascience
- awesome-list - Tutorials
- awesome-collection - Tutorials
- awesome-cn - Tutorials
- awesome-awesome - Tutorials
- awesome-digital-scholarship - Awesome Learn Datascience - Curated list of resources to help you get started with Data Science (Related Awesome Lists)
- jimsghstars - siboehm/awesome-learn-datascience - :chart_with_upwards_trend: Curated list of resources to help you get started with Data Science (Others)
- awesome-research - Learn Data Science - 📈 Curated list of resources to help you get started with Data Science (Data Science and Data Visualization / Citation Visualization)
- ultimate-awesome - awesome-learn-datascience - :chart_with_upwards_trend: Curated list of resources to help you get started with Data Science. (Other Lists / PowerShell Lists)
README
# Data Science Tutorials & Resources for Beginners [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)
*If you want to know more about Data Science but don't know where to start, this list is for you!* :chart_with_upwards_trend:

No previous knowledge is required, but Python and statistics basics will definitely come in handy. These resources have been used successfully by many beginners at my local Data Science student group [ML-KA](http://ml-ka.de/).
## What is Data Science?
- ['What is Data Science?' on Quora](https://www.quora.com/What-is-data-science)
- [Explanation of important vocabulary](https://www.quora.com/What-is-the-difference-between-Data-Analytics-Data-Analysis-Data-Mining-Data-Science-Machine-Learning-and-Big-Data-1?share=1) - Differentiation of Big Data, Machine Learning, Data Science.
- [Data Science for Business (Book)](https://amzn.to/2voPJUi) - An introduction to Data Science and its use as a business asset.
- [Data Science Process: A Beginner’s Comprehensive Guide](https://www.scaler.com/blog/data-science-process/) - Emphasizes the practical technical skills needed throughout the data science process.

## Common Algorithms and Procedures
- [Supervised vs unsupervised learning](https://stackoverflow.com/questions/1832076/what-is-the-difference-between-supervised-learning-and-unsupervised-learning) - The two most common types of Machine Learning algorithms.
- [9 important Data Science algorithms and their implementation](https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.05-Naive-Bayes.ipynb)
- [Cross validation](https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.03-Hyperparameters-and-Model-Validation.ipynb) - Evaluate the performance of your algorithm/model.
- [Feature engineering](https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.04-Feature-Engineering.ipynb) - Transform the data so that models can make better predictions.
- [Scientific introduction to 10 important Data Science algorithms](http://www.cs.umd.edu/%7Esamir/498/10Algorithms-08.pdf)
- [Model ensemble: Explanation](https://www.analyticsvidhya.com/blog/2017/02/introduction-to-ensembling-along-with-implementation-in-r/) - Combine multiple models into one for better performance.

## Data Science using Python
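
The cross validation and model ensemble entries in the list above can be tried in a few lines of Python. This is a minimal sketch, assuming scikit-learn is installed; the iris dataset, the random forest model, and the choice of 5 folds are illustrative assumptions and are not taken from the linked tutorials.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# A random forest is an ensemble: it combines many decision trees.
model = RandomForestClassifier(n_estimators=50, random_state=0)

# 5-fold cross validation: train on 4/5 of the data, score on the
# held-out 1/5, and repeat so every sample is held out once.
scores = cross_val_score(model, X, y, cv=5)

print(scores)         # one accuracy value per fold
print(scores.mean())  # average accuracy across folds
```

Averaging the per-fold scores gives a more reliable estimate of performance than a single train/test split.
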
This list covers only Python, as many are already familiar with this language. For R, see [Data Science tutorials using R](https://github.com/ujjwalkarn/DataScienceR).

### General
- [O'Reilly Data Science from Scratch (Book)](https://amzn.to/2GSjjrK) - Data processing, implementation, and visualization with example code.
- [Coursera Applied Data Science](https://www.coursera.org/specializations/data-science-python) - Online Course using Python that covers most of the relevant toolkits.

### Learning Python
- [YouTube tutorial series by sentdex](https://www.youtube.com/watch?v=oVp1vrfL_w4&list=PLQVvvaa0QuDe8XSftW-RAxdo6OmaeL85M)
- [Interactive Python tutorial website](http://www.learnpython.org/)

### numpy
[numpy](http://www.numpy.org/) is a Python library which provides large multidimensional arrays and fast mathematical operations on them.

- [Numpy tutorial on DataCamp](https://www.datacamp.com/community/tutorials/python-numpy-tutorial#gs.h3DvLnk)
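
To give a feel for what "multidimensional arrays and fast mathematical operations" means in practice, here is a minimal sketch, assuming numpy is installed. The array values and variable names are made up for illustration.

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

scaled = a * 10           # element-wise, no explicit Python loop
col_sums = a.sum(axis=0)  # sum down each column
overall_mean = a.mean()   # mean of all six values

print(a.shape)
print(scaled)
print(col_sums)
print(overall_mean)
```

Operations such as `a * 10` apply to every element at once, which is usually both shorter and faster than looping over nested Python lists.
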
### pandas
[pandas](http://pandas.pydata.org/index.html) provides efficient data structures and analysis tools for Python. It is built on top of numpy.

- [Introduction to pandas](http://www.synesthesiam.com/posts/an-introduction-to-pandas.html)
- [DataCamp pandas foundations](https://www.datacamp.com/courses/pandas-foundations) - Paid course, but 30 free days upon account creation (enough to complete course).
- [Pandas cheatsheet](https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf) - Quick overview of the most important functions.

### scikit-learn
[scikit-learn](http://scikit-learn.org/stable/) is the most common library for Machine Learning and Data Science in Python.

- [Introduction and first model application](https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.02-Introducing-Scikit-Learn.ipynb)
- [Rough guide for choosing estimators](http://scikit-learn.org/stable/tutorial/machine_learning_map/)
- [Scikit-learn complete user guide](http://scikit-learn.org/stable/user_guide.html)
- [Model ensemble: Implementation in Python](http://machinelearningmastery.com/ensemble-machine-learning-algorithms-python-scikit-learn/)

### Jupyter Notebook
[Jupyter Notebook](https://jupyter.org/) is a web application for easy data visualisation and code presentation.

- [Downloading and running first Jupyter notebook](https://jupyter.org/install.html)
- [Example notebook for data exploration](https://www.kaggle.com/sudalairajkumar/simple-exploration-notebook-instacart)
- [Seaborn data visualization tutorial](https://elitedatascience.com/python-seaborn-tutorial) - Plot library that works great with Jupyter.

### Various other helpful tools and resources
- [Template folder structure for organizing Data Science projects](https://github.com/drivendata/cookiecutter-data-science)
- [Anaconda Python distribution](https://www.continuum.io/downloads) - Contains most of the important Python packages for Data Science.
- [Spacy](https://spacy.io/) - Open source toolkit for working with text-based data.
- [LightGBM gradient boosting framework](https://github.com/Microsoft/LightGBM) - Successfully used in many Kaggle challenges.
- [Amazon AWS](https://aws.amazon.com/) - Rent cloud servers for more time-consuming calculations (r4.xlarge server is a good place to start).

## Data Science Challenges for Beginners
Sorted by increasing complexity.

- [Walkthrough: House prices challenge](https://www.dataquest.io/blog/kaggle-getting-started/) - A guided walkthrough of a simple challenge on house prices.
- [Blood Donation Challenge](https://www.drivendata.org/competitions/2/warm-up-predict-blood-donations/) - Predict if a donor will donate again.
- [Titanic Challenge](https://www.kaggle.com/c/titanic) - Predict survival on the Titanic.
- [Water Pump Challenge](https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/) - Predict the operating condition of water pumps in Africa.

## More advanced resources and lists
- [Awesome Data Science](https://github.com/bulutyazilim/awesome-datascience)
- [Data Science Python](https://github.com/ujjwalkarn/DataSciencePython)
- [Machine Learning Tutorials](https://github.com/ujjwalkarn/Machine-Learning-Tutorials)

## Contribute
Contributions welcome! Read the [contribution guidelines](contributing.md) first.
## License
[![CC0](http://mirrors.creativecommons.org/presskit/buttons/88x31/svg/cc-zero.svg)](http://creativecommons.org/publicdomain/zero/1.0)
To the extent possible under law, Simon Böhm has waived all copyright and
related or neighboring rights to this work. Disclaimer: Some of the links are affiliate links.