https://github.com/ossu/data-science
📊 Path to a free self-taught education in Data Science!
- Host: GitHub
- URL: https://github.com/ossu/data-science
- Owner: ossu
- License: other
- Created: 2016-06-19T15:15:36.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2025-04-30T05:07:39.000Z (11 months ago)
- Last Synced: 2025-05-06T17:13:27.885Z (10 months ago)
- Homepage:
- Size: 926 KB
- Stars: 19,686
- Watchers: 975
- Forks: 3,612
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
- awesome-repositories - ossu/data-science - 📊 Path to a free self-taught education in Data Science! (Others)
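- StarryDivineSky - ossu/data-science - The science project is a free, open self-study curriculum in data science that aims to give beginners with no prior background a systematic knowledge framework and a practical path. It brings together high-quality open-source resources from around the world in a structured way, covering a complete learning path from mathematical foundations to advanced algorithms, with particular emphasis on consolidating theory through hands-on projects. The curriculum is divided into core modules (such as Python programming, statistics, and machine learning) and advanced tracks (such as deep learning and big data processing). Each stage comes with selected textbooks, video tutorials, and programming exercises, and learners can combine the material freely according to their own pace. The project is distinguished by its modular design and practical orientation: each technical topic comes with concrete implementation examples (such as data visualization with Jupyter Notebook) and requires completing the accompanying programming exercises to check mastery. The curriculum also includes several real-world project cases (such as building a recommender system or analyzing social media data) that help learners turn theory into practical skill. All learning resources are open-sourced on GitHub, learners can freely download or contribute content, and the community keeps the course materials updated to stay current. The project suits self-learners who want to study data science systematically at low cost, especially those who have basic programming skills but lack a structured learning path; through staged study and project practice, they can eventually reach the level of completing data science projects independently. (A01_Machine Learning Tutorials)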
- awesome - ossu/data-science - 📊 Path to a free self-taught education in Data Science! (Others)
- awesome-ai-data-github-repos - Open Source Society University: Path to a free self-taught Education in Data Science
- awesome-web-resources - GitHub – ossu/data-science: Path to a free self-taught education in Data Science! - taught education in Data Science! – ossu/data-science. (Learn / Proofreading and Checking)
- awesome-learning-collections - Open Source Society University Data Science Path - Teach yourself Data Science without spending a dime! (Data Science)
- A-to-Z-Resources-for-Students - Self Taught Path for Data Science
- awesome-data-analysis - OSSU Data Science - Open Source Society University's self-study path. (🏆 Awesome Data Science Repositories)
- data-science-awesome-reference - Path to a free self-taught education in Data Science! by Open Source Society University
- jimsghstars - ossu/data-science - 📊 Path to a free self-taught education in Data Science! (Others)
README
## Contents
- [About](#about)
- [Curricular Guideline](#curricular-guideline)
- [How to use this guide](#how-to-use-this-guide)
- [Community](#community)
- [Prerequisites](#prerequisites)
- [Curriculum](#curriculum)
- [How to contribute](#how-to-contribute)
- [Code of conduct](#code-of-conduct)
- [Team](#team)
## About
This is a path for those of you who want to complete the Data Science undergraduate curriculum on your own time, **for free**, with courses from the **best universities** in the world.
In our curriculum, we give preference to MOOC (Massive Open Online Course) style courses because these courses were created with our style of learning in mind.
## Curricular Guideline
OSSU Data Science uses the report [Curriculum Guidelines for Undergraduate Programs in Data Science](https://www.amstat.org/asa/files/pdfs/EDU-DataScienceGuidelines.pdf) as our guide for course recommendation.
## How to use this guide
### Duration
It is possible to finish within about 2 years if you plan carefully and devote roughly 20 hours/week to your studies. Learners can use [this spreadsheet](https://docs.google.com/spreadsheets/d/1TEGSUQDFuWL3TYNjiM8G3esly-tKOcgHSDABt92mzdA/copy) to estimate their end date. Make a copy and input your start date and expected hours per week in the `Timeline` sheet. As you work through courses, you can enter your actual course completion dates in the `Curriculum Data` sheet and get updated completion estimates.
> **Warning:** While the spreadsheet is a useful tool to estimate the time you need to complete this curriculum, it may not be up-to-date with the curriculum. Use the spreadsheet only to estimate the time you need. Use [the GitHub repo](https://github.com/ossu/data-science) to see what courses to do.
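If you would rather check the arithmetic yourself, the estimate behind the spreadsheet can be approximated in a few lines of Python. This is only a rough sketch, not the spreadsheet's actual formula: the function name `estimate_end_date` is made up for illustration, and the total of 2,080 hours is an assumption back-derived from "about 2 years at roughly 20 hours/week" (2 × 52 × 20), not an official figure for the curriculum.

```python
import math
from datetime import date, timedelta


def estimate_end_date(start: date, total_hours: float, hours_per_week: float) -> date:
    """Project a finish date from a total workload and a steady weekly pace."""
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    # Round up: a partially used week still pushes the end date out by a week.
    weeks_needed = math.ceil(total_hours / hours_per_week)
    return start + timedelta(weeks=weeks_needed)


# Assumption: total workload implied by "about 2 years at roughly 20 hours/week".
ASSUMED_TOTAL_HOURS = 2 * 52 * 20  # 2,080 hours

start = date(2025, 1, 6)
print(estimate_end_date(start, ASSUMED_TOTAL_HOURS, 20))  # 2027-01-04
print(estimate_end_date(start, ASSUMED_TOTAL_HOURS, 10))  # 2029-01-01
```

Halving your weekly hours doubles the number of weeks, so the pace you can actually sustain matters more than the start date.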
### Order of the classes
Some courses can be taken in parallel, while others must be taken sequentially. All of the courses within a topic should be taken in the order listed in the curriculum. The graph below demonstrates how topics should be ordered.
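One way to think about "parallel" versus "sequential" is as a dependency graph that you walk in topological order. The sketch below uses Python's standard `graphlib` module to group topics into batches that could be studied side by side. The `prerequisites` mapping is hypothetical and exists only to illustrate the idea; the authoritative ordering is the graph in the GitHub repo.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisites, for illustration only.
# Each topic maps to the set of topics that must be finished first.
prerequisites = {
    "Introduction to Computer Science": {"Introduction to Data Science"},
    "Data Structures and Algorithms": {"Introduction to Computer Science"},
    "Single Variable Calculus": set(),
    "Multivariable Calculus": {"Single Variable Calculus"},
}


def study_batches(prereqs):
    """Return topics grouped into batches; topics in one batch can run in parallel."""
    sorter = TopologicalSorter(prereqs)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


for number, batch in enumerate(study_batches(prerequisites), start=1):
    print(f"Batch {number}: {', '.join(batch)}")
```

Under these assumed prerequisites, the two topics with nothing before them land in the first batch together, and each later batch unlocks once the previous one is done.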

### Track your progress
[Fork](https://www.freecodecamp.org/news/how-to-fork-a-github-repository/) the [GitHub repo](https://github.com/ossu/data-science) into your own GitHub account and put ✅ next to the stuff you've completed as you complete it. This can serve as your [kanban board](https://en.wikipedia.org/wiki/Kanban_board) and will be faster to implement than any other solution (giving you time to spend on the courses).
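If you want a quick tally of how far along you are, a small script can count the ✅ marks in your fork. This is a minimal sketch under one assumption: each course sits on its own line as a Markdown link, with the ✅ placed somewhere on that line. The function name `count_progress` and the sample text are illustrative, so adjust the matching rule if your copy of the curriculum is laid out differently (for example, as tables).

```python
CHECK = "✅"


def count_progress(markdown_text: str) -> tuple[int, int]:
    """Return (completed, total) for lines that look like course links."""
    done = total = 0
    for line in markdown_text.splitlines():
        # Ignore the check mark itself when deciding whether this is a course line.
        stripped = line.replace(CHECK, "").strip()
        if stripped.startswith("[") and "](" in stripped:
            total += 1
            if CHECK in line:
                done += 1
    return done, total


sample = """\
### Introduction to Data Science
✅ [What is Data Science](https://www.coursera.org/learn/what-is-datascience)
### Linear Algebra
[Essence of Linear Algebra](https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab) ✅
[Linear Algebra](https://ocw.mit.edu/courses/mathematics/18-06sc-linear-algebra-fall-2011/)
"""

done, total = count_progress(sample)
print(f"{done}/{total} courses completed")  # 2/3 courses completed
```

To run it against your own fork, read the file with `open("README.md", encoding="utf-8").read()` and pass the result to `count_progress`.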
### Which programming languages should I use?
Python and R are heavily used in the Data Science community, and our courses teach you both. Remember, the important thing for each course is to internalize the core concepts and to be able to use them with whatever tool (programming language) you wish.
### Content Policy
You must share only files that you are allowed to share. **Do NOT disrespect the code of conduct** that you sign at the beginning of your courses.
## Community
We have a Discord server! This should be your first stop to talk with other OSSU students. [Why don't you introduce yourself right now?](https://discord.gg/wuytwK5s9h)
You can also interact through [GitHub issues](https://github.com/open-source-society/data-science/issues).
Add **Open Source Society University** to your [LinkedIn](https://www.linkedin.com/school/11272443/) profile!
> **Warning:** There are a few third-party, deprecated, or outdated materials that you might find when searching for OSSU. We recommend that you ignore them and use only the [OSSU Data Science GitHub Repo](https://github.com/ossu/data-science). Some known outdated materials are:
> - An unmaintained and deprecated trello board
> - Third-party notion templates
## Prerequisites
The Data Science curriculum assumes the student has taken [high school math](https://ossu.dev/precollege-math) and [statistics](https://www.khanacademy.org/math/probability).
## Curriculum
- [Introduction to Data Science](#introduction-to-data-science)
- [Introduction to Computer Science](#introduction-to-computer-science)
- [Data Structures and Algorithms](#data-structures-and-algorithms)
- [Databases](#databases)
- [Single Variable Calculus](#single-variable-calculus)
- [Linear Algebra](#linear-algebra)
- [Multivariable Calculus](#multivariable-calculus)
- [Statistics & Probability](#statistics--probability)
- [Data Science Tools & Methods](#data-science-tools--methods)
- [Machine Learning/Data Mining](#machine-learningdata-mining)
- [Final project](#final-project)
### Introduction to Data Science
[What is Data Science](https://www.coursera.org/learn/what-is-datascience)
### Introduction to Computer Science
_Students who already know basic programming in any language can skip this first course_
[Introduction to programming](coursepages/intro-programming/README.md)
[Introduction to Computer Science and Programming Using Python](coursepages/intro-cs/README.md)
[Introduction to Computational Thinking and Data Science](https://ocw.mit.edu/courses/6-0002-introduction-to-computational-thinking-and-data-science-fall-2016/)
### Data Structures and Algorithms
_The Algorithms courses are taught in Java. If students need to learn Java, they should take this course first_
[Java Programming](https://java-programming.mooc.fi/)
[Algorithms I: ArrayLists, LinkedLists, Stacks and Queues](https://www.edx.org/learn/data-structures/the-georgia-institute-of-technology-data-structures-algorithms-i-arraylists-linkedlists-stacks-and-queues)
[Algorithms II: Binary Trees, Heaps, SkipLists and HashMaps](https://www.edx.org/learn/data-structures/the-georgia-institute-of-technology-data-structures-algorithms-ii-binary-trees-heaps-skiplists-and-hashmaps)
[Algorithms III: AVL and 2-4 Trees, Divide and Conquer Algorithms](https://www.edx.org/learn/data-structures/the-georgia-institute-of-technology-data-structures-algorithms-iii-avl-and-2-4-trees-divide-and-conquer-algorithms)
[Algorithms IV: Pattern Matching, Dijkstra’s, MST, and Dynamic Programming Algorithms](https://www.edx.org/learn/data-structures/the-georgia-institute-of-technology-data-structures-algorithms-iv-pattern-matching-dijkstras-mst-and-dynamic-programming-algorithms)
### Databases
[Database Management Essentials](https://www.coursera.org/learn/database-management)
[Data Warehouse Concepts, Design, and Data Integration](https://www.coursera.org/learn/dwdesign)
[Relational Database Support for Data Warehouses](https://www.coursera.org/learn/dwrelational)
[Business Intelligence Concepts, Tools, and Applications](https://www.coursera.org/learn/business-intelligence-tools)
[Design and Build a Data Warehouse for Business Intelligence Implementation](https://www.coursera.org/learn/data-warehouse-bi-building)
[MongoDB for Developers Learning Path](https://learn.mongodb.com/pages/mongodb-developer-learning-paths)
### Single Variable Calculus
[Calculus 1A: Differentiation](https://mitxonline.mit.edu/courses/course-v1:MITxT+18.01.1x/)
[Calculus 1B: Integration](https://mitxonline.mit.edu/courses/course-v1:MITxT+18.01.2x/)
[Calculus 1C: Coordinate Systems & Infinite Series](https://mitxonline.mit.edu/courses/course-v1:MITxT+18.01.3x/)
### Linear Algebra
[Essence of Linear Algebra](https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab)
[Linear Algebra](https://ocw.mit.edu/courses/mathematics/18-06sc-linear-algebra-fall-2011/)
### Multivariable Calculus
[Multivariable Calculus](http://ocw.mit.edu/courses/mathematics/18-02sc-multivariable-calculus-fall-2010/index.htm)
### Statistics & Probability
[Introduction to Probability](https://projects.iq.harvard.edu/stat110/home)
[Intro to Descriptive Statistics](https://www.udacity.com/course/intro-to-descriptive-statistics--ud827)
[Intro to Inferential Statistics](https://www.udacity.com/course/intro-to-inferential-statistics--ud201)
[Statistical Learning with Python by Stanford University on EdX](https://www.edx.org/learn/python/stanford-university-statistical-learning-with-python) ([Textbook](https://hastie.su.domains/ISLP/ISLP_website.pdf.download.html), [Textbook resources](https://www.statlearning.com/resources-python)) or [Statistical Learning With R by Stanford University on EdX](https://www.edx.org/learn/statistics/stanford-university-statistical-learning) ([Textbook](https://hastie.su.domains/ISLR2/ISLRv2_corrected_June_2023.pdf.download.html), [Textbook resources](https://www.statlearning.com/resources-second-edition))
### Data Science Tools & Methods
[Tools for Data Science](https://www.coursera.org/learn/open-source-tools-for-data-science)
[Data Science Methodology](https://www.coursera.org/learn/data-science-methodology)
[Data Science: Wrangling](https://www.edx.org/course/data-science-wrangling)
### Machine Learning/Data Mining
[Supervised Machine Learning: Regression and Classification](https://www.coursera.org/learn/machine-learning)
[Advanced Learning Algorithms](https://www.coursera.org/learn/advanced-learning-algorithms)
[Unsupervised Learning, Recommenders, Reinforcement Learning](https://www.coursera.org/learn/unsupervised-learning-recommenders-reinforcement-learning)
[Intro to Machine Learning](https://www.udacity.com/course/intro-to-machine-learning--ud120)
[Mining Massive Datasets](https://www.edx.org/course/mining-massive-datasets)
[Process Mining](https://www.coursera.org/learn/process-mining)
### Final project
Part of learning is doing.
The assignments and exams for each course are to prepare you to use your knowledge to solve real-world problems.
After you've completed the curriculum,
you should identify a problem that you can solve using the knowledge you've acquired.
You can create something entirely new, or you can improve some tool/program that you use and wish were better.
Students who would like more guidance in creating a project may choose to use a series of project oriented courses.
A sample of options
(many more are available; at this point you should be capable of identifying a series that is interesting and relevant to you)
is available on [this page](extras/specializations.md).
### Congratulations
After completing the requirements of the curriculum above,
you will have completed the equivalent of a full bachelor's degree in Data Science.
Congratulations!
What is next for you? The possibilities are boundless and overlapping:
- Look for a job as a data scientist!
- Check out the [readings](extras/books.md) for classic books you can read that will sharpen your skills and expand your knowledge.
- Join a local data science meetup (e.g. via [meetup.com](https://www.meetup.com/)).
- Pay attention to emerging technologies in the world of data science.

## How to contribute
You can [open an issue](https://help.github.com/articles/creating-an-issue/) and give us your suggestions as to how we can improve this guide, or what we can do to improve the learning experience.
You can also [fork this project](https://help.github.com/articles/fork-a-repo/) and send a [pull request](https://help.github.com/articles/using-pull-requests/) to fix any mistakes that you have found.
If you want to suggest a new resource, send a pull request adding that resource to the [extras](https://github.com/open-source-society/data-science/tree/master/extras) section. The **extras** section is a place where all of us can submit interesting additional articles, books, courses and specializations.
## Code of Conduct
[OSSU's code of conduct](https://github.com/ossu/code-of-conduct).
## Team
* **Curriculum Maintainer**: [Waciuma Wanjohi](https://github.com/waciumawanjohi)
* **Contributors**: [contributors](https://github.com/open-source-society/data-science/graphs/contributors)