Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science
This repository contains notebooks for understanding some concepts of statistics and econometrics that can be helpful in data science
- Host: GitHub
- URL: https://github.com/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science
- Owner: PetalsOnWind
- License: mit
- Created: 2020-11-29T18:29:03.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2021-08-27T21:20:31.000Z (about 3 years ago)
- Last Synced: 2024-08-02T17:36:25.255Z (3 months ago)
- Language: Jupyter Notebook
- Size: 13.8 MB
- Stars: 50
- Watchers: 4
- Forks: 92
- Open Issues: 66
Metadata Files:
- Readme: README.md
- License: LICENSE
Possible project for Kharagpur Winter of Code 2020
# Statistics and Econometrics for Data Science
![GitHub Repo stars](https://img.shields.io/github/stars/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science?color=2E61C5&logo=Github&style=for-the-badge)
![GitHub forks](https://img.shields.io/github/forks/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science?color=2E61C5&logo=Github&style=for-the-badge)
![GitHub contributors](https://img.shields.io/github/contributors/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science?color=2E61C5&logo=Github&style=for-the-badge)
![GitHub pull requests](https://img.shields.io/github/issues-pr/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science?logo=Github&style=for-the-badge)
![GitHub issues](https://img.shields.io/github/issues/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science?logo=Github&style=for-the-badge)
## Table of Contents
1. How are the topics even related to ML?
2. What will the project entail?
3. How to start with the project?
4. What are the prerequisites for the project?
5. What can you contribute to the project?
6. Expectations from the project
7. How much is ML and how much is statistics/econometrics?
8. Who to contact?
## How are the topics even related to ML?
Often, while building models in ML, we become too concerned with accuracy and forget to check whether
the model does what we initially set out to do. Statistics and econometrics help in
building better models and understanding the data. They support better feature engineering
and a clearer understanding of a model's assumptions, which ultimately leads to better models.
Running linear regression sounds easy, but what if someone asks you what assumptions you made
while running the model? If your answer is "Umm...", then you are on track to understanding
what these topics can contribute to ML (if you didn't already know).
Due to certain limitations, for the time being, we are concerned with only Linear Regression.
This is just a very small subset of ML but let's start with tiny steps to progress.
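One way to answer the "what assumptions did you make" question is to look at the residuals of a fitted model. The following is a minimal sketch, not taken from the repository's notebooks: the synthetic data, the variable names, and the choice of checks (zero-mean residuals, the Durbin-Watson statistic for autocorrelation, and a rough spread check for heteroscedasticity) are all assumptions made for illustration, and it uses only NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic data (an assumption for illustration): y = 2 + 3x + noise
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x])

# Ordinary least squares fit
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# With an intercept in the model, OLS residuals average to zero by construction
mean_residual = residuals.mean()

# Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Rough homoscedasticity check: if the residual spread grows with the fitted
# values, |residuals| and fitted values will be noticeably correlated
spread_corr = np.corrcoef(np.abs(residuals), fitted)[0, 1]

print("coefficients:", beta)
print("mean residual:", mean_residual)
print("Durbin-Watson:", durbin_watson)
print("corr(|residual|, fitted):", spread_corr)
```

These are informal checks rather than formal tests. A Durbin-Watson value far from 2, or a clearly non-zero spread correlation, would be a hint to look more closely at the independence or constant-variance assumptions.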
## What will the project entail?
The project aims to have a series of notebooks that will help in understanding the basic topics.
The notebooks could be used to get a broad overview of the topic or to quickly revise the topic.
The notebooks can be helpful in the following ways:
- You are participating in a competition and you want to run some quick checks on the data/model
- You are sitting for internship/placement and need to revise some topics fast
- You want a code snippet for a certain test and guidance on how to interpret the test results
## How to start with the project?
1. Install Jupyter Notebook; installing it with [Anaconda](https://www.anaconda.com/products/individual) is recommended
2. Learn how to use Jupyter Notebook and the Python libraries NumPy, pandas, and matplotlib
3. Clone this repo and make a new branch
4. Each .ipynb file is meant to stand on its own, so you should be able to open any of them directly in Jupyter Notebook
## What are the prerequisites for the project?
- Basic knowledge of at least one programming language (preferably Python)
- Basic knowledge of probability (class 12 level)
- Desire to learn statistics
## What can you contribute to the project?
- Easy: Make some changes to the existing graphs or explanations to make them look better, add new ideas to 'ideas.md', check if existing notebooks make sense
- Intermediate: Start with a new notebook of your own
- Advanced: Make a series of notebooks or explain a complicated/advanced topic
## Expectations from the project
There will be a variety of issues, some easy to get you started and some harder so that you can
contribute significantly. Here is the minimum work you are expected to do to
pass: by mid-evals, you should have at least one new notebook, and by end-evals, you should have
at least three new notebooks ready. Each notebook should have some introduction to the topic,
mathematical proofs if required, the code to implement that topic from scratch, and any ready-made
library code, if available.
The notebooks referred to here are Jupyter Notebooks.
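As a rough illustration of the "from scratch plus ready-made library code" pattern described above, here is a minimal sketch for simple linear regression. The function name `ols_from_scratch` and the small data set are hypothetical and made up for this example; the library comparison uses NumPy's `np.polyfit`.

```python
import numpy as np

def ols_from_scratch(x, y):
    """Simple linear regression using the closed-form least-squares formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = covariance(x, y) / variance(x)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # the fitted line passes through the point of means
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Illustrative data (made up for this example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# From-scratch implementation
intercept, slope = ols_from_scratch(x, y)

# Ready-made library code: np.polyfit returns coefficients from highest degree down
lib_slope, lib_intercept = np.polyfit(x, y, deg=1)

print("from scratch:", intercept, slope)
print("np.polyfit:  ", lib_intercept, lib_slope)
```

Showing both versions side by side lets a reader confirm that the hand-written formulas agree with the library result before relying on either.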
## How much is ML and how much is statistics/econometrics?
Well, what you learn from this project will lean less towards ML. These topics are meant to support ML
and do not replace the importance of doing a course/project purely based on machine learning.
## Who to contact?
The project was started by PetalsOnWind (Pankhuri Saxena, a fourth-year Economics student at IIT KGP).
She can be reached at pankhurisaxena[dot]iitkgp[at]gmail[dot]com.
## Contributors:
### Credit goes to these people: ✨