# Data Science Cheatsheet

1. [Probability](#prob)
2. [Statistics](#stats)
3. [Machine Learning](#ml)

## 1) <a id='prob'></a> Probability

### 1.1) Conditional Probability

### 1.2) Counting

#### 1.2.1) Permutation

``` python
from itertools import permutations

a = [1, 2, 3]
# All orderings of the elements: 3! = 6 permutations
perm = permutations(a)
print(list(perm))

# [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
```

#### 1.2.2) Combination

``` python
from itertools import combinations

a = [1, 2, 3, 4]
# All unordered pairs (order does not matter): C(4, 2) = 6 combinations
comb = combinations(a, 2)
print(list(comb))

# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

### 1.3) Probability Distributions

#### 1.3.1) Discrete Probability Distributions

| Num | Distribution | Definition | Usage |
|---:|:---|:---|:---|
| 1 | Binomial distribution | Probability of exactly *k* successes in *n* independent trials, each with success probability *p* | Coin flips (number of heads in *n* flips) |
| 2 | Poisson distribution | Probability of *k* events occurring in a fixed interval, given an average rate $\lambda$ per interval | Number of visits to a website in a given period of time |

#### 1.3.2) Continuous Probability Distributions

| Num | Distribution | Definition | Usage |
|---:|:---|:---|:---|
| 1 | Uniform distribution | Constant probability density for *X* over the interval [*a*, *b*] | Random sampling and hypothesis testing |
| 2 | Exponential distribution | Time between consecutive events of a Poisson process (the continuous counterpart of Poisson counts) | The time until a credit default occurs |
| 3 | Normal distribution | Symmetric, bell-shaped density with mean $\mu$ and standard deviation $\sigma$ | The Central Limit Theorem |

### 1.4) Markov Chains

## 2) <a id='stats'></a> Statistics

### 2.1) Random Variables

### 2.2) Central Limit Theorem

### 2.3) Hypothesis Testing

#### 2.3.1) General Information

#### 2.3.2) Type I and Type II Errors

#### 2.3.3) *p-values* & Confidence Intervals

#### 2.3.4) Test Statistics

### 2.4) MLE & MAP

Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) estimation both produce point estimates of a parameter $\theta$ from data $D$. The difference between them is that MAP includes a prior. MLE maximizes the likelihood $P(D \mid \theta)$, whereas MAP maximizes the posterior, which is proportional to $P(D \mid \theta)\,P(\theta)$. Consequently, MLE can be seen as a special case of MAP with a uniform prior, because a constant $P(\theta)$ does not change the maximizer.

## 3) <a id='ml'></a> Machine Learning

### 3.1) Linear Algebra

#### 3.1.1) Eigenvalues and Eigenvectors

### 3.2) Model Evaluation and Selection

#### 3.2.1) Bias-Variance Trade-off

<div align="center">
<img src="https://github.com/razielar/DataScience_CheetSheet/blob/main/img/diagram_bias-variance.png" alt="Bias-variance trade-off diagram" width="400" height="400">
</div>

### 3.3) Model Training

#### 3.3.1) Hyperparameter Tuning

### 3.4) Linear Regression

Linear regression assumptions:

| Num | Assumption | Description |
|---:|:---|:---|
| 1 | Linearity | The relationship between the features and the target variable is linear |
| 2 | Homoscedasticity | The variance of the residuals is constant across all values of the features |
| 3 | Independence | All observations are independent of each other |
| 4 | Normality | The residuals are normally distributed (equivalently, *Y* is normal conditional on the features) |
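The assumptions in the table above can be checked informally on the residuals of a fitted model. The following is a minimal NumPy sketch. It assumes synthetic data generated as $y = 2 + 3x + \varepsilon$ with Gaussian noise. The variable names and the two crude diagnostics are illustrative choices, not formal tests: comparing residual variance in the lower and upper halves of $x$, and computing residual skewness.

``` python
import numpy as np

# Hypothetical synthetic data (assumed for illustration): y = 2 + 3*x + N(0, 1) noise
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x])

# Ordinary least squares: minimizes ||y - X @ beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
print("intercept, slope:", beta)

# Crude homoscedasticity check: residual variance for low x vs. high x
order = np.argsort(x)
low = residuals[order[: n // 2]]
high = residuals[order[n // 2 :]]
print("residual variance (low x, high x):", low.var(), high.var())

# Crude normality check on the residuals (not on y): skewness should be near 0
skew = np.mean((residuals - residuals.mean()) ** 3) / residuals.std() ** 3
print("residual skewness:", skew)
```

In practice, residual plots or formal tests (e.g. Breusch-Pagan for homoscedasticity, Shapiro-Wilk for normality) are the usual tools. This sketch only shows which quantity each assumption is about.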
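This model also illustrates the MLE & MAP note in section 2.4. If the errors are assumed Gaussian with known standard deviation $\sigma$, the MLE of the weights is the ordinary least-squares solution. Adding a zero-mean Gaussian prior $w \sim \mathcal{N}(0, \tau^2 I)$ turns the MAP estimate into ridge regression with penalty $\lambda = \sigma^2 / \tau^2$. The following is a minimal NumPy sketch on synthetic data; the values of `sigma`, `tau`, and `true_w` are assumptions for illustration only.

``` python
import numpy as np

# Hypothetical synthetic data (assumed for illustration), no intercept term
rng = np.random.default_rng(1)
n, d = 50, 5
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
sigma = 1.0  # assumed known noise standard deviation
y = X @ true_w + rng.normal(0.0, sigma, size=n)

# MLE under Gaussian noise = ordinary least squares: (X^T X) w = X^T y
w_mle = np.linalg.solve(X.T @ X, X.T @ y)

# MAP with prior w ~ N(0, tau^2 I) = ridge regression with lam = sigma^2 / tau^2:
# (X^T X + lam * I) w = X^T y
tau = 1.0  # assumed prior standard deviation
lam = sigma**2 / tau**2
w_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# A very wide prior approximates a uniform (flat) prior and recovers the MLE
tau_wide = 1e6
w_flat = np.linalg.solve(X.T @ X + (sigma**2 / tau_wide**2) * np.eye(d), X.T @ y)

print("MLE:", np.round(w_mle, 3))
print("MAP:", np.round(w_map, 3))
```

Shrinking `tau` strengthens the prior and pulls `w_map` toward zero. A very wide prior makes the penalty negligible, so MAP collapses to MLE, matching the uniform-prior special case.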