https://github.com/slavikdev/ml-runbook
Collection of solutions for common ML problems
ai cheatsheet machine-learning ml runbook
- Host: GitHub
- URL: https://github.com/slavikdev/ml-runbook
- Owner: slavikdev
- Created: 2020-08-19T12:10:38.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-08-20T14:51:00.000Z (almost 5 years ago)
- Last Synced: 2025-01-29T11:33:31.466Z (4 months ago)
- Topics: ai, cheatsheet, machine-learning, ml, runbook
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# ML Runbook
Collection of solutions for common ML problems. Contributions are welcome :)
## Dataset
### When to use a large dataset
- If you have high variance (overfitting).
- If the features carry enough information for prediction and a human expert could estimate the answer from them by hand.
- If the algorithm has many parameters and can represent fairly complex functions.
## High variance (overfitting)
Your model is performing very well on the training set, but poorly on the test set.
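
As a rough illustration of that symptom, the sketch below fits a 1-nearest-neighbour classifier to noisy synthetic data. The classifier, the data generator, and the 30% label noise are all assumptions made for the example; none of them come from this runbook. Because the model memorises every training point, noise included, it scores perfectly on the training set and much worse on unseen data.

```python
import random

def nearest_label(x, points):
    """Return the label of the training point closest to x (1-nearest-neighbour)."""
    return min(points, key=lambda p: abs(p[0] - x))[1]

def accuracy(data, train):
    """Fraction of (x, y) pairs in data whose label matches the 1-NN prediction."""
    return sum(nearest_label(x, train) == y for x, y in data) / len(data)

def make_data(n, rng, noise=0.3):
    """Synthetic 1-D data: the label is x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = x > 0.5
        if rng.random() < noise:
            y = not y  # label noise that a flexible model can memorise
        data.append((x, y))
    return data

rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(200, rng)
test = make_data(200, rng)

train_acc = accuracy(train, train)  # every point is its own nearest neighbour
test_acc = accuracy(test, train)    # unseen points expose the memorised noise
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The gap between the two printed numbers is the signal to look for; the exact values depend on the seed and the noise level.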
### In general
- Try getting more training examples.
- Try a smaller set of features.
- When using regularization, try increasing the `lambda` parameter.
### SVM
- Try decreasing the parameter `C` (1/lambda).
- Try increasing the parameter `sigma^2`.
## High bias (underfitting)
Your model performs poorly on both training and test sets.
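
The two definitions (high variance in the previous section, high bias here) can be turned into a rough check on a pair of error rates. This is a minimal sketch: the function name `diagnose` and the thresholds `target_error` and `max_gap` are hypothetical values chosen for illustration, and sensible values depend on the problem.

```python
def diagnose(train_error, test_error, target_error=0.10, max_gap=0.05):
    """Rough bias/variance check from two error rates in [0, 1].

    target_error and max_gap are illustrative assumptions, not standard values.
    """
    if train_error > target_error:
        # Poor even on the data the model was fitted to.
        return "high bias"
    if test_error - train_error > max_gap:
        # Good on the training set, noticeably worse on unseen data.
        return "high variance"
    return "ok"

print(diagnose(train_error=0.02, test_error=0.25))  # high variance
print(diagnose(train_error=0.30, test_error=0.32))  # high bias
print(diagnose(train_error=0.04, test_error=0.06))  # ok
```

A model can suffer from both problems at once. This sketch reports only high bias in that case, since fixing the training error usually comes first.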
### In general
- Try adding more features.
- Try adding polynomial features, e.g. `x^2`, `x1*x2`, etc.
- When using regularization, try decreasing the `lambda` parameter.
### SVM
- Try increasing the parameter `C` (1/lambda).
- Try decreasing the parameter `sigma^2`.
## Choosing the right algorithm
### Logistic regression vs SVM
- If the number of features is large (relative to the number of examples), use either logistic regression or SVM without a kernel.
- If the number of features is small and the number of examples is intermediate (up to 10K), use SVM with a Gaussian kernel.
- If the number of features is small but the number of examples is large (over 10K), create or add more features, then use logistic regression or SVM without a kernel.
### Anomaly detection vs supervised learning
When to use an anomaly detection algorithm (e.g. one based on a Gaussian distribution):
- You expect a very small number of anomalies (up to 20) and a large number of non-anomalous examples.
- You expect different types of anomalies and future anomalies may look like nothing you’ve seen so far.

When to use supervised learning:
- You expect a relatively large number of anomalies.
- Future examples are likely to be similar to the ones in the training set.
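
For the anomaly detection case, a Gaussian-based detector might look like the sketch below. It assumes the features are independent, fits one mean and variance per feature on non-anomalous examples only, and flags a point when its density falls below a threshold. The function names, the sample data, and the `epsilon` value are all hypothetical; in practice the threshold would be tuned on a small labelled validation set.

```python
import math
from statistics import mean, pvariance

def fit_gaussian(rows):
    """Estimate a (mean, variance) pair for each feature column of normal data."""
    columns = list(zip(*rows))
    return [(mean(col), pvariance(col)) for col in columns]

def density(x, params):
    """p(x) as a product of independent per-feature Gaussian densities."""
    p = 1.0
    for value, (mu, var) in zip(x, params):
        p *= math.exp(-((value - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return p

def is_anomaly(x, params, epsilon):
    """Flag x when its estimated density is below the threshold epsilon."""
    return density(x, params) < epsilon

# Hypothetical non-anomalous examples with two features each.
normal = [(10.0, 5.1), (10.2, 4.9), (9.8, 5.0), (10.1, 5.2), (9.9, 4.8)]
params = fit_gaussian(normal)
epsilon = 1e-3  # hypothetical threshold, not a recommended value

print(is_anomaly((10.0, 5.0), params, epsilon))  # False: close to the training data
print(is_anomaly((14.0, 2.0), params, epsilon))  # True: far from both feature means
```

Since the model is fitted only on normal examples, it needs no anomalous examples at training time, which is why it suits the case where anomalies are rare or of unknown kinds.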