https://github.com/eugeneyan/nocode-ml

😝 End-to-end machine learning; "no code" required!
Host: GitHub
URL: https://github.com/eugeneyan/nocode-ml
Owner: eugeneyan
License: mit
Created: 2020-08-31T00:04:20.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2020-08-31T00:04:53.000Z (almost 6 years ago)
Last Synced: 2025-01-01T04:32:09.095Z (over 1 year ago)
Topics: deployment, joke, machine-learning, nocode, workflow
Homepage:
Size: 3.91 KB
Stars: 12
Watchers: 3
Forks: 5
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# nocode-ml

Package for `#nocode` machine learning. Supported in all `#nocode` programming languages.

**10 easy steps**:  

1. [Install this package](#1-install-this-package-via-the-command-below-then-click-here)  

2. [Define business problem & objective](#2-define-your-business-problem-and-objective-then-click-here)  

3. [Provide clean data](#3-provide-your-pristine-data-then-click-here)  

4. [Define offline validation approach](#4-set-offline-validation-approach-and-metrics-then-click-here)  

5. [Train the model (one-click)](#5-train-the-machine-learning-model-via-one-click)  

6. [Validate the model offline](#6-validate-the-model-offline-then-click-here)  

7. [Deploy the model (one-click)](#7-deploy-the-ml-model-via-one-click)  

8. [Validate the model online](#8-validate-the-model-online-ie-ab-testing-then-click-here)  

9. [Share the results](#9-share-the-results-then-click-here)  

10. [Maintain the model](#10-maintain-the-model)   

## Quick Start

### 1. Install this package via the command below. Then, click [here](#2-define-your-business-problem-and-objective-then-click-here).

```

```

### 2. Define your business problem and objective. Then, click [here](#3-provide-your-pristine-data-then-click-here).

- Get buy-in from all stakeholders ~~involved~~, including their pet 🐶/🐱/🐔/🌵.   

- (Note: There ~~might~~ will be conflicting objectives. E.g., customer experience wants to remove counterfeit/low-quality products (to protect customers) but commercial refuses as they _think_ it'll reduce revenue.)

- It's okay if you don't have the problem defined. Let's train some ML first and figure it out later. 

- It's okay if you don't have the objective defined. You can decide after viewing the A/B test results.  

- (Optional) Decide how your ML model will benefit customers. Will it (i) be integrated into an existing system, (ii) need a new UI, (iii) augment decision-making, (iv) something else?

### 3. Provide your pristine data. Then, click [here](#4-set-offline-validation-approach-and-metrics-then-click-here).

- Upload your data as a single denormalized `csv`; file size should not exceed 1 GB.  

- Data should not have missing values. Decide whether to exclude them at the row or column level, or to impute them via statistics (e.g., median, mode), machine learning, or a specified null value (e.g., `NA`, `-1`).  

- **For string values**: `ASCII` encoded, lowercased, spellchecked & normalized (see "60 ways to spell Philidelphia" below), [naughty words](https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words) removed.  

- **For numerics**: Parsed correctly (e.g., `"$1.00"`, `"USD1.00"`, `"0.85 €"` should all be `1.0`), exclude errors (e.g., age > 200) and possibly outliers.  

- **For dates**: Formatted based on [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601). 

- **For human genes**: Formatted based on [industry best practice](https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates).

- (Optional) Remove redundant columns (e.g., only a single value, >95% missing values, low variance, etc.)

- (Optional) Remove redundant rows (e.g., exact duplicates, >95% missing values, etc.)  
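For anyone willing to break the `#nocode` promise, the numeric and missing-value bullets above might look roughly like the sketch below. It is an illustration under assumptions, not part of this package: `parse_amount` and `impute_median` are hypothetical helper names, and currency symbols are simply stripped, with no currency conversion attempted.

```python
import re
import statistics

# Matches the first plain decimal number in a string, e.g. "1.00" in "USD1.00".
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_amount(raw):
    """Return the numeric part of a messy amount string, or None if there is none.

    Assumption: currency markers are dropped, not converted.
    """
    match = _NUMBER.search(raw.replace(",", ""))
    return float(match.group()) if match else None


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


amounts = [parse_amount(s) for s in ["$1.00", "USD1.00", "n/a", "3"]]
cleaned = impute_median(amounts)
```

Median imputation is only one of the options listed above; dropping rows or using a sentinel value would replace `impute_median` with a filter or a constant fill.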

**60 ways to spell "Philidelphia":**

PHAILLIDELPPHA  

PHIADELPHIA  

PHIALDELPHIA  

PHIDAELPHIA  

PHIELADELPHIA  

PHIILADELPHIA  

PHILA  

PHILA.  

PHILAD  

PHILADALPHIA  

PHILADEDLPHIA  

PHILADELAPHIA  

PHILADELELPHIA  

PHILADELHIA  

PHILADELHIPHILADELHPIA  

PHILADELHPIA  

PHILADELOHIA  

PHILADELPH  

PHILADELPHA  

PHILADELPHAI  

PHILADELPHI  

PHILADELPHIA  

PHILADELPHIA PA  

PHILADELPHIA,  

PHILADELPHIA, PA  

PHILADELPHIA.  

PHILADELPHIAPHIA  

PHILADELPHIOA  

PHILADELPHIOE  

PHILADELPIA  

PHILADELPOHIA  

PHILADELPPHIA  

PHILADEPHA  

PHILADEPHIA  

PHILADEPHILA  

PHILADEPLHIA  

PHILADLEPHIA  

PHILADPHIA  

PHILAELPHIA  

PHILDADELPHIA  

PHILDADLPHIA  

PHILDEALPHIA  

PHILDEALPHIA  

PHILDELPHIA  

PHILDELPHILA  

PHILDEPPHIA  

PHILDRLPHIA  

PHILEAPHIA  

PHILIAHELPHIA  

PHILIDELPHIA  

PHILLA  

PHILLADELPHIA  

PHILLY  

PHILOADELPHIA  

PHLADELPHIA  

PHOLADELPHIA  

PHPILADELPHIA  

PIHLADELPHIA  
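Most of the spellings above are a few characters away from the correct one, so fuzzy matching against a list of known values can catch them. The sketch below uses Python's standard `difflib`; the `CANONICAL` list, the `normalize_city` name, and the `0.7` cutoff are assumptions chosen for illustration, not anything this package ships.

```python
import difflib

# Assumption: a reference list of valid values is available.
CANONICAL = ["PHILADELPHIA"]


def normalize_city(raw, cutoff=0.7):
    """Map a misspelled city to its closest canonical name, if one is close enough."""
    cleaned = raw.strip().upper().rstrip(".,")
    matches = difflib.get_close_matches(cleaned, CANONICAL, n=1, cutoff=cutoff)
    # Leave the value untouched when nothing is similar enough.
    return matches[0] if matches else cleaned
```

Short forms such as `PHILLY` or `PHILA` are too far from the full name for similarity matching at this cutoff, so they would likely need an explicit alias table instead.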



**Suggested `#nocode` tools:**   

- To query and join database tables to get the single `csv`, try these `#nocode` tools: [~~SQL~~](https://en.wikipedia.org/wiki/SQL)

- To clean tabular data, try: [~~pandas~~](https://en.wikipedia.org/wiki/Pandas_(software)), [~~Spark~~](https://en.wikipedia.org/wiki/Apache_Spark), [Excel](https://www.microsoft.com/en-us/microsoft-365/excel), [Numbers](https://www.apple.com/numbers/), [Sheets](https://www.google.com/sheets/about/), [OpenOffice Calc](https://www.openoffice.org/product/calc.html)

- To clean and augment images, try: [~~torchvision~~](https://pytorch.org/docs/stable/torchvision/transforms.html), [Paint](https://support.microsoft.com/en-us/help/4027410/windows-10-open-microsoft-paint), [Preview](https://support.apple.com/en-sg/guide/preview/welcome/mac)

### 4. Set (offline) validation approach and metrics. Then, click [here](#5-train-the-machine-learning-model-via-one-click).

- Decide how to split the data into train, validation, and test. (By default, _random-split_ is used, though a [_time-based split_](https://www.fast.ai/2017/11/13/validation-sets/) should be used in most production settings.)

- Decide on metric(s). (By default, [RMSE](https://en.wikipedia.org/wiki/Root-mean-square_deviation) _and_ [accuracy](https://en.wikipedia.org/wiki/Accuracy_and_precision#In_binary_classification) are selected; pick whichever looks best after validation.)

- (Note: Upgrade to PRO edition and get 100+ metrics sorted in order of "What looks best").
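A time-based split and the two default metrics are small enough to sketch by hand. The version below is a minimal illustration, assuming each row is a dict with a `date` key; the function names and the two cutoff parameters are hypothetical.

```python
import math
from datetime import date


def time_based_split(rows, valid_start, test_start):
    """Split rows chronologically: train < valid_start <= validation < test_start <= test."""
    train = [r for r in rows if r["date"] < valid_start]
    valid = [r for r in rows if valid_start <= r["date"] < test_start]
    test = [r for r in rows if r["date"] >= test_start]
    return train, valid, test


def rmse(y_true, y_pred):
    """Root-mean-square error, for regression targets."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy(y_true, y_pred):
    """Fraction of exact matches, for classification targets."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because the model never sees rows dated after `valid_start` during training, the validation and test scores are closer to what it would face after deployment than with a random split.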

### 5. Train the machine learning model via [one-click](#6-validate-the-model-offline-then-click-here). 

- This is the easiest step of all; click the link in the heading above ☝️

- The package will run all supervised, unsupervised, semi-supervised, self-supervised, reinforcement, transfer, ensemble, meta, few-shot, one-shot, blockchain learning models, starting with the _most_ compute-intensive.

### 6. Validate the model offline. Then, click [here](#7-deploy-the-ml-model-via-one-click).

- If you have done steps [**2**](#2-define-your-business-problem-and-objective-then-click-here) and [**4**](#4-set-offline-validation-approach-and-metrics-then-click-here) properly, this will be straightforward.

- (Optional) Model analysis via [learning curves](https://en.wikipedia.org/wiki/Learning_curve_(machine_learning)), [precision-recall trade-offs](https://en.wikipedia.org/wiki/Precision_and_recall), [residual analysis](https://en.wikipedia.org/wiki/Regression_analysis#Underlying_assumptions), etc.

- (Optional) Feature importance and [selection](https://en.wikipedia.org/wiki/Feature_selection). (By default, all features are selected even if only 1% are useful.)

- (Optional) [Error analysis](https://www.coursera.org/lecture/machine-learning/error-analysis-x62iE), examine [counterfactuals](https://en.wikipedia.org/wiki/Counterfactual_conditional) and skewed classes (and [adjust distribution](https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis)).  
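Skewed classes are where accuracy alone misleads most. As a rough illustration (the function name and the toy labels are made up for this example), precision and recall can be computed directly from the predictions:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data: 5 positives out of 100, and a "model" that always predicts negative.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)
```

Here the always-negative model is right on 95 of 100 rows yet finds none of the positives, which is the kind of gap error analysis is meant to surface.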

### 7. Deploy the ML-model via [one-click](#8-validate-the-model-online-ie-ab-testing-then-click-here).

- This is also easy; click the link in the heading above ☝️

- By default, the model with the best metric is deployed (even if it requires _10x compute and data_ for training, has _100x inference latency_, and _0.001% improvement_ relative to the 2nd best).

- (Optional) Decide serving approach: Cache, microservice, or embedded in app? (By default, served via `csv`).

- (Optional) Perform QA, integration testing, and stress testing to ensure optimal customer experience.
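If the default of deploying the single best metric regardless of cost sounds unappealing, one alternative is to pick the fastest model that is close enough to the best. The sketch below is only one possible rule; the `Candidate` fields, the latency budget, and the `min_gain` threshold are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    metric: float      # assumption: higher is better
    latency_ms: float  # measured inference latency


def pick_model(candidates, latency_budget_ms, min_gain=0.001):
    """Pick the fastest candidate whose metric is within min_gain of the best eligible one."""
    eligible = [c for c in candidates if c.latency_ms <= latency_budget_ms]
    if not eligible:
        raise ValueError("no candidate meets the latency budget")
    best = max(c.metric for c in eligible)
    good_enough = [c for c in eligible if best - c.metric <= min_gain]
    return min(good_enough, key=lambda c: c.latency_ms)


candidates = [
    Candidate("huge-ensemble", metric=0.9001, latency_ms=500.0),
    Candidate("small-model", metric=0.9000, latency_ms=5.0),
]
chosen = pick_model(candidates, latency_budget_ms=1000.0)
```

With these made-up numbers the small model is chosen, since the ensemble's gain falls below `min_gain` while costing 100x the latency.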

### 8. Validate the model online (i.e., A/B testing). Then, click [here](#9-share-the-results-then-click-here).

- Estimate effect size and decide on sample size required.

- Decide on random assignment condition: By customer, session, or product?

- Decide on [attribution](https://en.wikipedia.org/wiki/Attribution_(marketing)) model: First touch, last touch, multi-touch, or no-touch?

- Decide on statistical approach: Frequentist, Bayesian, or [Torturean](https://en.wiktionary.org/wiki/if_you_torture_the_data_long_enough,_it_will_confess_to_anything)?
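For the frequentist option, the sample size can be estimated before the test starts. The sketch below uses the common normal-approximation formula for comparing two conversion rates, with per-arm size n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2. The function name and the default `alpha` and `power` values are assumptions, and other formulas (pooled variance, continuity corrections) give slightly different numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect p_control vs p_treatment (two-sided)."""
    if p_control == p_treatment:
        raise ValueError("effect size must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n = (z_alpha + z_power) ** 2 * variance / (p_treatment - p_control) ** 2
    return math.ceil(n)


# Example: baseline conversion of 10%, hoping to detect a lift to 12%.
n_needed = sample_size_per_arm(0.10, 0.12)
```

Halving the expected lift roughly quadruples the required sample, which is why the effect size estimate comes first in the list above.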

### 9. Share the results. Then, click [here](#10-maintain-the-model).

- Share the best results (even if you built a warehouse optimization model and the metric that went up is recommendation CTR).

- Design fancy slides and label everything related to statistics and ML as Artificial Intelligence™.

### 10. Maintain the model.

- You're done! 🎉

- ML models don't need to be refreshed or maintained. (But if you want unnecessary work, read [this](https://eugeneyan.com/writing/practical-guide-to-maintaining-machine-learning/).)

## FAQ

#### How can I contribute to the source code?

There's no need to—there's `#nocode`! But if you want to contribute to the README, raise a [PR](https://github.com/eugeneyan/nocode-ml/pulls).

#### I found a bug! How should I report it?

Impossible! Our package has `#nobugs` as it is `#nocode`.

#### Is this a joke or is this real?

Yes.

#### No seriously, what is this?

It's partly (i) a joke, (ii) a point about how much of the work is not ML code, and (iii) a basic ML workflow.

## To Do

- [x] Add quick start

- [x] Add no code style guide

- [x] Add license

- [ ] Add unit tests

- [ ] Add code coverage checks

- [ ] Add lint checks

- [ ] Add type checks

- [ ] Add CI/CD pipeline

- [ ] ~~Build CLI for developer experience~~ (out of scope as `#nocode`)