{"id":15641623,"url":"https://github.com/yell/mnist-challenge","last_synced_at":"2025-07-04T01:04:57.661Z","repository":{"id":83176760,"uuid":"83079016","full_name":"yell/mnist-challenge","owner":"yell","description":"My solution to TUM's Machine Learning MNIST challenge 2016-2017 [winner]","archived":false,"fork":false,"pushed_at":"2019-10-09T10:47:02.000Z","size":14431,"stargazers_count":70,"open_issues_count":0,"forks_count":13,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-30T09:12:52.662Z","etag":null,"topics":["data-augmentation","deep-learning","deep-neural-networks","gaussian-processes","k-nn","kernel","logistic-regression","machine-learning","mnist","neural-network","pca","python","rbm"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yell.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-02-24T20:14:32.000Z","updated_at":"2024-01-04T16:11:46.000Z","dependencies_parsed_at":null,"dependency_job_id":"8255d69c-c6ad-4e2c-a6f3-8bfdac83a16c","html_url":"https://github.com/yell/mnist-challenge","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/yell/mnist-challenge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yell%2Fmnist-challenge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yell%2Fmnist-challenge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yell%2Fmnist-challenge/releases
","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yell%2Fmnist-challenge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yell","download_url":"https://codeload.github.com/yell/mnist-challenge/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yell%2Fmnist-challenge/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263427302,"owners_count":23464842,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-augmentation","deep-learning","deep-neural-networks","gaussian-processes","k-nn","kernel","logistic-regression","machine-learning","mnist","neural-network","pca","python","rbm"],"created_at":"2024-10-03T11:43:53.124Z","updated_at":"2025-07-04T01:04:57.578Z","avatar_url":"https://github.com/yell.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ML MNIST Challenge\nThis contest was offered within TU Munich's course Machine Learning (IN2064).\u003cbr\u003e\nThe goal was to implement k-NN, Neural Network, Logistic Regression and Gaussian Process Classifier in \npython from scratch and achieve minimal average test error among these classifiers on well-known MNIST dataset, \nwithout ensemble learning.\n\n## Results\n| Algorithm | \u003cdiv align=\"center\"\u003eDescription\u003c/div\u003e | Test Error, % |\n| :---: | :--- | :---: |\n| ***k-NN*** | 3-NN, Euclidean distance, uniform weights.\u003cbr/\u003e*Preprocessing*: Feature vectors extracted from ***NN***. 
| **1.13** |\n| ***k-NN\u003csub\u003e2\u003c/sub\u003e*** | 3-NN, Euclidean distance, uniform weights.\u003cbr/\u003e*Preprocessing*: Augment (training) data (\u0026#215;9) by using random rotations,\u003cbr/\u003eshifts, Gaussian blur and dropout pixels; PCA-35 whitening and multiplying\u003cbr/\u003eeach feature vector by e\u003csup\u003e11.6 \u0026#183; ***s***\u003c/sup\u003e, where ***s*** \u0026ndash; normalized explained\u003cbr/\u003evariance by the respective principal axis (equivalent to applying PCA\u003cbr/\u003ewhitening with accordingly weighted Euclidean distance). | **2.06** |\n| ***NN*** | MLP 784-1337-D(0.05)-911-D(0.1)-666-333-128-10 (D \u0026ndash; dropout);\u003cbr/\u003ehidden activations \u0026ndash; LeakyReLU(0.01), output \u0026ndash; softmax; loss \u0026ndash; categorical\u003cbr/\u003ecross-entropy; 1024 batches; 42 epochs; optimizer \u0026ndash; *Adam* (learning rate\u003cbr/\u003e5 \u0026#183; 10\u003csup\u003e\u0026ndash;5\u003c/sup\u003e, rest \u0026ndash; defaults from paper).\u003cbr/\u003e*Preprocessing*: Augment (training) data (\u0026#215;5) by using random rotations,\u003cbr/\u003e shifts, Gaussian blur. | **1.04** |\n| ***LogReg*** | 32 batches; 91 epochs; L2-penalty, \u0026#955; = 3.16 \u0026#183; 10\u003csup\u003e\u0026ndash;4\u003c/sup\u003e; optimizer \u0026ndash; *Adam* (learning\u003cbr/\u003erate 10\u003csup\u003e\u0026ndash;3\u003c/sup\u003e, rest \u0026ndash; defaults from paper).\u003cbr/\u003e*Preprocessing*: Feature vectors extracted from ***NN***. 
| **1.01** |\n| ***GPC*** | 794 random data points were used for training; \u0026#963;\u003csub\u003en\u003c/sub\u003e = 0; RBF kernel (\u0026#963;\u003csub\u003ef\u003c/sub\u003e = 0.4217,\u003cbr/\u003e\u0026#947; = 1/(2l\u003csup\u003e2\u003c/sup\u003e) = 0.0008511); Newton iterations for Laplace approximation until\u003cbr/\u003e\u0026#916;Log-Marginal-Likelihood \u0026leq; 10\u003csup\u003e\u0026ndash;7\u003c/sup\u003e; solve linear systems iteratively using CG with\u003cbr/\u003e 10\u003csup\u003e\u0026ndash;7\u003c/sup\u003e tolerance; for prediction generate 2000 samples for each test point.\u003cbr/\u003e*Preprocessing*: Feature vectors extracted from ***NN***. | **1.59** |\n\n## Visualizations\n![1](img/demo2.png)\nMore are available in `experiments/plots/`.\n\n## How to install\n```bash\ngit clone https://github.com/yell/mnist-challenge\ncd mnist-challenge/\npip install -r requirements.txt\n```\nAfter installation, tests can be run with:\n```bash\nmake test\n```\n\n## How to run\nSee [main.py](main.py) to reproduce the training and testing of the final models:\n```bash\nusage: main.py [-h] [--load-nn] model\n\npositional arguments:\n  model       which model to run, {'gp', 'knn', 'knn-without-nn', 'logreg',\n              'nn'}\n\noptional arguments:\n  -h, --help  show this help message and exit\n  --load-nn   whether to use pretrained neural network, ignored if 'knn-\n              without-nn' is used (default: False)\n```\n\n## Experiments\nSee also [this notebook](experiments/cross_validations.ipynb) to see what I've tried.\u003cbr/\u003e\n**Note**: the RBM + LogReg approach reached at most `91.8%` test accuracy, because the RBM takes too long to train with the given pure-Python code; it was therefore trained only on a small subset of the data (and still underfitted). 
However, with a properly trained RBM on the whole training set, this approach can give `1.83%` test error (see my [Boltzmann machines project](https://github.com/yell/boltzmann-machines)).\n\n## Features\n* Apart from the specified algorithms, there are also PCA and RBM implementations\n* Most of the classes contain doctests, so they are easy to understand\n* All randomness in algorithms or functions is reproducible (seeds)\n* Support for simple, readable serialization (JSON)\n* There is also some infrastructure for model selection, feature selection, data augmentation, metrics, plots, etc.\n* Support for ***MNIST*** or ***Fashion MNIST*** (both have the same structure, so both can be loaded using the [same routine](mnist-challenge/utils/dataset.py)); I haven't tried the latter yet, though\n\n## System\nAll computations and time measurements were made on a laptop: `i7-5500U CPU @ 2.40GHz x 4` `12GB RAM`\n\n## Possible future work\nHere is a list of what could also be tried regarding these particular 4 ML algorithms (I didn't have time to check it, or it was forbidden by the rules, e.g. 
ensemble learning):\n* Model averaging for k-NN: train a group of k-NNs with different values of *k* (say, 2, 4, ..., 128) and average their predictions;\n* More sophisticated metrics (say, from `scipy.spatial.distance`) for k-NN;\n* Weighting metrics according to some other functions of explained variance from PCA;\n* NCA;\n* Different kernels or compound kernels for k-NN;\n* Committee of MLPs, CNN, committee of CNNs or more advanced NNs;\n* Unsupervised pretraining for MLP/CNN;\n* Different kernels or compound kernels for GPCs;\n* 10 one-vs-rest GPCs;\n* Use derivatives of Log-Marginal-Likelihood for multiclass Laplace approximation w.r.t. kernel parameters for more efficient gradient-based optimization;\n* Model averaging for GPCs: train a collection of GPCs on different parts of the data and then average their predictions (or bagging);\n* IVM.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyell%2Fmnist-challenge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyell%2Fmnist-challenge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyell%2Fmnist-challenge/lists"}