{"id":13394836,"url":"https://github.com/humphd/have-fun-with-machine-learning","last_synced_at":"2025-05-14T18:05:31.228Z","repository":{"id":15136321,"uuid":"77629496","full_name":"humphd/have-fun-with-machine-learning","owner":"humphd","description":"An absolute beginner's guide to Machine Learning and Image Classification with Neural Networks","archived":false,"fork":false,"pushed_at":"2021-12-19T18:38:53.000Z","size":7878,"stargazers_count":5100,"open_issues_count":8,"forks_count":541,"subscribers_count":191,"default_branch":"master","last_synced_at":"2025-04-13T10:57:44.357Z","etag":null,"topics":["caffe","image-classification","machine-learning","neural-network","tutorial"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/humphd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-12-29T17:43:50.000Z","updated_at":"2025-04-10T14:28:38.000Z","dependencies_parsed_at":"2022-07-19T05:47:03.442Z","dependency_job_id":null,"html_url":"https://github.com/humphd/have-fun-with-machine-learning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/humphd%2Fhave-fun-with-machine-learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/humphd%2Fhave-fun-with-machine-learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/humphd%2Fhave-fun-with-machine-learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/humphd%2Fhave-fun-with-machine-learning/manifes
ts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/humphd","download_url":"https://codeload.github.com/humphd/have-fun-with-machine-learning/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254198514,"owners_count":22030965,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["caffe","image-classification","machine-learning","neural-network","tutorial"],"created_at":"2024-07-30T17:01:33.294Z","updated_at":"2025-05-14T18:05:26.218Z","avatar_url":"https://github.com/humphd.png","language":"Python","funding_links":[],"categories":["Python","Introduction","Buzz-Words","📦 Legacy \u0026 Inactive Projects","📚 Project Purpose","Learning"],"sub_categories":["Machine Learning","Machine Learning (Entry-Level)"],"readme":"# Have Fun with Machine Learning: A Guide for Beginners\nAlso available in [Chinese (Traditional)](README_zh-tw.md).  \nAlso available in [Korean](README_ko-KR.md).\n\n## Preface\n\nThis is a **hands-on guide** to machine learning for programmers with *no background* in\nAI. Using a neural network doesn’t require a PhD, and you don’t need to be the person who\nmakes the next breakthrough in AI in order to *use* what exists today.  What we have now\nis already breathtaking, and highly usable.  
I believe that more of us need to play with\nthis stuff like we would any other open source technology, instead of treating it like a\nresearch topic.\n\nIn this guide our goal will be to write a program that uses machine learning to predict, with a\nhigh degree of certainty, whether the images in [data/untrained-samples](data/untrained-samples)\nare of **dolphins** or **seahorses** using only the images themselves, and without\nhaving seen them before.  Here are two example images we'll use:\n\n![A dolphin](data/untrained-samples/dolphin1.jpg?raw=true \"Dolphin\")\n![A seahorse](data/untrained-samples/seahorse1.jpg?raw=true \"Seahorse\")\n\nTo do that we’re going to train and use a [Convolutional Neural Network (CNN)](https://en.wikipedia.org/wiki/Convolutional_neural_network).\nWe’re going to approach this from the point of view of a practitioner rather than\nfrom first principles. There is so much excitement about AI right now,\nbut much of what’s being written feels like being taught to do\ntricks on your bike by a physics professor at a chalkboard instead\nof your friends in the park.\n\nI’ve decided to write this on GitHub rather than as a blog post\nbecause I’m sure that some of what I’ve written below is misleading, naive, or\njust plain wrong.  I’m still learning myself, and I’ve found the lack of solid\nbeginner documentation an obstacle.  If you see me making a mistake or missing\nimportant details, please send a pull request. 
\n\nWith all of that out of the way, let me show you how to do some tricks on your bike!\n\n## Overview\n\nHere’s what we’re going to explore:\n\n* Set up and use existing, open source machine learning technologies, specifically [Caffe](http://caffe.berkeleyvision.org/) and [DIGITS](https://developer.nvidia.com/digits)\n* Create a dataset of images\n* Train a neural network from scratch\n* Test our neural network on images it has never seen before\n* Improve our neural network’s accuracy by fine tuning existing neural networks (AlexNet and GoogLeNet)\n* Deploy and use our neural network\n\nThis guide won’t teach you how neural networks are designed, cover much theory,\nor use a single mathematical expression.  I don’t pretend to understand most of\nwhat I’m going to show you.  Instead, we’re going to use existing things in\ninteresting ways to solve a hard problem.\n\n\u003e Q: \"I know you said we won’t talk about the theory of neural networks, but I’m\n\u003e feeling like I’d at least like an overview before we get going.  Where should I start?\"\n\nThere are literally hundreds of introductions to this, from short posts to full\nonline courses.  Depending on how you like to learn, here are three options\nfor a good starting point:\n\n* This fantastic [blog post](https://jalammar.github.io/visual-interactive-guide-basics-neural-networks/) by Jay Alammar,\nwhich introduces the concepts of neural networks using intuitive examples.\n* Similarly, [this video](https://www.youtube.com/watch?v=FmpDIaiMIeA) introduction by [Brandon Rohrer](https://www.youtube.com/channel/UCsBKTrp45lTfHa_p49I2AEQ) is a really good intro to\nConvolutional Neural Networks like we'll be using.\n* If you’d rather have a bit more theory, I’d recommend [this online book](http://neuralnetworksanddeeplearning.com/chap1.html) by [Michael Nielsen](http://michaelnielsen.org/).\n\n## Setup\n\nInstalling the software we'll use (Caffe and DIGITS) can be frustrating, depending on your platform\nand OS version.  
By far the easiest way to do it is using Docker.  Below we examine how to do it with Docker,\nas well as how to do it natively.\n\n### Option 1a: Installing Caffe Natively\n\nFirst, we’re going to be using the [Caffe deep learning framework](http://caffe.berkeleyvision.org/)\nfrom the Berkeley Vision and Learning Center (BSD licensed).\n\n\u003e Q: “Wait a minute, why Caffe? Why not use something like TensorFlow,\n\u003e which everyone is talking about these days…”  \n\nThere are a lot of great choices available, and you should look at all the\noptions.  [TensorFlow](https://www.tensorflow.org/) is great, and you should\nplay with it.  However, I’m using Caffe for a number of reasons:\n\n* It’s tailor-made for computer vision problems\n* It has support for C++ and Python (with [node.js support](https://github.com/silklabs/node-caffe) coming)\n* It’s fast and stable\n\nBut the **number one reason** I’m using Caffe is that you **don’t need to write any code** to work\nwith it.  You can do everything declaratively (Caffe uses structured text files to define the\nnetwork architecture) and using command-line tools.  Also, you can use some nice front-ends for Caffe to make\ntraining and validating your network a lot easier.  We’ll be using\n[nVidia’s DIGITS](https://developer.nvidia.com/digits) tool below for just this purpose.\n\nCaffe can be a bit of work to get installed.  There are [installation instructions](http://caffe.berkeleyvision.org/installation.html)\nfor various platforms, including some prebuilt Docker or AWS configurations.  \n\n**NOTE:** when making my walkthrough, I used the following non-released version of Caffe from their GitHub repo:\nhttps://github.com/BVLC/caffe/commit/5a201dd960840c319cefd9fa9e2a40d2c76ddd73\n\nOn a Mac it can be frustrating to get working, with version issues halting\nyour progress at various steps in the build.  It took me a couple of days\nof trial and error.  
There are a dozen guides I followed, each with slightly\ndifferent problems.  In the end I found [this one](https://gist.github.com/doctorpangloss/f8463bddce2a91b949639522ea1dcbe4) to be the closest.\nI’d also recommend [this post](https://eddiesmo.wordpress.com/2016/12/20/how-to-set-up-caffe-environment-and-pycaffe-on-os-x-10-12-sierra/),\nwhich is quite recent and links to many of the same discussions I saw. \n\nGetting Caffe installed is by far the hardest thing we'll do, which is pretty\nneat, since you’d assume the AI aspects would be harder!  Don’t give up if you have\nissues, it’s worth the pain.  If I was doing this again, I’d probably use an Ubuntu VM\ninstead of trying to do it on Mac directly.  There's also a [Caffe Users](https://groups.google.com/forum/#!forum/caffe-users) group, if you need answers.\n\n\u003e Q: “Do I need powerful hardware to train a neural network? What if I don’t have\n\u003e access to fancy GPUs?”\n\nIt’s true, deep neural networks require a lot of computing power and energy to\ntrain...if you’re training them from scratch and using massive datasets.\nWe aren’t going to do that.  The secret is to use a pretrained network that someone\nelse has already invested hundreds of hours of compute time training, and then to fine\ntune it to your particular dataset.  We’ll look at how to do this below, but suffice\nit to say that everything I’m going to show you, I’m doing on a year old MacBook\nPro without a fancy GPU.\n\nAs an aside, because I have an integrated Intel graphics card vs. an nVidia GPU,\nI decided to use the [OpenCL Caffe branch](https://github.com/BVLC/caffe/tree/opencl),\nand it’s worked great on my laptop.\n\nWhen you’re done installing Caffe, you should have, or be able to do all of the following:\n\n* A directory that contains your built caffe.  If you did this in the standard way,\nthere will be a `build/` dir which contains everything you need to run caffe,\nthe Python bindings, etc.  
The parent dir that contains `build/` will be your\n`CAFFE_ROOT` (we’ll need this later).\n* Running `make test \u0026\u0026 make runtest` should pass\n* After installing all the Python deps (doing `pip install -r requirements.txt` in `python/`),\nrunning `make pycaffe \u0026\u0026 make pytest` should pass\n* You should also run `make distribute` in order to create a distributable version of caffe with all necessary headers, binaries, etc. in `distribute/`.\n\nOn my machine, with Caffe fully built, I’ve got the following basic layout in my CAFFE_ROOT dir:\n\n```\ncaffe/\n    build/\n        python/\n        lib/\n        tools/\n            caffe ← this is our main binary \n    distribute/\n        python/\n        lib/\n        include/\n        bin/\n        proto/\n```\n\nAt this point, we have everything we need to train, test, and program with neural\nnetworks.  In the next section we’ll add a user-friendly, web-based front end to\nCaffe called DIGITS, which will make training and testing our networks much easier.\n\n### Option 1b: Installing DIGITS Natively\n\nnVidia’s [Deep Learning GPU Training System, or DIGITS](https://github.com/NVIDIA/DIGITS),\nis a BSD-licensed Python web app for training neural networks.  While it’s\npossible to do everything DIGITS does in Caffe at the command-line, or with code,\nusing DIGITS makes it a lot easier to get started.  I also found it more fun, due\nto the great visualizations, real-time charts, and other graphical features.\nSince you’re experimenting and trying to learn, I highly recommend beginning with DIGITS.\n\nThere are quite a few good docs at https://github.com/NVIDIA/DIGITS/tree/master/docs,\nincluding a few [Installation](https://github.com/NVIDIA/DIGITS/blob/master/docs/BuildDigits.md),\n[Configuration](https://github.com/NVIDIA/DIGITS/blob/master/docs/Configuration.md),\nand [Getting Started](https://github.com/NVIDIA/DIGITS/blob/master/docs/GettingStarted.md)\npages.  
I’d recommend reading through everything there before you continue, as I’m not\nan expert on everything you can do with DIGITS.  There's also a public [DIGITS User Group](https://groups.google.com/forum/#!forum/digits-users) if you have questions you need to ask.\n\nThere are various ways to install and run DIGITS, from Docker to pre-baked packages\non Linux, or you can build it from source. I’m on a Mac, so I built it from source.\n\n**NOTE:** In my walkthrough I've used the following non-released version of DIGITS\nfrom their GitHub repo: https://github.com/NVIDIA/DIGITS/commit/81be5131821ade454eb47352477015d7c09753d9\n\nBecause it’s just a bunch of Python scripts, it was fairly painless to get working.\nThe one thing you need to do is tell DIGITS where your `CAFFE_ROOT` is by setting\nan environment variable before starting the server:\n\n```bash\nexport CAFFE_ROOT=/path/to/caffe\n./digits-devserver\n```\n\nNOTE: on Mac I had issues with the server scripts assuming my Python binary was\ncalled `python2`, where I only have `python2.7`.  You can symlink it in `/usr/bin`\nor modify the DIGITS startup script(s) to use the proper binary on your system.\n\nOnce the server is started, you can do everything else via your web browser at http://localhost:5000, which is what I'll do below.\n\n### Option 2: Caffe and DIGITS using Docker\n\nInstall [Docker](https://www.docker.com/), if not already installed, then run the following command\nin order to pull and run a full Caffe + DIGITS container.  A few things to note:\n* make sure port 8080 isn't allocated by another program. If so, change it to any other port you want.\n* change `/path/to/this/repository` to the location of this cloned repo, and `/data/repo` within the container\nwill be bound to this directory.  
This is useful for accessing the images discussed below.\n\n```bash\ndocker run --name digits -d -p 8080:5000 -v /path/to/this/repository:/data/repo kaixhin/digits\n```\n\nNow that we have our container running you can open up your web browser and open `http://localhost:8080`. Everything in the repository is now in the container directory `/data/repo`.  That's it. You've now got Caffe and DIGITS working.\n\nIf you need shell access, use the following command:\n\n```bash\ndocker exec -it digits /bin/bash\n```\n\n## Training a Neural Network\n\nTraining a neural network involves a few steps:\n\n1. Assemble and prepare a dataset of categorized images\n2. Define the network’s architecture\n3. Train and Validate this network using the prepared dataset\n\nWe’re going to do this 3 different ways, in order to show the difference\nbetween starting from scratch and using a pretrained network, and also to show\nhow to work with two popular pretrained networks (AlexNet, GoogLeNet) that are\ncommonly used with Caffe and DIGITS.\n\nFor our training attempts, we’ll use a small dataset of Dolphins and Seahorses.\nI’ve put the images I used in [data/dolphins-and-seahorses](data/dolphins-and-seahorses).\nYou need at least 2 categories, but could have many more (some of the networks\nwe’ll use were trained on 1000+ image categories).  Our goal is to be able to\ngive an image to our network and have it tell us whether it’s a Dolphin or a Seahorse.\n\n### Prepare the Dataset\n\nThe easiest way to begin is to divide your images into a categorized directory layout:\n\n```\ndolphins-and-seahorses/\n    dolphin/\n        image_0001.jpg\n        image_0002.jpg\n        image_0003.jpg\n        ...\n    seahorse/\n        image_0001.jpg\n        image_0002.jpg\n        image_0003.jpg\n        ...\n```\n\nHere each directory is a category we want to classify, and each image within\nthat category dir is an example we’ll use for training and validation. 
\n\n\u003e Q: “Do my images have to be the same size?  What about the filenames, do they matter?”\n\nNo to both. The image sizes will be normalized before we feed them into\nthe network.  We’ll eventually want colour images of 256 x 256 pixels, but\nDIGITS will crop or squash (we'll squash) our images automatically in a moment.\nThe filenames are irrelevant--it’s only important which category they are contained\nwithin.\n\n\u003e Q: “Can I do more complex segmentation of my categories?”\n\nYes. See https://github.com/NVIDIA/DIGITS/blob/digits-4.0/docs/ImageFolderFormat.md.\n\nWe want to use these images on disk to create a **New Dataset**, and specifically,\na **Classification Dataset**.\n\n![Create New Dataset](images/create-new-dataset.png?raw=true \"Create New Dataset\")\n\nWe’ll use the defaults DIGITS gives us, and point **Training Images** at the path\nto our [data/dolphins-and-seahorses](data/dolphins-and-seahorses) folder.\nDIGITS will use the categories (`dolphin` and `seahorse`) to create a database\nof squashed, 256 x 256 Training (75%) and Validation (25%) images.\n\nGive your Dataset a name, `dolphins-and-seahorses`, and click **Create**.\n\n![New Image Classification Dataset](images/new-image-classification-dataset.png?raw=true \"New Image Classification Dataset\")\n\nThis will create our dataset, which took only 4s on my laptop.  In the end I\nhave 92 Training images (49 dolphin, 43 seahorse) in 2 categories, with 30\nValidation images (16 dolphin, 14 seahorse).  It’s a really small dataset, but perfect\nfor our experimentation and learning purposes, because it won’t take forever to train\nand validate a network that uses it. \n\nYou can **Explore the db** if you want to see the images after they have been squashed. 
\n\n![Explore the db](images/explore-dataset.png?raw=true \"Explore the db\")\n\n### Training: Attempt 1, from Scratch\n\nBack in the DIGITS Home screen, we need to create a new **Classification Model**:\n\n![Create Classification Model](images/create-classification-model.png?raw=true \"Create Classification Model\")\n\nWe’ll start by training a model that uses our `dolphins-and-seahorses` dataset,\nand the default settings DIGITS provides.  For our first network, we’ll choose to\nuse one of the standard network architectures, [AlexNet (pdf)](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf). [AlexNet’s design](http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf)\nwon a major computer vision competition called ImageNet in 2012.  The competition\nrequired categorizing 1000+ image categories across 1.2 million images.\n \n![New Classification Model 1](images/new-image-classification-model-attempt1.png?raw=true \"Model 1\")\n\nCaffe uses structured text files to define network architectures.  These text files\nare based on [Google’s Protocol Buffers](https://developers.google.com/protocol-buffers/).\nYou can read the [full schema](https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto) Caffe uses.\nFor the most part we’re not going to work with these, but it’s good to be aware of their\nexistence, since we’ll have to modify them in later steps.  The AlexNet prototxt file\nlooks like this, for example: https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxt. 
\n\nWe’ll train our network for **30 epochs**, which means that it will learn from our\ntraining images, adjusting the network’s weights depending on how well it’s doing, then\ntest itself using our validation images, and repeat this process 30 times.\nEach time it completes a cycle we’ll get info about **Accuracy** (0% to 100%,\nwhere higher is better) and what our **Loss** is (a measure of how wrong the network’s\npredictions were, where lower is better).  Ideally we want a network that is able to predict with\nhigh accuracy, and with few errors (small loss).\n\n**NOTE:** some people have [reported hitting errors in DIGITS](https://github.com/humphd/have-fun-with-machine-learning/issues/17)\ndoing this training run. For many, the problem related to available memory (the process\nneeds a lot of memory to work).  If you're using Docker, you might want to try\nincreasing the amount of memory available to DIGITS (in Docker, preferences -\u003e advanced -\u003e memory).\n\nInitially, our network’s accuracy is a bit below 50%.  This makes sense, because at first it’s\njust “guessing” between two categories using randomly assigned weights.  Over time\nit’s able to achieve 87.5% accuracy, with a loss of 0.37.  
The entire 30-epoch run\ntook me just under 6 minutes.\n\n![Model Attempt 1](images/model-attempt1.png?raw=true \"Model Attempt 1\")\n\nWe can test our model using an image we upload or a URL to an image on the web.\nLet’s test it on a few examples that weren’t in our training/validation dataset:\n\n![Model 1 Classify 1](images/model-attempt1-classify1.png?raw=true \"Model 1 Classify 1\")\n\n![Model 1 Classify 2](images/model-attempt1-classify2.png?raw=true \"Model 1 Classify 2\")\n\nIt almost seems perfect, until we try another:\n\n![Model 1 Classify 3](images/model-attempt1-classify3.png?raw=true \"Model 1 Classify 3\")\n\nHere it falls down completely, and confuses a seahorse for a dolphin, and worse,\ndoes so with a high degree of confidence.\n\nThe reality is that our dataset is too small to be useful for training a really good\nneural network.  We really need tens or hundreds of thousands of images, and with that, a\nlot of computing power to process everything.\n\n### Training: Attempt 2, Fine Tuning AlexNet\n\n#### How Fine Tuning works\n\nDesigning a neural network from scratch, collecting data sufficient to train\nit (e.g., millions of images), and accessing GPUs for weeks to complete the\ntraining is beyond the reach of most of us.  To make it practical for smaller amounts\nof data to be used, we employ a technique called **Transfer Learning**, or **Fine Tuning**.\nFine tuning takes advantage of the layout of deep neural networks, and uses\npretrained networks to do the hard work of initial object detection.\n\nThink of using a neural network as being like looking at something far away with a \npair of binoculars.  You first put the binoculars to your eyes, and everything is\nblurry.  
As you adjust the focus, you start to see colours, lines, shapes, and eventually\nyou are able to pick out the shape of a bird, then with some more adjustment you can\nidentify the species of bird.\n\nIn a multi-layered network, the initial layers extract features (e.g., edges), with\nlater layers using these features to detect shapes (e.g., a wheel, an eye), which are\nthen fed into final classification layers that detect items based on accumulated \ncharacteristics from previous layers (e.g., a cat vs. a dog).  A network has to be \nable to go from pixels to circles to eyes to two eyes placed in a particular orientation, \nand so on up to being able to finally conclude that an image depicts a cat.\n\nWhat we’d like to do is to specialize an existing, pretrained network for classifying \na new set of image classes instead of the ones on which it was initially trained. Because\nthe network already knows how to “see” features in images, we’d like to retrain \nit to “see” our particular image types.  We don’t need to start from scratch with the \nmajority of the layers--we want to transfer the learning already done in these layers \nto our new classification task.  Unlike our previous attempt, which used random weights, \nwe’ll use the existing weights of the final network in our training.  However, we’ll \nthrow away the final classification layer(s) and retrain the network with *our* image \ndataset, fine tuning it to our image classes.\n\nFor this to work, we need a pretrained network that is similar enough to our own data\nthat the learned weights will be useful.  
Luckily, the networks we’ll use below were \ntrained on millions of natural images from [ImageNet](http://image-net.org/), which \nis useful across a broad range of classification tasks.\n\nThis technique has been used to do interesting things like screening for eye diseases \nfrom medical imagery, identifying plankton species from microscopic images collected at \nsea, and categorizing the artistic style of Flickr images.\n\nDoing this perfectly, like all of machine learning, requires you to understand the\ndata and network architecture--you have to be careful with overfitting of the data, \nmight need to fix some of the layers, might need to insert new layers, etc. However,\nmy experience is that it “Just Works” much of the time, and it’s worth you simply doing\nan experiment to see what you can achieve using our naive approach.\n\n#### Uploading Pretrained Networks\n\nIn our first attempt, we used AlexNet’s architecture, but started with random\nweights in the network’s layers.  What we’d like to do is download and use a\nversion of AlexNet that has already been trained on a massive dataset.\n\nThankfully we can do exactly this.  A snapshot of AlexNet is available for download: https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet.\nWe need the binary `.caffemodel` file, which is what contains the trained weights, and it’s\navailable for download at http://dl.caffe.berkeleyvision.org/bvlc_alexnet.caffemodel.\n\nWhile you’re downloading pretrained models, let’s get one more at the same time.\nIn 2014, Google won the same ImageNet competition with [GoogLeNet](https://research.google.com/pubs/pub43022.html) (codenamed Inception):\na 22-layer neural network. A snapshot of GoogLeNet is available for download\nas well, see https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet.\nAgain, we’ll need the `.caffemodel` file with all the pretrained weights,\nwhich is available for download at http://dl.caffe.berkeleyvision.org/bvlc_googlenet.caffemodel. 
\n\nWith these `.caffemodel` files in hand, we can upload them into DIGITS.  Go to\nthe **Pretrained Models** tab on the DIGITS home page and choose **Upload Pretrained Model**:\n\n![Load Pretrained Model](images/load-pretrained-model.png?raw=true \"Load Pretrained Model\")\n\nFor both of these pretrained models, we can use the defaults DIGITS provides\n(i.e., colour, squashed images of 256 x 256).  We just need to provide the \n`Weights (**.caffemodel)` and `Model Definition (original.prototxt)`.\nClick each of those buttons to select a file.\n\nFor the model definitions we can use https://github.com/BVLC/caffe/blob/master/models/bvlc_googlenet/train_val.prototxt\nfor GoogLeNet and https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxt\nfor AlexNet.  We aren’t going to use the classification labels of these networks,\nso we’ll skip adding a `labels.txt` file:\n \n![Upload Pretrained Model](images/upload-pretrained-model.png?raw=true \"Upload Pretrained Model\")\n\nRepeat this process for both AlexNet and GoogLeNet, as we’ll use them both in the coming steps.\n\n\u003e Q: \"Are there other networks that would be good as a basis for fine tuning?\"\n\nThe [Caffe Model Zoo](http://caffe.berkeleyvision.org/model_zoo.html) has quite a few other\npretrained networks that could be used, see https://github.com/BVLC/caffe/wiki/Model-Zoo.\n\n#### Fine Tuning AlexNet for Dolphins and Seahorses\n\nTraining a network using a pretrained Caffe Model is similar to starting from scratch,\nthough we have to make a few adjustments.  
First, we’ll adjust the **Base Learning Rate**\nto 0.001 from 0.01, since we don’t need to make such large jumps (i.e., we’re fine tuning).\nWe’ll also use a **Pretrained Network**, and **Customize** it.\n\n![New Image Classification](images/new-image-classification-model-attempt2.png?raw=true \"New Image Classification\")\n\nIn the pretrained model’s definition (i.e., prototxt), we need to rename all\nreferences to the final **Fully Connected Layer** (where the end result classifications\nhappen).  We do this because we want the model to re-learn new categories from\nour dataset vs. its original training data (i.e., we want to throw away the current\nfinal layer).  We have to rename the last fully connected layer from “fc8” to\nsomething else, “fc9” for example.  Finally, we also need to adjust the number\nof categories from `1000` to `2`, by changing `num_output` to `2`.\n\nHere are the changes we need to make:\n\n```diff\n@@ -332,8 +332,8 @@\n }\n layer {\n-  name: \"fc8\"\n+  name: \"fc9\"\n   type: \"InnerProduct\"\n   bottom: \"fc7\"\n-  top: \"fc8\"\n+  top: \"fc9\"\n   param {\n     lr_mult: 1\n@@ -345,5 +345,5 @@\n   }\n   inner_product_param {\n-    num_output: 1000\n+    num_output: 2\n     weight_filler {\n       type: \"gaussian\"\n@@ -359,5 +359,5 @@\n   name: \"accuracy\"\n   type: \"Accuracy\"\n-  bottom: \"fc8\"\n+  bottom: \"fc9\"\n   bottom: \"label\"\n   top: \"accuracy\"\n@@ -367,5 +367,5 @@\n   name: \"loss\"\n   type: \"SoftmaxWithLoss\"\n-  bottom: \"fc8\"\n+  bottom: \"fc9\"\n   bottom: \"label\"\n   top: \"loss\"\n@@ -375,5 +375,5 @@\n   name: \"softmax\"\n   type: \"Softmax\"\n-  bottom: \"fc8\"\n+  bottom: \"fc9\"\n   top: \"softmax\"\n   include { stage: \"deploy\" }\n```\n\nI’ve included the fully modified file I’m using in [src/alexnet-customized.prototxt](src/alexnet-customized.prototxt).\n\nThis time our accuracy starts at ~60% and climbs right away to 87.5%, then to 96%\nand all the way up to 100%, with the Loss steadily 
decreasing. After 5 minutes we\nend up with an accuracy of 100% and a loss of 0.0009.\n\n![Model Attempt 2](images/model-attempt2.png?raw=true \"Model Attempt 2\")\n\nTesting the same seahorse image our previous network got wrong, we see a complete\nreversal: 100% seahorse.\n\n![Model 2 Classify 1](images/model-attempt2-classify1.png?raw=true \"Model 2 Classify 1\")\n\nEven a children’s drawing of a seahorse works:\n\n![Model 2 Classify 2](images/model-attempt2-classify2.png?raw=true \"Model 2 Classify 2\")\n\nThe same goes for a dolphin:\n\n![Model 2 Classify 3](images/model-attempt2-classify3.png?raw=true \"Model 2 Classify 3\")\n\nEven with images that you think might be hard, like this one that has multiple dolphins\nclose together, and with their bodies mostly underwater, it does the right thing:\n\n![Model 2 Classify 4](images/model-attempt2-classify4.png?raw=true \"Model 2 Classify 4\")\n\n### Training: Attempt 3, Fine Tuning GoogLeNet\n\nLike the previous AlexNet model we used for fine tuning, we can use GoogLeNet as well.\nModifying the network is a bit trickier, since you have to redefine three fully\nconnected layers instead of just one.\n\nTo fine tune GoogLeNet for our use case, we need to once again create a\nnew **Classification Model**:\n\n![New Classification Model](images/new-image-classification-model-attempt3.png?raw=true \"New Classification Model\")\n\nWe rename all references to the three fully connected classification layers,\n`loss1/classifier`, `loss2/classifier`, and `loss3/classifier`, and redefine\nthe number of categories (`num_output: 2`).  
Here are the changes we need to make\nin order to rename the 3 classifier layers, as well as to change from 1000 to 2 categories:\n\n```diff\n@@ -917,10 +917,10 @@\n   exclude { stage: \"deploy\" }\n }\n layer {\n-  name: \"loss1/classifier\"\n+  name: \"loss1a/classifier\"\n   type: \"InnerProduct\"\n   bottom: \"loss1/fc\"\n-  top: \"loss1/classifier\"\n+  top: \"loss1a/classifier\"\n   param {\n     lr_mult: 1\n     decay_mult: 1\n@@ -930,7 +930,7 @@\n     decay_mult: 0\n   }\n   inner_product_param {\n-    num_output: 1000\n+    num_output: 2\n     weight_filler {\n       type: \"xavier\"\n       std: 0.0009765625\n@@ -945,7 +945,7 @@\n layer {\n   name: \"loss1/loss\"\n   type: \"SoftmaxWithLoss\"\n-  bottom: \"loss1/classifier\"\n+  bottom: \"loss1a/classifier\"\n   bottom: \"label\"\n   top: \"loss1/loss\"\n   loss_weight: 0.3\n@@ -954,7 +954,7 @@\n layer {\n   name: \"loss1/top-1\"\n   type: \"Accuracy\"\n-  bottom: \"loss1/classifier\"\n+  bottom: \"loss1a/classifier\"\n   bottom: \"label\"\n   top: \"loss1/accuracy\"\n   include { stage: \"val\" }\n@@ -962,7 +962,7 @@\n layer {\n   name: \"loss1/top-5\"\n   type: \"Accuracy\"\n-  bottom: \"loss1/classifier\"\n+  bottom: \"loss1a/classifier\"\n   bottom: \"label\"\n   top: \"loss1/accuracy-top5\"\n   include { stage: \"val\" }\n@@ -1705,10 +1705,10 @@\n   exclude { stage: \"deploy\" }\n }\n layer {\n-  name: \"loss2/classifier\"\n+  name: \"loss2a/classifier\"\n   type: \"InnerProduct\"\n   bottom: \"loss2/fc\"\n-  top: \"loss2/classifier\"\n+  top: \"loss2a/classifier\"\n   param {\n     lr_mult: 1\n     decay_mult: 1\n@@ -1718,7 +1718,7 @@\n     decay_mult: 0\n   }\n   inner_product_param {\n-    num_output: 1000\n+    num_output: 2\n     weight_filler {\n       type: \"xavier\"\n       std: 0.0009765625\n@@ -1733,7 +1733,7 @@\n layer {\n   name: \"loss2/loss\"\n   type: \"SoftmaxWithLoss\"\n-  bottom: \"loss2/classifier\"\n+  bottom: \"loss2a/classifier\"\n   bottom: \"label\"\n   top: \"loss2/loss\"\n  
 loss_weight: 0.3\n@@ -1742,7 +1742,7 @@\n layer {\n   name: \"loss2/top-1\"\n   type: \"Accuracy\"\n-  bottom: \"loss2/classifier\"\n+  bottom: \"loss2a/classifier\"\n   bottom: \"label\"\n   top: \"loss2/accuracy\"\n   include { stage: \"val\" }\n@@ -1750,7 +1750,7 @@\n layer {\n   name: \"loss2/top-5\"\n   type: \"Accuracy\"\n-  bottom: \"loss2/classifier\"\n+  bottom: \"loss2a/classifier\"\n   bottom: \"label\"\n   top: \"loss2/accuracy-top5\"\n   include { stage: \"val\" }\n@@ -2435,10 +2435,10 @@\n   }\n }\n layer {\n-  name: \"loss3/classifier\"\n+  name: \"loss3a/classifier\"\n   type: \"InnerProduct\"\n   bottom: \"pool5/7x7_s1\"\n-  top: \"loss3/classifier\"\n+  top: \"loss3a/classifier\"\n   param {\n     lr_mult: 1\n     decay_mult: 1\n@@ -2448,7 +2448,7 @@\n     decay_mult: 0\n   }\n   inner_product_param {\n-    num_output: 1000\n+    num_output: 2\n     weight_filler {\n       type: \"xavier\"\n     }\n@@ -2461,7 +2461,7 @@\n layer {\n   name: \"loss3/loss\"\n   type: \"SoftmaxWithLoss\"\n-  bottom: \"loss3/classifier\"\n+  bottom: \"loss3a/classifier\"\n   bottom: \"label\"\n   top: \"loss\"\n   loss_weight: 1\n@@ -2470,7 +2470,7 @@\n layer {\n   name: \"loss3/top-1\"\n   type: \"Accuracy\"\n-  bottom: \"loss3/classifier\"\n+  bottom: \"loss3a/classifier\"\n   bottom: \"label\"\n   top: \"accuracy\"\n   include { stage: \"val\" }\n@@ -2478,7 +2478,7 @@\n layer {\n   name: \"loss3/top-5\"\n   type: \"Accuracy\"\n-  bottom: \"loss3/classifier\"\n+  bottom: \"loss3a/classifier\"\n   bottom: \"label\"\n   top: \"accuracy-top5\"\n   include { stage: \"val\" }\n@@ -2489,7 +2489,7 @@\n layer {\n   name: \"softmax\"\n   type: \"Softmax\"\n-  bottom: \"loss3/classifier\"\n+  bottom: \"loss3a/classifier\"\n   top: \"softmax\"\n   include { stage: \"deploy\" }\n }\n```\n\nI’ve put the complete file in [src/googlenet-customized.prototxt](src/googlenet-customized.prototxt).\n\n\u003e Q: \"What about changes to the prototext definitions of these networks?\n\u003e 
We changed the fully connected layer name(s), and the number of categories.\n\u003e What else could, or should be changed, and in what circumstances?\"\n\nGreat question, and it's something I'm wondering, too.  For example, I know that we can\n[\"fix\" certain layers](https://github.com/BVLC/caffe/wiki/Fine-Tuning-or-Training-Certain-Layers-Exclusively)\nso the weights don't change.  Doing other things involves understanding how the layers work,\nwhich is beyond this guide, and also beyond its author at present!\n\nLike we did with fine tuning AlexNet, we also reduce the learning rate by\na factor of 10, from `0.01` to `0.001`.\n\n\u003e Q: \"What other changes would make sense when fine tuning these networks?\n\u003e What about different numbers of epochs, batch sizes, solver types (Adam, AdaDelta, AdaGrad, etc),\n\u003e learning rates, policies (Exponential Decay, Inverse Decay, Sigmoid Decay, etc),\n\u003e step sizes, and gamma values?\"\n\nGreat question, and one that I wonder about as well.  I only have a vague understanding of these\nand it’s likely that there are improvements we can make if you know how to alter these\nvalues when training.  This is something that needs better documentation.\n\nBecause GoogLeNet has a more complicated architecture than AlexNet, fine tuning it requires\nmore time.  
On my laptop, it takes 10 minutes to retrain GoogLeNet with our dataset,\nachieving 100% accuracy and a loss of 0.0070:\n\n![Model Attempt 3](images/model-attempt3.png?raw=true \"Model Attempt 3\")\n\nJust as we saw with the fine tuned version of AlexNet, our modified GoogLeNet\nperforms amazingly well--the best so far:\n\n![Model Attempt 3 Classify 1](images/model-attempt3-classify1.png?raw=true \"Model Attempt 3 Classify 1\")\n\n![Model Attempt 3 Classify 2](images/model-attempt3-classify2.png?raw=true \"Model Attempt 3 Classify 2\")\n\n![Model Attempt 3 Classify 3](images/model-attempt3-classify3.png?raw=true \"Model Attempt 3 Classify 3\")\n\n## Using our Model\n\nWith our network trained and tested, it’s time to download and use it.  Each of the models\nwe trained in DIGITS has a **Download Model** button, as well as a way to select different\nsnapshots within our training run (e.g., `Epoch #30`):\n\n![Trained Models](images/trained-models.png?raw=true \"Trained Models\")\n\nClicking **Download Model** downloads a `tar.gz` archive containing the following files:\n\n```\ndeploy.prototxt\nmean.binaryproto\nsolver.prototxt\ninfo.json\noriginal.prototxt\nlabels.txt\nsnapshot_iter_90.caffemodel\ntrain_val.prototxt\n```\n\nThere’s a [nice description](https://github.com/BVLC/caffe/wiki/Using-a-Trained-Network:-Deploy) in\nthe Caffe documentation about how to use the model we just built.  It says:\n\n\u003e A network is defined by its design (.prototxt), and its weights (.caffemodel). As a network is\n\u003e being trained, the current state of that network's weights are stored in a .caffemodel. With both\n\u003e of these we can move from the train/test phase into the production phase.\n\u003e\n\u003e In its current state, the design of the network is not designed for deployment. Before we can\n\u003e release our network as a product, we often need to alter it in a few ways:\n\u003e\n\u003e 1. 
Remove the data layer that was used for training, as for in the case of classification we are no longer providing labels for our data.\n\u003e 2. Remove any layer that is dependent upon data labels.\n\u003e 3. Set the network up to accept data.\n\u003e 4. Have the network output the result.\n\nDIGITS has already done the work for us, separating out the different versions of our `prototxt` files.\nThe files we’ll care about when using this network are:\n\n* `deploy.prototxt` - the definition of our network, ready for accepting image input data\n* `mean.binaryproto` - our model will need us to subtract the image mean from each image that it processes, and this is the mean image.\n* `labels.txt` - a list of our labels (`dolphin`, `seahorse`) in case we want to print them vs. just the category number\n* `snapshot_iter_90.caffemodel` - these are the trained weights for our network\n\nWe can use these files in a number of ways to classify new images.  For example, in our\n`CAFFE_ROOT` we can use `build/examples/cpp_classification/classification.bin` to classify one image:\n\n```bash\n$ cd $CAFFE_ROOT/build/examples/cpp_classification\n$ ./classification.bin deploy.prototxt snapshot_iter_90.caffemodel mean.binaryproto labels.txt dolphin1.jpg\n```\n\nThis will spit out a bunch of debug text, followed by the predictions for each of our two categories:\n\n```\n0.9997 - “dolphin”\n0.0003 - “seahorse”\n```\n\nYou can read the [complete C++ source](https://github.com/BVLC/caffe/tree/master/examples/cpp_classification)\nfor this in the [Caffe examples](https://github.com/BVLC/caffe/tree/master/examples).\n\nFor a classification version that uses the Python interface, DIGITS includes a [nice example](https://github.com/NVIDIA/DIGITS/tree/master/examples/classification).  
There's also a fairly\n[well documented Python walkthrough](https://github.com/BVLC/caffe/blob/master/examples/00-classification.ipynb) in the Caffe examples.\n\n### Python example\n\nLet's write a program that uses our fine-tuned GoogLeNet model to classify the untrained images\nwe have in [data/untrained-samples](data/untrained-samples).  I've cobbled this together based on\nthe examples above, as well as the `caffe` [Python module's source](https://github.com/BVLC/caffe/tree/master/python),\nwhich you should prefer to anything I'm about to say.\n\nA full version of what I'm going to discuss is available in [src/classify-samples.py](src/classify-samples.py).\nLet's begin!\n\nFirst, we'll need the [NumPy](http://www.numpy.org/) module.  In a moment we'll be using it\nto work with [`ndarray`s](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html), which Caffe uses a lot.\nIf you haven't used them before, as I had not, you'd do well to begin by reading this\n[Quickstart tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html).\n\nSecond, we'll need to load the `caffe` module from our `CAFFE_ROOT` dir.  If it's not already included\nin your Python environment, you can force it to load by adding it manually. Along with it we'll\nalso import caffe's protobuf module:\n\n```python\nimport os\nimport sys\n\nimport numpy as np\n\n# Make the caffe Python module importable if it is not already on the path\ncaffe_root = '/path/to/your/caffe_root'\nsys.path.insert(0, os.path.join(caffe_root, 'python'))\nimport caffe\nfrom caffe.proto import caffe_pb2\n```\n\nNext we need to tell Caffe whether to [use the CPU or GPU](https://github.com/BVLC/caffe/blob/61944afd4e948a4e2b4ef553919a886a8a8b8246/python/caffe/_caffe.cpp#L50-L52).\nFor our experiments, the CPU is fine:\n\n```python\ncaffe.set_mode_cpu()\n```\n\nNow we can use `caffe` to load our trained network.  
To do so, we'll need some of the files we downloaded\nfrom DIGITS, namely:\n\n* `deploy.prototxt` - our \"network file\", the description of the network.\n* `snapshot_iter_90.caffemodel` - our trained \"weights\"\n\nWe need to provide the path to each file, and I'll assume that my files are in a dir called `model/`:\n\n```python\nmodel_dir = 'model'\ndeploy_file = os.path.join(model_dir, 'deploy.prototxt')\nweights_file = os.path.join(model_dir, 'snapshot_iter_90.caffemodel')\nnet = caffe.Net(deploy_file, caffe.TEST, weights=weights_file)\n```\n\nThe `caffe.Net()` [constructor](https://github.com/BVLC/caffe/blob/61944afd4e948a4e2b4ef553919a886a8a8b8246/python/caffe/_caffe.cpp#L91-L117)\ntakes a network file, a phase (`caffe.TEST` or `caffe.TRAIN`), as well as an optional weights filename.  When\nwe provide a weights file, the `Net` will automatically load the weights for us. The `Net` has a number of\n[methods and attributes](https://github.com/BVLC/caffe/blob/master/python/caffe/pycaffe.py) you can use.\n\n**Note:** There is also a [deprecated version of this constructor](https://github.com/BVLC/caffe/blob/61944afd4e948a4e2b4ef553919a886a8a8b8246/python/caffe/_caffe.cpp#L119-L134),\nwhich seems to get used often in sample code on the web. It looks like this, in case you encounter it:\n\n```python\nnet = caffe.Net(str(deploy_file), str(weights_file), caffe.TEST)\n```\n\nWe're interested in loading images of various sizes into our network for testing. As a result,\nwe'll need to *transform* them into a shape that our network can use (i.e., colour, 256x256).\nCaffe provides the [`Transformer` class](https://github.com/BVLC/caffe/blob/61944afd4e948a4e2b4ef553919a886a8a8b8246/python/caffe/io.py#L98)\nfor this purpose.  
We'll use it to create a transformation appropriate for our images/network:\n\n```python\ntransformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})\n# set_transpose: https://github.com/BVLC/caffe/blob/61944afd4e948a4e2b4ef553919a886a8a8b8246/python/caffe/io.py#L187\ntransformer.set_transpose('data', (2, 0, 1))\n# set_raw_scale: https://github.com/BVLC/caffe/blob/61944afd4e948a4e2b4ef553919a886a8a8b8246/python/caffe/io.py#L221\ntransformer.set_raw_scale('data', 255)\n# set_channel_swap: https://github.com/BVLC/caffe/blob/61944afd4e948a4e2b4ef553919a886a8a8b8246/python/caffe/io.py#L203\ntransformer.set_channel_swap('data', (2, 1, 0))\n```\n\nWe can also use the `mean.binaryproto` file DIGITS gave us to set our transformer's mean:\n\n```python\n# This code for setting the mean from https://github.com/NVIDIA/DIGITS/tree/master/examples/classification\nmean_file = os.path.join(model_dir, 'mean.binaryproto')\nwith open(mean_file, 'rb') as infile:\n    blob = caffe_pb2.BlobProto()\n    blob.MergeFromString(infile.read())\n    if blob.HasField('shape'):\n        blob_dims = blob.shape\n        assert len(blob_dims) == 4, 'Shape should have 4 dimensions - shape is %s' % blob.shape\n    elif blob.HasField('num') and blob.HasField('channels') and \\\n            blob.HasField('height') and blob.HasField('width'):\n        blob_dims = (blob.num, blob.channels, blob.height, blob.width)\n    else:\n        raise ValueError('blob does not provide shape or 4d dimensions')\n    pixel = np.reshape(blob.data, blob_dims[1:]).mean(1).mean(1)\n    transformer.set_mean('data', pixel)\n```\n\nIf we had a lot of labels, we might also choose to read in our labels file, which we can use\nlater by looking up the label for a probability using its position (e.g., 0=dolphin, 1=seahorse):\n\n```python\nlabels_file = os.path.join(model_dir, 'labels.txt')\nlabels = np.loadtxt(labels_file, str, delimiter='\\n')\n``` \n\nNow we're ready to classify an image.  
We'll use [`caffe.io.load_image()`](https://github.com/BVLC/caffe/blob/61944afd4e948a4e2b4ef553919a886a8a8b8246/python/caffe/io.py#L279)\nto read our image file, then use our transformer to reshape it and set it as our network's data layer:\n\n```python\n# Load the image from disk using caffe's built-in I/O module\nimage = caffe.io.load_image(fullpath)\n# Preprocess the image into the proper format for feeding into the model\nnet.blobs['data'].data[...] = transformer.preprocess('data', image)\n```\n\n\u003e Q: \"How could I use images (i.e., frames) from a camera or video stream instead of files?\"\n\nGreat question! Here's a skeleton to get you started:\n\n```python\nimport cv2\n...\n# Get the shape of our input data layer, so we can resize the image\ninput_shape = net.blobs['data'].data.shape\n...\nwebCamCap = cv2.VideoCapture(0) # could also be a URL, filename\nif webCamCap.isOpened():\n    rval, frame = webCamCap.read()\nelse:\n    rval = False\n\nwhile rval:\n    # OpenCV frames are BGR uint8 in [0, 255], but our transformer expects what\n    # caffe.io.load_image() returns: RGB floats in [0, 1], so convert first\n    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0\n    net.blobs['data'].data[...] = transformer.preprocess('data', rgb)\n    ...\n    # Read the next frame last, so we never process a failed read\n    rval, frame = webCamCap.read()\n\nwebCamCap.release()\n```\n\nBack to our problem, we next need to run the image data through our network and read out\nthe probabilities from our network's final `'softmax'` layer, which will be in order by label category:\n\n```python\n# Run the image's pixel data through the network\nout = net.forward()\n# Extract the probabilities of our two categories from the final layer\nsoftmax_layer = out['softmax']\n# Here we're converting to Python types from ndarray floats\ndolphin_prob = softmax_layer.item(0)\nseahorse_prob = softmax_layer.item(1)\n\n# Print the results. 
I'm using labels just to show how it's done\nlabel = labels[0] if dolphin_prob \u003e seahorse_prob else labels[1]\nfilename = os.path.basename(fullpath)\nprint('%s is a %s dolphin=%.3f%% seahorse=%.3f%%' % (filename, label, dolphin_prob*100, seahorse_prob*100))\n```\n\nRunning the full version of this (see [src/classify-samples.py](src/classify-samples.py)) using our\nfine-tuned GoogLeNet network on our [data/untrained-samples](data/untrained-samples) images gives\nme the following output:\n\n```\n[...truncated caffe network output...]\ndolphin1.jpg is a dolphin dolphin=99.968% seahorse=0.032%\ndolphin2.jpg is a dolphin dolphin=99.997% seahorse=0.003%\ndolphin3.jpg is a dolphin dolphin=99.943% seahorse=0.057%\nseahorse1.jpg is a seahorse dolphin=0.365% seahorse=99.635%\nseahorse2.jpg is a seahorse dolphin=0.000% seahorse=100.000%\nseahorse3.jpg is a seahorse dolphin=0.014% seahorse=99.986%\n```\n\nI'm still trying to learn all the best practices for working with models in code. I wish I had more\nand better documented code examples, APIs, premade modules, etc. to show you here. To be honest,\nmost of the code examples I’ve found are terse and poorly documented--Caffe’s\ndocumentation is spotty, and assumes a lot.\n\nIt seems to me like there’s an opportunity for someone to build higher-level tools on top of the\nCaffe interfaces for beginners and basic workflows like we've done here.  It would be great if\nthere were more simple modules in high-level languages that I could point you at that “did the\nright thing” with our model; someone could/should take this on, and make *using* Caffe\nmodels as easy as DIGITS makes *training* them.  I’d love to have something I could use in node.js,\nfor example.  
Ideally one shouldn’t be required to know so much about the internals of the model or Caffe.\nI haven’t used it yet, but [DeepDetect](https://deepdetect.com/) looks interesting on this front,\nand there are likely many other tools I don’t know about.\n\n## Results\n\nAt the beginning we said that our goal was to write a program that used a neural network to\ncorrectly classify all of the images in [data/untrained-samples](data/untrained-samples).\nThese are images of dolphins and seahorses that were never used in the training or validation\ndata:\n\n### Untrained Dolphin Images\n\n![Dolphin 1](data/untrained-samples/dolphin1.jpg?raw=true \"Dolphin 1\")\n![Dolphin 2](data/untrained-samples/dolphin2.jpg?raw=true \"Dolphin 2\")\n![Dolphin 3](data/untrained-samples/dolphin3.jpg?raw=true \"Dolphin 3\")\n\n### Untrained Seahorse Images\n\n![Seahorse 1](data/untrained-samples/seahorse1.jpg?raw=true \"Seahorse 1\")\n![Seahorse 2](data/untrained-samples/seahorse2.jpg?raw=true \"Seahorse 2\")\n![Seahorse 3](data/untrained-samples/seahorse3.jpg?raw=true \"Seahorse 3\")\n\nLet's look at how each of our three attempts did with this challenge:\n\n### Model Attempt 1: AlexNet from Scratch (3rd Place)\n\n| Image | Dolphin | Seahorse | Result | \n|-------|---------|----------|--------|\n|[dolphin1.jpg](data/untrained-samples/dolphin1.jpg)| 71.11% | 28.89% | :expressionless: |\n|[dolphin2.jpg](data/untrained-samples/dolphin2.jpg)| 99.2% | 0.8% | :sunglasses: |\n|[dolphin3.jpg](data/untrained-samples/dolphin3.jpg)| 63.3% | 36.7% | :confused: |\n|[seahorse1.jpg](data/untrained-samples/seahorse1.jpg)| 95.04% | 4.96% | :disappointed: |\n|[seahorse2.jpg](data/untrained-samples/seahorse2.jpg)| 56.64% | 43.36% |  :confused: |\n|[seahorse3.jpg](data/untrained-samples/seahorse3.jpg)| 7.06% | 92.94% |  :grin: |\n\n### Model Attempt 2: Fine Tuned AlexNet (2nd Place)\n\n| Image | Dolphin | Seahorse | Result | 
\n|-------|---------|----------|--------|\n|[dolphin1.jpg](data/untrained-samples/dolphin1.jpg)| 99.1% | 0.09% |  :sunglasses: |\n|[dolphin2.jpg](data/untrained-samples/dolphin2.jpg)| 99.5% | 0.05% |  :sunglasses: |\n|[dolphin3.jpg](data/untrained-samples/dolphin3.jpg)| 91.48% | 8.52% |  :grin: |\n|[seahorse1.jpg](data/untrained-samples/seahorse1.jpg)| 0% | 100% |  :sunglasses: |\n|[seahorse2.jpg](data/untrained-samples/seahorse2.jpg)| 0% | 100% |  :sunglasses: |\n|[seahorse3.jpg](data/untrained-samples/seahorse3.jpg)| 0% | 100% |  :sunglasses: |\n\n### Model Attempt 3: Fine Tuned GoogLeNet (1st Place)\n\n| Image | Dolphin | Seahorse | Result | \n|-------|---------|----------|--------|\n|[dolphin1.jpg](data/untrained-samples/dolphin1.jpg)| 99.86% | 0.14% |  :sunglasses: |\n|[dolphin2.jpg](data/untrained-samples/dolphin2.jpg)| 100% | 0% |  :sunglasses: |\n|[dolphin3.jpg](data/untrained-samples/dolphin3.jpg)| 100% | 0% |  :sunglasses: |\n|[seahorse1.jpg](data/untrained-samples/seahorse1.jpg)| 0.5% | 99.5% |  :sunglasses: |\n|[seahorse2.jpg](data/untrained-samples/seahorse2.jpg)| 0% | 100% |  :sunglasses: |\n|[seahorse3.jpg](data/untrained-samples/seahorse3.jpg)| 0.02% | 99.98% |  :sunglasses: |\n\n## Conclusion\n\nIt’s amazing how well our model works, and what’s possible by fine tuning a pretrained network.\nObviously our dolphin vs. seahorse example is contrived, and the dataset overly limited--we really\ndo want more and better data if we want our network to be robust.  But since our goal was to examine\nthe tools and workflows of neural networks, it’s turned out to be an ideal case, especially since it\ndidn’t require expensive equipment or massive amounts of time.\n\nAbove all I hope that this experience helps to remove the overwhelming fear of getting started.\nDeciding whether or not it’s worth investing time in learning the theories of machine learning and\nneural networks is easier when you’ve been able to see it work in a small way.  
Now that you’ve got\na setup and a working approach, you can try doing other sorts of classifications.  You might also look\nat the other types of things you can do with Caffe and DIGITS, for example, finding objects within an\nimage, or doing segmentation.\n\nHave fun with machine learning!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhumphd%2Fhave-fun-with-machine-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhumphd%2Fhave-fun-with-machine-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhumphd%2Fhave-fun-with-machine-learning/lists"}