{"id":13936640,"url":"https://github.com/deepgram/kur","last_synced_at":"2025-05-15T18:06:11.695Z","repository":{"id":62574816,"uuid":"74182569","full_name":"deepgram/kur","owner":"deepgram","description":"Descriptive Deep Learning","archived":false,"fork":false,"pushed_at":"2024-02-05T20:47:29.000Z","size":1873,"stargazers_count":821,"open_issues_count":17,"forks_count":107,"subscribers_count":55,"default_branch":"master","last_synced_at":"2025-03-31T21:49:08.526Z","etag":null,"topics":["deep-learning","deep-learning-tutorial","deep-neural-networks","image-recognition","machine-learning","neural-network","neural-networks","speech-recognition","speech-to-text"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deepgram.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGES.rst","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-11-19T02:42:09.000Z","updated_at":"2025-03-26T18:36:20.000Z","dependencies_parsed_at":"2024-11-30T17:01:19.390Z","dependency_job_id":"4cc13af9-b76a-4190-a950-4d14643b7acf","html_url":"https://github.com/deepgram/kur","commit_stats":null,"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepgram%2Fkur","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepgram%2Fkur/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepgram%2Fkur/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepgram
%2Fkur/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/deepgram","download_url":"https://codeload.github.com/deepgram/kur/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247737788,"owners_count":20987721,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","deep-learning-tutorial","deep-neural-networks","image-recognition","machine-learning","neural-network","neural-networks","speech-recognition","speech-to-text"],"created_at":"2024-08-07T23:02:52.754Z","updated_at":"2025-04-07T22:10:18.213Z","avatar_url":"https://github.com/deepgram.png","language":"Python","funding_links":[],"categories":["Python","Table of Contents"],"sub_categories":[],"readme":".. |LICENSE| image:: https://img.shields.io/badge/license-Apache%202-blue.svg\n   :target: https://github.com/deepgram/kur/blob/master/LICENSE\n.. |PYTHON| image:: https://img.shields.io/badge/python-3.4%2C3.5%2C3.6-lightgrey.svg\n   :target: https://kur.deepgram.com/installing.html\n.. |BUILD| image:: https://travis-ci.org/deepgram/kur.svg?branch=master\n   :target: https://travis-ci.org/deepgram/kur\n.. |GITTER| image:: https://badges.gitter.im/deepgram-kur/Lobby.svg\n   :target: https://gitter.im/deepgram-kur/Lobby\n\n.. _Facebook: https://www.facebook.com/sharer/sharer.php?u=https%3A//kur.deepgram.com\n.. _Google+: https://plus.google.com/share?url=https%3A//kur.deepgram.com\n.. 
_LinkedIn: https://www.linkedin.com/shareArticle?mini=true\u0026url=https%3A//kur.deepgram.com\u0026title=Kur%20-%20descriptive%20deep%20learning\u0026summary=Kur%20is%20the%20future%20of%20deep%20learning%3A%20advanced%20AI%20without%20programming!\u0026source=\n.. _Twitter: https://twitter.com/home?status=%40DeepgramAI%20has%20released%20the%20future%20of%20deep%20learning.%20https%3A//kur.deepgram.com%20%23Kur\n\n.. image:: https://kur.deepgram.com/images/logo-small.png\n   :align: center\n   :target: https://deepgram.com\n\n.. package_readme_starts_here\n\n.. _Tutorial: https://kur.deepgram.com/tutorial.html\n\n******************************\nKur: Descriptive Deep Learning\n******************************\n\n.. package_readme_ignore\n\n|BUILD| |LICENSE| |PYTHON| |GITTER|\n\nIntroduction\n============\n\nWelcome to Kur! You've found the future of deep learning!\n\n- Install Kur easily with ``pip install kur``.\n- Design, train, and evaluate models *without ever needing to code*.\n- Describe your model with easily understandable concepts.\n- Quickly explore better versions of your model with the power of the `Jinja2\n  \u003chttp://jinja.pocoo.org\u003e`_ templating engine.\n- Supports Theano, TensorFlow, and PyTorch, and supports **multi-GPU**\n  out-of-the-box.\n- **COMING SOON**: Share your models with the community, making it incredibly\n  easy to collaborate on sophisticated models.\n\nGo ahead and give it a whirl: `Get the Code`_ and then jump into\nthe `Examples`_! Then build your own model in our Tutorial_. Remember to check\nout our `homepage \u003chttps://kur.deepgram.com\u003e`_ for complete documentation and\nthe newest news.\n\n.. package_readme_ignore\n\nLike us? Share!\n\n.. package_readme_ignore\n\n- Facebook_\n- `Google+`_\n- LinkedIn_\n- Twitter_\n\nWhat is Kur?\n------------\n\nKur is a system for quickly building and applying state-of-the-art deep\nlearning models to new and exciting problems. 
Kur was designed to appeal to the\nentire machine learning community, from novices to veterans. It uses\nspecification files that are simple to read and author, meaning that you can\nget started building sophisticated models *without ever needing to code*. Even\nso, Kur exposes a friendly and extensible API to support advanced deep learning\narchitectures or workflows. Excited? Jump straight into the `Examples`_.\n\n.. _get_the_code:\n\nGet the Code\n============\n\nKur is really easy to install! You can pick either one of these two options for\ninstalling Kur.\n\n**NOTE**: Kur requires **Python 3.4** or greater. Take a look at our\n`installation guide \u003chttps://kur.deepgram.com/install.html\u003e`_ for\nstep-by-step instructions for installing Kur and setting up a `virtual\nenvironment \u003chttps://virtualenv.pypa.io/\u003e`_.\n\nLatest Pip Release\n------------------\n\nIf you know what you are doing, then this is easy:\n\n.. code-block:: bash\n\n\tpip install kur\n\nLatest Development Release\n--------------------------\n\nJust check it out and run the setup script:\n\n.. code-block:: bash\n\n\tgit clone https://github.com/deepgram/kur\n\tcd kur\n\tpip install .\n\n**Quick Start**: Or, if you already have `Python 3 installed\n\u003chttps://kur.deepgram.com/installing.html\u003e`_, then here's a few quick-start\nlines to get you training your first model:\n\n**Quick Start For Using pip:**\n\n.. code-block:: bash\n\n\tpip install virtualenv                      # Make sure virtualenv is present\n\tvirtualenv -p $(which python3) ~/kur-env    # Create a Python 3 environment for Kur\n\t. 
~/kur-env/bin/activate                    # Activate the Kur environment\n\tpip install kur                             # Install Kur\n\tkur --version                               # Check that everything works\n\tgit clone https://github.com/deepgram/kur   # Get the examples\n\tcd kur/examples                             # Change directories\n\tkur train mnist.yml                         # Start training!\n\n**Quick Start For Using git:**\n\n.. code-block:: bash\n\n\tpip install virtualenv                      # Make sure virtualenv is present\n\tvirtualenv -p $(which python3) ~/kur-env    # Create a Python 3 environment for Kur\n\t. ~/kur-env/bin/activate                    # Activate the Kur environment\n\tgit clone https://github.com/deepgram/kur   # Check out the latest code\n\tcd kur                                      # Change directories\n\tpip install .                               # Install Kur\n\tkur --version                               # Check that everything works\n\tcd examples                                 # Change directories\n\tkur train mnist.yml                         # Start training!\n\nUsage\n-----\n\nIf everything has gone well, you should be able to use Kur:\n\n.. code-block:: bash\n\n\tkur --version\n\nYou'll typically be using Kur in commands like ``kur train model.yml`` or ``kur\ntest model.yml``. You'll see these in the `Examples`_, which is\nwhere you should head to next!\n\nTroubleshooting\n---------------\n\nIf you run into any problems installing or using Kur, please check out our\n`troubleshooting \u003chttps://kur.deepgram.com/troubleshooting.html\u003e`_ page for\nlots of useful help. And if you want more detailed installation instructions,\nwith help on setting up your environment, be sure to see our `installation\n\u003chttps://kur.deepgram.com/installing.html\u003e`_ page.\n\n.. package_readme_ends_here\n\n.. 
_the_examples:\n\nExamples\n********\n\nLet's look at some examples of how fun and easy Kur makes state-of-the-art deep\nlearning.\n\n.. _mnist_example:\n\nMNIST: Handwriting recognition\n==============================\n\nLet's jump right in and see how awesome Kur is! The first example we'll look at\nis Yann LeCun's `MNIST \u003chttp://yann.lecun.com/exdb/mnist/\u003e`_ dataset. This is a\ndataset of 28x28 pixel images of individual handwritten digits between 0 and 9.\nThe goal of our model will be to perform image recognition, tagging the image\nwith the most likely digit it represents.\n\n**NOTE**: As with most command line examples, lines preceded by ``$`` are lines\nthat you are supposed to type (followed by the ``ENTER`` key). Lines without an\ninitial ``$`` are lines which are printed to the screen (you don't type them).\n\nFirst, you need to `Get the Code`_! If you installed via\n``pip``, you'll need to check out the ``examples`` directory from the\nrepository, like this:\n\n.. code-block:: bash\n\n\tgit clone https://github.com/deepgram/kur\n\tcd kur/examples\n\nIf you installed via ``git``, then you already have the ``examples`` directory\nlocally, so just move into the example directory:\n\n.. code-block:: bash\n\n\t$ cd examples\n\nNow let's train the MNIST model. This will download the data directly from the\nweb, and then start training for 10 epochs.\n\n.. 
code-block:: bash\n\n\t$ kur train mnist.yml\n\tDownloading: 100%|█████████████████████████████████| 9.91M/9.91M [03:44\u003c00:00, 44.2Kbytes/s]\n\tDownloading: 100%|█████████████████████████████████| 28.9K/28.9K [00:00\u003c00:00, 66.1Kbytes/s]\n\tDownloading: 100%|█████████████████████████████████| 1.65M/1.65M [00:31\u003c00:00, 52.6Kbytes/s]\n\tDownloading: 100%|█████████████████████████████████| 4.54K/4.54K [00:00\u003c00:00, 19.8Kbytes/s]\n\n\tEpoch 1/10, loss=1.524: 100%|███████████████████████| 480/480 [00:02\u003c00:00, 254.97samples/s]\n\tValidating, loss=0.829: 100%|█████████████████████| 3200/3200 [00:03\u003c00:00, 889.91samples/s]\n\n\tEpoch 2/10, loss=0.628: 100%|███████████████████████| 480/480 [00:02\u003c00:00, 228.25samples/s]\n\tValidating, loss=0.533: 100%|████████████████████| 3200/3200 [00:03\u003c00:00, 1046.12samples/s]\n\n\tEpoch 3/10, loss=0.547: 100%|███████████████████████| 480/480 [00:02\u003c00:00, 185.77samples/s]\n\tValidating, loss=0.491: 100%|████████████████████| 3200/3200 [00:03\u003c00:00, 1030.57samples/s]\n\n\tEpoch 4/10, loss=0.488: 100%|███████████████████████| 480/480 [00:02\u003c00:00, 225.42samples/s]\n\tValidating, loss=0.443: 100%|████████████████████| 3200/3200 [00:03\u003c00:00, 1046.23samples/s]\n\n\tEpoch 5/10, loss=0.464: 100%|███████████████████████| 480/480 [00:03\u003c00:00, 115.17samples/s]\n\tValidating, loss=0.403: 100%|█████████████████████| 3200/3200 [00:04\u003c00:00, 799.46samples/s]\n\n\tEpoch 6/10, loss=0.486: 100%|███████████████████████| 480/480 [00:03\u003c00:00, 183.11samples/s]\n\tValidating, loss=0.400: 100%|████████████████████| 3200/3200 [00:02\u003c00:00, 1134.17samples/s]\n\n\tEpoch 7/10, loss=0.369: 100%|███████████████████████| 480/480 [00:02\u003c00:00, 214.10samples/s]\n\tValidating, loss=0.366: 100%|█████████████████████| 3200/3200 [00:04\u003c00:00, 735.61samples/s]\n\n\tEpoch 8/10, loss=0.353: 100%|███████████████████████| 480/480 [00:03\u003c00:00, 204.33samples/s]\n\tValidating, 
loss=0.351: 100%|████████████████████| 3200/3200 [00:02\u003c00:00, 1147.05samples/s]\n\n\tEpoch 9/10, loss=0.399: 100%|███████████████████████| 480/480 [00:02\u003c00:00, 219.17samples/s]\n\tValidating, loss=0.343: 100%|████████████████████| 3200/3200 [00:02\u003c00:00, 1149.07samples/s]\n\n\tEpoch 10/10, loss=0.307: 100%|██████████████████████| 480/480 [00:02\u003c00:00, 220.97samples/s]\n\tValidating, loss=0.324: 100%|████████████████████| 3200/3200 [00:02\u003c00:00, 1142.78samples/s]\n\nWhat just happened? Kur downloaded the MNIST dataset from LeCun's website, and\nthen trained a model for ten epochs. Awesome!\n\nNow let's see how well our model actually performs:\n\n.. code-block:: bash\n\n\t$ kur evaluate mnist.yml\n\tEvaluating: 100%|██████████████████████████████| 10000/10000 [00:06\u003c00:00, 1537.74samples/s]\n\tLABEL     CORRECT   TOTAL     ACCURACY  \n\t0         969       980        98.9%\n\t1         1118      1135       98.5%\n\t2         910       1032       88.2%\n\t3         926       1010       91.7%\n\t4         923       982        94.0%\n\t5         735       892        82.4%\n\t6         871       958        90.9%\n\t7         884       1028       86.0%\n\t8         818       974        84.0%\n\t9         868       1009       86.0%\n\tALL       9022      10000      90.2%\n\nWow! Across the board, we already have 90% accuracy for recognizing\nhandwritten digits, and we only used 0.8% of the training set! That's how\nawesome Kur is.\n\nExcited yet? Read on!\n\n**NOTE**: Clever readers will notice that each training epoch only used 480\ntraining samples. But MNIST provides 60,000 training samples total, so what\ngives?  Simple: lots of us are running this code on consumer hardware; in fact,\nI'm running this example on my tiny ultrabook on an Intel Core m7 CPU. 
As\nyou'll see in `Under the Hood`_, I truncate the training process to only train\non 10 batches of 32 samples each, just to make the training loop finish in a\nreasonable amount of time. It's not cheating: you still get 90% accuracy! But\nif you have awesome hardware, or just want to see how good your accuracy can\nget, then by all means read on and we'll show you how to modify that.\n\nUnder the Hood\n--------------\n\nSo what exactly is going on here? Let's take a look at the MNIST example\nspecification file:\n\n.. code-block:: yaml\n\n\ttrain:\n\t  data:\n\t    - mnist:\n\t        images:\n\t          url: \"http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\"\n\t        labels:\n\t          url: \"http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\"\n\n\tmodel:\n\t  - input: images\n\t  - convolution:\n\t      kernels: 64\n\t      size: [3, 3]\n\t  - activation: relu\n\t  - flatten:\n\t  - dense: 10\n\t  - activation: softmax\n\t    name: labels\n\n\tinclude: mnist-defaults.yml\n\nThis is just plain, old `YAML \u003chttp://yaml.org\u003e`_, a markup language meant to\nbe easy for humans to interpret (for a good overview of YAML language features,\nlook at the `Ansible overview\n\u003chttps://docs.ansible.com/ansible/YAMLSyntax.html\u003e`_).\n\nThere's a section to put the data. That's this:\n\n.. code-block:: yaml\n\n\ttrain:\n\t  data:\n\t    - mnist:\n\t        images:\n\t          url: \"http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\"\n\t        labels:\n\t          url: \"http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\"\n\nAnd then there's a spot to define your model:\n\n.. 
code-block:: yaml\n\n\tmodel:\n\t  - input: images\n\t  - convolution:\n\t      kernels: 64\n\t      size: [3, 3]\n\t  - activation: relu\n\t  - flatten:\n\t  - dense: 10\n\t  - activation: softmax\n\t    name: labels\n\nAnd there is an \"include\" part that just contains some default settings\n(advanced users might want to tweak these---don't worry, it's still simple):\n\n.. code-block:: yaml\n\n\tinclude: mnist-defaults.yml\n\nVery simple! Kur downloaded our data directly from LeCun's website for us;\nthat's easy. But what goes into a Kur model? Just a nice, gentle list of\nthings you want your deep learning model to do. Let's break it down:\n\n- We have an ``input`` called ``images`` (yep, it's the same ``images`` from our\n  ``train`` section).\n- We pass the input to a ``convolution`` layer.\n- We add a rectified linear unit (\"ReLU\") activation.\n- We collapse (``flatten``) the high-dimensional output of a convolution into a\n  nice, flat, 1-dimensional shape appropriate for sending into the\n  fully-connected layers.\n- We add a fully-connected (``dense``) layer with 10 outputs.\n- We add a softmax activation (appropriate for classification tasks like MNIST),\n  and mark it as producing labels (``name: labels``).\n\nAnd that's it! It's pretty naïve: one convolution + activation +\nfully-connected + activation.  But it works: we got 90% accuracy after only\nshowing it a small subset of the training set.\n\nBut let's think about making it more complicated. What if we want two\nconvolutional layers instead? Easy! Just add another ``convolution`` section to\nthe model.  We'll also add in another non-linearity (ReLU activation) between\nthe two convolutions.\n\n.. 
code-block:: yaml\n\n\tmodel:\n\t  - input: images\n\t  - convolution:\n\t      kernels: 64\n\t      size: [3, 3]\n\t  - activation: relu\n\t  - convolution:\n\t      kernels: 64\n\t      size: [3, 3]\n\t  - activation: relu\n\t  - flatten:\n\t  - dense: 10\n\t  - activation: softmax\n\t    name: labels\n\nWe can also add more dense (fully-connected) layers. You probably want them\nseparated by activation layers, too. So if we add a 32-node fully-connected\nlayer to our model, it now looks like this:\n\n.. code-block:: yaml\n\n\tmodel:\n\t  - input: images\n\t  - convolution:\n\t      kernels: 64\n\t      size: [3, 3]\n\t  - activation: relu\n\t  - convolution:\n\t      kernels: 64\n\t      size: [3, 3]\n\t  - activation: relu\n\t  - flatten:\n\t  - dense: 32\n\t  - activation: relu\n\t  - dense: 10\n\t  - activation: softmax\n\t    name: labels\n\nLet's give it a try! Save your changes, and just run the same ``kur train\nmnist.yml`` and ``kur evaluate mnist.yml`` commands from before.\n\n**NOTE**: A more complex model will likely need more data. So be sure to look\nat the tip in `More Advanced Things`_ to train on more of the data set.\n\nIf you want to know more, the YAML specification that Kur uses is described in\ngreater detail in our `Using Kur\n\u003chttps://kur.deepgram.com/getting_started.html\u003e`_ page.\n\n.. _more_advanced_things:\n\nMore Advanced Things\n--------------------\n\nThe one line in the ``mnist.yml`` specification that we didn't cover is the\n``include: mnist-defaults.yml`` line. This is just a convenient way for us to\nseparate out the default behavior of the MNIST example.\n\nIf you tweak this file, probably the big thing you want to remove is the\n``num_batches: 10`` line, which is what limits training to just the first 10\nbatches every epoch. Just delete the line or comment it out, and Kur will train\non the whole dataset.\n\nA Better MNIST\n--------------\n\n90% is pretty good! But can we do better? Absolutely! 
Let's see how.\n\nWe need to build a more expressive, deeper model. We will use more\nconvolutional layers, with occasional pooling layers.\n\n.. code-block:: yaml\n\n\tmodel:\n\t  - input: images\n\n\t  - convolution:\n\t      kernels: 64\n\t      size: [3, 3]\n\t  - activation: relu\n\n\t  - convolution:\n\t      kernels: 96\n\t      size: [3, 3]\n\t  - activation: relu\n\n\t  - pool: [3, 3]\n\n\t  - convolution:\n\t      kernels: 96\n\t      size: [3, 3]\n\t  - activation: relu\n\n\t  - flatten:\n\t  - dense: [64, 10]\n\n\t  - activation: softmax\n\t    name: labels\n\nSo we have three convolutions with a 3-by-3 pooling layer in the middle, and\ntwo fully-connected layers.  Try training this model: ``kur train mnist.yml``.\nThen evaluate it to see how it does: ``kur eval mnist.yml``. We got better than\n95% *by training on only 0.8% of the training set*.\n\nWhat happens if we give it more data? Like we `mentioned above`__, we can\nadjust the amount of data we give Kur by twiddling the ``num_batches`` entry in\nthe ``train`` section of ``mnist-defaults.yml``. Let's try using 5% of the\ndataset.  To do this, we'll set ``num_batches: 94`` (because 5% of 60,000 is\n3000, and for the default batch size of 32, this comes out to about 94\nbatches). Now try training and evaluating again. We got almost 98%!\n\n__ more_advanced_things_\n\nDon't stop now, let's train on the whole thing (just remove the ``num_batches``\nline altogether, or set ``num_batches: null``). Still training only 10 epochs,\nwe got 98.6%. Wow. Let's compare this to state of the art, which Yann LeCun\ntracks on the `MNIST website \u003chttp://yann.lecun.com/exdb/mnist/\u003e`_. It looks\nlike the best error rate also uses convolutions and achieved a 0.23% error rate\n(so 99.77% accuracy). With just a couple tweaks, we are already only a percent\naway from the world's best. Kur rocks.\n\n.. 
_cifar_10:\n\nCIFAR-10: Image Classification\n==============================\n\nOkay, MNIST was pretty cool, but Kur can do much, much more. Imagine if you\nwanted to have an arbitrary number of convolution layers. Imagine if each\nconvolution should have a different number of kernels. Imagine if you truly\nwant *flexibility*. You've come to the right place.\n\nFlexibility: Variables\n----------------------\n\nKur uses an *engine* to determine how to do variable substitution. `Jinja2\n\u003chttp://jinja.pocoo.org\u003e`_ is the default templating engine, and it is very\npowerful and extensible. Let's see how to use it!\n\nLet's look at the `CIFAR-10 \u003chttps://www.cs.toronto.edu/~kriz/cifar.html\u003e`_\ndataset. This is an image classification dataset of small 32 by 32 pixel color\n(RGB) images, each with one of ten classes (airplane, automobile, bird, cat,\ndeer, dog, frog, horse, ship, truck). You might decide to start with a very\nsimilar model to the MNIST example:\n\n.. code-block:: yaml\n\n\tmodel:\n\t  - input: images\n\t  - convolution:\n\t      kernels: 64\n\t      size: [3, 3]\n\t  - activation: relu\n\t  - flatten:\n\t  - dense: 10\n\t  - activation: softmax\n\t    name: labels\n\nWe will start with a simple modification: let's make the convolution ``size`` a\nvariable, so we can easily change it later. We can do it like this:\n\n.. code-block:: yaml\n\n\tsettings:\n\t  cnn:\n\t    size: [3, 3]\n\n\tmodel:\n\t  - input: images\n\t  - convolution:\n\t      kernels: 64\n\t      size: \"{{ cnn.size }}\"\n\t  - activation: relu\n\t  - flatten:\n\t  - dense: 10\n\t  - activation: softmax\n\t    name: labels\n\nOkay, what just happened? First, we added a ``settings:`` section. This section\nis the appropriate place to declare variables, settings, and hyperparameters\nthat will be used by the model (or for training, evaluation, etc.). We declared\na variable named ``cnn`` with a nested ``size`` variable. 
In Python, this would\nbe equivalent to a dictionary: ``{\"cnn\": {\"size\": [3, 3]}}``.\n\nThen we used the variable in the model's convolution layer: ``size: \"{{\ncnn.size }}\"``.  This is standard Jinja2 grammar. The double-brackets indicate\nthat variable substitution should take place (without the brackets, we would\naccidentally assign ``size`` to the literal string \"cnn.size\", which doesn't make\nsense). The variable we grab is ``cnn.size``, corresponding to the variables we\nadded in the ``settings`` section.\n\nCool! So we can use variables now. But how does that help us? It seems like we\njust made it more complicated. Well, let's imagine if we added another\nconvolution layer. We already know how to add extra convolutions by just adding\nanother ``convolution`` block (and usually you want another ``activation: relu``\nlayer, too). So this would look like:\n\n.. code-block:: yaml\n\n\tsettings:\n\t  cnn:\n\t    size: [3, 3]\n\n\tmodel:\n\t  - input: images\n\t  - convolution:\n\t      kernels: 64\n\t      size: \"{{ cnn.size }}\"\n\t  - activation: relu\n\t  - convolution:\n\t      kernels: 64\n\t      size: \"{{ cnn.size }}\"\n\t  - activation: relu\n\t  - flatten:\n\t  - dense: 10\n\t  - activation: softmax\n\t    name: labels\n\nAh! So now we can see why variablizing the convolution size was nice: if we\nwant to play with a model that uses different size kernels, we only need to\nedit one line instead of two.\n\nBut there are still two problems we might encounter:\n\n- What if we wanted to try out lots of models with different numbers of\n  convolutions?\n- What if we wanted to use *different* ``size`` or ``kernels`` values in each\n  convolution?\n\nKur can do it!\n\nFlexibility: Loops\n------------------\n\nLet's address the first problem: what if we want to vary the number of\nconvolutions? 
Kur supports many \"meta-layers\" that it calls \"operators.\" A\nvery simple operator is the classic `\"for\" loop\n\u003chttps://en.wikipedia.org/wiki/For_loop\u003e`_. This allows us to add many\nconvolution + activation layers at once. It looks like this:\n\n.. code-block:: yaml\n\n\tsettings:\n\t  cnn:\n\t    size: [3, 3]\n\n\tmodel:\n\t  - input: images\n\t  - for:\n\t      range: 2\n\t      iterate:\n\t        - convolution:\n\t            kernels: 64\n\t            size: \"{{ cnn.size }}\"\n\t        - activation: relu\n\t  - flatten:\n\t  - dense: 10\n\t  - activation: softmax\n\t    name: labels\n\nThis is equivalent to the version without the \"for\" loop. The ``for:`` loop\ntells us to do everything in the ``iterate:`` section twice. (Why twice?\nBecause ``range: 2``.) And of course, we can turn the number of\niterations into a variable like this:\n\n.. code-block:: yaml\n\n\tsettings:\n\t  cnn:\n\t    size: [3, 3]\n\t    layers: 2\n\n\tmodel:\n\t  - input: images\n\t  - for:\n\t      range: \"{{ cnn.layers }}\"\n\t      iterate:\n\t        - convolution:\n\t            kernels: 64\n\t            size: \"{{ cnn.size }}\"\n\t        - activation: relu\n\t  - flatten:\n\t  - dense: 10\n\t  - activation: softmax\n\t    name: labels\n\nThink about this for a minute. Does it make sense? It should. The model looks\nlike this:\n\n- An ``input`` layer of images.\n- A number of ``convolution`` and ``activation`` layers. How many?\n  ``cnn.layers``, so 2.\n- The rest of the model is as expected: a dense operation followed by an\n  activation.\n\nFlexibility: Variable-length Loops\n----------------------------------\n\nSo we solved the problem of allowing for a variable number of convolutions. But\nwhat if each convolution should use a different number of kernels (or sizes,\netc.)?  Well, Kur can happily handle this, too. In fact, the ``for:`` loop\nalready does most of the work. 
Every ``for:`` loop creates its own \"local\"\nvariable to let you know which iteration it is on. The default name for this\nvariable is ``index``. So if we want to use a different number of kernels for\neach convolution, we can do this:\n\n.. code-block:: yaml\n\n\tsettings:\n\t  cnn:\n\t    size: [3, 3]\n\t    kernels: [64, 32]\n\t    layers: 2\n\n\tmodel:\n\t  - input: images\n\t  - for:\n\t      range: \"{{ cnn.layers }}\"\n\t      iterate:\n\t        - convolution:\n\t            kernels: \"{{ cnn.kernels[index] }}\"\n\t            size: \"{{ cnn.size }}\"\n\t        - activation: relu\n\t  - flatten:\n\t  - dense: 10\n\t  - activation: softmax\n\t    name: labels\n\nAgain, this is just Jinja2 substitution: we are asking for the ``index``-th\nelement of the ``cnn.kernels`` list. Each iteration of the ``for:`` loop\ntherefore grabs a different value for ``kernels:``. Cool, huh?\n\nBut we can do one better.\n\nFlexibility: Filters\n--------------------\n\nThe annoying thing about our current model is that nothing forces the ``layers``\nvalue to be the same as the length of the ``kernels`` variable. If you make\n``kernels`` really long (like, length seventeen) but leave ``layers`` at two, you probably\nmade a mistake. (Why did you put in seventeen layers but then only use the first\ntwo in the loop?) What you really want is to make sure that ``layers`` is set to\nthe length of the ``kernels`` list. Or put another way, you want to add as many\nconvolutions as you have kernels in the list.\n\nJinja2 supports a concept called \"filters,\" which are basically functions that\nyou can apply to objects. You can even define your own filters. But what we\nwant right now is a way to get the length of a list. It's easy and it looks\nlike this:\n\n.. 
code-block:: yaml\n\n\tsettings:\n\t  cnn:\n\t    size: [3, 3]\n\t    kernels: [64, 32]\n\n\tmodel:\n\t  - input: images\n\t  - for:\n\t      range: \"{{ cnn.kernels|length }}\"\n\t      iterate:\n\t        - convolution:\n\t            kernels: \"{{ cnn.kernels[index] }}\"\n\t            size: \"{{ cnn.size }}\"\n\t        - activation: relu\n\t  - flatten:\n\t  - dense: 10\n\t  - activation: softmax\n\t    name: labels\n\nYou'll notice that the ``layers`` variable is gone, and we have this funky\n``|length`` thing in the \"for\" loop's ``range``. This is standard Jinja2: the\n``length`` filter returns the length of a list. So now we are asking the \"for\"\nloop to iterate as many times as there are entries in the ``kernels`` list.\n\nThis is really cool if you think about it. You want to add another convolution\nto the network? *All you do is add its size to the* ``kernels`` *list*. And\nlook!  Your model is now more general, more reusable. You could have used\nthe same model for MNIST! Or CIFAR! Or many different applications.\n\nThis is the heart of the **Kur philosophy: you should describe your model once\nand simply.** The specification *describes* your model: a bunch of\nconvolutions and then a fully-connected layer. You can specify the details (how\nmany convolutions, their parameters, etc.) elsewhere. The model should stay\nelegant.\n\n**NOTE**: Of course, it isn't always easy to write reusable models. And the\nlearning curve can get in the way. When we say that models should be \"simple,\"\nwe don't mean that you don't need to think about it. We mean that it should be\nsimple to use, simple to modify, and simple to share. A more general model is\nelegant: making changes to it is easy (you only modify the settings). And this\nmakes it easier to reuse in new contexts or to share with the community.\nSimplicity is power.\n\nActually Training a CIFAR-10 Model\n----------------------------------\n\nGreat, we now have a simple, but powerful and general model. 
Let's train it. As\nbefore, you'll need to ``cd examples`` first.\n\n.. code-block:: bash\n\n\tkur train cifar.yml\n\nAgain, evaluation is just as simple:\n\n.. code-block:: bash\n\n\tkur evaluate cifar.yml\n\nAdvanced Features\n-----------------\n\nThe ``cifar.yml`` specification file is more complicated than the MNIST one,\nmostly to expose you to some more knobs you can tweak. For example, you'll see\nthese lines in the ``train`` section:\n\n.. code-block:: yaml\n\n\tprovider:\n\t  batch_size: 32\n\t  num_batches: 2\n\nAs in the MNIST case, ``num_batches`` tells Kur to only train on that many\nbatches of data each epoch (mostly so that if you don't have a nice GPU, the\nexample still finishes in a reasonable amount of time). The ``batch_size`` value\nindicates the number of training samples that should be used in each batch.\n\n.. _using_binary_logger:\n\nThe ``train`` section also has a ``log: cifar-log`` line. This tells Kur to\nsave a log file to ``cifar-log`` (in the current working directory). This log\ncontains lots of interesting information about current training loss, batch\nloss, and the number of epochs. By default, they are binary-encoded files, but\nyou can load them using the Kur API (in Python 3):\n\n.. code-block:: python\n\n\tfrom kur.loggers import BinaryLogger\n\tdata = BinaryLogger.load_column(LOG_PATH, STATISTIC)\n\nwhere ``LOG_PATH`` is the path to the log file (e.g., ``cifar-log``) and\n``STATISTIC`` is one of the logged statistics. ``data`` will be a `Numpy\n\u003chttp://www.numpy.org/\u003e`_ array. To find available statistics, just list the\navailable files in the ``LOG_PATH``, like this:\n\n.. code-block:: bash\n\n\t$ ls cifar-log\n\ttraining_loss_labels\n\ttraining_loss_total\n\tvalidation_loss_labels\n\tvalidation_loss_total\n\nFor an example of using this log data, see our Tutorial_.\n\nAnother difference from the MNIST examples is that there are more files\nreferring to weights in the CIFAR specification. 
For example, in the\n``validate`` section there is:\n\n.. code-block:: yaml\n\n\tweights: cifar.best.valid.w\n\nThis tells Kur to save the best model weights (corresponding to the lowest\nloss on the *validation* set) to ``cifar.best.valid.w``. Similarly, in the\n``train`` section there is this:\n\n.. code-block:: yaml\n\n\tweights:\n\t  initial: cifar.best.valid.w\n\t  save_best: cifar.best.train.w\n\t  last: cifar.last.w\n\nThe ``initial`` key tells Kur to try to load ``cifar.best.valid.w`` (the best\nweights with respect to the *validation* loss) at the beginning of training. If\nthis file doesn't exist, nothing happens. This means that if you run the\ntraining cycle many times (with many calls to ``kur train cifar.yml``), you\nalways \"restart\" from the best model weights.\n\nWe are also saving the best weights (with respect to the *training* loss) to\n``cifar.best.train.w``.  The most recent weights are saved to ``cifar.last.w``.\n\n**NOTE**: The weights depend on the model architecture. Say you train CIFAR\nand produce ``cifar.best.valid.w``. Then you tweak the model in the\nspecification file. If you try to resume training (``kur train cifar.yml``),\nKur will try to load ``cifar.best.valid.w``. But the weights may not fit the\nnew architecture! So, to be safe, you should always delete (or backup) your\nweight files before trying to train a fresh, tweaked model. In a production\nenvironment, you probably want to have different sub-directories for each\nvariation/tweak to the model so that you never run into this problem.\n\nThe CIFAR-10 example also explicitly specifies an optimizer in the ``train``\nsection:\n\n.. code-block:: yaml\n\n\toptimizer:\n\t  name: adam\n\t  learning_rate: 0.001\n\nThe optimizer function is set in the ``name`` field and all other parameters\n(such as ``learning_rate``) are defined in the other fields. 
You can safely\nchange the optimizer without breaking backwards-compatibility with older weight\nfiles.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepgram%2Fkur","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeepgram%2Fkur","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepgram%2Fkur/lists"}