https://github.com/kfish/micrograd-cpp-2023

A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library
Host: GitHub
URL: https://github.com/kfish/micrograd-cpp-2023
Owner: kfish
License: mit
Created: 2023-10-06T22:21:25.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2023-11-24T21:22:08.000Z (over 1 year ago)
Last Synced: 2025-04-01T06:16:00.035Z (3 months ago)
Language: C++
Homepage:
Size: 260 KB
Stars: 13
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

[![CMake on multiple platforms](https://github.com/kfish/micrograd-cpp-2023/actions/workflows/cmake-multi-platform.yml/badge.svg)](https://github.com/kfish/micrograd-cpp-2023/actions/workflows/cmake-multi-platform.yml)

# micrograd-cpp-2023

A C++ implementation of

[karpathy/micrograd](https://github.com/karpathy/micrograd).

Each step of the first episode of *Neural Nets: Zero to Hero*:

[The spelled-out intro to neural networks and backpropagation: building micrograd](https://youtu.be/VMj-3S1tku0)

is included.

![](https://i.ytimg.com/vi/VMj-3S1tku0/hqdefault.jpg)

This roughly follows the flow of Karpathy's YouTube tutorial, with details specific to this C++ implementation:

 * [What is micrograd-cpp and why is it interesting?](#what-is-micrograd-cpp-and-why-is-it-interesting)

   - [Example usage](#example-usage)

   - [C++ implementation notes](#c-implementation-notes)

 * [Building out the Value object](#building-out-the-value-object)

 * [Visualizing the expression graph](#visualizing-the-expression-graph)

 * [Backpropagation](#backpropagation)

 * [Backpropagation through a neuron](#backpropagation-through-a-neuron)

   - [Activation function](#activation-function)

   - [Math operations](#math-operations)

   - [Multiply-Accumulate](#multiply-accumulate)

   - [randomValue, randomArray](#randomvalue-randomarray)

 * [Multi-Layer Perceptron](#multi-layer-perceptron)

   - [Layer](#layer)

   - [BuildLayers](#buildlayers)

   - [MLP](#mlp)

   - [MLP1](#mlp1)

 * [Loss function](#loss-function)

   - [MSELoss](#mseloss)

 * [Gradient descent](#gradient-descent)

   - [Adjusting parameters](#adjusting-parameters)

   - [CanBackProp](#canbackprop)

   - [BackProp](#backprop)

   - [Binary Classifier](#binary-classifier)

with [References](#references) at the end for further reading about automatic differentiation and C++ implementations.

See [kfish/makemore-cpp-2023](https://github.com/kfish/makemore-cpp-2023) for a continuation to other

videos in the series, expanding the codebase to handle automatic differentiation of vectors and matrices.

## What is micrograd-cpp and why is it interesting?

micrograd-cpp introduces some nuances of backpropagation (reverse-mode autodiff) and its

implementation. At its core is an expression graph which can be evaluated forwards, where

an expression like `a+b` is implemented using `operator+`, and differentiated in reverse

using a `std::function` attached to each graph node to calculate its gradient.

This implementation allows computations using `Value` objects to be written as

normal-looking C++ code. It also includes generic classes for evaluation and learning for anything that can produce `Value`.

Like micrograd, the point is still to be educational, with a focus on some implementation details

and flexibility for exploring learning algorithms.

### Building

For simplicity, there are no code dependencies. Data is manipulated using `std::array<>`, and graphing and plotting are done

with external tools like `dot` and `gnuplot`.

Build with CMake, e.g.:

```bash

$ mkdir build

$ cd build

$ cmake ..

$ make

```

### Example usage

This is [examples/example-usage.cpp](examples/example-usage.cpp):

```c++

#include <cstdio>

#include "value.h"

using namespace ai;

int main(int argc, char *argv[])

{

    auto a = make_value(-4.0);

    auto b = make_value(2.0);

    auto c = a + b;

    auto d = a * b + pow(b, 3);

    c += c + 1;

    c += 1 + c + (-a);

    d += d * 2 + relu(b + a);

    d += 3 * d + relu(b - a);

    auto e = c - d;

    auto f = pow(e, 2);

    auto g = f / 2.0;

    g += 10.0 / f;

    printf("%.4f\n", g->data()); // prints 24.7041, the outcome of this forward pass

    backward(g);

    printf("%.4f\n", a->grad()); // prints 138.8338, i.e. the numerical value of dg/da

    printf("%.4f\n", b->grad()); // prints 645.5773, i.e. the numerical value of dg/db

}

```

### C++ implementation notes

1. Data type

Neural nets generally don't require many bits of precision on individual node values,

so let's not limit ourselves to `float` or `double`

(or [FP8](https://github.com/opencomputeproject/FP8/blob/main/ofp8_references.pdf)).

We template using `Value<T>`.

2. Sharing

Nodes may appear as inputs to multiple other nodes in the expression graph,

especially for neural networks, so we use a `shared_ptr`:

```c++

template <typename T>

using Value = std::shared_ptr<RawValue<T>>;

```

3. Removal of cycles

The expression `c += c + 1` refers to itself, so it contains a cycle. This cycle needs to be

removed in order to implement backpropagation.

![c += c + 1](examples/c-plus-equals-cycle.svg)

In Python, `x += y` usually translates to `x.__iadd__(y)` which modifies `x` in-place.

However, the `Value` objects in `micrograd` don't implement `__iadd__`, so Python falls back to using `__add__`

followed by assignment. That means `a += b` is roughly equivalent to `a = a + b`. Each time the + operator

is invoked, a new Value object is created and the graph gets extended, so it is not modifying the existing

objects in-place.

In C++, `operator+=` requires an explicit implementation which modifies its value in-place.

We create a copy of the old value and re-write all earlier references in the expression graph

to point to the copy.

![c += c + 1](examples/c-plus-equals-rewrite.svg)

Note that this aspect of the implementation is peculiar to the operational semantics of C++

and in-place assignment operators. It is straightforward to implement a neural network

without calling these operators, so the overhead of node copying and graph rewriting could

easily be removed. We include it here only for the translation of micrograd to C++.
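
To make the contrast concrete, here is a minimal sketch of the Python-style semantics, where `c = c + (c + 1)` rebinds the handle to a freshly created node and so never creates a cycle. The names `Node`, `NodePtr`, `leaf`, `add` and `reaches` are hypothetical and are not part of micrograd-cpp, which instead copies the old value and rewrites the graph as described above.

```c++
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical minimal graph node; not the RawValue class from this repository.
struct Node {
    double data;
    std::vector<std::shared_ptr<Node>> inputs;  // nodes this one was computed from
};
using NodePtr = std::shared_ptr<Node>;

// Create a leaf node holding a constant.
NodePtr leaf(double x) {
    return std::make_shared<Node>(Node{x, {}});
}

// Create a new node for a + b; neither input is modified.
NodePtr add(const NodePtr& a, const NodePtr& b) {
    assert(a && b);
    return std::make_shared<Node>(Node{a->data + b->data, {a, b}});
}

// True if `target` is reachable from `from` by following inputs.
bool reaches(const NodePtr& from, const Node* target) {
    for (const auto& in : from->inputs) {
        if (in.get() == target || reaches(in, target)) return true;
    }
    return false;
}
```

Writing `c = add(c, add(c, leaf(1.0)))` keeps the old node alive as an input of the new one, so the new `c` can reach the old node but never itself.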

## Building out the Value object

> Neural nets are some pretty scary expressions. We need some data structures to maintain 

> these expressions.

In order to handle basic expressions like:

```c++

    auto a = make_value(2.0, "a");

    auto b = make_value(-3.0, "b");

    auto c = make_value(10.0, "c");

    auto d = (a*b) + c;

    std::cout << d << std::endl;

```

we start sketching out the underlying `RawValue` class, implementing operators for `+`

and `*`, and storing the inputs (children) of each for the evaluation graph.

```c++

template <typename T>
class RawValue {
    public:
        using ptr = std::shared_ptr<RawValue<T>>;

    private:
        // Leaf node: a value with an optional label
        RawValue(const T& data, const std::string& label="")
            : data_(data), label_(label)
        {}

        // Interior node: the result of applying op to its children
        RawValue(const T& data, std::set<ptr>& children, const std::string& op="")
            : data_(data), prev_(children), op_(op)
        {}

    public:
        template <typename... Args>
        static ptr make(Args&&... args) {
            return ptr(new RawValue<T>(std::forward<Args>(args)...));
        }

        friend ptr operator+(const ptr& a, const ptr& b) {
            std::set<ptr> children = {a, b};
            return make(a->data() + b->data(), children, "+");
        }

        friend ptr operator*(const ptr& a, const ptr& b) {
            std::set<ptr> children = {a, b};
            return make(a->data() * b->data(), children, "*");
        }

        // Accessors used by the operators above and by operator<<
        const T& data() const { return data_; }
        const std::string& op() const { return op_; }

    private:
        T data_;
        std::set<ptr> prev_{};
        std::string op_{""};
        std::string label_{""};
};

template <typename T>
static inline std::ostream& operator<<(std::ostream& os, const RawValue<T>& value) {
    return os << "Value("
        << "data=" << value.data() << ", "
        << "op=" << value.op()
        << ")";
}

```

In code we use `Value<T>`, which is an alias for `shared_ptr<RawValue<T>>`:

```c++

template <typename T>
using Value = typename RawValue<T>::ptr;

template <typename T, typename... Args>
static Value<T> make_value(const T& data, Args&&... args) {
    return RawValue<T>::make(data, std::forward<Args>(args)...);
}

template <typename T>
static inline std::ostream& operator<<(std::ostream& os, const std::shared_ptr<RawValue<T>>& value) {
    return os << value.get() << "=&" << *value;
}

```

## Visualizing the expression graph

We provide a `Graph` class that can wrap any `Value`. It has a custom `operator<<` that writes in `dot`

language. The implementation is in [include/graph.h](include/graph.h). We also introduce a `label` to the `Value`

object for labelling graph nodes, and an `expr` factory function for creating labelled expressions.

We can pipe the output of a program to `dot -Tsvg` to produce an svg image, or to `xdot` to view it interactively:

```bash

$ build/examples/graph | dot -Tsvg -o graph.svg

$ build/examples/graph | xdot -

```

![Example graph](examples/graph.svg)
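
For readers unfamiliar with `dot`, the following is a minimal sketch of what emitting a graph in that language can look like. `Edge` and `write_dot` are hypothetical names used only for illustration; the repository's `Graph` class in include/graph.h walks the real `Value` graph and is more complete.

```c++
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical edge list: each edge points from an input node to the node that uses it.
struct Edge {
    std::string from;
    std::string to;
};

// Write a left-to-right directed graph in the dot language.
void write_dot(std::ostream& os, const std::vector<Edge>& edges) {
    os << "digraph G {\n";
    os << "  rankdir=\"LR\";\n";
    for (const auto& e : edges) {
        assert(!e.from.empty() && !e.to.empty());
        os << "  \"" << e.from << "\" -> \"" << e.to << "\";\n";
    }
    os << "}\n";
}
```

The function accepts any `std::ostream`, so the same code can print to `std::cout` for piping into `dot -Tsvg` or write into a `std::ostringstream` for inspection.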

## Backpropagation

We add a member variable `grad_` that maintains the gradient with respect to the final output.

How each operation affects the output is written as a lambda function, `backward_`.

It copies the `Value` `shared_ptr`s of each node's children in order to increment their reference counts.

```c++

        friend ptr operator+(const ptr& a, const ptr& b) {

            std::set<ptr> children = {a, b};

            auto out = make(a->data() + b->data(), children, "+");

            out->backward_ = [=]() {

                a->grad_ += out->grad_;

                b->grad_ += out->grad_;

            };

            return out;

        }

        friend ptr operator*(const ptr& a, const ptr& b) {

            std::set<ptr> children = {a, b};

            auto out = make(a->data() * b->data(), children, "*");

            out->backward_ = [=]() {

                a->grad_ += b->data() * out->grad();

                b->grad_ += a->data() * out->grad();

            };

            return out;

        }

```

We recursively apply the local derivatives using the chain rule backwards through the expression graph:

```c++

        friend void backward(const ptr& node) {

            std::vector<RawValue<T>*> topo;

            std::set<RawValue<T>*> visited;

            std::function<void(const ptr&)> build_topo = [&](const ptr& v) {

                if (!visited.contains(v.get())) {

                    visited.insert(v.get());

                    for (auto && c : v->children()) {

                        build_topo(c);

                    }

                    topo.push_back(v.get());

                }

            };

            build_topo(node);

            for (auto & v : topo) {

                v->grad_ = 0.0;

            }

            node->grad_ = 1.0;

            for (auto it = topo.rbegin(); it != topo.rend(); ++it) {

                const RawValue<T>* v = *it;

                auto f = v->backward_;

                if (f) f();

            }

        }

```
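
A common way to sanity-check a backward pass is to compare its gradients against a central finite difference. The sketch below does this for the expression `d = a*b + c` from earlier, using plain `double`s and hand-written local derivatives rather than the `Value` class. The helper names `forward`, `analytic_grad` and `numeric_grad` are hypothetical.

```c++
#include <array>
#include <cassert>
#include <cstddef>

// d = a*b + c, with the inputs packed as {a, b, c}.
double forward(const std::array<double, 3>& in) {
    return in[0] * in[1] + in[2];
}

// Analytic gradient via the chain rule: dd/da = b, dd/db = a, dd/dc = 1.
std::array<double, 3> analytic_grad(const std::array<double, 3>& in) {
    return {in[1], in[0], 1.0};
}

// Central finite difference of forward() with respect to input i.
double numeric_grad(std::array<double, 3> in, std::size_t i, double h = 1e-6) {
    assert(i < in.size() && h > 0.0);
    const double x = in[i];
    in[i] = x + h;
    const double up = forward(in);
    in[i] = x - h;
    const double down = forward(in);
    return (up - down) / (2.0 * h);
}
```

With `a = 2`, `b = -3` and `c = 10`, the analytic gradients are `(-3, 2, 1)`, and the finite differences should agree with them to within rounding error.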

## Backpropagation through a neuron

We begin the implementation of a neuron, in [include/nn.h](include/nn.h):

```c++

template <typename T, size_t Nin>
class Neuron {
    public:
        Neuron()
            : weights_(randomArray<T, Nin>()), bias_(randomValue<T>())
        {
        }

        Value<T> operator()(const std::array<Value<T>, Nin>& x) const {
            Value<T> y = mac(weights_, x, bias_);
            return expr(tanh(y), "n");
        }

        ...
};

```

The resulting expression graph for a neuron with four inputs (code in [examples/neuron.cpp](examples/neuron.cpp)):

![Neuron graph](examples/neuron.svg)

### Activation function

In general an activation function modifies the output of a neuron, perhaps so that all neurons have similar ranges of output value or to smooth or filter large and negative values.

Whichever activation function we use, we need to implement a `backward_` function.

This implementation includes `relu` (which just replaces any negative values with zero) and `tanh`, which squashes the output into the range ±1.0. `tanh` is used in the video and has an obvious and continuous effect on the gradient:

```c++

        friend ptr tanh(const ptr& a) {

            std::set<ptr> children = {a};

            double x = a->data();

            double e2x = exp(2.0*x);

            double t = (e2x-1)/(e2x+1);

            auto out = make(t, children, "tanh");

            out->backward_ = [=]() {

                a->grad_ += (1.0 - t*t) * out->grad_;

            };

            return out;

        }

```
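
The factor `1.0 - t*t` in `backward_` follows from the derivative of `tanh`, written here in the same exponential form that the code uses:

```latex
\tanh x = \frac{e^{2x} - 1}{e^{2x} + 1}, \qquad
\frac{d}{dx}\tanh x = \frac{4e^{2x}}{(e^{2x} + 1)^2} = 1 - \tanh^2 x
```

With $t = \tanh x$, the chain rule multiplies this local derivative by the gradient already accumulated on the output, which gives the increment `(1.0 - t*t) * out->grad_`.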

### Math operations

We must implement all required math operations on `Value`, including pow, exp, and division,

so that we can accumulate gradients and run backpropagation.

For convenience we also provide operator specializations where one operand is an arithmetic value, so that instead of

writing `a * make_value(7.0)` you can write `a * 7.0` or `7.0 * a`:

```c++

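        template<typename N, std::enable_if_t<std::is_arithmetic<N>::value, int> = 0>

        friend ptr operator*(const ptr& a, N n) { return a * make(n); }

        template<typename N, std::enable_if_t<std::is_arithmetic<N>::value, int> = 0>

        friend ptr operator*(N n, const ptr& a) { return make(n) * a; }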

```
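
For reference, these are the standard local derivatives that a `backward_` function for `pow`, `exp` and division has to apply. They are textbook calculus with $n$ a constant exponent, not a transcription of the repository's source:

```latex
\begin{aligned}
\frac{\partial}{\partial a}\left(a^{n}\right) &= n\,a^{n-1}, &
\frac{d}{da}\,e^{a} &= e^{a}, \\
\frac{\partial}{\partial a}\left(\frac{a}{b}\right) &= \frac{1}{b}, &
\frac{\partial}{\partial b}\left(\frac{a}{b}\right) &= -\frac{a}{b^{2}}
\end{aligned}
```

As with `+` and `*`, each local derivative is multiplied by the output's gradient and added to the input's `grad_`.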

### Multiply-Accumulate

A neuron takes a number of input values, applies a weight to each, and sums the result. We can abstract this out as a common multiply-accumulate function.

It is usual to use a hardware-optimized (e.g. GPU) implementation.

In order to use our explicit `Value` object, we provide a generic implementation in [include/mac.h](include/mac.h).

This uses `std::execution` to allow the compiler to choose an optimized execution method, allowing parallel and vectorized execution:

```c++

template <typename T, size_t N>
T mac(const std::array<T, N>& a, const std::array<T, N>& b, T init = T{}) {
    return std::transform_reduce(
        std::execution::par_unseq, // Use parallel and vectorized execution
        a.begin(), a.end(), // Range of first vector
        b.begin(), // Start of second vector
        init, // static_cast<T>(0), // Initial value
        std::plus<>(), // Accumulate
        std::multiplies<>() // Multiply
    );
}

```
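
As a small usage sketch, the same `std::transform_reduce` pattern can be tried on plain `double`s. This version, under the hypothetical name `mac_seq`, omits the execution policy so that it builds without a parallel backend; it assumes C++17.

```c++
#include <array>
#include <cstddef>
#include <functional>
#include <numeric>

// Sequential multiply-accumulate: init + sum over i of a[i] * b[i].
template <typename T, std::size_t N>
T mac_seq(const std::array<T, N>& a, const std::array<T, N>& b, T init = T{}) {
    static_assert(N > 0, "expects at least one input");
    return std::transform_reduce(
        a.begin(), a.end(),   // first range
        b.begin(),            // start of second range
        init,                 // initial value, e.g. a neuron's bias
        std::plus<>(),        // accumulate
        std::multiplies<>()   // multiply element pairs
    );
}
```

For weights `{0.5, -1.0, 2.0}`, inputs `{2.0, 3.0, 1.0}` and an initial value of `0.25`, the products are `1`, `-3` and `2`, so the result is `0.25`.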

### randomValue, randomArray

We provide helper functions to create random values statically, in deterministic order. This helps with reproducibility for debugging.

The implementation is in [include/random.h](include/random.h).

```c++

// Static inline function to generate a random Value<T>
template <typename T>
static inline Value<T> randomValue() {
    static unsigned int seed = 42;
    static thread_local std::mt19937 gen(seed++);
    std::uniform_real_distribution<T> dist(-1.0, 1.0);
    seed = gen(); // update seed for next time
    return make_value(dist(gen));
}

// Static inline function to generate a std::array of random values
template <typename T, size_t N>
static inline std::array<Value<T>, N> randomArray() {
    std::array<Value<T>, N> arr;
    for (auto& element : arr) {
        element = randomValue<T>();
    }
    return arr;
}

```
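
The reproducibility comes from seeding the generator with a fixed value: two `std::mt19937` engines constructed from the same seed produce identical sequences. A minimal sketch using only the standard library, with the hypothetical helper name `sample`:

```c++
#include <array>
#include <cassert>
#include <cstddef>
#include <random>

// Draw N values in the range -1.0 to 1.0 from a generator seeded with `seed`.
template <std::size_t N>
std::array<double, N> sample(unsigned int seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    std::array<double, N> out{};
    for (auto& v : out) {
        v = dist(gen);
        assert(v >= -1.0 && v <= 1.0);
    }
    return out;
}
```

Calling `sample<4>(42)` twice returns the same four numbers, which is what makes a failing run repeatable under a debugger.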

## Multi-Layer Perceptron

We arrange neurons in a series of layers. Each layer is just an array of neurons.

A layer `Layer<T, Nin, Nout>` consists of `Nout` neurons, and is callable:

  * The same input (array of `Nin` values) is passed to each of the neurons

  * Each neuron produces a single output value

  * These output values are collected into an output array of `Nout` values.

### Layer

```c++

template <typename T, size_t Nin, size_t Nout>
class Layer {
    public:
        std::array<Value<T>, Nout> operator()(const std::array<Value<T>, Nin>& x) {
            std::array<Value<T>, Nout> output{};
            std::transform(std::execution::par_unseq, neurons_.begin(), neurons_.end(),
                    output.begin(), [&](const auto& n) { return n(x); });
            return output;
        }

    private:
        std::array<Neuron<T, Nin>, Nout> neurons_{};
};

```

### BuildLayers

We introduce a helper type that allows us to specify a sequence of layers of different sizes.

```c++

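template <typename T, size_t Nin, size_t... Nouts>
struct BuildLayers;

template <typename T, size_t Nin, size_t First, size_t... Rest>
struct BuildLayers<T, Nin, First, Rest...> {
    using type = decltype(std::tuple_cat(
        std::tuple<Layer<T, Nin, First>>{},
        typename BuildLayers<T, First, Rest...>::type{}
    ));
    static constexpr size_t nout = BuildLayers<T, First, Rest...>::nout;
};

template <typename T, size_t Nin, size_t Last>
struct BuildLayers<T, Nin, Last> {
    using type = std::tuple<Layer<T, Nin, Last>>;
    static constexpr size_t nout = Last;
};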

```

We make an alias for the type of such a sequence, like `Layers<double, 3, 4, 4, 1>`:

```c++

template <typename T, size_t Nin, size_t... Nouts>

using Layers = typename BuildLayers<T, Nin, Nouts...>::type;

```

and a helper to extract the final number of outputs, e.g. `LayersNout<double, 3, 4, 4, 1>` is 1:

```c++

template <typename T, size_t Nin, size_t... Nouts>

static constexpr size_t LayersNout = BuildLayers<T, Nin, Nouts...>::nout;

```
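
To see what the recursion expands to, here is a simplified, self-contained sketch. It drops the element type and uses an empty stand-in `Layer<Nin, Nout>`; both simplifications are assumptions made for illustration, so that the resulting tuple type can be checked at compile time.

```c++
#include <cstddef>
#include <tuple>
#include <type_traits>

// Empty stand-in for a layer mapping Nin inputs to Nout outputs.
template <std::size_t Nin, std::size_t Nout>
struct Layer {};

template <std::size_t Nin, std::size_t... Nouts>
struct BuildLayers;

// Recursive case: peel off one layer, then recurse with First as the next input size.
template <std::size_t Nin, std::size_t First, std::size_t... Rest>
struct BuildLayers<Nin, First, Rest...> {
    using type = decltype(std::tuple_cat(
        std::tuple<Layer<Nin, First>>{},
        typename BuildLayers<First, Rest...>::type{}
    ));
    static constexpr std::size_t nout = BuildLayers<First, Rest...>::nout;
};

// Base case: a single layer.
template <std::size_t Nin, std::size_t Last>
struct BuildLayers<Nin, Last> {
    using type = std::tuple<Layer<Nin, Last>>;
    static constexpr std::size_t nout = Last;
};

// The sizes 3, 4, 4, 1 expand to three layers: 3->4, 4->4 and 4->1.
static_assert(std::is_same_v<
    typename BuildLayers<3, 4, 4, 1>::type,
    std::tuple<Layer<3, 4>, Layer<4, 4>, Layer<4, 1>>>,
    "unexpected layer tuple");
static_assert(BuildLayers<3, 4, 4, 1>::nout == 1, "last layer has one output");
```

Each output size becomes the input size of the following layer, which is why a list of `n + 1` sizes yields a tuple of `n` layers.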

### MLP

Finally we use `Layers<>` in a class `MLP<>`, which:

  * Forwards its input to the first layer

  * Passes the output of each layer to the next layer, in turn

  * Returns the output of the last layer

```c++

template <typename T, size_t Nin, size_t... Nouts>
class MLP {
public:
    static constexpr size_t Nout = LayersNout<T, Nin, Nouts...>;

    std::array<Value<T>, Nout> operator()(const std::array<Value<T>, Nin>& input) {
        return forward<0, Nin, Nouts...>(input);
    }

    std::array<Value<T>, Nout> operator()(const std::array<T, Nin>& input) {
        return this->operator()(value_array(input));
    }

private:
    // I is the index of the current layer in layers_
    template <size_t I, size_t NinCurr, size_t NoutCurr, size_t... NoutsRest>
    auto forward(const std::array<Value<T>, NinCurr>& input) -> decltype(auto) {
        auto output = std::get<I>(layers_)(input);
        if constexpr (sizeof...(NoutsRest) > 0) {
            return forward<I + 1, NoutCurr, NoutsRest...>(output);
        } else {
            return output;
        }
    }

private:
    Layers<T, Nin, Nouts...> layers_;
};

};

```

### MLP1

If we want a single-valued output from our neural network, we create a wrapper class `MLP1<>` that returns only the first element

of the output of the wrapped `MLP<>`:

```c++

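template <typename T, size_t Nin, size_t... Nouts>
class MLP1 : public MLP<T, Nin, Nouts...>
{
    public:
        MLP1()
            : MLP<T, Nin, Nouts...>()
        {}

        Value<T> operator()(const std::array<Value<T>, Nin>& input) {
            return MLP<T, Nin, Nouts...>::operator()(input)[0];
        }

        Value<T> operator()(const std::array<T, Nin>& input) {
            return MLP<T, Nin, Nouts...>::operator()(input)[0];
        }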

};

```

With the following code from [examples/mlp1.cpp](examples/mlp1.cpp) we can make a 3 layer neural net with 1 output,

run backpropagation over it and show the resulting expression graph:

```c++

    MLP1<double, 3, 4, 4, 1> n;

    std::array<double, 3> input = {{ 2.0, 3.0, -1.0 }};

    auto output = n(input);

    backward(output);

    std::cout << Graph(output) << std::endl;

```

![MLP1 graph](examples/mlp1.svg)

## Loss function

Now that we can make a neural net, run it forwards to produce a value and backwards to calculate gradients, we can begin adjusting it to learn.

We introduce generic evaluation and learning classes for anything that can produce `Value`.

### MSELoss

We evaluate a prediction against a known "ground truth". The difference between these is the *Error*, and we

take the square of the error to approximate distance.

We average these out when considering an array of predictions and their ground truths. This is the Mean Squared Error.

Implementation in [include/loss.h](include/loss.h).

```c++

template <typename T>

Value<T> mse_loss(const Value<T>& predicted, const Value<T>& ground_truth) {

    static_assert(std::is_arithmetic<T>::value, "Type must be arithmetic");

    return pow(predicted - ground_truth, 2);

}

```

```c++

template<typename T, size_t N>
Value<T> mse_loss(const std::array<Value<T>, N>& predictions, const std::array<T, N>& ground_truth) {
    Value<T> sum_squared_error = std::inner_product(predictions.begin(), predictions.end(), ground_truth.begin(), make_value<T>(0),
        std::plus<>(),
        [](Value<T> pred, T truth) { return pow(pred - truth, 2); }
    );
    return sum_squared_error / make_value<T>(N);

}

```
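
As a worked example with plain `double`s, independent of the `Value` class: for predictions `{0.5, -0.5}` and ground truth `{1.0, -1.0}`, the errors are `-0.5` and `0.5`, each squared error is `0.25`, and their mean is `0.25`. The helper name `mse` is hypothetical.

```c++
#include <array>
#include <cstddef>
#include <functional>
#include <numeric>

// Mean squared error over N prediction / ground-truth pairs.
template <std::size_t N>
double mse(const std::array<double, N>& predictions, const std::array<double, N>& ground_truth) {
    static_assert(N > 0, "need at least one sample");
    const double sum_squared_error = std::inner_product(
        predictions.begin(), predictions.end(), ground_truth.begin(), 0.0,
        std::plus<>(),
        [](double pred, double truth) { return (pred - truth) * (pred - truth); });
    return sum_squared_error / static_cast<double>(N);
}
```

The structure mirrors the array overload above: `std::inner_product` sums the per-sample squared errors, and the sum is divided by the number of samples.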

We provide a wrapper class to calculate the Mean Squared Error of any `std::function<Value<T>(const Arg&)>`:

```c++

template<typename T, typename Arg, size_t N>
class MSELoss {
    public:
        MSELoss(const std::function<Value<T>(const Arg&)>& func)
            : func_(func)
        {
        }

        Value<T> operator()(std::array<Arg, N>& input, const std::array<T, N>& ground_truth, bool verbose=false) {
            if (verbose) std::cerr << "Predictions: ";
            for (size_t i = 0; i < N; ++i) {
                predictions_[i] = func_(input[i]);
                if (verbose) std::cerr << predictions_[i]->data() << " ";
            }
            if (verbose) std::cerr << '\n';
            return mse_loss(predictions_, ground_truth);
        }

    private:
        const std::function<Value<T>(const Arg&)> func_;
        std::array<Value<T>, N> predictions_;
};

```

## Gradient descent

The gradients calculated by running `backward(loss)` annotate how each parameter contributes to the error loss.

By adjusting each parameter down (against the gradient) we aim to minimize the error.

### Adjusting parameters

We introduce an `adjust()` function in `Value` to modify `data_` according to the calculated gradient `grad_`.

This takes a parameter `learning_rate`, usually in the range `[0.0 .. 1.0]`, to scale the adjustment:

```c++

        void adjust(const T& learning_rate) {

            data_ += -learning_rate * grad_;

        }

```
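
To illustrate the update rule in isolation, this sketch minimizes $f(w) = (w - 3)^2$ using the hand-written gradient $2(w - 3)$. `Param` is a hypothetical stand-in that mirrors only the `data_`, `grad_` and `adjust()` members described above, and `descend` is likewise a made-up helper.

```c++
#include <cassert>

// Hypothetical stand-in for a trainable scalar.
struct Param {
    double data;
    double grad;

    // Step against the gradient, scaled by the learning rate.
    void adjust(double learning_rate) {
        assert(learning_rate > 0.0);
        data += -learning_rate * grad;
    }
};

// Run `steps` iterations of gradient descent on f(w) = (w - 3)^2.
double descend(double start, double learning_rate, int steps) {
    Param w{start, 0.0};
    for (int i = 0; i < steps; ++i) {
        w.grad = 2.0 * (w.data - 3.0);  // df/dw at the current value
        w.adjust(learning_rate);
    }
    return w.data;
}
```

For this function, each step with a learning rate of `0.1` shrinks the distance to the minimum by a factor of `0.8`, while a rate above `1.0` overshoots by more than it corrects and diverges.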

We then provide `adjust()` functions to adjust all the parameters of a neural net, starting with the weights and bias of a Neuron:

```c++

template <typename T, size_t Nin>
class Neuron {
    ...
        const std::array<Value<T>, Nin>& weights() const {
            return weights_;
        }

        Value<T> bias() const {
            return bias_;
        }

        void adjust_weights(const T& learning_rate) {
            for (const auto& w : weights_) {
                w->adjust(learning_rate);
            }
        }

        void adjust_bias(const T& learning_rate) {
            bias_->adjust(learning_rate);
        }

        void adjust(const T& learning_rate) {
            adjust_weights(learning_rate);
            adjust_bias(learning_rate);
        }
    ...
};

```

then adjust all the Neurons in a Layer:

```c++

template <typename T, size_t Nin, size_t Nout>

class Layer {

    ...

        void adjust(const T& learning_rate) {

            for (auto & n : neurons_) {

                n.adjust(learning_rate);

            }

        }

    ...

};

```

and all the Layers in an MLP:

```c++

template<typename T, typename... Ls>

void layers_adjust(std::tuple<Ls...>& layers, const T& learning_rate) {

    std::apply([&learning_rate](auto&... layer) {

        // Use fold expression to call adjust on each layer

        (..., layer.adjust(learning_rate));

    }, layers);

}

```

```c++

template <typename T, size_t Nin, size_t... Nouts>

class MLP {

    ...

        void adjust(const T& learning_rate) {

            layers_adjust(layers_, learning_rate);

        }

    ...

};

```

### CanBackProp

We introduce a concept `CanBackProp` to describe any function that can be evaluated and adjusted:

```c++

template <typename F, typename T, typename Arg>

concept CanBackProp = requires(F f, Arg arg, T learning_rate) {

    { f(arg) } -> std::convertible_to<Value<T>>;

    { f.adjust(learning_rate) } -> std::convertible_to<void>;

};

```

For example, `MLP1` satisfies `CanBackProp`: calling it with an input array returns a `Value<T>`, and it provides `adjust(learning_rate)`.

### BackProp

We create a wrapper class for any function that matches the `CanBackProp` concept. This class is callable

with input and ground truth arguments, which are used to iteratively:

  * make predictions

  * evaluate error loss against ground truth

  * adjust parameters to minimize loss

The loss at each step is recorded in an output file `loss_path`.

```c++

template<typename F, typename T, typename Arg, size_t N>
class BackPropImpl {
    public:
        BackPropImpl(const F& func, const std::string& loss_path)
            : func_(func), loss_output_(loss_path)
        {
        }

        MSELoss<T, Arg, N> loss_function() const {
            return MSELoss<T, Arg, N>(func_);
        }

        T operator()(std::array<Arg, N>& input, const std::array<T, N>& ground_truth,
                T learning_rate, int iterations, bool verbose=false)
        {
            auto loss_f = loss_function();
            T result;

            for (int i=0; i < iterations; ++i) {
                Value<T> loss = loss_f(input, ground_truth, verbose);

                result = loss->data();
                loss_output_ << iter_ << '\t' << result << '\n';

                if (verbose) {
                    std::cerr << "Loss (" << iter_ << "):\t" << result << std::endl;
                }

                backward(loss);
                func_.adjust(learning_rate);
                ++iter_;
            }

            return result;
        }

    private:
        F func_;
        std::ofstream loss_output_;
        int iter_{0};
};

```
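
The same predict, evaluate and adjust loop can be sketched without any autograd machinery. The example below fits a one-parameter model `y = w * x` using a hand-derived MSE gradient, and writes tab-separated iteration and loss lines in the same shape as the loss file. The function `fit` and its parameters are hypothetical and are not part of the repository.

```c++
#include <array>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>

// Fit y = w * x by gradient descent on the mean squared error.
// Logs "iteration<TAB>loss" for each step to `log` and returns the learned w.
template <std::size_t N>
double fit(const std::array<double, N>& x, const std::array<double, N>& y,
           double learning_rate, int iterations, std::ostream& log) {
    assert(iterations >= 0);
    double w = 0.0;
    for (int iter = 0; iter < iterations; ++iter) {
        double loss = 0.0;
        double grad = 0.0;
        for (std::size_t i = 0; i < N; ++i) {
            const double err = w * x[i] - y[i];  // prediction minus ground truth
            loss += err * err / N;               // contribution to the mean squared error
            grad += 2.0 * err * x[i] / N;        // contribution to d(loss)/dw
        }
        log << iter << '\t' << loss << '\n';
        w -= learning_rate * grad;               // adjust against the gradient
    }
    return w;
}
```

With inputs `{1, 2, 3}` and targets `{2, 4, 6}`, the loop should drive `w` towards `2`. The logged loss is computed before each adjustment, matching the order used in `BackPropImpl`.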

The helper `BackProp` template allows us to instantiate without specifying the type of `F`, as the compiler can infer it from the constructor argument:

```c++

template<typename T, typename Arg, size_t N, typename F>

requires CanBackProp<F, T, Arg>

auto BackProp(const F& func, const std::string& loss_path)

{

    return BackPropImpl<F, T, Arg, N>(func, loss_path);

}

```

### Binary Classifier

Full example: [binary-classifier.cpp](examples/binary-classifier.cpp):

```c++

#include <iostream>

#include "backprop.h"

#include "nn.h"

using namespace ai;

int main(int argc, char *argv[])

{

    // Define a neural net

    MLP1<double, 3, 4, 4, 1> n;

    std::cerr << n << std::endl;

    // A set of training inputs

    std::array<std::array<double, 3>, 4> input = {{

        {2.0, 3.0, -1.0},

        {3.0, -1.0, 0.5},

        {0.5, 1.0, 1.0},

        {1.0, 1.0, -1.0}

    }};

    // Corresponding ground truth values for these inputs

    std::array y = {1.0, -1.0, -1.0, 1.0};

    std::cerr << "y (gt):\t" << PrettyArray(y) << std::endl;

    double learning_rate = 0.9;

    auto backprop = BackProp<double, std::array<double, 3>, 4>(n, "loss.tsv");

    // Run backprop for 20 iterations, verbose=true

    double loss = backprop(input, y, learning_rate, 20, true);

}

```

This quickly converges close to the ground truth `y = {1.0, -1.0, -1.0, 1.0}`:

```

Predictions: 0.773488 0.796802 0.870344 0.736159

Loss (0):	1.7119

Predictions: -0.316783 -0.714319 -0.588441 -0.396673

Loss (1):	0.983902

Predictions: 0.997675 0.996129 0.996903 0.99767

Loss (2):	1.99304

Predictions: 0.997454 0.995671 0.996576 0.997448

Loss (3):	1.99226

Predictions: 0.997183 0.995088 0.996169 0.997177

Loss (4):	1.99127

Predictions: 0.996845 0.99432 0.995649 0.996838

Loss (5):	1.98999

Predictions: 0.996409 0.993259 0.99496 0.996403

Loss (6):	1.98824

Predictions: 0.995827 0.991692 0.994 0.995819

Loss (7):	1.98573

Predictions: 0.995001 0.989122 0.992564 0.994993

Loss (8):	1.98174

Predictions: 0.993729 0.984072 0.990152 0.993718

Loss (9):	1.97433

Predictions: 0.991462 0.969564 0.985135 0.991443

Loss (10):	1.95502

Predictions: 0.985916 0.862539 0.967274 0.985854

Loss (11):	1.8349

Predictions: 0.947625 -0.182752 0.497795 0.945489

Loss (12):	0.72925

Predictions: -0.0306315 -0.996451 -0.996155 -0.512822

Loss (13):	0.837715

Predictions: 0.999406 -0.977287 -0.542644 0.999393

Loss (14):	0.0524229

Predictions: 0.999027 -0.997707 -0.997583 0.998998

Loss (15):	3.26197e-06

Predictions: 0.999027 -0.997707 -0.997584 0.998998

Loss (16):	3.26134e-06

Predictions: 0.999027 -0.997708 -0.997584 0.998998

Loss (17):	3.26072e-06

Predictions: 0.999027 -0.997708 -0.997584 0.998998

Loss (18):	3.26009e-06

Predictions: 0.999027 -0.997708 -0.997585 0.998998

Loss (19):	3.25946e-06

```

## Conclusion

We ported all the features of micrograd introduced in Karpathy's YouTube tutorial to C++, giving a different perspective on implementation details.

We also considered some more generic aspects of model evaluation and iterative learning to develop re-usable C++ classes.

## References

### Automatic differentiation in C++

* [Differentiable Programming in C++ - Vassil Vassilev & William Moses - CppCon 2021](https://www.youtube.com/watch?v=1QQj1mAV-eY) [YouTube]

* [Automatic Differentiation in C++](https://compiler-research.org/assets/presentations/CladInROOT_15_02_2020.pdf) [PDF]

* [FastAD: Expression Template-Based C++ Library for Fast and Memory-Efficient Automatic Differentiation](https://arxiv.org/abs/2102.03681) [PDF]

### Automatic differentiation

* [ad](https://hackage.haskell.org/package/ad)

## Next up: [kfish/makemore-cpp-2023](https://github.com/kfish/makemore-cpp-2023)