{"id":27290326,"url":"https://github.com/dylanmuir/fmin_adam","last_synced_at":"2025-04-11T21:28:10.373Z","repository":{"id":87510629,"uuid":"82065015","full_name":"DylanMuir/fmin_adam","owner":"DylanMuir","description":"Matlab implementation of the Adam stochastic gradient descent optimisation algorithm","archived":false,"fork":false,"pushed_at":"2017-02-22T10:52:19.000Z","size":112,"stargazers_count":48,"open_issues_count":2,"forks_count":25,"subscribers_count":5,"default_branch":"master","last_synced_at":"2023-10-20T19:34:03.570Z","etag":null,"topics":["gradient-descent","matlab","optimization","optimization-algorithms","stochastic-gradient-descent"],"latest_commit_sha":null,"homepage":"http://dylan-muir.com/articles/adam_optimiser/","language":"Matlab","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DylanMuir.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-02-15T13:44:26.000Z","updated_at":"2023-10-20T19:34:07.062Z","dependencies_parsed_at":null,"dependency_job_id":"224cff38-f069-4606-a85c-658d6389e09e","html_url":"https://github.com/DylanMuir/fmin_adam","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DylanMuir%2Ffmin_adam","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DylanMuir%2Ffmin_adam/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DylanMuir%2Ffmin_adam/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DylanMuir%2Ffmin_adam/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/owners/DylanMuir","download_url":"https://codeload.github.com/DylanMuir/fmin_adam/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248482415,"owners_count":21111324,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gradient-descent","matlab","optimization","optimization-algorithms","stochastic-gradient-descent"],"created_at":"2025-04-11T21:28:09.730Z","updated_at":"2025-04-11T21:28:10.355Z","avatar_url":"https://github.com/DylanMuir.png","language":"Matlab","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Adam optimiser\nThis is a `Matlab` implementation of the Adam optimiser from Kingma and Ba [[1]], designed for stochastic gradient descent. It maintains estimates of the first and second moments of the gradient independently for each parameter.\n\n## Usage\n`[x, fval, exitflag, output] = fmin_adam(fun, x0 \u003c, stepSize, beta1, beta2, epsilon, nEpochSize, options\u003e)`\n\nArguments between angle brackets are optional.\n\n`fmin_adam` is an implementation of the Adam optimisation algorithm (gradient descent with Adaptive learning rates individually on each parameter, with Momentum) from Kingma and Ba [[1]]. Adam is designed to work on stochastic gradient descent problems, i.e. 
when only small batches of data are used to estimate the gradient on each iteration, or when stochastic dropout regularisation is used [[2]].\n\n## Examples\n### Simple regression problem with gradients\n\nSet up a simple linear regression problem: ![$$$y = x\\cdot\\phi_1 + \\phi_2 + \\zeta$$$](https://latex.codecogs.com/svg.latex?%5Cinline%20y%20%3D%20x%5Ccdot%5Cphi_1%20\u0026plus;%20%5Cphi_2%20\u0026plus;%20%5Czeta), where ![$$$\\zeta \\sim N(0, 0.1^2)$$$](https://latex.codecogs.com/svg.latex?%5Cinline%20%5Czeta%20%5Csim%20N%280%2C%200.1%5E2%29) is Gaussian noise with standard deviation 0.1. We'll take ![$$$\\phi = \\left[3, 2\\right]$$$](https://latex.codecogs.com/svg.latex?%5Cinline%20%5Cphi%20%3D%20%5Cleft%5B3%2C%202%5Cright%5D) for this example. Let's draw some samples from this problem:\n\n```matlab\nnDataSetSize = 1000;\nvfInput = rand(1, nDataSetSize);\nphiTrue = [3 2];\nfhProblem = @(phi, vfInput) vfInput .* phi(1) + phi(2);\nvfResp = fhProblem(phiTrue, vfInput) + randn(1, nDataSetSize) * .1;\nplot(vfInput, vfResp, '.'); hold on;\n```\n\n\u003cimg src=\"images/regression_scatter.png\" /\u003e\n\nNow we define a cost function to minimise, which returns analytical gradients:\n\n```matlab\nfunction [fMSE, vfGrad] = LinearRegressionMSEGradients(phi, vfInput, vfResp)\n   % - Compute half the mean-squared error using the current parameter estimate\n   vfRespHat = vfInput .* phi(1) + phi(2);\n   vfDiff = vfRespHat - vfResp;\n   fMSE = mean(vfDiff.^2) / 2;\n\n   % - Compute the gradient of the cost for each parameter, with the same shape as phi\n   vfGrad = zeros(size(phi));\n   vfGrad(1) = mean(vfDiff .* vfInput);\n   vfGrad(2) = mean(vfDiff);\nend\n```\n\nInitial parameters `phi0` are Normally distributed. 
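\n\nBefore running the optimiser from such a random starting point, it can be worth confirming that the analytical gradients are correct. The following is a minimal sketch of a hypothetical check (it is not part of `fmin_adam`; the names `fhCostGrad`, `phiTest`, `fStep` and `fMaxAbsDiff` are illustrative) that compares the gradients returned by `LinearRegressionMSEGradients` with central finite-difference estimates, assuming `vfInput`, `vfResp` and `LinearRegressionMSEGradients` are defined as above:\n\n```matlab\n% - Hypothetical gradient check, not part of fmin_adam\nfhCostGrad = @(phi)LinearRegressionMSEGradients(phi, vfInput, vfResp);\nphiTest = randn(2, 1);\n[~, vfGradAnalytic] = fhCostGrad(phiTest);\nfStep = 1e-6;\nvfGradNumeric = zeros(size(phiTest));\nfor nParam = 1:numel(phiTest)\n   % - Perturb one parameter at a time, in both directions\n   vfDelta = zeros(size(phiTest));\n   vfDelta(nParam) = fStep;\n   vfGradNumeric(nParam) = (fhCostGrad(phiTest + vfDelta) - fhCostGrad(phiTest - vfDelta)) / (2 * fStep);\nend\n% - Largest absolute disagreement between the two gradient estimates\nfMaxAbsDiff = max(abs(vfGradAnalytic(:) - vfGradNumeric(:)))\n```\n\nBecause this cost is quadratic in `phi`, the central difference is exact apart from rounding error, so `fMaxAbsDiff` should be very small; a large value points to a mistake in the gradient code.\n\n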
Call the `fmin_adam` optimiser with a learning rate of 0.01.\n\n```matlab\nphi0 = randn(2, 1);\nphiHat = fmin_adam(@(phi)LinearRegressionMSEGradients(phi, vfInput, vfResp), phi0, 0.01)\nplot(vfInput, fhProblem(phiHat, vfInput), '.');\n```\n\nOutput:\n\n     Iteration   Func-count         f(x)   Improvement    Step-size\n    ----------   ----------   ----------   ----------   ----------\n          2130         4262       0.0051        5e-07      0.00013\n    ----------   ----------   ----------   ----------   ----------\n\n    Finished optimization.\n       Reason: Function improvement [5e-07] less than TolFun [1e-06].\n\n    phiHat =\n        2.9498\n        2.0273\n\n\u003cimg src=\"images/regression_fit.png\" /\u003e\n\n### Linear regression with minibatches\n\nSet up a simple linear regression problem, as above.\n\n```matlab\nnDataSetSize = 1000;\nvfInput = rand(1, nDataSetSize);\nphiTrue = [3 2];\nfhProblem = @(phi, vfInput) vfInput .* phi(1) + phi(2);\nvfResp = fhProblem(phiTrue, vfInput) + randn(1, nDataSetSize) * .1;\n```\n\nConfigure minibatches. Each minibatch is a random set of indices into the data, drawn with replacement.\n\n```matlab\nnBatchSize = 50;\nnNumBatches = 100;\n% - Each column of mnBatches holds the data indices for one minibatch\nmnBatches = randi(nDataSetSize, nBatchSize, nNumBatches);\ncvnBatches = mat2cell(mnBatches, nBatchSize, ones(1, nNumBatches));\nfigure; hold on;\ncellfun(@(b)plot(vfInput(b), vfResp(b), '.'), cvnBatches);\n```\n\n\u003cimg src=\"images/regression_minibatches.png\" /\u003e\n\nDefine the function to minimise; in this case, the mean-squared error over the regression problem. 
The iteration index `nIter` defines which mini-batch to evaluate the problem over.\n\n```matlab\n% - Cycle through all nNumBatches minibatches as nIter increases\nfhBatchInput = @(nIter) vfInput(cvnBatches{mod(nIter, nNumBatches)+1});\nfhBatchResp = @(nIter) vfResp(cvnBatches{mod(nIter, nNumBatches)+1});\nfhCost = @(phi, nIter) mean((fhProblem(phi, fhBatchInput(nIter)) - fhBatchResp(nIter)).^2);\n```\n\nThis cost function does not return gradients, so turn off analytical gradients for the `fmin_adam` optimiser, and ensure that we permit sufficient function calls.\n\n```matlab\nsOpt = optimset('fmin_adam');\nsOpt.GradObj = 'off';\nsOpt.MaxFunEvals = 1e4;\n```\n\nCall the `fmin_adam` optimiser with a learning rate of `0.1`. Initial parameters are Normally distributed.\n\n```matlab\nphi0 = randn(2, 1);\nphiHat = fmin_adam(fhCost, phi0, 0.1, [], [], [], [], sOpt)\n```\n\nThe output of the optimisation process (which will differ between runs, since the data and initial parameters are random):\n\n     Iteration   Func-count         f(x)   Improvement    Step-size\n    ----------   ----------   ----------   ----------   ----------\n           711         2848          0.3       0.0027      3.8e-06\n    ----------   ----------   ----------   ----------   ----------\n\n    Finished optimization.\n       Reason: Step size [3.8e-06] less than TolX [1e-05].\n\n    phiHat =\n        2.8949\n        1.9826\n\n## Detailed usage\n### Input arguments\n`fun` is a function handle `[fCost \u003c, vfCdX\u003e] = @(x \u003c, nIter\u003e)` defining the function to minimise. It must return the cost at the parameter `x`, optionally evaluated over a mini-batch of data. If analytical gradients are available (recommended), then `fun` must also return the gradients in `vfCdX`, evaluated at `x` (optionally over a mini-batch). If analytical gradients are not available, then complex-step finite difference estimates will be used.\n\nTo use analytical gradients (default), set `options.GradObj = 'on'`. 
To force the use of finite difference gradient estimates, set `options.GradObj = 'off'`.\n\n`fun` must be deterministic in its calculation of `fCost` and `vfCdX`, even if mini-batches are used. To this end, `fun` can accept a parameter `nIter` which specifies the current iteration of the optimisation algorithm. For a given value of `nIter`, `fun` must always evaluate the cost (and gradients) over the same problem, e.g. the same mini-batch.\n\nSteps that do not lead to a reduction in the function to be minimised are not taken.\n\n### Output arguments\n`x` will be a set of parameters estimated to minimise `fCost`. `fval` will be the value returned from `fun` at `x`.\n\n`exitflag` will be an integer value indicating why the algorithm terminated:\n\n* 0: An output or plot function indicated that the algorithm should terminate.\n* 1: The estimated reduction in `fCost` was less than `TolFun`.\n* 2: The norm of the current step was less than `TolX`.\n* 3: The number of iterations exceeded `MaxIter`.\n* 4: The number of function evaluations exceeded `MaxFunEvals`.\n\n`output` will be a structure containing information about the optimisation process:\n\n* `.stepsize`: Norm of current parameter step\n* `.gradient`: Vector of current gradients evaluated at `x`\n* `.funccount`: Number of calls to `fun` made so far\n* `.iteration`: Current iteration of algorithm\n* `.fval`: Value returned by `fun` at `x`\n* `.exitflag`: Flag indicating reason that algorithm terminated\n* `.improvement`: Current estimated improvement in `fun`\n\nThe optional parameters `stepSize`, `beta1`, `beta2` and `epsilon` are parameters of the Adam optimisation algorithm (see [[1]]). Default values of `{1e-3, 0.9, 0.999, sqrt(eps)}` are reasonable for most problems. Passing `[]` for one of these arguments, as in the mini-batch example above, leaves it at its default value.\n\nThe optional argument `nEpochSize` specifies how many iterations comprise an epoch. This is used in the convergence detection code.\n\nThe optional argument `options` is used to control the optimisation process (see `optimset`). 
Relevant fields:\n\n* `.Display`\n* `.GradObj`\n* `.DerivativeCheck`\n* `.MaxFunEvals`\n* `.MaxIter`\n* `.TolFun`\n* `.TolX`\n* `.UseParallel`\n\n## References\n\n[[1]] Diederik P. Kingma and Jimmy Ba. \"Adam: A Method for Stochastic Optimization\", ICLR 2015. [https://arxiv.org/abs/1412.6980](https://arxiv.org/abs/1412.6980)\n\n[[2]] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. \"Improving neural networks by preventing co-adaptation of feature detectors\", arXiv preprint arXiv:1207.0580, 2012. [https://arxiv.org/abs/1207.0580](https://arxiv.org/abs/1207.0580)\n\n[1]: https://arxiv.org/abs/1412.6980\n[2]: https://arxiv.org/abs/1207.0580\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdylanmuir%2Ffmin_adam","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdylanmuir%2Ffmin_adam","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdylanmuir%2Ffmin_adam/lists"}