{"id":19245093,"url":"https://github.com/rxtoolkit/sgd","last_synced_at":"2025-02-23T15:15:44.152Z","repository":{"id":214912866,"uuid":"737663990","full_name":"rxtoolkit/sgd","owner":"rxtoolkit","description":"📊 RxJS implementation of stochastic gradient descent (SGD) (classifier).","archived":false,"fork":false,"pushed_at":"2024-01-01T18:11:34.000Z","size":75,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-06T11:26:01.350Z","etag":null,"topics":["ai","data-science","fp","functional-programming","machine-learning","ml","observables","package","reactive-programming","rxjs","statistics"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rxtoolkit.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-01T01:29:08.000Z","updated_at":"2024-01-02T15:44:53.000Z","dependencies_parsed_at":null,"dependency_job_id":"24a50fea-827c-4b2b-a2b4-32e091f37453","html_url":"https://github.com/rxtoolkit/sgd","commit_stats":null,"previous_names":["rxtoolkit/sgd"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rxtoolkit%2Fsgd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rxtoolkit%2Fsgd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rxtoolkit%2Fsgd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rxtoolkit%2Fsgd/ma
nifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rxtoolkit","download_url":"https://codeload.github.com/rxtoolkit/sgd/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240331361,"owners_count":19784646,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","data-science","fp","functional-programming","machine-learning","ml","observables","package","reactive-programming","rxjs","statistics"],"created_at":"2024-11-09T17:26:37.201Z","updated_at":"2025-02-23T15:15:44.130Z","avatar_url":"https://github.com/rxtoolkit.png","language":"JavaScript","readme":"# @rxtk/sgd\n\u003e 📊 RxJS implementation of stochastic gradient descent (SGD) (classifier).\n\n## About SGDs\nStochastic Gradient Descent (SGD) classifiers are a fairly simple and commonly used machine learning algorithm. 
They scale well and train quickly under most conditions since they can be trained incrementally on each new item.\n\n## Getting started\n\n```bash\nnpm i @rxtk/sgd\n```\n\n```bash\nyarn add @rxtk/sgd\n```\n\n## API\n\n### `classifier`\n```js\nimport { of, range } from 'rxjs';\nimport { mapTo, mergeMap } from 'rxjs/operators';\nimport { classifier } from '@rxtk/sgd';\n\nconst trainingData = [ // in the form of [row, label]\n  [[2.7810836, 2.550537003], 0],\n  [[1.465489372, 2.362125076], 0],\n  [[3.396561688, 4.400293529], 0],\n  [[1.38807019, 1.850220317], 0],\n  [[3.06407232, 3.005305973], 0],\n  [[7.627531214, 2.759262235], 1],\n  [[5.332441248, 2.088626775], 1],\n  [[6.922596716, 1.77106367], 1],\n  [[8.675418651, -0.242068655], 1],\n  [[7.673756466, 3.508563011], 1],\n];\n\n// Run over the training data 100 times (epochs) and fit an SGD classifier to it:\nconst myClassifier = function myClassifier() {\n  return range(1, 100).pipe(\n    mapTo(trainingData), // emit the full training set once per epoch\n    mergeMap(rows =\u003e of(...rows)), // flatten the observable so each emission is a row\n    classifier() // fit an SGD classifier to the data\n  );\n};\n\nexport default myClassifier;\n```\n\n\u003e 💡 In this case, the training data is a hardcoded array. But it could be any data stream: a database table, a Mongo query, a series of HTTP requests, a CSV file from AWS S3... You name it!\n\nThe classifier is reactive. It will update every time a new item is ingested. These incremental classifiers are often useful if you want to understand how the model is improving (or overfitting) as it munches on more data. 
If you just want the final classifier, you can ignore the incremental results and take the last one:\n\n```js\nimport {takeLast} from 'rxjs/operators';\n\nimport myClassifier from './myClassifier';\n\n// get the final, fitted classifier\nconst classifier$ = myClassifier().pipe(\n  takeLast(1)\n);\nclassifier$.subscribe(console.log);\n// {\n//   intercept: -0.8596443546618895, \n//   weights: [1.5223825112460012, -2.2187002105650175]\n// }\n```\n\n\u003e 💡 **SGD explained**: Each instance of the classifier is just a simple JavaScript object with a few keys, most importantly the model parameters. Internally, SGD classifiers are very simple. The original implementation of this operator was perhaps only around 50 lines of code. They just calculate an intercept value and attach a weight to each feature column. That's all there is to it!\n\n### `predict`\nThe easiest way to use a trained classifier is to store its parameters for later use. Taking the classifier from the prior example, we can plug in its fitted parameters to make new predictions like this:\n```js\nimport { from } from 'rxjs';\nimport { predict } from '@rxtk/sgd';\n\nconst newRow$ = from([\n  [7.673756466, 3.508563011],\n  [1.38807019, 1.850220317],\n]);\n\n// the classifier tells us these values:\nconst modelParameters = {\n  intercept: -0.8596443546618895, \n  weights: [1.5223825112460012, -2.2187002105650175],\n};\n\n// now run predictions on the new data\nconst prediction$ = newRow$.pipe(\n  predict(modelParameters) // run predictions using the model parameters\n);\nprediction$.subscribe(console.log);\n// 0.9542746551950381\n// 0.054601004287547356\n// the first value above is closer to 1 (\u003e0.5), so it is positive/true\n// the second value is closer to 0 (\u003c0.5), so it is negative/false\n```\nWhat if you don't want to store the model parameters? 
In that case, they can be recalculated as needed:\n```js\nimport {from} from 'rxjs';\nimport {takeLast, mergeMap} from 'rxjs/operators';\nimport {predict} from '@rxtk/sgd';\n\n// import the classifier from the previous example\nimport myClassifier from './myClassifier';\n\nconst newRow$ = from([\n  [7.673756466, 3.508563011],\n  [1.38807019, 1.850220317],\n]);\n\nconst classifier$ = myClassifier();\nconst prediction$ = classifier$.pipe(\n  takeLast(1), // wait for the final, fitted classifier\n  mergeMap(classifier =\u003e newRow$.pipe(\n    predict(classifier)\n  ))\n);\nprediction$.subscribe(console.log);\n```\n\n## Create a reactive data pipeline\nHaving the ability to train SGD models on the fly from RxJS streams offers some very interesting possibilities. For example, in the real world, it's often best to re-train models from time to time to improve their accuracy. This code would retrain an SGD model every 30 minutes and replace the previous model with a new one based on more recent data:\n\n```js\nimport { map, mergeMap, share, takeLast, windowTime } from 'rxjs/operators';\nimport {classifier} from '@rxtk/sgd';\n\n// this could be any function that returns an appropriate input Observable\nconst input$ = streamMyData();\n\n// every 30 minutes, train a new classifier on the data from that window\nconst latestClassifier$ = input$.pipe(\n  windowTime(30 * 60 * 1000), // every 30 minutes\n  map(window$ =\u003e window$.pipe(\n    classifier(),\n    takeLast(1)\n  )),\n  mergeMap(classifier$ =\u003e classifier$),\n  share()\n);\n// this will emit a new classifier at the end of each 30-minute window\nlatestClassifier$.subscribe();\n```\n\nA better version of the code might test the performance of each new classifier to see if it does a better job than the previous classifier. Or the results could be shipped to a backend job runner which would test each model to see if it is better than prior models. Reactive programming makes it easy to do these sorts of things.\n\nWe hope to add more information to this guide in the future. 
Hopefully this is enough information to get you started!\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frxtoolkit%2Fsgd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frxtoolkit%2Fsgd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frxtoolkit%2Fsgd/lists"}