{"id":13879643,"url":"https://github.com/ankane/eps","last_synced_at":"2025-11-17T14:07:58.501Z","repository":{"id":46581897,"uuid":"139631171","full_name":"ankane/eps","owner":"ankane","description":"Machine learning for Ruby","archived":false,"fork":false,"pushed_at":"2025-09-30T00:45:11.000Z","size":921,"stargazers_count":681,"open_issues_count":0,"forks_count":15,"subscribers_count":19,"default_branch":"master","last_synced_at":"2025-11-11T03:34:36.935Z","etag":null,"topics":["automl","machine-learning","rubyml"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ankane.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-07-03T19:53:52.000Z","updated_at":"2025-09-30T00:45:15.000Z","dependencies_parsed_at":"2024-02-17T19:28:06.242Z","dependency_job_id":"c4ad9626-a523-4d66-a2b5-3426cc80e557","html_url":"https://github.com/ankane/eps","commit_stats":{"total_commits":288,"total_committers":2,"mean_commits":144.0,"dds":0.00347222222222221,"last_synced_commit":"cd57175f86e8d7b2f3230a0e699b2ddd9b2f0d66"},"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"purl":"pkg:github/ankane/eps","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Feps","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Feps/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Feps/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/ankane%2Feps/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ankane","download_url":"https://codeload.github.com/ankane/eps/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Feps/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":284538270,"owners_count":27022378,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-15T02:00:06.050Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automl","machine-learning","rubyml"],"created_at":"2024-08-06T08:02:27.571Z","updated_at":"2025-11-17T14:07:58.495Z","avatar_url":"https://github.com/ankane.png","language":"Ruby","funding_links":[],"categories":["Ruby","Machine Learning Libraries"],"sub_categories":["Frameworks"],"readme":"# Eps\n\nMachine learning for Ruby\n\n- Build predictive models quickly and easily\n- Serve models built in Ruby, Python, R, and more\n\nCheck out [this post](https://ankane.org/rails-meet-data-science) for more info on machine learning with Rails\n\n[![Build Status](https://github.com/ankane/eps/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/eps/actions)\n\n## Installation\n\nAdd this line to your application’s Gemfile:\n\n```ruby\ngem \"eps\"\n```\n\nOn Mac, also install OpenMP:\n\n```sh\nbrew install libomp\n```\n\n## Getting Started\n\nCreate a 
model\n\n```ruby\ndata = [\n  {bedrooms: 1, bathrooms: 1, price: 100000},\n  {bedrooms: 2, bathrooms: 1, price: 125000},\n  {bedrooms: 2, bathrooms: 2, price: 135000},\n  {bedrooms: 3, bathrooms: 2, price: 162000}\n]\nmodel = Eps::Model.new(data, target: :price)\nputs model.summary\n```\n\nMake a prediction\n\n```ruby\nmodel.predict(bedrooms: 2, bathrooms: 1)\n```\n\nStore the model\n\n```ruby\nFile.write(\"model.pmml\", model.to_pmml)\n```\n\nLoad the model\n\n```ruby\npmml = File.read(\"model.pmml\")\nmodel = Eps::Model.load_pmml(pmml)\n```\n\nA few notes:\n\n- The target can be numeric (regression) or categorical (classification)\n- Pass an array of hashes to `predict` to make multiple predictions at once\n- Models are stored in [PMML](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language), a standard for model storage\n\n## Building Models\n\n### Goal\n\nOften, the goal of building a model is to make good predictions on future data. To help achieve this, Eps splits the data into training and validation sets if you have 30+ data points. It uses the training set to build the model and the validation set to evaluate the performance.\n\nIf your data has a time associated with it, it’s highly recommended to use that field for the split.\n\n```ruby\nEps::Model.new(data, target: :price, split: :listed_at)\n```\n\nOtherwise, the split is random. There are a number of [other options](#validation-options) as well.\n\nPerformance is reported in the summary.\n\n- For regression, it reports validation RMSE (root mean squared error) - lower is better\n- For classification, it reports validation accuracy - higher is better\n\nTypically, the best way to improve performance is feature engineering.\n\n### Feature Engineering\n\nFeatures are extremely important for model performance. Features can be:\n\n1. numeric\n2. categorical\n3. 
text\n\n#### Numeric\n\nFor numeric features, use any numeric type.\n\n```ruby\n{bedrooms: 4, bathrooms: 2.5}\n```\n\n#### Categorical\n\nFor categorical features, use strings or booleans.\n\n```ruby\n{state: \"CA\", basement: true}\n```\n\nConvert any ids to strings so they’re treated as categorical features.\n\n```ruby\n{city_id: city_id.to_s}\n```\n\nFor dates, create features like day of week and month.\n\n```ruby\n{weekday: sold_on.strftime(\"%a\"), month: sold_on.strftime(\"%b\")}\n```\n\nFor times, create features like day of week and hour of day.\n\n```ruby\n{weekday: listed_at.strftime(\"%a\"), hour: listed_at.hour.to_s}\n```\n\n#### Text\n\nFor text features, use strings with multiple words.\n\n```ruby\n{description: \"a beautiful house on top of a hill\"}\n```\n\nThis creates features based on [word count](https://en.wikipedia.org/wiki/Bag-of-words_model).\n\nYou can specify text features explicitly with:\n\n```ruby\nEps::Model.new(data, target: :price, text_features: [:description])\n```\n\nYou can set advanced options with:\n\n```ruby\ntext_features: {\n  description: {\n    min_occurrences: 5,         # min times a word must appear to be included in the model\n    max_features: 1000,         # max number of words to include in the model\n    min_length: 1,              # min length of words to be included\n    case_sensitive: true,       # how to treat words with different case\n    tokenizer: /\\s+/,           # how to tokenize the text, defaults to whitespace\n    stop_words: [\"and\", \"the\"]  # words to exclude from the model\n  }\n}\n```\n\n## Full Example\n\nWe recommend putting all the model code in a single file. This makes it easy to rebuild the model as needed.\n\nIn Rails, we recommend creating an `app/ml_models` directory. 
Be sure to restart Spring after creating the directory so files are autoloaded.\n\n```sh\nbin/spring stop\n```\n\nHere’s what a complete model in `app/ml_models/price_model.rb` may look like:\n\n```ruby\nclass PriceModel \u003c Eps::Base\n  def build\n    houses = House.all\n\n    # train\n    data = houses.map { |v| features(v) }\n    model = Eps::Model.new(data, target: :price, split: :listed_at)\n    puts model.summary\n\n    # save to file\n    File.write(model_file, model.to_pmml)\n\n    # ensure reloads from file\n    @model = nil\n  end\n\n  def predict(house)\n    model.predict(features(house))\n  end\n\n  private\n\n  def features(house)\n    {\n      bedrooms: house.bedrooms,\n      city_id: house.city_id.to_s,\n      month: house.listed_at.strftime(\"%b\"),\n      listed_at: house.listed_at,\n      price: house.price\n    }\n  end\n\n  def model\n    @model ||= Eps::Model.load_pmml(File.read(model_file))\n  end\n\n  def model_file\n    File.join(__dir__, \"price_model.pmml\")\n  end\nend\n```\n\nBuild the model with:\n\n```ruby\nPriceModel.build\n```\n\nThis saves the model to `price_model.pmml`. Check this into source control or use a tool like [Trove](https://github.com/ankane/trove) to store it.\n\nPredict with:\n\n```ruby\nPriceModel.predict(house)\n```\n\n## Monitoring\n\nWe recommend monitoring how well your models perform over time. To do this, save your predictions to the database. Then, compare them with:\n\n```ruby\nactual = houses.map(\u0026:price)\npredicted = houses.map(\u0026:predicted_price)\nEps.metrics(actual, predicted)\n```\n\nFor RMSE and MAE, alert if they rise above a certain threshold. For ME, alert if it moves too far away from 0. For accuracy, alert if it drops below a certain threshold.\n\n## Other Languages\n\nEps makes it easy to serve models from other languages. 
You can build models in Python, R, and others and serve them in Ruby without having to worry about how to deploy or run another language.\n\nEps can serve LightGBM, linear regression, and naive Bayes models. Check out [ONNX Runtime](https://github.com/ankane/onnxruntime) and [Scoruby](https://github.com/asafschers/scoruby) to serve other models.\n\n### Python\n\nTo create a model in Python, install the [sklearn2pmml](https://github.com/jpmml/sklearn2pmml) package\n\n```sh\npip install sklearn2pmml\n```\n\nAnd check out the examples:\n\n- [LightGBM Regression](test/support/python/lightgbm_regression.py)\n- [LightGBM Classification](test/support/python/lightgbm_classification.py)\n- [Linear Regression](test/support/python/linear_regression.py)\n- [Naive Bayes](test/support/python/naive_bayes.py)\n\n### R\n\nTo create a model in R, install the [pmml](https://cran.r-project.org/package=pmml) package\n\n```r\ninstall.packages(\"pmml\")\n```\n\nAnd check out the examples:\n\n- [Linear Regression](test/support/r/linear_regression.R)\n- [Naive Bayes](test/support/r/naive_bayes.R)\n\n### Verifying\n\nIt’s important for features to be implemented consistently when serving models created in other languages. We highly recommend verifying this programmatically. Create a CSV file with ids and predictions from the original model.\n\nhouse_id | prediction\n--- | ---\n1 | 145000\n2 | 123000\n3 | 250000\n\nOnce the model is implemented in Ruby, confirm the predictions match.\n\n```ruby\nrequire \"csv\"\n\n# load_pmml expects the PMML contents, not a file path\nmodel = Eps::Model.load_pmml(File.read(\"model.pmml\"))\n\n# preload houses to prevent n+1\nhouses = House.all.index_by(\u0026:id)\n\nCSV.foreach(\"predictions.csv\", headers: true, converters: :numeric) do |row|\n  house = houses[row[\"house_id\"]]\n  expected = row[\"prediction\"]\n\n  actual = model.predict(bedrooms: house.bedrooms, bathrooms: house.bathrooms)\n\n  success = actual.is_a?(String) ? 
actual == expected : (actual - expected).abs \u003c 0.001\n  raise \"Bad prediction for house #{house.id} (exp: #{expected}, act: #{actual})\" unless success\n\n  putc \"✓\"\nend\n```\n\n## Data\n\nA number of data formats are supported. You can pass the target variable separately.\n\n```ruby\nx = [{x: 1}, {x: 2}, {x: 3}]\ny = [1, 2, 3]\nEps::Model.new(x, y)\n```\n\nData can be an array of arrays\n\n```ruby\nx = [[1, 2], [2, 0], [3, 1]]\ny = [1, 2, 3]\nEps::Model.new(x, y)\n```\n\nOr Numo arrays\n\n```ruby\nx = Numo::NArray.cast([[1, 2], [2, 0], [3, 1]])\ny = Numo::NArray.cast([1, 2, 3])\nEps::Model.new(x, y)\n```\n\nOr a Rover data frame\n\n```ruby\ndf = Rover.read_csv(\"houses.csv\")\nEps::Model.new(df, target: \"price\")\n```\n\nOr a Daru data frame\n\n```ruby\ndf = Daru::DataFrame.from_csv(\"houses.csv\")\nEps::Model.new(df, target: \"price\")\n```\n\nWhen reading CSV files directly, be sure to convert numeric fields. The `table` method does this automatically.\n\n```ruby\nCSV.table(\"data.csv\").map { |row| row.to_h }\n```\n\n## Algorithms\n\nPass an algorithm with:\n\n```ruby\nEps::Model.new(data, algorithm: :linear_regression)\n```\n\nEps supports:\n\n- LightGBM (default)\n- Linear Regression\n- Naive Bayes\n\n### LightGBM\n\nPass the learning rate with:\n\n```ruby\nEps::Model.new(data, learning_rate: 0.01)\n```\n\n### Linear Regression\n\nBy default, an intercept is included. Disable this with:\n\n```ruby\nEps::Model.new(data, intercept: false)\n```\n\nTo speed up training on large datasets with linear regression, [install GSL](https://github.com/ankane/gslr#gsl-installation). 
With Homebrew, you can use:\n\n```sh\nbrew install gsl\n```\n\nThen, add this line to your application’s Gemfile:\n\n```ruby\ngem \"gslr\", group: :development\n```\n\nIt only needs to be available in environments used to build the model.\n\n## Probability\n\nTo get the probability of each category for predictions with classification, use:\n\n```ruby\nmodel.predict_probability(data)\n```\n\nNaive Bayes is known to produce poor probability estimates, so stick with LightGBM if you need this.\n\n## Validation Options\n\nPass your own validation set with:\n\n```ruby\nEps::Model.new(data, validation_set: validation_set)\n```\n\nSplit on a specific value\n\n```ruby\nEps::Model.new(data, split: {column: :listed_at, value: Date.parse(\"2025-01-01\")})\n```\n\nSpecify the validation set size (the default is `0.25`, which is 25%)\n\n```ruby\nEps::Model.new(data, split: {validation_size: 0.2})\n```\n\nDisable the validation set completely with:\n\n```ruby\nEps::Model.new(data, split: false)\n```\n\n## Database Storage\n\nThe database is another place you can store models. It’s good if you retrain models automatically.\n\n\u003e We recommend adding monitoring and guardrails as well if you retrain automatically\n\nCreate an Active Record model to store the predictive model.\n\n```sh\nrails generate model Model key:string:uniq data:text\n```\n\nStore the model with:\n\n```ruby\nstore = Model.where(key: \"price\").first_or_initialize\nstore.update(data: model.to_pmml)\n```\n\nLoad the model with:\n\n```ruby\ndata = Model.find_by!(key: \"price\").data\nmodel = Eps::Model.load_pmml(data)\n```\n\n## Jupyter \u0026 IRuby\n\nYou can use [IRuby](https://github.com/SciRuby/iruby) to run Eps in [Jupyter](https://jupyter.org/) notebooks. 
Here’s how to get [IRuby working with Rails](https://ankane.org/jupyter-rails).\n\n## Weights\n\nSpecify a weight for each data point\n\n```ruby\nEps::Model.new(data, weight: :weight)\n```\n\nYou can also pass an array\n\n```ruby\nEps::Model.new(data, weight: [1, 2, 3])\n```\n\nWeights are supported for metrics as well\n\n```ruby\nEps.metrics(actual, predicted, weight: weight)\n```\n\nReweighing is one method to [mitigate bias](https://fairlearn.org/) in training data\n\n## History\n\nView the [changelog](https://github.com/ankane/eps/blob/master/CHANGELOG.md)\n\n## Contributing\n\nEveryone is encouraged to help improve this project. Here are a few ways you can help:\n\n- [Report bugs](https://github.com/ankane/eps/issues)\n- Fix bugs and [submit pull requests](https://github.com/ankane/eps/pulls)\n- Write, clarify, or fix documentation\n- Suggest or add new features\n\nTo get started with development:\n\n```sh\ngit clone https://github.com/ankane/eps.git\ncd eps\nbundle install\nbundle exec rake test\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Feps","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fankane%2Feps","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Feps/lists"}