{"id":13452698,"url":"https://github.com/airbnb/aerosolve","last_synced_at":"2025-05-14T09:08:52.242Z","repository":{"id":31938004,"uuid":"35507603","full_name":"airbnb/aerosolve","owner":"airbnb","description":"A machine learning package built for humans.","archived":false,"fork":false,"pushed_at":"2024-09-23T23:32:30.000Z","size":6888,"stargazers_count":4794,"open_issues_count":10,"forks_count":563,"subscribers_count":352,"default_branch":"master","last_synced_at":"2024-10-29T15:34:27.513Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://airbnb.github.io/aerosolve/","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/airbnb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-05-12T19:11:57.000Z","updated_at":"2024-10-29T05:05:54.000Z","dependencies_parsed_at":"2024-11-19T03:55:14.973Z","dependency_job_id":null,"html_url":"https://github.com/airbnb/aerosolve","commit_stats":{"total_commits":811,"total_committers":29,"mean_commits":27.96551724137931,"dds":0.6707768187422934,"last_synced_commit":"442e76ebb7cbc7f60b04fbfac30dbf862aaffc67"},"previous_names":[],"tags_count":101,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airbnb%2Faerosolve","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airbnb%2Faerosolve/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airbnb%2Faerosolve/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/airbnb%
2Faerosolve/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/airbnb","download_url":"https://codeload.github.com/airbnb/aerosolve/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247949449,"owners_count":21023325,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T08:00:20.687Z","updated_at":"2025-04-09T00:29:56.080Z","avatar_url":"https://github.com/airbnb.png","language":"Scala","readme":"\naerosolve\n=========\n\nMachine learning **for humans**.\n\n[![Build Status](https://travis-ci.org/airbnb/aerosolve.svg)](https://travis-ci.org/airbnb/aerosolve)\n[ ![Download](https://api.bintray.com/packages/airbnb/aerosolve/aerosolve-core/images/download.svg) ](https://bintray.com/airbnb/aerosolve/aerosolve-core/_latestVersion)\n[ ![Download](https://api.bintray.com/packages/airbnb/aerosolve/aerosolve-training/images/download.svg) ](https://bintray.com/airbnb/aerosolve/aerosolve-training/_latestVersion)\n\nWhat is it?\n-----------\n\nA machine learning library designed from the ground up to be human friendly.\nIt is different from other machine learning libraries in the following ways:\n\n  * A [thrift based feature representation](https://github.com/airbnb/aerosolve/tree/master/core/src/main/thrift) that enables pairwise ranking loss and single context multiple item representation.\n  * A [feature transform language](https://github.com/airbnb/aerosolve/tree/master/core/src/main/java/com/airbnb/aerosolve/core/transforms) gives the user a lot of control over the 
features\n  * Human friendly [debuggable models](https://github.com/airbnb/aerosolve/tree/master/core/src/main/java/com/airbnb/aerosolve/core/models)\n  * Separate lightweight [Java inference code](https://github.com/airbnb/aerosolve/tree/master/core/src/main/java/com/airbnb/aerosolve/core)\n  * Scala code for [training](https://github.com/airbnb/aerosolve/tree/master/training/src/main/scala/com/airbnb/aerosolve/training)\n  * Simple [image content analysis code](https://github.com/airbnb/aerosolve/tree/master/core/src/main/java/com/airbnb/aerosolve/core/images) suitable for ordering or ranking images\n\nThis library is meant to be used with sparse, interpretable features such as those that commonly occur in search\n(search keywords, filters) or pricing (number of rooms, location, price). It is less interpretable on problems with very dense,\nnon-human-interpretable features such as raw pixels or audio samples.\n\nThere are a few reasons to focus on interpretability:\n\n  * Your corpus is new and not fully defined and you want more insight into it.\n  * Having interpretable models lets you iterate quickly. You can figure out where the model disagrees most and gain insight into what kind of new features are needed.\n  * Debugging noisy features. By plotting the feature weights you can discover buggy features or fit them to splines and discover features that are unexpectedly complex (which usually indicates overfitting).\n  * You can discover relationships between different variables and your target prediction. For example, for the Airbnb demand model, plotting graphs of reviews and 3-star reviews is more interpretable than many nested if-then-else rules.\n\n![Graph of reviews and 3-star reviews and feature weight](reviews.png)\n\nHow to get started?\n-------------------\n\nThe artifacts for aerosolve are [hosted on bintray](https://bintray.com/airbnb/aerosolve). 
If you use Maven, SBT or Gradle you can just point to bintray\nas a repository and automatically fetch the artifacts.\n\nCheck out the image impression demo where you can learn how to teach\nthe algorithm to paint in the pointillism style of painting.\n[Image Impressionism Demo.](https://github.com/airbnb/aerosolve/tree/master/demo/image_impressionism)\n\nThere is also an income prediction demo based on a popular\nmachine learning benchmark.\n[Income Prediction Demo.](https://github.com/airbnb/aerosolve/tree/master/demo/income_prediction)\n\nFeature Representation\n----------------------\n\nThis section dives into the [thrift based feature representation](https://github.com/airbnb/aerosolve/tree/master/core/src/main/thrift).\n\nFeatures are grouped into logical groups called families of features. The reason for this is so we can express transformations on an entire feature family\nat once or interact two different families of features together to create a new feature family.\n\nThere are three kinds of features per FeatureVector:\n\n  * stringFeatures - this is a map of feature family to binary feature strings. For example \"GEO\" -\u003e { \"San Francisco\", \"CA\", \"USA\" }\n  * floatFeatures - this is a map of feature family to feature name and value. For example \"LOC\" -\u003e { \"Latitude\" : 37.75, \"Longitude\" : -122.43 }\n  * denseFeatures - this is a map of feature family to a dense array of floats. Not really used except for the image content analysis code.\n\nExample Representation\n----------------------\n\nExamples are the basic unit of creating training data and scoring.\nA single example is composed of:\n\n  * context - this is a FeatureVector that occurs once in the example. It could be the features representing a search session for example. e.g. \"Keyword\" -\u003e \"Free parking\"\n  * example(0..N) - this is a repeated list of FeatureVectors that represent the items being scored. These can correspond to documents in a search session. e.g. 
\"LISTING CITY\" -\u003e \"San Francisco\"\n\nThe reasons for having this structure are:\n\n  * having one context for hundreds of items saves a lot of space during RPCs or even on disk\n  * you can compute the transforms for the context once, then apply the transformed context repeatedly in conjunction with each item\n  * having a list of items allows the use of list based loss functions such as pairwise ranking loss, domination loss etc where we evaluate multiple items at once\n\nFeature Transform language\n--------------------------\n\nThis section dives into the [feature transform language](https://github.com/airbnb/aerosolve/tree/master/core/src/main/java/com/airbnb/aerosolve/core/transforms).\n\nFeature transforms are applied with a separate [transformer module](https://github.com/airbnb/aerosolve/blob/master/core/src/main/java/com/airbnb/aerosolve/core/transforms/Transformer.java) that is decoupled from the model. This allows the user to break apart transforms or transform data ahead of time of scoring for example. e.g. in an application the items in a corpus may be transformed ahead of time and stored, while the context is not known until runtime. Then at runtime, one can transform the context and combined them with each transformed item to get the final feature vector that is then fed to the models.\n\nFeature transforms allow us to modify FeatureVectors on the fly. This allows engineers to rapidly iterate on feature engineering\nquickly and in a controlled way.\n\nHere are some examples of feature transforms that are commonly used:\n\n  * [List transform](https://github.com/airbnb/aerosolve/blob/master/core/src/main/java/com/airbnb/aerosolve/core/transforms/ListTransform.java). A meta transform that specifies other transforms to be applied\n  * [Cross transform](https://github.com/airbnb/aerosolve/blob/master/core/src/main/java/com/airbnb/aerosolve/core/transforms/CrossTransform.java). Operates only on stringFeatures. 
Allows interactions between two different string feature families. For example, \"Keyword\" cross \"LISTING CITY\" creates the new feature family \"Keyword_x_city\" -\u003e \"Free parking^San Francisco\".\n  * [Multiscale grid transform](https://github.com/airbnb/aerosolve/blob/master/core/src/main/java/com/airbnb/aerosolve/core/transforms/MultiscaleGridQuantizeTransform.java). Constructs multiple nested grids for 2D coordinates. Useful for modelling geography.\n\nPlease see the [corresponding unit tests](https://github.com/airbnb/aerosolve/tree/master/core/src/test/java/com/airbnb/aerosolve/core/transforms) for what these transforms do, what kind of features they operate on, and what kind of config they expect.\n\nModels\n------\n\nThis section covers [debuggable models](https://github.com/airbnb/aerosolve/tree/master/core/src/main/java/com/airbnb/aerosolve/core/models).\n\nAlthough there are several models in the model directory, only two are the main debuggable models. The rest are experimental or sub-models that create transforms for the interpretable models.\n\n[Linear model.](https://github.com/airbnb/aerosolve/blob/master/core/src/main/java/com/airbnb/aerosolve/core/models/LinearModel.java)\nSupports hinge, logistic, epsilon insensitive regression, and ranking loss functions.\nOnly operates on stringFeatures.\nThe label for the task is stored in a special feature family and specified by rank_key in the config.\nSee the [linear model unit tests](https://github.com/airbnb/aerosolve/blob/master/training/src/test/scala/com/airbnb/aerosolve/training/LinearClassificationTrainerTest.scala) for how to set up the models.\nNote that in conjunction with quantization and crosses you can get incredible amounts of complexity from the \"linear\" model. It is therefore not a regular linear model but something more complex, and can be thought of as a bushy, very wide decision tree with millions of branches.\n\n[Spline 
model.](https://github.com/airbnb/aerosolve/blob/master/core/src/main/java/com/airbnb/aerosolve/core/models/SplineModel.java)\nA general additive linear piecewise spline model.\nThe training is done at a higher resolution specified by num_buckets between the min and max of a feature's range.\nAt the end of each iteration we attempt to project the linear piecewise spline into a lower dimensional function such as a polynomial spline with Dirac delta endpoints.\nIf the RMSE of the projection is above a threshold, we leave the spline alone in the high resolution piecewise linear mode.\nThis allows us to debug the spline model for features that are buggy or unexpectedly complex (e.g. jumping up and down when we expect some kind of smoothness).\n\n   * Boosted stumps model - small compact model. Not very interpretable but at small sizes useful for feature selection.\n   * Decision tree model - in memory only. Mostly used to generate transforms for the linear or spline model.\n   * Maxout neural network model. Experimental and mostly used as a comparison baseline.\n\nIDE\n------\nIf you use IntelliJ, build the project first so that the thrift classes are available. To fix the Spark compile error inside IntelliJ, press `command+;`, click dependency, and change the related entries, such as org.apache.spark and org.apache.hadoop:hadoop-common, from test to compile.\nWe keep these as testCompile in the gradle config to reduce the jar file size.\n\nSupport\n-------\n\n[Hackpad](https://aerosolve.hackpad.com/Welcome-to-Aerosolve-xZEVtJC9D8a)\n\n[Dev group](https://groups.google.com/forum/#!forum/aerosolve-dev)\n\n[User group](https://groups.google.com/forum/#!forum/aerosolve-users)\n\nIn the wild\n-------\nOrganizations and projects using `aerosolve` can list themselves [here](inthewild.md).\n","funding_links":[],"categories":["Java","Machine_Learning","Scala","II. 
Databases, search engines, big data and machine learning","人工智能","📚 Project Purpose"],"sub_categories":["Tools","[Tools](#tools-1)","Speech Recognition","8. Machine Learning","Machine Learning (Intermediate-Level"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fairbnb%2Faerosolve","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fairbnb%2Faerosolve","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fairbnb%2Faerosolve/lists"}