{"id":18118067,"url":"https://github.com/mozilla/taar","last_synced_at":"2025-12-30T09:18:45.019Z","repository":{"id":54648413,"uuid":"85124900","full_name":"mozilla/taar","owner":"mozilla","description":"Telemetry-Aware Addon Recommender","archived":false,"fork":false,"pushed_at":"2023-07-25T15:14:10.000Z","size":6701,"stargazers_count":29,"open_issues_count":7,"forks_count":20,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-10-29T11:05:45.709Z","etag":null,"topics":["addons","recommendations","telemetry"],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mozilla.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-03-15T21:59:28.000Z","updated_at":"2024-10-11T11:32:16.000Z","dependencies_parsed_at":"2023-01-24T06:30:56.415Z","dependency_job_id":"c4e6ac7b-d43e-46ad-91f9-29df5a72fcaf","html_url":"https://github.com/mozilla/taar","commit_stats":{"total_commits":310,"total_committers":17,"mean_commits":"18.235294117647058","dds":0.3612903225806452,"last_synced_commit":"f542a1ec1ea50812c81a9782922447adc0a5bfab"},"previous_names":[],"tags_count":47,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mozilla%2Ftaar","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mozilla%2Ftaar/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mozilla%2Ftaar/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mozilla%2Ftaar/manifests","owner_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/owners/mozilla","download_url":"https://codeload.github.com/mozilla/taar/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245450130,"owners_count":20617294,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["addons","recommendations","telemetry"],"created_at":"2024-11-01T05:08:24.632Z","updated_at":"2025-12-14T12:15:29.892Z","avatar_url":"https://github.com/mozilla.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Taar\nTelemetry-Aware Addon Recommender\n\n[![CircleCI](https://circleci.com/gh/mozilla/taar.svg?style=svg)](https://circleci.com/gh/mozilla/taar)\n\n\nTable of Contents\n=================\n\n* [Taar](#taar)\n  * [How does it work?](#how-does-it-work)\n    * [Supported models](#supported-models)\n  * [Build and run tests](#build-and-run-tests)\n  * [Pinning dependencies](#pinning-dependencies)\n  * [Instructions for releasing updates to production](#instructions-for-releasing-updates-to-production)\n  * [Collaborative Recommender](#collaborative-recommender)\n  * [Ensemble Recommender](#ensemble-recommender)\n  * [Locale Recommender](#locale-recommender)\n  * [Similarity Recommender](#similarity-recommender)\n  * [Google Cloud Platform resources](#google-cloud-platform-resources)\n    * [Google Cloud BigQuery](#google-cloud-bigquery)\n    * [Google Cloud Storage](#google-cloud-storage)\n    * [Google Cloud BigTable](#google-cloud-bigtable)\n  * [Production Configuration Settings](#production-configuration-settings)\n  * [Deleting individual user data 
from all TAAR resources](#deleting-individual-user-data-from-all-taar-resources)\n  * [Airflow environment configuration](#airflow-environment-configuration)\n  * [Staging Environment](#staging-environment)\n  * [A note on cdist optimization\\.](#a-note-on-cdist-optimization)\n\n\n## How does it work?\nThe recommendation strategy is implemented through the\n[RecommendationManager](taar/recommenders/recommendation_manager.py).\nOnce a recommendation is requested for a specific [client\nid](https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/data/common-ping.html),\nthe recommender iterates through all the registered models (e.g.\n[CollaborativeRecommender](taar/recommenders/collaborative_recommender.py))\nlinearly in their registered order. Results are returned from the\nfirst module that can perform a recommendation.\n\nEach module specifies its own sets of rules and requirements and thus\ncan decide if it can perform a recommendation independently from the\nother modules.\n\n### Supported models\nThis is the ordered list of the currently supported models:\n\n| Order | Model | Description | Conditions | Generator job |\n|-------|-------|-------------|------------|---------------|\n| 1 | [Collaborative](taar/recommenders/collaborative_recommender.py) | recommends add-ons based on add-ons installed by other users (i.e. 
[collaborative filtering](https://en.wikipedia.org/wiki/Collaborative_filtering))|Telemetry data is available for the user and the user has at least one enabled add-on|[source](https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/ml/AddonRecommender.scala)|\n| 2 | [Similarity](taar/recommenders/similarity_recommender.py) | recommends add-ons based on add-ons installed by similar representative users|Telemetry data is available for the user and a suitable representative donor can be found|[source](https://github.com/mozilla/telemetry-airflow/blob/master/jobs/taar_similarity.py)|\n| 3 | [Locale](taar/recommenders/locale_recommender.py) |recommends add-ons based on the top addons for the user's locale|Telemetry data is available for the user and the locale has enough users|[source](https://github.com/mozilla/telemetry-airflow/blob/master/jobs/taar_locale.py)|\n| 4 | [Ensemble](taar/recommenders/ensemble_recommender.py) \u0026#42;|recommends add-ons based on the combined (by [stacked generalization](https://en.wikipedia.org/wiki/Ensemble_learning#Stacking)) recomendations of other available recommender modules.|More than one of the other Models are available to provide recommendations.|[source](https://github.com/mozilla/telemetry-airflow/blob/master/jobs/taar_ensemble.py)|\n\nAll jobs are scheduled in Mozilla's instance of\n[Airflow](https://github.com/mozilla/telemetry-airflow).  The\nCollaborative, Similarity and Locale jobs are executed on a\n[daily](https://github.com/mozilla/telemetry-airflow/blob/master/dags/taar_daily.py)\nschedule, while the ensemble job is scheduled on a\n[weekly](https://github.com/mozilla/telemetry-airflow/blob/master/dags/taar_weekly.py)\nschedule.\n\n\n## Build and run tests\nYou should be able to build taar using Python 3.5 or 3.7. 
\nTo run the testsuite, execute ::\n\n```python\n$ python setup.py develop\n$ python setup.py test\n```\n\nAlternately, if you've got GNUMake installed, a Makefile is included\nwith\n[`build`](https://github.com/mozilla/taar/blob/more_docs/Makefile#L20)\nand\n[`test-container`](https://github.com/mozilla/taar/blob/more_docs/Makefile#L55)\ntargets.\n\nYou can just run `make\nbuild; make test-container` which will build a complete Docker\ncontainer and run the test suite inside the container.\n\n## Pinning dependencies\n\nTAAR uses miniconda and a environment.yml file to manage versioning.\n\nTo update versions, edit the `environment.yml` with the new dependency\nyou need then run `make conda_update`.\n\nIf you are unfamiliar with using conda, see the [official\ndocumentation](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)\nfor reference.\n\n## Instructions for releasing updates to production\n\nBuilding a new release of TAAR is fairly involved.  Documentation to\ncreate a new release has been split out into separate\n[instructions](https://github.com/mozilla/taar/blob/master/docs/release_instructions.md).\n\n\n## Dependencies\n\n### Google Cloud Storage resources\n\nThe final TAAR models are stored in:\n\n```gs://moz-fx-data-taar-pr-prod-e0f7-prod-models```\n\nThe TAAR production model bucket is defined in Airflow under the\nvariable `taar_etl_model_storage_bucket`\n\nTemporary models that the Airflow  ETL jobs require are stored in a\ntemporary bucket defined in the Airflow variable `taar_etl_storage_bucket`\n\nRecommendation engines load models from GCS.\n\nThe following table is a complete list of all resources per\nrecommendation engine.\n\nRecommendation Engine |  GCS Resource \n--- | ---\nRecommendationManager Whitelist | gs://moz-fx-data-taar-pr-prod-e0f7-prod-models/addon_recommender/only_guids_top_200.json.bz2\nSimilarity Recommender | gs://moz-fx-data-taar-pr-prod-e0f7-prod-models/taar/similarity/donors.json.bz2 
\u003cbr\u003e gs://moz-fx-data-taar-pr-prod-e0f7-prod-models/taar/similarity/lr_curves.json.bz2\nCollaborativeRecommender |  gs://moz-fx-data-taar-pr-prod-e0f7-prod-models/addon_recommender/item_matrix.json.bz2 \u003cbr\u003e gs://moz-fx-data-taar-pr-prod-e0f7-prod-models/addon_recommender/addon_mapping.json.bz2\nLocaleRecommender | gs://moz-fx-data-taar-pr-prod-e0f7-prod-models/taar/locale/top10_dict.json.bz2\nEnsembleRecommender | gs://moz-fx-data-taar-pr-prod-e0f7-prod-models/taar/ensemble/ensemble_weight.json.bz2\nTAAR lite | gs://moz-fx-data-taar-pr-prod-e0f7-prod-models/taar/lite/guid_install_ranking.json.bz2 \u003cbr/\u003e gs://moz-fx-data-taar-pr-prod-e0f7-prod-models/taar/lite/guid_coinstallation.json.bz2\n\n\n# Production environment variables required for TAAR\n\n## Collaborative Recommender\n\nEnv Variable | Value \n------- | --- \nTAAR_ITEM_MATRIX_BUCKET | \"moz-fx-data-taar-pr-prod-e0f7-prod-models\"\nTAAR_ITEM_MATRIX_KEY  | \"addon_recommender/item_matrix.json.bz2\"\nTAAR_ADDON_MAPPING_BUCKET | \"moz-fx-data-taar-pr-prod-e0f7-prod-models\"\nTAAR_ADDON_MAPPING_KEY | \"addon_recommender/addon_mapping.json.bz2\"\n\n## Ensemble Recommender\n\nEnv Variable | Value\n--- | --- \nTAAR_ENSEMBLE_BUCKET  | \"moz-fx-data-taar-pr-prod-e0f7-prod-models\"\nTAAR_ENSEMBLE_KEY | \"taar/ensemble/ensemble_weight.json.bz2\"\n\n## Locale Recommender\n\nEnv Variable | Value\n--- | --- \nTAAR_LOCALE_BUCKET | \"moz-fx-data-taar-pr-prod-e0f7-prod-models\"\nTAAR_LOCALE_KEY | \"taar/locale/top10_dict.json.bz2\"\n\n## Similarity Recommender\n\nEnv Variable | Value\n--- | --- \nTAAR_SIMILARITY_BUCKET | \"moz-fx-data-taar-pr-prod-e0f7-prod-models\"\nTAAR_SIMILARITY_DONOR_KEY | \"taar/similarity/donors.json.bz2\"\nTAAR_SIMILARITY_LRCURVES_KEY | \"taar/similarity/lr_curves.json.bz2\"\n\n\n## TAAR Lite\n\nEnv Variable | Value\n--- | --- \nTAARLITE_GUID_COINSTALL_BUCKET | \"moz-fx-data-taar-pr-prod-e0f7-prod-models\"\nTAARLITE_GUID_COINSTALL_KEY | 
\"taar/lite/guid_coinstallation.json.bz2\"\nTAARLITE_GUID_RANKING_KEY | \"taar/lite/guid_install_ranking.json.bz2\"\n\n\n## Google Cloud Platform resources\n### Google Cloud BigQuery\n\nCloud BigQuery uses the GCP project defined in Airflow in the\nvariable `taar_gcp_project_id`.\n\nDataset  \n* `taar_tmp`\n\nTable ID \n* `taar_tmp_profile`\n\nNote that this table only exists for the duration of the taar_weekly\njob, so there should be no need to manually manage this table.\n\n### Google Cloud Storage \n\nThe taar user profile extraction puts Avro format files into \na GCS bucket defined by the following two variables in Airflow:\n\n* `taar_gcp_project_id`\n* `taar_etl_storage_bucket`\n\nThe bucket is automatically cleared at the *start* and *end* of\nthe TAAR weekly ETL job.\n\n### Google Cloud BigTable \n\nThe final TAAR user profile data is stored in a Cloud BigTable\ninstance defined by the following two variables in Airflow:\n\n* `taar_gcp_project_id`\n* `taar_bigtable_instance_id`\n\nThe table ID for user profile information is `taar_profile`.\n\n\n------\n\n## Production Configuration Settings\n\nProduction environment settings are stored in a [private repository](https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/data/puppet/yaml/type/data.api.prod.taar.yaml).\n\n\n## Deleting individual user data from all TAAR resources\n\nDeletion of records in TAAR is fairly straight forward.  Once a user\ndisables telemetry from Firefox, all that is required is to delete\nrecords from TAAR.\n\nDeletion of records from the TAAR BigTable instance will remove the\nclient's list of addons from TAAR.  No further work is required.\n\nRemoval of the records from BigTable will cause JSON model updates to\nno longer take the deleted record into account.  
JSON models are\nupdated on a daily basis via the\n[`taar_daily`](https://github.com/mozilla/telemetry-airflow/blob/master/dags/taar_daily.py)\n\nUpdates in the weekly Airflow job in \n[`taar_weekly`](https://github.com/mozilla/telemetry-airflow/blob/master/dags/taar_weekly.py) only update the ensemble weights and the user profile information.\n\nIf the user profile information in `clients_last_seen` continues to\nhave data for the user's telemetry-id, TAAR will repopulate the user\nprofile data.  \n\nUsers who wish to remove their data from TAAR need to: \n1. Disable telemetry in Firefox\n2. Have user telemetry data removed from all telemetry storage systems\n   in GCP. Primarily this means the `clients_last_seen` table in\n   BigQuery.\n3. Have user data removed from BigTable.\n\n\n\n## Airflow environment configuration\n\nTAAR requires some configuration to be stored in Airflow variables for\nthe ETL jobs to run to completion correctly.\n\nAirflow Variable | Value \n--- | ---\ntaar_gcp_project_id | The Google Cloud Platform project where BigQuery temporary tables, Cloud Storage buckets for Avro files and BigTable reside for TAAR.\ntaar_etl_storage_bucket | The Cloud Storage bucket name where temporary Avro files will reside when transferring data from BigQuery to BigTable. 
\ntaar_etl_model_storage_bucket | The main GCS bucket where the models are stored\ntaar_bigtable_instance_id | The BigTable instance ID for TAAR user profile information\ntaar_dataflow_subnetwork | The subnetwork required to communicate between Cloud Dataflow\n\n\n## Staging Environment\n\nThe staging environment of the TAAR service in GCP can be reached using\ncurl.\n\n```\ncurl https://user:pass@stage.taar.nonprod.dataops.mozgcp.net/v1/api/recommendations/\u003chashed_telemetry_id\u003e\n```\n\nRequests for a TAAR-lite recommendation can be made using curl as\nwell:\n\n```\ncurl https://stage.taar.nonprod.dataops.mozgcp.net/taarlite/api/v1/addon_recommendations/\u003caddon_guid\u003e/\n```\n\n\n## TAARlite cache tools\n\nThere is a taarlite-redis tool to manage the TAARlite Redis cache.\n\nThe cache needs to be populated using the `--load` command or TAARlite\nwill return no results.\n\nIt is safe to reload new data while TAARlite is running - no\nperformance degradation is expected.\n\nThe cache contains a 'hot' buffer for reads and a 'cold' buffer to\nwrite updated data to.\n\nSubsequent invocations of `--load` will update the cache in the cold\nbuffer.  
After data is successfully loaded, the hot and cold buffers\nare swapped.\n\nRunning the taarlite-redis tool inside the container:\n\n```\n$ docker run -it taar:latest bin/run python /opt/conda/bin/taarlite-redis.py --help\n\nUsage: taarlite-redis.py [OPTIONS]\n\n  Manage the TAARLite redis cache.\n\n  This expecte that the following environment variables are set:\n\n  REDIS_HOST REDIS_PORT\n\nOptions:\n  --reset  Reset the redis cache to an empty state\n  --load   Load data into redis\n  --info   Display information about the cache state\n  --help   Show this message and exit.\n```\n\n\n## Testing\n\n\nTAARLite will respond with suggestions given an addon GUID.\n\nA sample URL path may look like this:\n\n`/taarlite/api/v1/addon_recommendations/uBlock0%40raymondhill.net/`\n\nTAAR will treat any client ID with only repeating digits (i.e. 0000) as\na test client ID and will return a dummy response.\n\nA URL with the path `/v1/api/recommendations/0000000000/` will\nreturn a valid JSON result.\n\n\n## A note on cdist optimization. \ncdist can speed up distance computation by a factor of 10 for the computations we're doing.\nWe can use it without problems on the canberra distance calculation.\n\nUnfortunately, there are multiple problems with it accepting a string array. There are different\nproblems in 0.18.1 (which is what is available on EMR), and on later versions. In both cases \ncdist attempts to convert a string to a double, which fails. 
For versions of scipy later than\n0.18.1 this could be worked around with:\n\n    distance.cdist(v1, v2, lambda x, y: distance.hamming(x, y))\n\nHowever, when you manually provide a callable to cdist, cdist cannot apply its built-in\noptimizations (https://github.com/scipy/scipy/blob/v1.0.0/scipy/spatial/distance.py#L2408),\nso we can just apply the function `distance.hamming` to our array manually and get the same\nperformance.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmozilla%2Ftaar","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmozilla%2Ftaar","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmozilla%2Ftaar/lists"}