{"id":16847432,"url":"https://github.com/indygreg/mozilla-build-analyzer","last_synced_at":"2025-08-31T22:06:23.298Z","repository":{"id":7757378,"uuid":"9125254","full_name":"indygreg/mozilla-build-analyzer","owner":"indygreg","description":"Aggregation, storage, and analysis of Mozilla build metadata","archived":false,"fork":false,"pushed_at":"2013-08-21T22:07:27.000Z","size":256,"stargazers_count":2,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-01-24T01:57:13.218Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/indygreg.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-03-31T03:09:11.000Z","updated_at":"2015-03-02T15:26:25.000Z","dependencies_parsed_at":"2022-09-13T13:21:55.642Z","dependency_job_id":null,"html_url":"https://github.com/indygreg/mozilla-build-analyzer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/indygreg%2Fmozilla-build-analyzer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/indygreg%2Fmozilla-build-analyzer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/indygreg%2Fmozilla-build-analyzer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/indygreg%2Fmozilla-build-analyzer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/indygreg","download_url":"https://codeload.github.com/indygreg/mozilla-build-analyzer/tar.gz/ref
s/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244173553,"owners_count":20410300,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-13T13:07:55.596Z","updated_at":"2025-03-18T07:17:08.556Z","avatar_url":"https://github.com/indygreg.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"===========================\nMozilla Build Data Analyzer\n===========================\n\nThis project provides a mechanism for retrieving, storing, and analyzing\ndata from automated builds conducted to build Firefox and other related\nMozilla projects.\n\nThe bulk of this project is a data store that is effectively a shadow-copy\nof build data that is canonically stored on Mozilla's servers. We use\nCassandra as the storage backend. Only some analysis is provided. However,\nadditional analysis can be facilitated by combing through the mountain of\ndata collected that you now have easy access to.\n\nYou can think of this project as a combination of the TBPL database,\nMozilla's FTP server (which hosts the output of all jobs), and Datazilla\n(a database used to store Talos and other build job results).\n\nInitial Setup\n=============\n\nOnce you've cloned the repository, you'll need to set up your run-time\nenvironment. It starts by setting up a virtualenv::\n\n    $ virtualenv my_virtualenv\n    $ source my_virtualenv/bin/activate\n    $ python setup.py develop\n\nNext, you'll need to get a Cassandra instance running. If you already have\none running, great. 
You can connect to that. If not, run the following to\nbootstrap a Cassandra instance::\n\n    $ python bin/bootstrap_cassandra.py /path/to/install/root\n\n    # e.g.\n    $ python bin/bootstrap_cassandra.py cassandra.local\n\nThat program prints important information about the environment state.\nTake note of it!\n\nYou can copy and paste the output of the bootstrapper to launch\nCassandra. It will look something like::\n\n    $ CASSANDRA_CONF=/home/gps/src/mozilla-build-analyzer/cassandra.local/conf cassandra.local/apache-cassandra-1.2.3/bin/cassandra -f\n\nIf you don't want to run Cassandra in the foreground, omit the *-f*\nflag.\n\n**At this time, it is highly recommended to use the local Cassandra instance\ninstead of connecting to a production cluster.**\n\nWorkflow\n========\n\nNow that you have your environment configured and Cassandra running, you'll\nwant to populate some data.\n\nThe main interface to all actions is **mbd**. It's just a gateway script\nthat invokes subcommands (like *mach* if you are a Firefox developer).\n\nPopulating Data\n===============\n\nYou need to explicitly tell your deployment which Mozilla build data to\nimport. The sections below detail the different types of build data\nthat can be loaded.\n\nBuild Metadata\n--------------\n\nBuild metadata is the most important data type. It defines what is\nbuilt, and how and when it is built. Build metadata consists\nof the following data types:\n\nbuilders\n    Describes a specific build configuration. For example, *xpcshell tests on\n    mozilla-central on Windows 7 for opt builds* is a builder.\n\nbuilds\n    Each build is a specific invocation of a builder. Builds are arguably\n    the most important data type. Most of our data is stored against a\n    specific build instance.\n\nslaves\n    These are the machines that perform build jobs. There are over 1000\n    of them in Mozilla's network.\n\nmasters\n    These coordinate what the slaves do. 
You don't need to be too concerned\n    with these.\n\nBuild metadata is canonically defined by a set of JSON files hosted\non a public HTTP server. The first step to loading build metadata is to\nsynchronize these files with a local copy::\n\n    $ mbd build-files-synchronize\n\nThis will take a while to run initially because there are cumulatively many\ngigabytes of data. If you don't feel like waiting around for all of them to\ndownload, just press Ctrl-C after you've fetched enough!\n\nNow that we have a copy of the raw build data, we need to extract the\nuseful parts and load them into our local store.\n\nHere is how we load the last week of data::\n\n    $ mbd build-metadata-load --day-count 7\n\nAt this point, you can conduct analysis of build metadata!\n\nRemember, as time goes by, you'll need to continually refresh the build\nfiles and reload build metadata!\n\nBuild Logs\n----------\n\nBuild metadata can be supplemented with data parsed from the logs of\nindividual jobs. Loading log data follows the same mechanism as build\nmetadata.\n\nBecause logs are large (tens of gigabytes in aggregate) and you are likely\nonly interested in a subset of them, it's usually a good idea to import\nonly what you need. Here are some ideas for intelligently importing logs::\n\n    # Only import logs for mozilla-central.\n    $ mbd build-logs-synchronize --category mozilla-central\n\n    # Only import xpcshell logs for PGO builds.\n    $ mbd build-logs-synchronize --builder-pattern '*pgo*xpcshell*'\n\n    # Only import PGO reftest logs for mozilla-central on Windows 7.\n    $ mbd build-logs-synchronize --builder-pattern mozilla-central_win7_test_pgo-reftest\n\n    # Only import logs for mozilla-central after 2013-03-28.\n    $ mbd build-logs-synchronize --after 2013-03-28 --category mozilla-central\n\nImporting logs takes a long time. 
It also consumes a *lot* of bandwidth.\nThe good news is that you only need to fetch each build's log once,\nbecause a log doesn't change after its build completes.\n\nAnalyzing Data\n==============\n\nRun mbd with --help for a list of all the commands. Here are some examples::\n\n    # Print the names of all slaves.\n    $ mbd slave-names\n\n    # Print builds performed on a specific slave.\n    $ mbd slave-builds bld-linux64-ec2-413\n\n    # Print a table of the total time each slave spent running builds.\n    $ mbd slave-efficiencies\n\n    # Print all the builders associated with a builder category.\n    $ mbd builders-in-category --print-name mozilla-central\n\n    # Print names of all known builders.\n    $ mbd builder-names\n\n    # Print the IDs of builds performed by a builder.\n    $ mbd builds-for-builder mozilla-central_ubuntu32_vm_test-xpcshell\n\n    # Print the raw log output for a build.\n    $ mbd log-cat 21177014\n\n    # View build times for all mozilla-central builders.\n    $ mbd build-times --category mozilla-central\n\nYou can even perform some advanced pipeline tricks, such as printing all the\nlogs for a single builder::\n\n    $ mbd builds-for-builder mozilla-central_ubuntu32_vm_test-xpcshell | xargs mbd log-cat\n\nDisclaimer\n==========\n\nThis project is currently in a very early alpha state. Schemas will likely change.\nThere is no guarantee that time spent importing data won't be wasted. But\nif you have a fast internet connection and don't mind the inconvenience, go\nright ahead.\n\nPlanned Features\n================\n\nThis project is still in its infancy. There are many planned features.\n\nOne of the biggest areas for future features is more log parsing. One of the\noriginal goals was to facilitate extraction of per-test metadata from sources\nsuch as xpcshell test logs.\n\nWe may also consider collecting additional files from public servers. 
For example,\nthere's no reason we couldn't store the binary archives and perform symbol\nanalysis on them, among other things.\n\nFrequently Asked Questions\n==========================\n\nWhy?\n----\n\nThe original author (Gregory Szorc) frequently wanted to perform analysis\nover large sets of build data. Fetching logs individually was often slow\nand had high latency. He didn't want to deal with this, so he instead\ncreated a system for interacting with an offline shadow copy. The results\nare what you see.\n\nWhy Cassandra?\n--------------\n\nWhile SQL would have been a fine choice, the author didn't want to deal\nwith writing SQL. He also had previous experience with Cassandra from\nbefore it hit 1.0. He was interested in seeing what had changed since\nthen, and he also wanted something familiar that he could implement\nagainst easily. Even without that prior experience, he would still have\nconsidered Cassandra because of its operational characteristics.\n\nIs this an official Mozilla project?\n------------------------------------\n\nNot at this time. However, it's very similar to Datazilla and TBPL, so\nit may evolve into one. There's no Bugzilla component.\nDo everything on GitHub.\n\nBy copying everything you are creating high load on Mozilla's FTP servers\n-------------------------------------------------------------------------\n\nYup. But if you perform repeated analysis on this data, the net outcome\nis good for the central servers, because you don't touch them again after\nthe initial data fetch.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Findygreg%2Fmozilla-build-analyzer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Findygreg%2Fmozilla-build-analyzer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Findygreg%2Fmozilla-build-analyzer/lists"}