{"id":15048111,"url":"https://github.com/github/dat-analysis","last_synced_at":"2025-10-04T07:31:06.690Z","repository":{"id":8185478,"uuid":"9612362","full_name":"github/dat-analysis","owner":"github","description":"Analyze results from dat-science.","archived":true,"fork":false,"pushed_at":"2014-05-03T02:39:02.000Z","size":193,"stargazers_count":130,"open_issues_count":0,"forks_count":13,"subscribers_count":316,"default_branch":"master","last_synced_at":"2025-09-01T17:56:54.134Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/github.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-04-23T01:08:39.000Z","updated_at":"2024-12-24T23:12:41.000Z","dependencies_parsed_at":"2022-08-06T21:00:34.892Z","dependency_job_id":null,"html_url":"https://github.com/github/dat-analysis","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/github/dat-analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/github%2Fdat-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/github%2Fdat-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/github%2Fdat-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/github%2Fdat-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/github","download_url":"https://codeload.github.com/github/dat-analysis/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/github%2Fdat-analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278041354,"owners_count":25920222,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-02T02:00:08.890Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-09-24T21:08:15.337Z","updated_at":"2025-10-04T07:31:06.408Z","avatar_url":"https://github.com/github.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Dat-analysis\n\nA Ruby library for analyzing the results of [dat-science][dsc] experiments.  For\nthe motivation behind this library, and documentation on setting up experiments,\ngo check out [dat-science][dsc]'s documentation.\n\n[dsc]: https://github.com/github/dat-science/\n\n## What do I do with all these experiment results?\n\nOnce you've started a `dat-science` experiment and published some results,\nyou'll want to analyze the mismatches from your experiment.  In `dat-analysis`\nyou'll find an analysis toolkit to help understand experiment results.\n\nWe designed the analysis tools to be run from your ruby console (`irb` or\n`script/console` if you're doing science on a Rails app).  
You create an analyzer\nand then interactively fetch experiment results and study them to determine the\nreason the control method's results differ from the candidate method's results.\n\n### Your very own analyzer\n\nThe `Dat::Analysis` base class provides a number of tools for analysis.  Since\nthe process of retrieving your experiment results depends on how you used\n`publish` in your experiment, you'll need to create a subclass of `Dat::Analysis`\nwhich implements methods to handle reading and processing results.\n\nYou will need to define `read` and `count` to return the next published experiment\nresult, and the count of remaining published experiment results, respectively.\nYou can optionally define `cook` to do any decoding, un-marshalling, or whatever\nother pre-processing you desire on the raw experiment result returned by `read`.\n\n\n``` ruby\nrequire 'dat/analysis'\n\nmodule MyApp\n  # Public: Perform dat analysis on a dat-science experiment.\n  #\n  # This is a subclass of Dat::Analysis which provides the concrete implementation\n  # of the `#read`, `#count`, and `#cook` methods to interact with our Redis data\n  # store, and decodes our science mismatch results from JSON.\n  class Analysis \u003c Dat::Analysis\n    # Public: Read the next available science mismatch result.\n    #\n    # Returns the next raw science mismatch result from Redis.\n    def read\n      Redis.rpop \"dat-science.#{experiment_name}.results\"\n    end\n\n    # Public: Get the number of pending science mismatch results.\n    #\n    # Returns the number of pending science mismatch results from redis.\n    def count\n      Redis.llen \"dat-science.#{experiment_name}.results\"\n    end\n\n    # Public: \"Cook\" a raw science mismatch result.\n    #\n    # raw_result - a raw science mismatch result\n    #\n    # Returns nil if raw_result is nil.\n    # Returns the JSON-parsed raw_result.\n    def cook(raw_result)\n      return nil unless raw_result\n      JSON.parse(raw_result)\n   
 end\n  end\nend\n\n```\n\n#### Instantiating the analyzer\n\nThis analyzer can be used with many experiments, so you'll need to instantiate an\nanalyzer instance for your current experiment:\n\n``` ruby\nirb\u003e a = MyApp::Analysis.new('widget-permissions')\n=\u003e #\u003cMyApp::Analysis:0x007fae4a0101f8 ...\u003e\n```\n\n### Working with individual results\n\nFirst, let's look at how you can work with single experiment mismatch results.\nThe `#result` method (also available as `#current`) will show you the most\nrecently fetched experiment result.  Before you've fetched any results, this\nwill be empty:\n\n``` ruby\nirb\u003e a.result\n=\u003e nil\nirb\u003e a.current\n=\u003e nil\n```\n\nWe can use the `#more?` predicate method to see if there are experiment results\npending, and `#count` to see just how many results are available:\n\n``` ruby\nirb\u003e a.more?\n=\u003e true\nirb\u003e a.count\n=\u003e 103\n```\n\nLet's fetch a result:\n\n``` ruby\nirb\u003e a.fetch\n=\u003e {\"experiment\"=\u003e\"widget-permissions\", \"user\"=\u003e{ ... } .... }\nirb\u003e a.result\n=\u003e {\"experiment\"=\u003e\"widget-permissions\", \"user\"=\u003e{ ... } .... 
}\nirb\u003e a.result.keys\n=\u003e [\"experiment\", \"user\", \"timestamp\", \"candidate\", \"control\", \"first\"]\nirb\u003e a.result.experiment_name\n=\u003e \"widget-permissions\"\nirb\u003e a.result['first']\n=\u003e \"candidate\"\nirb\u003e a.result.first\n=\u003e \"candidate\"\nirb\u003e a.result['control']\n=\u003e {\"duration\"=\u003e12.307, \"exception\"=\u003enil, \"value\"=\u003efalse}\nirb\u003e a.result.control\n=\u003e {\"duration\"=\u003e12.307, \"exception\"=\u003enil, \"value\"=\u003efalse}\nirb\u003e a.result['candidate']\n=\u003e {\"duration\"=\u003e12.366999999999999, \"exception\"=\u003enil, \"value\"=\u003etrue}\nirb\u003e a.result.candidate\n=\u003e {\"duration\"=\u003e12.366999999999999, \"exception\"=\u003enil, \"value\"=\u003etrue}\nirb\u003e a.result['first']\n=\u003e \"candidate\"\nirb\u003e a.result['timestamp']\n=\u003e \"2013-04-22T13:31:32-05:00\"\nirb\u003e a.result.timestamp\n=\u003e 2013-04-22 13:31:32 -0500\nirb\u003e a.result.timestamp.class\n=\u003e Time\nirb\u003e a.result.timestamp.to_i\n=\u003e 1366655492\nirb\u003e a.result['user']\n=\u003e {\"login\"=\u003e\"somed00d\", ... }\n```\n\nResults will contain entries for the duration (in milliseconds), exceptions,\nand values returned by both the candidate and control methods for the experiment;\nthe time when the result was recorded; whether the candidate or the control method\nwas run first; and an entry for every object saved via a `context` call during\nthe experiment.\n\nNote that the `#result` method will continue to return the previously fetched\nresult until we overwrite it with another `#fetch`, `#skip`, or `#analyze`\n(see below).\n\n#### Skipping results\n\nSometimes we make changes to the code we're running experiments against, and\nsometimes those changes cause experiment results to be out of date -- if we've\nfixed a bug we found via science, there's not much point in looking at results\ngenerated while our code still had that bug.  
To jump past a batch of results,\nuse `#skip`, giving it a block to test for the condition we want to skip\npast:\n\n``` ruby\nirb\u003e a.skip {|r| 5.minutes.ago \u003c a.result.timestamp }\n=\u003e 43\nirb\u003e a.skip {|r| true }\n=\u003e nil\n```\n\n### Batch analysis of results\n\nAfter sifting through a handful of results from an experiment, it usually\nbecomes obvious that a single behavior in our studied code is responsible\nfor many of the results published in an experiment.  If a behavior difference can be\neasily fixed by improving the candidate code, and your production release cycle\nis short, then you can just update the candidate method and continue running your\nexperiment.\n\nIt's often the case that the relevant code can't be changed that quickly.\nPerhaps the assumptions made when writing the candidate code were wrong in a way\nthat requires deeper consideration and discussion with your team.  It could be\nthat the experiment results actually turn up bugs in the implementation of the\ncontrol method -- in which case there will likely be even more discussion\nneeded, and possibly a fairly long cycle to get production behaving properly.\n\nThat doesn't mean that analysis can't continue, but it could well be that a\nmajority of the experiment results left to analyze are examples of already\nknown behaviors.  In this case, it's useful to be able to identify these results\nand skip over them, to find results which can't be accounted for by any\ncurrently known explanation.\n\nThe `#analyze` method, in conjunction with \"matcher classes\", makes this possible.\n\n### `#analyze`\n\nYou can run `#analyze` to automate the fetching of pending results.  If a result\nis identifiable by a matcher class, then a summary of the identified result will\nbe printed and that result will be skipped.  This process continues until either an\nunidentifiable result is found, or there are no more results available. 
When an\nunidentifiable result is found, a summary of the identified results is output,\nand then the first unidentified result is displayed in detail.\n\n```\nirb\u003e a.analyze\nUser [somed00d] is staff (see http://github.com/our/project/issues/123)\nPermission [totesadmin] is obsolete (see http://github.com/dat/thing/issues/5234)\nUser [somed00d] is staff (see http://github.com/our/project/issues/123)\nPermission [totesadmin] is obsolete (see http://github.com/dat/thing/issues/5234)\nUser [0th3rd00d] is staff (see http://github.com/our/project/issues/123)\nPermission [totesadmin] is obsolete (see http://github.com/dat/thing/issues/5234)\nUser [0th3rd00d] is staff (see http://github.com/our/project/issues/123)\nPermission [totesadmin] is obsolete (see http://github.com/dat/thing/issues/5234)\nPermission [totesadmin] is obsolete (see http://github.com/dat/thing/issues/5234)\nPermission [totesadmin] is obsolete (see http://github.com/dat/thing/issues/5234)\nUser [0th3rd00d] is staff (see http://github.com/our/project/issues/123)\nPermission [totesadmin] is obsolete (see http://github.com/dat/thing/issues/5234)\nUser [somed00d] is staff (see http://github.com/our/project/issues/123)\nUser [somed00d] is staff (see http://github.com/our/project/issues/123)\nPermission [totesadmin] is obsolete (see http://github.com/dat/thing/issues/5234)\nUser [0th3rd00d] is staff (see http://github.com/our/project/issues/123)\nUser [0th3rd00d] is staff (see http://github.com/our/project/issues/123)\nUser [0th3rd00d] is staff (see http://github.com/our/project/issues/123)\nPermission [totesadmin] is obsolete (see http://github.com/dat/thing/issues/5234)\nUser [somed00d] is staff (see http://github.com/our/project/issues/123)\nUser [somed00d] is staff (see http://github.com/our/project/issues/123)\nUser [0th3rd00d] is staff (see http://github.com/our/project/issues/123)\nUser [0th3rd00d] is staff (see http://github.com/our/project/issues/123)\nPermission [totesadmin] is obsolete (see 
http://github.com/dat/thing/issues/5234)\n\nSummary of identified results:\n\n         StaffFunninessMatcher:     14\n          ZOMGIssue5423Matcher:     10\n                         TOTAL:     24\n\nFirst unidentifiable result:\n\nExperiment [widget-permissions]   first:  candidate @ 2013-04-19T18:55:23-05:00\nDuration:  control (  0.01) | candidate (  1.36)\n\nControl value:   [false]\nCandidate value: [true]\n\n            user =\u003e {\n                                    id =\u003e 1234876\n                                 login =\u003e \"somed00d\"\n [...]\n                    }\n=\u003e 32\n```\n\nNote that the number of pending results is returned as the result of the\nanalysis.\n\n\n### Matcher classes\n\nThe purpose of a matcher class is to identify a behavior which results in\nmismatches in your experiment. For example, if permissions for staff users are\nnot implemented properly by your candidate code, you might create a matcher that\nrecognizes when the user involved is a staff user.\n\nYou create a matcher class by subclassing `Dat::Analysis::Matcher` and writing a\n`#match?` method that returns true if the experiment result (available as\n`result`) is an example of the behavior we know about:\n\n``` ruby\nclass StaffFunninessMatcher \u003c Dat::Analysis::Matcher\n  # our staff role permissions are just soooo busted\n  def match?\n    User.find_by_login(result['user']['login']).staff?\n  end\n\n  def readable\n    \"User [#{result['user']['login']}] is staff (see http://github.com/our/project/issues/123)\"\n  end\nend\n```\n\nIf you create a matcher class in the console, use `#add_matcher` to let your\nanalyzer know about it:\n\n``` ruby\nirb\u003e a.add_matcher StaffFunninessMatcher\nLoading matcher class [StaffFunninessMatcher]\n=\u003e [StaffFunninessMatcher]\n```\n\nNow, when you run `#analyze`, all the results with staff users recorded in the\n`user` context will be tallied and skipped.\n\nSee \"Maintaining a library of matchers and wrappers\" 
below for a more durable\nway to let your analyzers keep track of your helper classes.\n\n#### Getting a summary of an identified result\n\nThe `#summary` method on the analyzer will return a readable version of the\ncurrent result.  This is by default a fairly voluminous output (it's what you saw\nat the end of an `#analyze` run above), but if the matcher that identified the\nresult defines a `#readable` method, `#summary` returns its output instead:\n\n``` ruby\nirb\u003e a.summary\n=\u003e \"User [somed00d] is staff (see http://github.com/our/project/issues/123)\"\n```\n\nThe `#analyze` method uses these `#readable` methods to produce a more succinct\nsummary of identified results, like we showed above.\n\n**Define a `#readable` method for cleaner `#analyze` output!**\n\n### Adding methods to results (wrappers)\n\nFor many experiments there is information in the results which is used often\nenough that you'll get tired of doing repetitive lookups in the results hash.\nWhen this happens, you can create result wrapper classes for your experiment\nwhich can add methods to every result returned. 
Simply subclass\n`Dat::Analysis::Result` and define the instance methods you want:\n\n``` ruby\nclass PermissionsWrapper \u003c Dat::Analysis::Result\n  def user\n    User.find_by_login!(result['user']['login'])\n  rescue\n    \"Could not find user, login=[#{result['user']['login']}]\"\n  end\n\n  def permission\n    Permission.find_by_handle!(result['permission']['handle'])\n  rescue\n    \"Could not find permission, handle=[#{result['permission']['handle']}]\"\n  end\n  alias_method :perm, :permission\nend\n```\n\nThen, add the wrapper to your analyzer:\n\n``` ruby\nirb\u003e a.add_wrapper(PermissionsWrapper)\n=\u003e [PermissionsWrapper]\nirb\u003e a.result.user\n=\u003e #\u003cUser id: 1234876, login: \"somed00d\", ...\u003e\n```\n\nThese wrappers can also be used in your matcher classes:\n\n``` ruby\nclass StaffFunninessMatcher \u003c Dat::Analysis::Matcher\n  # our staff role permissions are just soooo busted\n  def match?\n    result.user.staff?\n  end\n\n  def readable\n    \"User [#{result.user.login}] is staff (see http://github.com/our/project/issues/123)\"\n  end\nend\n```\n\n#### Skipping class naming\n\nInventing new non-conflicting class names for matcher and wrapper classes is a\nbit of a pain.  Often we just declare an anonymous class and skip the naming\naltogether.  If you do this, you'll probably want to define a readable `.name`\nmethod for your class, so that `#analyze` summaries are readable:\n\n``` ruby\nClass.new(Dat::Analysis::Matcher) do\n  def self.name\n    \"Staff Permission Silliness\"\n  end\n\n  def match?\n    result.user.staff?\n  end\n\n  def readable\n    \"User [#{result.user.login}] is staff (see http://github.com/our/project/issues/123)\"\n  end\nend\n```\n\n### Maintaining a library of matchers and result wrappers\n\nBeing able to add matchers and result wrappers to an analyzer during a console\nsession is a fast way to iteratively identify problems and work through a batch of\nresults.  
Keeping those matchers around for the next session is usually in order.\nYour `Dat::Analysis` subclass can define a `#path` instance method, which points\nto the place on the filesystem where your matcher and wrapper classes live.  The\nanalyzer will look here, in a sub-directory named for your experiment, and load\nany ruby files it finds there:\n\n``` ruby\nrequire 'dat/analysis'\n\nmodule MyApp\n  # Public: Perform dat analysis on a dat-science experiment.\n  #\n  # This is a subclass of Dat::Analysis which provides the concrete implementation\n  # of the `#read`, `#count`, and `#cook` methods to interact with our Redis data\n  # store, and decodes our science mismatch results from JSON.\n  class Analysis \u003c Dat::Analysis\n    def path\n      '/path/to/dat-science/experiments/'\n    end\n  end\nend\n```\n\nIn this example, the analyzer for the `widget-permissions` experiment will look\nin `/path/to/dat-science/experiments/widget-permissions/` for matcher and\nwrapper classes.\n\n## Hacking on dat-analysis\n\nBe on a Unixy box. Make sure a modern Bundler is available. `script/test` runs\nthe unit tests. All development dependencies will be installed automatically if\nthey're not available. Dat science happens primarily on Ruby 1.9.3 and 1.8.7,\nbut science should be universal.\n\n## Maintainers\n\n[@jbarnette](https://github.com/jbarnette) and [@rick](https://github.com/rick)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgithub%2Fdat-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgithub%2Fdat-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgithub%2Fdat-analysis/lists"}