{"id":21629575,"url":"https://github.com/abrie/data-challenge-2014","last_synced_at":"2026-05-20T19:08:01.529Z","repository":{"id":20024012,"uuid":"23291899","full_name":"abrie/data-challenge-2014","owner":"abrie","description":"Bundled-Edge Views of the Github Event Graph ","archived":false,"fork":false,"pushed_at":"2014-09-17T15:40:45.000Z","size":4820,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-18T21:23:40.500Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"abrie.github.io/data-challenge-2014","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/abrie.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-08-24T20:27:54.000Z","updated_at":"2017-12-12T11:23:39.000Z","dependencies_parsed_at":"2022-08-27T03:41:32.175Z","dependency_job_id":null,"html_url":"https://github.com/abrie/data-challenge-2014","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/abrie/data-challenge-2014","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abrie%2Fdata-challenge-2014","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abrie%2Fdata-challenge-2014/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abrie%2Fdata-challenge-2014/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abrie%2Fdata-challenge-2014/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/abrie","download_url":"https://codeload.github.com/abrie/data-challenge-2014/t
ar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abrie%2Fdata-challenge-2014/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33271559,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-20T15:12:43.734Z","status":"ssl_error","status_checked_at":"2026-05-20T15:12:42.300Z","response_time":356,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-25T02:08:05.707Z","updated_at":"2026-05-20T19:08:01.511Z","avatar_url":"https://github.com/abrie.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Motivation\n\n[Github Data Challenge 2014](https://github.com/blog/1864-third-annual-github-data-challenge)\n\n## Results\n\n[Bundled-Edged Views of the Github Event Graph](http://abrie.github.io/data-challenge-2014)\n\n## Methodology and Source Code\n\nAt the heart of this plot are the SQL (or rather, BQL) queries sent to Google Bigquery. The queries consist of two types: a [model query](https://github.com/abrie/data-challenge-2014/blob/master/sql/repo-model.sql) and [state query](https://github.com/abrie/data-challenge-2014/blob/master/sql/repo-state.sql). 
The model query collects the data necessary for building a [Markov matrix](http://en.wikipedia.org/wiki/Stochastic_matrix) by [counting transitions between sequential events](https://en.wikipedia.org/wiki/Adjacency_matrix). The state query computes a census of the most recent events (i.e. events not followed by another event). These two sets of data are then [\"munged\"](http://en.wikipedia.org/wiki/Data_wrangling) by [`munger.py`](https://github.com/abrie/data-challenge-2014/blob/master/munger.py). A [cluster detection algorithm](http://micans.org/mcl/) is used to group events. The results are gathered into a single JSON structure (example: [results.json](https://github.com/abrie/data-challenge-2014/blob/gh-pages/data/repo/results.json)). The [front end](https://github.com/abrie/data-challenge-2014/blob/master/pages/main.js) retrieves the results via [AJAX](http://en.wikipedia.org/wiki/Ajax_(programming)) and generates the illustrations using [D3.js](http://d3js.org).\n\n## Dependencies\n\nThis application uses [MCL](http://micans.org/mcl/). The source is contained in the `external/` directory. Install as follows:\n\n- `tar xfz mcl-latest.tar.gz`\n- `cd mcl-14-137`\n- ``configure --prefix=`pwd```\n- `make`\n- `make install`\n\nMCL will then be installed to `external/mcl-14-137`. If you install to a different path, change the [`MCL_BIN`](https://github.com/abrie/data-challenge-2014/blob/master/mclinterface.py#L7) string found in `mclinterface.py`.\n\nNumPy/SciPy are also required. If you're using OS X Mavericks, use this: [ScipySuperpack](https://github.com/fonnesbeck/ScipySuperpack).\n\n## Authorization of APIs\n\nThis application uses Google BigQuery. You'll need to supply authenticated credentials:\n\n- [ ] Log in to the [Google Developer Console](https://console.developers.google.com/)\n- [ ] Navigate to the [project list](https://console.developers.google.com/project)\n- [ ] Create a new project. 
(Or use one you may have previously created :)\n- [ ] Enable the BigQuery API: Select Project -\u003e APIs and Auth -\u003e APIs -\u003e BigQuery\n- [ ] Generate a client_secrets JSON -\u003e APIs and Auth -\u003e Credentials -\u003e Create New Client ID\n- [ ] Download the generated JSON and save it as `client_secrets.json` in the root of this project.\n- [ ] When you run the app, a browser window will open and request authorization.\n- [ ] Authorize it.\n\n## Data Collection and Munging\n\n[`main.py`](https://github.com/abrie/data-challenge-2014/blob/master/main.py) is where interested readers should begin. It is invoked as follows:\n\n`python main.py -i identifier -q bigquery-id model:model.sql state:state.sql`\n\n- `-i [setId]` This identifies the set. The query results will be stored in a folder named data/[setId]. If no query is specified using `-q`, then the most recent queries in the [setId] folder will be (re)munged. \n- `-q [projectId] [name:sql name2:sql2] ...` projectId is a BigQuery project number (ex: 'spark-mark-911'). The [name:sql] entries specify SQL files and the id to use when storing the results. Each of the SQL files will be sent to BigQuery, and the responses recorded under `data/[setId]/[name]`. The munger will subsequently process the responses to produce `results.json`.\n\n## Use the Scripts\n\n[collect.sh](https://github.com/abrie/data-challenge-2014/blob/master/collect.sh) demonstrates the use of `main.py`. It is the same script used to generate the results shown on [this page](http://abrie.github.io/data-challenge-2014). If you wish to watch it operate:\n\n`./collect.sh [projectId]` You'll need to specify the projectId obtained from your Google developer console. \n\n[deploy.sh](https://github.com/abrie/data-challenge-2014/blob/master/deploy.sh) generates the presentation pages and writes them to the specified directory. It assumes that `collect.sh` has completed successfully. 
The generated site should be served through a webserver because the `results.json` files are loaded through Ajax. If you do not have a local webserver then [node http-server](https://github.com/nodeapps/http-server) or Python's SimpleHTTPServer are easy and recommended:\n\n- `./deploy.sh deployed/path/`\n- `http-server deployed/path/` or `cd deployed/path \u0026\u0026 python -m SimpleHTTPServer 8080`\n- Navigate to http://localhost:8080\n\n## Citations\n- Stijn van Dongen, Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, May 2000. [link](http://micans.org/mcl/lit/svdthesis.pdf.gz)\n\n## Contact\nabrhie@gmail.com\n\n## Addendum\n\nThese images show the evolution from very hairball, to less hairball, to combed hairball.\n\n![first](https://raw.githubusercontent.com/abrie/data-challenge-2014/master/README_assets/2.png)\n![second](https://raw.githubusercontent.com/abrie/data-challenge-2014/master/README_assets/3.png)\n![third](https://raw.githubusercontent.com/abrie/data-challenge-2014/master/README_assets/1.png)\n\n## More Features\n\nPlease visit the dev branch: [README.md](https://github.com/abrie/data-challenge-2014/tree/new_features) for additional features developed after submission. The conclusion of this competition shares poetic cohomology with a certain video of a polar bear and a can of condensed milk (video no longer available).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabrie%2Fdata-challenge-2014","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabrie%2Fdata-challenge-2014","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabrie%2Fdata-challenge-2014/lists"}