{"id":13608996,"url":"https://github.com/divolte/divolte-collector","last_synced_at":"2025-12-17T21:11:53.109Z","repository":{"id":18985331,"uuid":"22206807","full_name":"divolte/divolte-collector","owner":"divolte","description":"Divolte Collector","archived":true,"fork":false,"pushed_at":"2021-08-16T02:50:40.000Z","size":9417,"stargazers_count":283,"open_issues_count":65,"forks_count":77,"subscribers_count":30,"default_branch":"master","last_synced_at":"2024-11-07T14:42:39.627Z","etag":null,"topics":["analytics","analytics-tracking","avro","clickstream","divolte-collector","gcs","hdfs","java","kafka","pubsub"],"latest_commit_sha":null,"homepage":"https://divolte.io/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/divolte.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-07-24T09:15:03.000Z","updated_at":"2024-04-02T17:41:49.000Z","dependencies_parsed_at":"2022-08-05T02:01:21.861Z","dependency_job_id":null,"html_url":"https://github.com/divolte/divolte-collector","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divolte%2Fdivolte-collector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divolte%2Fdivolte-collector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divolte%2Fdivolte-collector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divolte%2Fdivolte-collector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/divolte","download_url":"
https://codeload.github.com/divolte/divolte-collector/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248605475,"owners_count":21132176,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","analytics-tracking","avro","clickstream","divolte-collector","gcs","hdfs","java","kafka","pubsub"],"created_at":"2024-08-01T19:01:31.580Z","updated_at":"2025-12-17T21:11:53.064Z","avatar_url":"https://github.com/divolte.png","language":"Java","funding_links":[],"categories":["Java"],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/divolte/divolte-collector.svg?branch=master)](https://travis-ci.org/divolte/divolte-collector)\n\nDivolte Collector\n=================\n\n### *Scalable clickstream collection for Hadoop and Kafka* ###\n\nDivolte Collector is a scalable and performant server for collecting clickstream data in HDFS and on Kafka topics. It uses a JavaScript tag on the client side to gather user interaction data, similar to many other web tracking solutions. Divolte Collector can be used as the foundation to build anything from basic web analytics dashboarding to real-time recommender engines or banner optimization systems.\n\n[https://divolte.io](https://divolte.io)\n\n![Divolte Collector](https://divolte.io/images/architecture.png)\n\nOnline documentation and downloads\n----------------------------------\n\nYou can find the latest downloads and documentation on our [project website](https://divolte.io). 
There is a series of examples for working with collected data in Spark, Hive / Impala, and Kafka in this repository: [https://github.com/divolte/divolte-examples](https://github.com/divolte/divolte-examples).\n\nFeatures\n--------\n\n- **Single tag site integration**: Including Divolte Collector is an HTML one-liner. Just load the JavaScript at the end of your document body.\n- **Built for Hadoop and Kafka, with experimental support for Google Cloud Storage**: All collected data is written directly to HDFS, GCS or Kafka. No ETL or intermediate storage.\n- **Structured data collection**: All data is captured in Apache Avro records using your own schema definition. Divolte Collector does not enforce a particular structure on your data.\n- **User agent parsing**: It's not just a string. Add rich user-agent information to your click event records on the fly.\n- **ip2geo lookup**: Attach geo-coordinates to requests on the fly. (This requires a third-party database; a free version is available.)\n- **Fast**: Handle many thousands of requests per second on a single node. Scale out as you need.\n- **Custom events**: Just like any web analytics solution, you can log any event. Supply custom parameters in your page or JavaScript and map them onto your Avro schema.\n- **Integrate with anything**: Work with anything that understands Avro and HDFS, GCS or Kafka. Hive, Impala, Spark, Spark Streaming, Storm, etc. No log file parsing is required.\n- **Open source**: Divolte Collector is hosted on GitHub and released under the Apache License, Version 2.0.\n\nBuilding Prerequisites\n----------------------\n\nTo build Divolte Collector, you need to have the following installed:\n\n - [Java 8 SDK](http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)\n   (or newer). We build and test with Oracle's SDK; other variants should work. (Let us know!)\n - [Sphinx 1.2.x](http://sphinx-doc.org) (or newer). 
This is only required for building the user\n   documentation.\n\nBuilding\n--------\n\nTo build the Divolte Collector server itself:\n\n    % ./gradlew zip\n\nor\n\n    % ./gradlew tarball\n\nThis will build everything and produce an elementary distribution archive under the\n`build/distributions/` directory.\n\nTo build the User Guide:\n\n    % ./gradlew userdoc\n\nThis will build the documentation and place it under the `build/userdoc/html/` directory.\n\nTesting\n-------\n\nUnit tests can be executed with:\n\n    % ./gradlew test\n\nBy default this will skip browser-based integration tests. Currently browser-based\ntesting is supported using:\n\n - [ChromeDriver](https://sites.google.com/a/chromium.org/chromedriver/)\n - Safari WebDriver (Safari 10 or newer)\n - [PhantomJS](http://phantomjs.org)\n - [SauceLabs](http://saucelabs.com)\n\n### ChromeDriver ###\n\nChromeDriver must be installed locally. Under OS X this can be installed via\nHomeBrew:\n\n    % brew install chromedriver\n\nTests can then be executed:\n\n    % SELENIUM_DRIVER=chrome CHROME_DRIVER=$(which chromedriver) ./gradlew test\n\n### Safari WebDriver ###\n\nSafari (from version 10) has native WebDriver support. To set this up:\n\n1. Enable the developer menu: _Preferences_|_Advanced_|_Show Develop menu in menu bar_.\n2. In the _Develop_ menu, enable _Allow Remote Automation_.\n3. First time only, execute `safaridriver -p 0` from the command line and authorise the driver to connect to Safari.\n\nTests can then be executed:\n\n    % SELENIUM_DRIVER=safari ./gradlew test\n\n### PhantomJS ###\n\nPhantomJS must be installed locally. 
Under OS X this can be installed via\nHomeBrew:\n\n    % brew install phantomjs\n\nTests can then be executed:\n\n    % SELENIUM_DRIVER=phantom ./gradlew test\n\n### SauceLabs ###\n\nIf you have a SauceLabs account, you can test against a wide variety of browsers.\nOnce you have a username and API key and\n[Sauce Connect](https://docs.saucelabs.com/reference/sauce-connect/) running, tests\ncan then be executed:\n\n    % export SAUCE_USERNAME=\u003cusername\u003e\n    % export SAUCE_ACCESS_KEY=\u003capi key\u003e\n    % SELENIUM_DRIVER=sauce ./gradlew test\n\nThese tests can take quite some time to execute. Not all succeed.\n\nLicense\n-------\n\nThe Divolte Collector is licensed under the terms of the Apache License, Version 2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdivolte%2Fdivolte-collector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdivolte%2Fdivolte-collector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdivolte%2Fdivolte-collector/lists"}