{"id":16388590,"url":"https://github.com/phact/streaming-ml-product-recommendation","last_synced_at":"2025-10-20T01:05:16.091Z","repository":{"id":71205846,"uuid":"88548109","full_name":"phact/streaming-ml-product-recommendation","owner":"phact","description":null,"archived":false,"fork":false,"pushed_at":"2018-07-09T14:49:10.000Z","size":10628,"stargazers_count":2,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-05T21:11:31.585Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/phact.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-04-17T20:31:53.000Z","updated_at":"2018-07-09T14:49:11.000Z","dependencies_parsed_at":null,"dependency_job_id":"aa9b7a52-c196-4b17-849b-986f3003e29e","html_url":"https://github.com/phact/streaming-ml-product-recommendation","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phact%2Fstreaming-ml-product-recommendation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phact%2Fstreaming-ml-product-recommendation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phact%2Fstreaming-ml-product-recommendation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phact%2Fstreaming-ml-product-recommendation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/owners/phact","download_url":"https://codeload.github.com/phact/streaming-ml-product-recommendation/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252577020,"owners_count":21770721,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-11T04:29:28.644Z","updated_at":"2025-10-20T01:05:11.052Z","avatar_url":"https://github.com/phact.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# StreamingMLProductRecommendation\n\nThis is a guide to using the power tools machine learning streaming product recommendation asset, brought to you by the Vanguard team.\n\nUpgraded for DSE 6.0\n\n### Motivation\n\nMachine-learning-powered recommendation engines have wide applications across multiple industries as companies seek to provide their end customers with deep insights by leveraging data in the moment. Although there are many tools that support historical analysis that yields recommendations, DataStax Enterprise (DSE) is particularly well suited to power real-time recommendation / personalization systems. It is when it comes to operationalizing and productionizing analytical systems that DSE will prove most useful. 
This is largely due to DSE's design objectives of operating at scale, in a distributed fashion, while meeting the performance and availability requirements of user-facing, mission-critical applications.\n\n### What is included?\n\nThis field asset includes a working application for real-time recommendations leveraging the following DSE functionality:\n\n* Machine Learning\n* Streaming analytics\n* Batch analytics\n* Real-time JDBC / SQL (dynamic caching)\n* DSEFS\n\n### Business Take Aways\n\nBy streaming customer market basket data from a retail organization through DSE analytics and using it to train a Collaborative Filtering Machine Learning model, we are able to maintain a top-K list of recommended products per customer that reflects their historical and recent buying patterns.\n\nIn the retail industry, both online and brick-and-mortar businesses are leveraging ML and real-time analytics pipelines to gather insights that become differentiators for them in the marketplace. The DataStax stack is the foundation for enterprise personalization / recommendation systems across multiple industries.\n\n### Technical Take Aways\n\nFor a technical deep dive, take a look at the following sections:\n\n- Machine learning model\n- Streaming analytics pipeline\n- Real-time JDBC / SQL (dynamic caching)\n\n## Startup Script\n\nThis asset leverages\n[simple-startup](https://github.com/jshook/simple-startup). 
To start the entire\nasset, run `./startup all`; for other options, run `./startup`.\n\n## Manual Usage:\n\n### Prep the files:\n\nMake sure DSEFS is turned on in `dse.yaml`:\n\n    dsefs_options:\n        enabled: true\n\nThen push the raw data file into the root directory of DSEFS:\n\n```\ndsefs / \u003e put ./sales_observations sales_observations\ndsefs / \u003e ls sales_observations\nsales_observations\n```\n\n### Streaming Job:\nTo run this on your local machine, you first need to run a Netcat server:\n\n    $ nc -lk 9999\n\nBuild:\n\n    mvn package\n\nand then run the example:\n\n    $ dse spark-submit --deploy-mode cluster --supervise  --class\n    com.datastax.powertools.analytics.SparkMLProductRecommendationStreamingJob\n    ./target/StreamingMLProductRecommendations-0.1.jar localhost 9999\n\nTo run the model, predict via streaming, and serve results via JDBC, run the\nServeJDBC class:\n\n    $ dse spark-submit --class\n    com.datastax.powertools.analytics.SparkMLProductRecommendationServeJDBC\n    ./target/StreamingMLProductRecommendations-0.1.jar localhost 9999\n\n\n    $ dse beeline\n\n    \u003e !connect jdbc:hive2://localhost:10000\n\n    \u003e select * from recommendations.predictions where user=10277 order by prediction desc;\n\n\nPaste a few records into the `nc` prompt and watch the results change in beeline:\n\n```\n102779564.0000\n1027795649564.0000\n1027515254.0000\n1027795649564744.0000\n10277956495647442304.0000\n102751525415254.0000\n102779564956474423042304.0000\n10277   221     4\n10277   221     1\n```\n\nAlternatively, you can run `./socketstream` to write one record per second to the stream from bash.\n\n\n### Docs\n\nPull in your submodules:\n\n    git submodule update --init\n    git submodule sync\n\nThen run the server:\n\n    hugo server 
./content\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphact%2Fstreaming-ml-product-recommendation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphact%2Fstreaming-ml-product-recommendation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphact%2Fstreaming-ml-product-recommendation/lists"}