{"id":21813541,"url":"https://github.com/loggerhead/kdd_2012_track1","last_synced_at":"2025-04-13T23:31:14.935Z","repository":{"id":74767558,"uuid":"48030268","full_name":"loggerhead/KDD_2012_Track1","owner":"loggerhead","description":"A simple solution of 2012 KDD Cup Track 1","archived":false,"fork":false,"pushed_at":"2017-03-17T03:35:08.000Z","size":16,"stargazers_count":4,"open_issues_count":0,"forks_count":4,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-27T13:45:26.691Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"wtfpl","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/loggerhead.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-12-15T08:39:52.000Z","updated_at":"2019-10-11T01:37:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"1dbbd089-0b17-4b6b-97f7-088f2856a019","html_url":"https://github.com/loggerhead/KDD_2012_Track1","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/loggerhead%2FKDD_2012_Track1","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/loggerhead%2FKDD_2012_Track1/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/loggerhead%2FKDD_2012_Track1/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/loggerhead%2FKDD_2012_Track1/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/loggerhead","download_url":"https
://codeload.github.com/loggerhead/KDD_2012_Track1/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248796136,"owners_count":21162908,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-27T14:29:52.335Z","updated_at":"2025-04-13T23:31:14.905Z","avatar_url":"https://github.com/loggerhead.png","language":"Python","readme":"This is a simple solution to [2012 KDD Cup Track 1](http://www.kddcup2012.org/c/kddcup2012-track1). It implements a Latent Factor Model trained with Stochastic Gradient Descent, and most of the ideas come from sections `2.2` and `3.1` of the paper [Context-aware Ensemble of Multifaceted Factorization Models for Recommendation Prediction in Social Networks](https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/Shanda3.pdf).\n\n# Run\n\nTo save time, I strongly recommend installing `PyPy`, which was roughly three times faster than `CPython` in my tests.\n\n1. Edit `config.py` to tell the program where to find the datasets.\n2. Run `./run.sh`.\n3. 
Press `Ctrl-C` in the terminal whenever you want to stop the training loop.\n\n# Dataset\n\nFour datasets are needed to run the program:\n\n* [rec_log_train](https://coding.net/u/loggerhead/p/KDD_2012_Track1/git/raw/master/data/rec_log_train.csv.lrz)\n* [rec_log_test](https://coding.net/u/loggerhead/p/KDD_2012_Track1/git/raw/master/data/rec_log_test.csv.lrz)\n* [KDD_Track1_solution](https://coding.net/u/loggerhead/p/KDD_2012_Track1/git/raw/master/data/KDD_Track1_solution.csv)\n* [user_profile](https://coding.net/u/loggerhead/p/KDD_2012_Track1/git/raw/master/data/user_profile.csv.lrz)\n\nI made a few small changes to the original datasets:\n\n* removed the header from each file\n* changed the separator from `\\t` (tab) to `,` (comma)\n\nIf you download the datasets from the links above, you will find some `.lrz` files, which you need to decompress with [lrzip](https://github.com/ckolivas/lrzip).\n\n```bash\n# install `lrzip`\napt-get install lrzip\n# if you are on OS X, run the command below to install `lrzip` instead\n# brew install lrzip\n\nlrzip -d *.lrz\n```\n\n# Running log\n\n```\nGetting summary of training dataset...\n======================== Summary of 'rec_log_train.csv' ========================\nUsers: 1392873  Items: 4710     Users/Items: 295.73\n+1: 5253828     -1: 67955449    +1/-1: 0.08\nBegin time: 1318348785  End time:1321027199     Interval: 2678414s = 744.00 h = 31.00 d\n================== Distribution of user active time (in hour) ==================\n 00: |\n 01: |\n 07: |\n 08: ||\n 09: |||\n 10: |||\n 11: |||\n 12: |||\n 13: |||\n 14: |||\n 15: |||\n 16: |||\n 17: |||\n 18: |||\n 19: |||\n 20: |||\n 21: |||\n 22: |||\n 23: ||\nGetting summary of user profile...\n============================= Distribution of age ==============================\n  0: |\n  1: |\n  2: |\n  3: |\n 12: |\n 13: |\n 14: ||\n 15: ||\n 16: ||\n 17: ||\n 18: ||\n 19: ||\n 20: |||\n 21: |||\n 22: ||||\n 23: |||\n 24: |||\n 25: |||\n 26: ||\n 27: ||\n 28: |\n 29: |\n 30: |\n 31: |\n 32: |\n 33: 
|\n============================ Distribution of gender ============================\n  0: |\n  1: |||||||||||||||||||||||||\n  2: ||||||||||||||||||||||||\n============================ Distribution of tweet =============================\n  0: ||||\n  1: |\n  2: |\n  3: |\n  4: |\n  5: |\n  6: |\n  7: |\n  8: |\n  9: |\n 10: |\n 11: |\n 12: |\n 13: |\n 14: |\n============================= Distribution of tags =============================\n  1: ||||||||||||||||||||||||||||||||||||\n  2: |\n  3: |\n  4: |\n  5: |\n  6: |\n  7: |\n  8: |\n  9: ||\n 10: ||||\nPreprocessing...\nTraining...\ninit LFM...             26.158s\n408th trainning used 21.1ss     |e[u][i]| = 0.251114^C\nExit program after finish current work!\n409th trainning used 22.6s      |e[u][i]| = 0.251115\npredict and write result...             500.564s\nConverting predicted result to submission format...\nconvert predict result to dict...               155.102s\nconvert to submission format...                 50.497s\nComputing mAP@3...\n Public rank: 412       mAP@3: 0.31774\nPrivate rank: 422       mAP@3: 0.30857\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Floggerhead%2Fkdd_2012_track1","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Floggerhead%2Fkdd_2012_track1","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Floggerhead%2Fkdd_2012_track1/lists"}