{"id":15519912,"url":"https://github.com/ceteri/clksim","last_synced_at":"2025-03-05T06:30:36.125Z","repository":{"id":19450540,"uuid":"22694743","full_name":"ceteri/clksim","owner":"ceteri","description":"Clickstream log data simulator in Python","archived":false,"fork":false,"pushed_at":"2014-08-07T06:54:03.000Z","size":162,"stargazers_count":7,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-24T12:23:04.946Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ceteri.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-08-06T19:11:07.000Z","updated_at":"2024-02-24T01:02:37.000Z","dependencies_parsed_at":"2022-08-21T09:41:01.897Z","dependency_job_id":null,"html_url":"https://github.com/ceteri/clksim","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ceteri%2Fclksim","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ceteri%2Fclksim/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ceteri%2Fclksim/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ceteri%2Fclksim/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ceteri","download_url":"https://codeload.github.com/ceteri/clksim/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241979203,"owners_count":20052093,"icon_url":"https:
//github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-02T10:23:34.418Z","updated_at":"2025-03-05T06:30:35.635Z","avatar_url":"https://github.com/ceteri.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## clksim\nThis code implements a simple *clickstream simulator* written in Python, along with a few TSV data files used to seed it.\nIt has been used for generating the log files used in the [CCAI workshop](http://liber118.com/course/ccai/).\n\nThe simulations used here are embarrassingly simplistic.\nEven so, these generate realistic log files that provide:\n\n  * impressions based on ad campaigns, with seasonal variation\n  * landing pages and click-through\n  * registrations\n  * orders\n  * chargebacks\n\nThere are some geo aspects in the fraud simulation, which students have used for excellent visualizations.\nThe fraud patterns are somewhat realistic -- based on what we'd experienced in a popular e-commerce firm, circa 2011.\nStudents have used the generated log data to build:\n\n  * marketing funnel KPI + optimization\n  * anti-fraud classifiers\n  * product recommenders\n\nThe product recommender aspects are rather light -- that part could be embellished much more.\n\n### Schema\n\n`city_prob.tsv`: *fraud_prob, city, latitude, longitude*\n\n`product.tsv`: *prod_area, product_id, amount*\n\n`campaign.tsv`: *campaign_id, network, rate_metric, rate_amount, keyword, min_lat*\n\n`impression.tsv`: *date, campaign_id, keyword, cookie*\n\n`clicks.tsv`: *date, cookie, landing_page*\n\n`register.tsv`: *date, cookie, customer_id, 
latitude, longitude*\n\n`orders.tsv`: *date, transaction_id, customer_id, product_id, amount, latitude, longitude*\n\n`chargeback.tsv`: *date, transaction_id, amount*\n\n### Usage\n\n    # geo distribution by cities\n    # probability, city, latitude, longitude\n    ./src/city.py \u003e city_prob.tsv\n\n    # product catalog\n    # product_area, product_id, amount\n    ./src/prod.py \u003e product.tsv\n\n    # online marketing campaigns -- hard-coded\n    # campaign_id, network, rate_metric, rate_amount, keyword\n    ls dat/campaign.tsv\n\n    # ad impressions\n    # date, campaign_id, keyword, cookie\n    ./src/impr.py dat/campaign.tsv 10000000 | sort \u003e impression.tsv\n\n    # click-through\n    # date, cookie, landing_page\n    ./src/clik.py impression.tsv | sort \u003e clicks.tsv\n\n    # customer registrations\n    # date, cookie, customer_id, latitude, longitude\n    ./src/regs.py clicks.tsv | sort \u003e register.tsv\n\n    # e-commerce orders\n    # date, transaction_id, customer_id, product_id, amount, latitude, longitude\n    ./src/ords.py register.tsv | sort \u003e valid_orders.tsv\n\n    # chargebacks\n    # date, transaction_id, amount\n    ./src/frau.py valid_orders.tsv chargeback.tsv \u003e fraud_orders.tsv\n    cat valid_orders.tsv fraud_orders.tsv | sort \u003e orders.tsv\n\n### Packaging Results\nThe following commands create a tarball for the workshop:\n\n    rm -rf datasets.tgz\n    tar cvzf datasets.tgz SCHEMA.md campaign.tsv city_geo.tsv \\\n      impression.tsv clicks.tsv register.tsv orders.tsv chargeback.tsv\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fceteri%2Fclksim","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fceteri%2Fclksim","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fceteri%2Fclksim/lists"}