https://github.com/ceteri/clksim
Clickstream log data simulator in Python
- Host: GitHub
- URL: https://github.com/ceteri/clksim
- Owner: ceteri
- License: apache-2.0
- Created: 2014-08-06T19:11:07.000Z (almost 12 years ago)
- Default Branch: master
- Last Pushed: 2014-08-07T06:54:03.000Z (almost 12 years ago)
- Last Synced: 2025-02-24T12:23:04.946Z (over 1 year ago)
- Language: Python
- Size: 158 KB
- Stars: 7
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## clksim
This code implements a simple *clickstream simulator* written in Python, along with a few TSV data files used to seed it.
It was used to generate the log files for the [CCAI workshop](http://liber118.com/course/ccai/).
The simulations used here are embarrassingly simplistic.
Even so, they generate realistic log files that provide:
* impressions based on ad campaigns, with seasonal variation
* landing pages and click-through
* registrations
* orders
* chargebacks
There are some geo aspects in the fraud simulation, which students have used for excellent visualizations.
The fraud patterns are somewhat realistic -- based on what we experienced at a popular e-commerce firm, circa 2011.
Students have used the generated log data to build:
* marketing funnel KPI + optimization
* anti-fraud classifiers
* product recommenders
The product recommender aspects are rather light -- that part could be embellished much more.
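
As a starting point for the anti-fraud classifiers mentioned above, the following is a minimal sketch of how training labels might be derived: an order is marked as fraudulent when its `transaction_id` also appears in the chargeback log. The function names `label_orders` and `chargeback_rate` are hypothetical and not part of this repo, the sample rows are made up, and the sketch assumes every chargeback refers to an order present in the order log.

```python
def label_orders(orders, chargebacks):
    """Return (order, is_fraud) pairs.

    An order counts as fraud when its transaction_id also appears in the
    chargeback records. This labeling rule is an assumption for illustration.
    """
    charged_back = {cb["transaction_id"] for cb in chargebacks}
    return [(order, order["transaction_id"] in charged_back) for order in orders]


def chargeback_rate(labeled):
    """Fraction of labeled orders that were charged back."""
    if not labeled:
        return 0.0
    return sum(1 for _, is_fraud in labeled if is_fraud) / len(labeled)


# Made-up rows that use a subset of the orders and chargeback columns.
orders = [
    {"transaction_id": "t1", "customer_id": "u1", "amount": "19.99"},
    {"transaction_id": "t2", "customer_id": "u2", "amount": "5.00"},
    {"transaction_id": "t3", "customer_id": "u1", "amount": "42.00"},
    {"transaction_id": "t4", "customer_id": "u3", "amount": "7.50"},
]
chargebacks = [{"transaction_id": "t2", "amount": "5.00"}]

labeled = label_orders(orders, chargebacks)
rate = chargeback_rate(labeled)
```

The latitude and longitude columns on each order could then serve as features, since the fraud simulation has a geo component.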
### Schema
* `city_prob.tsv`: *fraud_prob, city, latitude, longitude*
* `product.tsv`: *prod_area, product_id, amount*
* `campaign.tsv`: *campaign_id, network, rate_metric, rate_amount, keyword, min_lat*
* `impression.tsv`: *date, campaign_id, keyword, cookie*
* `clicks.tsv`: *date, cookie, landing_page*
* `register.tsv`: *date, cookie, customer_id, latitude, longitude*
* `orders.tsv`: *date, transaction_id, customer_id, product_id, amount, latitude, longitude*
* `chargeback.tsv`: *date, transaction_id, amount*
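
The schema above can be turned into a small loader. This is a sketch only: the column names are copied from the list above, but the assumption that the files are tab-separated with no header row is ours, and `read_rows` is a hypothetical helper that does not exist in this repo.

```python
import csv
import io

# Column names copied from the schema; the "no header row" assumption
# is not stated in the README and should be checked against real output.
SCHEMA = {
    "city_prob.tsv": ["fraud_prob", "city", "latitude", "longitude"],
    "product.tsv": ["prod_area", "product_id", "amount"],
    "campaign.tsv": ["campaign_id", "network", "rate_metric",
                     "rate_amount", "keyword", "min_lat"],
    "impression.tsv": ["date", "campaign_id", "keyword", "cookie"],
    "clicks.tsv": ["date", "cookie", "landing_page"],
    "register.tsv": ["date", "cookie", "customer_id", "latitude", "longitude"],
    "orders.tsv": ["date", "transaction_id", "customer_id", "product_id",
                   "amount", "latitude", "longitude"],
    "chargeback.tsv": ["date", "transaction_id", "amount"],
}


def read_rows(name, stream):
    """Yield one dict per line of `stream`, keyed by the schema for `name`."""
    reader = csv.DictReader(stream, fieldnames=SCHEMA[name], delimiter="\t")
    for row in reader:
        yield row


# Usage with an in-memory sample; the values are made up for illustration.
sample = io.StringIO("2014-08-01\tc123\t/landing/a\n")
rows = list(read_rows("clicks.tsv", sample))
```

With a real file, the same helper would be called as `read_rows("clicks.tsv", open("clicks.tsv", newline=""))`. All values come back as strings, so numeric columns such as `amount` need an explicit conversion.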
### Usage
    # geo distribution by cities
    # probability, city, latitude, longitude
    ./src/city.py > city_prob.tsv

    # product catalog
    # product_area, product_id, amount
    ./src/prod.py > product.tsv

    # online marketing campaigns -- hard-coded
    # campaign_id, network, rate_metric, rate_amount, keyword
    ls dat/campaign.tsv

    # ad impressions
    # date, campaign_id, keyword, cookie
    ./src/impr.py dat/campaign.tsv 10000000 | sort > impression.tsv

    # click-through
    # date, cookie, landing_page
    ./src/clik.py impression.tsv | sort > clicks.tsv

    # customer registrations
    # date, cookie, customer_id, latitude, longitude
    ./src/regs.py clicks.tsv | sort > register.tsv

    # e-commerce orders
    # date, transaction_id, customer_id, product_id, amount, latitude, longitude
    ./src/ords.py register.tsv | sort > valid_orders.tsv

    # chargebacks
    # date, transaction_id, amount
    ./src/frau.py valid_orders.tsv chargeback.tsv > fraud_orders.tsv
    cat valid_orders.tsv fraud_orders.tsv | sort > orders.tsv
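
The pipeline above mirrors a marketing funnel: impressions lead to clicks, clicks to registrations, and registrations to orders. The following is a rough sketch of how funnel counts and step-to-step conversion rates could be computed from rows keyed by the schema columns. The function names are hypothetical, the sample rows are made up, and joining stages on `cookie` and `customer_id` is an assumption about how the logs relate to each other.

```python
def funnel_counts(impressions, clicks, registrations, orders):
    """Count distinct cookies that reach each funnel stage."""
    seen = {r["cookie"] for r in impressions}
    clicked = {r["cookie"] for r in clicks} & seen
    # Map each registered customer back to the cookie that clicked through.
    cookie_by_customer = {
        r["customer_id"]: r["cookie"]
        for r in registrations
        if r["cookie"] in clicked
    }
    registered = set(cookie_by_customer.values())
    ordered = {
        cookie_by_customer[r["customer_id"]]
        for r in orders
        if r["customer_id"] in cookie_by_customer
    }
    return [
        ("impression", len(seen)),
        ("click", len(clicked)),
        ("registration", len(registered)),
        ("order", len(ordered)),
    ]


def conversion_rates(counts):
    """Divide each stage count by the count of the stage before it."""
    rates = []
    for (_, prev), (name, cur) in zip(counts, counts[1:]):
        rates.append((name, cur / prev if prev else 0.0))
    return rates


# Made-up sample rows carrying only the columns used for the joins.
impressions = [{"cookie": c} for c in ("c1", "c2", "c3", "c4")]
clicks = [{"cookie": "c1"}, {"cookie": "c2"}]
registrations = [{"cookie": "c1", "customer_id": "u1"}]
orders = [{"customer_id": "u1"}, {"customer_id": "u1"}]

counts = funnel_counts(impressions, clicks, registrations, orders)
rates = conversion_rates(counts)
```

Counting distinct cookies rather than raw rows keeps repeat orders from inflating the last stage. With 10,000,000 impressions the cookie sets may need a more memory-conscious approach.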
### Packaging Results
The following commands create a tarball for the workshop:

    rm -rf datasets.tgz
    tar cvzf datasets.tgz SCHEMA.md campaign.tsv city_geo.tsv \
      impression.tsv clicks.tsv register.tsv orders.tsv chargeback.tsv