{"id":21846946,"url":"https://github.com/duyet/shopee-track-demo","last_synced_at":"2025-04-14T13:32:51.079Z","repository":{"id":37291520,"uuid":"433111132","full_name":"duyet/shopee-track-demo","owner":"duyet","description":"Demonstrates how to schedule GitHub Workflows to run scripts for monitoring product availability on the shopee.com","archived":false,"fork":false,"pushed_at":"2024-06-18T10:17:44.000Z","size":99356,"stargazers_count":23,"open_issues_count":1,"forks_count":9,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-28T02:39:04.691Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://flatgithub.com/duyet/shopee-track-demo?filename=data%2Fmaster.csv\u0026sha=c1864770b4361d9863aef4688ca7d3f67e40c8aa","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/duyet.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-29T16:12:01.000Z","updated_at":"2025-03-06T04:49:53.000Z","dependencies_parsed_at":"2023-12-25T14:41:48.537Z","dependency_job_id":null,"html_url":"https://github.com/duyet/shopee-track-demo","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duyet%2Fshopee-track-demo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duyet%2Fshopee-track-demo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duyet%2Fshopee-track-demo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/duyet%2Fshopee-track-demo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/duyet","download_url":"https://codeload.github.com/duyet/shopee-track-demo/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248888687,"owners_count":21178093,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-27T23:16:14.777Z","updated_at":"2025-04-14T13:32:51.031Z","avatar_url":"https://github.com/duyet.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Collector Demo\n\n[![.github/workflows/cronjob.yaml](https://github.com/duyet/shopee-track-demo/actions/workflows/cronjob.yaml/badge.svg)](https://github.com/duyet/shopee-track-demo/actions/workflows/cronjob.yaml)\n\n**[How it works?](#how-it-works)** |\n**[Shopee Dataset Viewer](https://flatgithub.com/duyet/shopee-track-demo?filename=data%2Fmaster.csv\u0026sha=d2f8a9914c69056b3b5cd418425c790ba24b464f)** |\n**[Data Studio Dashboard](https://datastudio.google.com/reporting/c4e332ca-d94a-45e3-882c-b56f96e04c50)**\n\nThis repo is a demo from my university presentation. It is intended to run on a GitHub Workflows schedule, retrieving data from Shopee.vn\nand creating a commit in the `./data/` folder whenever the fetched data changes.\n\nThis demo offers a simple pattern for small data projects: you can collect, process,\nand store datasets in your repository and version them.\n\nData is collected, and changes are stored in the `data` folder, **every 30 
minutes** with `cron: \"0,30 * * * *\"` (see: [.github/workflows/cronjob.yaml](https://github.com/duyet/shopee-track-demo/blob/master/.github/workflows/cronjob.yaml)).\n\n# How it works?\n\n![Architecture](.github/architecture.png)\n\n\u003c!-- Draw.io source: https://app.diagrams.net/#G186E1MfMGSuhpoQl6bvcvjuhTuOlBIHx1 --\u003e\n\nThe GitHub workflow is located at [.github/workflows/cronjob.yaml](.github/workflows/cronjob.yaml);\nit runs the `main.py` script every 30 minutes.\n\nThe script will:\n\n1. Read the configuration from `config.yaml` (see: [What is YAML?](https://www.redhat.com/en/topics/automation/what-is-yaml))\n\n```yaml\nurls:\n  - https://shopee.vn/Apple-MacBook-Air-(2020)-M1-Chip-13.3-inch-8GB-256GB-SSD-i.88201679.5873954476\n  - https://shopee.vn/Đồng-hồ-nam-Fossil-NEUTRA-CHRONO-dây-da-FS5381-màu-đen-i.318790862.10664696259\n  - https://shopee.vn/Apple-iPhone-13-128GB-i.88201679.10753341705\n```\n\n2. For each URL in `urls`, parse it to get the `itemid` and `shopid`\n\n  - URL example:\nhttps://shopee.vn/-M%C3%A3-ELMALL1TR5-gi%E1%BA%A3m-8-%C4%91%C6%A1n-5TR-Apple-MacBook-Air-(2020)-M1-Chip-13.3-inch-8GB-256GB-SSD-i.88201679.5873954476\n\n  - Parsing example:\n      - itemid = 5873954476\n      - shopid = 88201679\n\n3. 
Call the Shopee API to get the JSON data: https://shopee.vn/api/v4/item/get?itemid=5873954476\u0026shopid=88201679\n\n```json\n{\n  \"data\": {\n    \"itemid\": 3391609027,\n    \"shopid\": 85794847,\n    \"userid\": 0,\n    \"price_max_before_discount\": 19900000000,\n    \"has_lowest_price_guarantee\": false,\n    \"price_before_discount\": 6500000000,\n    \"price_min_before_discount\": 6500000000,\n    \"exclusive_price_info\": null,\n    \"hidden_price_display\": null,\n    \"price_min\": 5000000000,\n    \"price_max\": 12400000000,\n    \"price\": 5000000000,\n    \"stock\": 31,\n    \"discount\": \"50%\",\n    \"historical_sold\": 729,\n    \"sold\": 229,\n    \"show_discount\": 50,\n    \"raw_discount\": 50,\n    \"min_purchase_limit\": 0,\n    \"overall_purchase_limit\": {\n      \"order_max_purchase_limit\": 0,\n      \"overall_purchase_limit\": null,\n      \"item_overall_quota\": null,\n      \"start_date\": null,\n      \"end_date\": null\n    },\n    \"pack_size\": null,\n    \"is_live_streaming_price\": null,\n    \"name\": \"(BH 12 tháng) Sạc Nhanh PD 18W USB-C To Lightning, Sạc 8,X,11,12,13 (Củ Sạc Nhanh PD 18W + Cáp Sạc Nhanh PD)\",\n    \"ctime\": 1623402573,\n    \"item_status\": \"normal\",\n    \"status\": 1,\n    \"condition\": 1,\n    \"catid\": 100013,\n    \"description\": \"....\",\n    \"is_mart\": false,\n    \"show_shopee_verified_label\": true,\n    \"size_chart\": null,\n    \"reference_item_id\": \"\",\n    \"brand\": null,\n    \"item_rating\": {\n      \"rating_star\": 4.890070921985815,\n      \"rating_count\": []\n    },\n    \"label_ids\": [],\n    \"attributes\": [],\n    \"liked\": false,\n    \"liked_count\": 2102,\n    \"cmt_count\": 282,\n    \"flag\": 2,\n    \"shopee_verified\": true,\n    \"is_adult\": false,\n    \"is_preferred_plus_seller\": true,\n    \"bundle_deal_id\": 0,\n    \"can_use_bundle_deal\": false,\n    \"add_on_deal_info\": {\n      \"add_on_deal_id\": 5687796,\n      \"add_on_deal_label\": \"Mua Kèm 
Deal Sốc\",\n      \"sub_type\": 0,\n      \"status\": 1\n    },\n    \"bundle_deal_info\": null,\n    \"can_use_wholesale\": false,\n    \"wholesale_tier_list\": [],\n    \"is_group_buy_item\": null,\n    \"group_buy_info\": null,\n    \"welcome_package_type\": 0,\n    \"welcome_package_info\": null,\n    \"tax_code\": null,\n    \"invoice_option\": null,\n    \"complaint_policy\": null,\n    \"image\": \"2c3eb5d46df5721d7d1b64cfdb0d4c6c\",\n    \"video_info_list\": null,\n    \"item_type\": 0,\n    \"is_official_shop\": false,\n    \"show_official_shop_label_in_title\": false,\n    \"shop_location\": \"Hà Nội\",\n    \"coin_earn_label\": null,\n    \"cb_option\": 0,\n    \"is_pre_order\": false,\n    \"estimated_days\": 2,\n    \"badge_icon_type\": 0,\n    \"show_free_shipping\": true,\n    \"shipping_icon_type\": 0,\n    \"cod_flag\": 0,\n    \"show_original_guarantee\": false,\n    \"other_stock\": 577,\n    \"item_has_post\": false,\n    \"discount_stock\": 31,\n    \"current_promotion_has_reserve_stock\": true,\n    \"current_promotion_reserved_stock\": 31,\n    \"normal_stock\": 570,\n    \"brand_id\": 0,\n    \"is_alcohol_product\": false,\n    \"show_recycling_info\": false,\n    \"coin_info\": {\n      \"spend_cash_unit\": 100000,\n      \"coin_earn_items\": []\n    },\n    \"spl_info\": {\n      \"installment_info\": null,\n      \"user_credit_info\": null,\n      \"channel_id\": null,\n      \"show_spl\": false,\n      \"show_spl_lite\": true\n    },\n    \"preview_info\": null,\n    \"presale_info\": null,\n    \"is_cc_installment_payment_eligible\": false,\n    \"is_non_cc_installment_payment_eligible\": false,\n    \"flash_sale\": {\n      \"flash_sale_type\": 2,\n      \"extra_discount_info\": null,\n      \"promotionid\": 2031370660,\n      \"start_time\": 1638270000,\n      \"end_time\": 1638280800,\n      \"promo_images\": null,\n      \"price\": 5000000000,\n      \"flash_sale_stock\": 36,\n      \"stock\": 31,\n      \"hidden_price_display\": 
null,\n      \"promo_overlay_image\": null,\n      \"price_before_discount\": 6500000000\n    },\n    \"upcoming_flash_sale\": null,\n    \"deep_discount\": null,\n    \"has_low_fulfillment_rate\": false,\n    \"is_partial_fulfilled\": false,\n    \"makeups\": null,\n    \"shop_vouchers\": null,\n    \"global_sold\": null\n  }\n}\n```\n\n4. Compare the fetched data with the historical data at `./data/info/{itemid}.yaml` and `./data/history/{itemid}.csv`, and update those files.\n   It also updates the master file [./data/master.csv](/data/master.csv).\n   Explore `master.csv` with GitHub Flat Viewer: https://flatgithub.com/duyet/shopee-track-demo?filename=data%2Fmaster.csv\u0026sha=d2f8a9914c69056b3b5cd418425c790ba24b464f\n   \n   (see: [Github Flat Data](https://next.github.com/projects/flat-data))\n\n![Github Flat Viewer](.github/screenshot-flat.png)\n\n5. You can use any tool to consume the output CSV file [./data/master.csv](./data/master.csv).\n\nFor example, I'm using Google Data Studio to build a dashboard. The live version is here: https://datastudio.google.com/reporting/c4e332ca-d94a-45e3-882c-b56f96e04c50\n\n![Data Studio Dashboard](.github/screenshot-data-studio.png)\n\n# Discussion\n\n- How to scale this project to 100 million URLs?\n- How to scan every single product on Shopee?\n- How to deal with duplication?\n- How to design the database if the `master.csv` becomes bigger than 10GB, 100GB, ...?\n- What if Shopee blocks us for sending too many requests?\n- What if GitHub Actions blocks us for some reason?\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduyet%2Fshopee-track-demo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fduyet%2Fshopee-track-demo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduyet%2Fshopee-track-demo/lists"}