{"id":31684510,"url":"https://github.com/omarsolieman/socialgiveawaydataanalysis","last_synced_at":"2026-05-14T12:33:35.781Z","repository":{"id":312393981,"uuid":"1047357205","full_name":"omarsolieman/SocialGiveAwayDataAnalysis","owner":"omarsolieman","description":"This project involved cleaning, analyzing, and processing data from an Instagram giveaway to ensure a fair and data-driven winner selection process. The primary goal was to automate the process of identifying valid entries, weighting them based on engagement (likes and multiple entries), and performing a post-giveaway analysis","archived":false,"fork":false,"pushed_at":"2025-08-30T09:34:24.000Z","size":167,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-30T10:20:38.058Z","etag":null,"topics":["data-analysis","data-science","data-visualization","instagram","scraping","threejs"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/omarsolieman.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-30T08:30:55.000Z","updated_at":"2025-08-30T09:41:33.000Z","dependencies_parsed_at":"2025-08-30T10:20:39.358Z","dependency_job_id":"e69291d0-f89f-42b0-b8b3-1002d4d29973","html_url":"https://github.com/omarsolieman/SocialGiveAwayDataAnalysis","commit_stats":null,"previous_names":["omarsolieman/socialgiveawaydataanalysis"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/omarsolieman/SocialGiveAwayDataAnalysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/omarsolieman%2FSocialGiveAwayDataAnalysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/omarsolieman%2FSocialGiveAwayDataAnalysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/omarsolieman%2FSocialGiveAwayDataAnalysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/omarsolieman%2FSocialGiveAwayDataAnalysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/omarsolieman","download_url":"https://codeload.github.com/omarsolieman/SocialGiveAwayDataAnalysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/omarsolieman%2FSocialGiveAwayDataAnalysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278909739,"owners_count":26066894,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-08T02:00:06.501Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-science","data-visualization","instagram","scraping","threejs"],"created_at":"2025-10-08T08:09:18.612Z","updated_at":"2025-10-08T08:11:13.673Z","avatar_url":"https://github.com/omarsolieman.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SocialGiveAwayDataAnalysis\n# 📊 Project: Instagram Giveaway Analysis \u0026 Winner Selection Automation\n\n## 🚀 Project Overview\nThis project involved cleaning, analyzing, and processing data from an Instagram giveaway to ensure a fair and data-driven winner selection process. The primary goal was to **automate the process of identifying valid entries, weighting them based on engagement (likes and multiple entries), and performing a post-giveaway analysis** to understand participant engagement.\n\n---\n\n## 🤯 The Challenge: Dirty Data\nThe initial dataset, exported from an Instagram scraping tool (`instagram.csv`), presented several challenges:\n\n- **Cryptic Column Headers:** Columns were labeled with non-descriptive names (e.g., `x1i10hfl href`, `_ap3a`), making it difficult to understand the data.\n- **Ambiguous \"Likes\" Data:** The number of likes on a comment was embedded in a text string like `\"1 like\"` or `\"15 likes\"` within an `action_type` column.\n- **Defining a \"Valid Entry\":** The rules for winning (tagging at least 3 people) needed to be programmatically verified. Entries with only tags and no comment text were at risk of being incorrectly discarded.\n- **Duplicate Entries:** The raw data contained numerous duplicate rows. A simple duplicate removal could unfairly penalize users who made multiple, legitimate entries.\n- **Potential for Spam/Bot Activity:** Some users had an unusually high number of entries (e.g., over 150), requiring investigation to ensure they were not automated spam.\n\n---\n\n## 💡 My Solution: A Multi-Stage Python Scripting Process\nI developed a series of **Python scripts** to create a repeatable and transparent workflow.\n\n### 1. **Advanced Data Cleaning** (`advanced_cleaner.py`)\n- **Relabeled Columns:** Renamed cryptic column names to descriptive ones like `username`, `comment_text`, and `mentioned_user_1_username`.\n- **Intelligent Duplicate Removal:** Only removed rows that were 100% identical, preserving all legitimate multiple entries.\n- **Handled Empty Comments:** Added logic to ensure comments containing only tags were considered valid.\n\n### 2. **Fair Winner Selection** (`pick_winner.py`)\n- **Identified Valid Entries:** Filtered the dataset to find all comments where at least **three unique users** were mentioned.\n- **Weighted Chance System:** Calculated a *winning score* for each participant based on the sum of `1 + likes` for all their valid entries.\n- **Multi-Winner Selection:** Configured the script to select **10 unique winners**.\n\n### 3. **Post-Giveaway Analysis** (`analyze_giveaway.py`)\n- **Generated Detailed Statistics:** Provided a breakdown of engagement metrics after winners were chosen.\n- **Created Summary Tables:** Exported a clean summary of the 10 winners to a `.csv` file.\n- **Spam/Bot Investigation:** Added a feature to flag users with exceptionally high entries and generate a report with comment samples for manual review.\n\n---\n\n## 💪 Key Challenges \u0026 How I Overcame Them\n- **Challenge:** Incorrectly removing valid entries.  \n  **Solution:** Iterated on the duplicate removal logic to only remove true scraping errors.\n  \n- **Challenge:** Extracting numerical \"likes\" from text.  \n  **Solution:** Used Pandas `.str.extract()` with a regex to parse like counts.\n\n- **Challenge:** Verifying a user with 160+ entries.  \n  **Solution:** Built a high-volume analysis tool to create a report with timestamps and samples.\n\n- **Challenge:** `UnicodeEncodeError` on some systems.  \n  **Solution:** Removed emoji characters from print statements for script portability.\n\n---\n\n## ✨ Results \u0026 Visualizations\nThe scripts successfully:\n- Cleaned the dataset.\n- Selected **10 winners** based on a fair and weighted system.\n- Generated a comprehensive analysis report.\n\n### **Winner Engagement Comparison**\nA chart was generated to visualize the final *winning score* for each of the 10 winners, highlighting engagement differences.\n\n### **Winner Summary Table (Usernames Censored)**\n\n| Username    | Total Valid Entries | Total Likes on Entries | Final Winning Score |\n|-------------|----------------------|-------------------------|----------------------|\n| Winner 1    | 160                 | 43                      | 203                 |\n| Winner 2    | 53                  | 23                      | 76                  |\n| Winner 3    | 9                   | 9                       | 18                  |\n| Winner 4    | 26                  | 24                      | 50                  |\n| Winner 5    | 1                   | 1                       | 2                   |\n| Winner 6    | 24                  | 24                      | 48                  |\n| Winner 7    | 1                   | 0                       | 1                   |\n| Winner 8    | 272                 | 18                      | 290                 |\n| Winner 9    | 14                  | 0                       | 14                   |\n| Winner 10   | 2                   | 0                       | 2                   |\n\n---\n\n## 💻 Technologies Used\n- **Python**\n  - **Pandas:** For data manipulation, cleaning, and analysis.\n  - **Matplotlib \u0026 Seaborn:** For data visualization and generating graphs.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fomarsolieman%2Fsocialgiveawaydataanalysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fomarsolieman%2Fsocialgiveawaydataanalysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fomarsolieman%2Fsocialgiveawaydataanalysis/lists"}