{"id":32470707,"url":"https://github.com/instagram-automations/instagram-web-scraper","last_synced_at":"2025-10-26T16:19:56.208Z","repository":{"id":318739655,"uuid":"1073889573","full_name":"Instagram-Automations/Instagram-web-scraper","owner":"Instagram-Automations","description":"instagram web scraper and automation toolkit","archived":false,"fork":false,"pushed_at":"2025-10-10T19:35:11.000Z","size":1437,"stargazers_count":0,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-14T19:04:41.886Z","etag":null,"topics":["anti-detect","automation","bot","cli","docker","instagram","instagram-web-scraper","nodejs","proxy","python","rate-limits","selenium","srarper","web"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Instagram-Automations.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-10T19:28:01.000Z","updated_at":"2025-10-10T19:35:14.000Z","dependencies_parsed_at":"2025-10-14T19:04:45.250Z","dependency_job_id":"1cbd8f4c-90b7-4cee-b697-8eb87ebe3529","html_url":"https://github.com/Instagram-Automations/Instagram-web-scraper","commit_stats":null,"previous_names":["instagram-automations/instagram-web-scraper"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Instagram-Automations/Instagram-web-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Instagram-Automa
tions%2FInstagram-web-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Instagram-Automations%2FInstagram-web-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Instagram-Automations%2FInstagram-web-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Instagram-Automations%2FInstagram-web-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Instagram-Automations","download_url":"https://codeload.github.com/Instagram-Automations/Instagram-web-scraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Instagram-Automations%2FInstagram-web-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281132812,"owners_count":26449083,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-26T02:00:06.575Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anti-detect","automation","bot","cli","docker","instagram","instagram-web-scraper","nodejs","proxy","python","rate-limits","selenium","srarper","web"],"created_at":"2025-10-26T16:19:50.948Z","updated_at":"2025-10-26T16:19:56.197Z","avatar_url":"https://github.com/Instagram-Automations.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# instagram web scraper\n\nA 
production-ready boilerplate to collect publicly available Instagram web data (profiles, posts, hashtags) using safe automation patterns, rotating proxies, and human-like delays. Built for agencies, researchers, and growth teams that want reliable scraping with lower block risk.\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://t.me/devpilot1\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Chat%20on-Telegram-2CA5E0?style=for-the-badge\u0026logo=telegram\u0026logoColor=white\" alt=\"Telegram\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://discord.gg/vBu9huKBvy\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Join-Discord-5865F2?style=for-the-badge\u0026logo=discord\u0026logoColor=white\" alt=\"Discord\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://wa.me/447723343390?text=Hi%20Zeeshan%2C%20I%27m%20interested%20in%20automation.\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Chat-WhatsApp-25D366?style=for-the-badge\u0026logo=whatsapp\u0026logoColor=white\" alt=\"WhatsApp\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"mailto:support@appilot.app\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Email-support@appilot.app-EA4335?style=for-the-badge\u0026logo=gmail\u0026logoColor=white\" alt=\"Gmail\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eFor discussion, queries, and freelance work — reach out 👆\u003c/strong\u003e\n\u003c/p\u003e\n\n---\n\n##  Introduction\n\u003e This repository provides a modular Instagram web scraping starter that focuses on resilience (anti-detect flows, rotating proxies, session reuse) and clarity (typed schema, storage adapters). 
It’s ideal for analysts, SaaS builders, and agencies that need compliant, rate-aware scraping of public pages.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"instagram-web-scraper.png\" alt=\"instagram-web-scraper.png\" width=\"80%\"\u003e\n\u003c/p\u003e\n\n###  Key Benefits\n1. Saves time with prebuilt Playwright/Selenium runners.  \n2. Scales from a single run to distributed jobs.  \n3. Safer with proxy rotation, backoff, fingerprint \u0026 session logic.  \n\n---\n\n## Features\n\n| Feature | Details |\n|---|---|\n| Headless/Visible Browsers | Playwright or Selenium drivers with toggleable headless mode |\n| Proxy Rotation | Supports residential/mobile proxies with per-request rotation |\n| Session Persistence | Reuse cookies/storage to reduce challenges and CAPTCHAs |\n| Human-like Throttling | Randomized delays, jitter, scrolling, and viewport variance |\n| Target Modules | Profile, posts, hashtag pages (public data) with parsers |\n| Output Formats | JSONL, CSV, SQLite/Postgres adapters |\n| Error/Retry Logic | Exponential backoff, soft-fail queues, resumable runs |\n| CLI Runner | `scrape profiles`, `scrape hashtag`, `resume` subcommands |\n| Dockerized | Reproducible runs with one-line Docker start |\n| Env-First Config | `.env` for proxies, rate limits, storage, headless flags |\n\n---\n\n##  Use Cases\n- Competitive research and trend tracking  \n- Social listening for public hashtags  \n- Creator discovery \u0026 lead lists (public info)  \n- Academic/market research on public engagement  \n\n---\n\n##  FAQs\n\n**Q:** How do I remove a scraping warning?  \n**A:** Scraping warnings (blocks/challenges) often result from aggressive request rates, reused fingerprints, or IP reputation. Reduce concurrency, add randomized delays, persist sessions, rotate high-quality residential/mobile proxies, and lower fetch depth. 
Clearing cookies blindly can worsen flags—prefer stable sessions per account/profile, rotate user-agents with consistent device signatures, and implement exponential backoff on 429 and other 4xx responses.\n\n**Q:** Does Instagram allow web scraping?  \n**A:** Accessing or collecting data is governed by Instagram’s Terms and your local laws. This boilerplate is for educational and compliance-oriented uses on publicly available pages. Always review and follow the platform’s terms and applicable regulations before running any scraper.\n\n**Q:** Can web scraping be detected?  \n**A:** Yes. Platforms detect patterns like high request rates, identical fingerprints, datacenter IPs, and scripted navigation. Mitigate via residential/mobile proxies, realistic browser automation (Playwright/Selenium), randomized timings, scroll/viewport simulation, and consistent sessions. Even with safeguards, detection risk can’t be eliminated—only reduced.\n\n---\n\n## Results\n----------------------------------- \n\u003e 10x faster posting schedules  \n\u003e 80% engagement increase on group campaigns  \n\u003e Fully automated lead response system  \n\n##  Performance Metrics\n-----------------------------------\nAverage Performance Benchmarks:  \n- **Speed:** 2x faster than manual posting  \n- **Stability:** 99.2% uptime  \n- **Ban Rate:** \u003c0.5% with safe automation mode  \n- **Throughput:** 100+ posts/hour per session\n\n---\n\n## Do you have a custom project for us?\nContact Us\n\n\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://mail.google.com/mail/u/?authuser=ahmadzee26@gmail.com\"\u003e\n    \u003cimg alt=\"Gmail\" width=\"30px\" src=\"https://edent.github.io/SuperTinyIcons/images/svg/gmail.svg\" /\u003e\n    \u003ccode\u003esupport@appilot.app\u003c/code\u003e\n  \u003c/a\u003e\n  \u003cspan\u003e ┃ \u003c/span\u003e\n  \u003ca href=\"https://t.me/devpilot1\"\u003e\n    \u003cimg alt=\"Telegram\" width=\"30px\" 
src=\"https://edent.github.io/SuperTinyIcons/images/svg/telegram.svg\" /\u003e\n    \u003ccode\u003epilot\u003c/code\u003e\n  \u003c/a\u003e\n  \u003cspan\u003e ┃ \u003c/span\u003e\n  \u003ca href=\"https://discord.com\"\u003e\n    \u003cimg alt=\"Discord\" width=\"30px\" src=\"https://github.com/Zeeshanahmad4/RealEstateMate-WhatsApp-Group-Management-Bot/blob/main/discord-icon-svgrepo-com.svg\" /\u003e\n    \u003ccode\u003ezee#2655\u003c/code\u003e\n  \u003c/a\u003e\n  \u003cspan\u003e ┃ \u003c/span\u003e\n  \u003ca href=\"https://wa.me/447723343390?text=Hi%20Zeeshan%2C%20I%27m%20interested%20in%20automation.\" target=\"_blank\"\u003e\n    \u003cimg alt=\"WhatsApp\" width=\"30px\" src=\"https://cdn.jsdelivr.net/npm/simple-icons@v11/icons/whatsapp.svg\" /\u003e\n    \u003ccode\u003ewhatsapp\u003c/code\u003e\n  \u003c/a\u003e\n  \u003cbr /\u003e\n\u003c/div\u003e\n\n---\n\n##  Installation\n\n###  Pre-requisites\n- Node.js or Python  \n- Git  \n- Docker (optional)  \n\n###  Steps\n```bash\n# Clone the repo\ngit clone https://github.com/yourusername/instagram-web-scraper.git\ncd instagram-web-scraper\n\n# Install dependencies\n# Node (Playwright)\nnpm install\nnpx playwright install\n\n# or Python (Selenium/Playwright)\npip install -r requirements.txt\n\n# Setup environment\ncp .env.example .env\n# then edit .env to set:\n# PROXY_URL=           # e.g. 
http://user:pass@host:port\n# DRIVER=playwright    # or selenium\n# HEADLESS=true\n# RATE_MIN_MS=800\n# RATE_MAX_MS=2200\n# STORAGE_DIR=.storage\n# OUT_FORMAT=jsonl     # csv|jsonl|sqlite|postgres\n\n# Run (examples)\n# Scrape a hashtag page (public)\nnpm run scrape:hashtag -- --tag \"travel\" --limit 50\n# or\npython main.py hashtag --tag \"travel\" --limit 50\n```\n\n---\n\n##  Example Output\n\n```json\n{\"type\":\"post\",\"shortcode\":\"CxyZ12A\",\"likes\":1243,\"comments\":57,\"caption\":\"Sunset shots #travel\",\"timestamp\":\"2025-10-11T14:22:10Z\",\"author\":\"@example\"}\n{\"type\":\"profile\",\"username\":\"example\",\"followers\":10422,\"following\":312,\"posts\":87,\"bio\":\"Photographer | Traveler\"}\n```\n\n---\n\n##  License\n\nMIT License\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finstagram-automations%2Finstagram-web-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finstagram-automations%2Finstagram-web-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finstagram-automations%2Finstagram-web-scraper/lists"}