{"id":14976031,"url":"https://github.com/aahouzi/instagram-scraper-2021","last_synced_at":"2025-10-27T17:30:39.592Z","repository":{"id":47106993,"uuid":"328824003","full_name":"aahouzi/Instagram-Scraper-2021","owner":"aahouzi","description":"Scrape Instagram content and stories, using a new technique based on the har file (No Token + No public API).","archived":false,"fork":false,"pushed_at":"2022-08-22T14:11:03.000Z","size":1182,"stargazers_count":111,"open_issues_count":2,"forks_count":12,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-10-11T13:04:00.210Z","etag":null,"topics":["browsermob-proxy","data","facebook","facebook-graph-api","graphql-api","instagram","instagram-api","instagram-bot","instagram-crawler","instagram-feed","instagram-scraper","instagram-stories","meta","scraper","selenium","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aahouzi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-01-12T00:06:26.000Z","updated_at":"2024-09-18T15:36:37.000Z","dependencies_parsed_at":"2022-08-12T13:11:38.332Z","dependency_job_id":null,"html_url":"https://github.com/aahouzi/Instagram-Scraper-2021","commit_stats":null,"previous_names":["aahouzi-intel/instagram-scraper-2021","aahouzi/instagram-scraper-2021"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aahouzi%2FInstagram-Scraper-2021","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aahouzi%2FInstagram-Scraper-2021/tags","releases_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/aahouzi%2FInstagram-Scraper-2021/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aahouzi%2FInstagram-Scraper-2021/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aahouzi","download_url":"https://codeload.github.com/aahouzi/Instagram-Scraper-2021/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":219860931,"owners_count":16556009,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["browsermob-proxy","data","facebook","facebook-graph-api","graphql-api","instagram","instagram-api","instagram-bot","instagram-crawler","instagram-feed","instagram-scraper","instagram-stories","meta","scraper","selenium","webscraping"],"created_at":"2024-09-24T13:53:11.207Z","updated_at":"2025-10-27T17:30:34.258Z","avatar_url":"https://github.com/aahouzi.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Scrape Instagram content \u0026 stories | 2021 version.\n\n## :monocle_face: Description\n- This project enables the user to scrape all content and feed of a public instagram page, as well as the stories anonymously given the username\n or hashtag of the account.\u003c/br\u003e\n\n- In 2021, Instagram made it even more difficult to scrape data from its graphql API. 
Even though there are many open-source projects that enable you to\n scrape content from Instagram, many of those projects no longer work or work only partially, and get you only a small portion of the data you need.\n \n- In this project, I used a **new technique** based on **the har file**. This file contains all the GET requests sent by Instagram to its graphql API,\nand by getting access to this file we can capture all the precious json files containing all the data we need to scrape.\n\n\n## :rocket: Repository Structure\nThe repository contains the following files \u0026 directories:\n- **scraper/insta_feed_scraper.py:** Scrape content/feed from a user's public page.\n- **scraper/insta_story_scraper.py:** Scrape stories from a user's public page.\n- **scraper/insta_hashtag_scraper.py:** Scrape content from a hashtag page.\n- **data_analysis.ipynb:** Contains some data analysis for the scraped **nike page** feed.\n\n## :scroll: Scraping process\n\n- Before executing the code, the user needs to get **browsermob-proxy-2.1.4** from [here](https://bmp.lightbody.net/) and put it in the project directory.\nThis proxy will help us get access to the har file during the execution with Selenium.\n\n- **Scraping stories** is an easy task, since we don't need to analyze graphql responses or get the har file. We only need to open Instagram and get every story using\nits XPath with Selenium.\n\n- **For scraping content**, the user is asked to enter the username or hashtag they want to scrape, then the program goes\ndirectly to the username page. **However**, sometimes Instagram blocks direct access to public pages, and asks the user to log in. In this case, the program enters the credentials of\na user account that was created for scraping purposes. After getting access to the page we want to scrape, **Selenium** executes JavaScript\ncode that keeps scrolling down until all the content is loaded. 
After this step, we analyze the resulting har file to extract all\ngraphql responses in json format. Finally, we loop through every response to get all the information we need. Here's a small demo of scraping the\n **nike** page feed:\n \n\n![](https://j.gifs.com/k8YDNX.gif)\n\n\n## :bulb: Scraping comments\nAn improvement for this project would be to use the same har file technique to scrape all comments given the link to a publication. It can be easily\nimplemented using the same strategy: \"We start by opening the publication (Format: https://www.instagram.com/p/***********), scrolling up through the comments and\nclicking the plus button each time to load more comments\". The more we click the plus button, the more graphql responses we collect,\nand so the more comments (12 comments per graphql response). **However**, scraping comments will take much more time than scraping content, since a publication can have thousands\nof comments, and getting 12 comments per graphql response is time-consuming.\n\n\n\n## :mailbox_closed: License \u0026 Contact\nThis code is free to use, share and modify for any non-commercial purpose; any commercial use is strictly prohibited without the authors' consent.\nThis project is for educational purposes, and has no intent to mess with Instagram's policies concerning data privacy.\nFor any information, feedback or questions, please [contact me][anas-email]\n\n[anas-email]: mailto:ahouzi2000@hotmail.fr\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faahouzi%2Finstagram-scraper-2021","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faahouzi%2Finstagram-scraper-2021","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faahouzi%2Finstagram-scraper-2021/lists"}