{"id":13527965,"url":"https://github.com/InstaPy/instagram-profilecrawl","last_synced_at":"2025-04-01T11:30:38.702Z","repository":{"id":39597173,"uuid":"82048422","full_name":"InstaPy/instagram-profilecrawl","owner":"InstaPy","description":"📝 quickly crawl the information (e.g. followers, tags etc...) of an instagram profile.","archived":false,"fork":false,"pushed_at":"2024-03-19T22:13:31.000Z","size":13309,"stargazers_count":1193,"open_issues_count":12,"forks_count":248,"subscribers_count":57,"default_branch":"master","last_synced_at":"2025-04-01T01:41:15.535Z","etag":null,"topics":["automation","crawler","information","instagram","instapy","python","python-script","selenium","simple"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/InstaPy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["timgrossmann"],"patreon":"timgrossmann","open_collective":"InstaPy","ko_fi":"timgrossmann","tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2017-02-15T10:25:36.000Z","updated_at":"2025-03-28T16:55:58.000Z","dependencies_parsed_at":"2022-08-28T14:50:26.000Z","dependency_job_id":"23f79c24-41d1-461a-ae34-78fb592e4489","html_url":"https://github.com/InstaPy/instagram-profilecrawl","commit_stats":{"total_commits":203,"total_committers":28,"mean_commits":7.25,"dds":0.6748768472906403,"last_synced_commit":"2ca9f2753a6577b88b9ae8fe8ec4c130f9def2db"},"previous_names":[],"tags_count":0,"template":false,"tem
plate_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InstaPy%2Finstagram-profilecrawl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InstaPy%2Finstagram-profilecrawl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InstaPy%2Finstagram-profilecrawl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InstaPy%2Finstagram-profilecrawl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/InstaPy","download_url":"https://codeload.github.com/InstaPy/instagram-profilecrawl/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246631511,"owners_count":20808696,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","crawler","information","instagram","instapy","python","python-script","selenium","simple"],"created_at":"2024-08-01T06:02:08.592Z","updated_at":"2025-04-01T11:30:38.673Z","avatar_url":"https://github.com/InstaPy.png","language":"Python","readme":"\u003cimg src=\"https://s3-eu-central-1.amazonaws.com/centaur-wp/designweek/prod/content/uploads/2016/05/11170038/Instagram_Logo-1002x1003.jpg\" width=\"200\" align=\"right\"\u003e\n\n# Instagram-Profilecrawl\n\n## Quickly crawl the information (e.g. followers, tags etc...) of an instagram profile. No login required!\nAutomation Script for crawling information from ones instagram profile.  \nLike e.g. 
the number of posts, followers, and the tags of the posts\n\n**Guide to Bot Creation: [Learn to Build your own Bots and Automations with the Creators of InstaPy](https://www.udemy.com/course/the-complete-guide-to-bot-creation/?referralCode=7418EBB47E11E34D86C9)**\n\n#### Getting started\nClone the repository:\n```bash\ngit clone https://github.com/timgrossmann/instagram-profilecrawl.git\n```\n\nThe crawler uses selenium and requests to collect the information, so install the dependencies with:\n```bash\npip install -r requirements.txt\n```\n\nCopy the `.env.example` to `.env`  \n```bash\ncp .env.example .env\n```\n\nSet your Instagram credentials in `.env`   \n```\nIG_USERNAME=\u003cYour Instagram Username\u003e\nIG_PASSWORD=\u003cYour Instagram Password\u003e\n```\n\n\nInstall the proper `chromedriver` for your operating system.  Once you [download it](https://sites.google.com/a/chromium.org/chromedriver/downloads), move it into the `instagram-profilecrawl/assets` directory.\n\n## Use it!\nNow you can start using it, following this example:\n```bash\npython3.7 crawl_profile.py username1 username2 ... usernameX\n```\n\n## Download the post images to your local machine  \n```bash\npython3.7 extract_image.py \u003ccollected_profiles_path\u003e\n```\n**Settings:**\nTo limit the number of posts analyzed, change the variable `limit_amount` in settings.py. The default value is 12000.\n\n### Optional login\nIf you want to access **more features** (such as private accounts that your own account follows), you must enter your username and password in settings.py. Remember, it's optional.\n\nHere are the steps to do so:\n1. Open Settings.py\n2. Search for `login_username` \u0026 `login_password`\n3. 
Put your information inside the quotation marks\n\nSecond option:\nAdd the settings to your script\n```python\nSettings.login_username = 'my_insta_account'\nSettings.login_password = 'my_password_xxx'\n```\n\n### Run on Raspberry Pi\nTo run the crawler on Raspberry Pi with Firefox, follow these steps:\n\n1. Install Firefox: `sudo apt-get install firefox-esr`\n2. Get the `geckodriver` as [described here](https://www.raspberrypi.org/forums/viewtopic.php?t=167292)\n3. Install `pyvirtualdisplay`: `sudo pip3 install pyvirtualdisplay`\n4. Run the script for RPi: `python3 crawl_profile_pi.py username1 username2 ...`\n\n**Collecting stats:**\n\nIf you are interested in collecting and logging stats from a crawled profile, use the `log_stats.py` script *after* running `crawl_profile.py` (or `crawl_profile_pi.py`).\nFor example, on Raspberry Pi:\n\n1. Run `python3 crawl_profile_pi.py username`\n2. Run `python3 log_stats.py -u username` for a specific user or `python3 log_stats.py` for all users\n\nThis appends the collected profile info to `stats.csv`. 
This can be useful for monitoring the growth of an Instagram account over time.\nThe logged stats are: time, username, total number of followers, following, posts, likes, and comments.\nThe two commands can be triggered using `crontab` (make sure to trigger `log_stats.py` several minutes after `crawl_profile_pi.py`).\n\n**Settings:**\n\nPath where the profile JSON files are saved:\n```python\nSettings.profile_location = os.path.join(BASE_DIR, 'profiles')\n```\nWhether the profile JSON file gets a timestamp\n```python\nSettings.profile_file_with_timestamp = True\n```\nPath where the commenters file is saved:\n```python\nSettings.profile_commentors_location = os.path.join(BASE_DIR, 'profiles')\n```\nWhether the commenters file gets a timestamp\n```python\nSettings.profile_commentors_file_with_timestamp = True\n```\n\nScrape \u0026 save the posts JSON\n```python\nSettings.scrape_posts_infos = True\n```\nMaximum number of posts to scrape\n```python\nSettings.limit_amount = 12000\n```\nWhether the comments are also saved in the JSON files\n```python\nSettings.output_comments = False\n```\nWhether the mentions in the post image are saved in the JSON files\n```python\nSettings.mentions = True\n```\nWhether the users who liked the post are saved in the JSON files\n**Attention:** this takes a lot of time. The script can load only 12 likes at once. 
It then pauses before loading more\n```python\nSettings.scrape_posts_likers = True\n```\nWhether the profile's followers are scraped\n**Attention:** the crawler must be logged in (see above), and it sometimes crashes on huge accounts\n```python\nSettings.scrape_follower = True\n```\n\nTime between post scrolling (increase if you get errors)\n```python\nSettings.sleep_time_between_post_scroll = 1.5\n```\nTime between comment scrolling (increase if you get errors)\n```python\nSettings.sleep_time_between_comment_loading = 1.5\n```\n\nOutput debug messages to the console\n```python\nSettings.log_output_toconsole = True\n```\nPath to the log file\n```python\nSettings.log_location = os.path.join(BASE_DIR, 'logs')\n```\nOutput debug messages to a file\n```python\nSettings.log_output_tofile = True\n```\nNew log file for every run\n```python\nSettings.log_file_per_run = False\n```\n\n\n\n#### The information will be saved in a JSON file in ./profiles/{username}.json\n\u003e Example of a file's data\n```\n{\n  \"alias\": \"Tim Gro\\u00dfmann\",\n  \"username\": \"grossertim\",\n  \"num_of_posts\": 127,\n  \"posts\": [\n    {\n      \"caption\": \"It was a good day\",\n      \"location\": {\n        \"location_url\": \"https://www.instagram.com/explore/locations/345421482541133/caffe-fernet/\",\n        \"location_name\": \"Caffe Fernet\",\n        \"location_id\": \"345421482541133\",\n        \"latitude\": 1.2839,\n        \"longitude\": 103.85333\n      },\n      \"img\": \"https://scontent.cdninstagram.com/t51.2885-15/e15/p640x640/16585292_1355568261161749_3055111083476910080_n.jpg?ig_cache_key=MTQ0ODY3MjA3MTQyMDA3Njg4MA%3D%3D.2\",\n      \"date\": \"2018-04-26T15:07:32.000Z\",\n      \"tags\": [\"#fun\", \"#good\", \"#goodday\", \"#goodlife\", \"#happy\", \"#goodtime\", \"#funny\", ...],\n      \"likes\": 284,\n      \"comments\": {\n        \"count\": 0,\n        \"list\": [],\n       },\n     },\n     {\n      \"caption\": \"Wild Rocket Salad with Japanese Sesame Sauce\",\n      
\"location\": {\n        \"location_url\": \"https://www.instagram.com/explore/locations/318744905241462/junior-kuppanna-restaurant-singapore/\",\n        \"location_name\": \"Junior Kuppanna Restaurant, Singapore\",\n        \"location_id\": \"318744905241462\",\n        \"latitude\": 1.31011,\n        \"longitude\": 103.85672\n      },\n      \"img\": \"https://scontent.cdninstagram.com/t51.2885-15/e35/16122741_405776919775271_8171424637851271168_n.jpg?ig_cache_key=MTQ0Nzk0Nzg2NDI2ODc5MTYzNw%3D%3D.2\",\n      \"date\": \"2018-04-26T15:07:32.000Z\",\n      \"tags\": [\"#vegan\", \"#veganfood\", \"#vegansofig\", \"#veganfoodporn\", \"#vegansofig\", ...],\n      \"likes\": 206,\n      \"comments\": {\n        \"count\": 1,\n        \"list\": [\n          {\n            \"user\": \"pastaglueck\",\n            \"comment\": \"nice veganfood\"\n           },\n         ],\n       },\n     },\n     .\n     .\n     .\n     ],\n  \"prof_img\": \"https://scontent.cdninstagram.com/t51.2885-19/s320x320/14564896_1313394225351599_6953533639699202048_a.jpg\",\n  \"followers\": 1950,\n  \"following\": 310\n}\n```\n\nThe script also collects usernames of users who commented on the posts and saves it in ./profiles/{username}_commenters.txt file, sorted by comment frequency.\n\n#### With the help of [Wordcloud](https://github.com/amueller/word_cloud) you could do something like that with your used tags\n![](https://cdn-media-1.freecodecamp.org/images/1*_odSGfGjVl36PnL4S5NXRA.png)\n\n\u003chr /\u003e\n\n###### Have Fun \u0026 Feel Free to report any 
issues\n","funding_links":["https://github.com/sponsors/timgrossmann","https://patreon.com/timgrossmann","https://opencollective.com/InstaPy","https://ko-fi.com/timgrossmann"],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FInstaPy%2Finstagram-profilecrawl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FInstaPy%2Finstagram-profilecrawl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FInstaPy%2Finstagram-profilecrawl/lists"}