{"id":24890737,"url":"https://github.com/skulltech/arachnid","last_synced_at":"2025-06-13T17:06:49.380Z","repository":{"id":95312094,"uuid":"134472151","full_name":"skulltech/arachnid","owner":"skulltech","description":"Crawling Instagram for reasons.","archived":false,"fork":false,"pushed_at":"2018-05-29T10:59:14.000Z","size":47,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-27T12:48:53.194Z","etag":null,"topics":["crawler","instagram","instagram-scraper","python3","scraper","scrapy"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/skulltech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-05-22T20:28:05.000Z","updated_at":"2020-03-16T08:48:23.000Z","dependencies_parsed_at":"2023-07-15T21:00:30.205Z","dependency_job_id":null,"html_url":"https://github.com/skulltech/arachnid","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/skulltech/arachnid","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skulltech%2Farachnid","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skulltech%2Farachnid/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skulltech%2Farachnid/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skulltech%2Farachnid/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/skulltech",
"download_url":"https://codeload.github.com/skulltech/arachnid/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skulltech%2Farachnid/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259686086,"owners_count":22896050,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","instagram","instagram-scraper","python3","scraper","scrapy"],"created_at":"2025-02-01T17:17:39.645Z","updated_at":"2025-06-13T17:06:49.371Z","avatar_url":"https://github.com/skulltech.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# arachnid\nCrawling Instagram for reasons.\n\n## [Why](http://knowyourmeme.com/memes/why-is-gamora) is arachnid?\n\nThe search feature on Instagram does not support searching for keywords in users' bio, it only looks through the usernames. The only way to circumvent this would be making a local database of the users' info. Unfortunately, recently Instagram has disabled reading public content on a user's behalf through their API platform, see [this](https://raw.githubusercontent.com/SkullTech/arachnid/master/images/instagram-notice.png) for reference. So the only way left to accomplish our task would be scraping. And this is where `arachnid` comes in.\n\nArachnid is a scraper built using the powerful [`scrapy`](https://scrapy.org/) framework. Just give it a list of usernames to start scraping from, the rest will be taken care of ;)\n\n## Installation\n\nClone the repo using `git`. 
The only requirements to run this program are [`python3`](https://www.python.org/) and [`scrapy`](https://scrapy.org/). You can install `scrapy` using [`pip`](https://pip.pypa.io/en/stable/).\n\n```console\nsumit@HAL9000:~$ git clone https://github.com/SkullTech/arachnid.git\nCloning into 'arachnid'...\nremote: Counting objects: 54, done.\nremote: Compressing objects: 100% (33/33), done.\nremote: Total 54 (delta 17), reused 50 (delta 16), pack-reused 0\nUnpacking objects: 100% (54/54), done.\nsumit@HAL9000:~$ cd arachnid/\nsumit@HAL9000:~/arachnid$ pip3 install -r requirements.txt \n...\n```\n\n## Usage\n\n```console\nsumit@HAL9000:~/arachnid$ scrapy crawl arachnid -a profiles=summit.ghosh,ank.it42 -a linkings=comments,tags -o out.jl\n```\n\nThe details of the crawled profiles will be saved in the _JSONL_ file specified with the `-o` flag, which is _out.jl_ in the example above.\n\n__Spider arguments__, supplied to scrapy using the `-a` flag.\n- `profiles`: Comma-separated list of usernames from which the crawler starts scraping. This argument is mandatory.\n- `linkings`: Comma-separated list of the linkings the crawler follows to find more usernames to crawl. Choose from `['comments', 'tags', 'likes']`. By default, all three are used.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskulltech%2Farachnid","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fskulltech%2Farachnid","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskulltech%2Farachnid/lists"}