{"id":28240750,"url":"https://github.com/flute/instagram-crawler","last_synced_at":"2025-06-11T00:32:37.821Z","repository":{"id":41761440,"uuid":"150254713","full_name":"flute/instagram-crawler","owner":"flute","description":"instagram crawler, downloads all video and photos from users or tags","archived":false,"fork":false,"pushed_at":"2022-12-10T07:30:57.000Z","size":44,"stargazers_count":8,"open_issues_count":5,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-04T17:15:44.901Z","etag":null,"topics":["crawler","instagram","instagram-crawler","instagram-downloader"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/flute.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-09-25T11:29:43.000Z","updated_at":"2023-03-07T04:00:46.000Z","dependencies_parsed_at":"2023-01-25T23:15:32.458Z","dependency_job_id":null,"html_url":"https://github.com/flute/instagram-crawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flute%2Finstagram-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flute%2Finstagram-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flute%2Finstagram-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flute%2Finstagram-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/flute","download_url":"https://codeload.github.com/flute/instagram-crawler/tar.gz/r
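efs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flute%2Finstagram-crawler/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259175571,"owners_count":22817021,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","instagram","instagram-crawler","instagram-downloader"],"created_at":"2025-05-19T04:11:57.730Z","updated_at":"2025-06-11T00:32:37.813Z","avatar_url":"https://github.com/flute.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"### Instagram content crawler\n\n### Note\n\nBecause of the firewall, code running locally can still pull data through the API but cannot download the media files. To download them, use a system-wide proxy or run the crawler on a server outside the firewall.\n\n### Running\n\nEdit `app.js`:\n\n* `users`: the list of users to crawl\n* `tags`: the array of tags to crawl\n* `purePage`: the number of items returned per page, at most 50\n* `userCookie`, `tagCookie`: after logging in to Instagram, visit a user profile page and a tag page respectively, copy the cookie from each, and save both in `app.js`.\n\nRun the code with `node app.js`, or keep it running under pm2: `pm2 start app.js --name 'Instagram'`.\n\nDownloaded videos, images, and JSON files are stored in the corresponding subdirectories of `downloads`; the complete logs are in the `logs` directory.\n\n### How it works\n\n(1) Login information is required: every crawl request must carry the `cookie` as well as a `user-agent`.\n\n(2) Both the data API and the downloads are rate limited. Back-to-back requests (a few hundred resources) get throttled; once throttled, the crawler sleeps for a while and then continues.\n\n(3) There are two crawl entry points:\n\n* all resources posted by a given user\n* all resources under a given tag\n\nThe two entry points use different cookies and different request URLs.\n\n(4) Crawl steps:\n\n1. Log in to Instagram in a desktop browser and save the `cookie`, `query_hash`, and `user-agent`. All subsequent requests carry the `cookie` and `user-agent`.\n2. Request the profile page or tag page and parse the HTML to get the userId or tag name, along with the first page of data and the cursor for the next page.\n3. Call the API repeatedly with the `cursor` to fetch the remaining pages. Downloading starts once all data has been fetched.\n4. 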
In the returned data, images can be downloaded directly. For videos, a further request to the video address endpoint is needed to obtain the video URL before downloading.\n\n(5) Data request endpoints:\n\nuser:\n\n```\nhttps://www.instagram.com/graphql/query/?query_hash=a5164aed103f24b03e7b7747a2d94e3c\u0026variables=%7B%22id%22%3A%22%s%22%2C%22first%22%3A${purePage}%2C%22after%22%3A%22%s%22%7D\n```\ntag:\n\n```\nhttps://www.instagram.com/graphql/query/?query_hash=1780c1b186e2c37de9f7da95ce41bb67\u0026variables=%7B%22tag_name%22%3A%22%s%22%2C%22first%22%3A${purePage}%2C%22after%22%3A%22%s%22%7D\n```\n\nVideo address endpoint:\n\n```\nhttps://www.instagram.com/p/%s/?__a=1\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflute%2Finstagram-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fflute%2Finstagram-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflute%2Finstagram-crawler/lists"}