https://github.com/ezzcodeezzlife/scraper-instagram
Scrape data from Instagram without applying for the authenticated API 🎯
- Host: GitHub
- URL: https://github.com/ezzcodeezzlife/scraper-instagram
- Owner: ezzcodeezzlife
- License: gpl-2.0
- Created: 2022-04-07T12:08:30.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-07-27T10:54:20.000Z (over 2 years ago)
- Last Synced: 2023-03-04T19:55:37.327Z (almost 2 years ago)
- Topics: auth, authentication, crawler, ig, instagram, instagram-api, instagram-client, instagram-scraper, javascript, js, nodejs, npm, scraper, scraper-instagram, scraping, wrapper
- Language: JavaScript
- Homepage: https://www.npmjs.com/package/scraper-instagram
- Size: 83 KB
- Stars: 13
- Watchers: 3
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
![logo](https://github.com/ezzcodeezzlife/scraper-instagram/blob/main/logo.png)
# scraper-instagram

Scrape data from Instagram without applying for the authenticated API.
## Getting started
### Prerequisites
- NodeJS
- NPM or Yarn

### Install
From [npm](https://www.npmjs.com/package/scraper-instagram)
`npm i scraper-instagram --save`
or
`yarn add scraper-instagram`
### Basic usage
```js
const Insta = require('scraper-instagram');
const InstaClient = new Insta();

InstaClient.getHashtag("javascript")
.then((hashtag) => console.log(hashtag))
.catch((err) => console.error(err));
```

Sample output (truncated):

```
...
{
shortcode: 'CbGxIdAXxA',
caption: 'Lorem ipsum #javascript',
comments: 66,
likes: 1090,
thumbnail: 'https://scontent-dus1-1.cdninstagram.com/v/123',
timestamp: 1647290186
},
...
```

# Authentication
Authentication lets you access private profiles, as long as you follow them.
##### Importing your session ID
- Go to instagram.com
- Login *(if not already logged in)*
- Open the developer tools *(`Ctrl` + `Shift` + `I`)*
- Get the `sessionid` cookie value:
  - Chromium-based browsers: `Application` tab
  - Firefox-based browsers: `Storage` tab

##### Code
```js
InstaClient.authBySessionId(yourSessionId)
.then(account => console.log(account))
.catch(err => console.error(err));
```

If authentication is successful, you'll get the form data from `accounts/edit`:
```json
{
"first_name": "",
"last_name": "",
"email": "",
"is_email_confirmed": true,
"is_phone_confirmed": true,
"username": "",
"phone_number": "",
"gender": 1,
"birthday": null,
"biography": "",
"external_url": "",
"chaining_enabled": true,
"presence_disabled": false,
"business_account": false,
"usertag_review_enabled": false
}
```

If your session ID is invalid, you'll get a `401` error.
*Username/password authentication may be supported in the future.*
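A minimal sketch of a wrapper around `authBySessionId`, assuming (this README does not confirm it) that an invalid session makes the promise reject with the bare number `401`. The helper name `authenticate` is hypothetical, and the client is passed in as a parameter so the function can be exercised without network access.

```javascript
// Hypothetical helper (not part of scraper-instagram): validates the session ID,
// calls the documented authBySessionId() method and turns a 401 rejection
// into a descriptive Error. Assumption: the rejection value is the number 401.
async function authenticate(client, sessionId) {
  if (typeof sessionId !== "string" || sessionId.trim() === "") {
    throw new Error("Missing session ID: copy the `sessionid` cookie value first.");
  }
  try {
    // Resolves with the `accounts/edit` form data on success.
    return await client.authBySessionId(sessionId.trim());
  } catch (err) {
    if (err === 401) {
      throw new Error("Invalid or expired session ID (401).");
    }
    throw err; // any other failure is passed through unchanged
  }
}
```

For example: `authenticate(InstaClient, process.env.SESSION_ID).then(account => console.log(account.username))`.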
#### Get
These methods let you fetch specific items from Instagram when you know exactly what you're looking for.
##### Error handling
`get` methods may reject with one of the following errors:
- Request error : failed to get data from Instagram (HTTP code)
- Parsing error : failed to parse data returned by Instagram (`406`)
- No content : nothing to parse (`204`)
- Authentication required : session ID required to access data (`401`)
- Too many requests : rate limit exceeded (`429`)
- Conflict : automation detected, password reset required (`409`)

## Get profile by username
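The error codes listed above apply to every getter, starting with `getProfile`. A hypothetical `describeError` helper can turn them into readable messages in a `.catch` handler. This sketch assumes rejections carry the bare numeric status code; any other value (such as a network `Error`) is reported as-is.

```javascript
// Hypothetical helper mapping the documented error codes to messages.
// Assumption: get methods reject with the bare numeric HTTP status code.
const ERROR_MESSAGES = {
  204: "No content: nothing to parse",
  401: "Authentication required: a session ID is needed",
  406: "Parsing error: unexpected data returned by Instagram",
  409: "Conflict: automation detected, password reset required",
  429: "Too many requests: rate limit exceeded",
};

function describeError(err) {
  if (typeof err === "number") {
    // Any other numeric code is treated as a generic request error.
    return ERROR_MESSAGES[err] || `Request error: HTTP ${err}`;
  }
  return err instanceof Error ? err.message : String(err);
}
```

Usage: `InstaClient.getProfile(username).then(profile => console.log(profile)).catch(err => console.error(describeError(err)));`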
```js
InstaClient.getProfile(username)
.then(profile => console.log(profile))
.catch(err => console.error(err));
```

Result
- `id` *string* - Instagram identifier, only used for stories
- `name` *string* - public full [name](https://help.instagram.com/583107688369069)
- `pic` *url* - public profile [picture](https://help.instagram.com/557544397610546)
- `bio` *string* - public biography
- `website` *url* - public website
  [more info about bio & website](https://help.instagram.com/362497417173378)
- `private` *boolean* - account [private state](https://help.instagram.com/448523408565555)
- `access` *boolean* - access to the profile's feed
  To access a private account's feed, you must have sent it a follow request that was accepted.
- `verified` *boolean* - account [verified state](https://help.instagram.com/854227311295302)
- `followers` *integer* - number of users following this profile
- `following` *integer* - number of users this profile follows
- `posts` *integer* - number of posts this profile published
- `lastPosts` *array of posts* - last posts
  This property is an empty array (`[]`) when the profile has no posts, and `null` when `access` is `false` (denied).
- `link` *url* - link to the profile's page
- `business` *string* - business category (when applicable and profile unblocked)
- `user` *object* - user-relevant properties **(while authenticated)** :
  - `mutualFollowers` *array of usernames* - people following you and this profile
  - `blocking` *boolean* - you blocked this profile
  - `blocked` *boolean* - this profile blocked you (the only property available in `user` when `true`)
  - `requesting` *boolean* - you sent a follow request to this profile (if private)
  - `requested` *boolean* - this profile sent you a follow request (if yours is private)
  - `following` *boolean* - you're following this profile
  - `followed` *boolean* - this profile follows you

## Get profile story (requires authentication)
##### Using profile ID
```js
InstaClient.getProfileStoryById(id)
.then(profile => console.log(profile))
.catch(err => console.error(err));
```

##### Using profile username (automatically requests the profile ID)
```js
InstaClient.getProfileStory(username)
.then(profile => console.log(profile))
.catch(err => console.error(err));
```

##### Result
- `unread` *boolean* - profile story is unread
- `author` *object* - a subset of profile
  - `username`
  - `pic`
- `user` *object* - user-relevant properties
  - `requesting`
  - `following`
- `items` *array of stories* - profile stories
  - `url` *string* - link to original story file (`jpg`, `mp4`, ...)
  - `type` *string* - story type : `photo` or `video`
  - `timestamp` *epoch*
  - `expirationTimestamp` *epoch*

*These methods return `null` when a profile has no story.*
Note: calling these methods does not mark the story as read.
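The `expirationTimestamp` field can be used to drop story items that have already expired. A minimal sketch with a hypothetical `activeStoryItems` helper, assuming both timestamps are Unix epochs in seconds (as the `timestamp` in the sample output suggests); it also handles the `null` returned when a profile has no story.

```javascript
// Hypothetical helper: keeps only the story items that have not expired yet.
// Assumption: `expirationTimestamp` is a Unix epoch in seconds.
function activeStoryItems(story, nowSeconds = Math.floor(Date.now() / 1000)) {
  if (story === null) return []; // the profile has no story
  return story.items.filter((item) => item.expirationTimestamp > nowSeconds);
}
```

Usage: `InstaClient.getProfileStory(username).then(story => console.log(activeStoryItems(story)))`.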
## Get hashtag
```js
InstaClient.getHashtag(hashtag)
.then(hashtag => console.log(hashtag))
.catch(err => console.error(err));
```

Result
- `pic` *url* - hashtag profile picture (how it is chosen is unclear)
- `posts` *integer* - number of posts containing this hashtag
- `featuredPosts` *array of posts* - featured posts published with this hashtag
- `lastPosts` *array of posts* - last posts published with this hashtag
  [more info about hashtag posts](https://help.instagram.com/777754038986618)
- `link` *url* - link to the hashtag's page
- `user` *object* - user-relevant properties **(while authenticated)** :
  - `following` *boolean* - you [subscribed](https://help.instagram.com/2003408499915301) to this hashtag (receiving its posts in your personal feed)

## Get location by ID
Unfortunately, an ID is currently the only way to get a location.
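Since `getLocation` needs an ID, a place name can first be resolved with `searchLocation` (see Search location), whose results include an `id`. A hypothetical sketch; picking the first search result is an assumption and may select the wrong place for ambiguous names.

```javascript
// Hypothetical helper: resolves a place name to an ID with searchLocation(),
// then fetches the full location with getLocation().
// Assumption: the first search result is the intended place.
async function getLocationByName(client, name) {
  const matches = await client.searchLocation(name);
  if (!matches || matches.length === 0) {
    throw new Error(`No location found for "${name}"`);
  }
  return client.getLocation(matches[0].id);
}
```

Usage: `getLocationByName(InstaClient, 'Paris').then(location => console.log(location))`.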
```js
InstaClient.getLocation(id)
.then(location => console.log(location))
.catch(err => console.error(err));
```

Result
- `pic` *url* - location profile pic
- `posts` *integer* - number of posts published from that location
- `address` *object*
  - `street` *string*
  - `zipCode` *string*
  - `city` *string*
  - `latitude` *float*
  - `longitude` *float*
- `website` *url* - place's website
- `phone` *string* - place's contact phone number
- `featuredPosts` *array of posts* - featured posts published from this location
- `lastPosts` *array of posts* - last posts published from this location
- `link` *url* - link to this location's page

### Array of posts
This is a subset of a full post, containing the following properties:
- `shortcode` *string* - post identifier
- `caption` *string* - post description
- `comments` *integer* - number of comments
- `likes` *integer* - number of likes
- `thumbnail` *url* - post thumbnail
  Always a lower-quality static image, whether it's a photo or a video post.

## Get post by shortcode
The shortcode is the post's identifier: the link to a post is `instagram.com/p/shortcode`.
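A hypothetical helper that extracts the shortcode from a post link of that form, using Node's built-in `URL` class. Only `instagram.com/p/...` links are recognized; any other URL shape returns `null`.

```javascript
// Hypothetical helper: extracts the shortcode from an instagram.com/p/<shortcode> link.
// Query strings are ignored; other hosts and paths return null.
function shortcodeFromUrl(link) {
  const url = new URL(link.startsWith("http") ? link : `https://${link}`);
  if (!/(^|\.)instagram\.com$/.test(url.hostname)) return null;
  const match = url.pathname.match(/^\/p\/([A-Za-z0-9_-]+)\/?$/);
  return match ? match[1] : null;
}
```

Usage: `InstaClient.getPost(shortcodeFromUrl('https://www.instagram.com/p/CbGxIdAXxA/'))`.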
```js
InstaClient.getPost(shortcode)
.then(post => console.log(post))
.catch(err => console.error(err));
```

Result
- `author` *object* - a subset of a profile's properties.
  - `username` *string*
  - `name` *string*
  - `pic` *url*
  - `verified` *boolean*
  - `link` *url*
- `location`
  - `name` *string*
  - `city` *string*
- `contents` *array of posts*
  - `type` *string* - post type : `photo` or `video`
  - `url` *string* - link to original post file (`jpg`, `mp4`, ...)
  - if `type` is `video` :
    - `thumbnail` *string* - link to thumbnail
    - `views` *integer* - number of views
- `tagged` *array of usernames* - people tagged in post contents
- `likes` *integer* - number of likes
- `caption` *string* - post description
- `hashtags` *array of hashtags* - hashtags mentioned in post description
- `mentions` *array of usernames* - people mentioned in post description
- `edited` *boolean* - caption edited
- `comments` *array of objects* (max. 40)
  - `user` *string* - comment author's username
  - `content` *string* - comment content
  - `timestamp` *epoch*
  - `hashtags` *array of hashtags*
  - `mentions` *array of usernames*
  - `likes` *integer*
- `commentCount` *integer*
- `timestamp` *epoch*
- `link` *string* - link to the post

#### Paginated getters (require authentication)
Paginated getters allow bulk data downloads.
Params :
- `maxCount` *integer* - max number of items to return
- `pageId` *string* (optional) - page navigation identifier

Result : an array with an additional `nextPageId` property
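Sample :
```js
// Assumes InstaClient is already authenticated (paginated getters require it).
// Every paginated getter chains the same way; getHashtagPosts is used here.
(async () => {
  const page0 = await InstaClient.getHashtagPosts('javascript', 50);
  const page1 = await InstaClient.getHashtagPosts('javascript', 50, page0.nextPageId);
  const page2 = await InstaClient.getHashtagPosts('javascript', 50, page1.nextPageId);
  console.log(page0.length + page1.length + page2.length);
})().catch(err => console.error(err));
```

The `pageId`/`nextPageId` value may contain a string of digits, a base64 string, or a JSON string, but it must always be left untouched.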
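The manual chaining above generalizes to a loop. This sketch assumes (the README does not state it) that the last page has no `nextPageId`. The helper name `collectPages` is hypothetical, and `maxPages` caps the number of requests to limit the risk of hitting the `429` rate limit.

```javascript
// Hypothetical helper: walks any paginated getter page by page.
// Assumptions: each page is an array with a `nextPageId` property,
// and a missing/empty `nextPageId` means there are no more pages.
async function collectPages(getter, id, pageSize, maxPages) {
  const items = [];
  let pageId; // undefined for the first page
  for (let i = 0; i < maxPages; i++) {
    const page = await getter(id, pageSize, pageId);
    items.push(...page);
    if (!page.nextPageId) break;
    pageId = page.nextPageId; // passed back untouched, as required
  }
  return items;
}
```

Usage: `collectPages(InstaClient.getHashtagPosts.bind(InstaClient), 'javascript', 50, 3)`. Binding keeps `this` pointing at the client in case the method relies on it.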
## Get profile posts
Result in array : full post object
##### Using profile ID
```js
InstaClient.getProfilePostsById(profileId, maxCount, pageId)
.then(posts => console.log(posts))
.catch(err => console.error(err));
```

##### Using profile username (automatically requests the profile ID)
```js
InstaClient.getProfilePosts(profileUsername, maxCount, pageId)
.then(posts => console.log(posts))
.catch(err => console.error(err));
```

## Get post comments
```js
InstaClient.getPostComments(shortcode, maxCount, pageId)
.then(comments => console.log(comments))
.catch(err => console.error(err));
```

Result in array : comment object
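Comment objects can be post-processed locally without extra requests. A hypothetical sketch that keeps the most-liked comments, assuming each object exposes the `likes` and `timestamp` fields described for `getPost` comments:

```javascript
// Hypothetical helper: returns the `n` most-liked comments, newest first on ties.
// The input array is copied, so the original order is preserved.
function topComments(comments, n) {
  return [...comments]
    .sort((a, b) => b.likes - a.likes || b.timestamp - a.timestamp)
    .slice(0, n);
}
```

Usage: `InstaClient.getPostComments(shortcode, 100).then(comments => console.log(topComments(comments, 5)))`.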
## Get hashtag posts
```js
InstaClient.getHashtagPosts(hashtag, maxCount, pageId)
.then(posts => console.log(posts))
.catch(err => console.error(err));
```

Result in array : partial post object
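Partial post objects carry `likes` and `comments` counts, so they can be ranked without fetching full posts. A sketch; the likes-plus-comments score and the `rankByEngagement` name are illustrative choices, not something the library provides.

```javascript
// Hypothetical helper: ranks partial posts by likes + comments (highest first).
// Each returned object is a copy with an added `engagement` field.
function rankByEngagement(posts) {
  return posts
    .map((post) => ({ ...post, engagement: post.likes + post.comments }))
    .sort((a, b) => b.engagement - a.engagement);
}
```

Usage: `InstaClient.getHashtagPosts('javascript', 50).then(posts => console.log(rankByEngagement(posts)[0]))`.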
## Get location posts
```js
InstaClient.getLocationPostsById(locationId, maxCount, pageId)
.then(posts => console.log(posts))
.catch(err => console.error(err));
```

Result in array : partial post object
#### Search
## Search profile
```js
InstaClient.searchProfile(query)
.then(profiles => console.log(profiles))
.catch(err => console.error(err));
```

Result in array : a subset of profile.
- `username`
- `name`
- `pic`
- `private`
- `verified`
- `followers`
- `user`
  - `following`

## Search hashtag
```js
InstaClient.searchHashtag(hashtag)
.then(hashtags => console.log(hashtags))
.catch(err => console.error(err));
```

Result in array : a subset of hashtag.
- `name`
- `posts`

## Search location
```js
InstaClient.searchLocation(location)
.then(locations => console.log(locations))
.catch(err => console.error(err));
```

Result in array : a subset of location.
- `id`
- `name`
- `address`
  - `street`
  - `city`
  - `latitude`
  - `longitude`

#### Subscribe to posts
- `options` *object* (optional)
  - `interval` *integer* (optional) - time in seconds between requests. **Default : 30**
  - `lastPostShortcode` *string* (optional) - shortcode to start from, if not the next one to be published.
  - `fullPosts` *boolean* (optional) - fetch full post data (requires an additional request)

##### From user
```js
InstaClient.subscribeUserPosts(username, (post, err) => {
if(post)
console.log(post.shortcode);
else
console.error(err);
}, {
interval,
lastPostShortcode,
fullPosts
});
```

##### From hashtag
```js
InstaClient.subscribeHashtagPosts(hashtag, (post, err) => {
if(post)
console.log(post.shortcode);
else
console.error(err);
}, {
interval,
lastPostShortcode,
fullPosts
});
```

#### Account requests (user-relevant methods)
## Get account notifications
```js
InstaClient.getAccountNotifications()
.then(notifications => console.log(notifications))
.catch(err => console.error(err));
```

Result in array : notification
- `id` *string* - Notification identifier
- `timestamp` *epoch*
- `type` *string* - Notification type : `like`, `mention`, `comment`, `follow`
- `post`
  - `shortcode`
  - `thumbnail`
- `by`
  - `username`
  - `name`
  - `pic`
- `content` *string* - Comment content (when applicable)

##### Subscribe to account notifications
- `options` *object* (optional)
  - `interval` *integer* (optional) - time in seconds between requests. **Default : 30**
  - `lastNotificationId` *string* (optional) - Notification ID

```js
InstaClient.subscribeAccountNotifications((notification, err) => {
if(notification)
console.log(notification.id);
else
console.error(err);
}, {
interval,
lastNotificationId
});
```

## Get account stories
```js
InstaClient.getAccountStories()
.then(stories => console.log(stories))
.catch(err => console.error(err));
```

Result in array : inbox-like
- `unread`
- `author` *object* - a subset of a profile's properties.
  - `id`
  - `username`
  - `pic`
- `user` *object* - user relevant properties
  - `requesting`
  - `following`

### Test
- `git clone https://github.com/ezzcodeezzlife/ig-scraper.git`
- `yarn install` or `npm install`
- `yarn test` or `npm run test`

Optional environment variables for more complete testing :
- `SESSION_ID` : a session ID for authentication test and authenticated tests
- `PUBLIC_PROFILE` : a public profile to access
- `PRIVATE_PROFILE` : a private profile to access
- `STORY_PROFILE_ID` : a profile ID with a story to read
- `STORY_PROFILE_USERNAME` : a profile username with a story to read
- `HASHTAG` (default value : `cat`) : a hashtag to fetch
- `LOCATION_ID` (default value : `6889842`, i.e. Paris) : a location to fetch
- `POST` : a post to fetch
- `SEARCH_PROFILE` : a profile to search for
- `SEARCH_HASHTAG` (default value : `cats`) : a hashtag to search for
- `SEARCH_LOCATION` (default value : `Paris`) : a location to search for

Methods not covered by tests :
- `subscribeUserPosts`
- `subscribeHashtagPosts`
- `subscribeAccountNotifications`
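The optional environment variables and defaults listed above could be read in a test setup along these lines. This is a sketch, not the project's actual test code; the `testConfig` name and the returned keys are hypothetical.

```javascript
// Hypothetical sketch (not the project's test code): reads the optional
// environment variables, applying the documented defaults.
function testConfig(env = process.env) {
  return {
    sessionId: env.SESSION_ID || null,
    publicProfile: env.PUBLIC_PROFILE || null,
    privateProfile: env.PRIVATE_PROFILE || null,
    storyProfileId: env.STORY_PROFILE_ID || null,
    storyProfileUsername: env.STORY_PROFILE_USERNAME || null,
    hashtag: env.HASHTAG || "cat",
    locationId: env.LOCATION_ID || "6889842", // Paris
    post: env.POST || null,
    searchProfile: env.SEARCH_PROFILE || null,
    searchHashtag: env.SEARCH_HASHTAG || "cats",
    searchLocation: env.SEARCH_LOCATION || "Paris",
    // Authenticated tests only make sense when a session ID is provided.
    authenticated: Boolean(env.SESSION_ID),
  };
}
```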