Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/timkiely/scrape-instagram-by-location
R package to locate Instagram place IDs and retrieve metadata about posts from that area
- Host: GitHub
- URL: https://github.com/timkiely/scrape-instagram-by-location
- Owner: timkiely
- Created: 2017-09-08T19:53:03.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2017-09-28T23:15:48.000Z (almost 7 years ago)
- Last Synced: 2024-05-21T02:52:50.287Z (4 months ago)
- Language: R
- Size: 3.76 MB
- Stars: 14
- Watchers: 3
- Forks: 2
- Open Issues: 5
Metadata Files:
- Readme: README.Rmd
README
---
title: "Instagram Metadata Scraper"
output: github_document
---

### OPEN QUESTIONS
* How to obtain an exhaustive list of location IDs?
* What are the API call limits? How many calls can you make in a row? What's the reset time? Do IPs ever get blocked? (Running a network speed test to address this.)
* How to integrate proxy use with httr? 1) Obtain a list of proxies. 2) Build fault tolerance.
* Parallelize network requests? Maybe launch multiple EC2 instances?

# Instagram Scraper
This package focuses on metadata analysis of Instagram posts, with an emphasis on geo-location.
Unlike other Instagram scraping projects, the aim of this package is deliberately narrow:

* Search for Instagram/Facebook/Foursquare place IDs.
* Retrieve metadata about the posts from that location.

This package is not interested in downloading the media (photos/videos) associated with IG posts.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

The first function searches for location IDs using whatever text string you're interested in. Let's search for posts around Williamsburg:
```{r, message=FALSE, warning=FALSE}
source("search-for-location-ids.R")
possible_locations <- search_for_location_ids(location = "Williamsburg, Brooklyn")
possible_locations
```
The `pk` column contains the location IDs we need to hit Instagram's GraphQL API. The correct location appears to be `242698464`. We can feed that to the subsequent function `get_posts_from_location` to retrieve metadata about the first batch of posts.
```{r}
source("scrape-posts-from-location.R")
posts <- get_posts_from_location(location_id = 242698464, after = 0)
posts
```
You can pass the `end_cursor` value returned by `get_posts_from_location` as the `after` parameter of a subsequent call; that ensures the next call returns the next batch of posts.
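Chaining `end_cursor` into `after` naturally suggests a pagination loop. The sketch below shows the loop structure only; `fetch_page()` is a hypothetical stand-in for `get_posts_from_location`, and its return fields (`posts`, `end_cursor`, with `NA` marking the last page) are assumptions based on the description above, not the package's actual API.

```r
# Sketch of paginating through posts by feeding end_cursor back in as `after`.
# fetch_page() is a stub standing in for get_posts_from_location(); swap in
# the real function and adapt the field names to its actual return value.
fetch_page <- function(location_id, after) {
  # Stub: three "posts" per page, NA cursor after the third page.
  next_cursor <- if (after >= 6) NA else after + 3
  list(posts = seq(after + 1, after + 3), end_cursor = next_cursor)
}

all_posts <- list()
cursor <- 0
repeat {
  page <- fetch_page(location_id = 242698464, after = cursor)
  all_posts <- c(all_posts, list(page$posts))
  if (is.na(page$end_cursor)) break
  cursor <- page$end_cursor  # the returned cursor becomes the next `after`
  Sys.sleep(1)               # pause between requests; rate limits are unknown
}
posts <- unlist(all_posts)
length(posts)
```

The `Sys.sleep()` call is a placeholder for whatever throttling the open questions above resolve to; a real loop would also want error handling around each request.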