---
title: "Instagram Metadata Scraper"
output: github_document
---

### OPEN QUESTIONS:

* How to obtain an exhaustive list of location IDs?
* What are the API call limits? How many calls can you make in a row? What is the reset time? Do IPs ever get blocked? (Running a network speed test to address this.)
* How to integrate proxy use with httr? 1) Obtain a list of proxies. 2) Build fault tolerance. (A sketch follows below.)
* Parallelize network requests? Perhaps launch multiple EC2 instances?
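On the proxy question, here is a minimal sketch of what httr integration might look like. It assumes a hand-maintained proxy list (the hosts and ports below are placeholders) and uses `httr::RETRY` plus `httr::use_proxy` for basic fault tolerance:

```{r, eval=FALSE}
library(httr)

# Placeholder proxy list; in practice this would come from a proxy provider.
proxies <- data.frame(
  host = c("203.0.113.10", "203.0.113.11"),
  port = c(8080, 8080)
)

request_via_proxy <- function(url) {
  for (i in seq_len(nrow(proxies))) {
    resp <- tryCatch(
      RETRY(
        "GET", url,
        use_proxy(proxies$host[i], proxies$port[i]),
        times      = 3,  # retry transient failures on each proxy
        pause_base = 2   # exponential backoff between attempts
      ),
      error = function(e) NULL
    )
    # Move on to the next proxy after a connection failure or error status
    if (!is.null(resp) && status_code(resp) < 400) return(resp)
  }
  stop("All proxies failed for: ", url)
}
```

Parallelizing across proxies (or EC2 instances) could then wrap this in `parallel::mclapply` or similar.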

# Instagram Scraper

This package focuses on metadata analysis of Instagram posts, with an emphasis on geo-location.

Unlike other Instagram scraping projects, the aim of this package is deliberately narrow:

* Search for Instagram/Facebook/Foursquare place IDs
* Retrieve metadata about the posts from that location

This package does not download the media (photos/videos) associated with Instagram posts.

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

The first function searches for location IDs matching a text string of interest. Let's search for posts around Williamsburg:

```{r, message=FALSE, warning=FALSE}
source("search-for-location-ids.R")

possible_locations <- search_for_location_ids(location = "Williamsburg, Brooklyn")

possible_locations
```

The `pk` column contains the location IDs we need to hit Instagram's GraphQL API. The correct location appears to be `242698464`. We can feed that to the subsequent function `get_posts_from_location` to retrieve metadata about the first batch of posts.

```{r}
source("scrape-posts-from-location.R")

posts <- get_posts_from_location(location_id = 242698464, after = 0)

posts
```

Pass the `end_cursor` value returned by `get_posts_from_location` as the `after` parameter in a subsequent call; that call will return the next batch of posts.
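
For example, a rough pagination loop might look like the following. This is a sketch: it assumes the return value exposes the cursor as `posts$end_cursor`, which may need adjusting to the actual return structure:

```{r, eval=FALSE}
# Sketch of a pagination loop over five batches of posts.
all_posts <- list()
cursor <- 0

for (i in 1:5) {
  batch <- get_posts_from_location(location_id = 242698464, after = cursor)
  all_posts[[i]] <- batch
  cursor <- batch$end_cursor  # assumed field name for the cursor
  Sys.sleep(2)                # small pause to stay under rate limits
}
```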