https://github.com/refeed/scrapy_facebooker

Collection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.

Host: GitHub
URL: https://github.com/refeed/scrapy_facebooker
Owner: refeed
License: mit
Created: 2017-07-15T12:51:11.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2018-12-20T20:27:24.000Z (over 7 years ago)
Last Synced: 2025-06-29T16:14:40.821Z (12 months ago)
Topics: facebook, scraper, scraping, scrapy, spider
Language: HTML
Homepage:
Size: 56.6 KB
Stars: 26
Watchers: 3
Forks: 6
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

# scrapy_facebooker

[![Build Status](https://travis-ci.org/refeed/scrapy_facebooker.svg?branch=master)](https://travis-ci.org/refeed/scrapy_facebooker)

`scrapy_facebooker` is a collection of Scrapy spiders that scrape posts,
images, and other content from public Facebook Pages.

These spiders are intended for archiving public Facebook Pages. Use them at
your own risk!

Some spiders work without a Facebook account, while others only work with a
Facebook Graph API access token.

## How to prepare

Before using these spiders, install their dependencies. A single command
installs all of them:

```
pip install -r requirements.txt
```

This project is intended to run on Python 3.

## How to run

To run a spider, first choose which one to use. The spiders available in this
project live in `/scrapy_facebooker/spiders/`.

For example, to run the `facebook_post` spider against the public Facebook
Page with the username `RHWEBsites` and write the output to a file named
`output.json`:

```
$ scrapy crawl facebook_post -a target_username=RHWEBsites -o output.json
```
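
Once the crawl finishes, the exported file can be inspected with a few lines of Python. The sketch below assumes `output.json` was produced with `-o output.json`, in which case Scrapy's JSON exporter writes a single JSON array of items. The item fields depend on the spider and are not documented in this README, so the code only counts the items and reports whichever field names it finds. The helper name `summarize_items` is illustrative and not part of this project.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_items(path):
    """Return (item_count, Counter of field names) for a Scrapy JSON export.

    Illustrative helper, not part of scrapy_facebooker.
    """
    # Scrapy's JSON feed exporter writes one JSON array holding every item.
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(items, list):
        raise ValueError(
            f"expected a JSON array of items, got {type(items).__name__}"
        )
    # Count how many items carry each field; field names vary by spider.
    fields = Counter(
        key for item in items if isinstance(item, dict) for key in item
    )
    return len(items), fields
```

Calling `summarize_items("output.json")` from the directory where the crawl ran returns the number of scraped items and a per-field count. Note that `-o` appends to an existing file, so running the same command twice against one `output.json` can leave invalid JSON; deleting the file before a new crawl avoids that.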

The following spiders are available in this repository:

- `facebook_event_graph`
- `facebook_post_graph`
- `facebook_photo_graph`
- `facebook_video_graph`
- `facebook_event`
- `facebook_post`
- `facebook_photo`
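
To run several of the spiders listed above against the same page, the `scrapy crawl` invocation can be assembled programmatically. This is a minimal sketch under stated assumptions: the helper `build_crawl_command` is hypothetical, and it assumes every spider accepts the `target_username` argument, which this README only shows for `facebook_post`. It does not supply a Graph API access token, because the README does not say how the token is passed.

```python
import shlex

# Spider names exactly as listed in this README.
SPIDERS = (
    "facebook_event_graph",
    "facebook_post_graph",
    "facebook_photo_graph",
    "facebook_video_graph",
    "facebook_event",
    "facebook_post",
    "facebook_photo",
)


def build_crawl_command(spider, target_username, output):
    """Build the argv list for `scrapy crawl` (hypothetical helper)."""
    if spider not in SPIDERS:
        raise ValueError(f"unknown spider: {spider!r}")
    # Assumption: each spider takes `target_username`, as `facebook_post` does.
    return [
        "scrapy", "crawl", spider,
        "-a", f"target_username={target_username}",
        "-o", output,
    ]


command = build_crawl_command("facebook_post", "RHWEBsites", "output.json")
print(shlex.join(command))
# prints: scrapy crawl facebook_post -a target_username=RHWEBsites -o output.json
```

The returned list can be passed to `subprocess.run(command, check=True)` from the project root. Validating the name against `SPIDERS` first turns a typo into an immediate `ValueError` rather than a failed crawl.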

## License

The license is available in `LICENSE.txt` at the root of this project.
