https://github.com/shaikhsajid1111/facebook_page_scraper

Scrapes the front end of Facebook pages with no limitations and can turn the data into structured JSON or CSV.

Host: GitHub
URL: https://github.com/shaikhsajid1111/facebook_page_scraper
Owner: shaikhsajid1111
License: mit
Created: 2020-12-21T06:44:22.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2024-07-14T12:41:22.000Z (about 1 year ago)
Last Synced: 2024-11-29T09:10:09.122Z (10 months ago)
Topics: csv, facebook, facebook-apis, facebook-page, facebook-page-post, facebook-page-post-scraper, facebook-page-scraper, facebook-scraper, fb, fb-scrapper, hacktoberfest, open-source, python, scraper, selenium, social-media, web-scraper, web-scraping
Language: Python
Homepage: https://pypi.org/project/facebook-page-scraper/
Size: 98.6 KB
Stars: 239
Watchers: 9
Forks: 66
Open Issues: 76
Metadata Files:
- Readme: README.md
- Changelog: changelog.MD
- Contributing: CONTRIBUTING.md
- License: LICENSE


Facebook Page Scraper

[![Maintenance](https://img.shields.io/badge/Maintained-Yes-green.svg)](https://github.com/shaikhsajid1111/facebook_page_scraper/graphs/commit-activity)
[![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://opensource.org/licenses/MIT) [![Python >=3.6.9](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/release/python-360/)

No API key is needed, and there is no limit on the number of requests. Import the library and just do it!

Table of Contents

- Getting Started
  - Prerequisites
  - Installation
    - Installing from source
    - Installing with PyPI
- Usage
  - How to instantiate?
  - Parameters for Facebook_scraper()
  - Scrape in JSON format
    - JSON Output Format
  - Scrape in CSV format
    - Parameters for scrap_to_csv() method
  - Keys of the output data
- Tech
- License

Prerequisites

- Internet Connection
- Python 3.7+
- Chrome or Firefox browser installed on your machine

Installation:

Installing from source:

```
git clone https://github.com/shaikhsajid1111/facebook_page_scraper
```

Inside the project's directory:

```
python3 setup.py install
```

Installing with PyPI:

```
pip3 install facebook-page-scraper
```

How to use?

```python
# import the Facebook_scraper class from facebook_page_scraper
import os

from facebook_page_scraper import Facebook_scraper

# instantiate the Facebook_scraper class
page_or_group_name = "Meta"
posts_count = 10
browser = "firefox"
proxy = "IP:PORT"  # if the proxy requires authentication, use user:password@IP:PORT
timeout = 600  # 600 seconds
headless = True
# read the login credentials from environment variables
fb_password = os.getenv('fb_password')
fb_email = os.getenv('fb_email')
# indicates whether the Facebook target is a group (True) or a page (False)
isGroup = False
# to scrape while logged in, also pass username=fb_email, password=fb_password
meta_ai = Facebook_scraper(page_or_group_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless, isGroup=isGroup)

```

Parameters for the `Facebook_scraper` class

| Parameter Name | Parameter Type | Description |
| --- | --- | --- |
| page_or_group_name | String | Name of the Facebook page or group |
| posts_count | Integer | Number of posts to scrape. If not passed, the default is 10 |
| browser | String | Which browser to use, either chrome or firefox. If not passed, the default is chrome |
| proxy (optional) | String | Proxy to use, if any. If the proxy requires authentication, the format is user:password@IP:PORT |
| timeout | Integer | Maximum time, in seconds, the bot should run for. If not passed, the default is 600 seconds (10 minutes) |
| headless | Boolean | Whether to run the browser in headless mode. Default is True |
| isGroup | Boolean | Whether the Facebook target is a group (True) or a page (False). Default is False |
| username | String | Username used to log into Facebook when scraping (reading it from a .env file is recommended) |
| password | String | Password used to log into Facebook when scraping (reading it from a .env file is recommended) |

⚠️ Warning: Use Logged-In Scraping at Your Own Risk ⚠️

Using logged-in scraping methods may result in the permanent suspension of your account. Proceed with caution, as violating a platform's terms of service can lead to severe consequences. Exercise discretion and adhere to ethical practices when collecting data through scraping. The library/provider assumes no responsibility for any consequences resulting from the misuse of scraping methods.
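One way to keep credentials out of source code is to collect the constructor arguments from environment variables. The sketch below is illustrative only: `build_scraper_kwargs` is a hypothetical helper, not part of the library, and the variable names `fb_email` and `fb_password` simply follow the usage example above. It assumes the keyword names `username` and `password` from the parameter list.

```python
import os


def build_scraper_kwargs(environ=None, is_group=False, headless=True, timeout=600):
    """Collect keyword arguments for Facebook_scraper (hypothetical helper).

    Credentials are included only when both variables are set, so an
    incomplete configuration falls back to logged-out scraping.
    """
    env = os.environ if environ is None else environ
    kwargs = {"timeout": timeout, "headless": headless, "isGroup": is_group}
    email = env.get("fb_email")
    password = env.get("fb_password")
    if email and password:
        kwargs["username"] = email
        kwargs["password"] = password
    return kwargs


# Example with an explicit mapping instead of the real environment.
print(build_scraper_kwargs({"fb_email": "user@example.com", "fb_password": "secret"}))
```

The resulting dictionary could then be unpacked into the constructor, for example `Facebook_scraper(page_or_group_name, posts_count, browser, **kwargs)`.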

Done with instantiation? Let the scraping begin!

To get post data in JSON format:

```python
#call the scrap_to_json() method

json_data = meta_ai.scrap_to_json()
print(json_data)

```

Output:

```javascript

{
"2024182624425347": {
"name": "Meta AI",
"shares": 0,
"reactions": {
"likes": 154,
"loves": 19,
"wow": 0,
"cares": 0,
"sad": 0,
"angry": 0,
"haha": 0
},
"reaction_count": 173,
"comments": 2,
"content": "We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code: https://ai.facebook.com/…/the-first-high-performance-self-s…",
"posted_on": "2022-01-20T22:43:35",
"video": [],
"image": [
"https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71"
],
"post_url": "https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARBoSaQ-pAC_ApucZNHZ6R-BI3YUSjH4sXsfdZRQ2zZFOwgWGhjt6dmg0VOcmGCLhSFyXpecOY9g1A94vrzU_T-GtYFagqDkJjHuhoyPW2vnkn7fvfzx-ql7fsBYxL5DgQVSsiC1cPoycdCvHmi6BV5Sc4fKADdgDhdFvVvr-ttzXG1ng2DbLzU-XfSes7SAnrPs-gxjODPKJ7AdqkqkSQJ4HrsLgxMgcLFdCsE6feWL7rXjptVWegMVMthhJNVqO0JHu986XBfKKqB60aBFvyAzTSEwJD6o72GtnyzQ-BcH7JxmLtb2_A&__tn__=-R"
}, ...

}
```

Output Structure for JSON format:

```javascript
{
"id": {
"name": string,
"shares": integer,
"reactions": {
"likes": integer,
"loves": integer,
"wow": integer,
"cares": integer,
"sad": integer,
"angry": integer,
"haha": integer
},
"reaction_count": integer,
"comments": integer,
"content": string,
"video" : list,
"image" : list,
"posted_on": datetime, //string containing datetime in ISO 8601
"post_url": string
}
}

```
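Once the data is available, it can be processed with the standard library. The following is a minimal sketch that ranks posts by `reaction_count`. The helpers `load_posts` and `top_posts` are hypothetical names introduced here, and the sketch assumes `scrap_to_json()` returns either a JSON string or an already-parsed dictionary in the structure shown above.

```python
import json


def load_posts(json_data):
    """Return the scraped posts as a dict keyed by post id.

    Accepts either a JSON string or an already-parsed dict, since the
    return type of scrap_to_json() is an assumption here.
    """
    if isinstance(json_data, (str, bytes)):
        return json.loads(json_data)
    return dict(json_data)


def top_posts(posts, n=3):
    """Return (post_id, reaction_count) pairs, most-reacted first."""
    ranked = sorted(
        posts.items(),
        key=lambda item: item[1].get("reaction_count", 0),
        reverse=True,
    )
    return [(post_id, post.get("reaction_count", 0)) for post_id, post in ranked[:n]]


# Tiny hand-written sample in the documented shape (not real scraped data).
sample = json.dumps({
    "111": {"name": "Example Page", "reaction_count": 5, "comments": 1},
    "222": {"name": "Example Page", "reaction_count": 12, "comments": 0},
})
print(top_posts(load_posts(sample), n=1))
```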

To save post data directly to a CSV file:

```python
#call scrap_to_csv(filename,directory) method

filename = "data_file"  # file name without the .csv extension, where the data will be saved
directory = r"E:\data"  # directory where the CSV file will be saved (raw string keeps the backslash literal)
meta_ai.scrap_to_csv(filename, directory)

```

Content of `data_file.csv`:

```csv
id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,comments,content,posted_on,video,image,post_url
2024182624425347,Meta AI,0,154,19,0,0,0,0,0,173,2,"We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code: https://ai.facebook.com/…/the-first-high-performance-self-s…",2022-01-20T22:43:35,,https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71,https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARAse4eiZmZQDOZumNZEDR0tQkE5B6g50K6S66JJPccb-KaWJWg6Yz4v19BQFSZRMd04MeBmV24VqvqMB3oyjAwMDJUtpmgkMiITtSP8HOgy8QEx_vFlq1j-UEImZkzeEgSAJYINndnR5aSQn0GUwL54L3x2BsxEqL1lElL7SnHfTVvIFUDyNfAqUWIsXrkI8X5KjoDchUj7aHRga1HB5EE0x60dZcHogUMb1sJDRmKCcx8xisRgk5XzdZKCQDDdEkUqN-Ch9_NYTMtxlchz1KfR0w9wRt8y9l7E7BNhfLrmm4qyxo-ZpA&__tn__=-R
...
```
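A saved file can be read back with Python's `csv` module. The sketch below is one possible approach, not part of the library: `read_posts_csv` is a hypothetical helper, the column names are taken from the header row shown above, and treating empty cells as zero is an assumption of this example.

```python
import csv
import io
from datetime import datetime

# Count columns taken from the CSV header shown above.
INT_COLUMNS = ("shares", "likes", "loves", "wow", "cares", "sad", "angry",
               "haha", "reactions_count", "comments")


def read_posts_csv(handle):
    """Parse rows from an open CSV file into dicts with typed values."""
    rows = []
    for row in csv.DictReader(handle):
        for column in INT_COLUMNS:
            # Empty cells are treated as zero (an assumption of this sketch).
            row[column] = int(row[column] or 0)
        row["posted_on"] = datetime.fromisoformat(row["posted_on"])
        rows.append(row)
    return rows


# In-memory sample with made-up values; with a real file, use
# open(path, newline="", encoding="utf-8") instead of io.StringIO.
sample_csv = io.StringIO(
    "id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,"
    "comments,content,posted_on,video,image,post_url\n"
    '1,Example Page,0,3,1,0,0,0,0,0,4,2,"Hello, world",2022-01-20T22:43:35,,,\n'
)
posts = read_posts_csv(sample_csv)
print(posts[0]["reactions_count"], posts[0]["posted_on"].year)
```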

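Parameters for the `scrap_to_csv(filename, directory)` method:

| Parameter Name | Parameter Type | Description |
| --- | --- | --- |
| filename | String | Name of the CSV file, without the extension, where the post data will be saved |
| directory | String | Directory where the CSV file will be saved |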

Keys of the outputs:

| Key | Type | Description |
| --- | --- | --- |
| id | String | Post identifier (an integer cast to a string) |
| name | String | Name of the page |
| shares | Integer | Share count of the post |
| reactions | Dictionary | Reaction names as keys and their counts as values. Keys: `likes`, `loves`, `wow`, `cares`, `sad`, `angry`, `haha` |
| reaction_count | Integer | Total reaction count of the post |
| comments | Integer | Comment count of the post |
| content | String | Content of the post as text |
| video | List | URLs of the videos present in the post |
| image | List | URLs of all images present in the post |
| posted_on | Datetime | Time at which the post was published (ISO 8601 format) |
| post_url | String | URL of the post |
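The key list can double as a lightweight schema check before the data is stored or analysed. The following sketch is an illustration under stated assumptions: `validate_post` and `EXPECTED_TYPES` are hypothetical names, the key names follow the JSON output structure (including `image`), and `posted_on` is treated as an ISO 8601 string as it appears in the JSON output.

```python
from datetime import datetime

# Expected Python type for each key, following the key list above.
EXPECTED_TYPES = {
    "name": str,
    "shares": int,
    "reactions": dict,
    "reaction_count": int,
    "comments": int,
    "content": str,
    "video": list,
    "image": list,
    "posted_on": str,  # ISO 8601 string in the JSON output
    "post_url": str,
}


def validate_post(post):
    """Return a list of human-readable problems found in one post dict."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in post:
            problems.append(f"missing key: {key}")
        elif not isinstance(post[key], expected):
            problems.append(f"{key}: expected {expected.__name__}")
    if isinstance(post.get("posted_on"), str):
        try:
            datetime.fromisoformat(post["posted_on"])
        except ValueError:
            problems.append("posted_on: not an ISO 8601 datetime")
    return problems


# Hand-written sample post (not real scraped data).
post = {
    "name": "Example Page", "shares": 0, "reactions": {"likes": 1},
    "reaction_count": 1, "comments": 0, "content": "hi", "video": [],
    "image": [], "posted_on": "2022-01-20T22:43:35",
    "post_url": "https://www.facebook.com/",
}
print(validate_post(post))
```

An empty list means the post matches the documented keys and types.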

Tech

This project relies on the following libraries:

- Selenium
- Webdriver Manager
- Python Dateutil
- Selenium-wire

If you encounter anything unusual, please feel free to open an issue on the project's GitHub repository.

LICENSE

MIT
