Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shaikhsajid1111/facebook_page_scraper
Scrapes facebook's pages front end with no limitations & provides a feature to turn data into structured JSON or CSV
csv facebook facebook-apis facebook-page facebook-page-post facebook-page-post-scraper facebook-page-scraper facebook-scraper fb fb-scrapper hacktoberfest open-source python scraper selenium social-media web-scraper web-scraping
- Host: GitHub
- URL: https://github.com/shaikhsajid1111/facebook_page_scraper
- Owner: shaikhsajid1111
- License: mit
- Created: 2020-12-21T06:44:22.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2024-07-14T12:41:22.000Z (6 months ago)
- Last Synced: 2024-11-29T09:10:09.122Z (about 2 months ago)
- Topics: csv, facebook, facebook-apis, facebook-page, facebook-page-post, facebook-page-post-scraper, facebook-page-scraper, facebook-scraper, fb, fb-scrapper, hacktoberfest, open-source, python, scraper, selenium, social-media, web-scraper, web-scraping
- Language: Python
- Homepage: https://pypi.org/project/facebook-page-scraper/
- Size: 98.6 KB
- Stars: 239
- Watchers: 9
- Forks: 66
- Open Issues: 76
Metadata Files:
- Readme: README.md
- Changelog: changelog.MD
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
Facebook Page Scraper
[![Maintenance](https://img.shields.io/badge/Maintained-Yes-green.svg)](https://github.com/shaikhsajid1111/facebook_page_scraper/graphs/commit-activity)
[![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://opensource.org/licenses/MIT) [![Python 3.7+](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/release/python-360/)
No API key is needed, and there is no limit on the number of requests. Import the library and just do it!
Table of Contents
- Getting Started
- Usage
- How to instantiate?
- Parameters for `Facebook_scraper()`
- Scrape in JSON format
- Scrape in CSV format
- Keys of the output data
- Tech
- License
Prerequisites
- Internet Connection
- Python 3.7+
- Chrome or Firefox browser installed on your machine
Installation:
Installing from source:
```
git clone https://github.com/shaikhsajid1111/facebook_page_scraper
```
Inside the project's directory:
```
python3 setup.py install
```
Installing from PyPI:
```
pip3 install facebook-page-scraper
```
How to use?
```python
import os

# import the Facebook_scraper class from facebook_page_scraper
from facebook_page_scraper import Facebook_scraper

# instantiate the Facebook_scraper class
page_or_group_name = "Meta"
posts_count = 10
browser = "firefox"
proxy = "IP:PORT"  # if the proxy requires authentication, use user:password@IP:PORT
timeout = 600  # 600 seconds
headless = True
# read the optional login credentials from environment variables;
# pass them as username=fb_email, password=fb_password only for logged-in scraping
fb_password = os.getenv('fb_password')
fb_email = os.getenv('fb_email')
# indicates whether the Facebook target is a group (True) or a page (False)
isGroup = False
meta_ai = Facebook_scraper(page_or_group_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless, isGroup=isGroup)
```
Parameters for the `Facebook_scraper(page_or_group_name, posts_count, browser, proxy, timeout, headless, isGroup, username, password)` class:

| Parameter Name | Parameter Type | Description |
| --- | --- | --- |
| page_or_group_name | String | Name of the Facebook page or group |
| posts_count | Integer | Number of posts to scrape. If not passed, the default is 10 |
| browser | String | Which browser to use, either chrome or firefox. If not passed, the default is chrome |
| proxy (optional) | String | Proxy to use. If the proxy requires authentication, the format is `user:password@IP:PORT` |
| timeout | Integer | The maximum amount of time, in seconds, the bot should run for. If not passed, the default timeout is 10 minutes |
| headless | Boolean | Whether to run the browser in headless mode. Default is True |
| isGroup | Boolean | Whether the Facebook target is a group or a page. Default is False |
| username | String | Username to log into Facebook when scraping (recommended to use .env) |
| password | String | Password to log into Facebook when scraping (recommended to use .env) |
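The `proxy` value is a plain string, so it can be convenient to build it in one place. The following is a minimal sketch and not part of the library: `build_proxy` is a hypothetical helper, and the host, port, and credentials are placeholder values.

```python
from typing import Optional


def build_proxy(host: str, port: int,
                user: Optional[str] = None,
                password: Optional[str] = None) -> str:
    """Return 'IP:PORT', or 'user:password@IP:PORT' when credentials are given.

    Hypothetical helper: it only formats the string described in the table above.
    """
    address = f"{host}:{port}"
    if user is None and password is None:
        return address
    if not user or not password:
        raise ValueError("provide both user and password, or neither")
    return f"{user}:{password}@{address}"


print(build_proxy("127.0.0.1", 8080))                     # 127.0.0.1:8080
print(build_proxy("127.0.0.1", 8080, "alice", "s3cret"))  # alice:s3cret@127.0.0.1:8080
```

The resulting string can then be passed as the `proxy` argument when instantiating `Facebook_scraper`.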
⚠️ Warning: Use Logged-In Scraping at Your Own Risk ⚠️
Using logged-in scraping methods may result in the permanent suspension of your account. Proceed with caution, as violating a platform's terms of service can lead to severe consequences. Exercise discretion and adhere to ethical practices when collecting data through scraping. The library/provider assumes no responsibility for any consequences resulting from the misuse of scraping methods.
Done with instantiation? Let the scraping begin!
To get the posts' data in JSON format:
```python
# call the scrap_to_json() method
json_data = meta_ai.scrap_to_json()
print(json_data)
```
Output:
```javascript
{
"2024182624425347": {
"name": "Meta AI",
"shares": 0,
"reactions": {
"likes": 154,
"loves": 19,
"wow": 0,
"cares": 0,
"sad": 0,
"angry": 0,
"haha": 0
},
"reaction_count": 173,
"comments": 2,
"content": "We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code: https://ai.facebook.com/…/the-first-high-performance-self-s…",
"posted_on": "2022-01-20T22:43:35",
"video": [],
"image": [
"https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71"
],
"post_url": "https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARBoSaQ-pAC_ApucZNHZ6R-BI3YUSjH4sXsfdZRQ2zZFOwgWGhjt6dmg0VOcmGCLhSFyXpecOY9g1A94vrzU_T-GtYFagqDkJjHuhoyPW2vnkn7fvfzx-ql7fsBYxL5DgQVSsiC1cPoycdCvHmi6BV5Sc4fKADdgDhdFvVvr-ttzXG1ng2DbLzU-XfSes7SAnrPs-gxjODPKJ7AdqkqkSQJ4HrsLgxMgcLFdCsE6feWL7rXjptVWegMVMthhJNVqO0JHu986XBfKKqB60aBFvyAzTSEwJD6o72GtnyzQ-BcH7JxmLtb2_A&__tn__=-R"
}, ...}
```
Output Structure for JSON format:
```javascript
{
  "id": {
    "name": string,
    "shares": integer,
    "reactions": {
      "likes": integer,
      "loves": integer,
      "wow": integer,
      "cares": integer,
      "sad": integer,
      "angry": integer,
      "haha": integer
    },
    "reaction_count": integer,
    "comments": integer,
    "content": string,
    "video": list,
    "image": list,
    "posted_on": datetime, //string containing datetime in ISO 8601
    "post_url": string
  }
}
```
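Because the output is keyed by post ID, it is easy to rank posts once the data is loaded. The sketch below assumes the result is either a JSON string or an already parsed dictionary with the structure shown above. `top_posts` is a hypothetical helper, and apart from the `reaction_count` of 173 taken from the example output, the sample values are made up for illustration.

```python
import json


def top_posts(json_data, limit=3):
    """Return (post_id, reaction_count) pairs, highest reaction_count first.

    Accepts a JSON string or an already parsed dict (an assumption, so the
    helper works either way).
    """
    posts = json.loads(json_data) if isinstance(json_data, str) else json_data
    ranked = sorted(posts.items(),
                    key=lambda item: item[1]["reaction_count"],
                    reverse=True)
    return [(post_id, post["reaction_count"]) for post_id, post in ranked[:limit]]


# Illustrative sample: only the fields this helper reads are included.
sample = json.dumps({
    "2024182624425347": {"name": "Meta AI", "reaction_count": 173},
    "1": {"name": "Meta AI", "reaction_count": 40},    # made-up post
    "2": {"name": "Meta AI", "reaction_count": 250},   # made-up post
})

print(top_posts(sample, limit=2))
# [('2', 250), ('2024182624425347', 173)]
```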
To save the posts' data directly to a CSV file:
```python
# call the scrap_to_csv(filename, directory) method
filename = "data_file"  # file name without the CSV extension, where data will be saved
directory = r"E:\data"  # directory where the CSV file will be saved (raw string keeps the backslash literal)
meta_ai.scrap_to_csv(filename, directory)
```
content of `data_file.csv`:
```csv
id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,comments,content,posted_on,video,image,post_url
2024182624425347,Meta AI,0,154,19,0,0,0,0,0,173,2,"We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code: https://ai.facebook.com/…/the-first-high-performance-self-s…",2022-01-20T22:43:35,,https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71,https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARAse4eiZmZQDOZumNZEDR0tQkE5B6g50K6S66JJPccb-KaWJWg6Yz4v19BQFSZRMd04MeBmV24VqvqMB3oyjAwMDJUtpmgkMiITtSP8HOgy8QEx_vFlq1j-UEImZkzeEgSAJYINndnR5aSQn0GUwL54L3x2BsxEqL1lElL7SnHfTVvIFUDyNfAqUWIsXrkI8X5KjoDchUj7aHRga1HB5EE0x60dZcHogUMb1sJDRmKCcx8xisRgk5XzdZKCQDDdEkUqN-Ch9_NYTMtxlchz1KfR0w9wRt8y9l7E7BNhfLrmm4qyxo-ZpA&__tn__=-R
...
```
Parameters for the `scrap_to_csv(filename, directory)` method:

| Parameter Name | Parameter Type | Description |
| --- | --- | --- |
| filename | String | Name of the CSV file where the posts' data will be saved |
| directory | String | Directory where the CSV file will be stored |
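Once the file is written, it can be read back with Python's standard `csv` module. The sketch below is an illustration rather than part of the library: `read_posts` is a hypothetical helper, the column names are copied from the header row shown above, and the sample row is an abbreviated stand-in for the real file contents.

```python
import csv
import io

# Columns that hold counts, according to the header row shown above.
INT_COLUMNS = ("shares", "likes", "loves", "wow", "cares", "sad",
               "angry", "haha", "reactions_count", "comments")

# Abbreviated stand-in for data_file.csv (content shortened, URLs left empty).
SAMPLE_CSV = (
    "id,name,shares,likes,loves,wow,cares,sad,angry,haha,"
    "reactions_count,comments,content,posted_on,video,image,post_url\n"
    '2024182624425347,Meta AI,0,154,19,0,0,0,0,0,173,2,'
    '"We built data2vec, an example of quoted text",2022-01-20T22:43:35,,,\n'
)


def read_posts(csv_file):
    """Read rows from an open CSV file and convert the count columns to int."""
    rows = []
    for row in csv.DictReader(csv_file):
        for column in INT_COLUMNS:
            row[column] = int(row[column])
        rows.append(row)
    return rows


posts = read_posts(io.StringIO(SAMPLE_CSV))
print(posts[0]["name"], posts[0]["reactions_count"])  # Meta AI 173
```

For a real file, the same function can be given a handle from `open(path, newline="", encoding="utf-8")`. The UTF-8 encoding is an assumption about how the file was written.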
Keys of the outputs:

| Key | Type | Description |
| --- | --- | --- |
| id | String | Post identifier (integer cast to a string) |
| name | String | Name of the page |
| shares | Integer | Share count of the post |
| reactions | Dictionary | Dictionary containing reactions as keys and their counts as values. Keys => `["likes","loves","wow","cares","sad","angry","haha"]` |
| reaction_count | Integer | Total reaction count of the post |
| comments | Integer | Comment count of the post |
| content | String | Content of the post as text |
| video | List | URLs of videos present in the post |
| image | List | List containing URLs of all images present in the post |
| posted_on | Datetime | Time at which the post was published (in ISO 8601 format) |
| post_url | String | URL of the post |
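The keys above can be turned into richer Python values after scraping. As a minimal sketch, the hypothetical helper `summarize_post` below parses `posted_on` with `datetime.fromisoformat` and compares `reaction_count` with the sum of the individual reactions. That the two always agree is an assumption; it does hold for the example post shown earlier.

```python
from datetime import datetime

REACTION_KEYS = ("likes", "loves", "wow", "cares", "sad", "angry", "haha")


def summarize_post(post):
    """Parse posted_on and check reaction_count against the per-reaction counts."""
    posted_on = datetime.fromisoformat(post["posted_on"])
    reaction_sum = sum(post["reactions"].get(key, 0) for key in REACTION_KEYS)
    return {
        "posted_on": posted_on,
        "reaction_sum": reaction_sum,
        "matches_reaction_count": reaction_sum == post["reaction_count"],
    }


# Values taken from the example JSON output; unrelated keys are omitted.
post = {
    "reactions": {"likes": 154, "loves": 19, "wow": 0, "cares": 0,
                  "sad": 0, "angry": 0, "haha": 0},
    "reaction_count": 173,
    "posted_on": "2022-01-20T22:43:35",
}

summary = summarize_post(post)
print(summary["posted_on"].year, summary["reaction_sum"],
      summary["matches_reaction_count"])  # 2022 173 True
```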
Tech
This project uses different libraries to work properly.
If you encounter anything unusual, please feel free to create an issue in the project's GitHub repository.
LICENSE
MIT