https://github.com/budavariam/traverse_facebook_galleries
A Selenium scraper that loads an image from Facebook, downloads it in full size, and steps through the gallery until it arrives back at the starting image.
- Host: GitHub
- URL: https://github.com/budavariam/traverse_facebook_galleries
- Owner: budavariam
- License: mit
- Created: 2018-01-07T22:20:36.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2022-12-07T23:42:24.000Z (over 3 years ago)
- Last Synced: 2025-03-29T05:04:54.757Z (about 1 year ago)
- Topics: download-photos, facebook, scraper, selenium, selenium-python, selenium-webdriver
- Language: Python
- Size: 3.11 MB
- Stars: 7
- Watchers: 2
- Forks: 4
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
# Facebook Gallery Downloader (Windows only)
It is a scraper that loads an image from Facebook, downloads it in full size, and steps through the gallery until it arrives back at the starting image.
It assumes that:
* the user-provided link points to an opened image, not just a gallery
* certain style classes appear in these Facebook pages
* the user provides valid login information
Disclaimer: The password you type at the hidden prompt is only passed to Selenium to fill in the login page.
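The stop condition described above (keep moving to the next image until the starting image comes up again) can be sketched without a browser. This is only an illustration, not the code in `traverse_gallery.py`: the `traverse_gallery` function, the `next_url` callback, and the `max_images` safety cap are hypothetical names introduced here, and a dictionary stands in for the real Facebook pages.

```python
from typing import Callable, Iterator


def traverse_gallery(start_url: str, next_url: Callable[[str], str],
                     max_images: int = 10_000) -> Iterator[str]:
    """Yield each image URL once, stopping when the gallery wraps around."""
    current = start_url
    for _ in range(max_images):  # safety cap in case the gallery never wraps
        yield current
        current = next_url(current)
        if current == start_url:  # back at the first image: gallery is done
            return


# Stand-in for the browser: a tiny circular gallery of three images.
fake_gallery = {"a": "b", "b": "c", "c": "a"}
visited = list(traverse_gallery("b", fake_gallery.__getitem__))
print(visited)  # ['b', 'c', 'a']
```

In the real scraper the "next image" step would be a browser action; here it is a plain function so the loop can be checked in isolation.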
## Dependencies
1. Uses [Python 3.6](https://www.python.org/download/releases/3.6/)
1. Uses the Selenium `chromedriver`. The Windows version is included in the repository; see this [link](https://sites.google.com/a/chromium.org/chromedriver/downloads) for other versions.
More information is available [here](https://sites.google.com/a/chromium.org/chromedriver/getting-started).
1. [Google Chrome](https://www.google.com/chrome/browser/desktop/) must be installed on your computer
1. pip
## Instructions (Windows)
1. Create the virtual environment: `virtualenv -p c:\Python36\python.exe .ve`
1. Run `.ve.bat` (by typing `.ve`) to initialize the working directory:
* the webdriver will be added to the path
* the virtualenv will be activated
1. Install the requirements: `pip install -r requirements.txt`
1. Create an `options.json` file from `options_template.json`. It will be ignored by git.
1. To leave the virtualenv, type `deactivate`
## Usage
1. Update the `options.json` appropriately
1. Run `python traverse_gallery.py`
1. Enter your password in the prompt
After the first successful login, you can save the printed cookie value to the options file and set the `force_login` field to `false`. You then do not have to provide your credentials again as long as your tokens remain valid.
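The cookie reuse described above comes down to one decision at startup: is a username and password login needed, or can stored cookies be used? A minimal sketch of that decision follows, assuming the `force_login` and `cookies` keys from the Options table. The `must_log_in` helper, the default used when `force_login` is missing, and the shape of the cookie entries are assumptions made for illustration, not taken from the project.

```python
def must_log_in(options: dict) -> bool:
    """Return True when a username/password login is needed."""
    # Assumption: a missing force_login key is treated as true.
    if options.get("force_login", True):
        return True
    # Stored cookies are only usable if some were actually provided.
    return not options.get("cookies")


# Hypothetical option values; the real cookie format may differ.
first_run = {"force_login": True, "cookies": []}
later_run = {"force_login": False,
             "cookies": [{"name": "session", "value": "placeholder"}]}
print(must_log_in(first_run), must_log_in(later_run))  # True False
```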
## Options
Name | Description
---- | ----
loginURL | URL of the domain for the session cookies; this page also contains the login fields.
start_images | Array holding the full URL (with parameters) of one image from each gallery to download.
max_workers | Number of image save processes to run in parallel.
username | The email address used as the login credential. If it is not present, it will be asked for. It is not stored anywhere, only sent to Selenium.
cookies | The cookies Facebook uses to authenticate users. After login, the current cookies are written to the console. Filling this field with that value is recommended; cookies from another source may not work as intended.
force_login | If `true`, the login data is requested and the password must be typed at a hidden prompt. If `false`, the provided cookies are used to authenticate the requests; if no cookies are provided, the user still has to sign in.
save_image_index | If `true`, images are saved in order of appearance by prefixing their names with a number.
destination_dir | The destination directory for the results. It should be empty. The path does not need a trailing slash.
unique_galleries | If `true`, a timestamp is added to the start of the gallery folder name. Without this, there is no guarantee that two galleries will be saved under different names.
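To make the naming options concrete, here is a small sketch of how `destination_dir`, `unique_galleries`, and `save_image_index` could shape the output paths. It is not the project's implementation: the `gallery_dir` and `image_name` helpers, the Unix-timestamp format, the underscore separator, and the four-digit padding are all assumptions chosen for the example.

```python
import os
import time


def gallery_dir(destination_dir, album_name, unique_galleries, timestamp=None):
    """Build the folder path for one gallery."""
    if unique_galleries:
        # Assumed format: integer Unix time, then an underscore.
        stamp = int(time.time()) if timestamp is None else timestamp
        album_name = f"{stamp}_{album_name}"
    # os.path.join accepts destination_dir with or without a trailing slash.
    return os.path.join(destination_dir, album_name)


def image_name(original_name, index, save_image_index):
    """Optionally prefix the file name with its zero-padded position."""
    return f"{index:04d}_{original_name}" if save_image_index else original_name


print(image_name("photo.jpg", 3, True))   # 0003_photo.jpg
print(image_name("photo.jpg", 3, False))  # photo.jpg
```

Zero-padding the index keeps the files in gallery order when a file manager sorts them by name.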
## Result
Each gallery is saved to a directory named after its album.
Each directory contains the images of that gallery.
It also contains three files:
* `captions.txt`: The captions of the images
* `data.json`: All of the extracted data for further usage
* `urls.txt`: URLs of the saved images, in case a download is corrupted or missing.
If errors occur, check the log generated by the program; it may contain information about them.
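One way `urls.txt` could be used to spot missing downloads is sketched below. The file format is not documented here, so this assumes one URL per line and that each saved file is named after the last path segment of its URL; with `save_image_index` enabled the names carry a numeric prefix and this simple match would not apply. The `missing_downloads` helper is hypothetical.

```python
import os
import posixpath
from urllib.parse import urlparse


def missing_downloads(gallery_dir):
    """Return URLs from urls.txt that have no matching file in gallery_dir."""
    urls_path = os.path.join(gallery_dir, "urls.txt")
    with open(urls_path, encoding="utf-8") as handle:
        # Assumption: one URL per line; blank lines are skipped.
        urls = [line.strip() for line in handle if line.strip()]
    present = set(os.listdir(gallery_dir))
    missing = []
    for url in urls:
        # Assumption: the saved file is named after the last URL path segment.
        file_name = posixpath.basename(urlparse(url).path)
        if file_name not in present:
            missing.append(url)
    return missing
```

Calling `missing_downloads` on a gallery folder returns the URLs that would need to be fetched again.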
## TODO
See the [todo.md](todo.md) file.
## History
This project came about because I needed to collect the images uploaded to our group, with their original captions, to fill the galleries on our public site.
I searched for existing solutions, but I did not find exactly what I was looking for.
I found [seeya](https://github.com/seeya/Facebook-Album-Downloader)'s Facebook Gallery Downloader, but I couldn't make it work because Facebook had changed since its latest commits. So I went for a generalized solution, close to what I would do if I had to do it manually. That project inspired me to use Selenium; I updated the code to Python 3 and added my own tweaks.