https://github.com/tethik/flashback-user-crawler


Host: GitHub
URL: https://github.com/tethik/flashback-user-crawler
Owner: Tethik
Created: 2018-12-17T01:38:12.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2018-12-17T01:59:48.000Z (over 7 years ago)
Last Synced: 2025-04-07T15:55:18.817Z (about 1 year ago)
Language: Python
Size: 15.6 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Flashback User Posts Crawler

![cat](https://66.media.tumblr.com/5face01e81b48d0a4403fee24a09e838/tumblr_o14899nCAX1v56zzqo5_500.jpg)

A simple crawler that fetches all of a user's posts, in their raw HTML form, from the flashback.org website.

⚠ Only the last 1000 posts are fetched, because a user's profile lists at most that many posts.

You will need:

1. Python 3 and pipenv
2. A throwaway flashback.org account. https://www.throwawaymail.com/ worked well for registering one.
3. A VPN or proxy. I used https://mullvad.net

# Install / Setup

Use `pipenv install` to install all dependencies.

Add your throwaway username and password to a `.env` file in the project root, like this:

```
FLASHBACK_USERNAME=username
FLASHBACK_PASSWORD=password
```

Next, run `pipenv shell` to activate the virtual environment.
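pipenv loads a `.env` file from the project root automatically when you run `pipenv shell` or `pipenv run`, so the credentials become ordinary environment variables. The README does not show how the crawler reads them; the sketch below is a hypothetical `load_credentials` helper illustrating one way Python code could pick them up and fail early with a clear message if they are missing.

```python
import os


def load_credentials(environ=os.environ):
    """Return (username, password) from the environment.

    Hypothetical helper, not taken from this repository. It expects the
    FLASHBACK_USERNAME and FLASHBACK_PASSWORD variables that pipenv loads
    from .env, and raises RuntimeError naming any that are missing or empty.
    """
    missing = [
        name
        for name in ("FLASHBACK_USERNAME", "FLASHBACK_PASSWORD")
        if not environ.get(name)
    ]
    if missing:
        raise RuntimeError(
            "Missing environment variable(s): "
            + ", ".join(missing)
            + ". Did you create .env and run `pipenv shell`?"
        )
    return environ["FLASHBACK_USERNAME"], environ["FLASHBACK_PASSWORD"]
```

Inside the activated shell, `username, password = load_credentials()` would return the values from `.env`; passing a plain dict instead of `os.environ` makes the helper easy to test.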

# Example Usage

The following command starts a crawl and saves all of the target user's posts, in their original HTML, into the `downloaded_data` folder.
It requires the target's user ID, which you can find in the URL when you open the target user's profile page
in your browser.

```
scrapy crawl user-posts -a userid=221078
```

If you'd like the entire threads, i.e. every page of each thread, add the `-a fetch_full_thread=True` argument to the command.
Note that this will likely take much longer.
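Scrapy passes every `-a name=value` spider argument to the spider as a string, so `fetch_full_thread=True` arrives as `"True"`, and even `"False"` would be truthy if tested directly in an `if`. The README does not say how the crawler interprets this value; the sketch below is a hypothetical `parse_bool_arg` helper showing one way a spider could normalise such an argument.

```python
def parse_bool_arg(value, default=False):
    """Interpret a Scrapy `-a name=value` spider argument as a boolean.

    Hypothetical helper, not taken from this repository. Scrapy hands spider
    arguments over as strings, so common spellings are mapped explicitly
    and anything unrecognised raises ValueError instead of guessing.
    """
    if value is None:  # argument not given on the command line
        return default
    if isinstance(value, bool):  # already parsed, e.g. when called from code
        return value
    normalized = str(value).strip().lower()
    if normalized in {"1", "true", "yes", "on"}:
        return True
    if normalized in {"0", "false", "no", "off", ""}:
        return False
    raise ValueError(f"Cannot interpret {value!r} as a boolean")


# Hypothetical use inside a spider's __init__ (illustration only):
#     self.fetch_full_thread = parse_bool_arg(kwargs.get("fetch_full_thread"))
```

With this approach, `-a fetch_full_thread=True` and `-a fetch_full_thread=yes` both enable the option, while omitting the argument falls back to `default`.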

# Other helpful stuff

To verify how many posts were downloaded, I counted the downloaded files with `ls -1q downloaded_data/221078/posts/ | wc -l`
and compared the result against the post count shown on the profile.
