https://github.com/ryanking13/bellorin
Multi-threaded Social Media Crawler 🔍
crawler instagram social-media
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/ryanking13/bellorin
- Owner: ryanking13
- Created: 2020-01-22T08:06:04.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T03:31:08.000Z (over 3 years ago)
- Last Synced: 2025-02-02T18:28:10.037Z (over 1 year ago)
- Topics: crawler, instagram, social-media
- Language: Python
- Homepage:
- Size: 108 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README-en.md
# Bellorin
Keyword-based social media crawler
## Installation
> pip install -r requirements.txt
## Available Social Media
- Instagram
- Naver Blog
  - [Naver API required](https://developers.naver.com/products/search/)
- Naver Cafe
  - [Naver API required](https://developers.naver.com/products/search/)
- Tistory
  - [Kakao REST API required](https://developers.kakao.com/features/kakao)
## Usage
```sh
usage: run.py [-h] [-v] [-t TARGETS [TARGETS ...]] [-d MAX_DAYS] [-o OUTPUT]
              [--no-analyse] [--all-columns]
              query [query ...]

positional arguments:
  query                 Query to crawl

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Print all debug logs
  -t TARGETS [TARGETS ...], --targets TARGETS [TARGETS ...]
                        Targets services to crawl (default: instagram
                        naver-blog naver-cafe tistory)
  -d MAX_DAYS, --max-days MAX_DAYS
                        Days to crawl (start from today, going backwards)
  -o OUTPUT, --output OUTPUT
                        Set output log file. if not specified, log will be
                        printed only to stdout
  --no-analyse          Do not analyse scrapped data after crawling
  --all-columns         Add additional columns to scrapped data
```
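For readers who want to see how such an interface is typically declared, here is a minimal `argparse` sketch that would produce roughly the help text above. It is not taken from `run.py`: the function name `build_parser` is hypothetical, and the defaults for `-d` and `-o` are assumptions, since the help output does not show them.

```python
import argparse

# Default targets as listed in the help text above.
DEFAULT_TARGETS = ["instagram", "naver-blog", "naver-cafe", "tistory"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the CLI; not the project's actual code."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("query", nargs="+", help="Query to crawl")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Print all debug logs")
    parser.add_argument("-t", "--targets", nargs="+", default=DEFAULT_TARGETS,
                        help="Target services to crawl")
    # Assumption: no day limit unless -d is given.
    parser.add_argument("-d", "--max-days", type=int, default=None,
                        help="Days to crawl (start from today, going backwards)")
    # Assumption: log only to stdout unless -o is given.
    parser.add_argument("-o", "--output", default=None,
                        help="Output log file")
    parser.add_argument("--no-analyse", action="store_true",
                        help="Do not analyse scraped data after crawling")
    parser.add_argument("--all-columns", action="store_true",
                        help="Add additional columns to scraped data")
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(
    ["thornapple", "-t", "naver-blog", "naver-cafe", "-d", "30"]
)
print(args.query, args.targets, args.max_days)
```

Because both `query` and `-t` accept one or more values, the queries should come before `-t`, as in the examples in this README.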
### Prerequisite
Set API keys in `config.py`.
You can obtain API keys from the links below.
- [NAVER](https://developers.naver.com/products/search/)
- [KAKAO](https://developers.kakao.com/docs/restapi/search)
> Writing directly to `config.py` is not recommended. Use environment variables or copy `config.py` to `_config.py` and modify `_config.py`.
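As a rough illustration of the environment-variable approach, the sketch below reads keys from the environment and reports which ones are missing. The variable names (`NAVER_CLIENT_ID`, `NAVER_CLIENT_SECRET`, `KAKAO_REST_API_KEY`) and the helper `load_api_keys` are hypothetical; check `config.py` for the names the project actually expects.

```python
import os

# Hypothetical variable names; the real ones are defined in config.py.
KEY_NAMES = ("NAVER_CLIENT_ID", "NAVER_CLIENT_SECRET", "KAKAO_REST_API_KEY")


def load_api_keys(env=None):
    """Return (keys, missing): the key values found and the names not set."""
    env = os.environ if env is None else env
    keys = {name: env.get(name, "") for name in KEY_NAMES}
    missing = [name for name, value in keys.items() if not value]
    return keys, missing


keys, missing = load_api_keys()
if missing:
    print("Missing API keys:", ", ".join(missing))
```

Keeping secrets out of `config.py` this way means they are less likely to be committed by accident.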
### Simple Usage
```sh
python run.py thornapple bandthornapple
# python run.py <query>
# python run.py <query1> <query2> ...
```
All collected data is saved in the `save/` directory.
### Advanced Usage
#### Specifying target platforms
```sh
# Collects data from Naver Blog and Naver Cafe
python run.py thornapple -t naver-blog naver-cafe
```
Use the `-t` option to specify which platforms to scrape.
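Since the crawler is described as multi-threaded, one plausible way to handle several targets is to run one crawl per platform in a thread pool. The sketch below only illustrates that idea and is not the project's implementation: `crawl_targets` and the stub crawler functions are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_targets(query, crawlers, max_workers=4):
    """Run each crawler function for `query` in its own worker thread.

    `crawlers` maps a target name to a callable taking the query.
    Returns a dict of target name -> that crawler's result.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in crawlers.items()}
        # result() blocks until the crawl finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}


# Stub crawlers standing in for real network requests (hypothetical).
stub_crawlers = {
    "naver-blog": lambda q: [f"blog post about {q}"],
    "naver-cafe": lambda q: [f"cafe post about {q}"],
}

results = crawl_targets("thornapple", stub_crawlers)
print(results)
```

Threads suit this workload because each crawl spends most of its time waiting on network responses.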
#### Setting date range
```sh
# Crawl from today back to 30 days ago
python run.py thornapple -d 30
```
Use the `-d` option to set how many days back from today to crawl.
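To make the date range concrete, the sketch below computes the window that a value such as `-d 30` describes. The helper names `date_window` and `in_window` are hypothetical, and treating both ends of the window as inclusive is an assumption; the tool itself may handle the boundaries differently.

```python
from datetime import date, timedelta


def date_window(max_days, today=None):
    """Return (start, end) covering `max_days` days back from today."""
    today = date.today() if today is None else today
    return today - timedelta(days=max_days), today


def in_window(posted_on, max_days, today=None):
    """True if a post dated `posted_on` falls inside the window (inclusive)."""
    start, end = date_window(max_days, today)
    return start <= posted_on <= end


start, end = date_window(30, today=date(2022, 12, 8))
print(start, end)  # 2022-11-08 2022-12-08
```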
#### Other
```sh
# Verbose mode
python run.py thornapple -v
# Save log to specified file
python run.py thornapple -o out.log
```
### Miscellaneous
- The name `Bellorin` comes from the novel [Polaris Rhapsody](https://en.wikipedia.org/wiki/Lee_Yeongdo#Other_novels) by _Lee Yeongdo (이영도)_. Bellorin is a girl who knows everything in the world.