https://github.com/hironsan/japanese-news-crawler
A fully automated Japanese news crawler built on top of the Scrapy framework
- Host: GitHub
- URL: https://github.com/hironsan/japanese-news-crawler
- Owner: Hironsan
- License: mit
- Created: 2017-05-26T04:17:09.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-06-02T02:52:44.000Z (over 7 years ago)
- Last Synced: 2024-12-02T01:51:47.233Z (27 days ago)
- Topics: crawler
- Language: Python
- Homepage:
- Size: 158 KB
- Stars: 8
- Watchers: 4
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: newspider/__init__.py
- License: LICENSE
# Japanese News Crawler
This is a Japanese news crawler built on top of the Scrapy framework.
-----------------
## Supported News Sites
So far, the following news sites are supported:
* [Nikkei](http://www.nikkei.com/news/category/)

## Requirements
* Python 3.x
* MongoDB
* Docker (preferred)

## Installation
First, install Docker:
```shell
$ sudo apt-get install docker.io
$ sudo usermod -aG docker $USER
```
After installing Docker (log out and back in so the `docker` group change takes effect), prepare the Docker images:
```shell
$ wget https://raw.githubusercontent.com/Hironsan/japanese-news-crawler/master/Dockerfile
$ docker build -t newscrawler .
$ docker pull mongo
```
Finally, create the Docker containers:
```shell
$ mkdir ~/data
$ docker run -d -p 27017:27017 --name dbserver -v ~/data:/data/db mongo
$ docker run -it --name crawler --link dbserver:mng newscrawler
```
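
The `--link dbserver:mng` option above makes the MongoDB container reachable from the `crawler` container under the alias `mng`. Legacy Docker links also inject environment variables such as `MNG_PORT_27017_TCP_ADDR`. The sketch below shows one way crawler code could work out the database address from that setup. The helper name `resolve_mongo_uri` and its parameters are hypothetical and are not taken from this repository.

```python
import os


def resolve_mongo_uri(alias="mng", default_port=27017, environ=None):
    """Build a MongoDB URI for a container started with `--link dbserver:<alias>`.

    Hypothetical helper, not part of the repository. It assumes the legacy
    Docker link behaviour: the alias resolves as a hostname, and variables
    named <ALIAS>_PORT_<port>_TCP_ADDR / _PORT are set in the environment.
    """
    env = os.environ if environ is None else environ
    prefix = f"{alias.upper()}_PORT_{default_port}_TCP"
    # Prefer the variables injected by the link; fall back to the alias hostname.
    host = env.get(f"{prefix}_ADDR", alias)
    port = int(env.get(f"{prefix}_PORT", default_port))
    return f"mongodb://{host}:{port}"
```

Inside the `crawler` container, the returned string could be passed to a MongoDB client, for example `pymongo.MongoClient(resolve_mongo_uri())`. Whether the project itself uses pymongo is an assumption here.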