Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pjt3591oo/news-crawler
- Host: GitHub
- URL: https://github.com/pjt3591oo/news-crawler
- Owner: pjt3591oo
- Created: 2019-08-27T04:46:44.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2022-07-06T20:14:42.000Z (over 2 years ago)
- Last Synced: 2023-03-03T06:15:26.891Z (over 1 year ago)
- Topics: crawler, data, python
- Language: Python
- Size: 77.1 KB
- Stars: 4
- Watchers: 3
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# News Crawler
News article crawler.
1. [Site analysis](./docs/site_analysis.md)
2. [System architecture](./docs/system_architecture.md)
3. usage
4. building the pipeline

## usage
```bash
$ git clone https://github.com/pjt3591oo/news-crawler.git
$ cd news-crawler
```

* Run the crawler locally
```bash
$ pip install -r requirements.txt
$ python crawler.py
```

Logs of the collected data accumulate in the `logs` directory.
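The README does not document the log format. As one possible shape, assuming one JSON object per crawled article, the logging step could look like the hypothetical `append_article` helper below; the file naming, the fields, and the helper itself are illustrative assumptions, not taken from `crawler.py`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_article(article: dict, log_dir: Path = Path("logs")) -> Path:
    """Append one crawled article to a daily log file as a single JSON line.

    Hypothetical helper: the real crawler.py may use different file names,
    fields, or formats.
    """
    log_dir.mkdir(parents=True, exist_ok=True)
    # One file per UTC day, e.g. logs/2019-08-27.log (naming is an assumption).
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    log_file = log_dir / f"{day}.log"
    # ensure_ascii=False keeps Korean headlines readable in the raw log file;
    # newlines inside values are escaped, so each article stays on one line.
    line = json.dumps(article, ensure_ascii=False)
    # Append mode: earlier lines are never rewritten, so a shipper tailing
    # the file only has to read what was added since its last position.
    with log_file.open("a", encoding="utf-8") as f:
        f.write(line + "\n")
    return log_file
```

Keeping one article per line makes each line a self-contained event, which suits a line-oriented shipper such as filebeat.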
* Run the entire pipeline
```bash
$ ./start.sh
```

## step by step
* Run the crawler (with filebeat) and logstash
```bash
$ ./createImg.sh
$ ./createContainer.sh
```

* Run elasticsearch
```bash
$ docker-compose up
```

## building the pipeline
* Run logstash
```bash
$ logstash -r -f ./pipeline/logstash/pipeline.conf
```

* Run filebeat
```bash
$ filebeat -e -c filebeat.yml -d "publish"
```

Configuration file location:
* macOS: **`/usr/local/etc/filebeat/`**
* Ubuntu: **`/etc/filebeat/`**
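The two locations above can be captured in a small lookup, for instance to build the full path passed to `filebeat -c`. This is a hypothetical helper, not part of the repository; it assumes the file is named `filebeat.yml` (as in the command above) and treats every Linux host as following the Ubuntu layout.

```python
import sys
from pathlib import Path

# Default filebeat config directories listed in this README.
# Assumption: any Linux host follows the Ubuntu layout.
FILEBEAT_CONFIG_DIRS = {
    "darwin": Path("/usr/local/etc/filebeat"),  # macOS
    "linux": Path("/etc/filebeat"),             # Ubuntu
}


def filebeat_config_path(platform: str = sys.platform) -> Path:
    """Return the expected filebeat.yml path for a sys.platform value."""
    for prefix, directory in FILEBEAT_CONFIG_DIRS.items():
        # sys.platform is "darwin" on macOS and "linux" on Linux.
        if platform.startswith(prefix):
            return directory / "filebeat.yml"
    raise ValueError(f"no known filebeat config directory for {platform!r}")
```

For example, calling `filebeat_config_path()` on a Mac returns `/usr/local/etc/filebeat/filebeat.yml`; other platforms raise `ValueError` rather than guessing a directory.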