# inked-news-crawler

https://github.com/softmarshmallow/inked-news-crawler

🕷 korean news source crawler (realtime & bulk)

Topics: crawler, naver-news, python3, scrapy
# How to install virtualenv:

### Install **pip** first

sudo apt-get install python3-pip

### Then install **virtualenv** using pip3

sudo pip3 install virtualenv

### Now create a virtual environment

virtualenv venv

### Activate your virtual environment:

source venv/bin/activate
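To confirm the activation worked, you can ask the interpreter itself; this stdlib-only check (the helper name `in_virtualenv` is introduced here, it is not part of the repo) should print `True` inside the venv:

```python
import sys

def in_virtualenv():
    """True when the running interpreter belongs to a venv/virtualenv.

    venv (and recent virtualenv) set sys.prefix to the environment
    directory, while sys.base_prefix keeps pointing at the system
    Python; old virtualenv versions set sys.real_prefix instead.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix) or hasattr(sys, "real_prefix")

print(in_virtualenv())
```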

### install pip packages

pip install -r requirements.txt

### install chromedriver

> Download the version matching your installed Chrome from https://sites.google.com/a/chromium.org/chromedriver/downloads; the commands below pin 81.0.4044.69, so substitute the version that matches your browser.
```
sudo apt-get update
sudo apt-get install -y unzip xvfb libxi6 libgconf-2-4
sudo apt-get install default-jdk

wget -N https://chromedriver.storage.googleapis.com/81.0.4044.69/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
chmod +x chromedriver

sudo mv -f chromedriver /usr/local/share/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver

wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
echo 'deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main' | sudo tee /etc/apt/sources.list.d/google-chrome.list
sudo apt-get update
sudo apt-get install google-chrome-stable
```
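After installing, you can sanity-check that the driver actually landed on PATH. A small stdlib-only sketch (the helper name `driver_on_path` is introduced here for illustration):

```python
import shutil

def driver_on_path(name="chromedriver"):
    """Return the resolved path of the binary if it is on PATH, else None."""
    return shutil.which(name)

print(driver_on_path())
```

If this prints `None`, re-check the `mv`/`ln -s` steps above.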

### Using fish shell:

source venv/bin/activate.fish

### To deactivate:

deactivate

### Create virtualenv using Python3
virtualenv -p python3 myenv

### Instead of virtualenv, Python 3 can create one with the built-in venv module
python3 -m venv myenv

### Add the python module to the path

`export PYTHONPATH="${PYTHONPATH}:inkedNewsCrawler"`

`chmod +x crawler.sh`
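The `PYTHONPATH` export above is what makes the `inkedNewsCrawler` package importable from anywhere. A quick way to see its effect is to launch a child interpreter with the variable set and inspect its module search path (a sketch, run from the repo root):

```python
import os
import subprocess
import sys

# Run a child interpreter with PYTHONPATH set, and confirm the
# directory shows up on its module search path (sys.path).
env = dict(os.environ, PYTHONPATH="inkedNewsCrawler")
result = subprocess.run(
    [sys.executable, "-c",
     "import sys; print(any('inkedNewsCrawler' in p for p in sys.path))"],
    env=env, capture_output=True, text=True,
)
print(result.stdout.strip())  # True
```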



## Register the crawler as a systemd service

```shell
sudo cp crawler.service /etc/systemd/system/crawler.service
sudo chmod 664 /etc/systemd/system/crawler.service

sudo systemctl daemon-reload
sudo systemctl start crawler.service
sudo systemctl status crawler.service
sudo systemctl enable crawler.service
```
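The `cp` step above assumes the repo ships a `crawler.service` unit file. For reference, a minimal sketch of what such a unit typically contains; every path and value here is an assumption, not the repo's actual file:

```ini
[Unit]
Description=inked news crawler (hypothetical sketch)
After=network.target

[Service]
# Paths below are assumptions; match them to where the repo is checked out.
WorkingDirectory=/opt/inkedNewsCrawler
ExecStart=/opt/inkedNewsCrawler/crawler.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After editing a unit file, re-run `sudo systemctl daemon-reload` as shown above for the change to take effect.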