https://github.com/projectweekend/article-collector
- Host: GitHub
- URL: https://github.com/projectweekend/article-collector
- Owner: projectweekend
- License: mit
- Created: 2015-04-19T22:54:38.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2015-06-08T00:01:46.000Z (about 11 years ago)
- Last Synced: 2025-03-27T15:53:18.627Z (over 1 year ago)
- Language: Python
- Size: 156 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Article-Collector takes a source URL for a news site, extracts all the individual article URLs it can find, and sends each one as a message to RabbitMQ for later processing.
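The README does not say how article URLs are detected or how messages are published. The sketch below shows one possible shape of that pipeline using only the Python standard library. The helper names (`extract_article_urls`, `publish_all`) and the "same host, at least two path segments" heuristic are assumptions for illustration, not the project's actual logic. Publishing is passed in as a plain callable so the extraction can be tested without a running RabbitMQ server.

```python
from html.parser import HTMLParser
from typing import Callable, Iterable, List
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_article_urls(html: str, source: str) -> List[str]:
    """Return unique absolute URLs on the source's host that look like articles.

    Hypothetical heuristic: a link counts as an article if it stays on the
    same host and its path has at least two segments (e.g. /2015/04/story).
    """
    parser = _LinkParser()
    parser.feed(html)
    source_host = urlparse(source).netloc
    seen = set()
    articles = []
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments so duplicates collapse.
        url, _ = urldefrag(urljoin(source, href))
        parsed = urlparse(url)
        segments = [s for s in parsed.path.split("/") if s]
        if parsed.netloc == source_host and len(segments) >= 2 and url not in seen:
            seen.add(url)
            articles.append(url)
    return articles


def publish_all(urls: Iterable[str], publish: Callable[[str], None]) -> int:
    """Send each URL as its own message via `publish`; return how many were sent."""
    count = 0
    for url in urls:
        publish(url)
        count += 1
    return count
```

For example, `extract_article_urls('<a href="/2015/04/19/world/story.html">A</a><a href="/video">B</a>', "http://cnn.com/")` keeps only the story link, because `/video` has a single path segment. With a RabbitMQ client such as pika, `publish` could be a small wrapper around `channel.basic_publish(exchange="", routing_key="articles", body=url)`, where the queue name `articles` is hypothetical.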
## Config
Article-Collector uses a config file, written in YAML, that contains two properties:
* `source` - the URL for a news site
* `rabbit_url` - the connection URL for a RabbitMQ server
#### Example
```yaml
source: http://cnn.com/
rabbit_url: amqp://user:password@somerabbitserver.com/whatever
```
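Assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`), a check like the following would catch a missing or malformed property before any network calls are made. `validate_config` is a hypothetical helper for illustration, not part of the project.

```python
from urllib.parse import urlparse

REQUIRED_KEYS = ("source", "rabbit_url")


def validate_config(config: dict) -> dict:
    """Check that a parsed config has both properties and that each is a URL.

    Raises ValueError describing the first problem found; returns the
    config unchanged when it is valid.
    """
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("missing config properties: " + ", ".join(missing))
    for key in REQUIRED_KEYS:
        parsed = urlparse(str(config[key]))
        # A usable URL needs at least a scheme and a host.
        if not parsed.scheme or not parsed.hostname:
            raise ValueError(f"{key} is not a valid URL: {config[key]!r}")
    return config
```

A typical call would be `validate_config(yaml.safe_load(open("config.yml")))`, which requires PyYAML to be installed.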
## Run with Docker
Article-Collector is easy to run with [Docker](https://www.docker.com/). First, build an image using the provided `Dockerfile`. From inside the project directory, run:
```
docker build -t give_it_a_name .
```
After the build completes, launch a container to run the collector. The container reads its configuration from `/src/config.yml`, so use `-v` to bind-mount the config file you want to use to that path:
#### Interactive Mode
```
docker run -it -v /path/to/config.yml:/src/config.yml give_it_a_name
```
#### Detached Mode
```
docker run -d -v /path/to/config.yml:/src/config.yml give_it_a_name
```
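If you prefer to drive these steps from a Python script, the commands above can be assembled as argument lists and passed to `subprocess.run`. This is a minimal sketch; the helper names `docker_build_command` and `docker_run_command` are hypothetical and not part of the project.

```python
import os
from typing import List

CONTAINER_CONFIG_PATH = "/src/config.yml"  # mount target used in the commands above


def docker_build_command(image: str) -> List[str]:
    """Build the `docker build` argument list (run from the project directory)."""
    return ["docker", "build", "-t", image, "."]


def docker_run_command(config_path: str, image: str, detached: bool = False) -> List[str]:
    """Build the `docker run` argument list for interactive or detached mode."""
    # Resolve to an absolute host path so Docker treats it as a bind mount.
    host_path = os.path.abspath(config_path)
    mode = "-d" if detached else "-it"
    return ["docker", "run", mode, "-v", f"{host_path}:{CONTAINER_CONFIG_PATH}", image]
```

For example, `subprocess.run(docker_run_command("config.yml", "give_it_a_name", detached=True), check=True)` starts the collector in detached mode using `config.yml` from the current directory.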