https://github.com/smbl64/feedcloud

☁️ An RSS scraper and aggregator
https://github.com/smbl64/feedcloud

rss-aggregator rss-feed

Last synced: about 1 month ago
JSON representation

☁️ An RSS scraper and aggregator

Host: GitHub
URL: https://github.com/smbl64/feedcloud
Owner: smbl64
Created: 2021-11-24T10:37:58.000Z (over 3 years ago)
Default Branch: master
Last Pushed: 2021-11-25T20:49:36.000Z (over 3 years ago)
Last Synced: 2025-03-25T05:46:21.740Z (about 1 month ago)
Topics: rss-aggregator, rss-feed
Language: Python
Homepage:
Size: 63.5 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# FeedCloud

FeedCloud is an RSS scraper application which saves RSS feeds to its database and allows end-users to use the API to read feed entries and manage their own feeds.

## Running the server

You can use Docker and Docker Compose to run the server and interact with the API. To do so, run the following command:

```
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up
```

This should automatically build / download the required Docker images and start the service.

`docker-compose.yml` file contains the base services (PostgreSQL and RabbitMQ), while `docker-compose-prod.yml` defines the FeedCloud specific services. All FeedCloud services share the same Docker image which is defined by `Dockerfile` in the repository. These are the services defined for FeedCloud:

- `init-db`: This command will make sure that the tables exist in the database and a root user is created. The default admin user and password is `root`/`root`.
- `api`: This service runs the API. The API server will listen on the port `5000`.
- `dramatiq-worker`: FeedCloud uses Dramatiq for managing its background tasks. This will run Dramatiq workers.
- `scheduler`: FeedCloud has a dedicated daemon for picking up feeds and scheduling them for download. This Docker service starts that daemon.

**API Endpoints**

By default the server listens on `127.0.0.1:5000` for requests. Below is a list of HTTP endpoints available:

- `GET /swagger.json`: This is the OpenAPI specification of the endpoints.
- `POST /auth/`: Must be used logging in and getting a JWT token. This token must be used when calling other endpoints. Token should be passed via HTTP headers: `Authorization:"Bearer "`.

All of the endpoints are documented in the code and in the OpenAPI specification. So I won't repeat the comments here.

- `POST /feeds/`
- `DELETE /feeds/`
- `GET /feeds/`
- `PUT /feeds//force-run`
- `GET /feeds//entries/`
- `GET /entries/`
- `PUT /entries/`
- `POST /users/`

## Running the tests

First create a Python virtual environment and install the dependencies. Python 3.7+ is required.

```
python3 -m virtualenv .venv
source .venv/bin/activate
pip install requirements/prod.txt
```

Then run the required base services (PostgreSQL and RabbitMQ):

```
docker-compose -f docker-compose.yml up -d
```

And finally use `pytest` to run the test suite:

```
pytest -vv
```

## Design

The project consists of two main packages: `api` and `ingest`.

**feedcloud.api**

API endpoints use `Flask` and `Marshmallow` to incoming HTTP requests. Also `apispec` package is used to generate OpenAPI documentation.

**feedcloud.ingest**

Downloading entries for a feed happen in the background using `dramatiq` and RabbitMQ. On predefined intervals, `FeedScheduler` finds the feeds that need to updated and creates `dramatiq` jobs for each of them.

## Further improvements

I have tried to cover all the main points mentioned in the assignment. But like any other project, there is room for improvement. Below I have listed some of them:

- Right now there is no pagination for feed entries and all of them are being returned to the user.
- More test cases can be added to verify that `mashmallow` rejects invalid requests properly.
- When a feed fails permanently (i.e. the exponential backoff mechanism), FeedCloud needs to send a notification to the user. Right now the app just logs a message in console indicating that it is "informing" the user. I didn't spent time for implementing a email notification system.
- I have made sure that the whole codebase passes the `flake8` checks and the code is formatted with `black`. There is a script in `scripts/run-linters` to help with it. That being said, I think more type hints can be added to the project and then `mypy` can be used to validate them.
- With current design, every user can define the same feed and have the entries separately. An alternative design would have been to share a feed among users. When a user requests to register with a feed, we will create a subscription record between the user and that feed. Also, an additional model is required to keep track of which entries a user has "read" (because entries won't be exclusive to the user anymore).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/smbl64/feedcloud

Awesome Lists containing this project

README