https://github.com/swaldman/feedletter

A service that converts RSS feeds into e-mail newsletters and notification bridges.

# feedletter

**Turn any RSS feed into a newsletter or notification bot**

## Introduction

**feedletter** is a service that

- **watches RSS (or Atom!) feeds with great care**
  * works great with feeds generated by static-site generators!
  * distinguishes between new items and older stuff or stuff already seen that flakily reappears
  * awaits "finalization" of items, meaning their stabilization (and nondeletion) over specified time intervals
- lets you **define a wide variety of subscriptions** to those feeds
  * Over different media
    - e-mail
    - Post to Mastodon
    - SMS (coming soon!)
    - etc
  * In different arrangements
    - each item as newsletter
    - daily or weekly digests
    - compilations of every `n` posts
    - etc
- which are **formatted via rich, customizable [`untemplates`](https://github.com/swaldman/untemplate-doc#readme)**
- which are **managed via a web API** for easy subscription, confirmation, and unsubscription by users

## Prerequisites

The application requires a Java 17+ JVM and a Postgres database.

Typical installations will proxy the web API behind e.g. `nginx`, and run the daemon as a `systemd` service.

## Getting Started

A (very) detailed tutorial on setting up, configuring, and customizing a _feedletter_ instance is available [here](https://tech.interfluidity.com/2024/01/29/feedletter-tutorial/index.html).

## Developer Notes

### Lifecycle

1. A `feed` is added.
2. One or more `subscribable`s are defined against the feed.
3. One or more `destination`s subscribe to a `subscribable`.
4. `item`s are observed in the feed, and are added in the `Unassigned` state.
5. Each `item` is assigned, in a single transaction, to all the collections (`assignable`s) to which it will ever belong.

(Steps 4 and 5 can repeat arbitrarily as new items come in.)

6. Separately, collections (`assignable`s) are periodically marked "complete"
   and, in the same transaction, forwarded to subscribers.
7. Complete `assignable`s are deleted, along with their `assignment`s.
8. `item`s that are...
   * already assigned, and
   * no longer members of any not-yet-completed `assignable`s

   can drop their cached contents and then move into the `Cleared` state.
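
The item side of this lifecycle can be pictured as a small state machine. The sketch below is a hypothetical in-memory illustration in Python, not feedletter's actual Scala implementation; the names `Assignability`, `Item`, `can_clear`, and `clear` are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Assignability(Enum):
    UNASSIGNED = "Unassigned"
    ASSIGNED = "Assigned"
    CLEARED = "Cleared"
    EXCLUDED = "Excluded"


@dataclass
class Item:
    guid: str
    assignability: Assignability = Assignability.UNASSIGNED
    cached_rss: Optional[str] = None  # stands in for the cached item contents


def can_clear(item: Item, open_assignments: Dict[Tuple[str, str], Set[str]]) -> bool:
    """Step 8: an item may be cleared once it is assigned and no
    not-yet-completed assignable still refers to it.

    `open_assignments` maps (subscribable_name, within_type_id) to the guids
    in that still-open collection; completed collections are absent because
    step 7 deletes them.
    """
    if item.assignability is not Assignability.ASSIGNED:
        return False
    return all(item.guid not in guids for guids in open_assignments.values())


def clear(item: Item) -> None:
    """Drop the cached contents and move the item into its terminal state."""
    item.cached_rss = None
    item.assignability = Assignability.CLEARED
```

In this sketch an `Unassigned` or `Excluded` item is never cleared, which matches the list above: only items that are already assigned qualify.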

### Database Schema

I want to sketch the not-so-obvious db schema I've adopted for this project while
I still understand it.

#### feed

First there are feeds:

```sql
CREATE TABLE feed(
id INTEGER,
url VARCHAR(1024),
min_delay_minutes INTEGER NOT NULL,
await_stabilization_minutes INTEGER NOT NULL,
max_delay_minutes INTEGER NOT NULL,
assign_every_minutes INTEGER NOT NULL,
added TIMESTAMP NOT NULL,
last_assigned TIMESTAMP NOT NULL, -- we'll start at added
PRIMARY KEY(id)
)
```

Feeds must be defined before subscriptions can be created against them.
They are defined by a URL, and they specify what it means for an item in the feed to
"finalize", in the sense of being ready for notification.

Feeds are permanent and basically unchanging until (when someday I implement
this) they are manually removed.

(`last_assigned` changes, but so far it's
just informational and has no role in the application.)
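
The three delay columns suggest how finalization could be decided from an item's timestamps. The following Python sketch is one plausible reading, not the project's confirmed rule: it assumes an item is ready once `min_delay_minutes` have passed since it was first seen, and it has either been stable for `await_stabilization_minutes` or been waiting for `max_delay_minutes`. The function name `is_finalized` is hypothetical.

```python
from datetime import datetime, timedelta


def is_finalized(
    now: datetime,
    first_seen: datetime,
    stable_since: datetime,
    min_delay_minutes: int,
    await_stabilization_minutes: int,
    max_delay_minutes: int,
) -> bool:
    """Assumed rule: wait at least the minimum delay, then require either a
    long-enough stable period or the expiry of the maximum delay."""
    waited = now - first_seen
    stable_for = now - stable_since
    if waited < timedelta(minutes=min_delay_minutes):
        return False
    if stable_for >= timedelta(minutes=await_stabilization_minutes):
        return True
    return waited >= timedelta(minutes=max_delay_minutes)
```

Under this reading, the maximum delay acts as a cap, so an item that keeps changing is not held back forever.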

#### item

Next there are items:

```sql
CREATE TABLE item(
feed_id INTEGER,
guid VARCHAR(1024),
single_item_rss TEXT,
content_hash INTEGER, -- ItemContent.contentHash
link VARCHAR(1024),
first_seen TIMESTAMP NOT NULL,
last_checked TIMESTAMP NOT NULL,
stable_since TIMESTAMP NOT NULL,
assignability ItemAssignability NOT NULL,
PRIMARY KEY(feed_id, guid),
FOREIGN KEY(feed_id) REFERENCES feed(id)
)
```

* `feed_id` and `guid` identify an item.
* `single_item_rss` _caches_ the RSS item.
  We want to cache this in case the item is no longer available in the feed by the time we get around to notifying.
* `content_hash` is a hash of the item's content (`ItemContent.contentHash`). We use it to identify whether an item has changed.
* `link` may eventually be used as a neurotic double-check so we never notify the same human-perceived item twice.
* `first_seen`, `last_checked`, and `stable_since` are pretty self-explanatory timestamps. We use these to
  calculate whether an item has stabilized and so can be "assigned". (See below.)
* `assignability`: items can be in one of four states:
  - `Unassigned`: The item has not yet been assigned to the collections (including single-member collections)
    to which it will eventually belong, but is eligible for assignment.
  - `Assigned`: The item _has_ been assigned to _all_ the collections (including single-member collections)
    to which it will eventually belong. The application may not be done assigning to those collections, and the
    item may not yet have been distributed to subscribers.
  - `Cleared`: This is the terminal state for an item. The item has been assigned to all collections, and has
    already been distributed to subscribers. The cache fields (`title`, `author`, `article`, `publication_date`, and `link`)
    should all be cleared in this state. `Cleared` items are not deleted, but retained indefinitely, so that we don't
    renotify if the item (an item with the same `guid`) reappears in the feed.
  - `Excluded`: Items which are marked to always be ignored. Items are marked `Excluded` only upon initial insert.
    Items can be manually updated from `Excluded` to `Unassigned` (timestamps should be reset to the time of the update),
    to cause `Excluded` posts to be published.

#### subscribable (subscription definition)

Next there is `subscribable`, which represents the definition of a subscription by which parties will be
notified of items or collections of items.

```sql
CREATE TABLE subscribable(
subscribable_name VARCHAR(64),
feed_id INTEGER NOT NULL,
subscription_manager_json JSONB NOT NULL,
last_completed_wti VARCHAR(1024),
PRIMARY KEY (subscribable_name),
FOREIGN KEY (feed_id) REFERENCES feed(id)
)
```

A subscribable maps a name to a feed and a `SubscriptionManager`. For our purposes here,
the main role of a `SubscriptionManager` (a serialization of a Scala ADT) is to

1. Generate for items a `within_type_id`, which is really just a collection identifier.
   All items in a collection of items that will be distributed will share the same `within_type_id`.
2. Determine whether a collection (identified by its `within_type_id`) is "complete", that is,
   no further items need be assigned the same `within_type_id`.
3. When a collection has been notified, it is deleted from the database. However, some
   `SubscriptionManagers` need to maintain a sequence of `within_type_id` identifiers.
   So for each subscribable, the `last_completed_wti` is retained.

`SubscriptionManager` determines how collections are compiled, to what kind of destination (e-mail,
Mastodon, mobile message, whatever) notifications will be sent, and how they will be formatted.

Names are scoped on a per-feed-URL basis. Users subscribe to a `(feed_url, subscribable_name)`
pair.
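
To make the notion of a `within_type_id` concrete, here is a hedged Python sketch of how two kinds of subscription manager might generate collection identifiers and judge completeness. The identifier formats and the function names are assumptions for illustration; feedletter's real `SubscriptionManager` is written in Scala and may use different conventions.

```python
from datetime import date


def wti_each(guid: str) -> str:
    """Each item is its own single-member collection (assumed format)."""
    return guid


def wti_weekly(first_seen: date) -> str:
    """All items seen in the same ISO week share one identifier (assumed format)."""
    iso_year, iso_week, _ = first_seen.isocalendar()
    return f"{iso_year}-week-{iso_week:02d}"


def weekly_is_complete(within_type_id: str, today: date) -> bool:
    """A weekly collection is complete once `today` falls in a later ISO week."""
    year_text, week_text = within_type_id.split("-week-")
    today_year, today_week, _ = today.isocalendar()
    return (today_year, today_week) > (int(year_text), int(week_text))
```

A manager that compiles every `n` posts cannot derive its identifier from the calendar like this. It needs a running sequence, which is the case `last_completed_wti` exists to support.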

#### assignable (a collection of items)

Next there is `assignable`, which represents a collection. Assignables essentially map
`subscribables` (subscription definitions) to `within_type_id`s (the collections
generated by the subscription definition and notified to subscribers).

```sql
CREATE TABLE assignable(
subscribable_name VARCHAR(64),
within_type_id VARCHAR(1024),
opened TIMESTAMP NOT NULL,
PRIMARY KEY(subscribable_name, within_type_id),
FOREIGN KEY(subscribable_name) REFERENCES subscribable(subscribable_name)
)
```

`opened` is the timestamp of the first assignment to the collection.

Once an assignable has been notified ("completed"), it is simply deleted from the database.
For each subscribable, the `within_type_id` of only the most recently completed
assignable is retained (see `subscribable` table above).
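
The bookkeeping described above can be sketched in a few lines. This Python example uses plain sets and dictionaries as hypothetical stand-ins for the `assignable`, `assignment`, and `subscribable` tables; `complete_assignable` is an invented name, and the real application performs these steps in a database transaction.

```python
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]  # (subscribable_name, within_type_id)


def complete_assignable(
    key: Key,
    assignables: Set[Key],
    assignments: Dict[Key, Set[str]],
    last_completed_wti: Dict[str, str],
) -> List[str]:
    """Remove a completed collection and remember only its identifier.

    Returns the guids that were in the collection, sorted, so the caller
    can forward them to subscribers.
    """
    subscribable_name, within_type_id = key
    guids = sorted(assignments.pop(key, set()))  # delete the assignments
    assignables.discard(key)                     # delete the assignable
    last_completed_wti[subscribable_name] = within_type_id
    return guids
```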

#### assignment (an item in a collection)

Next there is `assignment`, which represents an item in an `assignable`, i.e. a collection.
It's pretty self-explanatory I think.

```sql
CREATE TABLE assignment(
subscribable_name VARCHAR(64),
within_type_id VARCHAR(1024),
guid VARCHAR(1024),
PRIMARY KEY( subscribable_name, within_type_id, guid ),
FOREIGN KEY( subscribable_name, within_type_id ) REFERENCES assignable( subscribable_name, within_type_id )
)
```

#### subscription

Next there is `subscription`, which just maps a destination to a `subscribable`.
The destination is a JSON blob that can refer to a variety of things: e-mail addresses, SMS numbers, Mastodon instances, etc.
Each `SubscriptionManager` works with a destination subtype.

```sql
CREATE TABLE subscription(
subscription_id BIGINT,
destination_json JSONB NOT NULL,
destination_unique VARCHAR(1024) NOT NULL,
subscribable_name VARCHAR(64) NOT NULL,
confirmed BOOLEAN NOT NULL,
added TIMESTAMP NOT NULL,
PRIMARY KEY( subscription_id ),
FOREIGN KEY( subscribable_name ) REFERENCES subscribable( subscribable_name )
)
```

Since destinations can have ornamentation (an e-mail address,
for example, might have a personal part, such as the name "Buffy" shown alongside the bare address), insisting that
the JSON entities be unique is not sufficient to prevent multiple subscriptions. So destinations declare a unique core,
whose uniqueness within a subscribable the database enforces:

```sql
CREATE UNIQUE INDEX destination_unique_subscribable_name ON subscription(destination_unique, subscribable_name)
```
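
As an illustration of a "unique core", the Python sketch below reduces an ornamented e-mail destination to its bare address using the standard library's `email.utils.parseaddr`. The lowercasing and the function name `destination_unique` are assumptions; the text above does not say how feedletter actually computes the value stored in the `destination_unique` column.

```python
from email.utils import parseaddr


def destination_unique(email_destination: str) -> str:
    """Strip the personal part and normalize case (an assumed normalization)."""
    _personal, address = parseaddr(email_destination)
    return address.lower()
```

With a core like this, two subscriptions that differ only in the personal part or in letter case would collide on the unique index instead of producing duplicate notifications.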

That's it for the base schema! There are also tables that convert destinations specific to subscription
types into their various queues for notification. I'm omitting those for now.

### Untemplates vs Templates

There are two layers of templating in _feedletter_:

Many notifications are rendered via [untemplates](https://github.com/swaldman/untemplate-doc#readme).
However, what untemplates render goes to all subscribers of a subscribable. We generate one "form letter"
for all recipients, and store it only once.

But since we may want to customize our notifications on a per-recipient basis, the _output_ of the untemplates
can take the form of a [trivial template](https://github.com/swaldman/feedletter/blob/main/src/com/mchange/feedletter/trivialtemplate/TrivialTemplate.scala)
with case-insensitive, percentage-delimited `%Fields%` that get filled in separately for each recipient.

We try to refer to the former, initial, shared templates as _untemplates_ (because that's the technology
that underlies them), and the last-minute substitution templates that are generated by the untemplates as
mere templates.
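
The second layer is simple enough to sketch. The Python function below fills case-insensitive, percent-delimited fields for one recipient; it is an illustrative approximation, not a port of `TrivialTemplate.scala`, and the choice to leave unknown fields untouched is an assumption.

```python
import re
from typing import Mapping

# A field is one or more word characters between two percent signs.
FIELD = re.compile(r"%(\w+)%")


def fill(template: str, values: Mapping[str, str]) -> str:
    """Replace each %Field% with its value, matching field names case-insensitively."""
    lowered = {name.lower(): value for name, value in values.items()}

    def substitute(match: "re.Match[str]") -> str:
        # Leave unknown fields as they are rather than guessing (assumed behavior).
        return lowered.get(match.group(1).lower(), match.group(0))

    return FIELD.sub(substitute, template)
```

The shared "form letter" would be rendered once by an untemplate, and a function like `fill` would then run once per recipient with that recipient's values.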