# Following No Reposts Feed Generator

A production-ready Bluesky feed generator written in Rust that shows posts from people you follow, excluding all reposts. Built using Jetstream for efficient real-time data consumption with full JWT signature verification.

## Table of Contents

- [Features](#features)
- [How It Works](#how-it-works)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Configuration](#configuration)
- [Running Locally](#running-locally)
- [Deployment](#deployment)
- [Publishing Your Feed](#publishing-your-feed)
- [Architecture](#architecture)
- [API Endpoints](#api-endpoints)
- [Performance](#performance)
- [Development](#development)
- [Troubleshooting](#troubleshooting)
- [Contributing](#contributing)
- [License](#license)

## Features

- **🚫 No Reposts**: Automatically filters out all reposts, showing only original content
- **👥 Personalized**: Shows only posts from accounts you follow
- **⚡ Real-time**: Updates in real-time as new posts are created
- **🔒 Secure**: Full ES256K JWT signature verification with DID resolution
- **📡 Efficient**: Uses Jetstream for lightweight event consumption (~850 MB/day vs 200+ GB/day)
- **🗄️ Smart Caching**: Automatic cleanup of posts older than 48 hours
- **🔄 Auto-recovery**: Automatic reconnection on Jetstream disconnects
- **📊 Observable**: Structured logging with configurable verbosity
- **🏗️ Production Ready**: Battle-tested error handling and recovery mechanisms

## How It Works

This feed generator:

1. **Consumes Events**: Connects to Bluesky's Jetstream to receive real-time events for posts and follows
2. **Filters Content**: Only subscribes to `app.bsky.feed.post` and `app.bsky.graph.follow` collections
3. **Stores Data**: Maintains a local SQLite database of recent posts and follow relationships
4. **Serves Feeds**: Provides personalized feeds via AT Protocol's `app.bsky.feed.getFeedSkeleton` endpoint
5. **Authenticates Users**: Validates JWT tokens by resolving user DIDs and verifying signatures
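
As a rough illustration of step 2, collection filtering is requested from Jetstream through repeated `wantedCollections` query parameters on its `/subscribe` endpoint. The helper name `jetstream_subscribe_url` is hypothetical and only sketches how such a URL could be assembled; it is not taken from this project's source.

```rust
/// Build a Jetstream WebSocket URL restricted to the given collections.
/// Collection NSIDs contain only URL-safe characters, so no escaping is done.
fn jetstream_subscribe_url(hostname: &str, collections: &[&str]) -> String {
    let query: Vec<String> = collections
        .iter()
        .map(|c| format!("wantedCollections={c}"))
        .collect();
    if query.is_empty() {
        // No filter: Jetstream sends events for every collection.
        format!("wss://{hostname}/subscribe")
    } else {
        format!("wss://{hostname}/subscribe?{}", query.join("&"))
    }
}
```

Calling it with `jetstream1.us-east.bsky.network` and the two collections listed above yields a single URL carrying both filters.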

## Prerequisites

- **Rust** 1.70+ (install via [rustup](https://rustup.rs))
- **SQLite** 3.35+ (usually pre-installed on modern systems)
- **Domain** with HTTPS (required for production deployment)
- **Bluesky Account** (for publishing the feed)

## Installation

### 1. Clone the Repository

```bash
git clone https://github.com/vitorpy/noreposts-atproto-feed.git
cd noreposts-atproto-feed
```

### 2. Build the Project

```bash
cargo build --release
```

The compiled binary will be at `target/release/following-no-reposts-feed`.

## Configuration

### Environment Variables

Create a `.env` file in the project root (see `.env.example` for reference):

```bash
# Required: Database location
DATABASE_URL=sqlite:./feed.db

# Required: Server port
PORT=3000

# Required: Your domain name
FEEDGEN_HOSTNAME=your-domain.com

# Required: Your service DID
FEEDGEN_SERVICE_DID=did:web:your-domain.com

# Optional: Jetstream server (defaults to jetstream1.us-east.bsky.network)
JETSTREAM_HOSTNAME=jetstream1.us-east.bsky.network
```
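
To show how these variables fit together, the sketch below loads them into a struct and applies the documented Jetstream default. `Config` and `load_config` are hypothetical names, not the project's actual loader (which also accepts command-line overrides). The lookup is passed in as a closure so the logic can be exercised without touching the real environment.

```rust
/// Settings mirroring the variables in `.env` (field names are illustrative).
#[derive(Debug, PartialEq)]
struct Config {
    database_url: String,
    port: u16,
    hostname: String,
    service_did: String,
    jetstream_hostname: String,
}

/// Read settings through `get`, failing on any missing required variable.
fn load_config(get: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    let required =
        |key: &str| get(key).ok_or_else(|| format!("missing required variable {key}"));
    let port = required("PORT")?
        .parse::<u16>()
        .map_err(|e| format!("invalid PORT: {e}"))?;
    Ok(Config {
        database_url: required("DATABASE_URL")?,
        port,
        hostname: required("FEEDGEN_HOSTNAME")?,
        service_did: required("FEEDGEN_SERVICE_DID")?,
        // Optional variable: fall back to the documented default server.
        jetstream_hostname: get("JETSTREAM_HOSTNAME")
            .unwrap_or_else(|| "jetstream1.us-east.bsky.network".to_string()),
    })
}
```

In a binary, the lookup would typically be `load_config(|key| std::env::var(key).ok())`.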

### Service DID Setup

Your `FEEDGEN_SERVICE_DID` should match your domain. For `did:web`, it's typically:
- Domain: `feed.example.com` → DID: `did:web:feed.example.com`
- Domain: `example.com` → DID: `did:web:example.com`
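
The mapping above is mechanical, so a startup check can catch a mismatched `FEEDGEN_SERVICE_DID` early. The following is a minimal sketch with hypothetical helper names. The port handling is not covered by the examples above; it follows the did:web method specification, which percent-encodes the colon before a port.

```rust
/// Derive the `did:web` identifier for a host name (optionally `host:port`).
/// The did:web method encodes the colon before a port as `%3A`.
fn did_web_for_host(host: &str) -> String {
    format!("did:web:{}", host.replace(':', "%3A"))
}

/// Check that a configured service DID is the one derived from the host name.
fn service_did_matches(hostname: &str, service_did: &str) -> bool {
    did_web_for_host(hostname) == service_did
}
```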

## Running Locally

### Quick Start

```bash
# Set up environment
cp .env.example .env
# Edit .env with your configuration
nano .env

# Run the server
cargo run --release
```

The server will:
1. Automatically run database migrations
2. Connect to Jetstream and start consuming events
3. Start the HTTP server on the configured port
4. Serve the DID document at `/.well-known/did.json`

### Testing Locally

```bash
# Test DID document endpoint
curl http://localhost:3000/.well-known/did.json

# Test feed endpoint (requires authentication in production)
curl "http://localhost:3000/xrpc/app.bsky.feed.getFeedSkeleton?feed=at://did:web:your-domain.com/app.bsky.feed.generator/following-no-reposts&limit=10"
```

### Command-Line Options

```bash
# Override environment variables
./following-no-reposts-feed --port 8080 --hostname feed.example.com

# Run database migrations only
./following-no-reposts-feed migrate

# Publish feed to your Bluesky account
./following-no-reposts-feed publish \
  --handle your-handle.bsky.social \
  --password your-app-password \
  --record-name following-no-reposts \
  --display-name "Following (No Reposts)" \
  --description "See posts from people you follow, without any reposts"

# Backfill posts from firehose (optional)
./following-no-reposts-feed backfill --cursor
```

## Deployment

### 1. Build for Production

```bash
cargo build --release --locked
```

### 2. Set Up Your Server

Transfer the binary to your server:

```bash
scp target/release/following-no-reposts-feed user@your-server:/opt/feed-generator/
```

### 3. Create a Systemd Service

Create `/etc/systemd/system/feed-generator.service`:

```ini
[Unit]
Description=Bluesky Feed Generator - Following No Reposts
After=network.target

[Service]
Type=simple
User=feedgen
WorkingDirectory=/opt/feed-generator
Environment="DATABASE_URL=sqlite:/opt/feed-generator/feed.db"
Environment="PORT=3000"
Environment="FEEDGEN_HOSTNAME=your-domain.com"
Environment="FEEDGEN_SERVICE_DID=did:web:your-domain.com"
ExecStart=/opt/feed-generator/following-no-reposts-feed
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Enable and start the service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable feed-generator
sudo systemctl start feed-generator
sudo systemctl status feed-generator
```

### 4. Configure Reverse Proxy

The feed generator must be accessible via HTTPS. Configure your reverse proxy:

#### Nginx

```nginx
server {
    listen 443 ssl http2;
    server_name your-domain.com;

    ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support for Jetstream
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

#### Caddy

```caddyfile
your-domain.com {
    reverse_proxy localhost:3000
}
```

### 5. Verify Deployment

```bash
# Test DID document
curl https://your-domain.com/.well-known/did.json

# Check if feed endpoint is accessible
curl https://your-domain.com/xrpc/app.bsky.feed.getFeedSkeleton
```

## Publishing Your Feed

Once your feed generator is deployed and accessible via HTTPS:

### Method 1: Using the Built-in Publish Command

```bash
./following-no-reposts-feed publish \
  --handle your-handle.bsky.social \
  --password your-app-password \
  --record-name following-no-reposts \
  --display-name "Following (No Reposts)" \
  --description "See posts from people you follow, without any reposts" \
  --avatar ./avatar.png
```

**Note**: Use an [App Password](https://bsky.app/settings/app-passwords), not your main account password!

### Method 2: Manual Publishing

1. **Get your DID**:
   ```bash
   curl "https://bsky.social/xrpc/com.atproto.identity.resolveHandle?handle=yourhandle.bsky.social"
   ```

2. **Create a session**:
   ```bash
   curl -X POST https://bsky.social/xrpc/com.atproto.server.createSession \
     -H "Content-Type: application/json" \
     -d '{"identifier": "yourhandle.bsky.social", "password": "your-app-password"}'
   ```

3. **Publish the feed generator record**:
   ```bash
   curl -X POST https://bsky.social/xrpc/com.atproto.repo.putRecord \
     -H "Authorization: Bearer YOUR_ACCESS_JWT" \
     -H "Content-Type: application/json" \
     -d '{
       "repo": "your.did",
       "collection": "app.bsky.feed.generator",
       "rkey": "following-no-reposts",
       "record": {
         "$type": "app.bsky.feed.generator",
         "did": "did:web:your-domain.com",
         "displayName": "Following (No Reposts)",
         "description": "See posts from people you follow, without any reposts",
         "createdAt": "2025-01-01T00:00:00.000Z"
       }
     }'
   ```

### Finding Your Feed

After publishing, your feed will be available at:

```
https://bsky.app/profile/yourhandle.bsky.social/feed/following-no-reposts
```

## Architecture

### System Components

```
┌──────────────┐         ┌──────────────┐         ┌──────────────┐
│  Jetstream   │────────▶│   Consumer   │────────▶│   Database   │
│ (WebSocket)  │ Events  │   (Async)    │  Store  │   (SQLite)   │
└──────────────┘         └──────────────┘         └──────────────┘
                                                         │
                                                         │ Read
                                                         ▼
┌──────────────┐         ┌──────────────┐         ┌──────────────┐
│   Bluesky    │────────▶│ HTTP Server  │────────▶│     Feed     │
│     App      │   JWT   │    (Axum)    │  Query  │  Algorithm   │
└──────────────┘         └──────────────┘         └──────────────┘
```

### Code Structure

- **`main.rs`**: Application entry point, HTTP server setup, routing
- **`jetstream_consumer.rs`**: WebSocket client for Jetstream events
- **`database.rs`**: SQLite abstraction layer, queries, and migrations
- **`feed_algorithm.rs`**: Feed generation logic (filtering by follows, excluding reposts)
- **`auth.rs`**: JWT validation with ES256K signature verification
- **`backfill.rs`**: Optional historical data backfilling from firehose
- **`publish.rs`**: Feed generator publishing utilities
- **`admin_socket.rs`**: Unix socket for admin commands
- **`types.rs`**: Shared data structures

### Data Flow

1. **Event Ingestion**: Jetstream sends `commit` events when posts are created or follows happen
2. **Event Processing**: Consumer parses events and extracts relevant data
3. **Database Storage**: Posts and follows are stored in SQLite with TTL
4. **Feed Requests**: Bluesky app requests feed via `getFeedSkeleton`
5. **Authentication**: JWT is validated by resolving DID and verifying signature
6. **Feed Generation**: Algorithm queries posts from followed users, excludes reposts
7. **Response**: Ordered list of post URIs returned with pagination cursor
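
Steps 6 and 7 can be pictured with an in-memory sketch. The `Post` struct, the `feed_skeleton` function, and the use of the `indexed_at` timestamp as the cursor are assumptions for illustration; the project keeps this data in SQLite rather than in memory. No explicit repost filter appears because this sketch assumes only `app.bsky.feed.post` records were stored, and reposts are separate `app.bsky.feed.repost` records.

```rust
use std::collections::HashSet;

/// Minimal stand-in for a stored post (field names are illustrative).
#[derive(Debug, Clone)]
struct Post {
    uri: String,
    author_did: String,
    indexed_at: i64, // Unix timestamp in seconds
}

/// Return up to `limit` post URIs from followed authors, newest first.
/// The second value is the next cursor: the timestamp of the last returned
/// post, or `None` when nothing was returned.
fn feed_skeleton(
    posts: &[Post],
    follows: &HashSet<String>,
    cursor: Option<i64>,
    limit: usize,
) -> (Vec<String>, Option<String>) {
    let mut candidates: Vec<&Post> = posts
        .iter()
        // Keep only posts written by accounts the requester follows.
        .filter(|p| follows.contains(&p.author_did))
        // When paginating, keep only posts strictly older than the cursor.
        .filter(|p| cursor.map_or(true, |c| p.indexed_at < c))
        .collect();
    candidates.sort_by(|a, b| b.indexed_at.cmp(&a.indexed_at));
    candidates.truncate(limit);
    let next = candidates.last().map(|p| p.indexed_at.to_string());
    let uris = candidates.iter().map(|p| p.uri.clone()).collect();
    (uris, next)
}
```

A timestamp-only cursor can skip posts that share a timestamp across a page boundary, so a real implementation would likely add a tiebreaker such as the post URI.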

## API Endpoints

### `GET /.well-known/did.json`

Returns the DID document for the feed generator service.

**Response**:
```json
{
  "@context": ["https://www.w3.org/ns/did/v1"],
  "id": "did:web:your-domain.com",
  "service": [{
    "id": "#bsky_fg",
    "type": "BskyFeedGenerator",
    "serviceEndpoint": "https://your-domain.com"
  }]
}
```

### `GET /xrpc/app.bsky.feed.getFeedSkeleton`

Returns a personalized feed skeleton for the authenticated user.

**Query Parameters**:
- `feed` (required): Feed AT-URI (e.g., `at://did:web:your-domain.com/app.bsky.feed.generator/following-no-reposts`)
- `limit` (optional): Number of posts (1-100, default: 50)
- `cursor` (optional): Pagination cursor

**Headers**:
- `Authorization`: Bearer JWT token from Bluesky app

**Response**:
```json
{
  "feed": [
    {"post": "at://did:plc:xxx/app.bsky.feed.post/abc123"},
    {"post": "at://did:plc:yyy/app.bsky.feed.post/def456"}
  ],
  "cursor": "1234567890"
}
```
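
The `limit` rules listed under Query Parameters can be expressed as a small helper. Whether the service clamps out-of-range values or rejects them is not stated here, so the clamping behavior and the name `parse_limit` are assumptions.

```rust
/// Interpret the optional `limit` query parameter.
/// Missing or non-numeric input falls back to the default of 50;
/// numeric input is clamped into the documented 1-100 range.
fn parse_limit(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.parse::<usize>().ok())
        .map_or(50, |n| n.clamp(1, 100))
}
```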

## Performance

### Resource Usage

- **Memory**: ~50-100 MB (depends on database size)
- **CPU**: Minimal (<1% on modern hardware)
- **Bandwidth**: ~850 MB/day (Jetstream with compression)
- **Storage**: ~100-500 MB (48-hour post retention)

### Scalability

The feed generator can handle:
- **10,000+** active users
- **1M+** posts/day ingestion
- **100+** requests/second

### Database Optimization

```sql
-- Efficient indexes
CREATE INDEX idx_posts_author_did ON posts(author_did);
CREATE INDEX idx_posts_indexed_at ON posts(indexed_at);
CREATE INDEX idx_follows_follower ON follows(follower_did);

-- Automatic cleanup (posts > 48 hours old)
DELETE FROM posts WHERE indexed_at < datetime('now', '-2 days');
```
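
The same 48-hour rule can be written as a predicate in application code, for example to decide which rows a cleanup task should drop. This is a sketch only; `RETENTION` and `is_expired` are hypothetical names, and the project may rely solely on the SQL statement above.

```rust
use std::time::{Duration, SystemTime};

/// Retention window matching the `-2 days` cutoff in the SQL cleanup.
const RETENTION: Duration = Duration::from_secs(48 * 60 * 60);

/// True when a post indexed at `indexed_at` is older than the retention window.
fn is_expired(indexed_at: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(indexed_at) {
        Ok(age) => age > RETENTION,
        // `indexed_at` lies in the future relative to `now` (clock skew): keep it.
        Err(_) => false,
    }
}
```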

## Development

### Running Tests

```bash
cargo test
```

### Database Migrations

Create a new migration:

```bash
sqlx migrate add your_migration_name
```

Run migrations manually:

```bash
cargo install sqlx-cli --no-default-features --features sqlite
sqlx migrate run
```

### Logging

Set the `RUST_LOG` environment variable:

```bash
# Debug level (verbose)
RUST_LOG=debug cargo run

# Info level (default)
RUST_LOG=info cargo run

# Specific module logging
RUST_LOG=following_no_reposts_feed::jetstream_consumer=debug cargo run
```

### Code Quality

```bash
# Format code
cargo fmt

# Run linter
cargo clippy --all-targets

# Check for security vulnerabilities
cargo audit
```

## Troubleshooting

### Jetstream Connection Issues

**Problem**: Cannot connect to Jetstream

**Solutions**:
- Verify network connectivity: `ping jetstream1.us-east.bsky.network`
- Check firewall rules (port 443 outbound)
- Try alternative Jetstream servers
- Monitor logs for specific error messages

### Database Locked Errors

**Problem**: `database is locked` errors

**Solutions**:
- Ensure only one instance is running
- Check for long-running transactions
- Increase `busy_timeout` in database configuration
- Consider using WAL mode: `PRAGMA journal_mode=WAL;`

### Authentication Failures

**Problem**: JWT validation errors

**Solutions**:
- Verify `FEEDGEN_SERVICE_DID` matches your domain
- Check network access to `plc.directory` for DID resolution
- Enable debug logging: `RUST_LOG=following_no_reposts_feed::auth=debug`
- Verify your domain's HTTPS certificate is valid
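
When chasing a validation error, it can help to look at the claims inside the token. AT Protocol service-auth tokens carry the caller's DID in `iss` and the target service DID in `aud`, and `aud` should equal `FEEDGEN_SERVICE_DID`. The sketch below only decodes the payload for inspection and does not verify the signature. Both function names are hypothetical and are not part of this project's `auth.rs`.

```rust
/// Decode base64url text as used for JWT segments (padding is tolerated).
fn base64url_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(input.len() * 3 / 4);
    let mut buf: u32 = 0;
    let mut bits: u32 = 0;
    for byte in input.bytes() {
        let value = match byte {
            b'A'..=b'Z' => byte - b'A',
            b'a'..=b'z' => byte - b'a' + 26,
            b'0'..=b'9' => byte - b'0' + 52,
            b'-' => 62,
            b'_' => 63,
            b'=' => break,
            _ => return None, // not a base64url character
        };
        // Accumulate 6 bits per character and emit a byte once 8 are available.
        buf = (buf << 6) | u32::from(value);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1;
        }
    }
    Some(out)
}

/// Return the JSON payload (second segment) of a JWT as text, unverified.
fn jwt_payload_json(token: &str) -> Option<String> {
    let mut parts = token.split('.');
    let (_header, payload, _signature) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() {
        return None; // a compact JWT has exactly three segments
    }
    String::from_utf8(base64url_decode(payload)?).ok()
}
```

Printing the result of `jwt_payload_json` for a failing request shows at a glance whether `aud` and `exp` are what the server expects.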

### Feed Not Updating

**Problem**: Feed shows stale content

**Solutions**:
- Check Jetstream connection: look for "Connected to Jetstream" in logs
- Verify database is being updated: `sqlite3 feed.db "SELECT COUNT(*) FROM posts;"`
- Check for errors in logs: `journalctl -u feed-generator -n 100`
- Restart the service: `systemctl restart feed-generator`

### High Memory Usage

**Problem**: Memory usage growing over time

**Solutions**:
- Verify automatic cleanup is working: check `posts` table size
- Manually trigger cleanup: `DELETE FROM posts WHERE indexed_at < datetime('now', '-2 days');`
- Reduce retention period in code if needed
- Monitor with: `ps aux | grep following-no-reposts-feed`

### Debug Mode

Enable comprehensive debugging:

```bash
RUST_LOG=debug,hyper=info,tokio=info cargo run
```

Test endpoints directly:

```bash
# Test DID endpoint
curl -v https://your-domain.com/.well-known/did.json

# Test feed endpoint with authentication
curl -v -H "Authorization: Bearer YOUR_JWT" \
  "https://your-domain.com/xrpc/app.bsky.feed.getFeedSkeleton?feed=at://did:web:your-domain.com/app.bsky.feed.generator/following-no-reposts&limit=5"
```

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Add tests if applicable
5. Run `cargo fmt` and `cargo clippy`
6. Commit your changes (`git commit -m 'Add amazing feature'`)
7. Push to the branch (`git push origin feature/amazing-feature`)
8. Open a Pull Request

### Development Guidelines

- Follow Rust best practices and idioms
- Add tests for new functionality
- Update documentation for user-facing changes
- Keep commits atomic and well-described
- Ensure CI passes before submitting PR

## License

This project is licensed under the GNU General Public License v3.0 - see the [LICENSE](LICENSE) file for details.

This ensures the code remains free and open source. If you modify and distribute this software, you must:
- Disclose your source code
- License your modifications under GPLv3
- State significant changes made
- Include the original copyright notice

## Resources

- [AT Protocol Documentation](https://atproto.com)
- [Bluesky API Reference](https://docs.bsky.app)
- [Jetstream Documentation](https://github.com/bluesky-social/jetstream)
- [ATrium Rust Library](https://github.com/sugyan/atrium)
- [Feed Generator Guide](https://docs.bsky.app/docs/starter-templates/custom-feeds)

## Acknowledgments

- Built with [ATrium](https://github.com/sugyan/atrium) - Rust libraries for AT Protocol
- Uses [Jetstream](https://github.com/bluesky-social/jetstream) for efficient event streaming
- Inspired by the Bluesky community's work on custom feeds

---

**Made with ❤️ for the Bluesky community**