# feedparser-rs

[![Crates.io](https://img.shields.io/crates/v/feedparser-rs)](https://crates.io/crates/feedparser-rs)
[![docs.rs](https://img.shields.io/docsrs/feedparser-rs)](https://docs.rs/feedparser-rs)
[![PyPI](https://img.shields.io/pypi/v/feedparser-rs)](https://pypi.org/project/feedparser-rs/)
[![npm](https://img.shields.io/npm/v/feedparser-rs)](https://www.npmjs.com/package/feedparser-rs)
[![CI](https://img.shields.io/github/actions/workflow/status/bug-ops/feedparser-rs/ci.yml?branch=main)](https://github.com/bug-ops/feedparser-rs/actions)
[![codecov](https://codecov.io/gh/bug-ops/feedparser-rs/graph/badge.svg)](https://codecov.io/gh/bug-ops/feedparser-rs)
[![MSRV](https://img.shields.io/badge/MSRV-1.88.0-blue)](https://blog.rust-lang.org/)
[![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue)](LICENSE-MIT)

High-performance RSS/Atom/JSON Feed parser written in Rust, with Python and Node.js bindings. It is designed as a drop-in replacement for Python [feedparser](https://github.com/kurtmckee/feedparser) and parsed RSS 90-94x faster in the benchmarks below.

## Features

- **Multi-format support** -- RSS 0.9x, 1.0, 2.0 / Atom 0.3, 1.0 / JSON Feed 1.0, 1.1
- **Tolerant parsing** -- Handles malformed feeds gracefully with the `bozo` flag pattern, propagated at both feed and entry level
- **HTTP fetching** -- Built-in URL fetching with compression (gzip, deflate, brotli) and conditional GET (ETag/Last-Modified)
- **Podcast support** -- iTunes and Podcast 2.0 namespace extensions
- **Security** -- DoS protection via `ParserLimits`, SSRF protection, input size validation
- **Multi-language bindings** -- Native Python (PyO3) and Node.js (napi-rs) bindings
- **feedparser drop-in** -- Dict-style access, field aliases, same API patterns as Python feedparser
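
The `bozo` flag pattern named above can be illustrated with a small self-contained sketch. It does not use the feedparser-rs API: `SketchFeed` and `tolerant_titles` are hypothetical names, and the "parser" only extracts `<title>` text. The point is the shape of the result: malformed input yields partial data plus a flag instead of a hard error.

```rust
/// Hypothetical result type for illustration; not the feedparser-rs API.
#[derive(Debug, Default)]
struct SketchFeed {
    titles: Vec<String>,
    /// True when the input was malformed but parsing still produced data.
    bozo: bool,
    /// Description of the problem that set the flag, if any.
    bozo_exception: Option<String>,
}

/// Collect the text of every `<title>` element, tolerating a missing close tag.
fn tolerant_titles(input: &str) -> SketchFeed {
    let mut feed = SketchFeed::default();
    let mut rest = input;
    while let Some(start) = rest.find("<title>") {
        let after = &rest[start + "<title>".len()..];
        match after.find("</title>") {
            Some(end) => {
                feed.titles.push(after[..end].trim().to_string());
                rest = &after[end + "</title>".len()..];
            }
            None => {
                // Malformed: keep the text up to the next tag, record why, stop.
                let partial = after.split('<').next().unwrap_or("").trim();
                if !partial.is_empty() {
                    feed.titles.push(partial.to_string());
                }
                feed.bozo = true;
                feed.bozo_exception = Some("unclosed <title> element".to_string());
                break;
            }
        }
    }
    feed
}

fn main() {
    let broken = tolerant_titles("<title>A</title><title>B");
    println!("titles={:?} bozo={}", broken.titles, broken.bozo);
    println!("reason={:?}", broken.bozo_exception);
}
```

Callers check the flag after the fact and decide whether partial data is acceptable, rather than handling an error before they can see any entries.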

## Supported Formats

| Format | Versions | Status |
|--------|----------|--------|
| RSS | 0.90, 0.91, 0.92, 1.0, 2.0 | Full support |
| Atom | 0.3, 1.0 | Full support |
| JSON Feed | 1.0, 1.1 | Full support |
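
The three families in the table can usually be told apart from the start of the document. The sketch below is one way to do that and is not how feedparser-rs detects formats: `SketchFormat` and `sniff_format` are hypothetical names, and the 512-byte prefix is an arbitrary choice for the example.

```rust
/// Hypothetical format labels for this sketch; not the feedparser-rs types.
#[derive(Debug, PartialEq)]
enum SketchFormat {
    Rss,      // root element <rss>
    RssRdf,   // root element <rdf:RDF> (RDF-based RSS such as 1.0)
    Atom,     // root element <feed>
    JsonFeed, // document is a JSON object
    Unknown,
}

/// Guess the feed family by sniffing a short prefix of the input.
fn sniff_format(data: &[u8]) -> SketchFormat {
    // Lossy conversion tolerates invalid bytes and a cut mid-character.
    let prefix = String::from_utf8_lossy(&data[..data.len().min(512)]);
    // Skip a byte-order mark and leading whitespace.
    let text = prefix.trim_start_matches('\u{feff}').trim_start();
    if text.starts_with('{') {
        return SketchFormat::JsonFeed;
    }
    // Root element names may follow an XML declaration or comments.
    if text.contains("<rss") {
        SketchFormat::Rss
    } else if text.contains("<rdf:RDF") {
        SketchFormat::RssRdf
    } else if text.contains("<feed") {
        SketchFormat::Atom
    } else {
        SketchFormat::Unknown
    }
}

fn main() {
    let sample = b"<?xml version=\"1.0\"?><rss version=\"2.0\"></rss>";
    println!("{:?}", sniff_format(sample));
}
```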

### Namespace Extensions

| Namespace | Description |
|-----------|-------------|
| Dublin Core | Creator, date, rights metadata |
| Content | Encoded HTML content |
| Media RSS | Media attachments and metadata |
| iTunes | Podcast metadata (author, duration, explicit) |
| Podcast 2.0 | Chapters, transcripts, funding |
| Syndication | Update schedule (period, frequency, base) |
| GeoRSS | Geographic location data (point, line, polygon, box) |
| Creative Commons | License information with `rel="license"` links |
| Slash | Comment count (`slash:comments`) |
| WFW | Comment feed URL (`wfw:commentRss`) |
| Atom Threading (thr:) | In-reply-to, reply count, reply datetime (RFC 4685) |

## Installation

### Rust

```bash
cargo add feedparser-rs
```

Or add to your `Cargo.toml`:

```toml
[dependencies]
feedparser-rs = "0.5.0"
```

> [!IMPORTANT]
> Requires Rust 1.88.0 or later (edition 2024).

### Python

```bash
pip install feedparser-rs
```

> [!NOTE]
> Requires Python 3.10 or later.

### Node.js

```bash
npm install feedparser-rs
# or
pnpm add feedparser-rs
```

> [!NOTE]
> Requires Node.js 18 or later.

## Usage

### Rust

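```rust
use feedparser_rs::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A minimal RSS 2.0 document with one channel and one item.
    let xml = r#"<?xml version="1.0"?>
        <rss version="2.0">
            <channel>
                <title>Example Feed</title>
                <link>https://example.com</link>
                <item>
                    <title>First Post</title>
                    <link>https://example.com/post/1</link>
                </item>
            </channel>
        </rss>
    "#;

    let feed = parse(xml.as_bytes())?;

    println!("Version: {}", feed.version.as_str()); // "rss20"
    println!("Title: {:?}", feed.feed.title);
    println!("Entries: {}", feed.entries.len());

    for entry in &feed.entries {
        println!("  - {:?}", entry.title);
    }

    Ok(())
}
```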

#### Fetching from URL

```rust
use feedparser_rs::fetch_and_parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let feed = fetch_and_parse("https://example.com/feed.xml")?;
    println!("Fetched {} entries", feed.entries.len());
    Ok(())
}
```

> [!TIP]
> Use `fetch_and_parse` for URL fetching with automatic compression handling (gzip, deflate, brotli).

### Python

```python
import feedparser_rs as feedparser # Drop-in replacement

# Parse from bytes, string, or URL (auto-detected)
d = feedparser.parse(b'...')
d = feedparser.parse('https://example.com/feed.xml') # URL auto-detected

# Attribute-style access
print(d.version) # 'rss20'
print(d.feed.title)
print(d.bozo) # True if parsing had issues

# Dict-style access (feedparser-compatible)
print(d['feed']['title'])
print(d['entries'][0]['link'])

# Deprecated field aliases work
print(d.feed.description) # -> d.feed.subtitle
print(d.channel.title) # -> d.feed.title
```

> [!NOTE]
> Python bindings provide full feedparser compatibility: dict-style access, field aliases, and `time.struct_time` for date fields.
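
To show how deprecated aliases like the ones above can be resolved, here is a minimal sketch in Rust. It is not the bindings' implementation: `resolve_alias` and `get_field` are hypothetical names, and only the two feed-level aliases from the Python example are included.

```rust
use std::collections::HashMap;

/// Map a deprecated key to its current name; other keys pass through unchanged.
fn resolve_alias(key: &str) -> &str {
    match key {
        "description" => "subtitle", // d.feed.description -> d.feed.subtitle
        "channel" => "feed",         // d.channel -> d.feed
        other => other,
    }
}

/// Look up a field by its current name or by a deprecated alias.
fn get_field<'a>(fields: &'a HashMap<String, String>, key: &str) -> Option<&'a str> {
    fields.get(resolve_alias(key)).map(String::as_str)
}

fn main() {
    let mut feed = HashMap::new();
    feed.insert("subtitle".to_string(), "News and updates".to_string());

    // Both the current name and the deprecated alias reach the same value.
    println!("{:?}", get_field(&feed, "subtitle"));
    println!("{:?}", get_field(&feed, "description"));
}
```

Resolving the alias at lookup time means each value is stored once under its current name, so old and new spellings cannot drift apart.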

### Node.js

```javascript
import { parse, fetchAndParse } from 'feedparser-rs';

// Parse from string
const feed = parse('...');
console.log(feed.version); // 'rss20'
console.log(feed.feed.title);
console.log(feed.entries.length);

// Fetch from URL
const remoteFeed = await fetchAndParse('https://example.com/feed.xml');
```

See the [Node.js API documentation](crates/feedparser-rs-node/README.md) for the complete reference.

## Cargo Features

| Feature | Description | Default |
|---------|-------------|---------|
| `http` | Enable URL fetching with reqwest (gzip/deflate/brotli support) | Yes |

To disable HTTP support and reduce dependencies:

```toml
[dependencies]
feedparser-rs = { version = "0.5.0", default-features = false }
```

## Workspace Structure

| Crate | Description | Package |
|-------|-------------|---------|
| [`feedparser-rs`](crates/feedparser-rs-core) | Core Rust parser | [crates.io](https://crates.io/crates/feedparser-rs) |
| [`feedparser-rs-node`](crates/feedparser-rs-node) | Node.js bindings | [npm](https://www.npmjs.com/package/feedparser-rs) |
| [`feedparser-rs-py`](crates/feedparser-rs-py) | Python bindings | [PyPI](https://pypi.org/project/feedparser-rs) |

## Development

```bash
# Install cargo-make
cargo install cargo-make

# Run all checks (format, lint, test)
cargo make ci-all

# Run tests with coverage
cargo make coverage

# Run benchmarks
cargo make bench
```

See all available tasks:

```bash
cargo make --list-all-steps
```

## Benchmarks

Measured on Apple M1 Pro, parsing real-world RSS feeds:

| Feed Size | Time | Throughput |
|-----------|------|------------|
| Small (2 KB) | **10.7 us** | 187 MB/s |
| Medium (20 KB) | **93.6 us** | 214 MB/s |
| Large (200 KB) | **939 us** | 213 MB/s |

Format detection: **128 ns** (near-instant)
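
The throughput column is feed size divided by parse time. The check below reproduces it under the assumption of decimal units (1 KB = 1000 bytes, 1 MB = 10^6 bytes), which the table does not state; `throughput_mb_per_s` is a name made up for this sketch.

```rust
/// Throughput in MB/s, assuming 1 KB = 1000 bytes and 1 MB = 10^6 bytes.
fn throughput_mb_per_s(size_kb: f64, time_us: f64) -> f64 {
    let bytes = size_kb * 1_000.0;
    let seconds = time_us / 1_000_000.0;
    bytes / seconds / 1_000_000.0
}

fn main() {
    // (label, size in KB, time in microseconds) taken from the table above.
    let rows = [("Small", 2.0, 10.7), ("Medium", 20.0, 93.6), ("Large", 200.0, 939.0)];
    for (label, size_kb, time_us) in rows {
        println!("{}: {:.0} MB/s", label, throughput_mb_per_s(size_kb, time_us));
    }
}
```

With these assumptions the rounded results match the table, which suggests the figures use decimal rather than binary units.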

### vs Python feedparser

| Operation | feedparser-rs | Python feedparser | Speedup |
|-----------|---------------|-------------------|---------|
| Parse 20 KB RSS | 0.09 ms | 8.5 ms | **94x** |
| Parse 200 KB RSS | 0.94 ms | 85 ms | **90x** |

> [!TIP]
> Run your own benchmarks with `cargo bench` or compare against Python with `cargo make bench-compare`.

## MSRV Policy

Minimum Supported Rust Version: **1.88.0** (edition 2024).

MSRV increases are considered breaking changes and will result in a minor version bump.

## License

Licensed under either of:

- [Apache License, Version 2.0](LICENSE-APACHE)
- [MIT License](LICENSE-MIT)

at your option.

## Contributing

Contributions are welcome! Please read our [Contributing Guide](CONTRIBUTING.md) before submitting a pull request.

This project follows the [Rust Code of Conduct](https://www.rust-lang.org/policies/code-of-conduct).