https://github.com/rpatrik96/bibtexupdater
Tools for managing BibTeX bibliographies: automatically update preprints to published versions and filter to only cited references.
https://github.com/rpatrik96/bibtexupdater
bibtex bibtex-references bibtexparser latex
Last synced: 5 months ago
JSON representation
Tools for managing BibTeX bibliographies: automatically update preprints to published versions and filter to only cited references.
- Host: GitHub
- URL: https://github.com/rpatrik96/bibtexupdater
- Owner: rpatrik96
- License: mit
- Created: 2025-11-30T14:13:23.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2026-01-25T20:54:33.000Z (5 months ago)
- Last Synced: 2026-01-26T11:57:58.049Z (5 months ago)
- Topics: bibtex, bibtex-references, bibtexparser, latex
- Language: Python
- Homepage:
- Size: 226 KB
- Stars: 46
- Watchers: 0
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
# BibTeX Updater
Tools for managing BibTeX bibliographies: automatically update preprints to published versions, validate references against external databases, and filter to only cited references.
## Installation
### From PyPI (Recommended)
```bash
pip install bibtex-updater
# With Google Scholar support
pip install bibtex-updater[scholarly]
# With Zotero support
pip install bibtex-updater[zotero]
# All optional dependencies
pip install bibtex-updater[all]
```
### From Source
```bash
git clone https://github.com/rpatrik96/bibtexupdater.git
cd bibtexupdater
pip install -e ".[dev]"
```
### Using uv (No Installation)
Run directly without managing virtual environments using [uv](https://docs.astral.sh/uv/):
```bash
# Run any command directly
uv run --with "bibtex-updater[all]" bibtex-update references.bib -o updated.bib
# Or use the provided wrapper script
./scripts/bibtex-x update references.bib -o updated.bib
./scripts/bibtex-x check references.bib
./scripts/bibtex-x filter paper.tex -b references.bib -o filtered.bib
```
## CLI Commands
| Command | Description |
|---------|-------------|
| `bibtex-update` | Replace preprints with published versions |
| `bibtex-check` | Validate references exist with correct metadata |
| `bibtex-filter` | Filter to only cited entries |
| `bibtex-zotero` | Update preprints in Zotero library |
## Quick Start
### Update Preprints
```bash
# Update preprints to published versions
bibtex-update references.bib -o updated.bib
# Preview changes (dry run)
bibtex-update references.bib --dry-run --verbose
```
### Validate References (Fact-Check)
```bash
# Check if references exist and have correct metadata
bibtex-check references.bib --report report.json
# Strict mode: exit with error if hallucinated/not-found entries
bibtex-check references.bib --strict
```
### Filter Bibliography
```bash
# Filter to only cited entries
bibtex-filter paper.tex -b references.bib -o filtered.bib
# Multiple tex files
bibtex-filter *.tex -b references.bib -o filtered.bib
```
### Update Zotero Library
```bash
# Set credentials (get from zotero.org/settings/keys)
export ZOTERO_LIBRARY_ID="your_user_id"
export ZOTERO_API_KEY="your_api_key"
# Preview changes
bibtex-zotero --dry-run
# Apply updates
bibtex-zotero
```
### Sync BibTeX Updates to Zotero
When updating a `.bib` file, you can simultaneously update matching entries in your Zotero library:
```bash
# Set Zotero credentials
export ZOTERO_LIBRARY_ID="your_user_id"
export ZOTERO_API_KEY="your_api_key"
# Update bib file AND sync to Zotero
bibtex-update references.bib -o updated.bib --zotero
# Preview Zotero changes only (bib changes still apply)
bibtex-update references.bib -o updated.bib --zotero --zotero-dry-run
# Limit to a specific Zotero collection
bibtex-update references.bib -o updated.bib --zotero --zotero-collection ABCD1234
```
The sync matches bib entries to Zotero items by:
1. **arXiv ID** - Most reliable for preprints
2. **DOI** - For preprints with DOIs (e.g., bioRxiv)
3. **Title + Author** - Fuzzy matching as fallback
## Standalone Scripts
For environments without pip (e.g., Overleaf), `filter_bibliography.py` can be used directly as it has no dependencies:
```bash
# Copy the script and run directly
python filter_bibliography.py paper.tex -b references.bib -o filtered.bib
```
## Documentation
| Document | Description |
|----------|-------------|
| [docs/BIBTEX_UPDATER.md](docs/BIBTEX_UPDATER.md) | Full BibTeX updater documentation |
| [docs/REFERENCE_FACT_CHECKER.md](docs/REFERENCE_FACT_CHECKER.md) | Full reference fact-checker documentation |
| [docs/ZOTERO_UPDATER.md](docs/ZOTERO_UPDATER.md) | Full Zotero updater documentation |
| [docs/FILTER_BIBLIOGRAPHY.md](docs/FILTER_BIBLIOGRAPHY.md) | Full filter documentation |
| [examples/](examples/) | Example workflows and configuration files |
## Overleaf Integration
Both tools integrate with Overleaf via GitHub Actions or latexmkrc.
### GitHub Actions (Recommended)
1. Enable GitHub sync in Overleaf (Menu -> Sync -> GitHub)
2. Copy a workflow from [examples/workflows/](examples/workflows/) to `.github/workflows/`
3. Changes synced from Overleaf automatically trigger updates
### latexmkrc (Direct Overleaf)
For `filter_bibliography.py` only (no dependencies required):
1. Upload `filter_bibliography.py` to your Overleaf project
2. Create `.latexmkrc` based on [examples/latexmkrc](examples/latexmkrc)
3. Recompile - filtered bibliography appears in your file list
## Features
### BibTeX Updater (`bibtex-update`)
- **Multi-source resolution**: arXiv, Crossref, DBLP, Semantic Scholar, Google Scholar
- **High accuracy**: Title and author fuzzy matching with confidence thresholds
- **Batch processing**: Multiple files with concurrent workers (default: 8)
- **Deduplication**: Merge duplicates by DOI or normalized title+authors
- **Smart caching**: On-disk cache + semantic resolution cache with TTL
- **Per-service rate limiting**: Optimized rate limits per API (Crossref, S2, DBLP, arXiv)
- **Batch API support**: Faster bulk lookups via arXiv/S2/Crossref batch endpoints
### Zotero Updater (`bibtex-zotero`)
- **Direct Zotero integration**: Fetches and updates items via Zotero API
- **Same resolution pipeline**: Uses the same multi-source resolution
- **Preserves metadata**: Keeps notes, tags, and attachments intact
- **Idempotent**: Already-published papers are automatically skipped
- **Dry-run mode**: Preview changes before applying
### Reference Fact-Checker (`bibtex-check`)
- **Multi-source validation**: Crossref, DBLP, Semantic Scholar
- **Detailed mismatch detection**: Title, author, year, venue comparisons
- **Hallucination detection**: Identifies likely fabricated references
- **Structured reports**: JSON and JSONL output formats
- **CI/CD integration**: Strict mode with exit codes for automation
### Filter Bibliography (`bibtex-filter`)
- **Zero dependencies**: Uses only Python standard library
- **Works on Overleaf**: No pip install needed
- **Multiple bib files**: Merge and filter from multiple sources
- **Citation detection**: Supports natbib, biblatex, and standard LaTeX citations
## Python API
```python
from bibtex_updater import Detector, Resolver, Updater, HttpClient, RateLimiter, DiskCache
# Create HTTP client with rate limiting and caching
rate_limiter = RateLimiter(req_per_min=30)
cache = DiskCache(".cache.json")
http_client = HttpClient(
timeout=30.0,
user_agent="bibtex-updater/0.1.0",
rate_limiter=rate_limiter,
cache=cache
)
# Detect preprints
detector = Detector()
detection = detector.detect(entry)
if detection.is_preprint:
# Resolve to published version
resolver = Resolver(http_client)
candidate = resolver.resolve(detection)
if candidate and candidate.confidence >= 0.9:
# Update the entry
updater = Updater()
updated_entry = updater.update_entry(entry, candidate.record, detection)
```
## Development
```bash
# Clone and install in development mode
git clone https://github.com/rpatrik96/bibtexupdater.git
cd bibtexupdater
pip install -e ".[dev,all]"
# Run tests
pytest tests/ -v
# Run tests with coverage
pytest tests/ -v --cov=bibtex_updater --cov-report=term-missing
# Code quality
pre-commit run --all-files
# Build package
python -m build
# Check package
twine check dist/*
```
## License
MIT License - see [LICENSE](LICENSE) for details.