{"id":36498962,"url":"https://github.com/revidiumhq/biblib","last_synced_at":"2026-02-20T00:01:21.452Z","repository":{"id":274159980,"uuid":"922081470","full_name":"revidiumhq/biblib","owner":"revidiumhq","description":"Parse, manage, and deduplicate academic citations","archived":false,"fork":false,"pushed_at":"2026-02-19T22:06:02.000Z","size":330,"stargazers_count":6,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-02-19T23:30:19.753Z","etag":null,"topics":["academic","bibliography","citations","doi","pubmed","ris"],"latest_commit_sha":null,"homepage":"https://crates.io/crates/biblib","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/revidiumhq.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-01-25T09:10:36.000Z","updated_at":"2026-02-19T22:04:41.000Z","dependencies_parsed_at":"2025-01-25T10:47:10.195Z","dependency_job_id":"30100320-b7b4-4043-81a9-42414ce80841","html_url":"https://github.com/revidiumhq/biblib","commit_stats":null,"previous_names":["aliazlanpro/biblib","aliazlandev/biblib","revidiumhq/biblib"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/revidiumhq/biblib","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/revidiumhq%2Fbiblib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/revidiumhq%2Fbiblib/tags","releases_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/revidiumhq%2Fbiblib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/revidiumhq%2Fbiblib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/revidiumhq","download_url":"https://codeload.github.com/revidiumhq/biblib/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/revidiumhq%2Fbiblib/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29637400,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-19T22:32:43.237Z","status":"ssl_error","status_checked_at":"2026-02-19T22:32:38.330Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["academic","bibliography","citations","doi","pubmed","ris"],"created_at":"2026-01-12T02:15:14.362Z","updated_at":"2026-02-20T00:01:21.439Z","avatar_url":"https://github.com/revidiumhq.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# biblib\n\n[![Crates.io](https://img.shields.io/crates/v/biblib.svg)](https://crates.io/crates/biblib)\n[![Documentation](https://docs.rs/biblib/badge.svg)](https://docs.rs/biblib)\n[![License](https://img.shields.io/crates/l/biblib.svg)](LICENSE-MIT)\n\nA Rust library for parsing and deduplicating academic citations.\n\n## 
Installation\n\n```toml\n[dependencies]\nbiblib = \"0.4\"\n```\n\nFor minimal builds:\n\n```toml\n[dependencies]\nbiblib = { version = \"0.4\", default-features = false, features = [\"ris\", \"regex\"] }\n```\n\n## Supported Formats\n\n| Format      | Feature  | Description                         |\n| ----------- | -------- | ----------------------------------- |\n| RIS         | `ris`    | Research Information Systems format |\n| PubMed      | `pubmed` | MEDLINE/PubMed `.nbib` files        |\n| EndNote XML | `xml`    | EndNote XML export format           |\n| CSV         | `csv`    | Configurable delimited files        |\n\nAll format features are enabled by default.\n\n## Quick Start\n\n### Parsing Citations\n\n```rust\nuse biblib::{CitationParser, RisParser};\n\nlet ris_content = r#\"TY  - JOUR\nTI  - Machine Learning in Healthcare\nAU  - Smith, John\nAU  - Doe, Jane\nPY  - 2023\nER  -\"#;\n\nlet parser = RisParser::new();\nlet citations = parser.parse(ris_content).unwrap();\n\nprintln!(\"Title: {}\", citations[0].title);\nprintln!(\"Authors: {:?}\", citations[0].authors);\n```\n\n### Auto-Detecting Format\n\n```rust\nuse biblib::detect_and_parse;\n\nlet content = \"TY  - JOUR\\nTI  - Example\\nER  -\";\nlet (citations, format) = detect_and_parse(content).unwrap();\n\nprintln!(\"Detected format: {}\", format); // \"RIS\"\n```\n\n### Deduplicating Citations\n\n```rust\nuse biblib::dedupe::{Deduplicator, DeduplicatorConfig};\n\nlet config = DeduplicatorConfig {\n    group_by_year: true,      // Group by year for performance\n    run_in_parallel: true,    // Use parallel processing\n    source_preferences: vec![\"PubMed\".to_string()], // Prefer PubMed records\n};\n\nlet deduplicator = Deduplicator::new().with_config(config);\nlet groups = deduplicator.find_duplicates(\u0026citations).unwrap();\n\nfor group in groups {\n    if !group.duplicates.is_empty() {\n        println!(\"Kept: {}\", group.unique.title);\n        println!(\"Duplicates: {}\", 
group.duplicates.len());\n    }\n}\n```\n\n### CSV with Custom Headers\n\n```rust\nuse biblib::csv::{CsvParser, CsvConfig};\nuse biblib::CitationParser;\n\nlet mut config = CsvConfig::new();\nconfig\n    .set_delimiter(b';')\n    .set_header_mapping(\"title\", vec![\"Article Name\".to_string()])\n    .set_header_mapping(\"authors\", vec![\"Writers\".to_string()]);\n\nlet parser = CsvParser::with_config(config);\nlet citations = parser.parse(\"Article Name;Writers\\nMy Paper;Smith J\").unwrap();\n```\n\n## Citation Fields\n\nEach parsed citation contains:\n\n| Field           | Type             | Description                                 |\n| --------------- | ---------------- | ------------------------------------------- |\n| `title`         | `String`         | Work title                                  |\n| `authors`       | `Vec\u003cAuthor\u003e`    | Authors with name, given name, affiliations |\n| `journal`       | `Option\u003cString\u003e` | Full journal name                           |\n| `journal_abbr`  | `Option\u003cString\u003e` | Journal abbreviation                        |\n| `date`          | `Option\u003cDate\u003e`   | Year, month, day                            |\n| `volume`        | `Option\u003cString\u003e` | Volume number                               |\n| `issue`         | `Option\u003cString\u003e` | Issue number                                |\n| `pages`         | `Option\u003cString\u003e` | Page range                                  |\n| `doi`           | `Option\u003cString\u003e` | Digital Object Identifier                   |\n| `pmid`          | `Option\u003cString\u003e` | PubMed ID                                   |\n| `pmc_id`        | `Option\u003cString\u003e` | PubMed Central ID                           |\n| `issn`          | `Vec\u003cString\u003e`    | ISSNs                                       |\n| `abstract_text` | `Option\u003cString\u003e` | Abstract                                    |\n| `keywords`      | 
`Vec\u003cString\u003e`    | Keywords                                    |\n| `urls`          | `Vec\u003cString\u003e`    | Related URLs                                |\n| `mesh_terms`    | `Vec\u003cString\u003e`    | MeSH terms (PubMed)                         |\n| `extra_fields`  | `HashMap`        | Additional format-specific fields           |\n\n## Features\n\n| Feature       | Dependencies      | Description                                       |\n| ------------- | ----------------- | ------------------------------------------------- |\n| `ris`         | -                 | RIS format parser                                 |\n| `pubmed`      | -                 | PubMed/MEDLINE parser                             |\n| `xml`         | `quick-xml`       | EndNote XML parser                                |\n| `csv`         | `csv`             | CSV parser                                        |\n| `dedupe`      | `rayon`, `strsim` | Deduplication engine                              |\n| `regex`       | `regex`           | Full regex support                                |\n| `lite`        | `regex-lite`      | Lightweight regex (smaller binary)                |\n| `diagnostics` | `ariadne`         | Pretty, coloured error output with source context |\n\nDefault: all features enabled except `lite` and `diagnostics`.\n\n\u003e **Note:** Exactly one of `regex` or `lite` must be enabled. The crate will not compile without one of them, and the two are mutually exclusive, so do not enable both.\n\n## Error Handling\n\nAll parse errors carry a 1-based line number and, where available, a byte-offset span pointing to the problematic citation record:\n\n```rust\nuse biblib::{CitationParser, RisParser, ValueError};\n\nmatch RisParser::new().parse(input) {\n    Ok(citations) =\u003e println!(\"Parsed {} citations\", citations.len()),\n    Err(e) =\u003e {\n        eprintln!(\"Parse error: {}\", e); // includes \"at line N\" when known\n        if let ValueError::MissingValue { key, .. } = \u0026e.error {\n            eprintln!(\"Missing required field: {}\", key);\n        }\n    }\n}\n```\n\n### Pretty diagnostics (optional)\n\nEnable the `diagnostics` feature for human-friendly, coloured output powered by [ariadne](https://crates.io/crates/ariadne):\n\n```toml\n[dependencies]\nbiblib = { version = \"0.4\", features = [\"diagnostics\"] }\n```\n\n```rust\nuse biblib::{RisParser, parse_with_diagnostics};\n\nlet source = std::fs::read_to_string(\"citations.ris\")?;\nmatch parse_with_diagnostics(\u0026RisParser::new(), \u0026source, \"citations.ris\") {\n    Ok(citations) =\u003e println!(\"Parsed {} citations\", citations.len()),\n    Err(diagnostic) =\u003e eprintln!(\"{}\", diagnostic),\n    // Error: Error in RIS format at line 5: Missing value for TI\n    //    ╭─[citations.ris:5:1]\n    //  5 │ TY  - JOUR\n    //    │ ──────────── Missing value for TI\n    //    ╰───\n}\n```\n\nYou can also call `error.to_diagnostic(filename, source)` directly on any `ParseError`.\n\n## Documentation\n\n- **[Parsing Guide](PARSING_GUIDE.md)** — Format-specific tag mappings, date formats, and author handling\n- **[Deduplication Guide](DEDUPLICATION_GUIDE.md)** — Matching algorithm, similarity thresholds, and configuration\n- **[API Docs](https://docs.rs/biblib)** — Complete API reference\n\n## License\n\nLicensed under either of [Apache License 2.0](LICENSE-APACHE) or [MIT](LICENSE-MIT) at your option.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frevidiumhq%2Fbiblib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frevidiumhq%2Fbiblib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frevidiumhq%2Fbiblib/lists"}