https://github.com/lode-org/readcon-core

Oxidized rewrite of readCon
chemistry parser


# Table of Contents

1. [About](#org6c05cb6)
    1. [Features](#orgdaee6f9)
    2. [Install](#org6030511)
    3. [Tutorial](#org396efa6)
    4. [Design Decisions](#org9e2ac18)
        1. [FFI Layer](#org821878d)
    5. [Specification](#orgdd36a00)
        1. [CON format](#org5e88f7d)
        2. [convel format](#org4b3e8e7)
    6. [Why use this over readCon?](#org7a9436c)
    7. [Citation](#org4154b3e)
2. [License](#org45415ab)

# About

Oxidized Rust re-implementation of [readCon](https://github.com/HaoZeke/readCon).

Reads and writes both `.con` (coordinate-only) and `.convel` (coordinates
plus velocities) simulation configuration files used by [eOn](https://theory.cm.utexas.edu/eon/).

## Features

- **CON and convel support:** Parses both coordinate-only and velocity-augmented files. Velocity sections are auto-detected without relying on file extensions.
- **Lazy iteration:** `ConFrameIterator` parses one frame at a time for memory-efficient trajectory processing.
- **Performance:** Uses [fast-float2](https://github.com/aldanor/fast-float-rust) (Eisel-Lemire algorithm) for the f64 parsing hot path and [memmap2](https://docs.rs/memmap2) for large trajectory files.
- **Parallel parsing:** Optional rayon-based parallel frame parsing behind the `parallel` feature gate.
- **Language bindings:** Python (PyO3), Julia (ccall), C (cbindgen FFI), and C++ (RAII header-only wrapper), following the hourglass design from [Metatensor](https://github.com/metatensor/metatensor).
- **Spec-v2 metadata helpers:** Rust, Python, Julia, C, and C++ bindings all expose typed helpers for common JSON metadata keys like `energy`, `frame_index`, `time`, `timestep`, `neb_bead`, and `neb_band`, while still allowing raw JSON metadata when needed.
- **Spec-v2 validation:** `validate=true` enforces finite numeric values, reserved metadata schema, physical header geometry, exact component labels, valid symbols, declared section presence, and matching per-atom identity columns.
- **Force and constraint fidelity:** Writers preserve velocities, forces, original atom ids, and per-axis fixed masks across Rust, Python, Julia, C, and C++.
- **RPC serving:** Optional Cap'n Proto RPC interface (`rpc` feature) for network-accessible parsing.
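
The typed metadata helpers listed above can be pictured with a small standalone sketch. It does not use the `readcon` package: `parse_metadata` and `metadata_float` are hypothetical names, and treating the metadata line as a single JSON object is an assumption made for illustration (the specification is authoritative). The sketch shows a typed accessor that returns a finite number when the key is present and `None` otherwise, while the raw dictionary stays available.

```python
import json
import math
from typing import Optional


def parse_metadata(line: str) -> dict:
    """Parse a metadata line as JSON; anything else yields an empty dict."""
    try:
        value = json.loads(line)
    except json.JSONDecodeError:
        return {}
    # Only a JSON object can carry named keys such as "energy".
    return value if isinstance(value, dict) else {}


def metadata_float(meta: dict, key: str) -> Optional[float]:
    """Hypothetical typed accessor: a finite float, or None when unusable."""
    value = meta.get(key)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    try:
        value = float(value)
    except OverflowError:
        return None
    return value if math.isfinite(value) else None


meta = parse_metadata('{"energy": -42.5, "frame_index": 3}')
print(metadata_float(meta, "energy"))    # -42.5
print(metadata_float(meta, "timestep"))  # None
```

Returning `None` for a missing key mirrors the `energy()` behaviour shown in the tutorial; rejecting non-finite values is one possible reading of the finite-number rule and may differ from what the library does.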

## Install

| Language | Install command |
|---|---|
| Rust | `cargo add readcon-core` |
| Python | `pip install readcon` |
| Julia | `julia --project=julia/ReadCon -e 'using Pkg; Pkg.instantiate()'` |
| C / C++ system | `cargo cinstall --release --prefix /usr/local` (installs `libreadcon_core.{so,a}`, `readcon-core.h`, `readcon-core.hpp`, and a pkg-config file) |
| C / C++ via meson subproject | drop the repository under `subprojects/readcon-core/` and link against the `readcon_core_dep` dependency |

The C/C++ headers require a C99 (`readcon-core.h`) or C++17 (`readcon-core.hpp`, for `std::optional` and `std::filesystem`) compiler.

## Tutorial

A copy-pasteable walkthrough that parses a multi-frame trajectory, inspects metadata, builds a new frame, and writes it back. Run it as-is.

```sh
cargo run --example rust_usage -- resources/test/tiny_multi_cuh2.con
```

The example above iterates lazily over every frame, prints atom counts plus the per-frame energy if present, and exits. Equivalent flows in the other bindings:

```python
import readcon

# Read every frame; the iterator yields PyConFrame objects
for frame in readcon.iter_frames("resources/test/tiny_multi_cuh2.con"):
    print(frame.natms_per_type, frame.energy())  # energy() is None when absent

# Build and write a new frame
b = readcon.ConFrameBuilder(cell=[10.0, 10.0, 10.0], angles=[90.0, 90.0, 90.0])
b.set_energy(-42.5).add_atom("Cu", 0.0, 0.0, 0.0, 1, 63.546)
b.write("out.con")
```

```julia
using ReadCon
for frame in iter_frames("resources/test/tiny_multi_cuh2.con")
    println(frame.natms_per_type, " ", energy(frame))
end
```

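```cpp
#include <iostream>
#include <readcon-core.hpp>

int main() {
    // Iterate lazily over every frame in the trajectory
    readcon::ConFrameIterator it("resources/test/tiny_multi_cuh2.con");
    for (const auto &frame : it) {
        std::cout << frame.atoms().size() << " atoms";
        // energy_opt() is empty when the frame carries no energy
        if (auto e = frame.energy_opt()) std::cout << " E=" << *e;
        std::cout << "\n";
    }
}
```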

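```c
#include <stdio.h>
#include <readcon-core.h>

int main(void) {
    uintptr_t n = 0;
    /* Read all frames eagerly; n receives the frame count */
    RKRConFrame **frames = rkr_read_all_frames("resources/test/tiny_multi_cuh2.con", &n);
    for (uintptr_t i = 0; i < n; ++i) {
        /* %zu expects a size_t, so cast the uintptr_t index */
        printf("frame %zu energy=%f\n", (size_t)i, rkr_frame_energy(frames[i]));
    }
    /* The caller owns the array and must release it */
    free_rkr_frame_array(frames, n);
}
```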

## Design Decisions

The library is designed with the following principles in mind:

- **Lazy Parsing:** The `ConFrameIterator` allows for lazy parsing of frames, which can be more memory-efficient when dealing with large trajectory files.

- **Interoperability:** The FFI layer makes the core parsing logic accessible from other programming languages, increasing the library's utility. Currently, a `C` header is auto-generated along with a hand-crafted `C++` interface, following the hourglass design from [Metatensor](https://github.com/metatensor/metatensor).
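
The lazy-parsing principle can be sketched with a plain Python generator. This is not the library's `ConFrameIterator`: `iter_frames` and `make_frame` here are hypothetical, the sketch handles coordinate-only frames, and it assumes that line 8 of the header lists the atom count of each type. It shows only the core idea, which is that each frame's length is computed from its own header, so one frame is read at a time and the rest of the file stays untouched.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO

HEADER_LINES = 9  # every frame starts with a 9-line header


def iter_frames(stream: TextIO) -> Iterator[List[str]]:
    """Yield the raw lines of one coordinate-only frame at a time."""
    while True:
        header = [line.rstrip("\n") for line in islice(stream, HEADER_LINES)]
        if not any(line.strip() for line in header):
            return  # end of input (nothing left, or only blank lines)
        if len(header) < HEADER_LINES:
            raise ValueError("truncated header")
        # Assumption: header line 8 holds the number of atoms of each type.
        counts = [int(tok) for tok in header[7].split()]
        # Each type block is a symbol line, a label line, then one line per atom.
        body_len = sum(2 + n for n in counts)
        body = [line.rstrip("\n") for line in islice(stream, body_len)]
        if len(body) < body_len:
            raise ValueError("truncated coordinate block")
        yield header + body


def make_frame(energy: float) -> str:
    """Build a tiny one-atom frame in the assumed layout (demo data only)."""
    return "\n".join([
        "comment",
        '{"energy": %s}' % energy,
        "10.0 10.0 10.0",
        "90.0 90.0 90.0",
        "0 0",
        "0 0 0",
        "1",       # number of atom types
        "1",       # atoms per type
        "63.546",  # mass per type
        "Cu",
        "Coordinates of Component 1",
        "0.0 0.0 0.0 0 0",
    ]) + "\n"


stream = io.StringIO(make_frame(-1.0) + make_frame(-2.0))
for index, frame in enumerate(iter_frames(stream)):
    print(index, len(frame), frame[1])
# 0 12 {"energy": -1.0}
# 1 12 {"energy": -2.0}
```

Because the generator holds only the current frame, memory use stays roughly constant no matter how many frames the trajectory contains.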

### FFI Layer

A key challenge in designing an FFI is deciding how data is exposed to the C-compatible world. This library uses a hybrid approach to offer both safety and convenience:

1. **Opaque Pointers (The Handle Pattern):** The primary way to interact with
frame data is through an opaque pointer, represented as `RKRConFrame*` in C.
The C/C++ client holds this "handle" but cannot inspect its contents
directly. Instead, it must call Rust functions to interact with the data
(e.g., `rkr_frame_get_header_line(frame_handle, ...)`). This is the safest
and most flexible pattern, as it completely hides Rust's internal data
structures and memory layout, preventing ABI breakage if the Rust code is
updated.

2. **Transparent `#[repr(C)]` Structs (The Data Extraction Pattern):** For
convenience and performance in cases where only the core atomic data is
needed, the library provides a function (`rkr_frame_to_c_frame`) to extract a
"lossy" but transparent `CFrame` struct from an opaque handle. The C/C++
client can directly read the fields of this struct (e.g.,
`my_c_frame->num_atoms`). The client takes ownership of this extracted struct
and is responsible for freeing its memory.

This hybrid model pairs the safety and forward compatibility of opaque handles
for general use with the performance of direct data access for the most common
computational tasks.
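
The two patterns can be compared in a language-neutral analogy written in Python. Nothing here is the library's C API: `frame_new`, `frame_header_line`, `frame_to_plain`, `frame_free`, and `PlainFrame` are hypothetical names, and an integer stands in for the opaque pointer. The sketch shows how a handle forces every access through a function, while the extracted snapshot is a plain value that the caller reads directly and that outlives the handle.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Position = Tuple[float, float, float]


@dataclass(frozen=True)
class PlainFrame:
    """Transparent, lossy snapshot: core atomic data only, read directly."""
    num_atoms: int
    positions: Tuple[Position, ...]


class _Frame:
    """Private representation; clients never touch this type."""

    def __init__(self, header: List[str], positions: Sequence[Position]):
        self.header = list(header)
        self.positions = list(positions)


_frames: Dict[int, _Frame] = {}
_next_handle = 1


def frame_new(header: List[str], positions: Sequence[Position]) -> int:
    """Create a frame and return an opaque handle (an integer here)."""
    global _next_handle
    handle = _next_handle
    _next_handle += 1
    _frames[handle] = _Frame(header, positions)
    return handle


def frame_header_line(handle: int, index: int) -> str:
    """Accessor: the only route to header data behind a handle."""
    return _frames[handle].header[index]


def frame_to_plain(handle: int) -> PlainFrame:
    """Extract a copy owned by the caller; header text is dropped (lossy)."""
    frame = _frames[handle]
    return PlainFrame(len(frame.positions), tuple(frame.positions))


def frame_free(handle: int) -> None:
    """Release the frame behind a handle."""
    del _frames[handle]


h = frame_new(["comment", "{}"], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
print(frame_header_line(h, 0))  # comment
plain = frame_to_plain(h)
frame_free(h)                   # the snapshot outlives the handle
print(plain.num_atoms)          # 2
```

Changing `_Frame` internally would not affect callers that use only the accessor functions, which is the forward-compatibility argument made above; callers of `frame_to_plain` depend on the fixed shape of `PlainFrame` instead.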

## Specification

See [docs/orgmode/spec.org](docs/orgmode/spec.md) (or the [published HTML build](https://lode-org.github.io/readcon-core/spec.html)) for the full specification. A summary follows.

### CON format

- A 9-line header (comments, cell dimensions, cell angles, atom type/count/mass metadata)
- Line 2 is reserved for spec-v2 JSON metadata
- Per-type coordinate blocks (symbol, label, atom lines with x y z fixed atomID)
- Optional spec-v2 `sections` and `validate` metadata for declared per-atom sections and strict validation
- Multiple frames are concatenated directly with no separator
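
As a minimal sketch of the per-type block layout summarised above, the following parses one block into typed records. It assumes each atom line holds exactly five whitespace-separated fields (`x y z fixed atomID`) and reads `fixed` as an integer without interpreting it; `AtomLine`, `parse_atom_line`, and `parse_type_block` are hypothetical names, not part of `readcon`, and the full specification may allow more than this.

```python
from typing import List, NamedTuple, Sequence, Tuple


class AtomLine(NamedTuple):
    x: float
    y: float
    z: float
    fixed: int
    atom_id: int


def parse_atom_line(line: str) -> AtomLine:
    """Parse 'x y z fixed atomID' into a typed record."""
    tokens = line.split()
    if len(tokens) != 5:
        raise ValueError(f"expected 5 fields, got {len(tokens)}: {line!r}")
    x, y, z = (float(tok) for tok in tokens[:3])
    return AtomLine(x, y, z, int(tokens[3]), int(tokens[4]))


def parse_type_block(
    lines: Sequence[str], count: int
) -> Tuple[str, str, List[AtomLine]]:
    """Parse a symbol line, a label line, and `count` atom lines."""
    if len(lines) < 2 + count:
        raise ValueError("type block is shorter than its declared atom count")
    symbol = lines[0].strip()
    label = lines[1].strip()
    atoms = [parse_atom_line(line) for line in lines[2:2 + count]]
    return symbol, label, atoms


block = [
    "Cu",
    "Coordinates of Component 1",
    "0.0 0.0 0.0 1 0",
    "1.8 1.8 0.0 0 1",
]
symbol, label, atoms = parse_type_block(block, 2)
print(symbol, len(atoms), atoms[1].atom_id)  # Cu 2 1
```

The atom count comes from the header rather than from the block itself, which is why concatenated frames need no separator.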

### convel format

Same as CON, with an additional velocity section after each frame's coordinates:

- A blank separator line
- Per-type velocity blocks (symbol, label, atom lines with vx vy vz fixed atomID)
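
The line arithmetic implied by the two summaries can be written out as a short sketch. The helper names are hypothetical, and the blank-line check is only one plausible way to detect a velocity section without looking at the file extension; it is not necessarily how the library does it, and a real parser would need a stricter rule for headers whose first comment line is empty.

```python
from typing import Sequence

HEADER_LINES = 9  # per the CON summary


def section_lines(counts: Sequence[int]) -> int:
    """Lines in one set of per-type blocks: symbol + label + one per atom."""
    return sum(2 + n for n in counts)


def frame_line_count(counts: Sequence[int], has_velocities: bool) -> int:
    """Total lines in one frame, given the atoms-per-type counts."""
    total = HEADER_LINES + section_lines(counts)
    if has_velocities:
        total += 1 + section_lines(counts)  # blank separator + velocity blocks
    return total


def has_velocity_section(
    lines: Sequence[str], counts: Sequence[int], start: int = 0
) -> bool:
    """Guess whether the frame starting at `start` carries velocities."""
    end = start + HEADER_LINES + section_lines(counts)
    # A blank line followed by more text is read as the velocity separator.
    return (
        end + 1 < len(lines)
        and lines[end].strip() == ""
        and lines[end + 1].strip() != ""
    )


counts = [2, 1]  # two atoms of the first type, one of the second
print(frame_line_count(counts, False))  # 16
print(frame_line_count(counts, True))   # 24
```

For `counts = [2, 1]` the coordinate blocks take (2 + 2) + (2 + 1) = 7 lines, so a CON frame spans 9 + 7 = 16 lines and a convel frame spans 16 + 1 + 7 = 24.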

## Why use this over [readCon](https://github.com/HaoZeke/readCon)?

Speed, correctness, and multi-language bindings.

## Citation

If you use `readcon-core` in academic work, please cite it via the metadata in [CITATION.cff](CITATION.cff). The Zenodo DOI tracks the latest release.

# License

MIT.