Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/haozeke/readcon

A tiny library for reading con files
https://github.com/haozeke/readcon

Last synced: 22 days ago
JSON representation

A tiny library for reading con files

Awesome Lists containing this project

README

        

* About
This is a repository meant to provide *fast* access to reading ~.con~ files from
various languages. The goal is to be able to quickly generate visuals in a
variety of interpreted languages.
** Usage
This is C++ libary built with ~meson~. The most automated way to get started is:
#+begin_src bash
micromamba create -n readcon_dev -f conda-lock.yml
micromamba activate readcon_dev
meson setup bbdir --prefix=$CONDA_PREFIX --libdir=lib --native-file nativeFiles/mold.ini --native-file nativeFiles/ccache_gnu.ini -Dwith_tests=true
meson compile -C bbdir
meson test -C bbdir
#+end_src
** Features
- [X] Fast reader for both single ~.con~ and trajectory ~.con~ files
- [X] Pure C++17 core implementation, with optional helpers
+ ~fmt~ is used optionally for some debug printing
+ ~range-v3~ can be used for more efficiency (views instead of copies)
- [X] Apache Arrow wrapper

** Rationale
One of the main drawbacks of visualization is the need to read in specific file
formats which may not map cleanly to delimited file specifications like ~csv~
files. Additionally, languages which are meant for interactive visuals, like
~Python~ or ~R~ often have sub-par file I/O capabilities for arbitrary formats.
To this end, a core ~C++~ library with an Apache Arraow compatibility layer is a
reasonable solution which allows minimal, expressive bindings to supported Arrow
languages, while also providing simple enough inter-operability with unsupported
languages like ~Fortran~.

*** Existing solutions
It is tempting to reach for or recreate an entire ecosystem of discipline
specific code, as for example, the Atomic Simulation Environment (ASE) or
~AtomsBase.jl~, both of which are excellent pedagogical aids with a rich
machninery of helpers. Often these are not designed for interoperability or
speed.

** Design Decisions
Currently this implements the ~con~ format specification as written out by eON,
so some assumptions are made about the input files, not all of which are
currently tested / guaranteed to throw (contributions are welcome for additional
sanity checks).
*** Single Frames
- The first 9 lines are the header
- The remaining lines can be inferred from the header
*** Multiple Frames
Often, as for example when running a Nudged Elastic Band, ~eON~ will write out
multiple units of ~con~ like data into a single file.
- The ~con~ like data *have no whitespace between them*!

That is we expect:
#+begin_src bash
Random Number Seed
Time
15.345600 21.702000 100.000000
90.000000 90.000000 90.000000
0 0
218 0 1
2
2 2
63.546000 1.007930
Cu
Coordinates of Component 1
0.63940000000000108 0.90450000000000019 6.97529999999999539 1 0
3.19699999999999873 0.90450000000000019 6.97529999999999539 1 1
H
Coordinates of Component 2
8.68229999999999968 9.94699999999999740 11.73299999999999343 0 2
7.94209999999999550 9.94699999999999740 11.73299999999999343 0 3
Random Number Seed
Time
15.345600 21.702000 100.000000
90.000000 90.000000 90.000000
0 0
218 0 1
2
2 2
63.546000 1.007930
Cu
Coordinates of Component 1
0.63940000000000108 0.90450000000000019 6.97529999999999539 1 0
3.19699999999999873 0.90450000000000019 6.97529999999999539 1 1
H
Coordinates of Component 2
8.85495714285713653 9.94699999999999740 11.16538571428571380 0 2
7.76944285714285154 9.94699999999999740 11.16538571428571380 0 3
#+end_src

Nothing else. No whitespace or lines between the ~con~ entries.
**** Why?
We read the entire file to memory and then map it to a set of strings. The first
9 lines are parsed to figure out how many lines are needed for the rest of the
(first) frame, and then this logic is repeated en-masse until the lines run out.

Better memory management / streaming files / more errors and sanity checks are
all welcome as pull requests.
** Development
*** Developing locally
A ~pre-commit~ job is setup on CI to enforce consistent styles, so it is best to
set it up locally as well (using [[https://pypa.github.io/pipx][pipx]] for isolation):

#+begin_src sh
# Run before commiting
pipx run pre-commit run --all-files
# Or install the git hook to enforce this
pipx run pre-commit install
#+end_src

* License
MIT.