{"id":28288333,"url":"https://github.com/johnlcaron/netchdf","last_synced_at":"2025-07-11T07:04:19.514Z","repository":{"id":70829220,"uuid":"595243879","full_name":"JohnLCaron/netchdf","owner":"JohnLCaron","description":"Pure JVM read access to netcdf3, netcdf4, hdf4, hdf5, hdf-eos2 and hdf-eos5 data files. Written in Kotlin.","archived":false,"fork":false,"pushed_at":"2025-07-07T11:46:38.000Z","size":4036,"stargazers_count":4,"open_issues_count":12,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-07-07T12:44:00.960Z","etag":null,"topics":["hdf-eos2","hdf-eos5","hdf4","hdf5","netcdf3","netcdf4"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JohnLCaron.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-01-30T17:33:44.000Z","updated_at":"2025-07-07T11:46:42.000Z","dependencies_parsed_at":"2025-05-13T18:27:07.869Z","dependency_job_id":"229a2233-9bb3-4d95-abb8-b5e9051267f4","html_url":"https://github.com/JohnLCaron/netchdf","commit_stats":null,"previous_names":["johnlcaron/netchdf-kotlin","johnlcaron/netchdf"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/JohnLCaron/netchdf","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JohnLCaron%2Fnetchdf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JohnLCaron%2Fnetchdf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JohnLCaron%2Fnetchdf/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/JohnLCaron%2Fnetchdf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JohnLCaron","download_url":"https://codeload.github.com/JohnLCaron/netchdf/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JohnLCaron%2Fnetchdf/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264752567,"owners_count":23658655,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hdf-eos2","hdf-eos5","hdf4","hdf5","netcdf3","netcdf4"],"created_at":"2025-05-21T23:15:08.391Z","updated_at":"2025-07-11T07:04:19.505Z","avatar_url":"https://github.com/JohnLCaron.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# netchdf\n_last updated: 7/7/2025_\n\nThis is a rewrite in Kotlin of parts of the devcdm and netcdf-java libraries. \n\nThe intention is to create a maintainable, read-only, pure JVM library allowing full access to \nnetcdf3, netcdf4, hdf4, hdf5, hdf-eos2, and hdf-eos5 data files. \n\nEvaluating if support for superblock 4 is feasible.\n\nPlease contact me if you'd like to help out. 
Especially needed are test datasets from all the important data archives!!\n\n### Building\n\n* Download Java 21 JDK and set JAVA_HOME.\n* Download git and add to PATH.\n\n````\ncd \u003cyour_build_dir\u003e\ngit clone https://github.com/JohnLCaron/netchdf.git\ncd netchdf\n./gradlew clean assemble\n````\n\nAlso see:\n  * [Building and Running native library](docs/Building.md)\n  * [Building and Running ncdump](cli/Readme.md)\n\n### Why this library? \n\nThere is so much important scientific data stored in the NetCDF and HDF file formats, that those formats will \nnever go away. It is important that there be maintainable, independent libraries to read these files forever.\n\nThe Netcdf-Java library prototyped a \"Common Data Model\" (CDM) to provide a single API to access various file formats. \nThe netcdf* and hdf* file formats are similar enough to make a common API a practical and useful goal. \nBy focusing on read-only access to just these formats, the API and the code are kept simple.\n\nIn short, a library that focuses on simplicity and clarity is a safeguard for the irreplaceable investment in these\nscientific datasets.\n\n### Why do we need another library besides the standard reference libraries?\n\nIt's necessary to have independent implementations of any standard. If you don't have multiple implementations, it's\neasy for the single implementer to mistake the implementation for the actual standard. It's easy to hide problems \nthat are actually in the standard by adding work-arounds in the code, instead of documenting problems and creating new\nversions of the standard with clear fixes. For Netcdf/Hdf, the standard is the file formats, along with their semantic \ndescriptions. The API is language and library specific, and is secondary to the standard.\n\nHaving multiple implementations is a huge win for the reference library, in that bugs are more quickly found, and \nambiguities more quickly identified. 
\n\n### What's wrong with the standard reference libraries?\n\nThe reference libraries are well maintained but complex. They are coded in C, which is a difficult language to master\nand keep bug free, with implications for memory safety and security. The libraries require various machine and OS dependent\ntoolchains. Shifts in funding could wipe out much of the institutional knowledge needed to maintain them.\n\nThe HDF file formats are overly complicated, which impacts code complexity and clarity. The data structures do not\nalways map to a user understandable data model. Semantics are left to data-writers to document (or not). \nWhile this problem isn't specific to HDF file users, it is exacerbated by a \"group of messages\" design approach. \n\nThe HDF4 C library is a curious hodgepodge of disjointed APIs. The HDF5 API is better and the Netcdf4 API much better.\nBut all suffer from the limitations of the C language, the difficulty of writing good documentation for all skill levels \nof programmers, and the need to support legacy APIs. \n\nHDF-EOS uses an undocumented \"Object Descriptor Language (ODL)\" text format, which adds a dependency on the SDP Toolkit \nand possibly other libraries. These toolkits also provide functionality such as handling projections and coordinate system \nconversions, and arguably it's impossible to process HDF-EOS without them. So the value added here by an independent \nlibrary for data access is less clear. For now, we will provide a \"best-effort\" to expose the internal \ncontents of the file.\n\nCurrently, the Netcdf-4 and HDF5 libraries are not thread safe, not even for read-only applications. 
\nThis is a serious limitation for high-performance, scalable applications, and it is disappointing that it hasn't been fixed.\nSee [Toward Multi-Threaded Concurrency in HDF5](https://www.hdfgroup.org/wp-content/uploads/2022/05/Toward-MT-HDF5.pdf),\nand [RFC:Multi-Thread HDF5](https://support.hdfgroup.org/releases/hdf5/documentation/rfc/RFC_multi_thread.pdf) for more information.\n\n\n### Why Kotlin?\n\nKotlin is a modern, statically typed, garbage-collected language suitable for large development projects. \nIt has many new features for safer (like null-safety) and more concise (like functional idioms) code, and is an important \nimprovement over Java, without giving up any of Java's strengths. Kotlin will attract the next generation of serious \nopen-source developers, and hopefully some of them will be willing to keep this library working into the unforeseeable future.\n\n\n### What about performance?\n\nWe are aiming to be within 2x of the C libraries for reading data. Preliminary tests indicate that's a reasonable goal. \nFor HDF5 files using deflate filters, the deflate library dominates the read time, and standard Java deflate libraries \nare about 2x slower than native code. Unless the deflate libraries get better, there's not much gain in trying to make\nother parts of the code faster.\n\nWe will investigate using Kotlin coroutines to relieve performance bottlenecks.\n\n\n### What version of the JVM, Kotlin, and Gradle?\n\nWe will always use the latest LTS (long-term support) Java version, and will not be explicitly supporting older versions.\nCurrently that is Java 21.\n\nWe also use the latest stable version of Kotlin that is compatible with the Java version. Currently that is Kotlin 2.1.\n\nGradle is our build system. We will use the latest stable version of Gradle compatible with our Java and Kotlin versions.\nCurrently that is Gradle 8.14.\n\nFor now, you must download and build the library yourself. Eventually we will publish it to Maven Central. 
\nThe IntelliJ IDE is highly recommended for all JVM development.\n\n\n### Goals and scope\n\nOur goal is to give read access to all the content in NetCDF, HDF5, HDF4, and HDF-EOS files.\n\nThe library will be thread-safe for reading multiple files concurrently.\n\nWe are focussing on earth science data, and don't plan to support other uses except as a byproduct.\n\nThe core module will remain pure Kotlin with very minimal dependencies and no write capabilities. In particular, \nthere will be no dependency on the reference C libraries (except for testing). \n\nThere will be no dependencies on native libraries in the core module, but other modules or\nprojects that use the core are free to use dependencies as needed. We will likely add runtime discovery to facilitate this, \nfor example, to use HDF5 filters that link to native libraries.\n\n\n### Non-goals\n\nIt's not a goal to duplicate netcdf-java functionality.\n\nIt's not a goal to duplicate Netcdf-C library functionality.\n\nIt's not a goal to provide remote access to files.\n\n\n\n### Testing\n\nWe use the Java [Foreign Function \u0026 Memory API](https://docs.oracle.com/en/java/javase/21/core/foreign-function-and-memory-api.html)\nfor testing against the Netcdf, HDF5, and HDF4 C libraries. \nWith these tools we can be confident that our library gives the same results as the reference libraries.\n\nCurrently using \n* HDF5 library version: 1.14.6.\n* netcdf-c library version 4.10.0-development of May 23 2025\n\nCurrently we have this test coverage from core/test:\n\n````\n cdm      88% (1560/1764) LOC\n hdf4     84% (1743/2071) LOC\n hdf5     81% (2278/2792) LOC\n netcdf3  77% (230/297) LOC\n ````\n\nThe core library has ~6500 LOC.\n\nMore and deeper test coverage is provided in the clibs module, which compares netchdf metadata and data against\nthe Netcdf, HDF5, and HDF4 C libraries. 
The clibs module is not part of the released netchdf library and is \nonly supported for test purposes.\n\nCurrently we have 1470 test files in the core test suite:\n\n````\nhdf-eos2  = 267 files\nhdf-eos5  = 18 files\nhdf4      = 205 files\nhdf5      = 113 files\nnetcdf3   = 664 files\nnetcdf3.2 = 81 files\nnetcdf3.5 = 1 files\nnetcdf4   = 121 files\n\ntotal # files = 1470\n````\nWe need to get representative samples of recent files for improved testing and code coverage.\n\n\n### Data Model notes\n\nAlso see [Netchdf core UML](https://docs.google.com/drawings/d/1lkouJBUG5uy8aUtbKfAZN9D5h_v22JNWf6QUQWjPNBc)\n\n#### Type Safety and Generics\n\nDatatype\\\u003cT\\\u003e, Attribute\\\u003cT\\\u003e, Variable\\\u003cT\\\u003e, StructureMember\\\u003cT\\\u003e, Array\\\u003cT\\\u003e and ArraySection\\\u003cT\\\u003e are all generics, \nwith T indicating the data type returned when read, eg:\n\n````\n    fun \u003cT\u003e readArrayData(v2: Variable\u003cT\u003e, section: SectionPartial? = null) : ArrayTyped\u003cT\u003e\n````\n\nFor example, a Variable of datatype Float will return an ArrayFloat, which is ArrayTyped\\\u003cFloat\\\u003e.\n\n#### Cdl Names\n\n* spaces are replaced with underscores\n\n#### Datatype\n* _Datatype.ENUM_ returns an array of the corresponding UBYTE/USHORT/UINT. Call _data.convertEnums()_ to turn this into\n  an ArrayString of corresponding enum names.\n* _Datatype.CHAR_: All Attributes of type CHAR are assumed to be Strings. All Variables of type CHAR return data as\n  ArrayUByte. Call _data.makeStringsFromBytes()_ to turn this into Strings with the array rank reduced by one.\n  * Netcdf-3 does not have STRING or UBYTE types. In practice, CHAR is used for either. \n  * Netcdf-4/HDF5 library encodes CHAR values as HDF5 string type with elemSize = 1, so we use that convention to detect \n    legacy CHAR variables in HDF5 files. 
NC_CHAR should not be used in Netcdf-4, use NC_UBYTE or NC_STRING.\n  * HDF4 does not have a STRING type, but does have signed and unsigned CHAR, and signed and unsigned BYTE. \n    We map both signed and unsigned to Datatype.CHAR and handle it as above (Attributes are Strings, Variables are UBytes).\n* _Datatype.STRING_ is always variable length, regardless of whether the data in the file is variable or fixed length.\n\n#### Typedef\nUnlike Netcdf-Java, we follow Netcdf-4 \"user defined types\" and add typedefs for Compound, Enum, Opaque, and Vlen.\n* _Datatype.ENUM_ typedef has a map from integer to name (same as Netcdf-Java)\n* _Datatype.COMPOUND_ typedef contains a description of the members of the Compound (aka Structure).\n* _Datatype.OPAQUE_ typedef may contain the byte length of OPAQUE data.\n* _Datatype.VLEN_ typedef has the base type. An array of VLEN may have different lengths for each object.\n\n#### Dimension\n* Unlike Netcdf-3 and Netcdf-4, dimensions may be \"anonymous\", in which case they have a length but not a name, and are \nlocal to the variable they are referenced by.\n* There are no UNLIMITED dimensions. These are unneeded since we do not support writing.\n\n#### Compare with HDF5 data model\n* Creation order is ignored\n* We don't include soft (aka symbolic) links in a group, as these point to an existing dataset (variable).\n* Opaque: hdf5 makes arrays of Opaque all the same size, which gives up some of its usefulness. If there's a need,\n  we will allow Opaque(*) indicating that the sizes can vary.\n* Attributes can be of type REFERENCE, with value the full path name of the referenced dataset.\n\n#### Compare with HDF4 data model\n* All data access is unified under the netchdf API.\n\n#### Compare with HDF-EOS data model\n* The _StructMetadata_ ODL is gathered and applied to the file header metadata as well as possible. 
\n  Contact us with example files if you see something we are missing.\n\n## Elevator blurb\n\nAn independent implementation of HDF4/HDF5/HDF-EOS in Kotlin.\n\nThis will be complementary to the important work of maintaining the primary HDF libraries.\nThe goal is to give read access to all the content in NetCDF, HDF5, HDF4 and HDF-EOS files.\n\nThe core library is pure Kotlin. \nKotlin currently runs on JVMs as far back as Java 8. However, I am targeting the latest LTS\n(long-term support) Java version, and will not be explicitly supporting older versions.\n\nA separate library tests the core against the C libraries.\nThis will only work reliably if members of the HDF community contribute test files to make sure\nthe libraries agree. I have a large cache of test files from my work on netcdf-java, but these\nare mostly 10-20 years old.\n\nCurrently the code is in alpha, and you must build it yourself with Gradle. \nWhen it hits beta, I will start releasing compiled versions to Maven Central.\n\nI welcome any feedback, questions, and concerns. Thanks!","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohnlcaron%2Fnetchdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjohnlcaron%2Fnetchdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohnlcaron%2Fnetchdf/lists"}