{"id":16464995,"url":"https://github.com/duckasteroid/rat-xml","last_synced_at":"2025-09-10T11:37:31.747Z","repository":{"id":11362876,"uuid":"13797307","full_name":"duckAsteroid/rat-xml","owner":"duckAsteroid","description":"Random access, tiny XML","archived":false,"fork":false,"pushed_at":"2019-12-26T20:17:37.000Z","size":931,"stargazers_count":1,"open_issues_count":7,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-04T17:14:38.269Z","etag":null,"topics":["cdb","java","tiny","xml","xml-data"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/duckAsteroid.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-10-23T08:23:06.000Z","updated_at":"2023-10-26T15:04:46.000Z","dependencies_parsed_at":"2022-08-29T20:21:38.005Z","dependency_job_id":null,"html_url":"https://github.com/duckAsteroid/rat-xml","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/duckAsteroid/rat-xml","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duckAsteroid%2Frat-xml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duckAsteroid%2Frat-xml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duckAsteroid%2Frat-xml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duckAsteroid%2Frat-xml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/duckAsteroid","download_url":"https://codeload.github.com/duckAsteroid/rat-xml/tar.gz/refs/heads/master","sbom_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duckAsteroid%2Frat-xml/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274455698,"owners_count":25288557,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-10T02:00:12.551Z","response_time":83,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cdb","java","tiny","xml","xml-data"],"created_at":"2024-10-11T11:31:14.812Z","updated_at":"2025-09-10T11:37:31.718Z","avatar_url":"https://github.com/duckAsteroid.png","language":"Java","readme":"# Random Access, Tiny (RAT) XML.\n\nRAT XML is a library for working with XML data using minimal memory at read time. \nAt read time XML structures and data are pulled on demand from an indexed file format designed for fast access. The structure supports DOM-style element traversal as well as access via XPath.\n\nA file writer (and utilities) is provided for converting XML data to the RAT XML (CDB) file format.\n\nThe reader API then provides a lightweight, DOM-esque API for traversing elements, attributes, etc.\n\n## The gory details...\nThe key to RAT XML is the underlying file format, which is simply a [CDB](http://cr.yp.to/cdb.html) file. CDB files are essentially on-disk hashtables that provide rapid access to data stored under a key (two disk hits per read using the OS memory mapped random access file). 
If you want to know more about the internals of the CDB file format I recommend [this page](http://www.unixuser.org/~euske/doc/cdbinternals/)\n\nIn RAT XML we store element \u0026 attribute data in the CDB table using a key that represents the \"path\" to that element or attribute. In addition we store meta data about the structure of the XML (what children an element has).\n\nFor example consider a simple XML file:\n\n```xml\n\u003c?xml version=\"1.0\" encoding=\"UTF-8\"?\u003e\n\u003ccontacts src=\"My Contacts\"\u003e\n   \u003cperson id=\"j-smith\" type=\"A\"\u003e\n      \u003cname\u003eJohn Smith\u003c/name\u003e\n      \u003cemail\u003ejs@foo.bar\u003c/email\u003e\n   \u003c/person\u003e\n   \u003cperson id=\"i-brown\"\u003e\n      \u003cname\u003eIan Brown\u003c/name\u003e\n   \u003c/person\u003e\n\u003c/contacts\u003e\n\n```\nWe would store the following element data:\n\n| Key                            | Data       |\n|--------------------------------|------------|\n| `/contacts[0]`                 |            |\n| `/contacts[0]/person[0]`       |            |\n| `/contacts[0]/person[0]/name`  | John Smith |\n| `/contacts[0]/person[0]/email` | js@foo.bar |\n| `/contacts[0]/person[1]`       |            |\n| `/contacts[0]/person[1]/name`  | Ian Brown  |\n\nWe would store the following attribute data:\n\n| Key                            | Data        |\n|--------------------------------|-------------|\n| `/contacts[0]@src`             | My Contacts |\n| `/contacts[0]/person[0]@id`    | j-smith     |\n| `/contacts[0]/person[0]@type`  | A           |\n| `/contacts[0]/person[1]@id`    | i-brown     |\n\nHowever to compress the storage for keys we generate a \"virtual\" key table:\n\n| Key                            | Data       |\n|--------------------------------|------------|\n| `/contacts[0]`                 | 1          |\n| `/contacts[0]@src`             | 7          |\n| `/contacts[0]/person[0]`       | 2          |\n| `/contacts[0]/person[0]@id`    | 8  
        |\n| `/contacts[0]/person[0]@type`  | 9          |\n| `/contacts[0]/person[0]/name`  | 3          |\n| `/contacts[0]/person[0]/email` | 4          |\n| `/contacts[0]/person[1]`       | 5          |\n| `/contacts[0]/person[1]@id`    | 10         |\n| `/contacts[0]/person[1]/name`  | 6          |\n\nThis results in the following data keys, where the key `E:3` indicates element data for ID 3, and so on:\n\n| Key    | Data       |\n|--------|------------|\n| `E:3`  | John Smith |\n| `E:4`  | js@foo.bar |\n| `E:6`  | Ian Brown  |\n\nFor attributes, the key `A:7` indicates attribute data for ID 7, and so on:\n\n| Key    | Data        |\n|--------|-------------|\n| `A:7`  | My Contacts |\n| `A:8`  | j-smith     |\n| `A:9`  | A           |\n| `A:10` | i-brown     |\n\n\nFinally, to make traversal quicker (rather than searching keys), we store meta data about children in special keys: \nchild elements (under the key `#E:\u003cparent_id\u003e`) and attributes (under the key `#A:\u003cparent_id\u003e`). \nIn the meta data we also store the element ID to make child traversal quicker (no need to look up the ID in CDB). \n\nContinuing the example, the following meta data would be stored:\n\n| Key    | Data          |\n|--------|---------------|\n| `#E:0` | contacts[0]:1 |\n| `#E:1` | person[0]:2   |\n| `#A:1` | src:          |\n| `#A:2` | id:           |\n| `#A:2` | type:         |\n| `#E:1` | person[1]:5   |\n| `#A:5` | id:           |\n\nThe keys themselves are 9 bytes: an 8-byte long plus 1 byte for the \"type\" (A, E, #A, #E).\n\nThis format is fast and easy to access, and it does not involve large amounts of key searching when performing node traversal (parent:child, child:parent). But it comes at a cost: the size of the file. Paths are duplicated heavily in a RAT XML file, and they take up the largest portion of it. 
However, we do not load this into memory; we use a random access file and fancy pointers (see the CDB file spec) to access data on demand.\n\n## About\n\nThis project is built by CloudBees Jenkins. https://duck-asteroid.ci.cloudbees.com/job/rat-xml/\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduckasteroid%2Frat-xml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fduckasteroid%2Frat-xml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduckasteroid%2Frat-xml/lists"}