https://github.com/ndw/org-to-xml

Library to convert Emacs org-mode files to XML

#+TITLE: org-to-xml
#+STARTUP: showeverything

This project contains a library to convert Emacs org-mode files to
XML: [[*om-to-xml][om-to-xml]].

The goal is a complete and accurate translation of the internal
~org-mode~ data structures to XML. This produces XML that isn’t
especially pretty, but the assumption is that downstream XML
processing tools can be used to transform it.

* What’s new?

** 04 August 2020
+ Renamed default branch to ~main~.
+ Updated dependency (the ~om.el~ package was renamed ~org-ml~).

* om-to-xml

This project began with [[*org-to-xml (obsolete)][org-to-xml]], but development stalled
when I realized that I didn’t really have a solid understanding of the
underlying Org data structures. What I really needed to write first
was a library that provided a consistent API that I could use in the
conversion. (This is not a criticism of the Org developers; there are
still corners of Emacs Lisp that I don’t understand very well.)

Lo and behold! Nate Dwarshuis has written [[https://github.com/ndwarshuis/om.el][just such a library]]! The
~om-to-xml~ library is a rewrite of my XML conversion on top of that
library.

It does a more complete and accurate job and has a few extensibility
mechanisms that ~org-to-xml~ did not:

+ Whitespace handling
+ Attribute handling
+ Custom element processing

** How to use it

Install the library somewhere. I use [[https://github.com/raxod502/straight.el][straight.el]], but you can just
download the library and put it on your load path any way you like.

Run ~M-x om-to-xml~ in a buffer where you’re viewing an Org mode file.
An XML representation will be constructed and saved. If the Org mode
file is ~notes.org~, the XML file will be saved in ~notes.xml~.
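
If you aren’t using a package manager, a manual setup might look like
the sketch below. The clone location =~/src/org-to-xml= is a made-up
path, and the sketch assumes that ~om-to-xml.el~ provides a feature
named ~om-to-xml~; adjust both to match your checkout.

#+BEGIN_SRC elisp
;; Hypothetical clone location; change this to wherever the library lives.
(add-to-list 'load-path (expand-file-name "~/src/org-to-xml"))

;; Assumption: om-to-xml.el provides a feature with the same name.
;; The org-ml dependency must already be installed and loadable.
(require 'om-to-xml)
#+END_SRC

With that in your init file, ~M-x om-to-xml~ should be available in
any Org buffer.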

** How it works

For the curious, here’s how it works.

Consider [[file:tests/simple.org][an Org file]]:

#+BEGIN_SRC org
#+TITLE: Some Title
#+AUTHOR: Norman Walsh
#+DATE: 2019-02-19

A paragraph with <markup> in it. This isn’t intended to be meaningful
or useful.

* First level heading
:PROPERTIES:
:CUSTOM_ID: first
:END:

** TODO This is an example TODO item.
DEADLINE: <2019-02-26 Tue +1w>
:PROPERTIES:
:CREATED: [2019-02-19 Tue 06:39]
:SRC: [[file:/projects/emacs/org-to-xml/README.md::For%20the%20curious,%20here%E2%80%99s%20how%20it%20works.]]
:END:

See [[https://orgmode.org/][org-mode]] for more information about ~org-mode~.
#+END_SRC

1. First, it’s parsed by ~om~ into the following structure, swaths of which I’ve elided:
#+BEGIN_SRC elisp
((keyword (:key "TITLE" :value "Some Title"))
(keyword (:key "AUTHOR" :value "Norman Walsh"))
(keyword (:key "DATE" :value "2019-02-19"))
(paragraph
(:parent (section (:post-affiliated 64) #1))
#("A paragraph with <markup> in it. This isn’t intended to be meaningful
or useful.
" 0 81 (:parent #1)))
(headline
(:raw-value "First level heading" :level 1 :CUSTOM_ID "first"
:title (#("First level heading" 0 19 (:parent #1))))
(section
(:parent #1)
(property-drawer
(:parent #2)
(node-property (:key "CUSTOM_ID" :value "first" :parent #3))))
(headline
(:raw-value "This is an example TODO item." :level 2
:todo-keyword #("TODO" 0 4 (fontified t face org-todo))
:todo-type todo
:deadline (timestamp
(:type active :raw-value "<2019-02-26 Tue +1w>"
:year-start 2019 :month-start 2 :day-start 26
:year-end 2019 :month-end 2 :day-end 26
:repeater-type cumulate :repeater-value 1
:repeater-unit week))
:SRC "[[file:/projects/emacs/org-to-xml/README.md::For%20the%20curious,%20here%E2%80%99s%20how%20it%20works.]]"
:CREATED "[2019-02-19 Tue 06:39]"
:title (#("This is an example TODO item." 0 29 (:parent #2)))
:parent #1)
(section
(:parent #2)
(planning
(:closed nil
:deadline (timestamp
(:type active :raw-value "<2019-02-26 Tue +1w>"
:year-start 2019 :month-start 2 :day-start 26
:year-end 2019 :month-end 2 :day-end 26
:repeater-type cumulate :repeater-value 1
:repeater-unit week)) :parent #3))
(property-drawer
(:parent #3)
(node-property (:key "CREATED" :value "[2019-02-19 Tue 06:39]" :parent #4))
(node-property (:key "SRC" :value "[[file:/projects/emacs/org-to-xml/README.md::For%20the%20curious,%20here%E2%80%99s%20how%20it%20works.]]" :parent #4)))
(paragraph
(:parent #3)
#("See " 0 4 (:parent #4))
(link (:type "https" :path "//orgmode.org/" :format bracket
:raw-link "https://orgmode.org/" :parent #4)
#("org-mode" 0 8 (:parent #5)))
#("for more information about " 0 27 (:parent #4))
(code (:value "org-mode" :parent #4)) #(".
" 0 2 (:parent #4)))))))
#+END_SRC
2. A buffer is created to store the XML, and the om data is traversed,
emitting XML elements for each node. The node properties become
attributes, except for ignored properties and properties that come
from a [[https://orgmode.org/manual/Properties-and-Columns.html#Properties-and-Columns][properties drawer]], which are always ignored.
3. If a post-processing function has been provided, it is run.
4. The buffer is saved to the XML file.
#+BEGIN_SRC xml

A paragraph with <markup> in it. This isn’t intended to be meaningful
or useful.

First level heading

This is an example TODO item.

See org-mode
for more information about .

#+END_SRC
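
The traversal in the steps above can be illustrated with a minimal
sketch. This is not the code in ~om-to-xml~: it uses Emacs’s built-in
~org-element~ API rather than ~org-ml~, it copies only a fixed handful
of properties to attributes, and every ~sketch-~ name is hypothetical.
Each node becomes an element named after its type, and plain text is
escaped.

#+BEGIN_SRC elisp
(require 'org-element)

(defun sketch-xml-escape (text)
  "Return TEXT with &, <, > and double quotes escaped for XML."
  (let ((out text))
    ;; The ampersand must be replaced first so that the entities
    ;; produced by later replacements are not escaped again.
    (dolist (pair '(("&" . "&amp;") ("<" . "&lt;")
                    (">" . "&gt;") ("\"" . "&quot;"))
                  out)
      (setq out (replace-regexp-in-string
                 (regexp-quote (car pair)) (cdr pair) out t t)))))

(defun sketch-node-to-xml (node)
  "Return NODE and all of its children as an XML string."
  (if (stringp node)
      ;; Plain text: drop the text properties and escape it.
      (sketch-xml-escape (substring-no-properties node))
    (let ((name (symbol-name (org-element-type node)))
          (attrs ""))
      ;; Assumption: only these four properties become attributes,
      ;; and only when their value is a string or a number.
      (dolist (prop '(:key :value :level :raw-value))
        (let ((val (org-element-property prop node)))
          (when (or (stringp val) (numberp val))
            (setq attrs
                  (concat attrs
                          (format " %s=\"%s\""
                                  (substring (symbol-name prop) 1)
                                  (sketch-xml-escape (format "%s" val))))))))
      (concat "<" name attrs ">"
              (mapconcat #'sketch-node-to-xml
                         (org-element-contents node) "")
              "</" name ">"))))

(defun sketch-org-buffer-to-xml ()
  "Return an XML string for the Org buffer that is current."
  (sketch-node-to-xml (org-element-parse-buffer)))
#+END_SRC

Evaluating ~(sketch-org-buffer-to-xml)~ in an Org buffer returns the
XML as a string. The real library additionally handles whitespace,
lets you control attributes, and supports custom element processing.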

It’s been twenty years since I tried to do anything much more interesting than
a keybinding in [[https://en.wikipedia.org/wiki/Emacs_Lisp][elisp]]. I expect the code, especially the tree walking, is embarrassingly
crude. Suggestions for improvement, or simply pointers to the bits of the
[[https://www.gnu.org/software/emacs/manual/elisp.html][elisp manual]] I should read again, most humbly solicited.

I also confess, I’m completely winging it on current function
naming/namespacing conventions.

* Pros and Cons

There are two obvious ways to approach the problem of converting .org files to .xml.

1. Use the [[https://orgmode.org/worg/exporters/ox-overview.html][ox framework]].
2. Do it the hard way.

My goal in this project is to have a complete dump of the org
structures in XML. That rules out the =ox= framework. The =ox=
framework is definitely the place to start if you want to convert
an unknown org file and extract the information that you know about.
But it flattens structures like the property drawer, so it’s
impossible to extract /everything/ with fidelity, including the things you
/don’t/ know about.

So this code attempts to do it the hard way. But I’m also lying when I
say I want a /complete/ dump of the org structures. I want a dump of
the /meaningful/ structures. One person’s meaning is another person’s
pointless cruft, however.

Examples of structures I don’t consider meaningful:

+ The =pre-blank= and =post-blank= properties that the org data
structures use to encode spaces in some circumstances.
+ Leading blanks in code blocks.
+ Leading spaces in paragraphs.

It’s likely that this list will grow as I learn more about the Org
data structures. Unless I give up on this project altogether, of
course.
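
As a rough illustration of what dropping those structures could look
like, here is a small sketch that works on plain property lists and
strings. The ~sketch-~ names are hypothetical and this is not how
~om-to-xml~ is implemented; the ignore list contains only the two
properties named above.

#+BEGIN_SRC elisp
(require 'cl-lib)

(defvar sketch-ignored-properties '(:pre-blank :post-blank)
  "Properties treated as not meaningful in this sketch.")

(defun sketch-meaningful-properties (plist)
  "Return a copy of PLIST without the ignored properties."
  (cl-loop for (key value) on plist by #'cddr
           unless (memq key sketch-ignored-properties)
           append (list key value)))

(defun sketch-strip-leading-blanks (text)
  "Return TEXT without spaces or tabs at its very beginning."
  (replace-regexp-in-string "\\`[ \t]+" "" text))
#+END_SRC

For example, ~(sketch-meaningful-properties '(:key "TITLE" :post-blank 1))~
keeps only the ~:key~ entry.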

* org-to-xml (obsolete)

You can still find ~org-to-xml.el~ in this repository’s history (at tag
[[https://github.com/ndw/org-to-xml/tree/0.0.5][0.0.5]], for example). I’ve removed it from the ~main~ branch because I
really think it’s a dead end and it caused confusion for at least some
users.