An open API service indexing awesome lists of open source software.

https://github.com/200ok-ch/org-parser

org-parser is a parser for the Org mode markup language for Emacs.
https://github.com/200ok-ch/org-parser

Last synced: 6 months ago
JSON representation

org-parser is a parser for the Org mode markup language for Emacs.

Awesome Lists containing this project

README

          

* General

#+html:

Tests:

#+html: Clojars Project

Community chat: #organice on IRC [[https://libera.chat/][Libera.Chat]], or [[https://matrix.to/#/!DfVpGxoYxpbfAhuimY:matrix.org?via=matrix.org&via=ungleich.ch][#organice:matrix.org]] on Matrix

** What does this project do?

=org-parser= is a parser for the [[https://orgmode.org/][Org mode]] markup language for Emacs.

It can be used from JavaScript, Java, Clojure and ClojureScript!

** Why is this project useful / Rationale

Org mode in Emacs is implemented in [[http://git.savannah.gnu.org/cgit/emacs/org-mode.git/tree/lisp/org-element.el][org-element.el]] ([[https://orgmode.org/worg/dev/org-element-api.html][API
documentation]]). The [[https://orgmode.org/worg/dev/org-syntax.html][spec for the Org syntax is written in prose]].

This is already great work, yet it has some drawbacks:

1. The spec is not machine readable. Hence, there can be drift between
documentation and implementation. In fact, during the development
of [[https://github.com/200ok-ch/organice/][organice]], our web-based Org implementation with great mobile
phone support, and =org-parser= we have encountered drift.
2. org-element.el is naturally written in Emacs lisp and makes strong
use of Emacs as a text-processor. Hence, its code can only be used
within Emacs.

While writing the official spec already is an amazing effort in the
standardization of the Org format, the power of Org is so enticing
that many want to use it outside of Emacs, as well. Since
org-element.el only runs in Emacs, this caused a myriad of
implementations for other platforms (JavaScript, Rust, Go, Java, etc)
to have been created. Most implementations are only partial, and more
importantly each of them creates another island. Since they are just
as programming language dependent as org-element.el, it is impossible
to share logic between them.

=org-parser= aims at alleviating both these issues. It documents the
syntax in a standard and machine readable notation (EBNF). And the
reference implementation is done in a way that it runs on the
established virtual machines of Java and JavaScript. Hence,
=org-parser= can be used from all programming languages running on
those virtual machines. =org-parser= provides a higher-level data
structure that is easy to consume for an application working with Org
mode data. Even if your application is not running on the Java or
JavaScript virtual machines, you can embed =org-parser= as a
command-line application. Lastly, =org-parser= brings a strong test
suite to document the reference implementation in yet another
unambiguous way.

It is our aim that =org-parser= can be the foundation on which many
Org mode applications in many different languages can be built. The
applications using =org-parser= can then focus on implementing user
facing features and don’t have to worry about the implementation of
the Org syntax itself.

** Architecture

The code base of =org-parser= is split into four namespaces:

- org-parser.core (top level api, i.e. =read-str=, =write-str=)
- org-parser.parse (aka. deserializer, reader)
- org-parser.parse.transform (transforms the result of the parser into
a more desirable structure)
- org-parser.render (aka. serializer, writer)

Thus =org-parser= has become a misnomer in the sense, that it now
strives to be =clojure/data.org= (after the pattern of existing Clojure
libraries like =data.json=, =data.xml=, =data.csv=, etc) providing
reader as well as writer capabilities for the serialization format
=org=.

* Project State

This project is work-in-progress. It is not ready for production yet
because the structure of the AST (parse tree) can still change.

The biggest milestones are:

- [X] Finish [[http://xahlee.info/clojure/clojure_instaparse_grammar_syntax.html][EBNF parser]] to support most Org mode syntax
- [X] Headlines
- [X] Org mode =#+*= stuff
- [X] Timestamps
- [X] Links
- [X] Text links
- [X] Footnotes
- [X] Styled text
- [X] Drawers and =#+BEGIN_xxx= blocks
- [ ] Nested markup (see #12)
- [X] Setup basic transformation from the parse tree to a higher-level structure.
- [-] Transformations to higher-level structure: catch up with
features that are already supported by the EBNF parser.
- [-] Render parsed org file with =write-str=

*It can already be useful* for you:
E.g. if your script needs to parse parts of Org mode features, our EBNF
parser probably already supports that. Do not underestimate
e.g. timestamps. Use our well-tested parser to disassemble it in its
parts, instead of trying to write a poor and ugly regex that is only
capable of a subset of Org mode's timestamps ;)

Don't hesitate to contribute!

* Development

=org-parser= uses [[https://github.com/Engelberg/instaparse/][instaparse]] which aims to be the simplest way to
build parsers in Clojure. Apart from living up to this claim (and
beyond the scope of just the one programming language), using
instaparse is great for another reason: Instaparse works both on CLJ
and CLJS. Therefore =org-parser= can be used from both ecosystems
which, of course, include JavaScript and Java. Hence, it is possible
to [[#usage][use it]] in various situations.

** Prerequisites

Please install [[https://clojure.org/guides/getting_started][Clojure]] and [[https://leiningen.org/][Leiningen]].

There's no additional installation required. Leiningen will pull
dependencies if required.

** Testing

Running the tests:

#+BEGIN_SRC shell
# Clojure
lein test
# CLJS (starts a watcher)
lein doo node
#+END_SRC

If you're not familiar with Lisp or Clojure, here's a short video on
how the tooling for Lisp (and hence Clojure) is great and enables fast
developer feedback and high quality applications. Initially, the video
was created to answer a [[https://github.com/200ok-ch/org-parser/issues/4][specific issue]] on this repository. However, the question is a valid
general question that is asked quite often by people who haven't used
a Lisp before.

[[https://raw.githubusercontent.com/200ok-ch/org-parser/master/doc/images/quick_introduction_to_lisp_clojure_and_using_the_repl.jpg]]

You can watch it here: https://youtu.be/o2MLHFGUkoQ

* Release and Dependency Information

Note: The version number should be replaced with the current version of org-parser.
See the clojars badge at the [[https://github.com/200ok-ch/org-parser#general][top of this README]].

** [[https://clojure.org/reference/deps_and_cli][CLI/deps.edn]] dependency information:

#+BEGIN_SRC
org-parser/org-parser {:mvn/version "0.1.4"}
#+END_SRC

** [[https://github.com/technomancy/leiningen][Leiningen]] dependency information:

#+BEGIN_SRC
[org-parser "0.1.4"]
#+END_SRC

* Usage
:PROPERTIES:
:CUSTOM_ID: usage
:END:

At the moment, you can run =org-parser= from Clojure, ClojureScript,
or Java. Other targets which are hosted on the JVM or on JavaScript
are possible.

** Clojure Library

#+BEGIN_SRC clojure :exports both
(ns hello-world.core
(:require [org-parser.parser :refer [parse]]
[org-parser.core :refer [read-str write-str]]))

(prn (parse "* Headline"))
(prn (read-str "* Headline"))
(println (write-str (read-str "* Headline")))
#+END_SRC

#+RESULTS:
| [:S [:headline [:stars "*"] [:text [:text-normal "Headline"]]]] |
| {:headlines [{:headline {:level 1, :title [[:text-normal "Headline"]], :planning [], :tags []}}]} |
| "* Headline\n" |

** Clojure

Run =lein run file.org=, for example:

#+begin_src sh :results verbatim :exports both
lein run test/org_parser/fixtures/schedule_with_repeater.org
#+end_src

#+RESULTS:
: {:headlines [{:headline {:level 1, :title [[:text-sty-bold "Header"] [:text-normal " with repeater"]], :planning [[:planning-info [:planning-keyword [:planning-kw-scheduled]] [:timestamp-active [:ts-inner [:ts-inner-wo-time [:ts-date "2019-11-27"] [:ts-day "Wed"]] [:ts-modifiers [:ts-repeater [:ts-repeater-type "+"] [:ts-mod-value "1"] [:ts-mod-unit "d"]]]]]]], :tags []}}]}

** Java

First, compile =org-parser= with:

#+begin_src sh :exports code :results silent :tangle buildjar.sh
lein uberjar
#+end_src

Then run =java -jar target/uberjar/org-parser-*-SNAPSHOT-standalone.jar file.org=, for example:

#+begin_src sh :results verbatim :exports both :tangle testjar.sh
java -jar target/uberjar/org-parser-*-SNAPSHOT-standalone.jar test/org_parser/fixtures/schedule_with_repeater.org
#+end_src

#+RESULTS:
: {:headlines [{:headline {:level 1, :title [[:text-sty-bold "Header"] [:text-normal " with repeater"]], :planning [[:planning-info [:planning-keyword [:planning-kw-scheduled]] [:timestamp-active [:ts-inner [:ts-inner-wo-time [:ts-date "2019-11-27"] [:ts-day "Wed"]] [:ts-modifiers [:ts-repeater [:ts-repeater-type "+"] [:ts-mod-value "1"] [:ts-mod-unit "d"]]]]]]], :tags []}}]}

Note: The =*= character must be replaced with the current version number of org-parser.
See the clojars badge at the [[https://github.com/200ok-ch/org-parser#general][top of this README]].

* License