https://github.com/substance/tahi-article

Tahi Article Spec and local reference implementation

# Tahi Article Specification

The goal of this spec is to define the possible structure and markup of a Tahi-Article. This format is used as an input and output format for the Tahi editor and the document conversion service iHat.

Tahi articles live in the database, where they are split across different tables and fields. For that reason, we define rules for individual sections at the persistence level.

*Note: This spec is a work in progress*

## Persistence Level

**Manuscript body** (`Paper.body`)

```html
A heading

Some annotated content (see Doe, 2010 and Figure 1).

When \(a \ne 0\), there are two solutions to \(ax^2 + bx + c = 0\) and they are \(x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\).

A   B   C   D   E   F
1   2   3   4   5   6
7   8   9   10  11  12
13  14  15  16  17  18
19  20  21  22  23  24
25  26  27  28  29  30

Top level
```

Allowed tags: ``, ``, ``, ``, ``, ``, ``, ``

The manuscript body uses common HTML tags for representing headings, paragraphs and annotations. For expressing figure references and citations, we use the `` tag in combination with data attributes. We also use some RDFa attributes to assign property and type names to our elements. To include a figure at a certain place, we use a placeholder element of the type `figinc`.
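As a rough illustration of how a per-field tag allowlist could be enforced, here is a minimal Python sketch. The tag sets in `ALLOWED_TAGS` and the helper name `disallowed_tags` are assumptions made for this example; they are not the tag lists defined by this spec.

```python
from html.parser import HTMLParser

# Hypothetical allowlists for illustration only; the real tag names
# are the ones listed in this spec for each field.
ALLOWED_TAGS = {
    "body": {"h1", "h2", "p", "a", "strong", "em"},
    "title": {"strong", "em"},
}


class TagCollector(HTMLParser):
    """Collects the name of every start tag seen in a fragment."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        self.tags.append(tag)


def disallowed_tags(fragment, field):
    """Return the sorted tag names in `fragment` that `field` does not allow."""
    collector = TagCollector()
    collector.feed(fragment)
    collector.close()
    return sorted(set(collector.tags) - ALLOWED_TAGS[field])
```

A server could reject a field write whenever `disallowed_tags(fragment, field)` returns a non-empty list.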

**Title** (`Paper.title`)

```html
The TAHI Article Format
```

Allowed tags: ``, ``

**Abstract** (`Paper.abstract`)

```html
Article abstract that can be annotated
```

Allowed tags: ``, ``

**Figure caption** (`Figure.caption`)

While most data fields (title, url, etc.) of a figure are stored as strings in the database, we use HTML for the figure caption.

Example:

```html
Figure caption that can be annotated
```

Allowed tags: ``, ``, ``

## Exchange Level

The previous section defined the allowed HTML markup for individual properties of a Tahi document. In order to exchange a complete snapshot of a document, including all resources such as figures and bibliographic entries, we need a combined representation.

Here's a possible representation in JSON that could be used to transfer data to the editor.

```json
{
  "id": "10",
  "title": "The TAHI Article Format",
  "abstract": "Article abstract that can be annotated",
  "body": "A heading...",
  "resources": {
    "fig1": {
      "type": "fig",
      "caption": "annotated caption"
    },
    "bib1": {}
  }
}
```
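To make the combined representation more concrete, the following Python sketch assembles such a snapshot from separately stored records. The function name `build_snapshot` and the shape of the input dictionaries are assumptions for illustration; they do not describe the actual Tahi database schema.

```python
import json


def build_snapshot(paper, figures, bib_entries):
    """Combine per-field records into one exchange document (hypothetical helper)."""
    resources = {}
    for fig in figures:
        resources[fig["id"]] = {"type": "fig", "caption": fig["caption"]}
    for bib in bib_entries:
        # Bibliographic data is passed through unchanged; empty if absent.
        resources[bib["id"]] = bib.get("data", {})
    return {
        "id": paper["id"],
        "title": paper["title"],
        "abstract": paper["abstract"],
        "body": paper["body"],
        "resources": resources,
    }


paper = {
    "id": "10",
    "title": "The TAHI Article Format",
    "abstract": "Article abstract that can be annotated",
    "body": "A heading...",
}
snapshot = build_snapshot(
    paper,
    figures=[{"id": "fig1", "caption": "annotated caption"}],
    bib_entries=[{"id": "bib1"}],
)
print(json.dumps(snapshot, indent=2))
```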

Another interesting option would be to provide the complete data as a single HTML document.

```html
🐰

The TAHI Article Format

Article abstract that can be annotated

Figure caption annotated
fig1.png
fig1_preview.png
...

{
  "subtitle": [],
  "issued": {
    "year": 2013,
    "month": 8
  },
  "score": 1,
  "prefix": "http://id.crossref.org/prefix/10.1109",
  "author": [
    {
      "family": "Lughofer",
      "given": "Edwin"
    },
    {
      "family": "Buchtala",
      "given": "Oliver"
    }
  ],
  "container-title": "IEEE Transactions on Fuzzy Systems",
  "reference-count": 0,
  "page": "625-641",
  "deposited": {
    "date-parts": [[2013, 8, 14]],
    "timestamp": 1376438400000
  },
  "issue": "4",
  "title": "Reliable All-Pairs Evolving Fuzzy Classifiers",
  "type": "journal-article",
  "DOI": "10.1109/tfuzz.2012.2226892",
  "ISSN": ["1063-6706", "1941-0034"],
  "URL": "http://dx.doi.org/10.1109/tfuzz.2012.2226892",
  "source": "CrossRef",
  "publisher": "Institute of Electrical & Electronics Engineers (IEEE)",
  "indexed": {
    "date-parts": [[2014, 9, 21]],
    "timestamp": 1411301462763
  },
  "volume": "21",
  "member": "http://id.crossref.org/member/263",
  "page-first": "625"
}

A heading

Some annotated content (see Doe, 2010 and Figure 1).

Top level
```

Exposing the Tahi Article as an HTML document would allow somebody to edit the whole document by hand in a text editor. However, we have to be aware that importing a modified Tahi-Source-HTML file is a destructive operation. If we want to allow it, the user must be warned that all contents in the database will be overwritten with the contents of the HTML file.

# Use cases

## Different update strategies

We have to deal with two categories of document manipulations:

- **Individual field writes:** For instance, updating the document title or body. An HTML fragment is sent to the server, and `paper.title` or `paper.body` is set.

- **Full document writes:** A complete Tahi Source HTML file is written. The HTML is validated and verified. If that succeeds, the individual fields are extracted from the file and stored in the corresponding database records. Old contents (including figures and bibliographic entries) are removed and replaced with the new contents.

iHat, for instance, would use the full document write API for the initial import of a Word document. However, we have to be aware that there can only be one source of information. For example, if the author makes fixes in the Tahi Editor after the import, they would lose all of those changes on the next import of the Word document.
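The difference between the two update strategies can be sketched with a small in-memory model. This is only an illustration under assumptions: `PaperStore` and its methods are hypothetical names, and the full document write here takes an already parsed snapshot (a dictionary) rather than a Tahi Source HTML file, so validation and extraction are left out.

```python
class PaperStore:
    """In-memory stand-in for the database records of one paper."""

    FIELDS = ("title", "abstract", "body")

    def __init__(self):
        self.fields = {name: "" for name in self.FIELDS}
        self.resources = {}

    def write_field(self, name, html):
        """Individual field write: only the named field changes."""
        if name not in self.FIELDS:
            raise KeyError(f"unknown field: {name}")
        self.fields[name] = html

    def write_document(self, snapshot):
        """Full document write: every field and resource is replaced."""
        missing = [name for name in self.FIELDS if name not in snapshot]
        if missing:
            raise ValueError(f"snapshot is missing fields: {missing}")
        self.fields = {name: snapshot[name] for name in self.FIELDS}
        self.resources = dict(snapshot.get("resources", {}))


imported = {"title": "Draft title", "abstract": "", "body": "A heading..."}
store = PaperStore()
store.write_document(imported)             # initial import
store.write_field("title", "Fixed title")  # fix made in the editor
store.write_document(imported)             # second import discards the fix
```

After the second `write_document` call the title is `"Draft title"` again, which is the data loss described above.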

## Maintaining a stand-alone version of the editor

This is very important for keeping editor development efficient. A stand-alone version of the editor could be implemented against a simplified backend that only reads and saves the Tahi-Source HTML, instead of decomposing the information into separate database tables. This isolates editor development from the Tahi platform, whose setup is very complex and not well suited to editor development.
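Such a simplified backend could be as small as the following Python sketch, which keeps the whole article in a single HTML file. The class name `FileBackend` and its `load`/`save` methods are assumptions chosen for this example, not an existing Tahi API.

```python
from pathlib import Path


class FileBackend:
    """Simplified backend that stores the whole article as one HTML file."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        """Return the stored HTML, or an empty string if nothing was saved yet."""
        if not self.path.exists():
            return ""
        return self.path.read_text(encoding="utf-8")

    def save(self, html):
        """Overwrite the stored article with `html`."""
        self.path.write_text(html, encoding="utf-8")
```

The editor would call `load()` on startup and `save(html)` on every save, with no knowledge of how the Tahi platform fragments the document.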

There's also potential in offering the Tahi Editor as a stand-alone app, so people can use it outside of the Tahi platform.