https://github.com/mrcook/pgrdf


# pgrdf - a Project Gutenberg RDF library

A library written in the Go language for reading and writing Project Gutenberg
RDF documents using a simpler set of intermediary data types, which can also be
marshaled to JSON for a more compact representation of the metadata.

Helper functions are provided for reading RDF files directly from their `tar`
archive. See the usage section below for more information.

The `Ebook` struct is the intermediary representation of the metadata. Its
fields are much easier to work with than raw RDF, and the struct can be
marshaled directly to JSON.

The following is a (truncated) JSON example:

```json
{
  "id": 1400,
  "released": "1998-07-01",
  "titles": ["Great Expectations"],
  "creators": [{
    "id": 37,
    "name": "Dickens, Charles",
    "aliases": [
      "Dickens, Charles John Huffam",
      "Boz"
    ],
    "born_year": 1812,
    "died_year": 1870,
    "webpages": ["https://en.wikipedia.org/wiki/Charles_Dickens"]
  }]
}
```
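
Because the JSON form uses plain keys, it can be read back with nothing more than the standard library. The following is a minimal sketch under stated assumptions: the `ebook` and `creator` types are hypothetical local types that mirror only the keys shown in the example above, and are not the library's own `Ebook` definition.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// creator and ebook are hypothetical local types that mirror only the JSON
// keys shown in the example; they are not the pgrdf library's definitions.
type creator struct {
	ID       int      `json:"id"`
	Name     string   `json:"name"`
	Aliases  []string `json:"aliases"`
	BornYear int      `json:"born_year"`
	DiedYear int      `json:"died_year"`
	Webpages []string `json:"webpages"`
}

type ebook struct {
	ID       int       `json:"id"`
	Released string    `json:"released"`
	Titles   []string  `json:"titles"`
	Creators []creator `json:"creators"`
}

// decodeEbook parses a JSON document shaped like the example above.
func decodeEbook(data []byte) (ebook, error) {
	var e ebook
	err := json.Unmarshal(data, &e)
	return e, err
}

func main() {
	data := []byte(`{"id": 1400, "released": "1998-07-01",
		"titles": ["Great Expectations"],
		"creators": [{"id": 37, "name": "Dickens, Charles",
			"born_year": 1812, "died_year": 1870}]}`)

	e, err := decodeEbook(data)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d: %s by %s\n", e.ID, e.Titles[0], e.Creators[0].Name)
}
```

Keys that are absent from the input, such as `aliases` here, are simply left at their zero values.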

And here's the corresponding RDF snippet for [Great Expectations](https://gutenberg.org/ebooks/1400.rdf):

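```xml
<rdf:RDF>
  <pgterms:ebook rdf:about="ebooks/1400">
    <dcterms:title>Great Expectations</dcterms:title>
    <dcterms:creator>
      <pgterms:agent rdf:about="2009/agents/37">
        <pgterms:name>Dickens, Charles</pgterms:name>
        <pgterms:alias>Dickens, Charles John Huffam</pgterms:alias>
        <pgterms:alias>Boz</pgterms:alias>
        <pgterms:webpage rdf:resource="https://en.wikipedia.org/wiki/Charles_Dickens"/>
        <pgterms:birthdate>1812</pgterms:birthdate>
        <pgterms:deathdate>1870</pgterms:deathdate>
      </pgterms:agent>
    </dcterms:creator>
    <dcterms:issued>1998-07-01</dcterms:issued>
  </pgterms:ebook>
</rdf:RDF>
```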

## Usage

A basic example might be:

```go
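package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"os"

	"github.com/mrcook/pgrdf"
)

func main() {
	rdfFile, err := os.Open("/path/to/pg1400.rdf")
	if err != nil {
		log.Fatal(err)
	}
	defer rdfFile.Close()

	// Parse the RDF document into the intermediary Ebook type.
	ebook, err := pgrdf.ReadRDF(rdfFile)
	if err != nil {
		log.Fatal(err)
	}

	ebook.Titles = append(ebook.Titles, "In Three Volumes")

	w := bytes.NewBuffer([]byte{}) // create an io.Writer
	if err := ebook.WriteRDF(w); err != nil { // write the RDF data
		log.Fatal(err)
	}

	data, err := json.Marshal(ebook) // marshal to JSON
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data))
}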
```

It is possible to read an RDF file directly from the official Project Gutenberg
offline catalog archive, available at http://www.gutenberg.org/cache/epub/feeds/.

There are currently two archives available:

- `rdf-files.tar.bz2`
- `rdf-files.tar.zip`

Reading from the `bz2` archive is considerably slower than reading from the
plain `tar` archive, so it is recommended to first extract the tarball from the
`bz2` archive. For example, on Linux:

    $ bzip2 -dk rdf-files.tar.bz2

If this is not possible/desirable then the `.tar.bz2` must first be wrapped in a
`bzip2` reader:

    rdf, err := archive.FromTarArchive(bzip2.NewReader(archiveFile), id)

When an archive is fully extracted to a local directory, the `FromDirectory`
function can be used:

    rdf, err := archive.FromDirectory("/rdf_files_dir", 1400)

It is important to note that the directory structure must be exactly as
extracted from the archive:

    rdf_files_dir/
    ├─ cache/
    │  └─ epub/
    │     ├─ 1/
    │     │  └─ pg1.rdf
    │     ├─ 2/
    │     ...

## LICENSE

Copyright (c) 2018-2023 Michael R. Cook. All rights reserved.

This work is licensed under the terms of the MIT license.
For a copy, see .