https://github.com/mrcook/pgrdf

Library (golang) for reading/writing Project Gutenberg RDF XML catalog metadata

Host: GitHub
URL: https://github.com/mrcook/pgrdf
Owner: mrcook
License: mit
Created: 2021-04-17T20:48:30.000Z (about 4 years ago)
Default Branch: master
Last Pushed: 2023-11-03T21:37:08.000Z (over 1 year ago)
Last Synced: 2025-01-22T12:49:04.430Z (4 months ago)
Topics: gutenberg-ebooks, gutenberg-metadata, project-gutenberg, rdf, xml
Language: Go
Homepage:
Size: 212 KB
Stars: 3
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE


# pgrdf - a Project Gutenberg RDF library

A library written in the Go language for reading and writing Project Gutenberg RDF documents using a simpler set of intermediary data types.

Helper functions are provided for reading RDF files directly from their `tar` archive. See the usage section below for more information.

The `Ebook` struct is the intermediary representation of the metadata. It offers a much simpler set of data types than handling RDF directly, and it can also be marshaled to JSON for a more compact representation of the metadata.

The following is a (truncated) JSON example:

```json
{
  "id": 1400,
  "released": "1998-07-01",
  "titles": ["Great Expectations"],
  "creators": [{
    "id": 37,
    "name": "Dickens, Charles",
    "aliases": [
      "Dickens, Charles John Huffam",
      "Boz"
    ],
    "born_year": 1812,
    "died_year": 1870,
    "webpages": ["https://en.wikipedia.org/wiki/Charles_Dickens"]
  }]
}
```
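To show how JSON of this shape maps onto plain Go types, here is a minimal sketch. The `ebookSketch` and `creatorSketch` types, the `decodeSketch` helper, and the `sampleJSON` constant are hypothetical stand-ins written to match only the fields visible in the truncated example; they are not the library's actual `Ebook` definition, which may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// creatorSketch and ebookSketch are hypothetical types that mirror only the
// fields visible in the truncated JSON example; the real pgrdf types may differ.
type creatorSketch struct {
	ID       int      `json:"id"`
	Name     string   `json:"name"`
	Aliases  []string `json:"aliases"`
	BornYear int      `json:"born_year"`
	DiedYear int      `json:"died_year"`
	WebPages []string `json:"webpages"`
}

type ebookSketch struct {
	ID       int             `json:"id"`
	Released string          `json:"released"`
	Titles   []string        `json:"titles"`
	Creators []creatorSketch `json:"creators"`
}

// sampleJSON repeats the truncated example from the README.
const sampleJSON = `{
  "id": 1400,
  "released": "1998-07-01",
  "titles": ["Great Expectations"],
  "creators": [{
    "id": 37,
    "name": "Dickens, Charles",
    "aliases": ["Dickens, Charles John Huffam", "Boz"],
    "born_year": 1812,
    "died_year": 1870,
    "webpages": ["https://en.wikipedia.org/wiki/Charles_Dickens"]
  }]
}`

// decodeSketch unmarshals JSON of the shape shown above into an ebookSketch.
func decodeSketch(data []byte) (ebookSketch, error) {
	var e ebookSketch
	err := json.Unmarshal(data, &e)
	return e, err
}

func main() {
	e, err := decodeSketch([]byte(sampleJSON))
	if err != nil {
		log.Fatal(err)
	}
	c := e.Creators[0]
	fmt.Printf("#%d %s by %s (%d-%d)\n", e.ID, e.Titles[0], c.Name, c.BornYear, c.DiedYear)
}
```

Because the metadata is held in ordinary structs and slices, editing it is a matter of normal field access and `append`, with no RDF handling involved.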

And here's the corresponding RDF snippet for [Great Expectations](https://gutenberg.org/ebooks/1400.rdf):

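```xml
<pgterms:ebook rdf:about="ebooks/1400">
    <dcterms:title>Great Expectations</dcterms:title>
    <dcterms:creator>
        <pgterms:agent rdf:about="2009/agents/37">
            <pgterms:name>Dickens, Charles</pgterms:name>
            <pgterms:alias>Dickens, Charles John Huffam</pgterms:alias>
            <pgterms:alias>Boz</pgterms:alias>
            <pgterms:webpage rdf:resource="https://en.wikipedia.org/wiki/Charles_Dickens"/>
            <pgterms:birthdate rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1812</pgterms:birthdate>
            <pgterms:deathdate rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1870</pgterms:deathdate>
        </pgterms:agent>
    </dcterms:creator>
    <dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1998-07-01</dcterms:issued>
</pgterms:ebook>
```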
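For comparison, the sketch below decodes a cut-down RDF document with nothing but the standard library's `encoding/xml`, to give a feel for the nesting that the intermediary types hide. It is not how pgrdf parses RDF internally. The `rdfSketch` and `agentSketch` types, the `decodeRDFSketch` helper, and the `sampleRDF` document are assumptions for illustration, using element names from Project Gutenberg's catalog vocabulary.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// agentSketch and rdfSketch are hypothetical types for illustration only.
// Tags without a namespace match elements by local name, so "title" matches
// <dcterms:title> and "about,attr" matches the rdf:about attribute.
type agentSketch struct {
	About   string   `xml:"about,attr"`
	Name    string   `xml:"name"`
	Aliases []string `xml:"alias"`
	Born    int      `xml:"birthdate"`
	Died    int      `xml:"deathdate"`
}

type rdfSketch struct {
	Title    string        `xml:"ebook>title"`
	Issued   string        `xml:"ebook>issued"`
	Creators []agentSketch `xml:"ebook>creator>agent"`
}

// sampleRDF is a reduced, hand-written document; real catalog files carry
// many more elements and attributes.
const sampleRDF = `<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/1400">
    <dcterms:title>Great Expectations</dcterms:title>
    <dcterms:creator>
      <pgterms:agent rdf:about="2009/agents/37">
        <pgterms:name>Dickens, Charles</pgterms:name>
        <pgterms:alias>Dickens, Charles John Huffam</pgterms:alias>
        <pgterms:alias>Boz</pgterms:alias>
        <pgterms:birthdate>1812</pgterms:birthdate>
        <pgterms:deathdate>1870</pgterms:deathdate>
      </pgterms:agent>
    </dcterms:creator>
    <dcterms:issued>1998-07-01</dcterms:issued>
  </pgterms:ebook>
</rdf:RDF>`

// decodeRDFSketch unmarshals the reduced document into rdfSketch.
func decodeRDFSketch(data []byte) (rdfSketch, error) {
	var doc rdfSketch
	err := xml.Unmarshal(data, &doc)
	return doc, err
}

func main() {
	doc, err := decodeRDFSketch([]byte(sampleRDF))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(doc.Title, doc.Issued, doc.Creators[0].Name)
}
```

Even this reduced version needs path-style struct tags to reach through the `ebook`, `creator`, and `agent` levels, which is the kind of detail the `Ebook` struct is meant to spare callers.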

## Usage

A basic example might be:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"os"

	"github.com/mrcook/pgrdf"
)

func main() {
	rdfFile, err := os.Open("/path/to/pg1400.rdf")
	if err != nil {
		log.Fatal(err)
	}
	defer rdfFile.Close()

	ebook, err := pgrdf.ReadRDF(rdfFile) // parse the RDF into an Ebook
	if err != nil {
		log.Fatal(err)
	}

	ebook.Titles = append(ebook.Titles, "In Three Volumes")

	w := &bytes.Buffer{} // any io.Writer will do
	if err := ebook.WriteRDF(w); err != nil { // write the RDF data
		log.Fatal(err)
	}

	data, err := json.Marshal(ebook) // marshal to JSON
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data))
}
```

It is possible to read an RDF file directly from the official Project Gutenberg offline catalog archive: http://www.gutenberg.org/cache/epub/feeds/.

There are currently two archives available:

    rdf-files.tar.bz2
    rdf-files.tar.zip

Reading from a `bz2` archive is considerably slower than reading from the plain `tar` archive, so it is recommended to first extract the tarball from the `bz2` archive.

Example on Linux:

    $ bzip2 -dk rdf-files.tar.bz2

If this is not possible or desirable, the `.tar.bz2` file must first be wrapped in a `bzip2` reader:

    rdf, err := archive.FromTarArchive(bzip2.NewReader(archiveFile), id)
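To make the cost of reading from an archive concrete, here is a minimal standard-library sketch of what locating one entry in a tarball involves. It is not the library's `FromTarArchive` implementation; the `findInTar` and `buildTar` helpers and the entry names are assumptions for illustration. A tar stream has no index, so entries are read in order until the wanted one turns up, and a `bzip2` reader adds decompression work to every byte scanned along the way.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
	"log"
)

// findInTar is a hypothetical helper: it walks the tar stream entry by entry
// and returns the contents of the first entry whose name matches.
func findInTar(r io.Reader, name string) ([]byte, error) {
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return nil, fmt.Errorf("%s: not found in archive", name)
		}
		if err != nil {
			return nil, err
		}
		if hdr.Name == name {
			return io.ReadAll(tr)
		}
	}
}

// buildTar creates a small in-memory tarball so the example runs without
// downloading the real catalog.
func buildTar(files map[string]string) (*bytes.Buffer, error) {
	buf := &bytes.Buffer{}
	tw := tar.NewWriter(buf)
	for name, body := range files {
		hdr := &tar.Header{
			Typeflag: tar.TypeReg,
			Name:     name,
			Mode:     0o644,
			Size:     int64(len(body)),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(body)); err != nil {
			return nil, err
		}
	}
	return buf, tw.Close()
}

func main() {
	tarball, err := buildTar(map[string]string{
		"cache/epub/1/pg1.rdf":       "placeholder for ebook 1",
		"cache/epub/1400/pg1400.rdf": "placeholder for ebook 1400",
	})
	if err != nil {
		log.Fatal(err)
	}
	// For a real rdf-files.tar.bz2, pass bzip2.NewReader(file) from
	// compress/bzip2 here instead of the in-memory buffer.
	data, err := findInTar(tarball, "cache/epub/1400/pg1400.rdf")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data))
}
```

When many ebooks are needed, each lookup of this kind rescans the stream from the start, which is why extracting the tarball once tends to pay off.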

When an archive is fully extracted to a local directory, the `FromDirectory` function can be used:

    rdf, err := archive.FromDirectory("/rdf_files_dir", 1400)

It is important to note that the directory structure must be exactly as extracted from the archive:

    rdf_files_dir/
    ├─ cache/
    │  └─ epub/
    │     ├─ 1/
    │     │  └─ pg1.rdf
    │     ├─ 2/
    │     ...

## LICENSE

Copyright (c) 2018-2023 Michael R. Cook. All rights reserved.

This work is licensed under the terms of the MIT license. For a copy, see the `LICENSE` file in the repository.
