https://github.com/mrcook/pgrdf
Library (golang) for reading/writing Project Gutenberg RDF XML catalog metadata
- Host: GitHub
- URL: https://github.com/mrcook/pgrdf
- Owner: mrcook
- License: mit
- Created: 2021-04-17T20:48:30.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2023-11-03T21:37:08.000Z (about 1 year ago)
- Last Synced: 2024-11-21T21:37:34.557Z (2 months ago)
- Topics: gutenberg-ebooks, gutenberg-metadata, project-gutenberg, rdf, xml
- Language: Go
- Homepage:
- Size: 212 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# pgrdf - a Project Gutenberg RDF library
A library written in the Go language for reading and writing Project Gutenberg
RDF documents using a simpler set of intermediary data types, which can also be
marshaled to JSON for a more compact representation of the metadata.

Helper functions are provided for reading RDF files directly from their `tar`
archive. See the usage section below for more information.

The `Ebook` struct is used as an intermediary representation of the metadata,
which provides a much easier set of data types than needing to handle RDF
directly, and can also be marshaled to JSON.

The following is a (truncated) JSON example:

```json
{
  "id": 1400,
  "released": "1998-07-01",
  "titles": ["Great Expectations"],
  "creators": [{
    "id": 37,
    "name": "Dickens, Charles",
    "aliases": [
      "Dickens, Charles John Huffam",
      "Boz"
    ],
    "born_year": 1812,
    "died_year": 1870,
    "webpages": ["https://en.wikipedia.org/wiki/Charles_Dickens"]
  }]
}
```

And here's the corresponding RDF snippet for [Great Expectations](https://gutenberg.org/ebooks/1400.rdf):

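```xml
<!-- abbreviated: namespace declarations and datatype attributes omitted -->
<pgterms:ebook rdf:about="ebooks/1400">
  <dcterms:title>Great Expectations</dcterms:title>
  <dcterms:creator>
    <pgterms:agent rdf:about="2009/agents/37">
      <pgterms:name>Dickens, Charles</pgterms:name>
      <pgterms:alias>Dickens, Charles John Huffam</pgterms:alias>
      <pgterms:alias>Boz</pgterms:alias>
      <pgterms:birthdate>1812</pgterms:birthdate>
      <pgterms:deathdate>1870</pgterms:deathdate>
    </pgterms:agent>
  </dcterms:creator>
  <dcterms:issued>1998-07-01</dcterms:issued>
</pgterms:ebook>
```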
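
Because the JSON form uses plain keys, a consumer that only receives the JSON can decode it without this library. The following is a minimal sketch that assumes the key names shown in the JSON example; the `ebook` and `creator` types, the `sample` constant, and `decodeEbook` are hypothetical stand-ins for illustration, not pgrdf's own `Ebook` definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// creator and ebook are hypothetical types that mirror only the JSON keys
// shown in the example; they are not the library's own definitions.
type creator struct {
	ID       int      `json:"id"`
	Name     string   `json:"name"`
	Aliases  []string `json:"aliases"`
	BornYear int      `json:"born_year"`
	DiedYear int      `json:"died_year"`
	Webpages []string `json:"webpages"`
}

type ebook struct {
	ID       int       `json:"id"`
	Released string    `json:"released"`
	Titles   []string  `json:"titles"`
	Creators []creator `json:"creators"`
}

// sample is a shortened copy of the JSON example.
const sample = `{
  "id": 1400,
  "released": "1998-07-01",
  "titles": ["Great Expectations"],
  "creators": [{"id": 37, "name": "Dickens, Charles", "born_year": 1812, "died_year": 1870}]
}`

// decodeEbook parses JSON metadata into the hypothetical ebook type.
func decodeEbook(data []byte) (ebook, error) {
	var e ebook
	err := json.Unmarshal(data, &e)
	return e, err
}

func main() {
	e, err := decodeEbook([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	c := e.Creators[0]
	fmt.Printf("%d: %s by %s (%d-%d)\n", e.ID, e.Titles[0], c.Name, c.BornYear, c.DiedYear)
}
```

Keys that are absent from the input, such as `aliases` in the shortened sample, simply leave the corresponding field at its zero value.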
## Usage
A basic example might be:
```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"

	"github.com/mrcook/pgrdf"
)

func main() {
	rdfFile, _ := os.Open("/path/to/pg1400.rdf")
	ebook, _ := pgrdf.ReadRDF(rdfFile)

	ebook.Titles = append(ebook.Titles, "In Three Volumes")

	w := bytes.NewBuffer([]byte{}) // create an io.Writer
	_ = ebook.WriteRDF(w)          // write the RDF data

	data, _ := json.Marshal(ebook) // marshal to JSON
	fmt.Println(string(data))
}
```

It is possible to read an RDF directly from the official Project Gutenberg
offline catalog archive: http://www.gutenberg.org/cache/epub/feeds/.

There are currently two archives available:

- `rdf-files.tar.bz2`
- `rdf-files.tar.zip`

Reading from a `bz2` is considerably slower than just the plain `tar` archive,
so it is recommended to first extract the tarball from the `bz2` archive.
Example on Linux:

    $ bzip2 -dk rdf-files.tar.bz2

If this is not possible/desirable then the `.tar.bz2` must first be wrapped in a
`bzip2` reader:

    rdf, err := archive.FromTarArchive(bzip2.NewReader(archiveFile), id)

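
To illustrate what scanning a tar stream for a single ebook involves, here is a standard-library-only sketch. `findRDF` and `sampleArchive` are hypothetical helpers written for this example; this is not how pgrdf implements `FromTarArchive`, and it assumes each entry's file name has the form `pg<id>.rdf`.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"path"
)

// findRDF scans a tar stream for the entry whose file name is pg<id>.rdf and
// returns its contents. Hypothetical helper, not the pgrdf implementation.
func findRDF(r io.Reader, id int) ([]byte, error) {
	want := fmt.Sprintf("pg%d.rdf", id)
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil, fmt.Errorf("%s not found in archive", want)
		}
		if err != nil {
			return nil, err
		}
		if path.Base(hdr.Name) == want {
			return io.ReadAll(tr) // reads only the current entry
		}
	}
}

// sampleArchive builds a tiny in-memory tar so the sketch runs without
// downloading the real catalog.
func sampleArchive() io.Reader {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	body := []byte("<rdf:RDF/>")
	hdr := &tar.Header{
		Typeflag: tar.TypeReg,
		Name:     "cache/epub/1400/pg1400.rdf",
		Mode:     0644,
		Size:     int64(len(body)),
	}
	if err := tw.WriteHeader(hdr); err != nil {
		panic(err)
	}
	if _, err := tw.Write(body); err != nil {
		panic(err)
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return &buf
}

func main() {
	data, err := findRDF(sampleArchive(), 1400)
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Printf("read %d bytes\n", len(data))
}
```

A tar stream has no index, so every lookup walks the entries from the start, and decompressing `bz2` on top of that adds to the cost. For the compressed archive, the same function could be called as `findRDF(bzip2.NewReader(f), 1400)` using `compress/bzip2` from the standard library.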
When an archive is fully extracted to a local directory, the `FromDirectory`
function can be used:

    rdf, err := archive.FromDirectory("/rdf_files_dir", 1400)

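
The official archive stores each document at `cache/epub/<id>/pg<id>.rdf`, so locating a file by ID amounts to building that path. A minimal sketch, using a hypothetical helper `rdfPath` that is not part of pgrdf:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// rdfPath returns where the RDF file for the given ebook ID is expected to
// live below root. Hypothetical helper, not part of pgrdf.
func rdfPath(root string, id int) string {
	return filepath.Join(root, "cache", "epub", strconv.Itoa(id), fmt.Sprintf("pg%d.rdf", id))
}

func main() {
	fmt.Println(rdfPath("/rdf_files_dir", 1400))
}
```
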
It is important to note that the directory structure must be exactly as
extracted from the archive:

    rdf_files_dir/
    ├─ cache/
    │  └─ epub/
    │     ├─ 1/
    │     │  └─ pg1.rdf
    │     ├─ 2/
    │     ...

## LICENSE
Copyright (c) 2018-2023 Michael R. Cook. All rights reserved.
This work is licensed under the terms of the MIT license.
For a copy, see .