
https://github.com/impresso/impresso-schemas

Repository of JSON schemas used in the Impresso project.

digital-humanities historical-newspapers json-schema


# Impresso JSON Schemas

This repository contains the JSON schemas used in the [Impresso project](https://impresso-project.ch/).

Impresso JSON Schemas are used to define, declare, validate, and document the structure, constraints, and data types of Impresso JSON documents, which can represent data or processes (e.g. manifests).

### Schemas
We define schemas for:
#### Newspaper data
- Canonical format:
  - [Issue](docs/issue.md) (draft 06)
  - [Page](docs/page.md) (draft 06)
  - [Audio Record](docs/audio_record.md) (draft 06)
- Rebuilt format:
  - [Paper Content Item](docs/paper_contentitem.md) (draft 06)
  - [Audio Record Content Item](docs/audio_record_contentitem.md) (draft 06)
#### Semantic enrichments
- Topic Model
  - [Topic Assignment](docs/topic_assignment.md) (draft 06)
  - [Topic Description](docs/topic_description.md) (draft 06)
- Language Identification
  - [Language Identification](docs/language_identification.md) (draft 06)
- Entities
  - [Entities](docs/entities.md) (2020-12)
- [OCR Quality Assessment](docs/ocr_qa.md) (OCR-QA)

#### Processes
- Data processing manifests (todo)
- Data release manifests (todo)

## File organisation in this repository

- `json/` subdirectory for JSON schemas
- `examples/` subdirectory for example/test files
- `docs/` documentation of schemas in markdown format

## Validation

To validate an instance (example file) against a JSON schema, run:

```bash
make tests
```
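To illustrate what validating an instance against a schema involves, here is a minimal Python sketch. It is not how `make tests` works internally (that is not described here), and it is not a complete JSON Schema validator: it handles only the `type`, `required`, `properties` and `items` keywords. The schema, the instances and the `check` function are all hypothetical and do not correspond to any actual Impresso schema.

```python
import json

# Map JSON Schema type names to Python types (subset only).
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
}


def check(instance, schema, path="$"):
    """Return a list of error messages; an empty list means the instance passed.

    Assumptions of this sketch: `type` is a single string (not a list of
    types) and `items` is a single schema object (not an array of schemas).
    """
    errors = []
    expected = schema.get("type")
    if expected is not None:
        py_type = _TYPES[expected]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(instance, bool) and expected in ("integer", "number")
        if not isinstance(instance, py_type) or is_bool_as_number:
            errors.append(f"{path}: expected {expected}, got {type(instance).__name__}")
            return errors
    if isinstance(instance, dict):
        # Report missing required properties first, then descend into present ones.
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(check(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(check(item, schema["items"], f"{path}[{i}]"))
    return errors


# Hypothetical schema and instances, invented for illustration only.
SCHEMA = json.loads("""
{
  "$schema": "http://json-schema.org/draft-06/schema#",
  "type": "object",
  "required": ["id", "pages"],
  "properties": {
    "id": {"type": "string"},
    "pages": {"type": "array", "items": {"type": "string"}}
  }
}
""")

valid = {"id": "example-issue", "pages": ["p1", "p2"]}
invalid = {"id": 42}

print(check(valid, SCHEMA))    # no errors expected
print(check(invalid, SCHEMA))  # missing "pages" and wrong type for "id"
```

For real validation of the schemas in `json/` against the files in `examples/`, use `make tests` or a complete JSON Schema validator rather than this sketch.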

## Documentation

The documentation in `docs/` is generated with [`jsonschema2md`](https://github.com/adobe/jsonschema2md) by running:

```bash
make documentation
```

## Project

The 'impresso - Media Monitoring of the Past' project is funded by the Swiss National Science Foundation (SNSF) under grant number [CRSII5_173719](http://p3.snf.ch/project-173719) (Sinergia program). The project aims at developing tools to process and explore large-scale collections of historical newspapers, and at studying the impact of this new tooling on historical research practices. More information at https://impresso-project.ch.

## License

Copyright (C) 2020 The *impresso* team. Contributors to this program include: [Simon Clematide](https://github.com/simon-clematide), [Maud Ehrmann](https://github.com/e-maud) and [Matteo Romanello](http://github.com/mromanello/).

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the [GNU Affero General Public License](https://github.com/impresso/impresso-schemas/blob/master/LICENSE) for more details.