https://github.com/impresso/impresso-schemas
Repository of JSON schemas used in the Impresso project.
- Host: GitHub
- URL: https://github.com/impresso/impresso-schemas
- Owner: impresso
- License: agpl-3.0
- Created: 2018-11-28T12:42:04.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2026-01-15T11:16:01.000Z (5 months ago)
- Last Synced: 2026-01-15T16:18:25.828Z (5 months ago)
- Topics: digital-humanities, historical-newspapers, json-schema
- Language: Makefile
- Homepage:
- Size: 1.13 MB
- Stars: 4
- Watchers: 7
- Forks: 3
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Impresso JSON Schemas
This repository contains the JSON schemas used in the [Impresso project](https://impresso-project.ch/).
Impresso JSON Schemas define, declare, validate and document the structure, constraints and data types of Impresso JSON documents, which can represent either data or processes (e.g. manifests).
### Schemas
We define schemas for:
#### Newspaper data
- Canonical format:
  - [Issue](docs/issue.md) (draft 06)
  - [Page](docs/page.md) (draft 06)
  - [Audio Record](docs/audio_record.md) (draft 06)
- Rebuilt format:
  - [Paper Content Item](docs/paper_contentitem.md) (draft 06)
  - [Audio Record Content Item](docs/audio_record_contentitem.md) (draft 06)
#### Semantic enrichments
- Topic Model
  - [Topic Assignment](docs/topic_assignment.md) (draft 06)
  - [Topic Description](docs/topic_description.md) (draft 06)
- Language Identification
  - [Language Identification](docs/language_identification.md) (draft 06)
- Entities
  - [Entities](docs/entities.md) (2020-12)
- [OCR Quality Assessment](docs/ocr_qa.md) (OCR-QA)
#### Processes
- Data processing manifests (todo)
- Data release manifests (todo)
## File organisation in this repository
- `json/` subdirectory for JSON schemas
- `examples/` subdirectory for example/test files
- `docs/` documentation of schemas in markdown format
## Validation
To validate an instance (example file) against a JSON schema, run:
```bash
make tests
```
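To illustrate what validating an instance against a schema involves, here is a minimal Python sketch. It is not the repository's validation tooling: the schema and instances are hypothetical and far simpler than the Impresso schemas, and the hand-rolled `check` function covers only the `type`, `required` and `properties` keywords. A full validator (for example the Python `jsonschema` package) implements the whole specification.

```python
import json

# Hypothetical, heavily simplified schema: NOT one of the Impresso schemas.
SCHEMA = json.loads("""
{
  "type": "object",
  "required": ["id", "pages"],
  "properties": {
    "id": {"type": "string"},
    "pages": {"type": "array"}
  }
}
""")

# Mapping from JSON Schema type names to Python types (small subset only).
TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "boolean": bool,
}


def check(instance, schema):
    """Return a list of error messages.

    Only the `type`, `required` and `properties` keywords are checked.
    """
    expected = schema.get("type")
    if expected in TYPES and not isinstance(instance, TYPES[expected]):
        return [f"expected {expected}, got {type(instance).__name__}"]

    errors = []
    if isinstance(instance, dict):
        # Report every required property that is absent.
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"missing required property '{key}'")
        # Recurse into the properties that are present.
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(f"{key}: {msg}" for msg in check(instance[key], subschema))
    return errors


valid = {"id": "example-issue", "pages": []}  # hypothetical instance
invalid = {"id": 42}                          # wrong type, missing "pages"

print(check(valid, SCHEMA))
print(check(invalid, SCHEMA))
```

An empty list means the instance satisfies the checked keywords; each entry otherwise names the property and the violated constraint.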
## Documentation
The documentation in `docs/` is generated with [`jsonschema2md`](https://github.com/adobe/jsonschema2md) using the following command:
```bash
make documentation
```
## Project
The 'impresso - Media Monitoring of the Past' project is funded by the Swiss National Science Foundation (SNSF) under grant number [CRSII5_173719](http://p3.snf.ch/project-173719) (Sinergia program). The project aims at developing tools to process and explore large-scale collections of historical newspapers, and at studying the impact of this new tooling on historical research practices. More information at https://impresso-project.ch.
## License
Copyright (C) 2020 The *impresso* team. Contributors to this program include: [Simon Clematide](https://github.com/simon-clematide), [Maud Ehrmann](https://github.com/e-maud) and [Matteo Romanello](http://github.com/mromanello/).
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the [GNU Affero General Public License](https://github.com/impresso/impresso-schemas/blob/master/LICENSE) for more details.