https://github.com/acmiyaguchi/schema-validator

A continuous integration service for schema validation

Host: GitHub
URL: https://github.com/acmiyaguchi/schema-validator
Owner: acmiyaguchi
License: MIT
Created: 2018-04-07T00:11:30.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2022-12-08T02:08:07.000Z (over 2 years ago)
Last Synced: 2023-08-03T20:58:59.917Z (almost 2 years ago)
Language: Python
Homepage:
Size: 103 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE

# schema-validator

A continuous integration service for schema validation.

# Usage

```bash
# build the docker components
make build

# bring up the services; this can be run in the background
make up

# in a separate shell, open the entry-point into the service
curl localhost:8000
# or through a browser
xdg-open http://localhost:8000

# bring down the service
make clean
```

## Running tests

This project currently has three levels of testing. The first two, unit and integration tests,
live in the `validator` sub-project and use standard `pytest`. The third is a micro-service
integration test at the root of the project.

```bash
# only runs the service-level integration tests
make test

# or without make
./tests/test-service.sh

# test that docker-compose is working correctly
./tests/test-compose.sh
```

# Architecture

The bulk schema validator is split into two subsystems. The primary subsystem is the
PySpark application that validates a set of JSON documents against a schema and renders a
human-readable summary. A secondary REST API provides a layer suitable for CI tooling.
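The validate-then-summarize step can be illustrated with a minimal sketch. It is not the project's PySpark code: it uses only the Python standard library, checks a small hand-written subset of JSON Schema (required keys and two primitive types), and the names `SCHEMA`, `validate`, and `summarize` are hypothetical.

```python
import json
from collections import Counter

# Hypothetical schema covering only a tiny subset of JSON Schema.
SCHEMA = {
    "required": ["id", "payload"],
    "properties": {"id": {"type": "string"}, "payload": {"type": "object"}},
}

# Map the JSON Schema type names used above onto Python types.
TYPES = {"string": str, "object": dict}


def validate(document, schema):
    """Return a list of error messages; an empty list means the document passed."""
    if not isinstance(document, dict):
        return ["document is not an object"]
    errors = []
    for key in schema.get("required", []):
        if key not in document:
            errors.append(f"missing required property: {key}")
    for key, rule in schema.get("properties", {}).items():
        if key in document and not isinstance(document[key], TYPES[rule["type"]]):
            errors.append(f"wrong type for property: {key}")
    return errors


def summarize(lines, schema):
    """Validate newline-delimited JSON documents and tally the outcomes."""
    outcomes = Counter()
    error_counts = Counter()
    for line in lines:
        try:
            document = json.loads(line)
        except json.JSONDecodeError:
            outcomes["unparseable"] += 1
            continue
        errors = validate(document, schema)
        outcomes["invalid" if errors else "valid"] += 1
        error_counts.update(errors)
    return {
        "valid": outcomes["valid"],
        "invalid": outcomes["invalid"],
        "unparseable": outcomes["unparseable"],
        "errors": dict(error_counts),
    }


if __name__ == "__main__":
    sample = ['{"id": "a", "payload": {}}', '{"payload": {}}', "not json"]
    print(json.dumps(summarize(sample, SCHEMA), indent=2))
```

In a Spark setting the per-document `validate` call would typically run inside a map over the dataset, with the tallies combined in a reduce step; that mapping is an assumption about the design, not something the README states.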

Flask provides the frontend to the service. The PySpark application can be run on demand with an
included standalone Spark Docker image. Celery provides the internal task inter-op between Flask
and Spark.
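The hand-off between the frontend and the validation job follows a submit-then-poll pattern, sketched below. This is an illustration only: `concurrent.futures.ThreadPoolExecutor` stands in for Celery, `run_validation` is a placeholder for the Spark job, and every name and path here is hypothetical rather than taken from the project.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the Celery worker pool (assumption: a single worker is enough here).
executor = ThreadPoolExecutor(max_workers=1)

# Maps task ids to futures so that a later request can look up progress.
tasks = {}


def run_validation(schema_path, data_path):
    """Placeholder for the long-running Spark validation job."""
    return {"schema": schema_path, "data": data_path, "validated": True}


def submit(schema_path, data_path):
    """Queue a validation job and return an id the caller can poll with."""
    task_id = str(uuid.uuid4())
    tasks[task_id] = executor.submit(run_validation, schema_path, data_path)
    return task_id


def status(task_id):
    """Report the state of a previously submitted job."""
    future = tasks[task_id]
    if not future.done():
        return {"state": "pending"}
    error = future.exception()
    if error is not None:
        return {"state": "failed", "error": str(error)}
    return {"state": "done", "result": future.result()}


if __name__ == "__main__":
    task_id = submit("schema.json", "documents.ndjson")
    tasks[task_id].result(timeout=5)  # wait for completion in this demo only
    print(status(task_id))
```

Returning an id immediately keeps the HTTP request short while the validation runs elsewhere, which is the usual reason for placing a task queue between a web frontend and a batch job.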

The service is exposed through the REST API using a Dockerflow-compatible configuration.
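Dockerflow asks a service to answer on `/__version__`, `/__heartbeat__`, and `/__lbheartbeat__`. The sketch below shows those three endpoints as a plain WSGI callable built on the standard library; it is not the project's Flask code, and `VERSION`, `check_backends`, and `app` are hypothetical names with placeholder values.

```python
import json

# Placeholder build metadata; Dockerflow expects this to be read from a version.json file.
VERSION = {
    "source": "https://github.com/acmiyaguchi/schema-validator",
    "version": "unknown",
    "commit": "unknown",
}


def check_backends():
    """Hypothetical dependency check, e.g. whether the task queue is reachable."""
    return True


def app(environ, start_response):
    """Minimal WSGI application answering the Dockerflow endpoints."""
    path = environ.get("PATH_INFO", "")
    if path == "/__version__":
        status, body = "200 OK", VERSION
    elif path == "/__lbheartbeat__":
        # The load balancer heartbeat only confirms that the process is serving requests.
        status, body = "200 OK", {}
    elif path == "/__heartbeat__":
        healthy = check_backends()
        status = "200 OK" if healthy else "500 Internal Server Error"
        body = {"status": "ok" if healthy else "error"}
    else:
        status, body = "404 Not Found", {"error": "not found"}
    payload = json.dumps(body).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(payload))),
    ]
    start_response(status, headers)
    return [payload]
```

For a quick local check, the callable can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and queried with `curl localhost:8000/__lbheartbeat__`.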

# Roadmap

The following features are planned for the v2 release.

* REST interface into the `mozilla-pipeline-schemas` repository
* Support for the `s3://` protocol

# Resources

* [Dockerflow](https://github.com/mozilla-services/Dockerflow)
* [Flask quickstart](http://flask.pocoo.org/docs/0.12/quickstart/#a-minimal-application)
* [Docker Compose](https://docs.docker.com/compose/gettingstarted/#step-1-setup)
* [Using Celery with Flask](https://blog.miguelgrinberg.com/post/using-celery-with-flask)
* [Spark Standalone](https://spark.apache.org/docs/latest/spark-standalone.html)
