https://github.com/keepsafe/content-validator
Content validator looks at text content and preforms different validation tasks.
https://github.com/keepsafe/content-validator
Last synced: 4 months ago
JSON representation
Content validator looks at text content and preforms different validation tasks.
- Host: GitHub
- URL: https://github.com/keepsafe/content-validator
- Owner: KeepSafe
- License: apache-2.0
- Created: 2014-10-14T16:52:44.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2024-12-11T15:39:49.000Z (over 1 year ago)
- Last Synced: 2025-05-09T01:49:40.151Z (about 1 year ago)
- Language: Python
- Size: 180 KB
- Stars: 1
- Watchers: 16
- Forks: 2
- Open Issues: 10
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG
- License: LICENSE
- Authors: AUTHORS
Awesome Lists containing this project
README
content-validator [](https://travis-ci.org/KeepSafe/content-validator)
=================
Content validator looks at text content and preforms different validation tasks.
## Requirements
1. Python 3.6.6+
## Installation
`make install`
## Dev Setup
`make env`
`make dev`
## Usage
Generally it's easiest to write a separate test for each validation case. The simplest example:
```
f = files('src/**/*.txt')
parser = create_parser(Filetype.txt)
reporter = ConsoleReporter()
check = urls(Filetype.txt)
result = validator.validate(checks=[check], files=f, parser=parser, reporter=reporter)
self.assertEqual([], result)
```
### Files
The `files` function takes a pattern and resolves it to file paths. You can pass any glob like pattern but in addition you can include the parameters. The parameter is used when you want to compare content. Let's say you have translations in English and German. You have two files `src/en/myfile.txt` and `src/de/myfile.txt` for English and German. The pattern might look something like this `src/{lang}/myfile.txt`. In addition you need to say which file will be the base of the comparison, in case you have more then 2 files. To do that you need to pass all parameters you are using in the pattern as named parameters for the `files` call. Finally the call should look something like this:
`files('src/{lang}/*.txt', lang='en')`
In case you are not doing any comparison checks you can use a usual glob like pattern `files('src/**/*.txt')`
### Parsers
When the file is first read it the data you want to validate needs to be extracted from it. The simplest example is a text file. Nothing is done here except reading the file content. The more complex example is when, for eg., you have embedded markdown in an xml tag. To extract the data you should create a chain of parsers. First you want to extract all tags from the xml. Second you want to parse the content of the tags from markdown to html. Here is an example how to do that:
`chain_parsers([Filetype.xml, Filetype.md], query='//strings')`
The xml parser takes additional parameter `query` used to extract the tags. You can pass it to `create_parser` in the same way:
`create_parser(Filetype.xml, query='//strings')`
Available parsers types:
* `Filetype.txt` - simply reads the file
* `Filetype.md` - converts markdown to
* `Filetype.xml` - extracts text from xml and concatenates
* `Filetype.csv` - puts every value on a separate line
### Reporters
Shows the result of the validation. There are 2 reporters available:
* `HtmlReporter` - creates an error file for every error
* `ConsoleReporter` - print the error to the console
### Checks
Checks perform validation on the content. Wether it's url or structure or anything else. If the content in not valid the check will return an error which later can be passed to a reporter.
Available checks:
* `urls(filetype, skip_images=False)` - validates if the url is accessible
* `markdown()` - validates markdown structure by comparing it with the base
## Example
A more detailed example looks like this:
```
class TestEmail(TestCase):
def test_email(self):
f = files('src/{lang}/*.xml', lang='en')
parser = create_parser(Filetype.xml, query='.//string')
reporter = HtmlReporter()
md = markdown()
result = validator.validate(checks=[md], files=f, parser=parser, reporter=reporter)
self.assertEqual({}, v.validate())
def test_urls(self):
f = files('src/{lang}/*.xml', lang='en')
parser = chain_parsers([Filetype.xml, Filetype.md], query='.//string')
reporter = ConsoleReporter()
check = urls(Filetype.html, skip_images=True)
result = Validator(checks=[check], files=f, parser=parser, reporter=reporter)
self.assertEqual({}, v.validate())
```