https://github.com/datopian/ckan-ng-harvester-core


[![Build Status](https://travis-ci.org/datopian/ckan-ng-harvester-core.svg?branch=master)](https://travis-ci.org/datopian/ckan-ng-harvester-core)

# Harvester Next Generation for CKAN

## Install

```
pip install ckan-harvesters
```

### Use data.json sources

```python
from harvesters.datajson.harvester import DataJSON
dj = DataJSON()
dj.url = 'https://data.iowa.gov/data.json'
try:
    dj.fetch()
except Exception as e:
    print(e)

valid = dj.validate(validator_schema='non-federal-v1.1')
print(dj.errors)
# ['Error validating JsonSchema: \'bureauCode\' is a required property ...

# full dict with the source
print(dj.as_json())
"""
{
    '@context': 'https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld',
    '@id': 'https://data.iowa.gov/data.json',
    '@type': 'dcat:Catalog',
    'conformsTo': 'https://project-open-data.cio.gov/v1.1/schema',
    'describedBy': 'https://project-open-data.cio.gov/v1.1/schema/catalog.json',
    'dataset': [{
        'accessLevel': 'public',
        'landingPage': 'https://data.iowa.gov/d/23jk-3uwr',
        'issued': '2017-01-30',
        '@type': 'dcat:Dataset',

        ...
"""
# just headers
print(dj.headers)

"""
{
    '@context': 'https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld',
    '@id': 'https://data.iowa.gov/data.json',
    '@type': 'dcat:Catalog',
    'conformsTo': 'https://project-open-data.cio.gov/v1.1/schema',
    'describedBy': 'https://project-open-data.cio.gov/v1.1/schema/catalog.json',
}
"""

# titles of every dataset in the source
for dataset in dj.datasets:
    print(dataset['title'])

"""
Impaired Streams 2014
2009-2010 Iowa Public School District Boundaries
2015 - 2016 Iowa Public School District Boundaries
Impaired Streams 2010
Impaired Lakes 2014
2007-2008 Iowa Public School District Boundaries
Impaired Streams 2012
2011-2012 Iowa Public School District Boundaries
Active and Completed Watershed Projects - IDALS
2012-2013 Iowa Public School District Boundaries
2010-2011 Iowa Public School District Boundaries
2016-2017 Iowa Public School District Boundaries
2014 - 2015 Iowa Public School District Boundaries
Impaired Lakes 2008
2008-2009 Iowa Public School District Boundaries
2013-2014 Iowa Public School District Boundaries
Impaired Lakes 2010
Impaired Lakes 2012
Impaired Streams 2008
"""

```
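The split between `dj.headers` and `dj.datasets`, and the `'bureauCode' is a required property` error, can be pictured with plain dictionaries. The following is only a sketch under assumptions: `split_catalog` and `missing_required` are hypothetical helpers, not part of this library, the catalog is a trimmed-down stand-in for a real data.json file, the `bureauCode` value is a placeholder, and the list of required keys is an illustrative subset rather than the actual `non-federal-v1.1` schema.

```python
# Hypothetical, trimmed-down catalog shaped like the output shown above.
catalog = {
    "@context": "https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld",
    "@id": "https://data.iowa.gov/data.json",
    "@type": "dcat:Catalog",
    "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
    "dataset": [
        {"title": "Impaired Streams 2014", "accessLevel": "public"},
        # "000:00" is a placeholder value, not real data.
        {"title": "Impaired Lakes 2014", "accessLevel": "public",
         "bureauCode": ["000:00"]},
    ],
}


def split_catalog(catalog):
    """Return (headers, datasets): all top-level keys except 'dataset', and the dataset list."""
    headers = {key: value for key, value in catalog.items() if key != "dataset"}
    datasets = list(catalog.get("dataset", []))
    return headers, datasets


def missing_required(dataset, required=("title", "accessLevel", "bureauCode")):
    """List the required keys that a dataset lacks (illustrative subset only)."""
    return [key for key in required if key not in dataset]


headers, datasets = split_catalog(catalog)
print(sorted(headers))
for dataset in datasets:
    # An empty list means the dataset has every key in the illustrative subset.
    print(dataset["title"], missing_required(dataset))
```

The real validator works against a full JSON Schema, so it reports many more kinds of problems than a missing key.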

### Use CSW sources

```python
from harvesters.csw.harvester import CSWSource
csw = CSWSource(url='http://data.nconemap.com/geoportal/csw?Request=GetCapabilities&Service=CSW&Version=2.0.2')

csw.fetch()
csw_info = csw.as_json()
print('CSW title: {}'.format(csw_info['identification']['title']))
# CSW title: ArcGIS Server Geoportal Extension 10 - OGC CSW 2.0.2 ISO AP
```
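The URL in the example above carries three query parameters (`Request`, `Service`, `Version`) that together form a CSW GetCapabilities request. As a minimal sketch using only the standard library, such a URL can be assembled from a bare endpoint. `capabilities_url` is a hypothetical helper, not part of this library, and whether `CSWSource` requires the query string or accepts the bare endpoint is not stated here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def capabilities_url(endpoint, version="2.0.2"):
    """Attach CSW GetCapabilities query parameters to a bare endpoint URL."""
    query = urlencode(
        {"Request": "GetCapabilities", "Service": "CSW", "Version": version}
    )
    scheme, netloc, path, _, _ = urlsplit(endpoint)
    # Any query string or fragment already on the endpoint is replaced.
    return urlunsplit((scheme, netloc, path, query, ""))


url = capabilities_url("http://data.nconemap.com/geoportal/csw")
print(url)
# Parse the query back into a dict to inspect the individual parameters.
print(parse_qs(urlsplit(url).query))
```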

## Development

To set up a development environment, clone the repository and install the dependencies in a virtualenv:

```
pip install -r requirements.txt
```

This installs the library in development mode, along with the other libraries needed to run the tests.

## Test

To run the test suite with pytest:

```
pytest
```

We use [pytest-vcr](https://pytest-vcr.readthedocs.io/en/latest/), based on the wonderful [VCRpy](https://vcrpy.readthedocs.io/en/latest/), to mock HTTP requests. This way the tests do not need to hit the real internet (which is fragile and slow), because a mocked version of each response the tests need is stored in VCR's *cassettes* format.

To update these *cassettes*, re-record them all:

```
pytest --vcr-record=all
```

To hit the real internet without using the mocks, disable the plugin:

```
pytest --disable-vcr
```

To replay only from the existing *cassettes*, without recording anything new:

```
pytest --vcr-record=none
```
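The idea behind the `all` and `none` record modes can be illustrated with a toy record/replay store. This is a sketch of the concept only: the `Cassette` class, its JSON file layout, and `fake_fetch` are hypothetical and are not how VCR.py or pytest-vcr store or match requests.

```python
import json
import tempfile
from pathlib import Path


class Cassette:
    """Toy record/replay store keyed by URL (illustration only, not VCR.py's format)."""

    def __init__(self, path, record_mode="none"):
        self.path = Path(path)
        self.record_mode = record_mode
        # Load previously recorded responses, if the file already exists.
        self.responses = json.loads(self.path.read_text()) if self.path.exists() else {}

    def get(self, url, fetch):
        """Return the response for url, calling fetch(url) only when recording."""
        if self.record_mode == "all":
            # "all": always perform the request and overwrite the recording.
            self.responses[url] = fetch(url)
            self.path.write_text(json.dumps(self.responses))
        elif url not in self.responses:
            # "none": never perform the request; an unrecorded URL is an error.
            raise LookupError("no recorded response for {}".format(url))
        return self.responses[url]


calls = []


def fake_fetch(url):
    """Stand-in for a real HTTP request; records each call it receives."""
    calls.append(url)
    return {"status": 200, "body": "hello"}


cassette_file = Path(tempfile.mkdtemp()) / "example.json"
url = "https://example.org/data.json"

recorded = Cassette(cassette_file, record_mode="all").get(url, fake_fetch)
replayed = Cassette(cassette_file, record_mode="none").get(url, fake_fetch)
print(recorded == replayed, len(calls))
```

The second call is served from the file, so `fake_fetch` runs only once. This is why replayed tests stay fast and do not depend on the remote server being available.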

To run the tests that do not need a CKAN instance:

```
python -m pytest tests

================ test session starts =============
platform linux -- Python 3.6.8, pytest-5.2.0, py-1.8.0, pluggy-0.13.0
rootdir: /home/hudson/dev/datopian/ckan-ng-harvester-core
plugins: vcr-1.0.2
collected 17 items

tests/test_csw_dataset_adapter.py .... [ 23%]
tests/test_data_json.py ....... [ 64%]
tests/test_datajson_dataset_adapter.py .....[100%]

=============== 17 passed in 17.52s ==============
```

To run the tests that need a CKAN instance, first copy the settings.py file to local_settings.py and fill in the required values.
You can use a local or a remote CKAN instance.

```
python -m pytest tests_with_ckan/test_harvest.py
```