{"id":29201081,"url":"https://github.com/datopian/ckan-ng-harvester-core","last_synced_at":"2026-02-16T16:34:29.878Z","repository":{"id":66733949,"uuid":"227404415","full_name":"datopian/ckan-ng-harvester-core","owner":"datopian","description":null,"archived":false,"fork":false,"pushed_at":"2020-02-21T15:44:38.000Z","size":3989,"stargazers_count":3,"open_issues_count":1,"forks_count":1,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-09-01T20:50:46.312Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datopian.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-12-11T15:56:50.000Z","updated_at":"2023-03-13T06:41:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"4d40852b-8929-4f05-bc6c-1722ae7321e0","html_url":"https://github.com/datopian/ckan-ng-harvester-core","commit_stats":null,"previous_names":[],"tags_count":23,"template":false,"template_full_name":null,"purl":"pkg:github/datopian/ckan-ng-harvester-core","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datopian%2Fckan-ng-harvester-core","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datopian%2Fckan-ng-harvester-core/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datopian%2Fckan-ng-harvester-core/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datopian%2Fckan-ng-harvester-core/manifests","owner_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/owners/datopian","download_url":"https://codeload.github.com/datopian/ckan-ng-harvester-core/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datopian%2Fckan-ng-harvester-core/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29513261,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-16T09:05:14.864Z","status":"ssl_error","status_checked_at":"2026-02-16T08:55:59.364Z","response_time":115,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-02T11:07:29.903Z","updated_at":"2026-02-16T16:34:29.871Z","avatar_url":"https://github.com/datopian.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/datopian/ckan-ng-harvester-core.svg?branch=master)](https://travis-ci.org/datopian/ckan-ng-harvester-core)\n\n# Harvester Next Generation for CKAN\n\n## Install\n\n```\npip install ckan-harvesters\n```\n\n\n### Use data.json sources\n\n```python\nfrom harvesters.datajson.harvester import DataJSON\ndj = DataJSON()\ndj.url = 'https://data.iowa.gov/data.json'\ntry:\n\tdj.fetch()\nexcept Exception as e:\n\tprint(e)\n\nvalid = dj.validate(validator_schema='non-federal-v1.1')\nprint(dj.errors)\n# ['Error validating JsonSchema: \\'bureauCode\\' is a required property ...\n\n# full dict with the 
source\nprint(dj.as_json())\n\"\"\"\n{\n\t'@context': 'https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld',\n\t'@id': 'https://data.iowa.gov/data.json',\n\t'@type': 'dcat:Catalog',\n\t'conformsTo': 'https://project-open-data.cio.gov/v1.1/schema',\n\t'describedBy': 'https://project-open-data.cio.gov/v1.1/schema/catalog.json',\n\t'dataset': [{\n\t\t'accessLevel': 'public',\n\t\t'landingPage': 'https://data.iowa.gov/d/23jk-3uwr',\n\t\t'issued': '2017-01-30',\n\t\t'@type': 'dcat:Dataset',\n\n        ... \n\"\"\"\n# just headers\nprint(dj.headers)\n\n\"\"\"\n{\n'@context': 'https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld',\n'@id': 'https://data.iowa.gov/data.json',\n'@type': 'dcat:Catalog',\n'conformsTo': 'https://project-open-data.cio.gov/v1.1/schema',\n'describedBy': 'https://project-open-data.cio.gov/v1.1/schema/catalog.json',\n}\n\"\"\"\n\nfor dataset in dj.datasets:\n    print(dataset['title'])\n\n\"\"\"\nImpaired Streams 2014\n2009-2010 Iowa Public School District Boundaries\n2015 - 2016 Iowa Public School District Boundaries\nImpaired Streams 2010\nImpaired Lakes 2014\n2007-2008 Iowa Public School District Boundaries\nImpaired Streams 2012\n2011-2012 Iowa Public School District Boundaries\nActive and Completed Watershed Projects - IDALS\n2012-2013 Iowa Public School District Boundaries\n2010-2011 Iowa Public School District Boundaries\n2016-2017 Iowa Public School District Boundaries\n2014 - 2015 Iowa Public School District Boundaries\nImpaired Lakes 2008\n2008-2009 Iowa Public School District Boundaries\n2013-2014 Iowa Public School District Boundaries\nImpaired Lakes 2010\nImpaired Lakes 2012\nImpaired Streams 2008\n\"\"\"\n```\n\n\n### Use CSW sources\n\n```python\nfrom harvesters.csw.harvester import CSWSource\ncsw = CSWSource(url='http://data.nconemap.com/geoportal/csw?Request=GetCapabilities\u0026Service=CSW\u0026Version=2.0.2')\n\ncsw.fetch()\ncsw_info = csw.as_json()\nprint('CSW title: {}'.format(csw_info['identification']['title']))\n # CSW title: 
ArcGIS Server Geoportal Extension 10 - OGC CSW 2.0.2 ISO AP\n```\n\n## Development\n\nTo set up a development environment, clone the repository and install the dependencies in a virtualenv:\n\n```\npip install -r requirements.txt\n```\n\nThis installs the library in development mode, along with the libraries needed for tests. \n\n## Test\n\nTo run the test suite with pytest:\n\n```\npytest\n```\n\nWe use [pytest-vcr](https://pytest-vcr.readthedocs.io/en/latest/), based on the wonderful [VCRpy](https://vcrpy.readthedocs.io/en/latest/), to mock HTTP requests. This way the tests don't need to hit the real internet (which is fragile and slow), because a mocked version of each response the tests need is stored in VCR's *cassettes* format. \n\nTo update these *cassettes*, run: \n\n```\npytest --vcr-record=all\n```\n\nTo actually hit the internet without using mocks, disable the plugin: \n\n```\npytest --disable-vcr\n```\n\nTo read only from these *cassettes*, run: \n\n```\npytest --vcr-record=none\n```\n\nTests without a CKAN instance:\n\n```\npython -m pytest tests\n\n================ test session starts =============\nplatform linux -- Python 3.6.8, pytest-5.2.0, py-1.8.0, pluggy-0.13.0\nrootdir: /home/hudson/dev/datopian/ckan-ng-harvester-core\nplugins: vcr-1.0.2\ncollected 17 items\n\ntests/test_csw_dataset_adapter.py ....      [ 23%]\ntests/test_data_json.py .......             [ 64%]\ntests/test_datajson_dataset_adapter.py .....[100%]\n\n=============== 17 passed in 17.52s ==============\n```\n\nTests with a CKAN instance.  \nYou will need to copy the settings.py file to local_settings.py and fill in the required values.  \nYou can use a local or remote CKAN instance.  
\n\n\n```\npython -m pytest tests_with_ckan/test_harvest.py\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatopian%2Fckan-ng-harvester-core","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatopian%2Fckan-ng-harvester-core","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatopian%2Fckan-ng-harvester-core/lists"}