https://github.com/edsu/dewey-crawler
simplistic crawler and serializer for linked data at dewey.info
- Host: GitHub
- URL: https://github.com/edsu/dewey-crawler
- Owner: edsu
- Created: 2010-06-18T00:17:05.000Z (almost 16 years ago)
- Default Branch: master
- Last Pushed: 2010-06-18T13:49:26.000Z (almost 16 years ago)
- Last Synced: 2025-05-07T12:12:25.035Z (about 1 year ago)
- Language: Python
- Size: 1.46 MB
- Stars: 13
- Watchers: 3
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README
README
dewey-crawler is a simplistic, single-threaded, possibly daft, crawler for
Dewey Decimal Classification Summaries at http://dewey.info from the folks
at OCLC. The idea is to be able to pull down the summaries to make it
easier to reference the resources in your linked data application.
More information about the data at http://dewey.info can be found at:
http://www.worldcat.org/devnet/wiki/DeweyInfoTechOverview
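The single-threaded crawl described above can be pictured with a small
sketch. This is not the code in crawl.py; it is a minimal illustration that
assumes a breadth-first walk over same-site links. The names crawl,
fetch_links, max_pages and FAKE_SITE are hypothetical, and the link table
stands in for real network fetches so the example runs offline.

```python
from collections import deque


def crawl(start_url, fetch_links, max_pages=100):
    """Visit pages breadth-first, one at a time, and return them in visit order.

    fetch_links(url) is a caller-supplied function returning the URLs a page
    links to; a real crawler would fetch and parse RDF at this point.
    """
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            # Stay under the start URL and skip anything already queued.
            if link.startswith(start_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Stand-in link table (hypothetical) so the sketch needs no network access.
FAKE_SITE = {
    "http://example.org/": ["http://example.org/a", "http://example.org/b"],
    "http://example.org/a": ["http://example.org/b", "http://other.example/x"],
    "http://example.org/b": ["http://example.org/"],
}

pages = crawl("http://example.org/", lambda url: FAKE_SITE.get(url, []))
print(pages)
```

Keeping a seen set alongside the queue is what stops the loop from
revisiting pages that link back to each other.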
After a crawl you'll have an rdflib Berkeley DB triple store on disk. You can
then run dump.py to generate dewey.rdf, dewey.ttl and dewey.json.
Usage:
./crawl.py
./dump.py
Dependencies:
rdflib 3.0
License:
This code is in the Public Domain.
http://creativecommons.org/licenses/publicdomain/
The data is governed by OCLC's use of the Creative Commons
Attribution-Noncommercial-No Derivative Works 3.0 Unported license:
http://creativecommons.org/licenses/by-nc-nd/3.0/
Improvements welcome at:
http://github.com/edsu/dewey-crawler
Comments, Questions, Complaints:
Ed Summers