Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/linki/thief
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/linki/thief
- Owner: linki
- Created: 2010-05-07T16:34:18.000Z (over 14 years ago)
- Default Branch: master
- Last Pushed: 2010-08-01T20:01:02.000Z (over 14 years ago)
- Last Synced: 2024-10-12T18:53:49.724Z (3 months ago)
- Language: Ruby
- Size: 8.82 MB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.textile
README
h1. Thief
h2. Installation
gem install bundler
bundle install
rake thief:create_tables
h2. Usage
bin/thief
bin/server
Then browse to http://localhost:4567
bin/console
Use the interactive console
h2. Internals
Thief.sources << Thief::Source::SomeSource.new
Thief.fetch
Goes through all assigned sources (which must be located in /thief/sources/*.rb) and calls their fetch method.
Each Source has its own table for "Person" objects (dapi_people, dapi_wikipedia) and stores all of its information there.
Thief.integrate
Goes through all assigned sources (same as above) and calls their integrate method.
Each Integrator converts its own Person schema to the integrated schema (Thief::DAPI::Person -> Thief::Person).
Thief::Person.all.each { |person| puts person.name }
Prints out the stored and integrated people.
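The fetch and integrate steps can be pictured with a small stand-alone sketch. ThiefSketch and FakeSource are made-up names for illustration; this is an assumption about the general shape of the registry, not the project's actual code.

```ruby
# Hypothetical stand-in for the Thief module; not the project's real code.
module ThiefSketch
  # Minimal source that records which pipeline steps were run on it.
  class FakeSource
    attr_reader :calls

    def initialize
      @calls = []
    end

    def fetch
      @calls << :fetch
    end

    def integrate
      @calls << :integrate
    end
  end

  class << self
    # Registry of assigned sources, filled with `sources << source`.
    def sources
      @sources ||= []
    end

    # Calls fetch on every assigned source, in assignment order.
    def fetch
      sources.each(&:fetch)
    end

    # Calls integrate on every assigned source, in assignment order.
    def integrate
      sources.each(&:integrate)
    end
  end
end

source = ThiefSketch::FakeSource.new
ThiefSketch.sources << source
ThiefSketch.fetch
ThiefSketch.integrate
p source.calls # prints [:fetch, :integrate]
```

Keeping the registry as a plain array is what lets a new source be assigned with a single `<<`.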
h2. Wikipedia
A special download mechanism in Thief::Wikipedia::ETL fetches the text file containing the person data if that file doesn't exist yet.
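A minimal sketch of the "download only if missing" idea, assuming a hypothetical helper called ensure_data_file and a made-up file name. The real download and unzip steps are replaced by a block that writes a placeholder file, so nothing here reflects the actual Thief::Wikipedia::ETL code.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper, not Thief::Wikipedia::ETL itself.
# Yields the target path so the caller can download, unzip and move the
# file into place, but only when nothing exists at that path yet.
def ensure_data_file(path)
  unless File.exist?(path)
    FileUtils.mkdir_p(File.dirname(path))
    yield path
  end
  path
end

download_count = 0

Dir.mktmpdir do |dir|
  target = File.join(dir, 'wikipedia', 'people.txt') # made-up file name

  2.times do
    ensure_data_file(target) do |destination|
      download_count += 1 # stands in for the real download and unzip
      File.write(destination, "placeholder person data\n")
    end
  end
end

puts download_count # prints 1
```

The second call finds the file already in place, so the block runs only once.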
Thief::Wikipedia::ETL.download_file!
This method is called by the fetch method if the file containing the person data doesn't exist. It downloads the zip file, unzips it, then renames and moves the contained text file to the wikipedia directory.
h2. Extending
You can copy dapi.rb and the dapi folder to create a new source with its own ETL, Integrator and Person objects.
By default, a source looks for its etl and integrator classes under its own namespace (for example Thief::YourLibrary::ETL).
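One way such a namespace-based default could work is a constant lookup by name. The sketch below uses made-up names (SketchLibrary, BaseSource, SketchSource) and is an assumption about the convention, not the project's implementation.

```ruby
# Hypothetical names throughout; this mirrors the convention described
# above and is not taken from the project's source.
module SketchLibrary
  class ETL; end
  class Integrator; end
end

class BaseSource
  # Namespace module that holds this source's ETL and Integrator classes.
  def namespace
    raise NotImplementedError, 'subclasses must name their namespace'
  end

  # Default: look the ETL class up by name under the source's namespace.
  def etl
    namespace.const_get(:ETL).new
  end

  # Default: same lookup for the Integrator class.
  def integrator
    namespace.const_get(:Integrator).new
  end
end

class SketchSource < BaseSource
  def namespace
    SketchLibrary
  end
end

sketch_source = SketchSource.new
p sketch_source.etl.class        # prints SketchLibrary::ETL
p sketch_source.integrator.class # prints SketchLibrary::Integrator
```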
You can change that by overriding the etl and integrator methods.
require 'somewhere/over/the/rainbow'
module Thief
  module Sources
    class YourAwesomeSource
      def etl
        ::OneETLToRuleThemAll.new
      end
    end
  end
end
h2. Another way to start the server
Start the app on any Rack-compatible web server using the config.ru file. The examples run in production mode.
Thin:
thin --rackup config.ru --environment production start
or Unicorn:
unicorn --env production
or WEBrick through rackup:
THIEF_ENV=production rackup --server webrick config.ru
or with shotgun:
shotgun --env production
and so on.
h2. Special
Thief::ETL now has inheritable mechanisms to download and extract files as well as store them in their specific temporary directories.
Thief::Integrator now has inheritable mechanisms for defining a mapping from a source-specific schema to the global schema (experimental).
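To illustrate what an inheritable schema mapping might look like, here is a small sketch. The class names, the map method and the field names (full_name, born, birth_year) are all hypothetical; the real Thief::Integrator API is experimental and may look quite different.

```ruby
# Hypothetical base class; the real Thief::Integrator API may differ.
class SketchIntegrator
  class << self
    # Declares that `source_field` in the source schema feeds the `to`
    # field in the global schema. An optional block transforms the value.
    def map(source_field, to:, &transform)
      mappings[to] = [source_field, transform]
    end

    # Each subclass keeps its own mapping table.
    def mappings
      @mappings ||= {}
    end
  end

  # Builds a global-schema hash from a source-schema hash.
  def integrate(record)
    self.class.mappings.each_with_object({}) do |(target, (source, transform)), person|
      value = record[source]
      person[target] = transform ? transform.call(value) : value
    end
  end
end

# Made-up source schema with full_name and born fields.
class DapiSketchIntegrator < SketchIntegrator
  map :full_name, to: :name
  map(:born, to: :birth_year) { |value| Integer(value) }
end

integrated = DapiSketchIntegrator.new.integrate(full_name: 'Ada Lovelace', born: '1815')
puts integrated[:name]       # prints Ada Lovelace
puts integrated[:birth_year] # prints 1815
```

Because the mapping table lives on each subclass, every source can declare its own fields without affecting the others.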