{"id":26294600,"url":"https://github.com/afiore/extraloop-redis-storage","last_synced_at":"2025-03-15T03:32:04.089Z","repository":{"id":2503591,"uuid":"3478493","full_name":"afiore/extraloop-redis-storage","owner":"afiore","description":"Redis+Ohm based persistence module for Extraloop","archived":false,"fork":false,"pushed_at":"2012-04-08T18:25:02.000Z","size":196,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-22T07:04:20.402Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/afiore.png","metadata":{"files":{"readme":"README.rdoc","changelog":"History.txt","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-02-18T14:03:40.000Z","updated_at":"2014-06-23T14:01:05.000Z","dependencies_parsed_at":"2022-09-09T07:50:36.552Z","dependency_job_id":null,"html_url":"https://github.com/afiore/extraloop-redis-storage","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/afiore%2Fextraloop-redis-storage","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/afiore%2Fextraloop-redis-storage/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/afiore%2Fextraloop-redis-storage/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/afiore%2Fextraloop-redis-storage/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/afiore","download_url":"https://codeload.github.com/afiore/extraloop-redis-storage/tar.gz/refs/heads/master","ho
st":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243681024,"owners_count":20330152,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-15T03:31:09.173Z","updated_at":"2025-03-15T03:32:04.080Z","avatar_url":"https://github.com/afiore.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"= Extraloop Redis Storage\n\n== Description\n\nPersistence layer for the {ExtraLoop}[https://github.com/afiore/extraloop] data extraction toolkit.\nThis module is implemented as a wrapper around {Ohm}[http://ohm.keyvalue.org], an object-hash mapping library that \nmakes it easy to store structured data in Redis. 
It includes a convenient command line tool that lets you \nlist, filter, and delete harvested datasets, as well as export them to local files or remote data stores (e.g. Google Fusion Tables).\n\n== Installation \n\n gem install extraloop-redis-storage\n\n== Usage\n\nExtraLoop's Redis storage module decorates \u003ccode\u003eExtraLoop::ScraperBase\u003c/code\u003e and \u003ccode\u003eExtraLoop::IterativeScraper\u003c/code\u003e instances \nwith the +set_storage+ method: a helper method that lets you specify how the scraped data should be stored.\n\n require \"extraloop/redis-storage\"\n\n class AmazonReview \u003c ExtraLoop::Storage::Record\n   attribute :title\n   attribute :rank\n   attribute :date\n\n   def validate\n     assert (0..5).include?(rank.to_i), \"Rank not in range\"\n   end\n end\n\n scraper = AmazonReviewScraper.new(\"0262560992\")\n   .set_storage(AmazonReview, \"Amazon reviews of 'The Little Schemer'\")\n   .run()\n\nAt each scraper run, the ExtraLoop storage module internally instantiates a \nsession (see \u003ccode\u003eExtraLoop::Storage::ScrapingSession\u003c/code\u003e) and associates the extracted records with it.\nThe +AmazonReview+ records just created can now be accessed by calling the \u003ccode\u003e#records\u003c/code\u003e method on the scraper's session object.\n\n reviews = scraper.session.records\n\n=== #set_storage \n\nThe +set_storage+ method accepts the following arguments:\n\n* _model_ A Ruby constant or a symbol specifying the model to be used for storing the extracted data. If a symbol is passed, the storage module assumes that no such model exists and dynamically generates one by subclassing \u003ccode\u003eExtraLoop::Storage::Record\u003c/code\u003e. 
\n* _session_title_ A human-readable title for the extracted dataset (optional).\n\n== Command line interface \n\nOnce installed, the gem also adds the +extraloop+ executable to your system path: a command line interface to the datasets harvested through ExtraLoop.\nA list of datasets can be obtained by running:\n\n extraloop datastore list\n\nThis will generate a table like the following one:\n\n  id | title                              | model           | records\n --------------------------------------------------------------------\n  48 | 1330106699 GoogleNewsStory Dataset | GoogleNewsStory | 110    \n  49 | 1330106948 AmazonReview Dataset    | AmazonReview    | 0      \n  51 | 1330107087 GoogleNewsStory Dataset | GoogleNewsStory | 110    \n  52 | 1330111630 AmazonReview Dataset    | AmazonReview    | 10     \n\n\nDatasets can be removed using the +delete+ subcommand:\n\n extraloop datastore delete [id]\n\nwhere +id+ is either a single scraping session id or a session id range (e.g. 48..52).\n\nFrom the Redis datastore, ExtraLoop datasets can be exported to disk as CSV, JSON, or YAML documents:\n\n  extraloop datastore export 51..52 -f csv\n\nSimilarly, stored datasets can be uploaded to a remote datastore:\n\n  extraloop datastore push 51..48 fusion_tables -c google_username:password\n\nWhile Google's Fusion Tables is currently the only remote datastore implemented, support for pushing datasets to other remote datastores (e.g. {CouchDB}[http://couchdb.apache.org/],  {CartoDB}[http://cartodb.com], and {CKAN Webstore}[http://wiki.ckan.org/Webstore]) will be added soon.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fafiore%2Fextraloop-redis-storage","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fafiore%2Fextraloop-redis-storage","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fafiore%2Fextraloop-redis-storage/lists"}