{"id":19900004,"url":"https://github.com/scrapy-plugins/scrapy-bigml","last_synced_at":"2025-05-02T22:32:06.762Z","repository":{"id":92875582,"uuid":"46078671","full_name":"scrapy-plugins/scrapy-bigml","owner":"scrapy-plugins","description":"Scrapy pipeline for writing items to BigML datasets","archived":false,"fork":false,"pushed_at":"2015-11-17T02:34:01.000Z","size":7,"stargazers_count":4,"open_issues_count":0,"forks_count":3,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-07T07:52:36.690Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/scrapy-plugins.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-11-12T20:24:15.000Z","updated_at":"2020-01-23T19:29:46.000Z","dependencies_parsed_at":"2023-03-13T17:24:41.173Z","dependency_job_id":null,"html_url":"https://github.com/scrapy-plugins/scrapy-bigml","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-bigml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-bigml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-bigml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-bigml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/scrapy-plugins","download_url":"https://codeload.github.com/scrapy-plugins/scrapy-bigml/tar.gz/refs/heads/master","host":
{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252116449,"owners_count":21697381,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T20:10:47.506Z","updated_at":"2025-05-02T22:32:05.468Z","avatar_url":"https://github.com/scrapy-plugins.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"============\nscrapy-bigml\n============\n\nscrapy-bigml facilitates creating `BigML \u003chttps://bigml.com/\u003e`_ sources and\ndatasets from `Scrapy \u003chttp://scrapy.org\u003e`_ crawls. It can be used either as a\nfeed storage or as a pipeline.\n\nBigML configuration\n===================\n\nCredentials\n-----------\n\nFor both usage methods (feed storage or pipeline), you need to supply your\nBigML credentials. 
You can do this either by supplying them as environment\nvariables::\n\n    # in shell\n    export BIGML_USERNAME=your_username\n    export BIGML_API_KEY=your_api_key\n\nOr by supplying them as Scrapy settings::\n\n    BIGML_USERNAME = 'your_username'\n    BIGML_API_KEY = 'your_api_key'\n\nIf you use scrapy-bigml as a feed storage, you can also provide them by adding\nthem to your feed URI::\n\n    FEED_URI = 'bigml://your_username:your_api_key@your_source_name'\n\nDevelopment mode\n----------------\n\nDuring development, you probably want to enable BigML's dev mode::\n\n    BIGML_DEVMODE = True\n\nUsage as feed storage\n=====================\n\nscrapy-bigml can be used as a storage backend on top of Scrapy's `feed exports\n\u003chttp://doc.scrapy.org/en/stable/topics/feed-exports.html\u003e`_. To use it, adjust\nyour Scrapy settings by setting the feed format to either ``csv`` (preferred)\nor ``json``, enabling the ``bigml`` feed storage and providing a corresponding\nfeed URI with the name you wish to use for your BigML source::\n\n    FEED_FORMAT = 'csv'\n    FEED_STORAGES = {'bigml': 'scrapy_bigml.BigMLFeedStorage'}\n    FEED_URI = 'bigml://your_source_name'\n\nA spider with example configuration can be found in\n``example_spider_feedstorage.py``.\n\nUsage as pipeline\n=================\n\nIf you wish to use scrapy-bigml as a pipeline, all you need to do is enable the\npipeline::\n\n    ITEM_PIPELINES = {'scrapy_bigml.BigMLPipeline': 500}\n\nYou should also set a name for your BigML source (if not, scrapy-bigml will\ndefault to \"Scrapy\")::\n\n    BIGML_SOURCE_NAME = 'Your source name'\n\nA spider with example configuration can be found 
in\n``example_spider_pipeline.py``.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapy-plugins%2Fscrapy-bigml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscrapy-plugins%2Fscrapy-bigml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapy-plugins%2Fscrapy-bigml/lists"}