{"id":32458886,"url":"https://github.com/anthonybloomer/nrscrapy","last_synced_at":"2025-10-26T11:01:53.643Z","repository":{"id":39721732,"uuid":"174412406","full_name":"AnthonyBloomer/nrscrapy","owner":"AnthonyBloomer","description":"Monitor Scrapy using the New Relic Python Agent API","archived":false,"fork":false,"pushed_at":"2022-11-10T21:35:56.000Z","size":46,"stargazers_count":1,"open_issues_count":16,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-15T00:13:14.271Z","etag":null,"topics":["newrelic","newrelic-insights","scrapy","scrapy-framework","scrapy-tutorial"],"latest_commit_sha":null,"homepage":"https://discuss.newrelic.com/t/relic-solution-monitoring-scrapy-using-the-python-agent-api/70772","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AnthonyBloomer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-07T20:06:31.000Z","updated_at":"2022-06-20T12:28:40.000Z","dependencies_parsed_at":"2022-09-21T05:07:54.620Z","dependency_job_id":null,"html_url":"https://github.com/AnthonyBloomer/nrscrapy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AnthonyBloomer/nrscrapy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnthonyBloomer%2Fnrscrapy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnthonyBloomer%2Fnrscrapy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnthonyBloomer%2Fnrscrapy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnthonyBloomer%2Fnrscrapy/manifests"
,"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AnthonyBloomer","download_url":"https://codeload.github.com/AnthonyBloomer/nrscrapy/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnthonyBloomer%2Fnrscrapy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281092757,"owners_count":26442440,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-26T02:00:06.575Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["newrelic","newrelic-insights","scrapy","scrapy-framework","scrapy-tutorial"],"created_at":"2025-10-26T11:00:24.756Z","updated_at":"2025-10-26T11:01:53.624Z","avatar_url":"https://github.com/AnthonyBloomer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Monitoring Scrapy using the New Relic Python Agent API\n\n[Scrapy](https://scrapy.org) is a web scraping framework that is not supported by the New Relic Python agent. We occasionally receive support tickets asking for help with instrumenting a Scrapy application. Our response to customers is that Scrapy is unsupported, but that they can use custom instrumentation to monitor their Scrapy application. 
Recently I worked on a project to learn more about the Scrapy framework and to demonstrate how a customer can monitor an unsupported framework such as Scrapy using the Python Agent API.\n\n## Basic Instrumentation using Agent Background Tasks\n\nA trivial example of how a customer can monitor their Scrapy application is to use the background task decorator with the New Relic Python agent API. Consider the following Spider that scrapes content from [Quotes to Scrape:](https://quotes.toscrape.com)\n\n``` python\nimport scrapy\n\n\nclass QuotesSpider(scrapy.Spider):\n    name = \"quotes\"\n    start_urls = [\n        'http://quotes.toscrape.com/page/1/',\n    ]\n\n    def parse(self, response):\n        for quote in response.css('div.quote'):\n            yield {\n                'text': quote.css('span.text::text').get(),\n                'author': quote.css('small.author::text').get(),\n                'tags': quote.css('div.tags a.tag::text').getall(),\n            }\n\n        for a in response.css('li.next a'):\n            yield response.follow(a, callback=self.parse)\n```\n\n\nTo add basic instrumentation using the New Relic Python agent, we just need to add three additional lines of code!\n\n``` python\nimport newrelic.agent\nnewrelic.agent.initialize('newrelic.ini')\n\nimport scrapy\n\n\nclass QuotesSpider(scrapy.Spider):\n    name = \"quotes\"\n    start_urls = [\n        'http://quotes.toscrape.com/page/1/',\n    ]\n\n    @newrelic.agent.background_task()\n    def parse(self, response):\n        for quote in response.css('div.quote'):\n            yield {\n                'text': quote.css('span.text::text').get(),\n                'author': quote.css('small.author::text').get(),\n                'tags': quote.css('div.tags a.tag::text').getall(),\n            }\n\n        for a in response.css('li.next a'):\n            yield response.follow(a, callback=self.parse)\n```\n\nIn the example above, the `initialize` method is used to initialize the agent with 
the specified `newrelic.ini` configuration file. The `@newrelic.agent.background_task()` decorator instruments the `parse` method as a background task. This transaction is then displayed as a non-web transaction in the APM UI, separate from web transactions.\n\n## Advanced Instrumentation using Scrapy Extensions\n\nTo go one step further, you can instrument a Scrapy application with [Scrapy Extensions](https://docs.scrapy.org/en/latest/topics/extensions.html). The extensions framework built into Scrapy provides a mechanism for inserting your own custom functionality into Scrapy. Extensions are regular classes that are instantiated once, at Scrapy startup.\n\nI wrote an extension that collects some statistics and records a New Relic custom event that can be queried using New Relic Insights. Scrapy uses [signals](https://docs.scrapy.org/en/latest/topics/signals.html) to notify you when certain events occur. You can catch some of those signals with a custom extension to perform tasks or to add functionality that Scrapy does not provide out of the box.\n\nIn my custom New Relic extension, I gather some basic statistics when the spider is opened or closed, when an item is scraped or dropped, and when a response is received. 
In the closed method, I send the gathered data using the `record_custom_event` API method.\n\nYou can find the custom extension below:\n\n``` python\nimport newrelic.agent\n\nimport logging\nimport datetime\n\nfrom scrapy import signals\nfrom scrapy.exceptions import NotConfigured\n\nlogger = logging.getLogger(__name__)\n\n\nclass NewRelic(object):\n\n    def __init__(self):\n        self.event_stats = {}\n\n    @classmethod\n    def from_crawler(cls, crawler):\n        if not crawler.settings.getbool('MYEXT_ENABLED'):\n            raise NotConfigured\n\n        o = cls()\n        crawler.signals.connect(o.spider_opened, signal=signals.spider_opened)\n        crawler.signals.connect(o.spider_closed, signal=signals.spider_closed)\n        crawler.signals.connect(o.item_scraped, signal=signals.item_scraped)\n        crawler.signals.connect(o.item_dropped, signal=signals.item_dropped)\n        crawler.signals.connect(o.response_received, signal=signals.response_received)\n        return o\n\n    def set_value(self, key, value):\n        self.event_stats[key] = value\n\n    def spider_opened(self, spider):\n        self.set_value('start_time', datetime.datetime.utcnow())\n\n    def spider_closed(self, spider, reason):\n        self.set_value('finish_time', datetime.datetime.utcnow())\n        application = newrelic.agent.application()\n        self.event_stats.update({'spider': spider.name})\n        newrelic.agent.record_custom_event(\"ScrapyEvent\", self.event_stats, application)\n\n    def inc_value(self, key, count=1, start=0, spider=None):\n        d = self.event_stats\n        d[key] = d.setdefault(key, start) + count\n\n    def item_scraped(self, item, spider):\n        self.inc_value('item_scraped_count', spider=spider)\n\n    def response_received(self, spider):\n        self.inc_value('response_received_count', spider=spider)\n\n    def item_dropped(self, item, spider, exception):\n        reason = exception.__class__.__name__\n        
self.inc_value('item_dropped_count', spider=spider)\n        self.inc_value('item_dropped_reasons_count/%s' % reason, spider=spider)\n```\n\nThe above example includes only a few of the signal events available. For a full list of signals go to: [https://docs.scrapy.org/en/latest/topics/signals.html](https://docs.scrapy.org/en/latest/topics/signals.html) \n\n \n## Testing the Project\n\n \nTo try this project, follow these steps:\n\n1. Clone the repo: `git clone https://github.com/AnthonyBloomer/nrscrapy.git`\n2. Install the requirements. Run `pip install -r requirements.txt`\n3. Update `newrelic.ini` with your license key or export your license key as an environment variable.\n4. Run `cd tutorial`\n5. Run `scrapy crawl quotes`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanthonybloomer%2Fnrscrapy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanthonybloomer%2Fnrscrapy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanthonybloomer%2Fnrscrapy/lists"}