{"id":13414816,"url":"https://github.com/scrapy-plugins/scrapy-magicfields","last_synced_at":"2025-05-02T22:32:05.448Z","repository":{"id":43153441,"uuid":"62226374","full_name":"scrapy-plugins/scrapy-magicfields","owner":"scrapy-plugins","description":"Scrapy middleware to add extra fields to items, like timestamp, response fields, spider attributes etc.","archived":false,"fork":false,"pushed_at":"2022-03-16T01:17:57.000Z","size":15,"stargazers_count":56,"open_issues_count":0,"forks_count":7,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-23T09:51:35.316Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/scrapy-plugins.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGES.rst","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-06-29T13:05:56.000Z","updated_at":"2025-02-09T18:22:04.000Z","dependencies_parsed_at":"2022-08-31T04:52:21.121Z","dependency_job_id":null,"html_url":"https://github.com/scrapy-plugins/scrapy-magicfields","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-magicfields","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-magicfields/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-magicfields/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-magicfields/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/s
crapy-plugins","download_url":"https://codeload.github.com/scrapy-plugins/scrapy-magicfields/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252116436,"owners_count":21697378,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T21:00:37.195Z","updated_at":"2025-05-02T22:32:03.579Z","avatar_url":"https://github.com/scrapy-plugins.png","language":"Python","funding_links":[],"categories":["Apps","Scrapy Middleware"],"sub_categories":["Other Useful Extensions"],"readme":"==================\nscrapy-magicfields\n==================\n\n.. image:: https://travis-ci.org/scrapy-plugins/scrapy-magicfields.svg?branch=master\n    :target: https://travis-ci.org/scrapy-plugins/scrapy-magicfields\n\n.. image:: https://codecov.io/gh/scrapy-plugins/scrapy-magicfields/branch/master/graph/badge.svg\n  :target: https://codecov.io/gh/scrapy-plugins/scrapy-magicfields\n\nThis is a Scrapy spider middleware to add extra fields to items,\nbased on the configuration settings ``MAGIC_FIELDS`` and ``MAGIC_FIELDS_OVERRIDE``.\n\n\nInstallation\n============\n\nInstall scrapy-magicfields using ``pip``::\n\n    $ pip install scrapy-magicfields\n\n\nConfiguration\n=============\n\n1. Add MagicFieldsMiddleware by including it in ``SPIDER_MIDDLEWARES``\n   in your ``settings.py`` file::\n\n      SPIDER_MIDDLEWARES = {\n          'scrapy_magicfields.MagicFieldsMiddleware': 100,\n      }\n\n   Here, priority ``100`` is just an example.\n   Set its value depending on other middlewares you may have enabled already.\n\n2. 
Enable the middleware using ``MAGIC_FIELDS`` (and optionally ``MAGIC_FIELDS_OVERRIDE``)\n   in your ``settings.py``.\n\n\nUsage\n=====\n\nBoth settings ``MAGIC_FIELDS`` and ``MAGIC_FIELDS_OVERRIDE`` are dicts:\n\n* the keys are the destination field names,\n* each value is a string that accepts **magic variables**,\n  identified by a leading ``$`` (dollar sign),\n  which are substituted by the corresponding value at runtime.\n\nSome magic variables also accept arguments, which are specified after the magic name,\nusing a ``:`` (colon) as separator.\n\n\nYou can set project-global magics with ``MAGIC_FIELDS``,\nand tune them for a specific spider using ``MAGIC_FIELDS_OVERRIDE``.\n\nIf there is more than one argument, they must be separated by ``,`` (comma).\nSo the generic magic format is::\n\n    $\u003cmagic name\u003e[:arg1,arg2,...]\n\n\nSupported magic variables\n-------------------------\n\n``$time``\n    the UTC timestamp at which the item was scraped, in format ``'%Y-%m-%d %H:%M:%S'``.\n\n``$unixtime``\n    the unixtime (number of seconds since the Epoch, i.e. ``time.time()``)\n    at which the item was scraped.\n\n``$isotime``\n    the UTC timestamp at which the item was scraped, in format ``'%Y-%m-%dT%H:%M:%S'``.\n\n``$spider``\n    must be followed by an argument,\n    which is the name of an attribute of the spider (like an argument passed to it).\n\n``$env``\n    the value of an environment variable.\n    It accepts the name of the variable as its argument.\n\n``$jobid``\n    the job id (shortcut for ``$env:SCRAPY_JOB``).\n\n``$jobtime``\n    the UTC timestamp at which the job started, in format ``'%Y-%m-%d %H:%M:%S'``.\n\n``$response``\n    Access to some response properties.\n\n    ``$response:url``\n        The URL from which the item was extracted.\n\n    ``$response:status``\n        Response HTTP status.\n\n    ``$response:headers``\n        Response HTTP headers.\n\n``$setting``\n    Access the given Scrapy setting. 
It accepts one argument: the name of the setting.\n\n``$field``\n    Copies the value of one field to another.\n    Its argument is the source field.\n    Effects are unpredictable if the source field is itself filled\n    using magic fields.\n\n\nExamples\n--------\n\nThe following configuration will add two fields to each scraped item:\n\n- ``'timestamp'``, which will be filled with the string ``'item scraped at \u003cscraped timestamp\u003e'``,\n- and ``'spider'``, which will contain the spider name.\n\n::\n\n    MAGIC_FIELDS = {\n        \"timestamp\": \"item scraped at $time\",\n        \"spider\": \"$spider:name\"\n    }\n\nThe following configuration will copy the value of the ``url`` field to the ``sku`` field::\n\n    MAGIC_FIELDS = {\n        \"sku\": \"$field:url\"\n    }\n\nMagics also accept a regular expression argument, which lets you extract\nand assign only part of the value generated by the magic.\nYou have to specify it using the ``r''`` notation.\n\nSuppose the URLs of your items look like ``'http://www.example.com/product.html?item_no=345'``\nand you want to assign only the item number to the ``sku`` field.\n\nThe following example, similar to the previous one but with a regular expression as the second argument,\nwill do the task::\n\n    MAGIC_FIELDS = {\n        \"sku\": \"$field:url,r'item_no=(\\d+)'\"\n    }\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapy-plugins%2Fscrapy-magicfields","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscrapy-plugins%2Fscrapy-magicfields","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapy-plugins%2Fscrapy-magicfields/lists"}