{"id":20709934,"url":"https://github.com/oxylabs/custom-parser-instructions","last_synced_at":"2025-12-24T06:26:59.300Z","repository":{"id":178263614,"uuid":"646698637","full_name":"oxylabs/custom-parser-instructions","owner":"oxylabs","description":"Learn the fundamentals of writing parsing instructions with Oxylabs' Custom Parser.","archived":false,"fork":false,"pushed_at":"2025-02-11T13:04:42.000Z","size":1851,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-11T14:23:12.731Z","etag":null,"topics":["parser","parsing","python","scraping","scraping-websites","tutorial","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oxylabs.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-29T06:20:14.000Z","updated_at":"2025-02-11T13:04:46.000Z","dependencies_parsed_at":"2024-04-04T14:57:21.297Z","dependency_job_id":"b91f739c-04af-4e55-8329-a0be8d1199f5","html_url":"https://github.com/oxylabs/custom-parser-instructions","commit_stats":null,"previous_names":["oxylabs/custom-parser-instructions"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fcustom-parser-instructions","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fcustom-parser-instructions/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fcustom-parser-instructions/release
s","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fcustom-parser-instructions/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oxylabs","download_url":"https://codeload.github.com/oxylabs/custom-parser-instructions/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242980783,"owners_count":20216285,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["parser","parsing","python","scraping","scraping-websites","tutorial","web-scraping"],"created_at":"2024-11-17T02:09:06.896Z","updated_at":"2025-12-24T06:26:59.291Z","avatar_url":"https://github.com/oxylabs.png","language":"Python","readme":"# Custom Parser Instructions\n\n[![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.io/pages/gitoxy?utm_source=877\u0026utm_medium=affiliate\u0026groupid=877\u0026utm_content=custom-parser-instructions-github\u0026transaction_id=102f49063ab94276ae8f116d224b67)\n\n[![](https://dcbadge.vercel.app/api/server/eWsVUJrnG5)](https://discord.gg/GbxmdGhZjq)\n\n# How to Write Parsing Instructions with Custom Parser?\n\n- [The structure of parsing instructions](#the-structure-of-parsing-instructions)\n- [How to write parsing instructions](#how-to-write-parsing-instructions)\n  * [Configuring the payload](#configuring-the-payload)\n  * [Parsing a single field using 
XPath](#parsing-a-single-field-using-xpath)\n  * [Parsing a single field using CSS selectors](#parsing-a-single-field-using-css-selectors)\n  * [Parsing multiple fields with separated results](#parsing-multiple-fields-with-separated-results)\n  * [Parsing multiple fields with categorized results](#parsing-multiple-fields-with-categorized-results)\n- [Parsing example of a real target](#parsing-example-of-a-real-target)\n  * [Product listings](#product-listings)\n  * [Product page](#product-page)\n\nCustom Parser is a free feature of Oxylabs [\u003cu\u003eScraper APIs\u003c/u\u003e](https://oxylabs.io/products/scraper-api), which allows you to write your own parsing instructions for a chosen target when needed. The Custom Parser feature expands your options and flexibility throughout the entire scraping process on any website.\n\nWith it, you can:\n\n- Extract all text from an HTML document;\n- Parse data using XPath and CSS expressions;\n- Manipulate strings with pre-defined functions and regex expressions;\n- Perform common string actions like conversion, indexing, and retrieving the length;\n- Do mathematical calculations, such as calculating the average, finding the maximum and minimum values, and multiplying values;\n- Save, reuse, and modify [Parser Presets](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/features/result-processing-and-storage/custom-parser/parser-presets) by hosting them on our system;\n- Enable **Self-healing** for your parser presets to make them resilient to minor layout changes automatically;\n- View performance and usage statistics of a preset over time.\n\nThis guide will teach you the fundamentals of writing custom parsing instructions in Python and will showcase Custom Parser in action.\n\n---\n\n## Self-healing parser presets\n\nOxylabs' [Web Scraper API](https://oxylabs.io/products/scraper-api/web) now supports self-healing functionality in Parser Presets. 
This feature significantly improves scraping resilience by automatically adjusting parsing logic when target website structures change slightly.\n\n### What are self-healing parser presets?\n\nSelf-healing parser presets are Custom Parser configurations that adapt to structural changes in target websites. Once enabled, they monitor their performance and attempt to automatically fix parsing failures caused by layout changes (e.g., altered class names, node paths).\n\n### Why use self-healing?\n\n- **Reduced manual maintenance** – your preset adjusts automatically  \n- **Increased success rates** – scraping jobs stay operational longer  \n- **Efficient scaling** – fewer failures across multiple targets  \n\n### How to enable self-healing for a parser preset\n\nOnce you have created a parser preset via the API, you can enable self-healing by providing the `self_heal`, `prompt_schema`, and `urls` parameters. For example:\n\n**Endpoint:** `PUT https://data.oxylabs.io/v1/parsers/presets/{preset_name}`\n\n```json\n{\n    \"self_heal\": true,\n    \"urls\": [\"https://sandbox.oxylabs.io/products\"],\n    \"prompt_schema\": {\n        \"properties\": {\n            \"product_titles\": {\n                \"description\": \"Title of each product.\",\n                \"items\": {\n                    \"type\": \"string\"\n                },\n                \"maxItems\": 5,\n                \"title\": \"Product Titles\",\n                \"type\": \"array\"\n            }\n        },\n        \"required\": [\n            \"product_titles\"\n        ],\n        \"title\": \"Fields\",\n        \"type\": \"object\"\n    }\n}\n```\n\n## Using a Self-Healing Preset in a Scraping Job\n\n\nWhen submitting a scraping job, instead of passing `parsing_instructions` in the payload, you can reference your parser preset by name:\n\n\n```json\n{\n    \"source\": \"universal\",\n    \"url\": \"https://sandbox.oxylabs.io/products\",\n    \"parser_preset\": 
\"my_self_healing_preset\"\n}\n```\n\n\n## Monitoring Success Rate\n\n\nTrack how well your preset is performing using the statistics endpoint:\n\n\n```\nGET https://data.oxylabs.io/v1/parsers/presets/{preset_name}/stats\n```\n\n\n### Example response:\n\n\n```json\n{\n    \"success_rate\": 98,\n    \"success_rate_by_path\": {\n        \"titles\": 100,\n        \"prices\": 96\n    },\n    \"successful_results\": 49,\n    \"total_results\": 50\n}\n```\n\n\nThese stats help you understand which parts of your parser might need manual review.\n\n\n## Best Practices for Building Self-Healing-Ready Presets\n\n\n- Use multiple fallback selectors (XPath/CSS) in `\"_args\"`\n- Avoid brittle absolute paths – prefer relative, semantic selectors\n- Monitor `success_rate` to detect unexpected drops\n- Regularly test against updated pages to ensure healing performance\n\n\nFor more, visit the [Parser Preset documentation](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/features/custom-parser/parser-presets).\n\n\n---\n\n\n## The structure of parsing instructions\n\nTo start off, you should already have a basic grasp of Oxylabs Scraper\nAPIs. If you’re new to our web scraping solutions, you can familiarize\nyourself by reading our [\u003cu\u003edocumentation\u003c/u\u003e](https://developers.oxylabs.io/scraper-apis/web-scraper-api/features/custom-parser/getting-started).\nNote that you can only use one parser simultaneously – either a\nDedicated Parser, Adaptive Parser, or Custom Parser.\n\nIn essence, the parsing instructions have to be specified in the payload\nof the request, which is composed in a JSON format. Parsing instructions\nconsist of HTML node selection and value transformation functions.\n\nYou’re going to use XPath expressions or CSS selectors to select HTML nodes and extract data from them. 
We highly recommend reading our [\u003cu\u003eblog post\u003c/u\u003e](https://oxylabs.io/blog/xpath-vs-css), where we introduce the\nbasics of using XPath and CSS selectors.\n\nThe two XPath functions of Custom Parser are `xpath`, which returns all\nmatches, and `xpath_one`, which returns the first match. Similarly, there are also two CSS functions you can use – `css` to get all matches and `css_one` to get only the first match. You can learn more about other functions in our [\u003cu\u003edocumentation\u003c/u\u003e](https://developers.oxylabs.io/scraper-apis/web-scraper-api/features/custom-parser/list-of-functions).\n\nThe structure of parsing instructions can be summed up into four main\nsteps:\n\n1.  Name of a field that will store the results;\n\n2.  `_fns` array that holds all the specific parsing instructions for that field;\n\n3.  `_fn` function that defines the action;\n\n4.  `_args` variables that modify the behavior of the `_fn` associated with it.\n\nThe following code sample illustrates these steps:\n\n```python\n{\n    \"parsing_instructions\": {\n        \"Result field name\": {      # 1.\n            \"_fns\": [               # 2.\n                {\n                    \"_fn\": \"What action to perform?\",        # 3.\n                    \"_args\": [\"How to perform the action?\"]  # 4.\n                }\n            ]\n        }\n    }\n}\n```\n\n## How to write parsing instructions\n\nWe’ll use a dummy bookstore website,\n[\u003cu\u003ebooks.toscrape.com\u003c/u\u003e](https://books.toscrape.com/catalogue/page-1.html),\nto showcase several ways you can extract the desired information.\n\n### Configuring the payload\n\nFirst, define the necessary payload parameters for your specific needs,\nthen add the `\"parse\": True` parameter to enable parsing. Next, add the\n`\"parsing_instructions\"` parameter to define the parsing instructions\nwithin the curly brackets. 
So far, your payload should look similar to\nthis:\n```python\npayload = {\n    \"source\": \"universal\",\n    \"url\": \"https://books.toscrape.com/catalogue/page-1.html\",\n    \"parse\": True,\n    \"parsing_instructions\": {}\n}\n```\n### Parsing a single field using XPath\n\nLet’s start by gathering all the book titles from our [\u003cu\u003etarget\npage\u003c/u\u003e](https://books.toscrape.com/catalogue/page-1.html). Create a\nnew JSON object and assign a new field, which will hold a list of all\nthe book titles. This field name will be displayed in the parsed result.\nLet’s call it `\"titles\"`:\n\n\u003e **Note**\n\u003e \n\u003e When creating custom parameter names, you can’t use the\nunderscore symbol `_` at the very beginning.\n\n```python\n{\n    \"parsing_instructions\": {\n        \"titles\": {}\n    }\n}\n```\nNext, let’s add the `_fns` array to define a data processing pipeline.\nThis property will hold all the instructions required to parse the book\ntitles from our target:\n```python\n{\n    \"parsing_instructions\": {\n        \"titles\": {\n            \"_fns\": []\n        }\n    }\n}\n```\nThen, in the square brackets of the `_fns` field, add the `_fn` and\n`_args` properties:\n```python\n{\n    \"parsing_instructions\": {\n        \"titles\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"\",\n                    \"_args\": [\"\"]\n                }\n            ]\n        }\n    }\n}\n```\nIn this section we’ll use XPath expressions to parse all the book titles. You can find an example of how to use CSS selectors below.\n\nIn order to get all the book titles, set `\"_fn\"` value to `\"xpath\"` and\nprovide one or more XPath expressions in the `\"_args\"` array. Please note\nthat the XPath expressions will be executed in the order they’re found\nin the array. 
For instance, if the first XPath expression is valid (i.e.\nthe node exists), subsequent XPath expressions won’t be executed.\n\nIn this case, all the book titles are in the `\u003ca\u003e` tags, which are\ninside the `\u003ch3\u003e` tag, so the XPath expression can be written as\n`\"//h3//a/text()\"`. The `text()` in the XPath expression instructs the\nparser to select only the textual values:\n\n```python\nimport requests\nfrom pprint import pprint\n\npayload = {\n    \"source\": \"universal\",\n    \"url\": \"https://books.toscrape.com/catalogue/page-1.html\",\n    \"parse\": True,\n    \"parsing_instructions\": {\n        \"titles\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"xpath\",\n                    \"_args\": [\"//h3//a/text()\"]\n                }\n            ]\n        }\n    }\n}\n\nresponse = requests.request(\n    \"POST\",\n    \"https://realtime.oxylabs.io/v1/queries\",\n    auth=(\"USERNAME\", \"PASSWORD\"),\n    json=payload\n)\n\npprint(response.json())\n```\nThis code produces the following list of book titles:\n```bash\n{\n  \"titles\": [\n    \"A Light in the ...\",\n    \"Tipping the Velvet\",\n    \"Soumission\",\n    \"Sharp Objects\",\n    \"Sapiens: A Brief History ...\",\n    \"The Requiem Red\",\n    \"The Dirty Little Secrets ...\",\n    \"The Coming Woman: A ...\",\n    \"The Boys in the ...\",\n    \"The Black Maria\",\n    \"Starving Hearts (Triangular Trade ...\",\n    \"Shakespeare's Sonnets\",\n    \"Set Me Free\",\n    \"Scott Pilgrim's Precious Little ...\",\n    \"Rip it Up and ...\",\n    \"Our Band Could Be ...\",\n    \"Olio\",\n    \"Mesaerion: The Best Science ...\",\n    \"Libertarianism for Beginners\",\n    \"It's Only the Himalayas\"\n  ]\n}\n```\n### Parsing a single field using CSS selectors\n\nAlternatively, the same result can be achieved using CSS selectors. 
To do that, set the `\"_fn\"` value to `\"css\"`, and provide one or more CSS expressions in the `\"_args\"` array.\nTo parse all the book titles from the target website, you can form the CSS expression as `\"h3 \u003e [title]\"` since all the titles are inside the `title` attribute. Your parsing instructions should look like this:\n\n```python\n{\n    \"parsing_instructions\": {\n        \"titles\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"css\",\n                    \"_args\": [\"h3 \u003e [title]\"]\n                }\n            ]\n        }\n    }\n}\n```\n\nNote that CSS expressions can only \nselect HTML elements, meaning they **can’t directly extract the values**. Hence, using the above code, the received response is a JSON array with HTML elements, including the opening and closing tags.\nTo extract the values, you can create another `\"_fn\"` function within the `\"_fns\"` array and use the `\"element_text\"` function of Custom Parser that extracts text and strips leading and trailing whitespaces:\n\n```python\n{\n    \"parsing_instructions\": {\n        \"titles\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"css\",\n                    \"_args\": [\"h3 \u003e [title]\"]\n                },\n                {\n                    \"_fn\": \"element_text\"\n                }\n            ]\n        }\n    }\n}\n```\n\nThis time, the parsing instructions brought back only the text from the `title` attribute:\n\n```bash\n{\n  \"titles\": [\n    \"A Light in the ...\",\n    \"Tipping the Velvet\",\n    \"Soumission\",\n    \"Sharp Objects\",\n    \"Sapiens: A Brief History ...\",\n    \"The Requiem Red\",\n    \"The Dirty Little Secrets ...\",\n    \"The Coming Woman: A ...\",\n    \"The Boys in the ...\",\n    \"The Black Maria\",\n    \"Starving Hearts (Triangular Trade ...\",\n    \"Shakespeare's Sonnets\",\n    \"Set Me Free\",\n    \"Scott Pilgrim's Precious Little ...\",\n    \"Rip 
it Up and ...\",\n    \"Our Band Could Be ...\",\n    \"Olio\",\n    \"Mesaerion: The Best Science ...\",\n    \"Libertarianism for Beginners\",\n    \"It's Only the Himalayas\"\n  ]\n}\n```\n\n### Parsing multiple fields with separated results\n\nLet’s include the book prices, which are in the `\u003cp\u003e` tag with an\nattribute `class=\"price_color\"`. You can separate the results by creating\nanother field that will hold the prices. The process is the same as\nexplained previously – you have to create another field called `\"prices\"`,\njust like you did with the `\"titles\"`. The parsing instructions using XPath should be\nas follows:\n```python\n{\n    \"parsing_instructions\": {\n        \"titles\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"xpath\",\n                    \"_args\": [\"//h3//a/text()\"]\n                }\n            ]\n        },\n        \"prices\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"xpath\",\n                    \"_args\": [\"//p[@class='price_color']/text()\"]\n                }\n            ]\n        }\n    }\n}\n```\nThe output will give you results separated by fields:\n\n```bash\n{\n  \"prices\": [\n    \"£51.77\",\n    \"£53.74\",\n      \n      ...\n      \n    \"£51.33\",\n    \"£45.17\"\n  ],\n  \"titles\": [\n    \"A Light in the ...\",\n    \"Tipping the Velvet\",\n      \n      ...\n      \n    \"Libertarianism for Beginners\",\n    \"It's Only the Himalayas\"\n  ]\n}\n```\n\nThe results can also be categorized by product, which we’ll overview\nnext.\n\n### Parsing multiple fields with categorized results\n\nSay you want to get the **titles**, **prices**, **availability**, and\nthe **URL** of all the books on page 1. 
Following the logic of the\nprevious parsing instructions, the results would be separated into\ndifferent fields, which may not be a preferred way to parse product\nlistings.\n\nCustom Parser allows you to categorize the results by product. To do\nthat, you can first define the parsing scope of the HTML document and\niterate over it with the `\"_items\"` function. This function tells our\nsystem that every field inside it, such as `\"title\"`, is a part of one\nitem and should be grouped together.\n\nBy defining the parsing scope, you’re telling the system to look only at\na specific part of the HTML document. All books are listed within the\n`\u003cli\u003e` tags, which are under the `\u003col\u003e` tag. Thus, you can use the XPath\nexpression `//ol//li` to define the parsing scope for book listings.\n\nWhen defining the parsing scope, use the `xpath` function for the `_fn`\nproperty to find everything that matches the XPath expression. At this\nmoment, the code should look like this:\n```python\n{\n    \"parsing_instructions\": {\n        \"products\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"xpath\",\n                    \"_args\": [\"//ol//li\"]\n                }\n            ]\n        }\n    }\n}\n```\nThen, when using the `\"_items\"` property, use the `xpath_one` function to\nfind only the first match since the `\"_items\"` property will iterate over\nthe defined parsing scope, which finds all the matches. 
Let’s add the\n**title**, **price**, **availability**, and **URL** fields to our code\ninside the `\"_items\"` property:\n```python\n{\n    \"parsing_instructions\": {\n        \"products\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"xpath\",\n                    \"_args\": [\n                        \"//ol//li\"\n                    ]\n                }\n            ],\n            \"_items\": {\n                \"title\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \".//h3//a/text()\"\n                            ]\n                        }\n                    ]\n                },\n                \"price\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \".//p[@class='price_color']/text()\"\n                            ]\n                        }\n                    ]\n                },\n                \"availability\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \"normalize-space(.//p[contains(@class, 'availability')]/text()[last()])\"\n                            ]\n                        }\n                    ]\n                },\n                \"url\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \".//a/@href\"\n                            ]\n                        }\n                    ]\n                }\n            }\n        }\n    }\n}\n```\nWith these parsing instructions, the results are categorized by product:\n```bash\n{\n  
\"products\": [\n    {\n      \"availability\": \"In stock\",\n      \"price\": \"£51.77\",\n      \"title\": \"A Light in the ...\",\n      \"url\": \"a-light-in-the-attic_1000/index.html\"\n    },\n    {\n      \"availability\": \"In stock\",\n      \"price\": \"£53.74\",\n      \"title\": \"Tipping the Velvet\",\n      \"url\": \"tipping-the-velvet_999/index.html\"\n    },\n      \n      ...\n      \n    {\n      \"availability\": \"In stock\",\n      \"price\": \"£51.33\",\n      \"title\": \"Libertarianism for Beginners\",\n      \"url\": \"libertarianism-for-beginners_982/index.html\"\n    },\n    {\n      \"availability\": \"In stock\",\n      \"price\": \"£45.17\",\n      \"title\": \"It's Only the Himalayas\",\n      \"url\": \"its-only-the-himalayas_981/index.html\"\n    }\n  ]\n}\n```\n\n## Parsing example of a real target\n\n### Product listings\n\nIn this section, let’s use Custom Parser to parse [\u003cu\u003ethis product\nlisting page\u003c/u\u003e](https://www.ebay.com/sch/i.html?_from=R40\u0026_trksid=p2334524.m570.l1313\u0026_nkw=laptop\u0026_sacat=0\u0026LH_TitleDesc=0\u0026_odkw=laptop\u0026_osacat=0) on eBay:\n\n![](images/ebay_product_listings.png)\n\nThe goal is to extract the **title**, **price**, **item condition**,\n**URL**, and **seller information** from each product listing.\n\nHere, you can again define the parsing scope. All of the products are\ninside the `\u003cli\u003e` tag with the attribute `data-viewport`, which is under\nthe `\u003cul\u003e` tag. 
With this information, you can form the XPath expression\nas `//ul//li[@data-viewport]`:\n\n```python\n{\n    \"parsing_instructions\": {\n        \"products\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"xpath\",\n                    \"_args\": [\"//ul//li[@data-viewport]\"]\n                }\n            ]\n        }\n    }\n}\n```\n\nFollowing the same logic as shown previously, you can form the parsing\ninstructions within the `\"_items\"` function. Notice the second XPath\nexpression for the `\"title\"` field:\n```python\n{\n    \"parsing_instructions\": {\n        \"products\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"xpath\",\n                    \"_args\": [\n                        \"//ul//li[@data-viewport]\"\n                    ]\n                }\n            ],\n            \"_items\": {\n                \"title\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \".//span[@role='heading']/text()\",\n                                \".//span[@class='BOLD']/text()\"\n                            ]\n                        }\n                    ]\n                },\n                \"price\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \".//span[@class='s-item__price']/text()\"\n                            ]\n                        }\n                    ]\n                },\n                \"condition\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \".//span[@class='SECONDARY_INFO']/text()\"\n                            ]\n                        }\n    
                ]\n                },\n                \"seller\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \".//span[@class='s-item__seller-info-text']/text()\"\n                            ]\n                        }\n                    ]\n                },\n                \"url\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \".//a/@href\"\n                            ]\n                        }\n                    ]\n                }\n            }\n        }\n    }\n}\n```\nThe additional XPath expression serves as a fallback if the first\nexpression doesn’t return any value. This is the case with our target\npage since there are some titles found within the `\u003cspan\u003e` tag with an\nattribute set to `class=\"BOLD\"`:\n\n![](images/ebay_bold_title.png)\n\nLet’s fully build up the code sample to parse eBay products:\n\n```python\nimport requests\nimport json\nfrom pprint import pprint\n\n# Structure payload\npayload = {\n    \"source\": \"universal\",\n    \"url\": \"https://www.ebay.com/sch/i.html?_from=R40\u0026_trksid=p2334524.m570.l1313\u0026_nkw=laptop\u0026_sacat=0\u0026LH_TitleDesc=0\u0026_odkw=laptop\u0026_osacat=0\",\n    \"geo_location\": \"United States\",\n    \"parse\": True,\n    \"parsing_instructions\": {\n        \"products\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"xpath\",\n                    \"_args\": [\n                        \"//ul//li[@data-viewport]\"\n                    ]\n                }\n            ],\n            \"_items\": {\n                \"title\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                  
          \"_args\": [\n                                \".//span[@role='heading']/text()\",\n                                \".//span[@class='BOLD']/text()\"\n                            ]\n                        }\n                    ]\n                },\n                \"price\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \".//span[@class='s-item__price']/text()\"\n                            ]\n                        }\n                    ]\n                },\n                \"condition\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \".//span[@class='SECONDARY_INFO']/text()\"\n                            ]\n                        }\n                    ]\n                },\n                \"seller\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \".//span[@class='s-item__seller-info-text']/text()\"\n                            ]\n                        }\n                    ]\n                },\n                \"url\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \".//a/@href\"\n                            ]\n                        }\n                    ]\n                }\n            }\n        }\n    }\n}\n\n# Get a response\nresponse = requests.request(\n    \"POST\",\n    \"https://realtime.oxylabs.io/v1/queries\",\n    auth=(\"USERNAME\", \"PASSWORD\"),\n    json=payload\n)\n\n# Write the JSON response to a JSON file\nwith open(\"ebay_product_listings.json\", \"w\") 
as f:\n    json.dump(response.json(), f)\n\n# Instead of a response with job status and results URL, this will return\n# the JSON response with the result\npprint(response.json())\n```\nIt produces the output with the categorized information by product:\n\n```bash\n{\n  \"products\": [\n    {\n      \"condition\": \"Open Box\",\n      \"price\": \"$399.95\",\n      \"seller\": \"gtgeveryday (11,074) 98.8%\",\n      \"title\": \"HP Laptop Computer 15.6 HD Notebook 16GB 512GB SSD Win11 Intel WiFi Bluetooth\",\n      \"url\": \"https://www.ebay.com/itm/374483095044?hash=item5730ee8e04:g:GTUAAOSwSbJj1D3b\u0026amdata=enc%3AAQAIAAAAsPgaQKZhgBOAcXj6BHSIXZIQGiVP2blfkVh8s73u2tYQm3wSJQspCiKEvx6MkyORjJyiWzBwmdeoUJbfYilH%2FVBZx53G1LAA4hGrr8mVA7tfse8gF64Ses9dWjo5htwiFoeaiqA34DKAXFUHH32KU03simn1pu9lZiXqQspPyDG0Dt7DAYB6aus%2B8lYRKfRVurYSajf4KLANNUE4HAStHK24pzEYsUABr1uNp8P5Czf%2F%7Ctkp%3ABlBMUMq35OKHYg\"\n    },\n    {\n      \"condition\": \"Open Box\",\n      \"price\": \"$599.00\",\n      \"seller\": \"computergalleryonline (17,836) 100%\",\n      \"title\": \"Microsoft Surface Pro 6 12.3 1.90GHz CORE i7 [8650U] 1TB SSD 16GB W10PRO Webcam\",\n      \"url\": \"https://www.ebay.com/itm/255069566836?hash=item3b6354b774:g:PMgAAOSwfTdg0j3a\u0026amdata=enc%3AAQAIAAAAsNsgs7NCzOLwklJuBevZVZ6ohkW2lno%2F1Wh9r84C1AV1vlDrqncfYVQLFWtiFTwbXNMfy3YXkKqEqBEAS1SFMifni9n5V%2B8ZMC2zfAiNZX%2BWZH4VOXl2EZOKg69kdGaHAjL%2FEHcNZfkmIgLwvtmYoYbSeITVnXaGsiMS3qPwJHZcS0Qb2w%2BZgokPePR4thmBH%2Bc8cBwxA06a%2F5Hu1%2B7rOHz%2BXLmJ9iSLNJmBufaHk4Cp%7Ctkp%3ABlBMUMq35OKHYg\"\n    },\n    {\n      \"condition\": \"Very Good - Refurbished\",\n      \"price\": \"$130.58\",\n      \"seller\": \"discountcomputerdepot (101,356) 98.6%\",\n      \"title\": \"Lenovo ThinkPad Yoga 11e 5th Gen Touchscreen Laptop Windows 10 4GB Ram 256GB SSD\",\n      \"url\": 
\"https://www.ebay.com/itm/254646198216?hash=item3b4a189fc8:g:RYEAAOSwmbVfA5~a\u0026amdata=enc%3AAQAIAAAAsANRr%2F6XW4iwQrABynh1VKLP4xhMjrQSpGI2M%2B4Z3%2B1vWEAYS3Iadzz2OlIfrfs0UoipImK0fiYa5qxRmpaSQGZ24iCHofOVmQThBqyv4XDR3GhJoP718l5RKCB5cqSGLF69q7b2acskGS1Id064oQLtojZekMJzWOkLCb0tfIwV8jlgoJiE1NHoRowYhV%2FhmxRXAQpz9Ow7o9CHEqEsNO10bUSGbnc%2FYFDuPFRfRbp9%7Ctkp%3ABlBMUMq35OKHYg\"\n    },\n    {\n      \"condition\": \"Brand New\",\n      \"price\": \"$369.99\",\n      \"seller\": \"antonline (319,396) 98.9%\",\n      \"title\": \"Lenovo IdeaPad 3 14 Laptop FHD Intel Core i5-1135G7 8GB RAM 512GB SSD\",\n      \"url\": \"https://www.ebay.com/itm/304852488846?hash=item46fa9fd28e:g:G5sAAOSwaMpkT3K1\u0026amdata=enc%3AAQAIAAAAwI1TVVViXVxUCbkGokwpSEGjqhGuidyNYaY6VP22Kv8RqfeRYoUI8wKkSebTaTcFiY%2FjUz5t18Y0G8aU36cyKXbvhBq1%2Bv8mkBbNP3QtfBFFGnBu0d9OJ7x1f1RRac3c1iRiXb1jZd2TJMfNr7Ijen5y7t2Fv4bxwKL3%2BT7FAf6RPGbLpMXclyvJRPkxXuVab5g2U27DzDtuo6uJqp009pRyi%2F1QzehMXD6mAef9B6183jWkMEKtpN6F8ozshn3Yog%3D%3D%7Ctkp%3ABk9SR8y35OKHYg\"\n    }\n  ]\n}\n```\n\n### Product page\n\nThe parsing instructions to collect information from a specific product\npage don’t differ too much, yet there’s a certain parsing logic you can\nfollow. For demonstrational purposes, let’s use [\u003cu\u003ethis eBay product page\u003c/u\u003e](https://www.ebay.com/itm/256082552198?hash=item3b9fb5a586:g:G20AAOSwm-9iUMjU\u0026amdata=enc%3AAQAIAAAAsBVaJyw82KdZRRfIJpMYmmLIWty94MR%2FJXCYNOmilLafKM7iGdkVbac4c1CdxnzkJ9MhvAWumbBGriDQ%2BuRO5YtuapAckUKSwGnOjG3ITS4oP%2Bak%2FRPV%2B2mEba5veCK%2FpN2YYLn3rOyUjOoroU9Z1%2FBJ2xsih1S57d5U1yh%2B2o9m2L3lZFEe7flmjSKUbaVC%2BYPaSzZTYq%2BlNzVnk7sAniEurfuTzhiLHt58xBceAxUm%7Ctkp%3ABlBMUMSCmrWIYg)\nto extract the **title**, **price**, and details from the **Item\nspecifics** section. 
The target page looks like this:\n\n![](images/ebay_product_page.png)\n\n![](images/ebay_product_page_item_specifics.png)\n\nThe **title** and **price** can be parsed with separate functions.\nNotice the `\"amount_from_string\"` within the `\"price\"` field, which extracts\nonly the numeric value:\n\n```python\n{\n    \"source\": \"universal\",\n    \"url\": \"https://www.ebay.com/itm/256082552198?hash=item3b9fb5a586:g:G20AAOSwm-9iUMjU\u0026amdata=enc%3AAQAIAAAAsBVaJyw82KdZRRfIJpMYmmLIWty94MR%2FJXCYNOmilLafKM7iGdkVbac4c1CdxnzkJ9MhvAWumbBGriDQ%2BuRO5YtuapAckUKSwGnOjG3ITS4oP%2Bak%2FRPV%2B2mEba5veCK%2FpN2YYLn3rOyUjOoroU9Z1%2FBJ2xsih1S57d5U1yh%2B2o9m2L3lZFEe7flmjSKUbaVC%2BYPaSzZTYq%2BlNzVnk7sAniEurfuTzhiLHt58xBceAxUm%7Ctkp%3ABlBMUMSCmrWIYg\",\n    \"geo_location\": \"United States\",\n    \"parse\": True,\n    \"parsing_instructions\": {\n        \"title\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"xpath_one\",\n                    \"_args\": [\"//h1//span[@class='ux-textspans ux-textspans--BOLD']/text()\"]\n                }\n            ]\n        },\n        \"price\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"xpath_one\",\n                    \"_args\": [\"//div[@class='x-price-primary'][@data-testid='x-price-primary']//span[@class='ux-textspans']/text()\"]\n                },\n                {\n                    \"_fn\": \"amount_from_string\"\n                }\n            ]\n        }\n    }\n}\n```\n\nNext, to parse the **Item specifics** section, define the parsing scope\nand use the `\"_items\"` function to iterate through each key and value\npair:\n\n```python\nimport requests\nimport json\nfrom pprint import pprint\n\n# Structure payload.\npayload = {\n    \"source\": \"universal\",\n    \"url\": 
\"https://www.ebay.com/itm/256082552198?hash=item3b9fb5a586:g:G20AAOSwm-9iUMjU\u0026amdata=enc%3AAQAIAAAAsBVaJyw82KdZRRfIJpMYmmLIWty94MR%2FJXCYNOmilLafKM7iGdkVbac4c1CdxnzkJ9MhvAWumbBGriDQ%2BuRO5YtuapAckUKSwGnOjG3ITS4oP%2Bak%2FRPV%2B2mEba5veCK%2FpN2YYLn3rOyUjOoroU9Z1%2FBJ2xsih1S57d5U1yh%2B2o9m2L3lZFEe7flmjSKUbaVC%2BYPaSzZTYq%2BlNzVnk7sAniEurfuTzhiLHt58xBceAxUm%7Ctkp%3ABlBMUMSCmrWIYg\",\n    \"geo_location\": \"United States\",\n    \"parse\": True,\n    \"parsing_instructions\": {\n        \"title\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"xpath_one\",\n                    \"_args\": [\"//h1//span[@class='ux-textspans ux-textspans--BOLD']/text()\"]\n                }\n            ]\n        },\n        \"price\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"xpath_one\",\n                    \"_args\": [\"//div[@class='x-price-primary'][@data-testid='x-price-primary']//span[@class='ux-textspans']/text()\"]\n                },\n                {\n                    \"_fn\": \"amount_from_string\"\n                }\n            ]\n        },\n        \"item_specifics\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"xpath\",\n                    \"_args\": [\"//div[@class='ux-layout-section-evo__col']\"]\n                }\n            ],\n            \"_items\": {\n                \"key\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\".//span[@class='ux-textspans']/text()\"]\n                        }\n                    ]\n                },\n                \"value\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\".//div[@class='ux-labels-values__values']//text()\"]\n                        }\n                    ]\n                }\n          
  }\n        }\n    }\n}\n\n# Get a response.\nresponse = requests.request(\n    \"POST\",\n    \"https://realtime.oxylabs.io/v1/queries\",\n    auth=(\"USERNAME\", \"PASSWORD\"),\n    json=payload\n)\n\n# Write the JSON response to a .json file.\nwith open(\"ebay_product_page.json\", \"w\") as f:\n    json.dump(response.json(), f)\n\n# Instead of a response with job status and results URL, this will return the\n# JSON response with the result.\npprint(response.json())\n```\n\nWith the above code sample, you can get the product page results as\nfollows:\n\n```bash\n{\n  \"item_specifics\": [\n    {\n      \"key\": \"Condition\",\n      \"value\": \"New: A brand-new, unused, unopened, undamaged item in its original packaging (where packaging is ... \"\n    },\n    {\n      \"key\": \"Optical Drive\",\n      \"value\": \"DVD-RW\"\n    },\n    {\n      \"key\": \"Processor\",\n      \"value\": \"Intel Dual Core 1017U 1.60GHz\"\n    },\n    {\n      \"key\": \"Screen Size\",\n      \"value\": \"15.6 in\"\n    },\n    {\n      \"key\": \"Color\",\n      \"value\": \"Black\"\n    },\n    {\n      \"key\": \"RAM Size\",\n      \"value\": \"8 GB\"\n    },\n    {\n      \"key\": \"MPN\",\n      \"value\": \"PN 3521 15.6 Windows 7 Pro Laptop\"\n    },\n    {\n      \"key\": \"SSD Capacity\",\n      \"value\": \"128 GB\"\n    },\n    {\n      \"key\": \"Processor Speed\",\n      \"value\": \"1.60 GHz\"\n    },\n    {\n      \"key\": \"Brand\",\n      \"value\": \"Dell\"\n    },\n    {\n      \"key\": \"Series\",\n      \"value\": \"Inspiron\"\n    },\n    {\n      \"key\": \"Operating System Edition\",\n      \"value\": \"Windows 7 Professional\"\n    },\n    {\n      \"key\": \"Type\",\n      \"value\": \"Notebook/Laptop\"\n    },\n    {\n      \"key\": \"Release Year\",\n      \"value\": \"2022\"\n    },\n    {\n      \"key\": \"Maximum Resolution\",\n      \"value\": \"1366 x 768\"\n    },\n    {\n      \"key\": \"Connectivity\",\n      \"value\": \"HDMI\"\n    },\n    {\n 
     \"key\": \"Operating System\",\n      \"value\": \"Windows 7\"\n    },\n    {\n      \"key\": \"Features\",\n      \"value\": \"10/100 LAN Card, Bluetooth, Built-in Microphone, Built-in Webcam, Multi-Touch Trackpad, Optical Drive, Wi-Fi\"\n    },\n    {\n      \"key\": \"Hard Drive Capacity\",\n      \"value\": \"128 GB SSD Solid State Drive\"\n    },\n    {\n      \"key\": \"Storage Type\",\n      \"value\": \"SSD (Solid State Drive)\"\n    },\n    {\n      \"key\": \"UPC\",\n      \"value\": \"n/a\"\n    }\n  ],\n  \"parse_status_code\": 12005,\n  \"price\": 549,\n  \"title\": \"NEW DELL 15.6 INTEL 1017U 1.60GHz 8GB RAM 128GB SSD DVD-RW WINDOWS 7 PRO\"\n}\n```\n\nWriting parsing instructions with Custom Parser may seem daunting at\nfirst, but with a little practice, you’ll quickly pick it up. This guide\naims to provide you with the fundamentals of creating parsing\ninstructions, yet they highly depend on your target and the goal you’re\ntrying to achieve. Explore our in-depth [\u003cu\u003edocumentation\u003c/u\u003e](https://developers.oxylabs.io/scraper-apis/web-scraper-api/features/custom-parser)\nto find more about the functions and parameters of Custom Parser.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fcustom-parser-instructions","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foxylabs%2Fcustom-parser-instructions","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fcustom-parser-instructions/lists"}