{"id":20768038,"url":"https://github.com/spithash/xml-extractor","last_synced_at":"2025-08-31T22:35:07.889Z","repository":{"id":178243408,"uuid":"596180124","full_name":"spithash/XML-Extractor","owner":"spithash","description":"An XML extractor for products matching specific elements using regular expressions written in Python.","archived":false,"fork":false,"pushed_at":"2023-02-13T10:16:40.000Z","size":31,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-11T19:34:39.484Z","etag":null,"topics":["python","wpallimport","xml","xml-extractor","xml-parser"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spithash.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-01T16:29:44.000Z","updated_at":"2023-02-07T18:38:28.000Z","dependencies_parsed_at":null,"dependency_job_id":"318164ef-08d4-42bf-b65c-d4ba309ebe6e","html_url":"https://github.com/spithash/XML-Extractor","commit_stats":null,"previous_names":["spithash/xml-extractor"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spithash%2FXML-Extractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spithash%2FXML-Extractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spithash%2FXML-Extractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spithash%2FXML-Extractor/m
anifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spithash","download_url":"https://codeload.github.com/spithash/XML-Extractor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spithash%2FXML-Extractor/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259280711,"owners_count":22833434,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python","wpallimport","xml","xml-extractor","xml-parser"],"created_at":"2024-11-17T11:34:36.787Z","updated_at":"2025-06-11T14:36:30.967Z","avatar_url":"https://github.com/spithash.png","language":"Python","readme":"# XML-Extractor\nAn XML extractor for products matching specific elements using regular expressions written in Python. 
A progress bar is shown while the XML is being fetched.\n\n## WHY?\nI use this script to extract products matching certain categories from an XML URL containing thousands of products, and to write only the ones I select to another file.\n\nUse it to create custom (category-specific) XMLs and import the products of 'output.xml' in WpAllImport.\n\n## USAGE\nChange the selector values: pick an element (change it to match yours), for example __\u003clevel3_category_description\u003e__, and match products belonging to a category by editing the value of the 'desired_category' variable in the lines below:\n```\nselectorprefix = \"\u003clevel3_category_description\u003e\"\nselectorsuffix = \"\u003c/level3_category_description\u003e\"\n```\nand\n```\ndesired_category = re.compile(\"My category name.*\")\n```\nlevel3_category_description is an element of a **```\u003cproduct\u003e```** entry. Selecting that element and changing the ```desired_category``` pattern (a regular expression) __selects__ the product category. 
\n\nSo if you want to select another category, change ```desired_category = re.compile(\"Smartphones.*\")``` to match your selection.\n\nYou should also set the ```output_file_name``` variable to the name of your output file, because an existing file with that name will be overwritten.\n\n### Example product entry\n```\n\u003centry\u003e\n  \u003ccode\u003e22301\u003c/code\u003e\n  \u003cPerItemBarCode\u003e22929\u003c/PerItemBarCode\u003e\n  \u003cMUCode\u003eΤΕΜ\u003c/MUCode\u003e\n  \u003cname\u003eProduct Name\u003c/name\u003e\n  \u003cdescription\u003eProduct Description\u003c/description\u003e\n  \u003cimage\u003ehttps://example.com/photos/e7207152345c.jpg\u003c/image\u003e\n  \u003clevel3_category_description\u003eSmartphones\u003c/level3_category_description\u003e\n  \u003cpricing_category\u003e147-2\u003c/pricing_category\u003e\n  \u003cquantity_mode_value\u003e10\u003c/quantity_mode_value\u003e\n  \u003cavailability\u003eout of stock\u003c/availability\u003e\n  \u003cprice\u003e2.52\u003c/price\u003e\n  \u003crecommended_retail_price_with_vat\u003e2.03\u003c/recommended_retail_price_with_vat\u003e\n  \u003crecommended_retail_price_no_vat\u003e1.64\u003c/recommended_retail_price_no_vat\u003e\n\u003c/entry\u003e\n```\n\n### OUTPUT\nThe script writes the matching products (or rather entries) to an XML file called **output.xml**.\n\nEnjoy :)\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspithash%2Fxml-extractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspithash%2Fxml-extractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspithash%2Fxml-extractor/lists"}