# pygixml

<img src="https://github.com/MohammadRaziei/pygixml/raw/master/docs/images/pygixml.svg" width="450" />

[![Python Version](https://img.shields.io/badge/python-3.8%2B-blue)](https://www.python.org/)
[![PyPI version](https://img.shields.io/pypi/v/pygixml.svg)](https://pypi.org/project/pygixml/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://github.com/MohammadRaziei/pygixml/actions/workflows/wheels.yml/badge.svg)](https://github.com/MohammadRaziei/pygixml/actions)
[![Documentation Status](https://github.com/MohammadRaziei/pygixml/actions/workflows/docs.yml/badge.svg)](https://mohammadraziei.github.io/pygixml/)
[![GitHub Stars](https://img.shields.io/github/stars/MohammadRaziei/pygixml?style=social)](https://github.com/MohammadRaziei/pygixml)

A high-performance XML library for Python, built with Cython on top of [pugixml](https://pugixml.org/). It provides fast XML parsing, tree manipulation, XPath queries, text extraction, and serialization.

📚 **[View Full Documentation](https://mohammadraziei.github.io/pygixml/)**

## 🚀 Performance

In the project's parsing benchmark, pygixml is considerably faster than both lxml and the standard library's ElementTree:

### Performance Comparison (5000 XML elements)

| Library         | Parsing Time | Speedup vs ElementTree |
|-----------------|--------------|------------------------|
| **pygixml**     | 0.00077s     | **15.9x faster**       |
| **lxml**        | 0.00407s     | 3.0x faster            |
| **ElementTree** | 0.01220s     | 1.0x (baseline)        |

![Performance Comparison](https://github.com/MohammadRaziei/pygixml/raw/master/benchmarks/results/performance_comparison.svg)

### Key Performance Highlights

- **15.9x faster** than Python's ElementTree for XML parsing
- **5.3x faster** than lxml for XML parsing
- **Memory efficient** - uses pugixml's optimized C++ memory management
- **Scalable performance** - maintains its speed advantage across different XML sizes

## Installation

### From PyPI
```bash
pip install pygixml
```

### From GitHub
```bash
pip install git+https://github.com/MohammadRaziei/pygixml.git
```

## API Overview

### Core Classes

- **XMLDocument**: Create, parse, and save XML documents
- **XMLNode**: Navigate and manipulate XML nodes
- **XMLAttribute**: Handle XML attributes
- **XPathQuery**: Compile and execute XPath queries
- **XPathNode**: Result of an XPath query (wraps a node or an attribute)
- **XPathNodeSet**: Collection of XPath results

### Key Methods

#### XMLDocument Methods
- `parse_string(xml_string)` - Parse XML from a string
- `parse_file(file_path)` - Parse XML from a file
- `save_file(file_path)` - Save XML to a file
- `append_child(name)` - Add a child node
- `first_child()` - Get the first child node
- `child(name)` - Get a child by name
- `reset()` - Clear the document

#### XMLNode Methods
- `name` - Get/set the node name
- `value` - Get/set the node value (text nodes only)
- `child_value(name)` - Get the text content of a child node
- `append_child(name)` - Add a child node
- `first_child()` - Get the first child
- `child(name)` - Get a child by name
- `next_sibling` - Get the next sibling
- `previous_sibling` - Get the previous sibling
- `parent` - Get the parent node
- `text(recursive, join)` - Get text content
- `to_string(indent)` - Serialize to an XML string
- `xml` - XML representation property
- `xpath` - Absolute XPath of the node
- `is_null()` - Check whether the node is null
- `mem_id` - Memory identifier for debugging

#### XPath Methods
- `select_nodes(query)` - Select multiple nodes using XPath
- `select_node(query)` - Select a single node using XPath
- `XPathQuery(query)` - Create a reusable XPath query object
- `evaluate_node_set(context)` - Evaluate the query and return a node set
- `evaluate_node(context)` - Evaluate the query and return the first node
- `evaluate_boolean(context)` - Evaluate the query and return a boolean
- `evaluate_number(context)` - Evaluate the query and return a number
- `evaluate_string(context)` - Evaluate the query and return a string

## Quick Start

```python
import pygixml

# Parse XML from string
xml_string = """
<library>
    <book id="1">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
    </book>
</library>
"""

doc = pygixml.parse_string(xml_string)
root = doc.first_child()

# Access elements
book = root.first_child()
title = book.child("title")
print(f"Title: {title.child_value()}")  # Output: Title: The Great Gatsby

# Create new XML
doc = pygixml.XMLDocument()
root = doc.append_child("catalog")
product = root.append_child("product")
product.name = "product"  # The name property is writable (same name reassigned here)

# To add text content to an element, append a text node
text_node = product.append_child("")  # Empty name creates text node
text_node.value = "content"
```

## Advanced Features

### Text Content Extraction

```python
import pygixml

xml_string = """
<root>
    <simple>Hello World</simple>
    <nested>
        <child>Child Text</child>
        More text
    </nested>
    <mixed>Text <b>with</b> mixed <i>content</i></mixed>
</root>
"""

doc = pygixml.parse_string(xml_string)
root = doc.first_child()

# Get direct text content
simple = root.child("simple")
print(simple.child_value())  # "Hello World"

# Get recursive text content
nested = root.child("nested")
print(nested.text(recursive=True))  # "Child Text\nMore text"

# Get direct text only (non-recursive)
mixed = root.child("mixed")
print(mixed.text(recursive=False))  # "Text "

# Custom separator between text pieces
print(nested.text(recursive=True, join=" | "))  # "Child Text | More text"
```

### XML Serialization

```python
import pygixml

doc = pygixml.XMLDocument()
root = doc.append_child("root")
child = root.append_child("item")
child.name = "product"  # Rename the node from "item" to "product"

# Serialize to string
print(root.to_string())  # <root>\n  <product/>\n</root>
print(root.to_string("    "))  # Custom indentation

# Convenience property
print(root.xml)  # Same as to_string() with default indent
```

### Node Iteration

```python
import pygixml

xml_string = """
<root>
    <item>First</item>
    <item>Second</item>
    <item>Third</item>
</root>
"""

doc = pygixml.parse_string(xml_string)

# Iterate over document (depth-first)
for node in doc:
    print(f"Node: {node.name}, XPath: {node.xpath}")

# Iterate over children
root = doc.first_child()
for child in root:
    print(f"Child: {child.name}, Value: {child.child_value()}")
```

### Node Comparison and Identity

```python
import pygixml

doc = pygixml.parse_string("<root><a/><b/></root>")
root = doc.first_child()
a = root.child("a")
b = root.child("b")
a2 = root.child("a")

print(a == a2)  # True - same node
print(a == b)   # False - different nodes
print(a.mem_id) # Memory address for debugging
```

## XPath Support

pygixml exposes pugixml's XPath engine, which implements an almost complete subset of XPath 1.0:

```python
import pygixml

xml_string = """
<library>
    <book id="1" category="fiction">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
        <price>12.99</price>
    </book>
    <book id="2" category="fiction">
        <title>1984</title>
        <author>George Orwell</author>
        <year>1949</year>
        <price>10.99</price>
    </book>
</library>
"""

doc = pygixml.parse_string(xml_string)
root = doc.first_child()

# Select all books
books = root.select_nodes("book")
print(f"Found {len(books)} books")

# Select fiction books
fiction_books = root.select_nodes("book[@category='fiction']")
print(f"Found {len(fiction_books)} fiction books")

# Select specific book by ID
book_2 = root.select_node("book[@id='2']")
if book_2:
    title = book_2.node.child("title").child_value()
    print(f"Book ID 2: {title}")

# Use XPathQuery for repeated queries
query = pygixml.XPathQuery("book[year > 1930]")
recent_books = query.evaluate_node_set(root)
print(f"Found {len(recent_books)} books published after 1930")

# XPath boolean evaluation
has_orwell = pygixml.XPathQuery("book[author='George Orwell']").evaluate_boolean(root)
print(f"Has George Orwell books: {has_orwell}")

# XPath number evaluation
avg_price = pygixml.XPathQuery("sum(book/price) div count(book)").evaluate_number(root)
print(f"Average price: ${avg_price:.2f}")
```

### Supported XPath Features

- **Node selection**: `//book`, `/library/book`, `book[1]`
- **Attribute selection**: `book[@id]`, `book[@category='fiction']`
- **Boolean operations**: `and`, `or`, `not()`
- **Comparison operators**: `=`, `!=`, `<`, `>`, `<=`, `>=`
- **Mathematical operations**: `+`, `-`, `*`, `div`, `mod`
- **Functions**: `position()`, `last()`, `count()`, `sum()`, `string()`, `number()`
- **Axes**: `child::`, `attribute::`, `descendant::`, `ancestor::`
- **Wildcards**: `*`, `@*`, `node()`

## Important Note: Element Nodes vs Text Nodes

In pugixml (and therefore pygixml), **element nodes do not have values directly**. Instead, they contain child text nodes that hold the text content.

```python
import pygixml

doc = pygixml.XMLDocument()
element_node = doc.append_child("note")

# ❌ This will NOT work (element nodes don't have values):
element_node.value = "some text"

# ✅ Correct approach - use child_value() to get text content:
text_content = element_node.child_value()

# ✅ To set text content, you need to append a text node:
text_node = element_node.append_child("")  # Empty name creates text node
text_node.value = "some text"
```

## Benchmarks

Run the performance comparisons:

```bash
# Run complete benchmark suite
python benchmarks/clean_visualization.py

# View results
cat benchmarks/results/benchmark_results.csv
```

The benchmark suite compares pygixml against:
- **lxml** - Industry-standard C-based parser
- **xml.etree.ElementTree** - Python standard library

**Benchmark Files:**
- `benchmarks/clean_visualization.py` - Main benchmark runner
- `benchmarks/benchmark_parsing.py` - Core benchmark logic
- `benchmarks/results/` - Generated CSV data and SVG charts

## Documentation

📖 **Full documentation** is available at: [https://mohammadraziei.github.io/pygixml/](https://mohammadraziei.github.io/pygixml/)

The documentation includes:
- Complete API reference with examples
- Installation guides for all platforms
- Performance benchmarks and optimization tips
- XPath 1.0 usage guide with comprehensive examples
- Real-world usage scenarios

## License

MIT License - see [LICENSE](LICENSE) file for details.

**To use this library, you must star the project on GitHub!**

This helps support the development and shows appreciation for the work. Please star the repository before using the library:

👉 **[Star pygixml on GitHub](https://github.com/MohammadRaziei/pygixml)**

## Acknowledgments

- [pugixml](https://pugixml.org/) - Fast and lightweight C++ XML processing library
- [Cython](https://cython.org/) - C extensions for Python
- [scikit-build](https://scikit-build.readthedocs.io/) - Modern Python build system
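## Reproducing the Speedup Column

In the performance table, the "Speedup vs ElementTree" column is the ElementTree parsing time divided by each library's parsing time. The sketch below shows that calculation alongside a minimal timing loop that uses only the standard library. It is not the project's benchmark suite (that lives in `benchmarks/benchmark_parsing.py`). The document shape produced by the hypothetical `make_xml` helper is an assumption, and lxml and pygixml are timed only if they are installed.

```python
import importlib.util
import timeit
import xml.etree.ElementTree as ET


def make_xml(n: int) -> str:
    """Build a synthetic document with n <item> elements (assumed shape)."""
    items = "".join(f'<item id="{i}"><name>Item {i}</name></item>' for i in range(n))
    return f"<root>{items}</root>"


def time_parser(parse, xml_text: str, repeat: int = 5) -> float:
    """Best-of-`repeat` wall-clock time, in seconds, for a single parse."""
    return min(timeit.repeat(lambda: parse(xml_text), number=1, repeat=repeat))


def speedups(timings: dict, baseline: str = "ElementTree") -> dict:
    """Speedup of each library relative to the baseline: baseline_time / lib_time."""
    base = timings[baseline]
    return {name: base / t for name, t in timings.items()}


def run(n: int = 5000) -> dict:
    xml_text = make_xml(n)
    parsers = {"ElementTree": ET.fromstring}
    # Third-party parsers are optional and only timed when importable.
    if importlib.util.find_spec("lxml") is not None:
        from lxml import etree
        parsers["lxml"] = lambda s: etree.fromstring(s.encode())
    if importlib.util.find_spec("pygixml") is not None:
        import pygixml
        parsers["pygixml"] = pygixml.parse_string
    return {name: time_parser(p, xml_text) for name, p in parsers.items()}


if __name__ == "__main__":
    timings = run()
    for name, factor in sorted(speedups(timings).items(), key=lambda kv: -kv[1]):
        print(f"{name:12s} {timings[name]:.5f}s  {factor:.1f}x")
```

With the rounded values from the table, the ratios are 0.01220 / 0.00407 ≈ 3.0 for lxml and 0.01220 / 0.00077 ≈ 15.8 for pygixml. The published 15.9x presumably comes from unrounded timings. Absolute times will vary with hardware and document shape.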
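## Approximating Recursive Text Extraction

The Text Content Extraction example uses `text(recursive=True, join=...)` to gather the text under a node and join the pieces with a separator. To make the expected outputs concrete, here is a standard-library approximation built on ElementTree's `itertext()`. The helper `collect_text` is hypothetical. It assumes that whitespace-only pieces are dropped and the remaining pieces are stripped, which matches the documented outputs, but pygixml's own implementation may handle whitespace differently.

```python
import xml.etree.ElementTree as ET

XML = """
<root>
    <simple>Hello World</simple>
    <nested>
        <child>Child Text</child>
        More text
    </nested>
</root>
"""


def collect_text(elem: ET.Element, join: str = "\n") -> str:
    """Hypothetical approximation of a recursive text(): strip every text
    piece under `elem`, drop whitespace-only pieces, and join the rest."""
    pieces = (piece.strip() for piece in elem.itertext())
    return join.join(p for p in pieces if p)


root = ET.fromstring(XML)
nested = root.find("nested")
print(repr(collect_text(nested)))         # 'Child Text\nMore text'
print(collect_text(nested, join=" | "))   # Child Text | More text
print(collect_text(root.find("simple")))  # Hello World
```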
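## Cross-Checking the XPath Example

For the two-book library in the XPath Support example, the queries should find 2 books and 2 fiction books, return the title "1984" for `id='2'`, find one book published after 1930, report `True` for the Orwell check, and give an average price of (12.99 + 10.99) / 2 = 11.99. The standard-library sketch below reproduces those values with ElementTree, so you can check expectations without installing anything. ElementTree understands only a small XPath subset, so the numeric predicate and `sum()`/`count()` are computed in Python. This is a comparison aid, not pygixml code.

```python
import xml.etree.ElementTree as ET

XML = """
<library>
    <book id="1" category="fiction">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
        <price>12.99</price>
    </book>
    <book id="2" category="fiction">
        <title>1984</title>
        <author>George Orwell</author>
        <year>1949</year>
        <price>10.99</price>
    </book>
</library>
"""

root = ET.fromstring(XML)

# These paths fall inside ElementTree's supported XPath subset.
books = root.findall("book")
fiction = root.findall("book[@category='fiction']")
title_2 = root.find("book[@id='2']/title").text
has_orwell = root.find("book[author='George Orwell']") is not None

# ElementTree has no numeric comparisons or sum()/count(), so these
# XPath expressions are evaluated in Python instead:
#   book[year > 1930]               -> filter on int(year) > 1930
#   sum(book/price) div count(book) -> total price / number of books
recent = [b for b in books if int(b.findtext("year")) > 1930]
avg_price = sum(float(b.findtext("price")) for b in books) / len(books)

print(len(books), len(fiction), title_2)  # 2 2 1984
print(len(recent), has_orwell)            # 1 True
print(f"{avg_price:.2f}")                 # 11.99
```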
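## The Element/Text Node Split in the Standard Library

The note on element nodes versus text nodes describes a DOM-style model in which an element's text lives in a separate child text node. Python's built-in `xml.dom.minidom` follows the same model, so the sketch below lets you see the distinction without installing pygixml. It is an analogy only: `nodeValue`, `createTextNode`, and `appendChild` are minidom names, not pygixml APIs.

```python
from xml.dom import minidom

doc = minidom.parseString("<product>content</product>")
product = doc.documentElement

# An element node has no value of its own; the text is held by a child Text node.
print(product.nodeValue)                                 # None
print(product.firstChild.nodeType == product.TEXT_NODE)  # True
print(product.firstChild.data)                           # content

# Setting text means appending a text node, mirroring the pygixml pattern.
note = doc.createElement("note")
note.appendChild(doc.createTextNode("some text"))
product.appendChild(note)
print(product.toxml())  # <product>content<note>some text</note></product>
```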