{"id":25242667,"url":"https://github.com/samesimilar/mirrorxml","last_synced_at":"2025-10-25T14:42:23.508Z","repository":{"id":34067221,"uuid":"165474305","full_name":"samesimilar/MirrorXML","owner":"samesimilar","description":"A block-based, event-driven, API for parsing xml (and basic html).","archived":false,"fork":false,"pushed_at":"2025-01-08T19:45:32.000Z","size":4898,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-10T17:42:16.376Z","etag":null,"topics":["cocoapods","html","ios","macos","nsattributedstring","objective-c","osx","swift","xml"],"latest_commit_sha":null,"homepage":null,"language":"Objective-C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/samesimilar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-01-13T06:38:47.000Z","updated_at":"2025-01-08T19:44:17.000Z","dependencies_parsed_at":"2022-08-08T00:00:27.757Z","dependency_job_id":null,"html_url":"https://github.com/samesimilar/MirrorXML","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samesimilar%2FMirrorXML","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samesimilar%2FMirrorXML/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samesimilar%2FMirrorXML/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samesimilar%2FMirrorXML/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/samesimilar","download_url":"https://codeload.github.
com/samesimilar/MirrorXML/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238389198,"owners_count":19463743,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cocoapods","html","ios","macos","nsattributedstring","objective-c","osx","swift","xml"],"created_at":"2025-02-11T23:58:27.809Z","updated_at":"2025-10-25T14:42:18.483Z","avatar_url":"https://github.com/samesimilar.png","language":"Objective-C","readme":"# MirrorXML\n\nMirrorXML is a wrapper for libxml2's SAX (push) xml and html parsers. It's also a wrapper for libxml2's streamable XPath pattern matching functionality.\n\nBut those two things don't quite describe how these features work together in MirrorXML to make event-driven xml parsing easier. \n\nLet's put it another way: MirrorXML is a block-based, event-driven, API for parsing xml (and basic html).\n\nMirrorXML doesn't attempt to magically turn XML into Swift model objects; rather, it puts you in control while helping you create more easily maintainable, explicit, and well-structured code. \n\nAnd it also comes with a neat little customizable *html to NSAttributedString* API.\n\n## Example\n\nTo run the example iOS project, clone the repo, and run `pod install` from the Example directory first.\n\n## Requirements\n\nMirrorXML is written in Objective-C. It can be used in Swift and Objective-C projects.\n\nMirrorXML is compatible with iOS 9.0+ and macOS 10.11+ targets.\n\n## Installation\n\nMirrorXML is available through [CocoaPods](http://cocoapods.org). 
To install\nit, simply add the following line to your Podfile:\n\n```ruby\npod 'MirrorXML'\n```\n\n## I'm excited to start parsing XML data! How do I use this thing?\n\nHere's a basic example. Let's say we want to get the titles of all the items in an RSS document:\n\n```Swift\nvar titles = [String]() // array to store the titles\n\n// Create an MXMatch object with the XML path we want.\nlet titleMatch = try! MXMatch(path: \"/rss/channel/item/title\")\n// Create a block that will be called at the end of every title element.\ntitleMatch.exitHandler = {(elm) in\n    if let text = elm.text {\n        titles.append(text)\n    }    \n}\n\n// Give the MXMatch object to a parser instance and parse the data.\nlet xmlParser = MXParser(matches: [titleMatch])\nxmlParser.parseDataChunk(xmlData)\nxmlParser.dataFinished()\n```\nAfter parsing is finished, the `titles` array will contain all the titles found by the `MXMatch` object.\n\n`MXMatch` objects have different callback properties that can be assigned blocks. These blocks are called at appropriate points while the parser is reading the xml data.\n\nWhen a block is called at the beginning of an xml element, it can return temporary `MXMatch` objects that are used to parse the data within the current element. Since blocks are closures, they will retain references to any new objects you create in the same context. \n\nFor example, if we want to decode objects with several different properties that represent RSS items, we could do the following: \n\n```Swift\nvar items = [RSSItem]() // An array to store our RSS items\n\n// Create an MXMatch object with the XML path we want.\nlet itemMatch = try! 
MXMatch(path: \"/rss/channel/item\")\n\n// Create a block that will be called at the beginning of every item element.\nitemMatch.entryHandler = {(elm) in\n    // Create a new instance of the RSSItem class and add it to the array.\n    let thisRSSItem = RSSItem()\n    items.append(thisRSSItem)\n\n    // Create a temporary MXMatch object with a path that matches title elements.\n    // Note that the path is relative to the parent 'item' element.\n    let titleMatch = try! MXMatch(path: \"/title\")\n    titleMatch.exitHandler = { (elm) in\n        thisRSSItem.title = elm.text\n    }\n    // Similar idea for link elements.\n    let linkMatch = try! MXMatch(path: \"/link\")\n    linkMatch.exitHandler = { (elm) in\n        thisRSSItem.link = elm.text\n    }\n    // Return the temporary MXMatch objects. They will only apply to the current 'item' element.\n    return [titleMatch, linkMatch]\n}\n\n// Give the MXMatch object to a parser instance and parse the data.\nlet xmlParser = MXParser(matches: [itemMatch])\nxmlParser.parseDataChunk(xmlData)\nxmlParser.dataFinished()\n```\nThe common way to write an event-driven (push) xml parser (without using *MirrorXML*) would be to write a couple of functions. One function would be called every time a new element begins, and one function would be called every time an element ends. (This is how `NSXMLParser` and `NSXMLParserDelegate` work.)\n\nSince these functions are called for every element, at every level of the xml document structure, they won't naturally be aware of the context they are called in. Thus you have to keep track of lots of state, like `isInsideChannel` or `isInsideItem` or `currentRSSItem`. It can get messy since code that manages different types of data items is mixed together.\n\n*MirrorXML* simplifies everything because your code structure mirrors the structure of the XML document, and your callbacks are only activated for elements that they are interested in. 
As you can see above, all the code you need to build one of these hypothetical RSSItem objects is kept together exclusively in one place. And since we are using blocks, they implicitly keep references to their context, so we don't need some global variable like `currentRSSItem`.\n\n### Preserving State Between Entry and Exit Events\n\nYou may wish to pass state between the entry and exit handlers for an element. There are two ways to do this.\n\nThe first way is to use the `userInfo` property of the element. You can assign an arbitrary object to this property in the entry handler and it will be available in the element passed to the exit handler.\n\nThe second way is to define a special exit block in the entry handler itself. This is a closure that implicitly saves the context of the entry handler. You can create this special block object using `MXMatch.onRootExit(_:)`. It will be called when the end of the current element is parsed.\n\nFor example (building upon the previous example), let's say you decide that you only want to add your new temporary RSS item to the items array if it has a valid title and link. You can write the code as follows:\n\n```Swift\nlet itemMatch = try! MXMatch(path: \"/rss/channel/item\")\n\n// Create a block that will be called at the beginning of every item element.\nitemMatch.entryHandler = {(elm) in\n    // Create a new instance of the RSSItem class,\n    // but don't add it to the storage array until later.\n    let thisRSSItem = RSSItem()\n\n    let titleMatch = try! MXMatch(path: \"/title\")\n    titleMatch.exitHandler = { (elm) in\n        thisRSSItem.title = elm.text\n    }\n    let linkMatch = try! 
MXMatch(path: \"/link\")\n    linkMatch.exitHandler = { (elm) in\n        thisRSSItem.link = elm.text\n    }\n\n    // Only add the item if it is valid.\n    // This block will run after titleMatch and linkMatch.\n    // Note that in this circumstance we have a reference to the 'thisRSSItem' object we are building.\n    let itemExit = MXMatch.onRootExit({ (elm) in\n        guard thisRSSItem.title != nil \u0026\u0026 thisRSSItem.link != nil else {\n            return\n        }\n        items.append(thisRSSItem)\n    })\n    // Return the temporary MXMatch objects. They will only apply to the current 'item' element.\n    return [titleMatch, linkMatch, itemExit]\n}\n```\n\n\n### XPath-style Patterns\n\nThe previous examples use simple paths to match elements at particular places in the xml document. There are a few more advanced tricks you can do: \n\n`/root/item             --\u003e Match 'item' elements that are children of 'root'.`\n\n`/root/item/title       --\u003e Match 'title' elements that are children of 'item' that are children of 'root'.`\n\n`/root/item|/root/otheritem --\u003e OR operator: match either 'item' or 'otheritem' elements that are children of 'root'.`\n\n`/root/item/@attrName   --\u003e Match 'item' elements (that are children of 'root') that have an attribute named 'attrName'.`\n\n`/root/*                 --\u003e Match every element that is a child of root.`\n\n`/root//item            --\u003e Match 'item' elements that are at every level below root.`\n\n`/root//*                --\u003e Match every element at every level below root.`\n\n`//*                     --\u003e Match every element in the document.`\n\n`/root/ns:item          --\u003e The item element is specified with a namespace prefix. 
The 'ns' prefix is mapped to a full namespace URI in the namespaces dictionary parameter passed to the MXMatch object.`\n\nIf you are familiar with XPath syntax and semantics then this will look familiar, but be aware that more advanced aspects of XPath syntax (like fancy predicates) are not supported. This is because these paths must be 'streamable', i.e. we are evaluating these paths 'as we go'. \n\nHere's an html example. Let's say we wanted to get all the links in some html data:\n\n```Swift\nvar links = [String]()\n\n// Match every 'a' element.\nlet linkElement = try! MXMatch(path:\"//a\")\nlinkElement.entryHandler = {\n    // Please note that attributes are only available in 'entryHandler' blocks.\n    if let url = $0.attributes[\"href\"] {\n        links.append(url)\n    }\n    return nil\n}\n\nlet htmlParser = MXHTMLParser(matches: [linkElement])\nhtmlParser.parseDataChunk(htmlData)\nhtmlParser.dataFinished()\n```\n### Namespaces\n\nThere is a more advanced initializer for MXMatch that can handle namespaces:\n\n```Swift\nlet nameSpacedMatch = try! MXMatch(path: \"/rss/channel/item/georss:point\", namespaces: [\"georss\":\"http://www.georss.org/georss\"])\n```\nThe `namespaces` parameter takes a dictionary of prefix/URI pairs.\n\nNamespaced attributes can be retrieved via MXElement's `namespacedAttributes` property inside an entryHandler block.\n\n\n### Error Handling\n\nParsing errors that are reported by libxml are passed through the MirrorXML API using callback blocks in a similar way to element 'begin' and 'end' events. You assign a block to the `errorHandler` property of `MXMatch` that gets called any time there is an error encountered in any element that matches the `MXMatch` object's pattern. \n\nFor example, to create an error handler that is called if an error is encountered on any element:\n\n```Swift\nlet errorMatch = try! 
MXMatch(path:\"//*\")\nerrorMatch.errorHandler = { (error, elm) in\n    print(\"An error was encountered: \\(error.localizedDescription), \\(elm.elementName ?? \"Unknown\")\")\n}\n\n// assuming itemMatch and otherItemMatch were previously declared\nlet xmlParser = MXParser(matches:[errorMatch, itemMatch, otherItemMatch])\n// then parseDataChunk etc.\n```\n\n### Parsing Large Documents\n\nYou can call `parseDataChunk:` multiple times to parse a large document incrementally. For example, you can start parsing a large document while it is still downloading. \n\nI've made some efforts to keep memory usage constant within MirrorXML during xml parsing, but you can wrap `parseDataChunk:` in an autoreleasepool block if you see lots of temporary objects building up during multiple calls.\n\n### Converting HTML to NSAttributedString\n\nMirrorXML also includes a class called `MXHTMLToAttributedString`. You can give it snippets of html or complete html documents to convert. It's built on top of `MXHTMLParser`.\n\nThe advantages of this over NSAttributedString's html-\u003estring conversion method are:\n\n1. You can customize the styling during parsing using a delegate.\n\n2. You can use this on any thread.\n\n3. It seems to be faster (don't call me or anything if it's not).\n\nIt only handles basic 'Markdown-style' html tags, links and images. It doesn't handle scripts or stylesheets or anything fancy like that.\n\nAssign an object to the `MXHTMLToAttributedStringDelegate` delegate property to customize the font and paragraph attributes of the resulting text.\n\nAn instance of `MXHTMLToAttributedStringDelegateDefault`, which has many customizable properties, is assigned to the delegate property by default.\n\nlibxml's html parser is not strict, so any errors that are encountered are not necessarily fatal. 
After you convert a string you can check the converter's `errors` property for any errors that were reported during parsing.\n\nIt doesn't necessarily require the input to be a full 'html' structured document with stuff like 'head' and 'body' - so you can parse a simple string with a few tags into an attributed string, e.g. `\u003ca href=\"mailto:support@example.com\"\u003eClick here\u003c/a\u003e to \u003cb\u003econtact support.\u003c/b\u003e` (Note: if you want links to be active inside something like a UILabel, make sure to enable user interaction with the UILabel.)\n\nIf image tags are encountered, a placeholder is inserted, and you can replace that with the required image later using `+insertImage:withInfo:toString`.\n\nExample: \n\n```Swift\nlet htmlString = \"\u003ca href=\\\"mailto:support@example.com\\\"\u003eClick here\u003c/a\u003e to \u003cb\u003econtact support.\u003c/b\u003e\"\nlet string = MXHTMLToAttributedString().convertHTMLString(htmlString)\n```\n\nThis is a bit experimental and is not guaranteed to produce text like a real browser would. It's best to use on data sets that you have some control over rather than arbitrary data from the web. If you need something more robust it's better to use a web view.\n\n### Thread Safety\n\nYou can use MXParser, MXHTMLParser, MXMatch, MXPattern, and MXHTMLToAttributedString on any thread, but don't access the same instance from more than one thread. \n\nA common scenario would be to use MXHTMLToAttributedString on a background thread and then pass the resulting attributed string back to the main thread to show in a text view or label. Another common scenario would be to parse xml data into your model objects on a background thread.\n\n### Style\n\nMirrorXML works with highly hierarchical callback code built using lots of blocks and layers, but it also works with very flat code with only a few blocks - something more like a standard NSXMLParserDelegate-style deal. 
It's up to you!\n\n### Further Reading\n\nCheck out the included example project to see some more advanced ways to use MirrorXML. I'd also recommend looking at the included unit tests, and maybe the implementation of the MXHTMLToAttributedString class.\n\n## Author\n\nMike Spears,  samesimilar@gmail.com\n\n## License\n\nMirrorXML is available under the MIT license. See the LICENSE file for more info.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamesimilar%2Fmirrorxml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsamesimilar%2Fmirrorxml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamesimilar%2Fmirrorxml/lists"}