{"id":16703238,"url":"https://github.com/matehat/mhtextsearch","last_synced_at":"2025-10-12T09:39:07.561Z","repository":{"id":13240948,"uuid":"15925709","full_name":"matehat/MHTextSearch","owner":"matehat","description":"A fast full-text search library for Objective-C","archived":false,"fork":false,"pushed_at":"2017-04-20T10:07:48.000Z","size":463,"stargazers_count":77,"open_issues_count":2,"forks_count":6,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-04-29T21:06:14.984Z","etag":null,"topics":["index","leveldb","objective-c-library","search-engine"],"latest_commit_sha":null,"homepage":"","language":"Objective-C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/matehat.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-01-15T05:16:36.000Z","updated_at":"2023-12-18T18:30:19.000Z","dependencies_parsed_at":"2022-09-16T23:40:29.655Z","dependency_job_id":null,"html_url":"https://github.com/matehat/MHTextSearch","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matehat%2FMHTextSearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matehat%2FMHTextSearch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matehat%2FMHTextSearch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matehat%2FMHTextSearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/matehat","download_url":"https://codeload.github.com/matehat/MHTextSearch/tar.gz/refs/heads/master","host":{"name":"GitHub
","url":"https://github.com","kind":"github","repositories_count":243835942,"owners_count":20355611,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["index","leveldb","objective-c-library","search-engine"],"created_at":"2024-10-12T19:07:32.225Z","updated_at":"2025-10-12T09:39:02.542Z","avatar_url":"https://github.com/matehat.png","language":"Objective-C","funding_links":[],"categories":[],"sub_categories":[],"readme":"## MHTextSearch\n\nA fast \u0026 minimal embedded full-text indexing library, written in Objective-C, built on top of [Objective-LevelDB][2].\n\n### Installation\n\nBy far, the easiest way to integrate this library in your project is by using [CocoaPods][1].\n\n1. Have [CocoaPods][1] installed, if you don't already\n\n2. In your Podfile, add the line\n\n        pod 'MHTextSearch'\n\n3. Run `pod install`\n\n4. 
Add the `libc++.dylib` Framework to your project.\n\n### Simple API\n\n##### Create an embedded textual index \n\n```objective-c\nMHTextIndex *index = [MHTextIndex textIndexInLibraryWithName:@\"my.awesome.index\"];\n```\n\n##### Index any objects\n\nYou can tell a `MHTextIndex` instance to index your objects (any object)\n\n```objective-c\n[index indexObject:anyObjectYouWant];\n[index updateIndexForObject:anotherPreviousIndexedObject];\n[index removeIndexForObject:anotherPreviousIndexedObject];\n```\n\nBut for this to work, you need to tell us which *identifier*, as an `NSData *`, can be used to \nuniquely refer to this object.\n\n```objective-c \n[index setIdentifier:^NSData *(MyCustomObject *object){\n    return object.indexID; // an NSData instance\n}];\n```\n\nYou also need to give us details about the objects, such as which pieces of text to\nindex\n\n```objective-c \n[index setIndexer:^MHIndexedObject *(MyCustomObject *object, NSData *identifier){\n    MHIndexedObject *indx = [MHIndexedObject new];\n    indx.strings = @[ object.title, object.description ]; // Indexed strings\n    indx.weight = object.awesomenessLevel;                // Weight given to this object, when sorting results\n    indx.context = @{@\"title\": object.title};             // An NSDictionary that will be given alongside search results\n    return indx;\n}];\n```\n\nFinally, if you want an easy reference to your original object when you get\nsearch results, you can tell us how to do that for you\n\n```objective-c \n[index setObjectGetter:^MyCustomObject *(NSData *identifier){\n    return [MyCustomObject customObjectFromIdentifier:identifier];\n}];\n```\n\nand **that's it!** That's all you need to get a full-text index going. MHTextSearch takes care of\nsplitting text into words, factoring out diacritics and capitalization, all with respect to locale, as you'd expect \n(well, Foundation does most of the job here). 
\n\nYou can then start searching:\n\n```objective-c \n[index enumerateObjectsForKeyword:@\"duck\" options:0 withBlock:^(MHSearchResultItem *item, \n                                                                NSUInteger rank, \n                                                                NSUInteger count, \n                                                                BOOL *stop){\n                                                                    \n    item.weight;      // As provided by you earlier\n    item.rank;        // The effective rank in the search result\n    item.object;      // The first time it is used, it will use the block\n                            // you provided earlier to get the object\n    item.context;     // The dictionary you provided in the \"indexer\" block\n    item.identifier;  // The object identifier you provided in the \"identifier\" block\n\n    NSIndexPath *token = item.resultTokens[0]; \n    /* This is an NSArray of NSIndexPath instances, each containing 3 indices:\n     *   - mh_string : the string in which the token occurred \n     *                 (here, 0 for the object's title)\n     *   - mh_word : the position in the string where the word containing\n     *               the token occurred\n     *   - mh_token : the position in the word where the token occurred\n     */\n                        \n    NSRange tokenRange = [item rangeOfToken:token];\n    /* This gives the exact range of the matched token in the string where it was found.\n     *\n     * So, according to the example setup I've been giving from the start,  \n     * if token.mh_string == 0, that means the token was found in the object's \"title\",\n     * and [item.object.title substringWithRange:tokenRange] would yield \"duck\" (minus \n     * capitalization and diacritics).\n     */\n}];\n```\n\nYou can also fetch the whole array of `MHSearchResultItem` instances at once using\n\n```objective-c\nNSArray *resultSet = [index 
searchResultForKeyword:@\"duck\"\n                                           options:NSEnumerationReverse];\n```\n\n##### Subclassing\n\nIf giving blocks for specifying behavior is not your thing, you can also override the following methods:\n\n* `-[MHTextIndex getIdentifierForObject:]` which, by default, uses the `identifier` block\n* `-[MHTextIndex getIndexInfoForObject:andIdentifier:]` which, by default, uses the `indexer` block\n* `-[MHTextIndex compareResultItem:withItem:reversed:]` which is used to order the search result set\n\n### Using with Core Data\n\nYou can use `NSManagedObject` lifecycle methods to trigger changes to the text index. The following example\nwas taken from\nhttp://www.adevelopingstory.com/blog/2013/04/adding-full-text-search-to-core-data.html and adapted for use with\nthis project:\n\n```objective-c\n\n- (void)prepareForDeletion\n{\n    [super prepareForDeletion];\n\n    if (self.indexID.length) {\n        [textindex removeIndexForObject:self.indexID];\n    }\n}\n\n+ (NSData *)createIndexID {\n    NSUUID *uuid = [NSUUID UUID];\n    uuid_t uuidBytes;\n    [uuid getUUIDBytes:uuidBytes];\n    return [NSData dataWithBytes:uuidBytes length:16];\n}\n\n- (void)willSave\n{\n    [super willSave];\n\n    if (self.indexID.length) {\n        [textindex updateIndexForObject:self.indexID];\n    } else {\n        self.indexID = [[self class] createIndexID];\n        [textindex indexObject:self.indexID];\n    }\n}\n```\n\n### Operations \u0026 Queues\n\n`MHTextIndex` uses a `NSOperationQueue` under the hood to coordinate indexing operations. It is \nexposed as a property named `indexingQueue`. You can thus set its `maxConcurrentOperationCount`\nproperty to control how concurrent the indexing can be. Since the underlying database library\nperforming I/O is [thread-safe][3], concurrency is not a problem. 
This also means you can explicitly\nwait for indexing operations to finish using:\n  \n```objective-c\n[index.indexingQueue waitUntilAllOperationsAreFinished];\n```\n\nThe three indexing methods `-[MHTextIndex indexObject:]`, `-[MHTextIndex updateIndexForObject:]`, \n`-[MHTextIndex removeIndexForObject:]` all return `NSOperation` instances that you can take advantage of,\nif needed, using their `completionBlock` property or the `-[NSOperation waitUntilFinished]` method.\n \nSearching is also concurrent, but it uses a `dispatch_queue_t` (not yet exposed or tunable).\n\n### Performance \u0026 Fine tuning\n\nThere are a few knobs you can play with to make MHTextSearch better fit your needs. \n\n- A `MHTextIndex` instance has a `skipStopWords` boolean property that is true by default and\n  that avoids indexing very common English words. (**TODO**: make that work with other languages)\n\n- It also has a `minimalTokenLength` that is equal to `2` by default. This sets the minimum\n  number of letters a token needs to have for it to be indexed. This also greatly minimizes\n  the size of the index, as well as the indexing and searching time. It skips indexing single-letter\n  words and the last letter of every word, when set to `2`.\n  \n- When indexing long form texts (documents rather than, say, simple names), you can turn on the\n  `discardDuplicateTokens` boolean property on `MHTextIndex`. This makes the index only consider\n  the first occurrence of every indexed token for a given piece of text. If you are okay with \n  only knowing **if** a token appears in a text, rather than **where** every occurrence appears,\n  you can speed up *indexing* by a factor of 3 to 5.\n\nThe following graphs show the indexing and searching time (in seconds), as a function of the size\nof text indexed, ranging from 500 KB to about 10 MB. 
The benchmarks were run on an iPhone 5.\n\n![](https://raw.github.com/matehat/MHTextSearch/master/MHTextSearch%20iOS%20Tests/benchmark.png)\n\n### Testing\n\nIf you want to run the tests, you will need Xcode 5, as the test suite uses the new XCTest. \n\nClone this repository and, once in it,\n\n```bash\n$ cd MHTextSearch\\ iOS\\ Tests\n$ pod install\n$ cd .. \u0026\u0026 open *.xcworkspace\n```\n\nCurrently, all tests are set up to work with the iOS test suite.\n\n### License\n\nDistributed under the [MIT license](LICENSE)\n\n[1]: http://cocoapods.org\n[2]: https://github.com/matehat/Objective-LevelDB\n[3]: https://github.com/matehat/Objective-LevelDB#concurrency\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatehat%2Fmhtextsearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmatehat%2Fmhtextsearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatehat%2Fmhtextsearch/lists"}