{"id":21413082,"url":"https://github.com/rixrix/silverstripe-lucene","last_synced_at":"2025-10-25T19:32:11.192Z","repository":{"id":66584645,"uuid":"1653186","full_name":"rixrix/silverstripe-lucene","owner":"rixrix","description":"This is a fork of Silverstripe Lucene plugin that is hosted at Google Code","archived":false,"fork":false,"pushed_at":"2011-04-23T10:36:34.000Z","size":398,"stargazers_count":5,"open_issues_count":0,"forks_count":11,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-04-07T23:01:55.122Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://code.google.com/p/lucene-silverstripe-plugin/","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rixrix.png","metadata":{"files":{"readme":"README","changelog":"CHANGELOG","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2011-04-23T10:19:58.000Z","updated_at":"2016-09-22T03:10:24.000Z","dependencies_parsed_at":"2023-02-20T07:15:31.708Z","dependency_job_id":null,"html_url":"https://github.com/rixrix/silverstripe-lucene","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/rixrix/silverstripe-lucene","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rixrix%2Fsilverstripe-lucene","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rixrix%2Fsilverstripe-lucene/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rixrix%2Fsilverstripe-lucene/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rixrix%2Fsilverstripe-lucene/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rixrix"
,"download_url":"https://codeload.github.com/rixrix/silverstripe-lucene/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rixrix%2Fsilverstripe-lucene/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265237695,"owners_count":23732514,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-22T18:16:46.432Z","updated_at":"2025-10-25T19:32:11.121Z","avatar_url":"https://github.com/rixrix.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"###############################################\nLucene plugin for SilverStripe 2.4\n###############################################\n\nThis plugin for the SilverStripe framework allows you to harness the power of \nthe Lucene search engine on your site.\n\nUsing a variety of tools, you can also search PDF, Word, Excel, Powerpoint and \nplain text files.\n\nIt is easy to set up and use.\n\nThis plugin uses Zend_Search_Lucene from Zend, StandardAnalyzer by Kenny \nKatzgrau, and pdf-to-text by Joeri Stegeman for PDF scanning.\n\nZend_Search_Lucene is a PHP port of the Apache project's Lucene search engine.\n\nThis extension is inspired by the wpSearch plugin for WordPress.\nhttp://codefury.net/projects/wpSearch/\n\n\nMaintainer Contact\n-----------------------------------------------\nDarren Inwood\n\u003cdarren (dot) inwood (at) chrometoaster (dot) com\u003e\n\n\nRequirements\n-----------------------------------------------\nSilverStripe 2.4 or newer\n'Queued Jobs' module\n\nThis module is currently 
only tested on LAMP - Windows and Mac OS X should work,\nbut are untested.\n\n\nDocumentation\n-----------------------------------------------\nhttp://code.google.com/p/lucene-silverstripe-plugin/\n\nThere is also phpdoc generated documentation in the docs directory.\n\n\nInstallation Instructions\n-----------------------------------------------\n\nCheck out the archive into the root directory of your project.  This should be \nthe same folder as the 'sapphire' directory.\n\nVia SVN:\nsvn export http://lucene-silverstripe-plugin.googlecode.com/svn/trunk/ lucene\n\nThis will create a directory called 'lucene' containing the plugin files.\n\nYou will need to have the 'Queued Jobs' module installed in order to use Lucene:\n\nhttp://www.silverstripe.org/queued-jobs-module/\n  \nTo get queued jobs to run, you also need to add $_FILE_TO_URL_MAPPING to your\n_ss_environment.php file as described in the SilverStripe docs:\n\nhttp://doc.silverstripe.org/sapphire/en/topics/commandline\n\nRun /dev/build?flush=1 to tell your SilverStripe about your new module, and your \nnew search engine is installed!  (You still need to enable it - see below.)\n\n\nThird-Party Utility Installation\n--------------------------------\n\nTo enable PDF scanning using the pdftotext utility on Linux, ensure that the \ncommand-line utility is installed.  If you are using Debian or Ubuntu, either \nof the poppler-utils or xpdf-utils packages will provide this utility:\n\napt-get install poppler-utils\n\nIf you are on another Linux, Mac OS X, or Windows, the Xpdf program includes \npdftotext:\n\nhttp://www.foolabs.com/xpdf/\n\nIf you do not have the pdftotext utility installed, Lucene will use the \nPHP-based PDF2Text class by Joeri Stegeman instead.  However, this class is \nlimited in its ability compared to pdftotext.\n\nWord, Excel and Powerpoint scanning all require the 'zip' PHP module to be \ninstalled.  
If you don't have it, newer docx, xlsx and pptx documents won't be \nscanned.\n\nTo get scanning of older doc, xls and ppt documents working, you need to install\nthe catdoc command-line utility.  There are Windows and Mac OS X ports also.\n\nhttp://wagner.pp.ru/~vitus/software/catdoc/\nhttp://blog.brush.co.nz/2009/09/catdoc-windows/\nhttp://catdoc.darwinports.com/\n\n\nQuick Start\n-----------------------------------------------\n\nIf you just want to get up and running as quickly as possible with your Lucene \nsearch engine, install it as per above, and then add the following line to your \nproject's _config.php file:\n\nZendSearchLuceneSearchable::enable();\n\nIf you're using the Black Candy theme, or another theme that supports the \nstandard SilverStripe Fulltext Search, your search will now run using Lucene, \nindexing all Pages and indexable Files (PDF, Word, Excel, Powerpoint and HTML).\n\nTo get the most out of your new search engine, continue reading.\n\n\nConfiguration Instructions\n-----------------------------------------------\n\nENABLING THE SEARCH ENGINE\n\nBy default, the Lucene Search engine is not enabled.  To enable it, you need to \nadd the following into your _config.php file:\n\nZendSearchLuceneSearchable::enable();\n\nThis will configure all SiteTree and File objects by adding the \n'ZendSearchLuceneSearchable' extension to those classes.  The following fields \nwill be indexed whenever an object of this class is written to the database:\n\n'SiteTree' =\u003e 'Title,MenuTitle,Content,MetaTitle,MetaDescription,MetaKeywords',\n'File' =\u003e 'Filename,Title,Content'\n\nAfter enabling the search engine, you will need to build the index for the first \ntime.  
There is a new button marked 'Rebuild search index' on the SiteConfig \npage, which is the page in the LHS column at the top, with the name of the site.\nThis will add a new job to the 'Jobs' list - this will give you a readout of how\nfar through reindexing your site is.\n\nIf you just want to get Lucene up and running as quickly as possible, you can \nskip down to the 'Usage Overview' section below - that's all the configuration \nyou need to do!\n\n\nINDEXING CLASSES\n\nIf you wish to enable the search engine, but not automatically add the extension \nto SiteTree and/or File, pass in an array containing the classes to index: \n(this only accepts SiteTree and File, see below for indexing other classes)\n\n// Use one of these lines to control which classes to extend\nZendSearchLuceneSearchable::enable(array('SiteTree', 'File'));\nZendSearchLuceneSearchable::enable(array('SiteTree'));\nZendSearchLuceneSearchable::enable(array('File'));\n\n// Do not automatically add the extension to any classes\nZendSearchLuceneSearchable::enable(array());\n\nIn order to index classes other than the defaults, you need to add the \nZendSearchLuceneSearchable extension with a list of which fields to index.\n\nFor instance, to index your custom Page class, which has custom Summary and \nIntro fields added: \n\nObject::add_extension(\n    'Page',\n    \"ZendSearchLuceneSearchable('\"\n    .\"Title,MenuTitle,MetaTitle,MetaDescription,MetaKeywords,\"\n    .\"Summary,Intro,Content')\"\n);\n\nYou can also index custom functions that return strings.  If your indexed object\nhas a method called 'getFoo()' that returns a string representing some special \nstate you want to index, adding 'getFoo' into the field list will index this\nstate.\n\nThere are four types of indexing used in Lucene:\n\n1. Keyword - Data that is searchable and stored in the index, but not broken up \ninto tokens for indexing. This is useful for being able to search on non-textual \ndata such as IDs or URLs.\n\n2. 
UnIndexed - Data that isn’t available for searching, but is stored with our \ndocument (eg. article teaser, article URL  and timestamp of creation)\n\n3. UnStored - Data that is available for search, but isn’t stored in the index \nin full (eg. the document content)\n\n4. Text – Data that is available for search and is stored in full (eg. title and \nauthor)\n\nThe MenuTitle, MetaTitle, MetaDescription and MetaKeywords fields will be \nindexed as Unstored.\nLastEdited and Created fields will be Unindexed.\nID and ClassName fields will be indexed as Keyword types.\nAll other fields will be indexed as Text.\n\n\nINDEXING RELATIONS\n\nYou can index has_one, has_many and many_many relations, using dot notation to \nindicate the fields to read on the related object.\n\nIf we have a has_one relation between Page and our custom class Foo, and Foo \nhas a text field called Bar, we can index it by adding Foo.Bar into the field\nlist when we add the extension to the Page type:\n\nObject::add_extension(\n    'Page',\n    \"ZendSearchLuceneSearchable('\"\n    .\"Title,MenuTitle,MetaTitle,MetaDescription,MetaKeywords,\"\n    .\"Content,Foo.Bar')\"\n);\n\nYou can nest relations several layers deep if necessary, eg. \nFoo.Bar.Baz.Buz - remember that the names used are the names of the relation \nfields, NOT the names of the classes being indexed.\n\n\nINDEXING FILES\n\nWhen indexing 'File' DataObjects, this module will detect the file type using \nthe file extension.  
Detected types are .txt, .xls, .doc, .ppt, .xlsx, .docx, \n.htm, .html, .pptx, and .pdf.\n\nSee the 'Installation' section above for details on getting file scanning \nworking for various file types.\n\n\nADVANCED FIELD-LEVEL INDEXING OPTIONS\n\nYou can get more fine-grained control over how your classes are indexed by \nadding the ZendSearchLuceneSearchable extension with a JSON-encoded object as \nthe argument.\n\nYour object should be arranged as key-value pairs, the key being the name of the\nproperty, method or relation you wish to index, and the value being another \nobject containing key-value pairs indicating the options for that field.\n\nObject::add_extension(\n    'Page',\n    \"ZendSearchLuceneSearchable('\n        {\n            \"Title\" : true,\n            \"CreatedDate\" : {\n                name : \"Title\",\n                type : \"text\",\n                content_filter : \"strtotime\"\n            },\n            \"Intro\" : true,\n            \"Content\" : {\n                name : \"Content\",\n                type : \"unstored\"\n            },\n            \"Foo.Bar\" : {\n                name : \"Baz\"\n            },\n            \"Images\" : {\n                content_filter : [\"HelperClass\",\"countImages\"]\n            }\n        }    \n    ')\"\n);\n\nAny omitted config options will use the defaults.  Available config options for\neach field are:\n\n * name\n   The name to store this as in the document.  Default is the same as\n   the field name.  The field name of 'ID' is a special case - this should always \n   use a name of 'ObjectID', as this is used internally.\n\n * type\n   The type of indexing to use.  Default is \"text\", legal options are \"text\", \n   \"keyword\", \"unstored\" and \"unindexed\".\n\n * content_filter\n   a callback that should be used to transform the field value\n   prior to being indexed.  
The callback will be called with one argument, \n   the field value as a string, and should return the transformed field value\n   also as a string.  Could be useful for eg. turning date strings into unix \n   timestamps prior to indexing.  A value of false will indicate that there\n   should be no content filtering, which is the default.\n\n\nADVANCED CLASS-LEVEL INDEXING OPTIONS\n\nYou can also provide a second JSON-encoded argument when initialising a class \nusing Object::add_extension.  This should contain key-value pairs indicating\nyour class-level configuration.\n\nObject::add_extension(\n    'Foo',\n    \"ZendSearchLuceneSearchable('Foo,Far,Faz','\n        {\n            \"index_filter\" : \"\\\"ID\\\" IN ( SELECT \\\"ID\\\" FROM \\\"Foo\\\" LEFT JOIN \\\"Other\\\" ON \\\"Foo\\\".\\\"ID\\\" = \\\"Other\\\".\\\"FooID\\\" WHERE \\\"Other\\\".\\\"FooID\\\" IS NOT NULL )\"\n        }\n    ')\"\n);\n\nCurrently there is only one configuration option:\n\n * index_filter\n   a string to be used as the second argument to DataObject::get() when assembling\n   the list of items of this class to index.  The default is an empty string, \n   which will get all items of that class.\n\nNote that the config can get a bit messy with all the nested escaped quotes.  
\nYou may prefer to create PHP objects, JSON-encode them and insert them that way:\n\n$fields = array(\n    'Foo' =\u003e array(\n        'name' =\u003e 'Foo',\n    ),\n    'Bar' =\u003e array(\n        'name' =\u003e 'Bar',\n        'type' =\u003e 'unstored',\n        'content_filter' =\u003e array('HelperClass','filterFunction')\n    )\n);\n$class = array(\n    'index_filter' =\u003e '\n    \"ID\" IN ( \n        SELECT \"ID\" \n        FROM \"Foo\" \n            LEFT JOIN \"Other\" \n            ON \"Foo\".\"ID\" = \"Other\".\"FooID\" \n        WHERE \"Other\".\"FooID\" IS NOT NULL \n    )'\n);\nObject::add_extension(\n    'Foo', \n    \"ZendSearchLuceneSearchable('\".json_encode($fields).\"', '\".json_encode($class).\"')\"\n);\n\n\nREBUILDING THE SEARCH INDEX\n\nThe search index is rebuilt on every /dev/build.  In case you want to disable\nthis, for example if your site is quite large and rebuilding the search index \ntakes a while, you can add the following to your _config.php:\n\nZendSearchLuceneSearchable::$reindexOnDevBuild = false;\n\nTo manually rebuild the search index, go to the SiteConfig page (at the very \ntop of the LHS site tree in the CMS, with the world icon) and there will be a\n'Rebuild Search Index' button at the bottom of the page.  Clicking this button \nwill start a Queued Job, which deletes the current index, scans the site for all\ncontent which should be indexed, and reindexes everything.\n\nYou can view reindex progress on the 'Jobs' tab, at the top of the CMS.  It will\ndisplay when the job was started, how long it has run for, how many items there\nare to be indexed, and how many have been indexed so far.  
If there are any \nerrors, these will also show up here.\n\n\nPAGINATION\n\nThere are some pagination settings that allow you to control the pagination \nfunctions:  (Put these in your _config.php to change them)\n\n// Number of results to show on each page\nZendSearchLuceneSearchable::$pageLength = 10;\n\n// Maximum number of pages to show in the pagination\nZendSearchLuceneSearchable::$maxShowPages = 10;\n\n// Always show this number of pages at the start of the pagination\nZendSearchLuceneSearchable::$alwaysShowPages = 3;\n\n\nINDEX DIRECTORY\n\nYou can also set where to store the index:\n\n// These are the defaults.\nZendSearchLuceneSearchable::$cacheDirectory = TEMP_FOLDER;\nZendSearchLuceneWrapper::$indexName = 'Silverstripe';\n\nWith the default settings, the index will be created in the SilverStripe temp \nfolder, and will be called 'SilverStripe'.\n\n\nADVANCED INDEX CONFIGURATION\n\nhttp://zendframework.com/manual/en/zend.search.lucene.index-creation.html#zend.search.lucene.index-creation.optimization\n\nYou can use advanced configuration functions directly on the index:\n\n$index = ZendSearchLuceneWrapper::getIndex();\n\n// Retrieving index size\n$indexSize = $index-\u003ecount();\n$documents = $index-\u003enumDocs();\n\n// Index optimisation\n$index-\u003eoptimize();\n\nYou can also specify operations to be run on newly created indexes using \nZendSearchLuceneWrapper::addCreateIndexCallback().  On creation, any callbacks \nregistered using this function are run.  This allows you to set up any \noptimisation options you require on your index.  
The Zend defaults are used if \nno callbacks are registered.\n\nTo use a callback, you can put something like this in your _config.php:\n\nfunction create_index_callback() {\n    $index = ZendSearchLuceneWrapper::getIndex();\n    $index-\u003esetMaxBufferedDocs(20);\n}\nZendSearchLuceneWrapper::addCreateIndexCallback('create_index_callback');\n\n\nUsage Overview\n-----------------------------------------------\n\nOnce you have configured and enabled the plugin, you can add a new token into \nyour template files to output the search form:\n\n\u003c!-- START search form --\u003e\n$ZendSearchLuceneForm\n\u003c!-- END search form --\u003e\n\nThis will post to the action ZendSearchLuceneResults, which will display the \nSearch Results page.\n\nThis module will also take over the $SearchForm token - this is for convenience, \nto get users up and running quickly using the out-of-the-box themes.  If you're \nplanning on customising the form markup, use $ZendSearchLuceneForm instead.\n\n\nCUSTOM SEARCH FORM\n\nTo customise your search form, override this method (or create a new one) and \noutput a Form object containing a field called 'Search' and an action of \nZendSearchLuceneResults.\n\n/* Custom search form */\nclass Your_Controller extends Page_Controller {\n\n   // . . .\n\n   function ZendSearchLuceneForm() {\n      $form = parent::ZendSearchLuceneForm();\n      // Customise the form\n      return $form;\n   }\n\n}\n\nIf you are using $ZendSearchLuceneForm in your templates, you can create a \ncustom template for the search form called ZendSearchLuceneForm.ss - it can go \nin either your root template folder, or in your Includes/ folder.  Copying \nsapphire/templates/SearchForm.ss is a good starting point.\n\n\nCUSTOM SEARCH RESULTS PAGE\n\nIn the templates/Layout folder of the plugin, you will find the \nLucene_results.ss file.  
Copy this file into your own theme's Layout folder, and \nalter to your heart's content.\n\nAvailable templating tokens in this file are:\n\n$Query - The string that was searched for\n$TotalResults - Total number of hits for the search\n$TotalPages - Total number of pages for the query\n$ThisPage - The page number currently being viewed\n$StartResult - The number of the first result on this page\n$EndResult - The number of the last result on this page\n$PrevUrl - URL to the previous page of search results\n$NextUrl - URL to the next page of results\n\n\u003c% control Results %\u003e\n  \u003c!-- DataObjectSet containing the search results for the current page --\u003e\n  $score (relevance rating assigned by the search engine)\n  $Number (which number in the set this result is)\n  $Link (URL to this resource)\n  You can also use any fields that have been indexed, eg. $Content\n\u003c% end_control %\u003e\n\n\u003c% control SearchPages %\u003e\n  \u003c!-- This is a DataObjectSet containing the pagination pages --\u003e\n  $IsEllipsis  (whether this entry is a blank ellipsis to indicate more pages)\n  $PageNumber\n  $Link  (URL to this page of search results)\n  $Current   (Boolean indicating whether this is the current page)\n\u003c% end_control %\u003e  \n\nA useful extra function is the SearchTextHighlight string modifier.  If you use \neg. $Content.SearchTextHighlight in your template, this will output an HTML \nparagraph containing 25 words surrounding your search terms, with the search \nterms highlighted with \u003cstrong\u003e tags.\n\nThis modifier takes one optional argument, the number of words to display.  So \nto display a 50 word summary you would use:\n\n$Content.SearchTextHighlight(50) \n\n\nCUSTOMISE SEARCH FUNCTION\n\nLucene is actually a very powerful search engine, you can do a lot with it.  If \nyou have a more advanced search function you want to implement, you can build \nyour own form and submit it to your own action.  
Check the Zend docs on building \nqueries for how to build the query you want from the form fields you've \nreceived.\n\nhttp://zendframework.com/manual/en/zend.search.lucene.searching.html\n\nclass Your_Controller extends Page_Controller {\n\n    /**\n     * Use $AdvancedSearchForm in your template to output this form.\n     */\n    function AdvancedSearchForm() {\n        $fields = new FieldSet(\n            new TextField('Query','First search query'),\n            new TextField('Subquery', 'Second search query')\n        );\n        $actions = new FieldSet(\n            new FormAction('AdvancedSearchResults', 'Search')\n        );\n        $form = new Form($this-\u003eowner, 'AdvancedSearchForm', $fields, $actions);\n        $form-\u003edisableSecurityToken();\n        return $form;\n    }\n\n    /**\n     * Processes the search form\n     */\n    function AdvancedSearchResults($data, $form, $request) {\n        // Assemble your custom query \n        $query = Zend_Search_Lucene_Search_QueryParser::parse(\n            $form-\u003edataFieldByName('Query')-\u003edataValue()\n        );\n        $subquery = Zend_Search_Lucene_Search_QueryParser::parse(\n            $form-\u003edataFieldByName('Subquery')-\u003edataValue()\n        );\n        $search = new Zend_Search_Lucene_Search_Query_Boolean();\n        $search-\u003eaddSubquery($query, true);\n        $search-\u003eaddSubquery($subquery, false);\n\n        // Get hits from the Lucene search engine.\n        $hits = ZendSearchLuceneWrapper::find($search);\n\n        // Convert these into a data array containing pagination info etc\n        $data = $this-\u003egetDataArrayFromHits($hits, $request);\n\n        // Display the results page\n        return $this-\u003eowner-\u003ecustomise($data)-\u003erenderWith(array('Advanced_results', 'Page'));\n    }\n\n}\n\n\nTODO\n-----------------------------------------------\n\n* Allow the use of multiple indexes per project\n* Query logging\n* Test in Windows / Mac OS X, 
add instructions for these OSes\n* Add a language file - text strings are already translatable via _t()\n* Make text highlighter more configurable.\n\n\nLinks\n-----------------------------------------------\n\nwpSearch plugin for WordPress\nhttp://codefury.net/projects/wpSearch/ \n\nZend_Search_Lucene documentation\nhttp://zendframework.com/manual/en/zend.search.lucene.html\n\nQueued Jobs module\nhttp://www.silverstripe.org/queued-jobs-module/\n\nXpdf (pdftotext PDF text extraction utility)\nhttp://www.foolabs.com/xpdf/\n\ncatdoc (MS Office text extraction utility)\nhttp://wagner.pp.ru/~vitus/software/catdoc/\nhttp://blog.brush.co.nz/2009/09/catdoc-windows/\nhttp://catdoc.darwinports.com/\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frixrix%2Fsilverstripe-lucene","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frixrix%2Fsilverstripe-lucene","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frixrix%2Fsilverstripe-lucene/lists"}