{"id":28184016,"url":"https://github.com/ad-freiburg/completesearch","last_synced_at":"2025-06-24T06:32:58.451Z","repository":{"id":41890110,"uuid":"295115766","full_name":"ad-freiburg/completesearch","owner":"ad-freiburg","description":"Search engine for semi-structured data (text and structured data) that provides all kinds of intelligent search features (keyword search, autocompletion, faceted search, error-tolerant search, synonym search, semantic search) very efficiently also on very large data.","archived":false,"fork":false,"pushed_at":"2024-02-19T14:13:26.000Z","size":30744,"stargazers_count":23,"open_issues_count":3,"forks_count":6,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-05-16T05:12:01.419Z","etag":null,"topics":["autocompletion","large-scale","search-engine"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ad-freiburg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-09-13T09:10:10.000Z","updated_at":"2025-04-24T10:42:33.000Z","dependencies_parsed_at":"2022-08-11T20:20:35.990Z","dependency_job_id":null,"html_url":"https://github.com/ad-freiburg/completesearch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ad-freiburg/completesearch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ad-freiburg%2Fcompletesearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ad-freiburg%2Fcompletesearch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ad-freiburg%2Fcompletesearch/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ad-freiburg%2Fcompletesearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ad-freiburg","download_url":"https://codeload.github.com/ad-freiburg/completesearch/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ad-freiburg%2Fcompletesearch/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261620416,"owners_count":23185516,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autocompletion","large-scale","search-engine"],"created_at":"2025-05-16T05:11:56.211Z","updated_at":"2025-06-24T06:32:58.404Z","avatar_url":"https://github.com/ad-freiburg.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CompleteSearch\n\n[![Build\nStatus](https://travis-ci.com/ad-freiburg/completesearch.svg?branch=master)](https://travis-ci.com/ad-freiburg/completesearch)\n\nCompleteSearch is a fast and interactive search engine for *context-sensitive prefix search* on a given collection of documents.\nIt not only provides search results, like a regular search engine,\nbut also completions of the last (possibly only partially typed) query word that lead to a hit.\nThis can be used to provide very efficient support for a variety of features:\nquery autocompletion, faceted search, synonym search, error-tolerant search, and semantic search.\nA list of publications on the techniques behind CompleteSearch and its many applications is provided at the end of this page.\n\nFor a demo on 
various datasets, just check out this repository and follow the instructions below.\nWith a single command line, you get a working demo (you can choose from several datasets,\neach with a few million documents, so not particularly large, but also not small).\nCompleteSearch scales to collections with tens or even hundreds of millions of documents\nwithout losing its interactivity.\n\n## 1. Checkout\n\nCheck out the repository and build the Docker image:\n\n    git clone https://github.com/ad-freiburg/completesearch\n    cd completesearch\n    docker build -t completesearch .\n\n## 2. Quickstart by Demo\n\nThe following command line builds a search index and then starts the search server\nfor the dataset specified via the `DB` variable (the name of any subdirectory of [applications](applications/) works).\nUnder the specified `PORT`, you then get a generic UI as well as an API (see the section on the CompleteSearch engine below).\n\n        export DB=movies \u0026\u0026 PORT=1622 \u0026\u0026 docker run -it --rm -e DB=${DB} -p ${PORT}:8080 -v $(pwd)/applications:/applications -v $(pwd)/data/:/data -v $(pwd)/ui:/ui --name completesearch.${DB} completesearch -c \"make DATA_DIR=/data/${DB} DB=${DB} csv pall start\"\n\nThis command line downloads and uncompresses the CSV, builds the index, and starts the server, all in one go.\nIf you have already downloaded the CSV, it will not be downloaded again (the Makefile target `csv:` then has no effect).\nIf you have already built the index once, you can omit the Makefile target `pall:` (which stands for *precompute all*).\n\n## 3. 
Relevant files\n\nRead this section if you want to understand in a little more depth what is going on with the fancy command line above.\nFirst, `docker build` (Section 1) creates a Docker image from the code in this repository.\nSo far so good.\nThe `docker run` command then starts a container that mounts three volumes, which we briefly explain next:\n\n**applications** This folder contains the configuration for each application.\nEach configuration consists of just two files:\na `Makefile` that specifies how to build the index (this is highly customizable, see below),\nand a `config.js` for customizing the generic UI.\n\n**data** This folder contains the CSV file with the original data (one record per line, in columns) and the index files.\nThese files all share a common prefix. See below for more information on the index.\n\n**ui** This folder contains the code for the generic UI.\nIf you just want to use CompleteSearch as a backend and build your own UI,\nyou don't have to mount this volume.\nIt's nice, however, to always have a working UI available for testing, without any extra work.\n\n## 4. 
The CompleteSearch index\n\nLike all search engines, CompleteSearch builds an index, which it then uses to answer queries efficiently.\nIt is not an ordinary inverted index, but something fancier: a half-inverted index or *hybrid (HYB)* index.\nYou don't have to understand this if you just want to use CompleteSearch.\nBut if you are interested, you can learn more about it in the publications below.\n\nTo build the index, CompleteSearch requires two input files, one with suffix `.words` and one with suffix `.docs`.\nThe first contains the contents of your documents, split into words.\nThe second contains the data that you want to display as search engine hits.\nThe two are usually related, but not exactly the same.\nThe format is very simple and is described by example [here](https://ad-wiki.informatik.uni-freiburg.de/completesearch/QuickIntro).\n\nIf you have special requirements, you can build these two input files yourself, from whatever your data is.\nThen you have full control over what CompleteSearch will and can do for you.\nHowever, in most applications, you can use our *generic CSV parser*.\nIt takes a CSV file (one record per line, with a fixed number of columns per line) as input\nand from that produces the `.words` and the `.docs` file.\n\nThe CSV parser is very powerful and highly customizable.\nYou can see how it is used in the `Makefile` of the various example applications\n(in the subdirectories of the directory `applications`).\nA subset of the options is described in more detail [here](https://ad-wiki.informatik.uni-freiburg.de/completesearch/CsvParser).\nFor a complete list, look at the [code that parses the options](https://github.com/ad-freiburg/completesearch/blob/master/src/parser/CsvParserOptions.cpp).\n\n## 5. 
The CompleteSearch engine\n\nThe binary to start the CompleteSearch engine is called `startCompletionServer`.\nIt is very powerful and has a lot of options.\nFor some example uses, have a look at the `Makefile` in the directory\n`applications` and at the included `Makefile` of one of the example applications.\nDetailed documentation of all the options can be found in the [README.md in the src directory](https://github.com/ad-freiburg/completesearch/tree/master/src).\n\nOnce the server is started, you can send queries either via our generic and customizable UI (see above)\nor directly to the backend, via the HTTP API provided by `startCompletionServer`.\nThe API is very simple and is described at the end of [this page](https://ad-wiki.informatik.uni-freiburg.de/completesearch/QuickIntro).\nPlay around with it for one of the example applications to get a feel for what it does.\nYou can also look at the (rather simple) JavaScript code of the generic UI\nto get a feel for how it works and what it can be used for.\n\n## 6. 
(Optional) Set up a subdomain\n\nTo show off your CompleteSearch instance to your friends, you may want it to run\nunder a fancy URL, and not `http://my.weird.hostname.somewhere:76154`.\nLet us assume you have an Apache webserver running on your machine, with the modules `mod_proxy` and `mod_proxy_http` enabled.\nThen you can add the following section to your `apache.conf` or to a separate\nconfig file included by `apache.conf`.\nReplace the `ServerName` value (here `example.cs.uni-freiburg.de`) with the\n[fully qualified domain name (FQDN)](https://en.wikipedia.org/wiki/Fully_qualified_domain_name) of the\nmachine on which your Apache webserver is running.\nReplace `\u003chostname\u003e` with the FQDN of the machine on which the CompleteSearch frontend is running, and `5000` with the port under which it is reachable (in the demo above, the value of `PORT`).\nThis can be the same machine as the one named in `ServerName`, but does not have to be.\n\n```xml\n\u003cVirtualHost *:80\u003e\n  ServerName example.cs.uni-freiburg.de\n  ServerAlias dblp example.cs.uni-freiburg.de\n  ServerAdmin webmaster@localhost\n\n  ProxyPreserveHost On\n  ProxyRequests Off\n\n  ProxyPass / http://\u003chostname\u003e:5000/\n  ProxyPassReverse / http://\u003chostname\u003e:5000/\n\n  ...\n\u003c/VirtualHost\u003e\n```\n\n## 7. 
Publications\n\nHere are some of the publications explaining the techniques behind CompleteSearch and what it can be used for.\nThis work was done at the [Max Planck Institute for Informatics](https://www.mpi-inf.mpg.de/departments/algorithms-complexity).\nThat was a while ago, but it turns out that the features and the efficiency\nprovided by CompleteSearch are still very much state of the art.\n\n[Type Less, Find More: Fast Autocompletion Search with a Succinct Index](https://www.researchgate.net/publication/47841931_Type_Less_Find_More_Fast_Autocompletion_Search_with_a_Succinct_Index) @ SIGIR 2006\n\n[The CompleteSearch Engine: Interactive, Efficient, and Towards IR\u0026DB Integration](http://cidrdb.org/cidr2007/papers/cidr07p09.pdf) @ CIDR 2007\n\n[ESTER: Efficient Search on Text, Entities, and Relations](http://researchgate.net/publication/47842303_ESTER_efficient_search_on_Text_Entities_and_Relations) @ SIGIR 2007\n\n[Efficient Interactive Query Expansion with Complete Search](https://dl.acm.org/doi/10.1145/1321440.1321560) @ CIKM 2007\n\n[Output-Sensitive Autocompletion Search](https://link.springer.com/article/10.1007/s10791-008-9048-x) @ Information Retrieval 2008\n\n[Semantic Full-Text Search with ESTER: Scalable, Easy, Fast](https://www.suchanek.name/work/publications/icdm2008.pdf) @ ICDM 2008\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fad-freiburg%2Fcompletesearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fad-freiburg%2Fcompletesearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fad-freiburg%2Fcompletesearch/lists"}