{"id":25900940,"url":"https://github.com/pixelastic/sherlock-search","last_synced_at":"2026-02-11T13:02:42.990Z","repository":{"id":148631064,"uuid":"49669841","full_name":"pixelastic/sherlock-search","owner":"pixelastic","description":"Search into all the Sherlock Holmes books","archived":false,"fork":false,"pushed_at":"2018-01-24T15:23:20.000Z","size":4445,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-12-02T19:15:08.482Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://pixelastic.github.io/sherlock-search/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pixelastic.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-01-14T19:28:52.000Z","updated_at":"2024-04-13T02:48:33.000Z","dependencies_parsed_at":"2023-03-30T07:04:20.471Z","dependency_job_id":null,"html_url":"https://github.com/pixelastic/sherlock-search","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/pixelastic/sherlock-search","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pixelastic%2Fsherlock-search","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pixelastic%2Fsherlock-search/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pixelastic%2Fsherlock-search/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pixelastic%2Fsherlock-search/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pixelast
ic","download_url":"https://codeload.github.com/pixelastic/sherlock-search/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pixelastic%2Fsherlock-search/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29333155,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-11T12:42:24.625Z","status":"ssl_error","status_checked_at":"2026-02-11T12:41:23.344Z","response_time":97,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-03T02:19:40.829Z","updated_at":"2026-02-11T13:02:42.970Z","avatar_url":"https://github.com/pixelastic.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sherlock Search\n\nSearch the whole Sherlock Holmes book collection using Algolia.\n\n[http://pixelastic.github.io/sherlock-search/](http://pixelastic.github.io/sherlock-search/)\n\n## Why?\n\nI work at Algolia, and since joining I've been amazed by the power and\nflexibility of the service. I wanted to see whether Algolia was a good fit for\nindexing and searching huge amounts of textual data.\n\nSo I built a first version of this demo in two hours, indexing [A\nTale of Two Cities][1] from the [Gutenberg Project][2]. 
The results were\nsatisfying, so I started working on a more complete search demo.\n\n## How does it work?\n\nI used the HTML version of the Sherlock Holmes books from the [Gutenberg\nProject][3] as input, parsed them, and\npushed the resulting records to Algolia.\n\nParsing essentially means creating one record for each `\u003cp\u003e` paragraph of\ntext from the source book. I also added the name of the book and of the parent\nchapter to each record. The chapter name is usually found in the closest preceding\n`\u003ch2\u003e` or `\u003ch3\u003e` element.\n\nI also added `bookOrder`, `chapterOrder` and `order` fields to each record.\nThey respectively hold the publication order of the book, the chapter number\ninside the book, and the position of the paragraph inside its chapter. With\nthose values, I can display results in natural reading order.\n\nSome paragraphs were so long that they hurt relevance and made the display\nugly. I split those records into several smaller ones of at most 300\ncharacters each, taking care never to cut a sentence in half.\n\nTo improve the display further, I added some context to each displayed hit:\nthe paragraphs immediately before and after the matching record are shown too,\nbut toned down. I went a step further and merged consecutive results before\ndisplaying them, to avoid duplicating context.\n\n```json\n{\n  \"author\": \"Sir Arthur Conan Doyle\",\n  \"book\": \"MEMOIRS OF SHERLOCK HOLMES\",\n  \"bookOrder\": 4,\n  \"chapterName\": \"Adventure XI. The Final Problem\",\n  \"chapterOrder\": 11,\n  \"content\": \"Of these the first and second were extremely condensed, while the last is, as I shall now show, an absolute perversion of the facts. It lies with me to tell for the first time what really took place between Professor Moriarty and Mr. 
Sherlock Holmes.\",\n  \"context\": {\n    \"next\": {\n      \"content\": \"It may be remembered that after my marriage, and my subsequent start in private practice, the very intimate relations which had existed between Holmes and myself became to some extent modified. He still came to me from time to time when he desired a companion in his investigation, but these occasions grew more and more seldom, until I find that in the year 1890 there were only three cases of which I retain any record.\",\n      \"objectID\": \"8eae8d862b3cfefea7a21d575fe8f91b\"\n    },\n    \"previous\": {\n      \"content\": \"I alone know the absolute truth of the matter, and I am satisfied that the time has come when no good purpose is to be served by its suppression. As far as I know, there have been only three accounts in the public press: that in the Journal de Geneve on May 6th, 1891, the Reuter's despatch in the English papers on May 7th, and finally the recent letter to which I have alluded.\",\n      \"objectID\": \"ea5b537d24f3756a2959c05f569aab37\"\n    }\n  },\n  \"objectID\": \"64776a27d82c597baffb365bf0ee8098\",\n  \"order\": 4,\n  \"tagName\": \"P\"\n}\n```\n\nOverall, this represents a little more than 15,000 records, which is a [Hacker\nplan][4] and a half. Not that bad for the whole Sherlock Holmes bibliography.\n\nBuilding the UI was easy thanks to our\n[instantsearch.js](https://community.algolia.com/instantsearch.js/) JavaScript\nlibrary.\n\n## Challenges\n\nThe data I got from the Gutenberg Project did not follow any common convention:\neach of the 7 indexed books had its own slightly different markup. 
I had to write\n7 slightly different scripts to extract the data.\n\nOh, and the last Sherlock Holmes book, [The Casebook of Sherlock Holmes][5], is\nnot yet in the public domain, so I could not index it.\n\n## Dev\n\nStart by running `npm install`, then:\n\nRun `npm run build` to parse all the books in `./books` and extract the records\ninto `./records`.\n\nRun `npm run push` to push the data to the Algolia index (requires the\n`ALGOLIA_API_KEY` environment variable).\n\nRun `npm run serve` to serve a local version at `http://localhost:4001`.\n\nYou can run `npm run test` and `npm run test:watch` to run the tests, but there\nare very few of them.\n\nFinally, `npm run deploy` will build the website in `./public`, commit it to\n`gh-pages` and push it.\n\n## TODO\n\n- Order books by publishing dates\n- Remove the capitalization on titles\n- Display the chapters only if a book is selected\n- Order chapters by order\n- Make the sidebar larger\n- Add an icon for each book\n\n- Use a serif font on the hits https://www.google.com/fonts/specimen/Lora\n- Use a background that looks like paper\n  http://www.photos-public-domain.com/wp-content/uploads/2012/05/ivory-off-white-paper-texture.jpg\n- Use a funky highlight http://codepen.io/pixelephant/pen/ghtfF\n- Use a nice and funky font on the chapter/book in the hits\n- Add a small avatar of the books\n- Add a ligature separator between blocks\n  https://commons.wikimedia.org/wiki/File:-\u003eFlourisch_01-\u003e.svg\n\n- Add a leather header?\n- Opened book effect between sidebar and hits\n\n[1]: https://www.gutenberg.org/files/98/98-h/98-h.htm\n[2]: http://www.gutenberg.org/\n[3]: http://www.gutenberg.org/ebooks/subject/76\n[4]: https://www.algolia.com/users/sign_up/hacker\n[5]: 
https://en.wikipedia.org/wiki/The_Case-Book_of_Sherlock_Holmes\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpixelastic%2Fsherlock-search","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpixelastic%2Fsherlock-search","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpixelastic%2Fsherlock-search/lists"}