{"id":19520915,"url":"https://github.com/sebastianbach/searcher","last_synced_at":"2025-06-13T20:07:14.495Z","repository":{"id":144573984,"uuid":"596739893","full_name":"SebastianBach/searcher","owner":"SebastianBach","description":"Python based reverse image search","archived":false,"fork":false,"pushed_at":"2023-02-27T18:48:21.000Z","size":21,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-26T00:28:41.838Z","etag":null,"topics":["dhash","python","reverse-image-search"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SebastianBach.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-02T20:42:51.000Z","updated_at":"2023-02-02T21:22:26.000Z","dependencies_parsed_at":null,"dependency_job_id":"b72327ab-e78d-452a-b0cf-94b01e94ac92","html_url":"https://github.com/SebastianBach/searcher","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SebastianBach/searcher","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SebastianBach%2Fsearcher","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SebastianBach%2Fsearcher/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SebastianBach%2Fsearcher/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SebastianBach%2Fsearcher/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/owners/SebastianBach","download_url":"https://codeload.github.com/SebastianBach/searcher/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SebastianBach%2Fsearcher/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259712409,"owners_count":22900038,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dhash","python","reverse-image-search"],"created_at":"2024-11-11T00:28:18.671Z","updated_at":"2025-06-13T20:07:14.471Z","avatar_url":"https://github.com/SebastianBach.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# About\n\nA simple reverse image search engine, similar to *Google Images* or *TinEye*.\n\nCan parse online HTML pages or local folders.\nReverse image search is performed via CLI or a web-application.\n\nUses *dhash* as fingerprinting algorithm.\n\n\nProbably won't scale well, but works fine for personal use.\n\n\n# Installation\n\n* Requires Python 3.8 or newer\n* Install dependencies via ```pip install -r requirements.txt```\n\n\n# Workflow\n\n## Create Project\n\nUse the ```new_project.py``` script to create a new project at a given location:\n\n```shell\npython new_project.py C:\\my_project\n```\n\n\n## Parse Sources\n\nIn the new project folder, edit the *to_parse\\sources.txt* file.\nAdd URLs or local folders to parse.\n\n```\nhttp://example.com\nC:\\my_images\n```\n\nThe ```img_filter.txt``` file is an exclusion list. 
Image URLs that contain a certain string are not processed further.\n\nIn the folder *cookies*, create a new file for each domain for which cookies are needed in the request.\nThe file must be named after the domain, e.g. ```example.com.json```, and must contain the cookie data as JSON:\n\n```json\n{\"PHPSESSID\": \"1234579\"}\n```\n\n\nRun the ```parse.py``` script to search for images on the web pages or in the folders.\n\n```\npython parse.py C:\\my_project\n```\n\nThe *to_analyze* folder now contains a text file for each parsed website or folder.\nThe text files list all found image URLs.\n\n\n## Analyze Images\n\nTo load the images and create the image fingerprints, use the ```analyze.py``` script:\n\n```\npython analyze.py C:\\my_project\n```\n\nThe image hashes as well as the image sources are stored in a SQLite database.\nThe preview images are stored in the *preview* folder.\n\n## Search Image via CLI\n\nSearch for an image source by passing an image URL and the maximum Hamming distance to the ```search.py``` script:\n\n```\npython search.py C:\\my_project https://just.some/image.jpg 10\n```\n\nThe arguments are:\n- absolute path of the project folder\n- the URL or absolute path of the image to search for\n- the Hamming distance threshold\n\nIt will list the URLs of folders that contain the same or similar image.\n\n\n\n## Search via Web-App\n\nStart the web-app with the ```web.py``` script:\n\n```\npython web.py C:\\my_project C:\\my_resources\n```\n\nThe arguments are:\n- absolute path of the project folder\n- absolute path of the web resources folder\n\n\nIn your browser, open the address ```localhost:5000```.\nEnter an image URL in the \"Image URL\" field and press \"Search\".\nPress ```CTRL+C``` to close the web app.\n\n## Build Web-App as Container\n\n\nBuild the container with:\n\n```shell\ndocker build --tag image-search .\n```\n\nRun the container by mounting the project and web resources:\n\n```shell\ndocker run -it -v C:/my_project:/data 
-v C:/my_resources:/resources -p 5000:5000 image-search\n```\n\nIn your browser, open the address ```localhost:5000```.\nEnter an image URL in the \"Image URL\" field and press \"Search\".\nPress ```CTRL+C``` to close the web app.\n\n\n\n# Architecture\n\n```mermaid\n  flowchart LR;\n      A(fa:fa-file URLs)--\u003eB[HTML Parser];\n      AA(fa:fa-file Cookies) --\u003eB;\n      B \u003c--\u003e web([fa:fa-globe Websites])\n      B --\u003e L;\n      L(fa:fa-file Images) --\u003e P[Analyzer];\n      webimage([fa:fa-globe Images]) --\u003e P\n      P--\u003eD[(Database)];\n      D--\u003eF[fa:fa-window-maximize Web App];\n      D--\u003eCLI[fa:fa-window-maximize  CLI App];\n```\n\n## Folders\n\n- *apps* stores the applications mentioned above.\n- *searcher* is the module that analyzes images and writes to and queries the database.\n- *web_parser* is a utility module used to parse HTML pages for images.\n\n## Web-App Resources\n\n* *templates* contains Jinja templates used by the web app.\n* *web* contains the CSS file used by the web app.\n\n\n## Extension\n\nA **source generator** adds URLs or folders to the list of image sources to process. A custom generator is added by using the ```source_generator``` decorator:\n\n```python\n\n@web_parser.parser_modules.source_generator\ndef my_generator(path: str):\n\n    sources = []\n\n    sources.append(\"https://www.example.com\")\n\n    return sources\n```\n\nAn **HTML parser** searches the given HTML document for images. 
A custom parser is added by using the ```html_parser``` decorator:\n\n```python\n@web_parser.parser_modules.html_parser\ndef my_parser(doc: BeautifulSoup, website: str, p: searcher.project.Project) -\u003e list:\n\n    images = []\n\n    # resolve the image path relative to the parsed website\n    src = requests.compat.urljoin(website, \"image.jpg\")\n\n    # skip image URLs rejected by the project's exclusion list\n    if p.check_img_url(src):\n\n        job = web_parser.parser_modules.ImageJob()\n        job.image = src\n        job.info = \"\"\n\n        images.append(job)\n\n    return images\n```\n\n# To Do\n\n* Refactor ```analyzer.py``` and the ```web_parser``` module to allow for unit tests.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsebastianbach%2Fsearcher","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsebastianbach%2Fsearcher","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsebastianbach%2Fsearcher/lists"}