{"id":26018912,"url":"https://github.com/alaouimehdi1995/simplified-search-engine","last_synced_at":"2025-03-06T06:39:18.844Z","repository":{"id":50174242,"uuid":"81048038","full_name":"alaouimehdi1995/simplified-search-engine","owner":"alaouimehdi1995","description":"Multithreaded Web Crawler, Scraper, Indexer","archived":false,"fork":false,"pushed_at":"2022-12-08T03:52:10.000Z","size":49,"stargazers_count":8,"open_issues_count":3,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2023-03-10T07:59:02.046Z","etag":null,"topics":["container","crawl","crawler","crawling","database","docker","docker-compose","engine","index","indexer","indexing","mongodb","python","python-3","scraper","scraping","search-algorithm","search-engine","searching"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alaouimehdi1995.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-02-06T04:34:21.000Z","updated_at":"2023-02-06T13:53:02.000Z","dependencies_parsed_at":"2023-01-25T07:30:42.857Z","dependency_job_id":null,"html_url":"https://github.com/alaouimehdi1995/simplified-search-engine","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alaouimehdi1995%2Fsimplified-search-engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alaouimehdi1995%2Fsimplified-search-engine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alaouimehdi1995%2Fsimplified-search-engine/rel
eases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alaouimehdi1995%2Fsimplified-search-engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alaouimehdi1995","download_url":"https://codeload.github.com/alaouimehdi1995/simplified-search-engine/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242163847,"owners_count":20082223,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["container","crawl","crawler","crawling","database","docker","docker-compose","engine","index","indexer","indexing","mongodb","python","python-3","scraper","scraping","search-algorithm","search-engine","searching"],"created_at":"2025-03-06T06:39:17.907Z","updated_at":"2025-03-06T06:39:18.817Z","avatar_url":"https://github.com/alaouimehdi1995.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1\u003eSimplified Searching Engine\u003c/h1\u003e\n\n[![Build Status](https://travis-ci.org/alaouimehdi1995/simplified-search-engine.png?branch=master)](https://travis-ci.org/alaouimehdi1995/simplified-search-engine)\n[![codecov](https://codecov.io/gh/alaouimehdi1995/simplified-search-engine/branch/master/graph/badge.svg)](https://codecov.io/gh/alaouimehdi1995/simplified-search-engine)\n\n\u003ch2\u003ethat crawls, scrapes, and indexes data and stores it in a database\u003c/h2\u003e\nThe program is written in Python, uses regex to parse HTML, and uses multithreading to speed up crawling.\nMongoDB provides the database layer.\nThe project contains 4 
files:\n\n\u003ch4\u003ePersonnalParser.py:\u003c/h4\u003e\n  - Contains the PersonnalParser class, which gets the HTML content of a page, parses it, stores it, and starts a new PersonnalParser thread for each link in the page content.\n  \n\u003ch4\u003eDBManager.py\u003c/h4\u003e\n  - Contains the DBManager class, which manages the connection to the database and handles insert and find operations.\n  \n\u003ch4\u003efill_database.py:\u003c/h4\u003e\n  - Contains the general settings, such as the start URL, proxy settings, and search depth. The first crawl thread starts here.\n\n\u003ch4\u003emain.py\u003c/h4\u003e\n  - Contains the code that reads the user's search query, retrieves the database content, and sorts the results by relevance.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falaouimehdi1995%2Fsimplified-search-engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falaouimehdi1995%2Fsimplified-search-engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falaouimehdi1995%2Fsimplified-search-engine/lists"}