{"id":13671436,"url":"https://github.com/nicholaskajoh/devsearch","last_synced_at":"2026-01-16T16:25:20.827Z","repository":{"id":53460409,"uuid":"135925761","full_name":"nicholaskajoh/devsearch","owner":"nicholaskajoh","description":"A web search engine built with Python which uses TF-IDF and PageRank to sort search results.","archived":false,"fork":false,"pushed_at":"2021-03-30T08:32:58.000Z","size":37,"stargazers_count":54,"open_issues_count":0,"forks_count":14,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-11-11T09:43:43.041Z","etag":null,"topics":["crawler","flask","mongodb","pagerank","python","scrapy","search","search-engine","spider","tf-idf"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nicholaskajoh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-06-03T17:30:35.000Z","updated_at":"2024-08-22T16:30:09.000Z","dependencies_parsed_at":"2022-09-09T14:00:55.070Z","dependency_job_id":null,"html_url":"https://github.com/nicholaskajoh/devsearch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicholaskajoh%2Fdevsearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicholaskajoh%2Fdevsearch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicholaskajoh%2Fdevsearch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicholaskajoh%2Fdevsearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nicholaskajoh","download_url":"https://codeload.github.com/nicholaskajoh/devsearch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251187106,"owners_count":21549583,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","flask","mongodb","pagerank","python","scrapy","search","search-engine","spider","tf-idf"],"created_at":"2024-08-02T09:01:09.651Z","updated_at":"2026-01-16T16:25:20.801Z","avatar_url":"https://github.com/nicholaskajoh.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# devsearch\nA web search engine built with Python which uses TF-IDF and PageRank to sort search results.\n\n## Stack\n- Flask (Python 3)\n- Scrapy\n- LXML\n- MongoEngine (MongoDB)\n- Bootstrap 4\n\n## Requirements\n- Docker\n- Docker Compose\n\n## Setup\n- Install Docker and Docker Compose.\n- Clone or download this repo.\n- Create a *.env* file from *.env.example*.\n- Run `docker-compose up`.\n\n## Crawling\n- Update the `SPIDER_ALLOWED_DOMAINS` variable in *.env* with the domains you want the spider to crawl.\n- Add at least one URL to the **crawl_list** collection (in MongoDB) for the spider to start from.\n- Run `docker-compose run web flask crawl` to crawl new web pages.\n- To update pages that have already been crawled, add the `--recrawl` option: `docker-compose run web flask crawl --recrawl True`.\n\n## Indexing\n- To index crawled pages, run `docker-compose run web flask index`.\n- To compute TF-IDF, run the following commands in order:\n    - `docker-compose run web flask idf`\n    - `docker-compose run web flask tfidf`\n- To compute PageRank, run `docker-compose run web flask rank`.\n- To compute page-word scores, run `docker-compose run web flask score`.\n\n## Deploy\n- Create a *.env.secret* file from *.env.secret.example*.\n- Run `docker-compose -f docker-compose.prod.yml up --build -d`.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnicholaskajoh%2Fdevsearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnicholaskajoh%2Fdevsearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnicholaskajoh%2Fdevsearch/lists"}