{"id":29583753,"url":"https://github.com/duaa-a/web-crawler-with-tf-idf","last_synced_at":"2025-07-19T23:38:42.827Z","repository":{"id":304404436,"uuid":"970183998","full_name":"DuaA-A/Web-Crawler-with-TF-IDF","owner":"DuaA-A","description":"a simple search engine using Term Frequency - Inverse Document Frequency algorithm","archived":false,"fork":false,"pushed_at":"2025-07-12T20:55:54.000Z","size":1359,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-12T22:25:32.893Z","etag":null,"topics":["java","search-engine","term-frequency","term-frequency-inverse-document-frequency","web-crawler"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DuaA-A.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-21T16:02:33.000Z","updated_at":"2025-07-12T20:58:05.000Z","dependencies_parsed_at":"2025-07-12T22:38:33.523Z","dependency_job_id":null,"html_url":"https://github.com/DuaA-A/Web-Crawler-with-TF-IDF","commit_stats":null,"previous_names":["duaa-a/web-crawler-with-tf-idf"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/DuaA-A/Web-Crawler-with-TF-IDF","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DuaA-A%2FWeb-Crawler-with-TF-IDF","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DuaA-A%2FWeb-Crawler-with-TF-IDF/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DuaA-A%2FWeb-Crawler-with-
TF-IDF/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DuaA-A%2FWeb-Crawler-with-TF-IDF/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DuaA-A","download_url":"https://codeload.github.com/DuaA-A/Web-Crawler-with-TF-IDF/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DuaA-A%2FWeb-Crawler-with-TF-IDF/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266042413,"owners_count":23867962,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["java","search-engine","term-frequency","term-frequency-inverse-document-frequency","web-crawler"],"created_at":"2025-07-19T23:38:42.332Z","updated_at":"2025-07-19T23:38:42.799Z","avatar_url":"https://github.com/DuaA-A.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!DOCTYPE html\u003e\n\u003chtml lang=\"en\"\u003e\n\u003chead\u003e\n  \u003cmeta charset=\"UTF-8\" /\u003e\n  \u003cmeta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\"/\u003e\n\u003c/head\u003e\n\u003cbody\u003e\n\n  \u003ch1\u003eWeb Crawler With TF-IDF\u003c/h1\u003e\n  \u003cp\u003eA Java project that implements a simple web crawler and search engine using the TF-IDF (Term Frequency - Inverse Document Frequency) algorithm. 
It processes crawled web pages, builds an inverted index, calculates TF-IDF scores, and supports search queries using a query processor.\u003c/p\u003e\n\n  \u003ch2\u003eFeatures\u003c/h2\u003e\n  \u003cul\u003e\n    \u003cli\u003e\u003cstrong\u003eWebCrawler.java\u003c/strong\u003e: Crawls web pages to collect data.\u003c/li\u003e\n    \u003cli\u003e\u003cstrong\u003eTextProcessing.java\u003c/strong\u003e: Tokenizes, filters, and cleans text data.\u003c/li\u003e\n    \u003cli\u003e\u003cstrong\u003eStemmer.java\u003c/strong\u003e: Performs word stemming for normalization.\u003c/li\u003e\n    \u003cli\u003e\u003cstrong\u003eInvertedIndex.java\u003c/strong\u003e: Builds and stores the inverted index for quick lookup.\u003c/li\u003e\n    \u003cli\u003e\u003cstrong\u003eTFIDFCalculator.java\u003c/strong\u003e: Calculates TF-IDF scores for indexed terms.\u003c/li\u003e\n    \u003cli\u003e\u003cstrong\u003eQueryProcessor.java\u003c/strong\u003e: Handles user queries and ranks results using TF-IDF.\u003c/li\u003e\n    \u003cli\u003e\u003cstrong\u003eMain.java\u003c/strong\u003e: Entry point for running the application.\u003c/li\u003e\n  \u003c/ul\u003e\n\n  \u003ch2\u003eTechnologies Used\u003c/h2\u003e\n  \u003cul\u003e\n    \u003cli\u003eJava\u003c/li\u003e\n    \u003cli\u003eBasic File I/O\u003c/li\u003e\n    \u003cli\u003eCollections Framework\u003c/li\u003e\n    \u003cli\u003eString Processing\u003c/li\u003e\n  \u003c/ul\u003e\n\n  \u003ch2\u003eHow to Run\u003c/h2\u003e\n  \u003cpre\u003e\njavac *.java\njava Main\n  \u003c/pre\u003e\n  \u003cp\u003eEnsure all Java files are in the same directory or set up your project structure accordingly.\u003c/p\u003e\n\n  \u003ch2\u003eLicense\u003c/h2\u003e\n  \u003cp\u003eThis project is for educational 
purposes.\u003c/p\u003e\n\n\u003c/body\u003e\n\u003c/html\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduaa-a%2Fweb-crawler-with-tf-idf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fduaa-a%2Fweb-crawler-with-tf-idf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduaa-a%2Fweb-crawler-with-tf-idf/lists"}