{"id":24569128,"url":"https://github.com/ryomendev/codequest","last_synced_at":"2025-03-17T05:28:35.226Z","repository":{"id":243411673,"uuid":"812351047","full_name":"RyomenDev/CodeQuest","owner":"RyomenDev","description":"The Document Search Engine is a web application designed to facilitate efficient searching and retrieval of information from a collection of documents. It utilizes various natural language processing techniques to preprocess the documents, extract keywords, calculate term frequencies, and generate relevant search results based on user queries.","archived":false,"fork":false,"pushed_at":"2024-06-09T10:08:57.000Z","size":8968,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-23T14:56:11.970Z","etag":null,"topics":["bm25","express-ejs","natural","node-js","wink-lemmatizer"],"latest_commit_sha":null,"homepage":"https://codequest-jalp.onrender.com","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RyomenDev.png","metadata":{"files":{"readme":"readMe.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-08T16:33:25.000Z","updated_at":"2024-12-25T12:15:34.000Z","dependencies_parsed_at":"2024-06-08T18:14:20.886Z","dependency_job_id":"9b5e0bee-be38-4f1b-9190-4e9eff9d333c","html_url":"https://github.com/RyomenDev/CodeQuest","commit_stats":null,"previous_names":["ryomendev/codequest"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RyomenDev%2FCodeQuest","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RyomenDev%2FCodeQuest/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RyomenDev%2FCodeQuest/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RyomenDev%2FCodeQuest/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RyomenDev","download_url":"https://codeload.github.com/RyomenDev/CodeQuest/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243979237,"owners_count":20378174,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bm25","express-ejs","natural","node-js","wink-lemmatizer"],"created_at":"2025-01-23T14:56:18.176Z","updated_at":"2025-03-17T05:28:35.190Z","avatar_url":"https://github.com/RyomenDev.png","language":"JavaScript","readme":"![\"View\"](./Media/page1.png)\n\n# \u003cu\u003e**_CodeQuest_**\u003c/u\u003e \u003ca href=\"https://codequest-jalp.onrender.com/\" style=\"font-size:smaller;\"\u003e(visit)\u003c/a\u003e\n\n## Project Description: Document Search Engine\n\n## \u003cu\u003eOverview:\u003c/u\u003e\n\nThe Document Search Engine is a web application designed to facilitate efficient searching and retrieval of information from a collection of documents. It utilizes various natural language processing techniques to preprocess the documents, extract keywords, calculate term frequencies, and generate relevant search results based on user queries.\n\n## \u003cu\u003eKey Features:\u003c/u\u003e\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eDocument Preprocessing:\u003c/span\u003e\u003c/b\u003e\n\n  - Removes stopwords and punctuation from the documents.\n  - Tokenizes and normalizes the text to extract meaningful keywords.\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eKeyword Generation:\u003c/span\u003e\u003c/b\u003e\n\n  - Generates a unique set of keywords from the document corpus.\n  - Organizes keywords for efficient indexing and searching.\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eTerm Frequency Calculation:\u003c/span\u003e\u003c/b\u003e\n\n  - Calculates the term frequency (TF) of each keyword within each document.\n  - Measures the frequency of occurrence of keywords to determine their significance.\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eInverse Document Frequency Calculation:\u003c/span\u003e\u003c/b\u003e\n\n  - Computes the inverse document frequency (IDF) of each keyword across the document corpus.\n  - Determines the importance of keywords based on their rarity across documents.\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eTF-IDF Vectorization:\u003c/span\u003e\u003c/b\u003e\n\n  - Combines TF and IDF to calculate the TF-IDF (Term Frequency-Inverse Document Frequency) vector for each document.\n  - Represents documents as vectors to quantify their relevance to search queries.\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eSearch Engine Functionality:\u003c/span\u003e\u003c/b\u003e\n  - Processes user queries and matches them against the indexed documents. - Ranks search results based on TF-IDF similarity and other relevance metrics.\n  - Provides relevant document snippets and links for user exploration.\n\n## \u003cu\u003eTechnologies Used:\u003c/u\u003e\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eNode.js and Express.js:\u003c/span\u003e\u003c/b\u003e Backend framework for handling HTTP requests, routing, and server-side logic.\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eJavaScript (ES6+):\u003c/span\u003e\u003c/b\u003e Programming language for implementing server-side and client-side functionalities.\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eNatural Language Processing (NLP) Libraries:\u003c/span\u003e\u003c/b\u003e\n  - stopword: Removes stopwords (common words) from documents.\n  - remove-punctuation: Eliminates punctuation marks from text.\n  - wink-lemmatizer: Lemmatizes words to their base form for better analysis.\n  - number-to-words and words-to-numbers: Convert numbers to words and vice versa for enhanced query processing.\n  - natural: Library for natural language processing tasks such as tokenization and lemmatization.\n  - string-similarity: Computes string similarity metrics for search and matching purposes.\n  - simple-spellchecker: Provides spell checking functionality for text processing tasks.\n\nThese dependencies enhance the capabilities of the Document Search Engine by providing additional functionality for text processing, spell checking, string similarity comparison, and validation tasks.\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eData Storage:\u003c/span\u003e\u003c/b\u003e\n\n  - File System (fs): Manages reading and writing documents and intermediate results.\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eWeb Development:\u003c/span\u003e\u003c/b\u003e\n\n  - Express EJS: Templating engine for rendering dynamic HTML content on the server side.\n  - HTML5 and CSS3: Frontend markup and styling for the user interface.\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eDependency Management:\u003c/span\u003e\u003c/b\u003e\n\n  - npm (Node Package Manager): Manages project dependencies and scripts for development and production environments.\n\n## \u003cu\u003eWorkflow:\u003c/u\u003e\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eDocument Processing:\u003c/span\u003e\u003c/b\u003e\n\n  - Documents are preprocessed to extract meaningful keywords and remove noise.\n  - Preprocessing involves tokenization, stopword removal, punctuation removal, and lemmatization.\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eIndexing:\u003c/span\u003e\u003c/b\u003e\n\n  - Keywords are indexed and stored for efficient retrieval during search operations.\n  - Term frequencies (TF) and inverse document frequencies (IDF) are calculated and stored for each keyword.\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eSearch Operations:\u003c/span\u003e\u003c/b\u003e\n\n  - User queries are processed and matched against the indexed keywords.\n  - Relevant documents are retrieved based on TF-IDF similarity and other ranking criteria.\n  - Search results are presented to the user along with document snippets and links for further exploration.\n\n- \u003cb\u003e\u003cspan style=\"font-size:larger;\"\u003eUser Interface:\u003c/span\u003e\u003c/b\u003e\n  - The web application provides a user-friendly interface for querying and browsing documents.\n  - Search results are displayed in a clear and organized manner, facilitating easy navigation and exploration.\n\n## \u003cu\u003eDependencies:\u003c/u\u003e\n\n**_express:_** Web framework for Node.js.\n**_ejs:_** Templating engine for generating dynamic HTML content.\n**_stopword:_** Library for removing stopwords from text.\n**_remove-punctuation:_** Utility for removing punctuation marks from strings.\n**_wink-lemmatizer:_** Tool for lemmatizing words to their base form.\n**_number-to-words and words-to-numbers:_** Modules for converting numbers to words and vice versa.\n\n## \u003cu\u003eConclusion:\u003c/u\u003e\n\nThe Document Search Engine project aims to enhance document retrieval and exploration by leveraging natural language processing techniques and advanced indexing algorithms. It provides users with a powerful and intuitive tool for searching, analyzing, and extracting insights from large document collections. With its robust functionality and user-friendly interface, the application serves as a valuable resource for researchers, students, and professionals seeking to efficiently access and navigate textual information.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fryomendev%2Fcodequest","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fryomendev%2Fcodequest","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fryomendev%2Fcodequest/lists"}