Ecosyste.ms: Awesome


https://github.com/roaajadaa/sparksearchengine

Build a small-scale, Spark-based search engine that searches a list of documents to find those matching a user’s query.

bigdata indexing inverted-index mongodb scala search-engine spark


Host: GitHub
URL: https://github.com/roaajadaa/sparksearchengine
Owner: Roaajadaa
Created: 2024-11-02T19:28:43.000Z (3 months ago)
Default Branch: main
Last Pushed: 2024-11-02T19:43:00.000Z (3 months ago)
Last Synced: 2024-12-21T02:22:34.353Z (about 2 months ago)
Topics: bigdata, indexing, inverted-index, mongodb, scala, search-engine, spark
Language: Scala
Homepage:
Size: 62.5 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# SparkSearchEngine

## Description

Build a small-scale, Spark-based search engine that searches a list of documents to find those matching a user’s query.

## Key Features

### Inverted Index Construction
- The project reads the content of multiple documents and generates an inverted index that records each unique word, the number of documents that contain it, and a sorted list of those documents.
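The construction step can be sketched with plain Scala collections. This assumes documents arrive as `(id, text)` pairs and that a word is any lower-cased run of ASCII letters and digits; the names `InvertedIndexSketch`, `Entry`, `tokenize` and `buildIndex` are hypothetical, and the repository's actual Spark job may tokenize and key documents differently. In Spark, the same pipeline would typically start from `sc.wholeTextFiles` and use RDD operations such as `flatMap`, `distinct`, `groupByKey` and `sortByKey`.

```scala
// Minimal sketch of inverted-index construction with plain Scala collections.
// Assumption (not from the project): words are lower-cased ASCII alphanumeric runs.
object InvertedIndexSketch {

  // One entry per unique word: how many documents contain it, and which ones (sorted).
  final case class Entry(word: String, docCount: Int, docs: List[String])

  // Split text into lower-case tokens on any run of non-alphanumeric characters.
  def tokenize(text: String): List[String] =
    text.toLowerCase.split("[^\\p{Alnum}]+").toList.filter(_.nonEmpty)

  def buildIndex(documents: Map[String, String]): List[Entry] =
    documents.toList
      // Emit (word, docId) pairs; distinct tokens so each document counts once per word.
      .flatMap { case (docId, text) => tokenize(text).distinct.map(word => (word, docId)) }
      // Group the pairs by word, giving every document that contains it.
      .groupBy(_._1)
      .map { case (word, pairs) =>
        val docs = pairs.map(_._2).sorted // document list in ascending order
        Entry(word, docs.size, docs)
      }
      .toList
      .sortBy(_.word) // words in alphabetical order
}
```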

### Data Storage
- The resulting inverted index is saved in a file (`wholeInvertedIndex.txt`) and subsequently stored in a MongoDB collection named "dictionary" for efficient retrieval.
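The README does not document the layout of `wholeInvertedIndex.txt`, so the sketch below assumes a hypothetical comma-separated line per word (`word,docCount,doc1,doc2,...`) and shows writing and reading that file with `java.nio.file`. It also assumes document identifiers contain no commas. The `IndexFileSketch` name and its functions are illustrative only, and loading the entries into the MongoDB `dictionary` collection is not shown.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Sketch of persisting an inverted index as text, one line per word.
// Hypothetical line format: word,docCount,doc1,doc2,...
object IndexFileSketch {
  final case class Entry(word: String, docCount: Int, docs: List[String])

  // Serialize one entry; assumes no field contains a comma.
  def toLine(e: Entry): String =
    (e.word :: e.docCount.toString :: e.docs).mkString(",")

  // Parse one line back into an entry.
  def fromLine(line: String): Entry = {
    val fields = line.split(",").toList
    Entry(fields.head, fields(1).toInt, fields.drop(2))
  }

  // Write all entries as UTF-8 lines, replacing any existing file.
  def write(path: Path, entries: Seq[Entry]): Unit = {
    Files.write(path, entries.map(toLine).asJava, StandardCharsets.UTF_8)
    ()
  }

  // Read the file back into entries, preserving line order.
  def read(path: Path): List[Entry] =
    Files.readAllLines(path, StandardCharsets.UTF_8).asScala.toList.map(fromLine)
}
```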

### Query Processing
- Users can enter queries to search for specific words or phrases. The system looks up the query terms in the index and displays the identifiers of the documents that match.
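A multi-word query can be answered by intersecting the document lists of its words. This sketch assumes AND semantics (a document matches only if it contains every query word), whitespace tokenization, and an in-memory `Map` standing in for the MongoDB `dictionary` collection; `QuerySketch` and `search` are hypothetical names. Exact phrase matching would additionally require word positions, which the index described above does not record.

```scala
// Sketch of query evaluation over an inverted index (word -> sorted document ids).
// Assumptions: AND semantics, lower-cased whitespace-separated query words.
object QuerySketch {
  def search(index: Map[String, List[String]], query: String): List[String] = {
    val words = query.toLowerCase.split("\\s+").toList.filter(_.nonEmpty).distinct
    if (words.isEmpty) Nil
    else
      words
        .map(w => index.getOrElse(w, Nil).toSet) // posting list per word; empty if unknown
        .reduce(_ intersect _)                    // keep documents containing every word
        .toList
        .sorted                                   // ascending document ids
  }
}
```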

### Sorting and Organization
- Words in the index are sorted alphabetically, and each word's document list is sorted in ascending order, which keeps the output easy to read and search.

## Conclusion
This project combines big data processing techniques with information retrieval principles, providing a practical application of Spark and MongoDB in building scalable search solutions.