https://github.com/roaajadaa/sparksearchengine
Build a small-scale spark-based search engine which searches in a list of documents to find those answering a user’s query.
- Host: GitHub
- URL: https://github.com/roaajadaa/sparksearchengine
- Owner: Roaajadaa
- Created: 2024-11-02T19:28:43.000Z (15 days ago)
- Default Branch: main
- Last Pushed: 2024-11-02T19:43:00.000Z (15 days ago)
- Last Synced: 2024-11-02T20:22:40.324Z (15 days ago)
- Topics: bigdata, indexing, inverted-index, mongodb, scala, search-engine, spark
- Language: Scala
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# SparkSearchEngine
## Description
A small-scale Spark-based search engine that searches a list of documents for those matching a user's query.
## Key Features
### Inverted Index Construction
- The project reads the content of multiple documents and generates an inverted index that records each unique word, the count of documents containing that word, and a sorted list of those documents.
### Data Storage
- The resulting inverted index is saved in a file (`wholeInvertedIndex.txt`) and subsequently stored in a MongoDB collection named "dictionary" for efficient retrieval.
### Query Processing
- Users can input queries to search for specific words or phrases. The system retrieves and displays the relevant document identifiers, showcasing the documents that match the search criteria.
### Sorting and Organization
- The words in the index are sorted alphabetically, and each word's document list is sorted in ascending order, which keeps the index easy to read and to search.
## Conclusion
This project combines big data processing techniques with information retrieval principles, providing a practical application of Spark and MongoDB in building scalable search solutions.
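
As a rough illustration of the index construction and query steps listed under Key Features, here is a minimal sketch. It is not the repository's code: the names `IndexEntry`, `InvertedIndexSketch`, `tokenize`, `buildIndex`, and `search` are hypothetical, the tokenization rule is an assumption, and multi-word queries are assumed to mean "documents containing every word". It uses plain Scala collections so that it runs without a cluster; on a Spark RDD of (document id, text) pairs the same shape would typically be expressed with `flatMap`, `distinct`, and `groupByKey`. Writing to `wholeInvertedIndex.txt` and to MongoDB is left out.

```scala
// Hypothetical sketch, not taken from the repository.
// One index row: the word, how many documents contain it, and the sorted document ids.
final case class IndexEntry(word: String, docCount: Int, docs: List[String])

object InvertedIndexSketch {

  // Assumption: lowercase the text and split on anything that is not a letter or digit.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z0-9]+").toSeq.filter(_.nonEmpty)

  // Build the index from a map of document id -> document text.
  def buildIndex(docs: Map[String, String]): List[IndexEntry] =
    docs.toSeq
      // Emit one (word, docId) pair per distinct word in each document.
      .flatMap { case (id, text) => tokenize(text).distinct.map(word => (word, id)) }
      // Collect all document ids that share the same word.
      .groupBy(_._1)
      .toList
      .map { case (word, pairs) =>
        val ids = pairs.map(_._2).distinct.sorted.toList // document list in ascending order
        IndexEntry(word, ids.size, ids)
      }
      // Words in alphabetical order.
      .sortBy(_.word)

  // Assumption: a multi-word query returns documents that contain every query word.
  def search(index: List[IndexEntry], query: String): List[String] = {
    val postings = index.map(entry => entry.word -> entry.docs.toSet).toMap
    val terms = tokenize(query).distinct
    if (terms.isEmpty) Nil
    else
      terms
        .map(term => postings.getOrElse(term, Set.empty[String]))
        .reduce(_ intersect _)
        .toList
        .sorted
  }
}
```

With three toy documents, `doc1` = "spark builds the index", `doc2` = "mongodb stores the index", and `doc3` = "spark queries mongodb", the entry for `index` is `IndexEntry("index", 2, List("doc1", "doc2"))`, and `search(index, "spark mongodb")` returns only `doc3`. Intersecting document sets cannot tell whether the words appear next to each other, so true phrase matching would need word positions in the index.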