https://github.com/utkarshahuja2003/wikisearch
- Host: GitHub
- URL: https://github.com/utkarshahuja2003/wikisearch
- Owner: UtkarshAhuja2003
- License: mit
- Created: 2024-03-15T17:11:34.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-07-09T17:21:39.000Z (4 months ago)
- Last Synced: 2024-07-09T21:59:46.858Z (4 months ago)
- Language: C++
- Size: 137 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# WikiSearch
A fast and efficient search engine built with C++ using Wikipedia Dump data. Optimized for quick and accurate information retrieval.
Wikipedia dump (90 GB): http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
## Statistics
- For the 90 GB Wikipedia XML dump:
  + Size of index (primary + secondary): 9.12 GB
  + Size of metadata: 863 MB
  + Time to index: 3 hr 30 min (average)
  + Time to search: 0.34 sec (average over 100 searches)
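The statistics above mention a primary and a secondary index. The README does not describe how the two relate, so the following is only a minimal sketch of one common arrangement, assuming the secondary index is a sparse sample of the sorted primary index that narrows each lookup to a small block. All names here (`PrimaryEntry`, `SecondaryEntry`, `buildSecondary`, `lookup`, `stride`) are hypothetical and are not taken from the project's source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for the on-disk primary index:
// entries are sorted by term, each with its posting list of document IDs.
struct PrimaryEntry {
    std::string term;
    std::vector<int> docIds;
};

// Hypothetical secondary index entry: a sampled term and its position
// in the primary index.
struct SecondaryEntry {
    std::string term;
    std::size_t position;
};

// Sample every `stride`-th term of the primary index.
std::vector<SecondaryEntry> buildSecondary(const std::vector<PrimaryEntry>& primary,
                                           std::size_t stride) {
    assert(stride > 0);
    std::vector<SecondaryEntry> secondary;
    for (std::size_t i = 0; i < primary.size(); i += stride) {
        secondary.push_back({primary[i].term, i});
    }
    return secondary;
}

// Binary-search the small secondary index, then scan at most `stride`
// entries of the primary index.
std::optional<std::vector<int>> lookup(const std::vector<PrimaryEntry>& primary,
                                       const std::vector<SecondaryEntry>& secondary,
                                       std::size_t stride,
                                       const std::string& term) {
    // First sampled term strictly greater than the query.
    auto it = std::upper_bound(
        secondary.begin(), secondary.end(), term,
        [](const std::string& t, const SecondaryEntry& e) { return t < e.term; });
    if (it == secondary.begin()) {
        return std::nullopt;  // query sorts before every indexed term
    }
    --it;  // block that could contain the term
    const std::size_t end = std::min(it->position + stride, primary.size());
    for (std::size_t i = it->position; i < end; ++i) {
        if (primary[i].term == term) {
            return primary[i].docIds;
        }
    }
    return std::nullopt;
}
```

In an on-disk variant of this idea, `position` would typically be a byte offset into the primary index file, so that only the small secondary index has to stay in memory.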
## Features
- User search system for faster information retrieval
- Web-based interface
- Direct links to actual Wikipedia pages
- Stemming for improved search accuracy
## Screenshots
![image](https://github.com/UtkarshAhuja2003/WikiSearch/assets/70762626/b4cd5f00-9cb4-4512-b156-dc8942e44a8e)
![image](https://github.com/UtkarshAhuja2003/WikiSearch/assets/70762626/cfe4b65f-4e8f-418c-bfcf-65a120a9b1ea)
## Installation
To install and run this project, follow these steps:
1. **Clone the repository:**
```sh
git clone https://github.com/UtkarshAhuja2003/WikiSearch.git
cd WikiSearch
```
2. **Create a build directory:**
```sh
mkdir build
cd build
```
3. **Generate the build files with CMake:**
```sh
cmake ..
```
4. **Build the project:**
```sh
make
```
5. **Run the application:**
```sh
./WikiSearch
```
## File Structure
```bash
├── .github        # GitHub Actions workflow
├── build          # Build files for the project
├── client         # Web frontend
├── dependencies
├── res            # Posting list, metadata, Wikipedia dump
└── src            # Source code
    ├── index      # Parse Wikipedia data
    ├── search
    └── utils      # File management, stemming, and classifiers
```
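To illustrate how the pieces named in the tree (indexing, posting lists, stemming, search) could fit together, here is a minimal in-memory sketch. It is not the project's implementation: the `stem` function is a toy suffix stripper rather than a real stemming algorithm such as Porter's, and `tokenize`, `InvertedIndex`, `addDocument`, and `search` are hypothetical names chosen for the example.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stemmer (assumption): strips one common English suffix so that
// "searching", "searched" and "searches" all map to "search".
std::string stem(std::string word) {
    static const std::vector<std::string> suffixes = {"ing", "ed", "es", "s"};
    for (const std::string& suffix : suffixes) {
        if (word.size() > suffix.size() + 2 &&
            word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0) {
            word.erase(word.size() - suffix.size());
            break;
        }
    }
    return word;
}

// Lowercase the text, split on non-alphanumeric characters, stem each token.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current.push_back(static_cast<char>(std::tolower(c)));
        } else if (!current.empty()) {
            tokens.push_back(stem(current));
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(stem(current));
    }
    return tokens;
}

// Posting lists: stemmed term -> sorted set of document IDs.
using InvertedIndex = std::map<std::string, std::set<int>>;

void addDocument(InvertedIndex& index, int docId, const std::string& text) {
    for (const std::string& term : tokenize(text)) {
        index[term].insert(docId);
    }
}

// AND query: return the documents that contain every query term.
std::set<int> search(const InvertedIndex& index, const std::string& query) {
    std::set<int> result;
    bool first = true;
    for (const std::string& term : tokenize(query)) {
        auto it = index.find(term);
        if (it == index.end()) {
            return {};  // a missing term means no document matches
        }
        if (first) {
            result = it->second;
            first = false;
        } else {
            std::set<int> merged;
            std::set_intersection(result.begin(), result.end(),
                                  it->second.begin(), it->second.end(),
                                  std::inserter(merged, merged.begin()));
            result = std::move(merged);
        }
    }
    return result;
}
```

Because queries and documents go through the same `tokenize` step, a query for "searched" matches a document containing "Searching", which is the accuracy benefit the stemming feature refers to. An index over the full dump would be kept on disk, not in a `std::map`.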