https://github.com/gre1wy/oaa
KPI IPT course, 5 semester
db inverted-index lexer parser
- Host: GitHub
- URL: https://github.com/gre1wy/oaa
- Owner: gre1wy
- Created: 2024-10-16T00:17:28.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-12-17T18:57:06.000Z (over 1 year ago)
- Last Synced: 2025-01-30T08:43:09.451Z (over 1 year ago)
- Topics: db, inverted-index, lexer, parser
- Language: Jupyter Notebook
- Homepage:
- Size: 501 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Algorithm Analysis Foundations
# Text Collection Management System
## Overview
This is a console-based program that lets users manage collections of text documents and perform efficient full-text search using an inverted index.
## Features
The program supports the following commands:
1. **CREATE `<collection_name>;`** - Creates a new collection with the specified name.
2. **INSERT `<collection_name> "<document>";`** - Adds a new document to the specified collection.
3. **PRINT_INDEX `<collection_name>;`** - Prints the internal structure of the inverted index built for the specified collection.
4. **SEARCH `<collection_name> [WHERE <query>];`** - Searches for documents in the specified collection that match the given query. The query can be:
   - `"<keyword>"` - Finds documents containing the specified keyword.
   - `"<keyword_1>" - "<keyword_2>"` - Finds documents containing any word between `<keyword_1>` and `<keyword_2>` (inclusive).
   - `"<keyword_1>" <N> "<keyword_2>"` - Finds documents where `<keyword_1>` and `<keyword_2>` are exactly `N` words apart, regardless of their positions and order.
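To make the keyword, range, and distance queries concrete, here is a minimal sketch of a positional inverted index. It is not taken from the repository: the class and method names are hypothetical, words are assumed to be case-insensitive runs of word characters, "between" is assumed to mean lexicographic order, and "N words apart" is assumed to mean that the two word positions differ by exactly `N`.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Hypothetical positional index: word -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))
        self.doc_count = 0

    def insert(self, text):
        """Index one document and return its id."""
        doc_id = self.doc_count
        self.doc_count += 1
        # Assumption: a word is a run of word characters, compared in lowercase.
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            self.postings[word][doc_id].append(position)
        return doc_id

    def search_keyword(self, keyword):
        """Ids of documents that contain the keyword."""
        return set(self.postings.get(keyword.lower(), {}))

    def search_range(self, low, high):
        """Ids of documents containing any word w with low <= w <= high."""
        low, high = low.lower(), high.lower()
        result = set()
        for word, docs in self.postings.items():
            if low <= word <= high:
                result.update(docs)
        return result

    def search_distance(self, first, second, n):
        """Ids of documents where the two words occur exactly n positions apart."""
        a = self.postings.get(first.lower(), {})
        b = self.postings.get(second.lower(), {})
        result = set()
        for doc_id in a.keys() & b.keys():
            positions_b = set(b[doc_id])
            # Check both directions so the order of the two words does not matter.
            if any(p + n in positions_b or p - n in positions_b for p in a[doc_id]):
                result.add(doc_id)
        return result


index = InvertedIndex()
index.insert("The quick brown fox")         # document 0
index.insert("A lazy dog and a quick cat")  # document 1
index.insert("Brown bears sleep")           # document 2

print(sorted(index.search_keyword("quick")))            # [0, 1]
print(sorted(index.search_range("cat", "fox")))         # [0, 1]
print(sorted(index.search_distance("fox", "quick", 2))) # [0]
```

Storing positions alongside document ids is what makes the distance query possible; an index that only recorded document ids could answer the keyword and range queries but not the third form.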
## Implementation Details
The program is implemented using the following components:
1. **Lexer (`lexer.py`)**: Responsible for tokenizing the input text into a sequence of tokens (e.g., keywords, identifiers, quoted strings).
2. **Parser (`parser.py`)**: Parses the sequence of tokens and executes the corresponding commands.
3. **Inverted Index (`invertedIndex.py`)**: Implements the inverted index data structure, which maps words to the documents they appear in and their positions within those documents.
4. **Database (`invertedIndex.py`)**: Manages the collections of documents and their associated inverted indexes.
5. **Main Entry Point (`main.py`)**: Provides the command-line interface and coordinates the interaction between the other components.
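As an illustration of the lexer stage, the sketch below tokenizes one command with a single regular expression. The token kinds, the `tokenize` function, and the case-insensitive treatment of keywords are assumptions made for this example; the actual `lexer.py` may be organized differently.

```python
import re

# Hypothetical token kinds; order matters because the first matching pattern wins.
TOKEN_SPEC = [
    ("STRING", r'"[^"]*"'),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("DASH", r"-"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
KEYWORDS = {"CREATE", "INSERT", "PRINT_INDEX", "SEARCH", "WHERE"}
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(command):
    """Split a command string into (kind, value) pairs."""
    tokens = []
    for match in TOKEN_RE.finditer(command):
        kind, value = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace only separates tokens
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {value!r} at position {match.start()}")
        if kind == "IDENT" and value.upper() in KEYWORDS:
            # Assumption: keywords are case-insensitive and normalized to uppercase.
            kind, value = "KEYWORD", value.upper()
        elif kind == "STRING":
            value = value[1:-1]  # drop the surrounding quotes
        tokens.append((kind, value))
    return tokens


for token in tokenize('search docs WHERE "quick" 2 "fox";'):
    print(token)
```

A parser can then dispatch on the first `KEYWORD` token and check that the remaining tokens match the expected shape of that command, ending in `SEMICOLON`.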
## Usage
...
## Error Handling
The program is designed to detect and handle various error situations, such as:
- Incorrect syntax of commands
- Attempting to create a collection with a name that is already in use
- Attempting to execute a command on a non-existent collection
When an error occurs, the program will output an appropriate error message to the user.
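One way the collection-related errors could be handled is sketched below. `CollectionError`, `Database`, and `run` are hypothetical names chosen for this example, and the message wording is invented; the repository's own classes and messages may differ.

```python
class CollectionError(Exception):
    """Hypothetical error type for collection-level failures."""


class Database:
    def __init__(self):
        self.collections = {}  # collection name -> list of documents

    def create(self, name):
        if name in self.collections:
            raise CollectionError(f"collection '{name}' already exists")
        self.collections[name] = []

    def insert(self, name, document):
        if name not in self.collections:
            raise CollectionError(f"collection '{name}' does not exist")
        self.collections[name].append(document)


def run(action, *args):
    """Run one action and turn a failure into a message instead of a crash."""
    try:
        action(*args)
        return "OK"
    except CollectionError as error:
        return f"Error: {error}"


db = Database()
print(run(db.create, "docs"))            # OK
print(run(db.create, "docs"))            # Error: collection 'docs' already exists
print(run(db.insert, "notes", "hello"))  # Error: collection 'notes' does not exist
```

Catching the error at the top level keeps the console loop alive, so one bad command does not end the session.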