https://github.com/rafaelmoraes003/trybe-is-not-google
Program that simulates a document indexing algorithm similar to Google's, being able to identify occurrences of terms in TXT files.
algorithms data-structures pytest python
- Host: GitHub
- URL: https://github.com/rafaelmoraes003/trybe-is-not-google
- Owner: rafaelmoraes003
- Created: 2023-02-06T03:33:16.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-02-06T19:28:29.000Z (over 3 years ago)
- Last Synced: 2025-03-21T10:52:45.628Z (about 1 year ago)
- Topics: algorithms, data-structures, pytest, python
- Language: Python
- Homepage:
- Size: 39.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Trybe Is Not Google (TING)
###
This project implements a program that simulates a document indexing algorithm similar to Google's. The program identifies occurrences of terms in TXT files.
###
Technologies used
###
###
How to use the application
###
Clone the repository with the `git clone` command, then enter the project folder with `cd trybe-is-not-google`.
###
How to run the application
###
1. Create the virtual environment for the project
- `python3 -m venv .venv && source .venv/bin/activate`
2. Install the dependencies
- `python3 -m pip install -r dev-requirements.txt`
###
Using the indexing tool
Run the `python3 -m ting_word_searches.word_search` command to use the tool.
Explanation
The `process` function adds the data from a file to the queue. It takes two arguments: `the path of the file with the data` and an `instance of the Queue class`.
After a `Queue` instance has been passed to the `process` function, it can be used in the search tool.
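The flow described above can be sketched as follows. This is only a minimal illustration, not the project's actual code: the `Queue` class here is a hypothetical stand-in with assumed `enqueue` and `search` methods, and the record layout (`"arquivo"` and `"linhas"` keys) is an assumption made for the example.

```python
from collections import deque
from pathlib import Path
import tempfile


class Queue:
    """Hypothetical stand-in for the project's Queue class (FIFO)."""

    def __init__(self):
        self._items = deque()

    def __len__(self):
        return len(self._items)

    def enqueue(self, value):
        # New records go to the back of the queue.
        self._items.append(value)

    def search(self, index):
        # Return the record stored at the given position.
        return self._items[index]


def process(path_file, instance):
    """Read a text file and enqueue one record holding its lines."""
    lines = Path(path_file).read_text(encoding="utf-8").splitlines()
    # The record layout below is an assumption made for this sketch.
    instance.enqueue({"arquivo": path_file, "linhas": lines})


# Usage: process a temporary two-line file and inspect the stored record.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("o sistema\noutra linha\n", encoding="utf-8")
    queue_instance = Queue()
    process(str(sample), queue_instance)
    record = queue_instance.search(0)
```

Storing the lines of each file in the record lets the search functions report line numbers without reading the file again.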
The `exists_word` and `search_by_word` functions take two arguments: `the word to be searched for` and an `instance of the Queue class`.
A call to `exists_word("sistema", queue_instance)` returns something like:
```Python
[
    {
        "palavra": "sistema",
        "arquivo": "statics/novo_paradigma_globalizado-min.txt",
        "ocorrencias": [
            {"linha": 12},
            {"linha": 18}
        ],
    },
]
```
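One way a function could produce a result of this shape is sketched below. It is an illustration under stated assumptions, not the project's implementation: the queue is assumed to be iterable and to hold records with `"arquivo"` and `"linhas"` keys, matching is assumed to be a case-insensitive substring test, and the path `statics/sample.txt` is hypothetical.

```python
def exists_word(word, instance):
    """Return, per file, the 1-based line numbers where `word` appears."""
    results = []
    for record in instance:  # assumes the queue can be iterated
        hits = [
            {"linha": number}
            for number, line in enumerate(record["linhas"], start=1)
            if word.lower() in line.lower()  # assumed matching rule
        ]
        if hits:  # files without any occurrence are left out
            results.append(
                {
                    "palavra": word,
                    "arquivo": record["arquivo"],
                    "ocorrencias": hits,
                }
            )
    return results


# A plain list stands in for a processed Queue instance here.
queue_instance = [
    {
        "arquivo": "statics/sample.txt",  # hypothetical path
        "linhas": ["O Sistema novo", "nada aqui", "sistema antigo"],
    }
]
found = exists_word("sistema", queue_instance)
```

In this sample, `found` holds a single entry whose `"ocorrencias"` list points to lines 1 and 3.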
Similarly, a call to `search_by_word("sistema", queue_instance)` returns something like:
```Python
[
    {
        "palavra": "sistema",
        "arquivo": "statics/novo_paradigma_globalizado-min.txt",
        "ocorrencias": [
            {
                "linha": 12,
                "conteudo": "Neste sentido [...] do sistema [...]",
            },
            {
                "linha": 18,
                "conteudo": "No mundo atual, [...] estabelecimento do sistema [...]",
            },
        ],
    },
]
```
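The difference from `exists_word` is that each occurrence also carries the content of the matching line. A minimal sketch of that idea follows, under the same assumptions as before: an iterable queue of records with `"arquivo"` and `"linhas"` keys, case-insensitive substring matching, and a hypothetical file path. None of these details are confirmed by the project itself.

```python
def search_by_word(word, instance):
    """Return, per file, each matching line number with the line's text."""
    results = []
    for record in instance:  # assumes the queue can be iterated
        hits = [
            {"linha": number, "conteudo": line}
            for number, line in enumerate(record["linhas"], start=1)
            if word.lower() in line.lower()  # assumed matching rule
        ]
        if hits:  # files without any occurrence are left out
            results.append(
                {
                    "palavra": word,
                    "arquivo": record["arquivo"],
                    "ocorrencias": hits,
                }
            )
    return results


# A plain list stands in for a processed Queue instance here.
queue_instance = [
    {
        "arquivo": "statics/sample.txt",  # hypothetical path
        "linhas": ["primeira linha", "o sistema falhou"],
    }
]
matches = search_by_word("sistema", queue_instance)
first_hit = matches[0]["ocorrencias"][0]
```

Here `first_hit` pairs line number 2 with the full text of that line, so a caller can show context without reopening the file.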