https://github.com/rafaelmoraes003/trybe-is-not-google

Program that simulates a document indexing algorithm similar to Google's, being able to identify occurrences of terms in TXT files.
algorithms data-structures pytest python

Trybe Is Not Google (TING)

In this project, a program was implemented to simulate a document indexing algorithm similar to Google's. The program is able to identify occurrences of terms in TXT files.

Technologies used

Python

How to use the application

Clone the repository with the `git clone` command, then enter the project folder with `cd trybe-is-not-google`.

How to run the application

1. Create the virtual environment for the project
- `python3 -m venv .venv && source .venv/bin/activate`

2. Install the dependencies
- `python3 -m pip install -r dev-requirements.txt`

Using the indexing tool

Run the `python3 -m ting_word_searches.word_search` command to use the tool.

Explanation

The `process` function adds the data from a file to the queue. It takes two arguments: the path of the file with the data and an instance of the `Queue` class.

Once a `Queue` instance has been passed to the `process` function, it can be used with the search tool.
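As a rough illustration of that flow, here is a minimal sketch of a queue and a `process` function. It is not the project's actual code: the `Queue` internals, the `enqueue` method, and the entry keys `"arquivo"` and `"linhas"` are all assumptions made for this example.

```python
from collections import deque


class Queue:
    """Hypothetical minimal FIFO queue; the project's real class may differ."""

    def __init__(self):
        self._items = deque()

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        # Iterate from the oldest entry to the newest one.
        return iter(self._items)

    def enqueue(self, value):
        self._items.append(value)


def process(path_file, instance):
    """Read a TXT file and store its lines in the queue (assumed entry shape)."""
    with open(path_file, encoding="utf-8") as file:
        lines = file.read().splitlines()
    entry = {"arquivo": path_file, "linhas": lines}
    instance.enqueue(entry)
    return entry
```

Keeping the lines of each file in memory means the search functions can report line numbers without reading the file again.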

The `exists_word` and `search_by_word` functions each take two arguments: the word to search for and an instance of the `Queue` class.

A call such as `exists_word("sistema", queue_instance)` returns something like:

```Python
[
    {
        "palavra": "sistema",
        "arquivo": "statics/novo_paradigma_globalizado-min.txt",
        "ocorrencias": [
            {"linha": 12},
            {"linha": 18}
        ],
    },
]
```
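A result of this shape could be produced along the following lines. This is only a sketch under stated assumptions: each queue entry is taken to be a dict with the hypothetical keys `"arquivo"` (file path) and `"linhas"` (list of lines), matching is assumed to be a case-insensitive substring test, and a plain list stands in for the `Queue` instance.

```python
def exists_word(word, instance):
    """Return, per file, the 1-based line numbers where `word` occurs."""
    needle = word.lower()  # assumption: the search ignores case
    results = []
    for entry in instance:
        occurrences = [
            {"linha": number}
            for number, line in enumerate(entry["linhas"], start=1)
            if needle in line.lower()
        ]
        # Files without any occurrence are left out of the result.
        if occurrences:
            results.append(
                {
                    "palavra": word,
                    "arquivo": entry["arquivo"],
                    "ocorrencias": occurrences,
                }
            )
    return results


# A plain list stands in for the Queue instance in this example.
queue_instance = [
    {
        "arquivo": "statics/exemplo.txt",  # hypothetical file name
        "linhas": ["Um sistema simples", "nada aqui", "Outro SISTEMA"],
    }
]
print(exists_word("sistema", queue_instance))
```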

Similarly, a call such as `search_by_word("sistema", queue_instance)` returns something like:

```Python
[
    {
        "palavra": "sistema",
        "arquivo": "statics/novo_paradigma_globalizado-min.txt",
        "ocorrencias": [
            {
                "linha": 12,
                "conteudo": "Neste sentido [...] do sistema [...]",
            },
            {
                "linha": 18,
                "conteudo": "No mundo atual, [...] estabelecimento do sistema [...]",
            },
        ],
    },
]
```
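The extra `"conteudo"` field suggests that `search_by_word` records the text of each matching line next to its number. A minimal sketch of that idea follows. As before, the entry keys `"arquivo"` and `"linhas"`, the case-insensitive substring match, and the list used in place of a `Queue` instance are assumptions for illustration, not the project's actual implementation.

```python
def search_by_word(word, instance):
    """Return, per file, the matching lines with their 1-based numbers."""
    needle = word.lower()  # assumption: the search ignores case
    results = []
    for entry in instance:
        occurrences = [
            {"linha": number, "conteudo": line}
            for number, line in enumerate(entry["linhas"], start=1)
            if needle in line.lower()
        ]
        # Files without any occurrence are left out of the result.
        if occurrences:
            results.append(
                {
                    "palavra": word,
                    "arquivo": entry["arquivo"],
                    "ocorrencias": occurrences,
                }
            )
    return results


# A plain list stands in for the Queue instance in this example.
sample_queue = [
    {
        "arquivo": "statics/exemplo.txt",  # hypothetical file name
        "linhas": ["Um sistema simples", "nada aqui"],
    }
]
print(search_by_word("sistema", sample_queue))
```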