An open API service indexing awesome lists of open source software.

https://github.com/lexiestleszek/document_ranking

Document ranking based on heat maps.
https://github.com/lexiestleszek/document_ranking

Last synced: 3 months ago
JSON representation

Document ranking based on heat maps.

Awesome Lists containing this project

README

        

### Document Ranking using Heatmaps

This script implements a ranking system using a max heap data structure. It reads input from standard input and processes queries to rank documents based on relevance scores.

A heap is a specialized tree-based data structure that satisfies the heap property. Here are the key points about heaps:

Definition and Properties
A heap is a complete binary tree data structure.
It follows the heap property: for every node, the value of its children is greater than or equal to its own value (for max-heap) or less than or equal to its own value (for min-heap)

```python
import sys
```
This line imports the `sys` module, which provides access to some variables and functions used or maintained by the interpreter.

```python
from heapq import heappush, heappop
```
This line imports two functions (`heappush` and `heappop`) from the `heapq` module. These functions are used for efficient priority queue operations.

```python
def main_func():
```
This defines a function named `main_func()` that will contain the main logic of the program.

```python
input = sys.stdin.readline
```
This line assigns the `readline()` method of `sys.stdin` to a variable called `input`. This allows us to read input lines efficiently without needing to call `sys.stdin.readline()` every time we want to read input.

```python
n = int(input())
ai = list(map(int, input().split()))
d = int(input())
fi = [list(map(int, input().split())) for _ in range(d)]
```
These lines read input and store it in variables:
- `n`: An integer representing the length of `ai`.
- `ai`: A list of integers representing document frequencies.
- `d`: An integer representing the number of documents.
- `fi`: A list of lists, where each inner list represents a document and contains integer values.

```python
relevance = [sum(f*a for f, a in zip(doc, ai)) for doc in fi]
```
This line calculates the relevance score for each document. It uses a list comprehension to iterate over each document (`doc`) in `fi`, zip it with `ai`, multiply corresponding elements, and sum them up.

```python
# Create a max heap of (relevance, index) pairs
heap = [(-r, i+1) for i, r in enumerate(relevance)]
heap.sort() # Convert to max heap
```
These lines create a max heap:
- The list comprehension creates tuples of (-relevance, index) pairs.
- The `sort()` method converts this list into a max heap.

```python
q = int(input())
```
This reads the number of queries to be processed.

```python
for _ in range(q):
```
This starts a loop that will execute q times.

```python
query = list(map(int, input().split()))
```
This reads a query and splits it into integers.

```python
if query[1] == 1:
```
This checks if the query is asking for the top k documents.

```python
k = query[2]
if k >= d:
result = [idx for _, idx in heap]
else:
result = [idx for _, idx in heap[:k]]
print(' '.join(map(str, result)))
```
These lines handle the query for top k documents:
- If k is greater than or equal to the number of documents, it prints all document indices.
- Otherwise, it prints the top k document indices.
- Both cases use list comprehensions to extract the indices from the heap.

```python
elif query[1] == 2:
```
This checks if the query is updating a document's relevance.

```python
i, j, v = query[1:]
old_value = fi[i-1][j-1]
fi[i-1][j-1] = v
relevance[i-1] += (v - old_value) * ai[j-1]
```
These lines update a document's relevance:
- Extract the document index, term index, and new value from the query.
- Store the old value and update the document.
- Calculate the change in relevance and update it.

```python
# Update the heap
for idx, (r, doc_idx) in enumerate(heap):
if doc_idx == i:
heap[idx] = (-relevance[i-1], doc_idx)
break
heap.sort() # Re-sort the heap
```
These lines update the heap:
- Find the relevant entry in the heap.
- Update its relevance and index.
- Sort the heap again to maintain the max heap property.

```python
main_func()
```
This calls the `main_func()` to start executing the program.

This code appears to be implementing a document ranking system, likely for a search engine or similar application. It uses a max heap to efficiently maintain the top-ranked documents based on their relevance scores.

Citations: