https://github.com/jackdoe/inverted
naive one file per term stored inverted index
https://github.com/jackdoe/inverted
Last synced: 4 days ago
JSON representation
naive one file per term stored inverted index
- Host: GitHub
- URL: https://github.com/jackdoe/inverted
- Owner: jackdoe
- Created: 2014-06-01T19:24:04.000Z (about 12 years ago)
- Default Branch: master
- Last Pushed: 2014-09-29T20:52:24.000Z (over 11 years ago)
- Last Synced: 2026-06-14T01:32:46.776Z (4 days ago)
- Language: C
- Size: 398 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.txt
Awesome Lists containing this project
README
extremely naive inverted index implementation that uses only stored and mmaped array.
index format:
struct entry {
s32 id;
u16 freq;
u16 plen;
u8 payload[0];
} __attribute__((packed));
struct stats {
u32 last_id;
u32 n;
u32 max_payload_size;
} __attribute__((packed));
struct stored {
struct stats stats;
u8 begin[0];
} __attribute__((packed));
the payload size is decided once per index
creation, and it is the same for every entry
in the array.
for example:
StoredList documents("_documents.idx",128);
// creating documents 1 2 3 (with payload 0 - which will use sizeof(int) bytes)
documents.insert(1)
documents.insert(2)
documents.insert(3)
the Forward and the Inverted indexes are both just StoredLists
// adding the term 'new' to documents 1,2,3
StoredList t_new("_new.idx");
t_new.insert(1);
t_new.insert(2);
t_new.insert(3);
StoredList t_new("_york.idx");
t_york.insert(1);
t_york.insert(3);
StoredList t_state("_state.idx");
t_york.insert(3);
fetching entry from the forward index:
documents.entry_at(3)
simple query:
TermQuery q_new(&t_new);
TermQuery q_york(&t_york);
TermQuery q_state(&t_state);
BoolMustQuery q_new_york;
BoolMustQuery q_new_york_state;
q_new_york.add(&q_new);
q_new_york.add(&q_york);
q_new_york_state.add(&q_new_york);
q_new_york_state.add(&q_state);
Collector collector(q_new_york_state);
for (auto s : collector.topN(3,documents)) {
printf("doc: %d, score: %f\n",s.id,s.score);
}