https://github.com/fpopic/master-indexers
(Class) C++ script to preprocess input files (mapping ids to indices and vice versa) on a single core machine.
cpp id-to-index indexing-engine recommender-system
- Host: GitHub
- URL: https://github.com/fpopic/master-indexers
- Owner: fpopic
- Created: 2017-05-24T21:13:20.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2017-07-06T22:07:31.000Z (almost 8 years ago)
- Last Synced: 2025-01-10T19:42:14.709Z (4 months ago)
- Topics: cpp, id-to-index, indexing-engine, recommender-system
- Language: C++
- Homepage:
- Size: 10.7 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
#### Usage:
The indexer preprocesses input files on a single machine. It builds one shared lookup table, so running it in a multicore or distributed environment would require synchronisation and locking around that mutable collection.

It maps ids to indices (e.g. userId => userIndex, itemId => itemIndex) and saves the results to a file together with its lookup table. Indices are preferred over ids because linear algebra libraries operate on indices.

Scala time > 45 min (only for item-item matrix)\
C++ time < 5 min (for both matrices)

Unindexer is used to revert indices to ids after the recommendations are computed.
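The id-to-index mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code from `indexer.cpp`: the class name `IdIndexer`, its methods, the use of string ids, and zero-based indices assigned in order of first appearance are all assumptions made for the example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not taken from the repository): assigns dense,
// zero-based indices to ids in order of first appearance.
class IdIndexer {
public:
    // Returns the index for `id`, assigning the next free index the first
    // time an id is seen.
    std::size_t index_of(const std::string& id) {
        auto it = id_to_index_.find(id);
        if (it != id_to_index_.end()) {
            return it->second;
        }
        const std::size_t index = index_to_id_.size();
        id_to_index_.emplace(id, index);
        index_to_id_.push_back(id);  // position in this vector == index
        return index;
    }

    // Reverse lookup (index -> id); throws std::out_of_range for an
    // index that was never assigned.
    const std::string& id_of(std::size_t index) const {
        return index_to_id_.at(index);
    }

    // Number of distinct ids seen so far, i.e. the lookup table size.
    std::size_t size() const { return index_to_id_.size(); }

private:
    std::unordered_map<std::string, std::size_t> id_to_index_;
    std::vector<std::string> index_to_id_;
};
```

Both containers are mutated on every new id, which is why a shared instance of such a table would need locking if several threads or workers fed it concurrently.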
#### Compile:
using g++:
```
g++ -static -std=c++11 -O3 indexer.cpp -o indexer
g++ -static -std=c++11 -O3 unindexer.cpp -o unindexer
```
or with cmake:
```
mkdir cmake-build-debug
cmake --build cmake-build-debug --target indexer -- -j 4
cmake --build cmake-build-debug --target unindexer -- -j 4
```
#### Run indexer:
```./indexer ```
Output files are written to the same folder as the input, with the .indexed and .lookup suffixes.
#### Output:
```
Header: itemId1,itemId2,a,b,c,d
Processed lines:1000000
...
Processed lines:71000000
Output file with 71930771 lines saved.
Lookup file with 67052 entries saved.
Time: 166s

Header: userId,date,itemId,quantity
Processed lines:1000000
...
Processed lines:29000000
Output file with 29154707 lines saved.
Lookup file with 20998 entries saved.
Time: 50s

Process finished with exit code 0
```
#### Run unindexer:
```./unindexer ```
The input file must be formatted like:
```"userIndex:itemIndex1,itemIndex2,...,itemIndexX"```
Output files are written to the same folder as the input, with the .unindexed suffix, formatted like:
```"userId:itemId1,itemId2,...,itemIdX"```
where k is the number of top-k recommendations for a user, so 0 <= X <= k
#### Output:
```
Processed lines:1000
...
Processed lines:2000
Output file with 21225 lines saved.
Time: 4s

Process finished with exit code 0
```
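As an illustration of the unindexing step, the sketch below converts one recommendation line from indices back to ids. It is not the code from `unindexer.cpp`. It assumes the direction implied by the tool's purpose (indices in, ids out), a `user:item1,item2,...` line without surrounding quotes, and lookup tables held as plain vectors in which the position is the index. The function name `unindex_line` and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical function: rewrites "userIndex:itemIndex1,...,itemIndexX"
// as "userId:itemId1,...,itemIdX" using two lookup tables.
std::string unindex_line(const std::string& line,
                         const std::vector<std::string>& user_ids,
                         const std::vector<std::string>& item_ids) {
    const std::size_t colon = line.find(':');
    if (colon == std::string::npos) {
        throw std::invalid_argument("missing ':' in line");
    }

    // std::stoul throws on non-numeric input; at() throws on unknown indices.
    std::string out = user_ids.at(std::stoul(line.substr(0, colon)));
    out += ':';

    // The item list may be empty (X = 0), in which case the loop never runs.
    std::istringstream items(line.substr(colon + 1));
    std::string token;
    bool first = true;
    while (std::getline(items, token, ',')) {
        if (!first) {
            out += ',';
        }
        out += item_ids.at(std::stoul(token));
        first = false;
    }
    return out;
}
```

Keeping each lookup table as a vector makes the reverse direction a constant-time array access, so the cost per line is dominated by parsing.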