https://github.com/fpopic/master-indexers

(Class) C++ script to preprocess input files (mapping ids to indices and vice versa) on a single core machine.

Host: GitHub
URL: https://github.com/fpopic/master-indexers
Owner: fpopic
Created: 2017-05-24T21:13:20.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2017-07-06T22:07:31.000Z (almost 8 years ago)
Last Synced: 2025-01-10T19:42:14.709Z (4 months ago)
Topics: cpp, id-to-index, indexing-engine, recommender-system
Language: C++
Homepage:
Size: 10.7 KB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

#### Usage:

Used to preprocess input files on a single machine. The indexer builds a shared, mutable lookup table, so running it on multiple cores or in a distributed environment would require synchronisation and locking around that table.
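To illustrate the locking concern, here is a minimal sketch of what a thread-safe lookup table could look like if the work were spread over several threads. `SharedLookup` and `getOrAssign` are hypothetical names, not code from the repository: every read-or-insert goes through one mutex so that no id can receive two indices and no index can be handed to two ids.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical sketch, not the repository's implementation.
// A lookup table that several threads could share: all access is
// serialised by a single mutex.
class SharedLookup {
public:
    // Returns the index of `id`, assigning the next free index on first sight.
    std::size_t getOrAssign(const std::string& id) {
        std::lock_guard<std::mutex> guard(mutex_);  // one caller at a time
        auto it = table_.find(id);
        if (it != table_.end()) return it->second;
        std::size_t next = table_.size();           // dense, 0-based indices
        table_.emplace(id, next);
        return next;
    }

    std::size_t size() {
        std::lock_guard<std::mutex> guard(mutex_);
        return table_.size();
    }

private:
    std::mutex mutex_;
    std::unordered_map<std::string, std::size_t> table_;
};
```

Every lookup becomes a point of contention under this design; a single-threaded indexer on one machine avoids that cost entirely, because no lock is needed at all.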


It maps ids to indices (e.g. userId => userIndex, itemId => itemIndex) and saves the results to a file together with its lookup table.

Indices are preferred over ids because linear algebra libraries address matrix rows and columns by integer index.
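The id-to-index mapping can be sketched as follows, assuming ids receive dense, 0-based indices in the order they are first seen and that input lines are plain comma-separated fields. `IdIndexer` and `indexLine` are hypothetical names; the actual `indexer.cpp` may assign indices and parse lines differently, and writing the `.lookup` file is not shown.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of id -> index mapping; not the repository's code.
// Each new id gets the next free index, so indices are 0..n-1.
class IdIndexer {
public:
    std::size_t indexOf(const std::string& id) {
        auto it = idToIndex_.find(id);
        if (it != idToIndex_.end()) return it->second;
        std::size_t next = idToIndex_.size();
        idToIndex_.emplace(id, next);
        return next;
    }
    std::size_t size() const { return idToIndex_.size(); }

private:
    std::unordered_map<std::string, std::size_t> idToIndex_;
};

// Rewrites one comma-separated line, replacing the fields at the given
// 0-based column positions with their indices; other fields are copied as-is.
std::string indexLine(const std::string& line,
                      const std::vector<std::size_t>& idColumns,
                      IdIndexer& indexer) {
    std::stringstream in(line);
    std::string field, out;
    std::size_t column = 0;
    while (std::getline(in, field, ',')) {
        bool isId = false;
        for (std::size_t c : idColumns)
            if (c == column) isId = true;
        if (column > 0) out += ',';
        out += isId ? std::to_string(indexer.indexOf(field)) : field;
        ++column;
    }
    return out;
}
```

For a file with the header `itemId1,itemId2,a,b,c,d`, the id columns would be 0 and 1, and both could share one item indexer. A file whose id columns hold different kinds of ids (users and items) would need one indexer per kind, which this single-indexer sketch does not handle.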

Preprocessing time:

- Scala: > 45 min (item-item matrix only)
- C++: < 5 min (both matrices)

The unindexer reverts indices back to ids after the recommendations have been computed.
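Reverting indices only needs the lookup table read in the opposite direction. A minimal sketch, assuming the indices are dense (0 to n-1) so a `std::vector` can act as the index-to-id table; `invertLookup` is a hypothetical helper, and parsing the actual `.lookup` file format is left out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: turn an id -> index lookup into an index -> id table.
// Assumes indices are dense (0..n-1), so position i holds the id for index i.
std::vector<std::string> invertLookup(
        const std::unordered_map<std::string, std::size_t>& idToIndex) {
    std::vector<std::string> indexToId(idToIndex.size());
    for (const auto& entry : idToIndex) {
        if (entry.second >= indexToId.size())
            throw std::out_of_range("lookup indices are not dense");
        indexToId[entry.second] = entry.first;
    }
    return indexToId;
}
```

With the reverse table in memory, each index in a recommendation file maps back to its id with a single vector access.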

#### Compile:

using g++:

```
g++ -static -std=c++11 -O3 indexer.cpp -o indexer
g++ -static -std=c++11 -O3 unindexer.cpp -o unindexer
```

or with cmake:

```
mkdir cmake-build-debug
cd cmake-build-debug && cmake .. && cd ..
cmake --build cmake-build-debug --target indexer -- -j 4
cmake --build cmake-build-debug --target unindexer -- -j 4
```

#### Run indexer:

```
./indexer
```

Output files are written to the same folder as the input, with `.indexed` and `.lookup` suffixes.

#### Output:

```
Header: itemId1,itemId2,a,b,c,d
Processed lines:1000000
...
Processed lines:71000000
Output file with 71930771 lines saved.
Lookup file with 67052 entries saved.
Time: 166s

Header: userId,date,itemId,quantity
Processed lines:1000000
...
Processed lines:29000000
Output file with 29154707 lines saved.
Lookup file with 20998 entries saved.
Time: 50s

Process finished with exit code 0
```

#### Run unindexer:

```
./unindexer
```

The input file must be formatted like:

```
userIndex:itemIndex1,itemIndex2,...,itemIndexX
```

Output files are written to the same folder as the input with a `.unindexed` suffix, formatted like:

```
userId:itemId1,itemId2,...,itemIdX
```

where X is the number of items recommended to that user and k is the number of top-k recommendations, so 0 <= X <= k.
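Unindexing one line of the format above can be sketched as follows, assuming the user and item index-to-id tables are already in memory as vectors (how they are loaded from the `.lookup` files is not shown). `unindexLine` is a hypothetical name; the sketch handles X = 0, where nothing follows the colon.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: convert "userIndex:itemIndex1,...,itemIndexX"
// into "userId:itemId1,...,itemIdX". Not the repository's implementation.
std::string unindexLine(const std::string& line,
                        const std::vector<std::string>& userIds,
                        const std::vector<std::string>& itemIds) {
    std::size_t colon = line.find(':');
    if (colon == std::string::npos)
        throw std::invalid_argument("missing ':' in line: " + line);

    // std::stoul parses the decimal index; .at() rejects out-of-range indices.
    std::string out = userIds.at(std::stoul(line.substr(0, colon))) + ":";

    // Remaining comma-separated fields are item indices (possibly none).
    std::stringstream items(line.substr(colon + 1));
    std::string field;
    bool first = true;
    while (std::getline(items, field, ',')) {
        if (!first) out += ',';
        out += itemIds.at(std::stoul(field));
        first = false;
    }
    return out;
}
```

Using `.at()` instead of `operator[]` makes an index that is missing from the lookup table fail loudly instead of reading out of bounds.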

#### Output:

```
Processed lines:1000
...
Processed lines:2000
Output file with 21225 lines saved.
Time: 4s

Process finished with exit code 0
```
