https://github.com/walterxie/fastreader

Read big data in a delimited text format fast

Host: GitHub
URL: https://github.com/walterxie/fastreader
Owner: walterxie
Created: 2016-06-15T02:29:39.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2016-06-17T03:29:52.000Z (almost 9 years ago)
Last Synced: 2025-02-02T02:28:00.225Z (4 months ago)
Language: C++
Size: 15.6 KB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Fast Reader

Read big data provided in a delimited text format, fast.

Currently it takes about 8 to 9 minutes to process one input file of about 64 GB (structured like [100reads.txt](data/100reads.txt)) on an Intel E7-2870 2.4GHz machine with 96 GB of memory.

## Input file

The input must be a delimited text file, for example tab-delimited. The 1st row holds the column names and is ignored; the 1st column is used as the key of the *unordered_map*.

More arguments to control this behaviour will be added in the future.
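As an illustration of the input convention described above, here is a minimal sketch of a reader that skips the header row and keys each row on its 1st column. The function name `read_delimited` and the choice to store the rest of the line as a plain string are assumptions made for this example; they are not taken from the FastReader sources.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper, not part of FastReader: reads a delimited stream,
// skips the header row, and maps the 1st column to the rest of the line.
std::unordered_map<std::string, std::string>
read_delimited(std::istream& in, char delim = '\t') {
    std::unordered_map<std::string, std::string> rows;
    std::string line;
    if (!std::getline(in, line)) return rows;  // header row: read and ignored
    while (std::getline(in, line)) {
        if (line.empty()) continue;             // skip blank lines
        const std::string::size_type pos = line.find(delim);
        if (pos == std::string::npos) {
            rows[line] = "";                    // key only, no other columns
        } else {
            rows[line.substr(0, pos)] = line.substr(pos + 1);
        }
    }
    return rows;
}
```

In this sketch a repeated key overwrites the earlier row. A reader meant for 64 GB inputs would likely avoid the per-line copies made here, so treat it as a description of the format rather than of the actual implementation.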

## Compile

```sh
# Build (g++ compiles str_search.c as C++)
g++ --std=c++0x -o FastReader main.cpp FastReader.cpp str_search.c -Wall -g -O2

# Run on the sample input
./FastReader data/100reads.txt
```

## Extension

Modify the function *assign_line_stat_map* and *struct LineStat* to implement your own computation. 

The current implementation computes the minimum and maximum of the numbers in the 2nd column of the given input file.
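To make the extension point more concrete, the following is a rough sketch of what a min/max statistic could look like. The fields of `LineStat`, the `StatMap` typedef, and the function `update_stat` are all assumptions for illustration; the repository's real `struct LineStat` and `assign_line_stat_map` are not shown in this README and may differ. Tracking the statistic per key is also an assumption.

```cpp
#include <algorithm>
#include <cstdlib>
#include <limits>
#include <string>
#include <unordered_map>

// Illustrative stand-in for the repository's LineStat; the field names
// and types here are assumptions.
struct LineStat {
    double min = std::numeric_limits<double>::max();
    double max = std::numeric_limits<double>::lowest();
};

typedef std::unordered_map<std::string, LineStat> StatMap;

// Hypothetical update step: fold one (key, 2nd-column value) pair into the map.
void update_stat(StatMap& stats, const std::string& key, const std::string& field) {
    // strtod returns 0.0 for non-numeric text; real code may want to check that.
    const double value = std::strtod(field.c_str(), nullptr);
    LineStat& s = stats[key];  // default-constructed the first time a key is seen
    s.min = std::min(s.min, value);
    s.max = std::max(s.max, value);
}
```

To implement a different computation under this sketch, one would change the fields of the struct and the body of the update function, for example by adding a count and a running sum to obtain a mean.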
