https://github.com/walterxie/fastreader
Read big data in a delimited text format fast
- Host: GitHub
- URL: https://github.com/walterxie/fastreader
- Owner: walterxie
- Created: 2016-06-15T02:29:39.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2016-06-17T03:29:52.000Z (almost 9 years ago)
- Last Synced: 2025-02-02T02:28:00.225Z (4 months ago)
- Language: C++
- Size: 15.6 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Fast Reader
Read big data provided in a delimited text format, fast. It currently takes about 8 to 9 minutes to process one input file of about 64 GB (structured like [100reads.txt](data/100reads.txt)) on an Intel E7-2870 2.4 GHz machine with 96 GB of memory.

## Input file
The input must be a delimited text file, for example tab-delimited. The 1st row is treated as column names and skipped; the 1st column is used as the key of the *unordered_map*.
More arguments to control this will be added in the future.

## Compile
```C++
g++ --std=c++0x -o FastReader main.cpp FastReader.cpp str_search.c -Wall -g -O2
./FastReader data/100reads.txt
```

## Extension
Modify the function *assign_line_stat_map* and the *struct LineStat* to implement your own computation.
The current implementation computes the minimum and maximum of the numbers in the 2nd column of the given input file.
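
To make the extension point more concrete, here is a minimal sketch of what a min/max computation over the 2nd column could look like. The names *LineStat* and *assign_line_stat_map* come from this README, but the fields, the function signature, and the helper `read_stats` are assumptions for illustration and may not match the code in this repository. It also uses plain `std::getline`, not the faster reading strategy of the project.

```cpp
#include <algorithm>
#include <cstdlib>
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical per-key statistics; the real struct LineStat may differ.
struct LineStat {
    double min = std::numeric_limits<double>::infinity();
    double max = -std::numeric_limits<double>::infinity();
};

// Hypothetical signature: parse one delimited line and update the map.
// Column 1 is the key, column 2 is the number to track.
// Returns false if the line has no delimiter or column 2 is not numeric.
bool assign_line_stat_map(const std::string& line,
                          std::unordered_map<std::string, LineStat>& stats,
                          char delim = '\t') {
    const std::size_t first = line.find(delim);
    if (first == std::string::npos) return false;
    std::size_t second = line.find(delim, first + 1);
    if (second == std::string::npos) second = line.size();

    const std::string key = line.substr(0, first);
    const std::string field = line.substr(first + 1, second - first - 1);

    char* end = nullptr;
    const double value = std::strtod(field.c_str(), &end);
    if (end == field.c_str()) return false;  // nothing numeric was parsed

    LineStat& s = stats[key];  // creates a fresh entry on first sight of key
    s.min = std::min(s.min, value);
    s.max = std::max(s.max, value);
    return true;
}

// Hypothetical helper: skip the header row, then fold every line into the map.
std::unordered_map<std::string, LineStat> read_stats(std::istream& in,
                                                     char delim = '\t') {
    std::unordered_map<std::string, LineStat> stats;
    std::string line;
    std::getline(in, line);  // 1st row holds column names and is ignored
    while (std::getline(in, line)) {
        assign_line_stat_map(line, stats, delim);
    }
    return stats;
}
```

To try it, pass any `std::istream` to `read_stats`, for example a `std::ifstream` opened on `data/100reads.txt` or a `std::istringstream` holding a few tab-delimited lines. To compute something else, change the fields of `LineStat` and the update step at the end of `assign_line_stat_map`; the parsing part can stay the same.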