https://github.com/aguven6/inmemory-data-processor
Converts tabular data to indexed columnar data. The aim is to process large data sets faster, especially in aggregation operations.
columnar-storage data data-structures parallel-computing parallel-programming processing
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/aguven6/inmemory-data-processor
- Owner: aguven6
- License: mit
- Created: 2024-12-28T08:25:22.000Z (5 months ago)
- Default Branch: master
- Last Pushed: 2025-01-20T12:08:26.000Z (4 months ago)
- Last Synced: 2025-01-20T13:24:10.116Z (4 months ago)
- Topics: columnar-storage, data, data-structures, parallel-computing, parallel-programming, processing
- Language: C++
- Homepage:
- Size: 24.4 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# In Memory Data Processing
-----------------------------------------------------------------------------------------------------------------
Welcome! The purpose of this study is to create a columnar data structure from tabular data structures. The program will
fetch data from a tabular data source and create an indexed vector for each column.

Let's assume we have a CSV file that includes the following columns: Name,Surname,Age,BirthDate

We will create 4 vectors, one for each column, where Index is an auto-incrementing number:

Vector1: Index,Name
Vector2: Index,Surname
Vector3: Index,Age
Vector4: Index,BirthDate

Tasks
-----------------------------------------------------------------------------------------------------------------
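Before the tasks themselves, here is a minimal sketch of the column layout described in the introduction: one vector of (Index, value) pairs per CSV column. The names `Column`, `PersonColumns` and `loadCsv` are hypothetical and are not taken from the repository; the parser assumes a header row and plain comma-separated fields without quoting.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One column: a vector of (Index, value) pairs, as in "Vector1: Index,Name".
template <typename T>
using Column = std::vector<std::pair<std::size_t, T>>;

// Hypothetical container for the four example columns.
struct PersonColumns {
    Column<std::string> name;
    Column<std::string> surname;
    Column<int> age;
    Column<std::string> birthDate;
};

// Reads CSV text with the header Name,Surname,Age,BirthDate and splits it
// into columns. Assumption: no quoted fields or embedded commas.
// std::stoi throws if the Age field is missing or not a number.
PersonColumns loadCsv(std::istream& in) {
    PersonColumns cols;
    std::string line;
    std::getline(in, line);  // skip the header row

    std::size_t index = 0;   // auto-incrementing row index
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        std::istringstream fields(line);
        std::string name, surname, age, birthDate;
        std::getline(fields, name, ',');
        std::getline(fields, surname, ',');
        std::getline(fields, age, ',');
        std::getline(fields, birthDate, ',');

        cols.name.emplace_back(index, name);
        cols.surname.emplace_back(index, surname);
        cols.age.emplace_back(index, std::stoi(age));
        cols.birthDate.emplace_back(index, birthDate);
        ++index;
    }
    // Every column must hold exactly one entry per row.
    assert(cols.name.size() == index && cols.age.size() == index);
    return cols;
}
```

Passing a `std::ifstream` opened on the CSV file to `loadCsv` fills all four vectors in one pass. The same Index value then identifies a row across every column.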
Filter Data
----------------------------------------------------
1- Query By Single Value for a Column Where Name="Ahmet"
2- Query By Multiple Values for a Column Where Name="Ahmet" or Name="Metin"
3- Query Multiple Columns with Single Values Where Name="Ahmet" and Surname="Mehmet"
4- Query Multiple Columns with Multiple Values Where Name="Ahmet" or Name="Metin" and Age > 30 and Age < 50
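
One possible way to express the four query types above is to turn each condition into a sorted list of matching row indexes, then combine the lists with set intersection (and) and set union (or). This is only a sketch under assumptions: `select`, `both`, `either` and `queryNamesAndAgeRange` are hypothetical names, and query 4 is read as (Name="Ahmet" or Name="Metin") and Age > 30 and Age < 50.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

template <typename T>
using Column = std::vector<std::pair<std::size_t, T>>;
using RowIds = std::vector<std::size_t>;

// Returns the indexes of all entries whose value satisfies pred.
// The result is ascending as long as the column is stored in index order.
template <typename T, typename Pred>
RowIds select(const Column<T>& column, Pred pred) {
    RowIds hits;
    for (const auto& entry : column) {
        if (pred(entry.second)) hits.push_back(entry.first);
    }
    return hits;
}

// "and": rows present in both lists (inputs must be sorted).
RowIds both(const RowIds& a, const RowIds& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    RowIds out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// "or": rows present in at least one list (inputs must be sorted).
RowIds either(const RowIds& a, const RowIds& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    RowIds out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}

// Query 4, assuming the "or" on Name is grouped before the "and" on Age.
RowIds queryNamesAndAgeRange(const Column<std::string>& name,
                             const Column<int>& age) {
    RowIds byName = either(
        select(name, [](const std::string& v) { return v == "Ahmet"; }),
        select(name, [](const std::string& v) { return v == "Metin"; }));
    RowIds byAge = select(age, [](int v) { return v > 30 && v < 50; });
    return both(byName, byAge);
}
```

Queries 1 to 3 use the same pieces: a single `select` call covers query 1, `either` over two `select` results covers query 2, and `both` over a Name result and a Surname result covers query 3.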
Aggregation
----------------------------------------------------
1- Calculate Sum, Count, Avg, Max, Min for whole data
2- Calculate Sum, Count, Avg, Max, Min with Filtering
3- Group By a Key and aggregate values for whole data
4- Group By a Key and aggregate values with Filtering
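
The four aggregation tasks above could be sketched as a single-pass accumulator plus an optional row filter. The names `Stats`, `AllRows`, `aggregate` and `groupBy` are hypothetical, and the sketch assumes an integer value column (such as Age) and a string key column stored in the same index order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

template <typename T>
using Column = std::vector<std::pair<std::size_t, T>>;

// Running Sum, Count, Min and Max; Avg is derived from Sum and Count.
struct Stats {
    long long sum = 0;
    std::size_t count = 0;
    int minimum = std::numeric_limits<int>::max();
    int maximum = std::numeric_limits<int>::min();

    void add(int v) {
        sum += v;
        ++count;
        minimum = std::min(minimum, v);
        maximum = std::max(maximum, v);
    }
    // Returns 0.0 for an empty selection instead of dividing by zero.
    double avg() const {
        return count ? static_cast<double>(sum) / count : 0.0;
    }
};

// Default filter: keep every row (the "whole data" case).
struct AllRows {
    bool operator()(std::size_t) const { return true; }
};

// Tasks 1 and 2: aggregate a column, optionally keeping only rows
// for which keep(rowIndex) returns true.
template <typename Keep = AllRows>
Stats aggregate(const Column<int>& values, Keep keep = Keep{}) {
    Stats stats;
    for (const auto& entry : values) {
        if (keep(entry.first)) stats.add(entry.second);
    }
    return stats;
}

// Tasks 3 and 4: one Stats per distinct key, with the same optional filter.
template <typename Keep = AllRows>
std::map<std::string, Stats> groupBy(const Column<std::string>& keys,
                                     const Column<int>& values,
                                     Keep keep = Keep{}) {
    assert(keys.size() == values.size());
    std::map<std::string, Stats> groups;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        assert(keys[i].first == values[i].first);  // same row in both columns
        if (keep(keys[i].first)) groups[keys[i].second].add(values[i].second);
    }
    return groups;
}
```

Only the columns involved in a query are read, which is the main reason a columnar layout tends to help aggregation. The filter is any callable that takes a row index, so a set of row indexes produced by a filter step can be wrapped in a lambda and passed in.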
-----------------------------------------------------------------------------------------------------------------
Once these tasks are complete, the next step will be to use GPUs for computation.
We will divide and conquer: this is what computer science loves.