https://github.com/dongju93/sysmon-to-rocksdb

Query Elasticsearch to retrieve data, save it to CSV files, store it in RocksDB, and then use GraphQL to fetch the data.

csv elasticsearch graphql javascript nextjs postgresql rocksdb rust typescript

## Ultimate goal diagram. Ver.1
```mermaid
flowchart TB
subgraph Windows11
Winlogbeat--Read-->Sysmon
end
subgraph LogStorageServer
Winlogbeat--Push-->Elasticsearch
end
subgraph PreprocessingServer
Elasticsearch<--Request/Response-->DataFetchBatch:::foo
DataFetchBatch:::foo--Write-->CSV
DataStoreBatch:::foo--Read-->CSV
end
subgraph DatabaseServer
DataStoreBatch:::foo--Store-->RocksDB
end
subgraph MiddlewareServer
DataViewBinary:::foo<--Iter./Fetch-->RocksDB
GraphQL:::bar<--Execute/Return-->DataViewBinary:::foo
end
subgraph ApplicationServer
WebApplication:::foobar<--Query/Mutate-->GraphQL:::bar
end
Browser--Access-->WebApplication:::foobar

classDef foo stroke:#f00
classDef bar stroke:#0f0
classDef foobar stroke:#00f
```

# 1. Elasticsearch data to .csv file

First, collect [SYSMON](https://learn.microsoft.com/ko-kr/sysinternals/downloads/sysmon) data with [WINLOGBEAT](https://www.elastic.co/kr/beats/winlogbeat) and store it in [ELASTICSEARCH](https://www.elastic.co/kr/elasticsearch).
Second, run this code to extract the data to CSV files that use "\t" as the delimiter.

The code parses the "message" field together with the "agent.name" and "agent.id" fields.

You may need to raise the index's maximum result window for search queries. The default is 10000.
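The parsing step can be sketched roughly as below. This is a minimal illustration, not the repository's actual code: it assumes the Sysmon "message" field is a block of `Key: Value` lines after an event title line, and the function names `parse_message` and `to_tsv_row` are hypothetical.

```rust
/// Split a Sysmon `message` string into (key, value) pairs.
/// Assumption: each detail line looks like `Key: Value`. Lines without ": "
/// (such as a leading event title like "Process Create:") are skipped.
fn parse_message(message: &str) -> Vec<(String, String)> {
    message
        .lines()
        .filter_map(|line| line.trim().split_once(": "))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// Build one tab-delimited row: agent.name, agent.id, then the message values
/// in the order they appear. Values that contain a tab are not escaped here.
fn to_tsv_row(agent_name: &str, agent_id: &str, message: &str) -> String {
    let mut cells = vec![agent_name.to_string(), agent_id.to_string()];
    cells.extend(parse_message(message).into_iter().map(|(_, value)| value));
    cells.join("\t")
}
```

Splitting on the first ": " keeps values such as Windows paths (`C:\...`) or timestamps intact, because those contain a colon but no colon followed by a space at the key boundary.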
```
// replace with your Index name
PUT /.ds-winlogbeat-8.8.2-2023.08.06-000001/_settings
{
"max_result_window": 1000000
}
```

Please refer to the comments in the code for a detailed explanation.

## Quickstart
1. Create an "elastic.rs" file in "/src/envs"
- /src/envs/elastic.rs
```
pub const ES_URL_SECRET: &str = "YOUR ELASTICSEARCH URL";
pub const ID_SECRET: &str = "YOUR ELASTICSEARCH USERNAME (default is elastic)";
pub const PW_SECRET: &str = "YOUR ELASTICSEARCH PASSWORD";
```
2. Set your index name. The name may start with ".ds-winlogbeat" if you set up Winlogbeat to send data to Elasticsearch automatically.
If you have multiple indices, set the array length and list the index names in the array.
- /src/envs/env.rs
```
pub const INDICES: [&str; 1] = ["YOUR INDEX NAME"];

// If you have three indices, use the declaration at the bottom instead of the one above.
// When a CSV is saved: if the file does not exist, it is created with a title line first;
// if the file already exists, the parsed data rows are appended without the title line.
// So with multiple indices, the file is created from the first index and the rows
// from the following indices are appended to that file.
// pub const INDICES: [&str; 3] = ["YOUR INDEX NAME 1", "YOUR INDEX NAME 2", "YOUR INDEX NAME 3"];
```
3. Set the timestamp range, query size, and save location
- /src/envs/env.rs
```
pub const TIMESTAMP_START: &str = "START TIMESTAMP";
pub const TIMESTAMP_END: &str = "END TIMESTAMP";

pub const SIZE: usize = QUERY SIZE;

// The event code is generated automatically and placed between SAVELOCATION and CSVNAME
pub const SAVELOCATION: &str = "SAVE LOCATION";
pub const CSVNAME: &str = "FILENAME WITH FILE EXTENSION (extension is .csv)";
```
4. Run the code
```
cargo build
cargo run --bin main
```
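As a rough illustration of how the settings from steps 2 and 3 might fit together, the sketch below builds a time-range search body, derives the output path, and appends rows with a title line only when the file is new. It is not taken from the repository: the function names are hypothetical, `@timestamp` is assumed to be the time field (as in default Winlogbeat indices), and the exact way the event code is joined into the path is an assumption.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Build a search body selecting documents between `start` and `end`.
/// Assumption: `@timestamp` is the time field. Inputs are not JSON-escaped.
fn build_query_body(start: &str, end: &str, size: usize) -> String {
    format!(
        r#"{{"size":{},"query":{{"range":{{"@timestamp":{{"gte":"{}","lte":"{}"}}}}}}}}"#,
        size, start, end
    )
}

/// Place the event code between the save location and the file name.
/// Assumption: plain concatenation with no extra separator.
fn csv_path(save_location: &str, event_code: u32, csv_name: &str) -> String {
    format!("{}{}{}", save_location, event_code, csv_name)
}

/// Append `rows` to the file at `path`. The title line is written only when
/// the file does not exist yet, so later indices add rows without a header.
fn append_rows(path: &Path, header: &str, rows: &[String]) -> io::Result<()> {
    let is_new = !path.exists();
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    if is_new {
        writeln!(file, "{}", header)?;
    }
    for row in rows {
        writeln!(file, "{}", row)?;
    }
    Ok(())
}
```

Calling `append_rows` once per index against the same path gives the behavior described in the `INDICES` comments: the first call creates the file with the title line, and later calls only add data rows.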

* Tip: Check a field's mapping type when deciding whether to use a wildcard query on it
```
// replace with your Index name
// Example: check the type of the message field
GET /.ds-winlogbeat-8.8.2-2023.08.06-000001/_mapping/field/message
```

# 2. Data (.csv files) to RocksDB
1. Set the location of the CSV files
2. Configure the RocksDB location and run the code
```
cargo run --bin rocks
```
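The import step can be pictured as turning each CSV row into a key-value pair. The sketch below is only an illustration under stated assumptions: it uses a `BTreeMap` as a stand-in for RocksDB (both keep keys in sorted order), and the key layout (a chosen column plus a zero-padded row number) and the name `load_rows` are hypothetical, not the repository's scheme.

```rust
use std::collections::BTreeMap;

/// Load tab-delimited text (first line is the title line) into an ordered map.
/// Key: the value of `key_column` plus a zero-padded row number, so rows with
/// the same column value stay distinct. Value: the raw row.
/// The BTreeMap stands in for a RocksDB instance in this sketch.
fn load_rows(tsv: &str, key_column: &str) -> Result<BTreeMap<String, String>, String> {
    let mut lines = tsv.lines();
    let header = lines.next().ok_or("empty input")?;
    let key_idx = header
        .split('\t')
        .position(|name| name == key_column)
        .ok_or_else(|| format!("column {key_column} not found"))?;

    let mut store = BTreeMap::new();
    for (n, line) in lines.enumerate() {
        let key_cell = line
            .split('\t')
            .nth(key_idx)
            .ok_or_else(|| format!("row {n} is too short"))?;
        store.insert(format!("{key_cell}_{n:08}"), line.to_string());
    }
    Ok(store)
}
```

Using a timestamp-like column as the key prefix means that iterating the store in key order returns rows in time order, which suits an iterator-based read path.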

# 3. Data view on GraphQL (raw query)
1. Change directory
```
cd graphql
```
2. Run the GraphQL server
```
npm run dev
```
3. Access the Apollo GraphQL server on port 4000
```
http://localhost:4000
```

# 4. Data view on web (GUI)
1. Change directory
```
cd webapp
```
2. Run the Node server
```
npm run dev
```
3. Access Next.js on port 3000
```
http://localhost:3000
```

# 99. Todo
1. Automatically fetch Elasticsearch data every minute
2. If the Elasticsearch data exceeds the maximum query size, fetch the remainder
3. Automatically import data into RocksDB right after CSV parsing
4. Implement data fetching in the web application with react-query
5. Cursor-based pagination
6. Optimize the web application API
7. Add a union type in GraphQL for multiple data types
8. Fetch data from RocksDB using iteration (detach PostgreSQL) - speed test required - ✅
9. Apply lib.rs to main.rs for crate maintenance
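
For the cursor-based pagination item, one possible shape is sketched below. It is an assumption-laden illustration rather than a planned design: a `BTreeMap` again stands in for an ordered RocksDB iterator, and `page_after` is a hypothetical name. The cursor is simply the last key of the previous page.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Return up to `limit` entries whose keys sort strictly after `cursor`,
/// plus the cursor for the next page (None when no entries remain).
fn page_after(
    store: &BTreeMap<String, String>,
    cursor: Option<&str>,
    limit: usize,
) -> (Vec<(String, String)>, Option<String>) {
    // Start just after the cursor key, or at the beginning for the first page.
    let lower = match cursor {
        Some(key) => Bound::Excluded(key),
        None => Bound::Unbounded,
    };
    let mut iter = store.range::<str, _>((lower, Bound::Unbounded));
    let items: Vec<(String, String)> = iter
        .by_ref()
        .take(limit)
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    // Peek one entry further to learn whether another page exists.
    let has_more = iter.next().is_some();
    let next = if has_more {
        items.last().map(|(k, _)| k.clone())
    } else {
        None
    };
    (items, next)
}
```

Unlike offset-based paging, each request seeks directly to the cursor key, so the cost of a page does not grow with how far into the data the client has scrolled.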

## Ultimate goal diagram. Ver.2
```mermaid
flowchart TB
subgraph Linux
Filebeat--Read-->Sysmon
Filebeat--Read-->Suricata
Filebeat--Read-->Zeek
Filebeat--Read-->Netflow
end
subgraph LogSendingServer
Filebeat--Push-->Logstash--Push-->Redis
end
subgraph PreprocessingServer
Redis--Stream-->DataStorePipe:::foo
end
subgraph DatabaseServer
PostgreSQL<--Replication-->Replica
DataStorePipe:::foo--Store-->RocksDB
end
subgraph MiddlewareServer
DataViewBinary:::foo<--Iter./Fetch-->RocksDB
GraphQL:::bar<--Execute/Return-->DataViewBinary:::foo
LargeDataViewBinary:::foo<--Iter./Fetch-->RocksDB
GraphQL:::bar<--Execute/Return-->LargeDataViewBinary:::foo
GraphQL:::bar<--SQL/Return-->PostgreSQL
end
subgraph ApplicationServer
WebApplication1:::foobar<--Query/Mutate-->GraphQL:::bar
WebApplication2:::foobar<--Query/Mutate-->GraphQL:::bar
Nginx--Proxy-->WebApplication1:::foobar
Nginx--Proxy-->WebApplication2:::foobar
end
Browser--Access-->Nginx

classDef foo stroke:#f00
classDef bar stroke:#0f0
classDef foobar stroke:#00f
```