# SparkTrail [![CodeQL](https://github.com/notdodo/SparkTrail/actions/workflows/codeql.yml/badge.svg)](https://github.com/notdodo/SparkTrail/actions/workflows/codeql.yml)

Use this Python script to start a Spark standalone session that interacts with a CloudTrail S3 bucket.
Spark allows querying the logs with a SQL-like syntax.

The startup `main.py` script automatically loads AWS SSO credentials and sets the resulting temporary credentials so Spark can authenticate to the bucket.
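For context, a minimal sketch of what such a bootstrap can look like with boto3 and the S3A connector is shown below; the profile name and session wiring are illustrative assumptions, not a copy of `main.py`:

```python
# Illustrative sketch (not the actual main.py): resolve temporary credentials
# from an AWS SSO profile with boto3 and hand them to the S3A connector.
# The profile name "audit" is a placeholder.
import boto3
from pyspark.sql import SparkSession

session = boto3.Session(profile_name="audit")  # requires a prior `aws sso login`
creds = session.get_credentials().get_frozen_credentials()

spark = (
    SparkSession.builder.appName("SparkTrail")
    .config("spark.hadoop.fs.s3a.access.key", creds.access_key)
    .config("spark.hadoop.fs.s3a.secret.key", creds.secret_key)
    .config("spark.hadoop.fs.s3a.session.token", creds.token)
    .config(
        "spark.hadoop.fs.s3a.aws.credentials.provider",
        "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
    )
    .getOrCreate()
)
```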

## Usage

0. Spawn a Poetry shell:

`poetry shell`

1. Start the cluster:

`PYSPARK_DRIVER_PYTHON=ipython PYTHONSTARTUP=main.py pyspark --packages org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.262 --driver-memory 15G --executor-memory 5G --name SparkTrail`

2. Once the IPython shell is up, the environment is configured; link the S3 bucket and start running queries:

```python
# link_s3 (provided by the main.py startup script) loads the CloudTrail logs
spark = link_s3("audit-cloudtrail-logs/AWSLogs/")
# Show the first 10 distinct event names found in the logs
spark.select("Records.eventName").distinct().show(10)
```
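Since the example above suggests `link_s3` returns a regular Spark DataFrame, the full DataFrame API should be available. A sketch, under that assumption, that flattens the `Records` array and counts occurrences of each event name:

```python
# Assumes link_s3 returns a plain Spark DataFrame, as the usage above implies.
from pyspark.sql import functions as F

df = link_s3("audit-cloudtrail-logs/AWSLogs/")

# Flatten the Records array into one row per event, then count per eventName
(
    df.select(F.explode("Records").alias("record"))
    .groupBy("record.eventName")
    .count()
    .orderBy(F.desc("count"))
    .show(10)
)
```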

### Notes

- You can use this script with any bucket containing partitioned JSON files (e.g., Databricks audit logs); see the sketch below
- You may need to adjust the memory settings to fit your environment
- You need to run `aws sso login` before starting the cluster
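
As an illustration of the first note, any partitioned JSON layout on S3 can be read the same way; the bucket path below is a placeholder:

```python
# Illustrative only: the bucket and prefix are placeholders.
from pyspark.sql import SparkSession

session = SparkSession.builder.getOrCreate()  # reuse the active pyspark session

# Spark discovers partition columns (e.g., date=2024-01-01/) automatically
# when reading a partitioned directory layout.
logs = session.read.json("s3a://my-audit-bucket/databricks-audit-logs/")
logs.printSchema()
```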