https://github.com/samvas-codes/cspm-gpt

The following is a simple example of how LLMs and langchain agents can simplify asking questions to understand the security posture of a cloud environment.

Host: GitHub
URL: https://github.com/samvas-codes/cspm-gpt
Owner: samvas-codes
License: apache-2.0
Created: 2023-04-12T03:16:58.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2023-08-23T03:53:13.000Z (almost 2 years ago)
Last Synced: 2024-08-07T08:09:21.153Z (10 months ago)
Topics: aws, azure, chatgpt, cloud, cloud-security, cloud-security-audit, cloud-security-posture-management, cspm, cybersecurity, devsecops, docker, gcp, gpt, langchain, neo4j, open-source, openai, policy-as-code, python
Language: Python
Homepage:
Size: 21.5 MB
Stars: 19
Watchers: 4
Forks: 5
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Cloud Security Posture Management (CSPM) powered by GPT-4

Cloud Security Posture Management (CSPM) tools have evolved over the past years. What began as a simple audit of cloud resource configurations has grown into a capability that requires complex querying across relationships, security-related data, and events.
Users of these tools need to customize them to their environment to make security issues easier to address. This requires building custom logic or queries in query languages that can be difficult to learn and adapt.

The following is a simple example of how LLMs and LangChain agents can simplify asking questions to understand the security posture of a cloud environment. The project now supports multiple DB types, including PostgreSQL and Neo4j. It is also extensible across data ingest platforms, including Cartography and CloudQuery. Initial attempts to validate shortest paths have been successful when prompted appropriately. This is intended to be a PoC for generating ad-hoc attack paths, provided assets are labeled appropriately.

Disclaimer: The app is a demo and several improvements can be made. The queries made and results displayed are currently best effort.

NOTE: The Dockerfile hasn't been updated to include graph DBs.

## Installation

There are two ways to run this project.

1. Run it as is, as a Streamlit app
2. Run it as a containerized Streamlit app

### Prerequisites
1. Install CloudQuery or Cartography on your machine.
2. Have either a Postgres or Neo4j DB available; this example uses one of them as its backend.

## Run Locally

Clone the project

```bash
git clone https://github.com/samvas-codes/cspm-gpt
```

Go to the project directory

```bash
cd cspm-gpt
```

Install dependencies

```bash
pip install -r requirements.txt
```

Start the app

```bash
streamlit run app.py
```

## Demo

Insert gif or link to demo

## Environment Variables

To run this project in a container, you will need to add the following environment variables to your .env file

AWS_ACCESS_KEY_ID=

AWS_SECRET_ACCESS_KEY=

AWS_DEFAULT_REGION=

PGDATA=

POSTGRES_USER=""

POSTGRES_PASSWORD=""

POSTGRES_DB=""

A sample .env file is provided which contains the default postgres configuration used.
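
Before starting the container, it can help to check that the .env file actually defines every variable listed above. The following is a minimal sketch using only the Python standard library; the helper names `parse_env_text` and `missing_vars` are hypothetical and not part of this project, and the parser only handles plain `KEY=VALUE` lines.

```python
from pathlib import Path

# Variable names taken from the list above.
REQUIRED_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "PGDATA",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DB",
]


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes, as in POSTGRES_USER="".
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_vars(values: dict) -> list:
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED_VARS if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    print("Missing or empty:", missing_vars(parse_env_text(text)))
```

An empty list means every variable has a value; it does not verify that the credentials themselves are valid.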

In addition, the Streamlit app needs access to your OpenAI API key. To add it:

1. Create a .streamlit/secrets.toml file in the project directory
2. Add the line OPENAI_API_KEY= followed by your key as a quoted string (TOML string values must be quoted)

## Usage/Examples
Once the app is running and you have ingested data from your AWS accounts (e.g. using CloudQuery), try the following prompts.

1. How many running ec2 instances are present? List the instance ids.
2. How many ebs volumes are attached to ec2 instances?
3. How many ec2 instances are public? What are their public IPs? List the instance ids and the public IPs as a table.
4. List all CIS checks that have failed. Get the resources that have failed these checks. List the checks failed and resources as a table.
5. How many ec2 instances also have an IAM role attached to them? List the instance IDs, IAM roles and the IAM policy attached to each role.
6. Find the shortest path between an EC2 instance and an S3 bucket, describe how they are connected (GRAPH USE CASE)
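
For the graph use case in prompt 6, the query the model is expected to produce is roughly a Cypher `shortestPath` match. The sketch below builds such a query by hand so it can be compared with the generated one. It is illustrative only: `shortest_path_query` is a hypothetical helper, and the node labels `EC2Instance` and `S3Bucket` are assumed to match the labels created by your ingest tool.

```python
import re

# Cypher node labels cannot be passed as query parameters, so they are
# validated before being interpolated into the query text.
_LABEL_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def shortest_path_query(source_label: str, target_label: str, max_hops: int = 6) -> str:
    """Build a Cypher query for shortest paths between two node labels."""
    for label in (source_label, target_label):
        if not _LABEL_PATTERN.fullmatch(label):
            raise ValueError(f"Unsafe node label: {label!r}")
    if not 1 <= max_hops <= 15:
        raise ValueError("max_hops must be between 1 and 15")
    return (
        f"MATCH (src:{source_label}), (dst:{target_label}), "
        f"p = shortestPath((src)-[*..{max_hops}]-(dst)) "
        "RETURN p LIMIT 5"
    )


# Labels below are assumptions about the graph schema, not confirmed by this README.
print(shortest_path_query("EC2Instance", "S3Bucket"))
```

Bounding the hop count keeps the search from wandering across the whole account graph; the limit of 6 is an arbitrary starting point.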

## Running a few examples and their results
Let's start simple and find EC2 instances that have EBS volumes attached to them.
![ec2-ebs-llm-query](/img/ec2-ebs-example-generated-cypher.jpg?raw=true "Asking GPT to write a query to find EBS volumes attached to EC2 instances")

Let's verify that this query actually works by going to Neo4j and querying the DB
![ec2-ebs-verified-query](/img/ec2-ebs-example-graph-verified-cypher-query.jpg?raw=true "Verifying a query to find EBS volumes attached to EC2 instances")

Now let's ask the typical attack path question: can you find an EC2 instance that has access to an S3 bucket?
![ec2-s3-llm-query](/img/ec2-s3-example-generated-cypher.jpg?raw=true "Asking GPT to write a query to find EC2 instances with access to S3")

Is it hallucinating? Nope!
![ec2-s3-llm-query](/img/ec2-exposed-s3-example-verified-cypher.jpg?raw=true "Verifying a query to find EC2 instances with access to S3")

Taking it a step further, let's find an internet-exposed EC2 instance that has access to an S3 bucket
![ec2-s3-llm-query](/img/ec2-exposed-s3-example-generated-cypher.jpg?raw=true "Asking GPT to write a query to find EC2 instances with access to S3")

What I currently observe is that, as long as the schema is known to GPT and you prompt-engineer your question, the results are pretty accurate.
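
One way to make the schema "known" to the model is to place it in the prompt next to the question. The sketch below shows that idea with plain string building; it is not the prompt this app or LangChain actually uses, and `build_query_prompt` and the sample schema line are hypothetical.

```python
# Query language to request for each backend named in this README.
QUERY_LANGUAGES = {"neo4j": "Cypher", "postgres": "SQL"}


def build_query_prompt(question: str, schema: str, backend: str = "neo4j") -> str:
    """Combine the DB schema and a natural-language question into one prompt."""
    if backend not in QUERY_LANGUAGES:
        raise ValueError(f"Unsupported backend: {backend!r}")
    language = QUERY_LANGUAGES[backend]
    return "\n".join(
        [
            f"You write {language} queries for a cloud security inventory.",
            "Use only the names that appear in the schema below.",
            "If the schema cannot answer the question, reply with UNKNOWN.",
            "",
            "Schema:",
            schema.strip(),
            "",
            f"Question: {question.strip()}",
            f"{language} query:",
        ]
    )


# Hypothetical schema fragment; replace it with the schema of your own graph.
example_schema = "(:EBSVolume)-[:ATTACHED_TO]->(:EC2Instance)"
print(build_query_prompt("How many ebs volumes are attached to ec2 instances?", example_schema))
```

Telling the model to answer UNKNOWN when the schema does not cover the question is one simple guard against invented labels or columns.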

## Use cases
1. The app can be a natural language query builder for CSPM tools
2. The app can help SOC teams query their data on demand and visualize them
3. It can also help security engineers quickly develop complex queries using natural language
4. CISOs who want to understand the state of their environment can easily ask questions

## Roadmap

1. Updates to include knowledge graphs as a data source (Neo4j, AWS Neptune) -- DONE
2. Adding vector stores to cache similar queries
3. Display generated queries to allow manual intervention
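
As a rough illustration of roadmap item 2, the sketch below caches generated queries and reuses them for near-identical questions. It is not part of the project: a real vector store would compare embeddings, whereas this stand-in uses Jaccard overlap of word sets so that it runs with the standard library alone. `SimilarQuestionCache` and the sample query are hypothetical.

```python
import re
from typing import Optional


def _tokens(text: str) -> frozenset:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


class SimilarQuestionCache:
    """Reuse a stored query when a new question is close enough to an old one."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self._entries = []  # list of (token set, generated query) pairs

    def put(self, question: str, generated_query: str) -> None:
        self._entries.append((_tokens(question), generated_query))

    def get(self, question: str) -> Optional[str]:
        asked = _tokens(question)
        best_score, best_query = 0.0, None
        for stored, query in self._entries:
            union = asked | stored
            # Jaccard similarity: shared words divided by all distinct words.
            score = len(asked & stored) / len(union) if union else 0.0
            if score > best_score:
                best_score, best_query = score, query
        return best_query if best_score >= self.threshold else None


cache = SimilarQuestionCache()
# Hypothetical generated query, stored after the first LLM call.
cache.put("How many running ec2 instances are present?", "MATCH (i:EC2Instance) RETURN count(i)")
print(cache.get("how many running EC2 instances are present"))
```

Returning the cached query rather than a cached answer also fits roadmap item 3, since the query can be shown to the user before it is run.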
