Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kirolos00daniel/aws-store-orders-analysis
AWS Orders Analysis
https://github.com/kirolos00daniel/aws-store-orders-analysis
athena aws aws-catalog-api-handler aws-glue iam kaggle quickinsight s3-bucket
Last synced: 29 days ago
JSON representation
AWS Orders Analysis
- Host: GitHub
- URL: https://github.com/kirolos00daniel/aws-store-orders-analysis
- Owner: Kirolos00Daniel
- License: mit
- Created: 2024-12-08T17:26:24.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2024-12-08T17:40:09.000Z (about 1 month ago)
- Last Synced: 2024-12-08T18:26:50.804Z (about 1 month ago)
- Topics: athena, aws, aws-catalog-api-handler, aws-glue, iam, kaggle, quickinsight, s3-bucket
- Homepage:
- Size: 1.88 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# **AWS Data Pipeline for Analytics and Visualization**
## **Overview**
This project demonstrates an end-to-end data pipeline leveraging AWS services to perform data ingestion, transformation, querying, and visualization. The goal is to enable efficient data analysis by using scalable and serverless technologies like **Amazon S3**, **AWS Glue**, **Athena**, and **QuickSight**.![Screenshot 2024-12-07 161506](https://github.com/user-attachments/assets/6c3723cb-5218-4aa5-89f6-29d99f381951)
---
## **Workflow**
### **Architecture Overview**
1. **Data Ingestion**:
- Data is uploaded and stored in an **Amazon S3** bucket.
- The dataset consists of order information, including details like `order ID`, `customer name`, `product category`, `profit`, `sales`, etc.
2. **Data Cataloging**:
- An **AWS Glue Crawler** scans the data in the S3 bucket and creates a schema in the Glue **Data Catalog** for efficient querying.3. **Querying**:
- **AWS Athena** connects to the Glue Data Catalog and allows querying of the S3 data using SQL.4. **Visualization**:
- **Amazon QuickSight** is used to create interactive dashboards and visualizations based on Athena queries.### **Screenshots**
- **Amazon S3**: Data is stored in structured folders within an S3 bucket.
- **AWS Glue**: Schema and metadata are generated using AWS Glue Crawlers.
- **Athena Queries**: SQL queries on data using Athena.
- **QuickSight Dashboards**: Visualizations showcasing insights such as sales by region, profits by city, and more.---
## **Tech Stack**
- **Amazon S3**: Storage for raw and processed data.
- **AWS Glue**: Data ETL and cataloging.
- **AWS Athena**: Serverless querying.
- **Amazon QuickSight**: Business intelligence and visualization.
- **SQL**: Querying language for Athena.
- **AWS IAM**: Secure access and role management.---
## **Steps to Reproduce**
### **1. Set Up S3 Bucket**
1. Create an S3 bucket and upload your dataset (e.g., CSV files).
2. Organize files into folders for easier partitioning if needed.### **2. Configure AWS Glue**
1. Create a Glue Crawler to scan the S3 bucket.
2. Define the data classification (e.g., CSV) and create a database in the Glue Data Catalog.
3. Verify the schema in the Glue Data Catalog.### **3. Query Data Using Athena**
1. Open **AWS Athena** and select the Glue database.
2. Run SQL queries on the data to analyze key metrics such as total sales, profit by region, etc.
- Example Query:
```sql
SELECT region, SUM(sales) AS total_sales
FROM orders
GROUP BY region
ORDER BY total_sales DESC;
```### **4. Visualize in QuickSight**
1. Connect QuickSight to Athena as a data source.
2. Build visualizations, such as:
- **Pie Chart**: Profit by city.
- **Bar Chart**: Sales by product category.
- **Line Chart**: Sales trends over time.
3. Publish and share the dashboard.---
## **Sample Queries**
- **Profit by Product Category**:
```sql
SELECT category, SUM(profit) AS total_profit
FROM orders
GROUP BY category
ORDER BY total_profit DESC;
```
- **Top 5 Cities by Sales**:
```sql
SELECT city, SUM(sales) AS total_sales
FROM orders
GROUP BY city
ORDER BY total_sales DESC
LIMIT 5;
```---
## **Results**
1. Insights into sales performance by region, city, and category.
2. Improved data accessibility and analysis using Athena and QuickSight.
3. Automation of data processing with AWS Glue.---
## **Future Enhancements**
- Automate data ingestion with **AWS Lambda**.
- Implement machine learning for sales forecasting using **Amazon SageMaker**.
- Set up real-time data updates for dashboards.---
## **License**
This project is licensed under the MIT License. See the LICENSE file for more details.### **MIT License**
```
MIT LicenseCopyright (c) 2024 [Your Name]
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```
```You can copy and paste this Markdown file directly into your README.md file. Let me know if you need further adjustments!