Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with aws-glue-crawler
A curated list of projects in awesome lists tagged with aws-glue-crawler.
https://github.com/sadafasad/linkedin-jobs-analysis
Unveiling job market trends with Scrapy and AWS
aws-athena aws-ec2 aws-glue-crawler aws-glue-data-catalog aws-quicksight aws-s3 python scrapy
Last synced: 10 Jan 2025
https://github.com/travelxml/kafka-python-aws-crawler-amazon-athena
A comprehensive set of tutorials, steps, and scripts for setting up Apache Kafka on an Amazon EC2 instance, streaming logs to S3, and querying data with AWS Glue and Amazon Athena. Includes ZooKeeper configuration, producer and consumer setup, and automated data catalog creation.
aws aws-glue-crawler aws-s3 kafka kafka-connect kafka-consumer kafka-producer kafka-streams message message-broker message-queue
Last synced: 19 Nov 2024
https://github.com/shahidmalik4/aws-glue-stepfunctions-etl
This project automates an ETL pipeline using AWS Glue, S3, Athena, and Step Functions to transform raw Airbnb data. It cleanses, enriches, and organizes the data into separate raw and transformed databases, enabling efficient querying and analysis via Athena, with automated notifications through SNS.
aws aws-athena aws-glue aws-glue-crawler aws-s3 aws-sns aws-step-functions etl-pipeline pyspark
Last synced: 25 Nov 2024
https://github.com/h-fuzzy-logic/data-analytics-spring
This project combines some of my favorite technologies: open data, cloud computing, and Jupyter notebooks.
aws-athena aws-glue-crawler aws-s3 jupyter openscience pandas python seaborn
Last synced: 15 Dec 2024
https://github.com/jibbs1703/tickit-data-pipeline
This repository contains a data pipeline that extracts, transforms, and loads data from an AWS S3 bucket into an AWS Redshift table using AWS Glue. The raw data is stored in S3 unmodified, and the pipeline lets AWS Glue extract it from the S3 bucket.
aws-glue aws-glue-crawler aws-glue-data-catalog aws-redshift aws-s3 data-validation etl-pipeline pydantic
Last synced: 25 Nov 2024
https://github.com/jibbs1703/tickit-data-lake
This repository demonstrates the creation of a robust, 3-tier data lake using AWS resources.
aws-glue aws-glue-crawler aws-glue-data-catalog aws-s3 data-lake database etl-pipeline medallion-architecture
Last synced: 25 Nov 2024
https://github.com/mihirkudale/stock-market-real-time-data-engineering-project
In this project, you will carry out an end-to-end data engineering project on real-time stock market data using Kafka. It uses technologies such as Python, Amazon Web Services (AWS), Apache Kafka, Glue, Athena, and SQL.
amazon-ec2 apache-kafka aws aws-athena aws-ec2 aws-glue-catalog aws-glue-crawler aws-s3 consumer csv jupyter-notebook kafka producer python stockmarket stockmarketanalysis
Last synced: 26 Nov 2024
https://github.com/desininja/quality-movie-data-pipeline
ETL pipeline using AWS services
aws aws-eventbridge aws-glue aws-glue-crawler aws-s3 aws-step-function data-engineering etl etl-pipeline redshift
Last synced: 17 Dec 2024