Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/san089/goodreads_etl_pipeline
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
airflow airflow-dag apache-airflow apache-spark data-engineering data-engineering-pipeline data-lake data-migration emr-cluster etl-framework etl-job etl-pipeline goodreads-data-pipeline livy python redshift s3 scheduler spark warehouse
Last synced: 13 Jun 2024
https://github.com/san089/Udacity-Data-Engineering-Projects
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
airflow airflow-operators aws aws-ec2 aws-s3 aws-sdk cassandra cassandra-database cloudformation cluster data data-engineering data-engineering-pipeline data-lake data-modeling data-warehouse etl-pipeline infrastructure postgres postgresql-database
Last synced: 12 Jun 2024
https://github.com/flow-php/flow
Flow PHP - strongly typed data processing framework
etl etl-framework etl-pipeline
Last synced: 11 Jun 2024
https://github.com/LukasLoeffler/data-graph
Flow- and event-based data processing
data-processing etl etl-pipeline flow-based-programming graph graphical-user-interface low-code no-code
Last synced: 09 Jun 2024
https://github.com/sdcastillo/ExamPAData
A container for data sets to help actuaries who are practicing predictive analytics
content-marketing cran education etl etl-pipeline
Last synced: 04 Jun 2024
https://github.com/restarone/violet_rails
An app engine for your business. Seamlessly implement business logic with a powerful API. Out-of-the-box CMS, blog, forum, and email functionality. Developer-friendly and easily extendable for your next SaaS/XaaS project. Built with Rails 6, Devise, Sidekiq, and PostgreSQL.
blog cms ember emberjs etl-automation etl-framework etl-pipeline forum multi-tenancy multitenancy rails ruby ruby-on-rails rubyonrails saas saas-boilerplate template violet-rails wordpress-replacement xaas
Last synced: 02 Jun 2024
https://github.com/apache/incubator-streampark
Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
apache development-framework easy-to-use etl-pipeline operation-platform streaming streampark
Last synced: 31 May 2024
https://github.com/NitinSPatil15/Project-3-Data-Warehouse-with-AWS
An ETL pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables
etl-pipeline python redshift-database s3-bucket sql
Last synced: 27 May 2024
https://github.com/AuFeld/Data_Engineering_Projects
A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousing, containerization, and a dashboard to monitor data pipeline KPIs
airflow aws cassandra data-engineering data-lake data-warehouse docker emr etl-pipeline infrastructure-as-code infrastructure-setup postgresql python redshift s3 spark
Last synced: 27 May 2024
https://github.com/albertovpd/automated_etl_google_cloud-social_dashboard
A dashboard is worth a thousand words => https://datastudio.google.com/reporting/755f3183-dd44-4073-804e-9f7d3d993315
bigquery-table cloud-functions cloud-scheduler cloud-storage dashboard data-studio dataprep etl etl-jobs etl-pipeline gdelt google-cloud google-cloud-platform google-trends python sql twitter-api
Last synced: 26 May 2024
https://github.com/codecadre/imt-school-addresses
Pulls addresses from IMT
etl-pipeline public-datasets scraping-websites
Last synced: 25 May 2024
https://github.com/orchest/orchest
Build data pipelines, the easy way 🛠️
airflow cloud dag data-pipelines data-science deployment docker etl etl-pipeline ide jupyter jupyterlab kubernetes machine-learning notebooks orchest pipelines python self-hosted
Last synced: 16 May 2024
https://github.com/patterns-app/patterns-devkit
Data pipelines from re-usable components
data-analysis data-engineering data-pipeline data-pipelines data-science etl etl-framework etl-pipeline etl-pipelines functional-reactive-programming immutability pipelines sql
Last synced: 13 May 2024
https://github.com/techascent/tech.ml.dataset
A high-performance data processing system for Clojure
clojure csv dataframe datascience dataset etl-pipeline java machine-learning xlsx
Last synced: 11 May 2024
https://github.com/DAGWorks-Inc/hamilton
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
dag data-analysis data-engineering data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hacktoberfest lineage llmops machine-learning mlops numpy orchestration pandas python software-engineering
Last synced: 28 Apr 2024
https://github.com/stitchfix/hamilton
A scalable, general-purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
dag data-engineering data-platform data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hamilton hamiltonian machine-learning numpy pandas python software-engineering stitch-fix
Last synced: 20 Apr 2024
https://github.com/cyber-drop/ethereum_analytical_db
Ethereum Analytical Database: an Ethereum data access solution for analytics and application development, running on the fast ClickHouse database.
api blockchain clickhouse dex erc20 erc223 erc721 eth ethereum ethereum-etl etl etl-pipeline
Last synced: 13 Apr 2024
https://github.com/Zipstack/unstract
No-code LLM platform for launching APIs and ETL pipelines that structure unstructured documents
etl-pipeline llm-platform unstructured-data
Last synced: 11 Apr 2024
https://github.com/MassStreetAnalytics/etl-framework
A framework for moving data into a data warehouse.
data-warehouse etl etl-components etl-framework etl-pipeline python sql sqlserver
Last synced: 01 Apr 2024
https://github.com/michalmiki/postgresql-etl
Building a Python ETL pipeline for a PostgreSQL database
etl-pipeline postgresql python
Last synced: 01 Apr 2024
https://github.com/SETL-Framework/setl
A simple Spark-powered ETL framework that just works 🍺
big-data data-analysis data-engineering data-science data-transformation dataset etl etl-pipeline framework machine-learning modularization pipeline scala setl spark
Last synced: 23 Mar 2024
https://github.com/TriplyDB/Documentation
Documentation for the TriplyDB and TriplyETL products
etl-framework etl-pipeline graph-database linked-data production-systems semantic-web
Last synced: 21 Mar 2024