Projects in Awesome Lists tagged with ingestion
A curated list of projects in awesome lists tagged with ingestion.
https://github.com/getlago/lago
Open Source Metering and Usage Based Billing API ⭐️ Consumption tracking, Subscription management, Pricing iterations, Payment orchestration & Revenue analytics
analytics billing clickhouse events fintech go ingestion invoices metering open-source payments pricing pricing-data-science react ruby self-hosted subscriptions usage-based-billing
Last synced: 22 Apr 2025
https://github.com/apache/gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
apache data ingestion management replication
Last synced: 10 Apr 2025
https://github.com/apache/incubator-gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
apache data ingestion management replication
Last synced: 21 Dec 2024
https://github.com/starlightsearch/embedanything
Production-ready Inference, Ingestion and Indexing built in Rust 🦀
colpali embedding-models index indexing information-retrieval ingestion jina large-language-models late-interaction machine-learning modernbert onnx onnxruntime openai rag rust rust-lang splade vector-database vision-language-model
Last synced: 14 Apr 2025
https://github.com/dicklesworthstone/automatic_log_collector_and_analyzer
Replace Splunk in your small company with this one weird trick!
Last synced: 12 Apr 2025
https://github.com/opensearch-project/data-prepper
Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
analytics ingestion java logs metrics observability opensearch traces
Last synced: 07 Apr 2025
https://github.com/azure/azure-event-hubs-spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
apache apache-spark azure bigdata connector continuous databricks event-hubs eventhubs ingestion kafka microsoft real-time scala spark spark-streaming stream streaming structured-streaming
Last synced: 14 Apr 2025
https://github.com/lcandy2/gitingest-extension
✨ An extension that helps you open GitIngest to turn any Git repository into a prompt-friendly text ingest for LLMs.
ai browser browser-extension chatgpt chrome chrome-extension chrome-extensions claude code firefox-addon gitingest ingestion llm prompt
Last synced: 07 Apr 2025
https://github.com/7-docs/7-docs
Use local files or a public GitHub repository as a source and ask ChatGPT questions about it.
completion ingestion openai openai-api query vector-database
Last synced: 25 Feb 2025
https://github.com/jitsucom/bulker
Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)
data-engineering datawarehouse etl etl-pipeline ingestion pipeline
Last synced: 07 Mar 2025
https://github.com/datainsider-co/rocket-bi
A free, open-source, web-based self-service BI tool tailor-made for ClickHouse, Google BigQuery, MySQL, PostgreSQL, and Vertica.
analytics bigdata bigquery bussiness-intelligence clickhouse dashboard data etl hacktoberfest hacktoberfest2023 ingestion mysql postgresql vertica
Last synced: 05 Apr 2025
https://github.com/jgperrin/net.jgp.labs.spark
Apache Spark examples exclusively in Java
data-ingestion dataframe ingestion java spark udf
Last synced: 16 Apr 2025
https://github.com/netboxlabs/diode
Diode data ingestion for NetBox, from NetBox Labs
automation discovery ingestion netbox
Last synced: 05 Apr 2025
https://github.com/Cigna/ibis
IBIS is a workflow creation-engine that abstracts the Hadoop internals of ingesting RDBMS data.
cigna hadoop hadoop-ecosystem hadoop-framework ibis ingestion oozie sqoop sqoop2 workflow workflow-automation workflow-scheduler
Last synced: 27 Nov 2024
https://github.com/absaoss/hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
apache-spark framework ingestion kafka pipeline spark spark-structured-streaming streaming streaming-etl
Last synced: 14 Feb 2025
https://github.com/netboxlabs/diode-netbox-plugin
Official NetBox Labs plugin that connects Diode to NetBox.
ingestion netbox netbox-plugin
Last synced: 25 Jan 2025
https://github.com/jgperrin/net.jgp.books.spark.ch09
Spark in Action, 2e - chapter 9 - Advanced ingestion: finding data sources and building your own
apache-spark ingestion java java8 manning spark sparkwithjava
Last synced: 19 Apr 2025
https://github.com/apivideo/ingest.new
A simple demo application for uploading, ingesting, and embedding videos and converting them to MP4. From api.video (https://api.video)
embed hls hls-live-streaming ingestion mp4 video
Last synced: 09 Apr 2025
https://github.com/fellowtraveler/ngest
Python script for ingesting various files into a semantic graph. Supports text, images, C++, Python, Rust, JavaScript, and PDFs.
agents ai autocoding embeddings graph ingestion neo4j rag retrieval semantic summarization
Last synced: 16 Feb 2025
https://github.com/aymane-maghouti/big-data-project
This project aims to predict smartphone prices using a combination of batch and stream processing techniques in a Big Data environment. The architecture follows the Lambda Architecture pattern, providing both real-time and batch processing capabilities to users.
apache-airflow apache-kafka apache-spark batch-processing big-data-projects hbase hdfs ingestion java lambda-architecture machine-learning postgresql-database powerbi pyspark python spring-boot streaming
Last synced: 14 Feb 2025
https://github.com/vertica/pstl
Parallel Streaming Transformation Loader
bigdata data-mining data-science etl-pipeline hadoop ingestion realtime-messaging streaming-data vertica
Last synced: 12 Nov 2024
https://github.com/sorcero/ingestum
Read-only mirror of https://gitlab.com/sorcero/community/ingestum
ingestion monitoring pdf processing python recognition transformers
Last synced: 19 Apr 2025
https://github.com/marceloboeira/crowd
👥 [WIP] An experimental Highly Available Reverse Proxy for Massive Asynchronous Message Consumption
asynchronous back-pressure entrypoint fan-out high-availability ingestion kafka message-queue queue rabbitmq redis reverse-proxy sqs stream
Last synced: 15 Mar 2025
https://github.com/clarifai/clarifai-python-datautils
Extract, transform, and load unstructured data into Clarifai's AI platform.
dataanalysis dataengineering ingestion ingestion-pipeline unstructured-data unstructured-data-analysis unstructured-image unstructured-text
Last synced: 13 Apr 2025
https://github.com/mrsimonemms/gobblr
Make your development databases gobble up known data
data-import data-ingestion database developer-tools development gitpod-compatible ingestion mongo mongodb mysql postgresql sql sqlite testing
Last synced: 13 Apr 2025
https://github.com/cjsaylor/datamnom
Generic data ingestion for Elasticsearch to be visualized by Kibana.
elasticsearch ingestion kibana visualizations
Last synced: 15 Mar 2025
https://github.com/ahammadnafiz/reporag
A fully interactive tool designed to streamline your GitHub repository prompt generation process and facilitate RAG (Retrieval-Augmented Generation) workflows
ai app code github ingestion langchain llm open-source prompt python rag repository retrieval-augmented-generation
Last synced: 23 Feb 2025
https://github.com/kharigardner/pyfivetran
Simple python interface for the Fivetran API. Powered by HTTPx.
api-wrapper data-engineering dataops etl fivetran httpx iaac ingestion integration python yaml-configuration
Last synced: 03 Dec 2024
https://github.com/endernoke/linkedingest
Turn LinkedIn profiles into AI-friendly text ingests.
ai ingestion linkedin linkedin-api linkedin-scraper summarizer
Last synced: 15 Apr 2025
https://github.com/abakermi/gitllm
A GitHub repository analysis tool for processing and analyzing repository content efficiently. Built with Next.js, Cloudflare Workers, and other modern web technologies.
Last synced: 13 Apr 2025
https://github.com/san089/yelp_project
This project creates a data lake for the Yelp dataset, uses it to build an analytical sandbox for data science, and also creates a data warehouse for reporting.
data-lake data-pipeline etl etl-pipeline ingestion load pyspark recommender-system redshift
Last synced: 06 Mar 2025
https://github.com/apanimesh061/yelpdatasetetl
A MongoDB-to-Elasticsearch ETL pipeline
elasticsearch etl-pipeline ingestion mongodb python-2
Last synced: 24 Feb 2025
https://github.com/akornatskyy/sample-etl-flink-java
The sample ingests multiline gzipped files of popular books into Postgres.
batch-processing etl flink ingestion java postgres sample
Last synced: 28 Jan 2025
https://github.com/traveldevel/iot-hub-ingestion-rest
Ingestion REST API that writes to a Kafka topic
Last synced: 03 Apr 2025
https://github.com/jrcichra/ingestd
HTTP server that easily ingests data into a database
data gin hacktoberfest ingest ingestion restful-api
Last synced: 18 Feb 2025
https://github.com/timxor/bitcoind-data-ingestion
crypto payments bitcoind data ingestion
Last synced: 17 Feb 2025
https://github.com/windingtree/orgid-subgraph
This subgraph of The Graph allows querying information from the Winding Tree ecosystem using the GraphQL language.
ethereum graph-node graphql ingestion subgraph travel
Last synced: 17 Mar 2025
https://github.com/zezs/langchain-docs---ai-chat-assistant
This repository is dedicated to learning LangChain by creating a generative AI application. This web application uses Pinecone as a vector store to answer questions related to LangChain, utilizing sources from the official LangChain documentation.
ingestion langchain llm pineconedb python rag streamlit vector-database
Last synced: 09 Apr 2025
https://github.com/ryhkml/ytingest
Extract a YouTube video's content and feed it to any LLM as knowledge.
c google-gemini ingestion linux openai tiktoken youtube
Last synced: 16 Apr 2025
https://github.com/horothesun/tflbusstopsdbmanualingestor
TfL bus stops manual DB ingestor.
ci github-actions github-actions-ci ingestion ingestor ingestors javascript jest jest-test jest-tests
Last synced: 23 Feb 2025
https://github.com/dspriggs-ds/udacity-operational-djs
Making an Azure Machine Learning pipeline operational for consumption, with examples, for a Udacity course.
automl ingestion service-principals
Last synced: 31 Mar 2025
https://github.com/jmfeck/bigquery-local-framework
This repo provides tools to manage BigQuery operations locally, simplifying tasks like uploading flat files, running SQL queries, and downloading tables. It offers a unified local interface for working with BigQuery more efficiently.
bigquery data-engineering ingestion pandas python
Last synced: 11 Mar 2025