An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with ingestion

A curated list of projects in awesome lists tagged with ingestion.

https://github.com/getlago/lago

Open Source Metering and Usage Based Billing API ⭐️ Consumption tracking, Subscription management, Pricing iterations, Payment orchestration & Revenue analytics

analytics billing clickhouse events fintech go ingestion invoices metering open-source payments pricing pricing-data-science react ruby self-hosted subscriptions usage-based-billing

Last synced: 22 Apr 2025

https://github.com/apache/gobblin

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

apache data ingestion management replication

Last synced: 10 Apr 2025

https://github.com/apache/incubator-gobblin

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

apache data ingestion management replication

Last synced: 21 Dec 2024

https://github.com/dicklesworthstone/automatic_log_collector_and_analyzer

Replace Splunk in your small company with this one weird trick!

ingestion log logging splunk

Last synced: 12 Apr 2025

https://github.com/opensearch-project/data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.

analytics ingestion java logs metrics observability opensearch traces

Last synced: 07 Apr 2025

https://github.com/lcandy2/gitingest-extension

✨ An extension that helps you open git ingest to turn any git repository into a prompt-friendly text ingest for LLMs.

ai browser browser-extension chatgpt chrome chrome-extension chrome-extensions claude code firefox-addon gitingest ingestion llm prompt

Last synced: 07 Apr 2025

https://github.com/7-docs/7-docs

Use local files or a public GitHub repository as a source and ask questions about it through ChatGPT

completion ingestion openai openai-api query vector-database

Last synced: 25 Feb 2025

https://github.com/jitsucom/bulker

Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)

data-engineering datawarehouse etl etl-pipeline ingestion pipeline

Last synced: 07 Mar 2025

https://github.com/datainsider-co/rocket-bi

A free, open-source, web-based self-service BI tool tailor-made for ClickHouse, Google BigQuery, MySQL, PostgreSQL, and Vertica

analytics bigdata bigquery bussiness-intelligence clickhouse dashboard data etl hacktoberfest hacktoberfest2023 ingestion mysql postgresql vertica

Last synced: 05 Apr 2025

https://github.com/jgperrin/net.jgp.labs.spark

Apache Spark examples exclusively in Java

data-ingestion dataframe ingestion java spark udf

Last synced: 16 Apr 2025

https://github.com/netboxlabs/diode

Diode data ingestion for NetBox, from NetBox Labs

automation discovery ingestion netbox

Last synced: 05 Apr 2025

https://github.com/Cigna/ibis

IBIS is a workflow creation-engine that abstracts the Hadoop internals of ingesting RDBMS data.

cigna hadoop hadoop-ecosystem hadoop-framework ibis ingestion oozie sqoop sqoop2 workflow workflow-automation workflow-scheduler

Last synced: 27 Nov 2024

https://github.com/absaoss/hyperdrive

Extensible streaming ingestion pipeline on top of Apache Spark

apache-spark framework ingestion kafka pipeline spark spark-structured-streaming streaming streaming-etl

Last synced: 14 Feb 2025

https://github.com/snollygolly/borrow-bot

💰 A bot for maximizing the borrow subreddit

bot ingestion money mysql reddit

Last synced: 09 Mar 2025

https://github.com/netboxlabs/diode-netbox-plugin

Official NetBox Labs plugin for NetBox for Diode

ingestion netbox netbox-plugin

Last synced: 25 Jan 2025

https://github.com/jgperrin/net.jgp.books.spark.ch09

Spark in Action, 2e - chapter 9 - Advanced ingestion: finding data sources and building your own

apache-spark ingestion java java8 manning spark sparkwithjava

Last synced: 19 Apr 2025

https://github.com/apivideo/ingest.new

A simple demo application for uploading, ingesting, embedding videos and converting them to mp4s. From api.video (https://api.video)

embed hls hls-live-streaming ingestion mp4 video

Last synced: 09 Apr 2025

https://github.com/fellowtraveler/ngest

Python script for ingesting various files into a semantic graph. Supports text, images, C++, Python, Rust, JavaScript, and PDFs.

agents ai autocoding embeddings graph ingestion neo4j rag retrieval semantic summarization

Last synced: 16 Feb 2025

https://github.com/aymane-maghouti/big-data-project

This project aims to predict smartphone prices using a combination of batch and stream processing techniques in a Big Data environment. The architecture follows the Lambda Architecture pattern, providing both real-time and batch processing capabilities to users.

apache-airflow apache-kafka apache-spark batch-processing big-data-projects hbase hdfs ingestion java lambda-architecture machine-learning postgresql-database powerbi pyspark python spring-boot streaming

Last synced: 14 Feb 2025

https://github.com/sorcero/ingestum

Read-only mirror of https://gitlab.com/sorcero/community/ingestum

ingestion monitoring pdf processing python recognition transformers

Last synced: 19 Apr 2025

https://github.com/marceloboeira/crowd

👥 [WIP] An experimental Highly Available Reverse Proxy for Massive Asynchronous Message Consumption

asynchronous back-pressure entrypoint fan-out high-availability ingestion kafka message-queue queue rabbitmq redis reverse-proxy sqs stream

Last synced: 15 Mar 2025

https://github.com/cjsaylor/datamnom

Generic data ingestion for Elasticsearch to be visualized by Kibana.

elasticsearch ingestion kibana visualizations

Last synced: 15 Mar 2025

https://github.com/ahammadnafiz/reporag

A fully interactive tool designed to streamline your GitHub repository prompt generation process and facilitate RAG (Retrieval-Augmented Generation) workflows

ai app code github ingestion langchain llm open-source prompt python rag repository retrieval-augmented-generation

Last synced: 23 Feb 2025

https://github.com/kharigardner/pyfivetran

Simple Python interface for the Fivetran API. Powered by HTTPx.

api-wrapper data-engineering dataops etl fivetran httpx iaac ingestion integration python yaml-configuration

Last synced: 03 Dec 2024

https://github.com/endernoke/linkedingest

Turn LinkedIn profiles into AI-friendly text ingests.

ai ingestion linkedin linkedin-api linkedin-scraper summarizer

Last synced: 15 Apr 2025

https://github.com/abakermi/gitllm

A powerful GitHub repository analysis tool that helps you process and analyze repository content efficiently. Built with Next.js, Cloudflare Workers, and modern web technologies

git ingestion llm

Last synced: 13 Apr 2025

https://github.com/san089/yelp_project

This project creates a data lake for the Yelp dataset, then uses it to build an analytical sandbox for data science purposes and a data warehouse for reporting purposes.

data-lake data-pipeline etl etl-pipeline ingestion load pyspark recommender-system redshift

Last synced: 06 Mar 2025

https://github.com/samber/go-quickwit

🍱 A Go ingestion client for Quickwit

analytics cloud go ingestion log quickwit s3 search storage tracing

Last synced: 22 Apr 2025

https://github.com/apanimesh061/yelpdatasetetl

A MongoDB to Elasticsearch ETL pipeline

elasticsearch etl-pipeline ingestion mongodb python-2

Last synced: 24 Feb 2025

https://github.com/akornatskyy/sample-etl-flink-java

The sample ingests multiline gzipped files of popular books into postgres.

batch-processing etl flink ingestion java postgres sample

Last synced: 28 Jan 2025

https://github.com/emcd/python-mimeogram

Exchange collections of files with Large Language Models.

ai anthropic chatgpt claude code deepseek gemini grok ingestion llm openai

Last synced: 03 Mar 2025

https://github.com/traveldevel/iot-hub-ingestion-rest

Ingestion REST api that writes to a Kafka topic

api ingestion kafka rest

Last synced: 03 Apr 2025

https://github.com/jrcichra/ingestd

HTTP server that easily ingests data into a database

data gin hacktoberfest ingest ingestion restful-api

Last synced: 18 Feb 2025

https://github.com/timxor/bitcoind-data-ingestion

crypto payments bitcoind data ingestion

bitcoind data ingestion

Last synced: 17 Feb 2025

https://github.com/bluecolor/tractor

A cross platform, plugin based data transfer tool

cli golang ingestion plugins rdbms

Last synced: 03 Apr 2025

https://github.com/romnn/postgresimporter

A simple Python wrapper script based on pgfutter to load multiple dumped CSV files into a Postgres database.

csv database import ingestion pgfutter postgres

Last synced: 12 Mar 2025

https://github.com/windingtree/orgid-subgraph

This subgraph of The Graph allows querying information from the Winding Tree ecosystem using the GraphQL language.

ethereum graph-node graphql ingestion subgraph travel

Last synced: 17 Mar 2025

https://github.com/projectkeas/ingestion

The core ingestion API for KEAS

go ingestion k8s keas

Last synced: 22 Nov 2024

https://github.com/zezs/langchain-docs---ai-chat-assistant

This repository is dedicated to learning LangChain by creating a generative AI application. This web application uses Pinecone as a vector store to answer questions related to LangChain, utilizing sources from the official LangChain documentation.

ingestion langchain llm pineconedb python rag streamlit vector-database

Last synced: 09 Apr 2025

https://github.com/ryhkml/ytingest

Extract YouTube video, feed it to any LLM as knowledge

c google-gemini ingestion linux openai tiktoken youtube

Last synced: 16 Apr 2025

https://github.com/dspriggs-ds/udacity-operational-djs

Getting an Azure Machine Learning pipeline operational for consumption, including an example; for a Udacity course

automl ingestion service-principals

Last synced: 31 Mar 2025

https://github.com/jmfeck/bigquery-local-framework

This repo provides tools to manage BigQuery operations locally, simplifying tasks like uploading flat files, running SQL queries, and downloading tables. It offers a unified interface for local BigQuery interactions, enabling more efficient interaction with it.

bigquery data-engineering ingestion pandas python

Last synced: 11 Mar 2025

https://github.com/nasa-pds/nucleus

Nucleus is a software platform used to create workflows for the Planetary Data System (PDS).

data ingestion pds planetary workflow

Last synced: 24 Feb 2025