An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with large-dataset

A curated list of projects in awesome lists tagged with large-dataset.

https://github.com/opendrivelab/driveagi

[CVPR 2024 Highlight] GenAD: Generalized Predictive Model for Autonomous Driving & Foundation Models in Autonomous System

autonomous-driving embodied-ai foundation-model general-artificial-intelligence large-dataset policy-learning video-dataset video-generation world-models

Last synced: 15 May 2025

https://github.com/DiskFrame/disk.frame

Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data

data data-science large-dataset manipulation-data medium-data r

Last synced: 14 Mar 2025

https://github.com/xiaodaigh/disk.frame

Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data

data data-science large-dataset manipulation-data medium-data r

Last synced: 14 Mar 2025

https://github.com/fair-acc/chart-fx

A scientific charting library focused on performance-optimised real-time data visualisation at 25 Hz update rates for data sets ranging from a few tens of thousands up to 5 million data points.

chart-fx charting-libraries data-visualisation hacktoberfest java javafx large-dataset scientific-visualization

Last synced: 04 Apr 2025

https://github.com/GSI-CS-CO/chart-fx

A scientific charting library focused on performance-optimised real-time data visualisation at 25 Hz update rates for data sets ranging from a few tens of thousands up to 5 million data points.

chart-fx charting-libraries data-visualisation hacktoberfest java javafx large-dataset scientific-visualization

Last synced: 21 Dec 2024

https://github.com/zzw922cn/tensorflow-input-pipeline

TensorFlow Input Pipeline Examples based on multi-thread and FIFOQueue

fifo-queue input-pipeline large-dataset mini-batch multi-threading small-dataset tensorflow tfrecords

Last synced: 26 Apr 2025

https://github.com/privefl/bigreadr

R package to read large text files based on splitting + data.table::fread

large-dataset r-package read-csv

Last synced: 22 Nov 2024

https://github.com/kyegomez/EXA-1

An EXA-Scale repository of Multi-Modality AI resources from papers and models, to foundational libraries!

artificial-intelligence dataset gpt4 jax kosmos large-dataset large-language-models multimodal multimodal-data multimodality pytorch pytorch-implementation triton

Last synced: 28 Mar 2025

https://github.com/matteodelabre/saxophone

Fast and lightweight event-driven streaming XML parser in pure JavaScript

javascript large-dataset parser sax xml

Last synced: 16 Mar 2025

https://github.com/guypeer8/csv-streamer

💧 A stream-based CSV aggregator for limiting RAM usage while processing large data sets.

csv gzip large-dataset nodejs sqlite stream zlib

Last synced: 21 Nov 2024

https://github.com/gjcampbell/ooffice

Some components for internal, line of business angular apps

angular angular2 large-dataset performance tree virtualized virtualizer

Last synced: 17 Feb 2025

https://github.com/davidssmith/rawarray.jl

Raw array (RA) file format for simple, robust, and user-friendly N-dimensional array storage

bytes complex-numbers data-science file-format julia large-dataset large-files ra-format rawarray scientific-computing storage

Last synced: 07 May 2025

https://github.com/vjgpt/home-credit-default-risk

The objective of this competition is to use historical loan application data to predict whether an applicant will be able to repay a loan.

banking credit-risk gradient-boosting large-dataset lightgbm loan

Last synced: 10 Apr 2025

https://github.com/bugthesystem/cerebro

Finding The Median In Large Sets Of Numbers Split Across N Servers using zeromq and nodejs (experimental)

average distributed experimental large-dataset median nodejs zeromq

Last synced: 19 Feb 2025

https://github.com/Lizhecheng02/Kaggle-LLM-Detect_AI_Generated_Text

Detect whether the text is AI-generated by training a new tokenizer and combining it with tree classification models or by training language models on a large dataset of human & AI-generated texts.

ai-generated bpe classification ensemble large-dataset llm tokenizer wordpiece

Last synced: 06 Jan 2025

https://github.com/emahtab/mysql-test-dataset

Repository for MySQL test data set

dataset large-dataset mysql

Last synced: 02 Apr 2025

https://github.com/shreckye/jgrapht-memory-efficient-bipartite-graph

A memory-efficient matching algorithm (Kuhn–Munkres and Hopcroft–Karp) implementation based on JGraphT in Java

bipartite-graphs hopcroft-karp jgrapht kotlin kuhn-munkres large-dataset memory-efficient

Last synced: 04 Apr 2025

https://github.com/avijit-jana/classifying_cybersecurity_incidents

This project focuses on building a machine learning classification model to enhance the efficiency of Security Operation Centers (SOCs). Using the comprehensive GUIDE dataset, the model predicts the triage grade of cybersecurity incidents (True Positive, Benign Positive, or False Positive).

exploratory-data-analysis large-dataset machine-learning pandas python3 visualization

Last synced: 25 Feb 2025

https://github.com/rajkumargara/bike_rental_data_analysis

Chicago bike rental data analysis for business insights using R programming

data-analysis data-visualization data-wrangling large-dataset machine-learning-algorithms

Last synced: 03 Mar 2025

https://github.com/pngo1997/data-mining-sql

Twitter tweets Data Mining practice.

data-mining database large-dataset python sql

Last synced: 28 Feb 2025

https://github.com/mehrantsi/common-crawl-analyzer

Tools to extract and analyze domains and URLs from Common Crawl data files.

common-crawl large-dataset stemmer term-analysis term-frequency-inverse-document

Last synced: 16 May 2025

https://github.com/dimitrivavoulisportfolio/aws-serverless-nlp-sentiment-4m-product-reviews

This is a production-ready DistilBERT sentiment analysis model for product reviews, designed to work as a low-cost market research tool with the nuance of an actual market researcher.

aws distilbert distilbert-model large-dataset market-research nlp nlp-machine-learning product-reviews sentiment-analysis serverless

Last synced: 04 Apr 2025