Projects in Awesome Lists tagged with data-mining
A curated list of projects in awesome lists tagged with data-mining.
https://github.com/bulutyazilim/awesome-datascience
:memo: An awesome Data Science repository to learn and apply for real world problems.
analytics awesome-list data-mining data-science data-scientists data-visualization deep-learning hacktoberfest machine-learning science
Last synced: 17 Jun 2025
https://github.com/jaidedai/easyocr
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
cnn crnn data-mining deep-learning easyocr image-processing information-retrieval lstm machine-learning ocr optical-character-recognition python pytorch scene-text scene-text-recognition
Last synced: 17 Nov 2025
https://github.com/eriklindernoren/ml-from-scratch
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
data-mining data-science deep-learning deep-reinforcement-learning genetic-algorithm machine-learning machine-learning-from-scratch
Last synced: 11 May 2025
https://github.com/eriklindernoren/ML-From-Scratch
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
data-mining data-science deep-learning deep-reinforcement-learning genetic-algorithm machine-learning machine-learning-from-scratch
Last synced: 14 Mar 2025
https://github.com/JaidedAI/EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
cnn crnn data-mining deep-learning easyocr image-processing information-retrieval lstm machine-learning ocr optical-character-recognition python pytorch scene-text scene-text-recognition
Last synced: 14 Mar 2025
https://github.com/microsoft/lightgbm
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
data-mining decision-trees distributed gbdt gbm gbrt gradient-boosting kaggle lightgbm machine-learning microsoft parallel python r
Last synced: 09 Sep 2025
https://github.com/Microsoft/LightGBM
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
data-mining decision-trees distributed gbdt gbm gbrt gradient-boosting kaggle lightgbm machine-learning microsoft parallel python r
Last synced: 23 Apr 2025
https://github.com/microsoft/LightGBM
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
data-mining decision-trees distributed gbdt gbm gbrt gradient-boosting kaggle lightgbm machine-learning microsoft parallel python r
Last synced: 12 Mar 2025
https://github.com/piskvorky/gensim
Topic Modelling for Humans
data-mining data-science document-similarity fasttext gensim information-retrieval machine-learning natural-language-processing neural-network nlp python topic-modeling word-embeddings word-similarity word2vec
Last synced: 11 Dec 2025
https://github.com/rasbt/python-machine-learning-book
The "Python Machine Learning (1st edition)" book code repository and info resource
data-mining data-science logistic-regression machine-learning machine-learning-algorithms neural-network python scikit-learn
Last synced: 14 May 2025
https://github.com/tangyudi/ai-learn
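An AI learning roadmap that collects nearly 200 hands-on cases and projects, with free accompanying teaching materials; beginner-friendly from zero background, with job-oriented practice. Covers popular areas including Python, mathematics, machine learning, data analysis, data mining, data science, artificial intelligence, deep learning, computer vision (CV), natural language processing (NLP), algorithms, PyTorch, TensorFlow, TensorFlow 2, Caffe, Keras, NumPy, pandas, Matplotlib, and seaborn.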
algorithm artificial-intelligence caffe cv data-analysis data-mining data-science deep-learning keras machine-learning mathematics matplotlib nlp numpy pandas python pytorch seaborn tensorflow tensorflow2
Last synced: 14 May 2025
https://github.com/tangyudi/Ai-Learn
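An AI learning roadmap that collects nearly 200 hands-on cases and projects, with free accompanying teaching materials; beginner-friendly from zero background, with job-oriented practice. Covers popular areas including Python, mathematics, machine learning, data analysis, data mining, data science, artificial intelligence, deep learning, computer vision (CV), natural language processing (NLP), algorithms, PyTorch, TensorFlow, TensorFlow 2, Caffe, Keras, NumPy, pandas, Matplotlib, and seaborn.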
algorithm artificial-intelligence caffe cv data-analysis data-mining data-science deep-learning keras machine-learning mathematics matplotlib nlp numpy pandas python pytorch seaborn tensorflow tensorflow2
Last synced: 07 May 2025
https://github.com/yzhao062/pyod
A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
anomaly anomaly-detection autoencoder data-analysis data-mining data-science deep-learning fraud-detection machine-learning neural-networks novelty-detection out-of-distribution-detection outlier-detection outlier-ensembles outliers python python3 unsupervised-learning
Last synced: 12 May 2025
https://github.com/catboost/catboost
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
big-data catboost categorical-features coreml cuda data-mining data-science decision-trees gbdt gbm gpu gpu-computing gradient-boosting kaggle machine-learning python r tutorial
Last synced: 12 May 2025
https://github.com/sktime/sktime
A unified framework for machine learning with time series
ai anomaly-detection changepoint-detection data-mining data-science forecasting hacktoberfest machine-learning scikit-learn sktime time-series time-series-analysis time-series-classification time-series-regression time-series-segmentation
Last synced: 09 Sep 2025
https://github.com/yzhao062/Pyod
A Comprehensive and Scalable Python Library for Outlier Detection (Anomaly Detection)
anomaly anomaly-detection autoencoder data-analysis data-mining data-science deep-learning fraud-detection machine-learning neural-networks novelty-detection out-of-distribution-detection outlier-detection outlier-ensembles outliers python python3 unsupervised-learning
Last synced: 04 Apr 2025
https://github.com/montferret/ferret
Declarative web scraping
cdp chrome cli crawler crawling data-mining dsl go golang library query-language scraper scraping scraping-websites tool
Last synced: 10 May 2025
https://github.com/MontFerret/ferret
Declarative web scraping
cdp chrome cli crawler crawling data-mining dsl go golang hacktoberfest library query-language scraper scraping scraping-websites tool
Last synced: 13 Mar 2025
https://github.com/biolab/orange3
🍊 :bar_chart: :bulb: Orange: Interactive data analysis
classification clustering data-mining data-science data-visualization decision-trees machine-learning numpy orange orange3 pandas plotting python random-forest regression scikit-learn scipy visual-programming visualization
Last synced: 14 May 2025
https://github.com/rasbt/mlxtend
A library of extension and helper modules for Python's data analysis and machine learning libraries.
association-rules data-mining data-science machine-learning python supervised-learning unsupervised-learning
Last synced: 13 May 2025
https://github.com/microsoft/rd-agent
Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through our open source R&D automation tool RD-Agent, which lets AI drive data-driven AI.
agent ai automation data-mining data-science development llm research
Last synced: 12 May 2025
https://github.com/deanmalmgren/textract
extract text from any document. no muss. no fuss.
data-mining natural-language-processing python text-mining
Last synced: 12 May 2025
https://github.com/alibaba/alink
Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
apriori classification clustering data-mining feature-engineering flink flink-machine-learning flink-ml fm graph-algorithms graph-embedding kafka machine-learning recommender recommender-system regression statistics word2vec xgboost
Last synced: 14 May 2025
https://github.com/alibaba/Alink
Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
apriori classification clustering data-mining feature-engineering flink flink-machine-learning flink-ml fm graph-algorithms graph-embedding kafka machine-learning recommender recommender-system regression statistics word2vec xgboost
Last synced: 14 Mar 2025
https://github.com/kanaries/graphic-walker
An open source alternative to Tableau. Embeddable visual analytics.
bi data data-analysis data-mining data-visualization eda k6s kanaries low-code pivot-table react tableau tableau-alternative typescript vega vega-lite visualization
Last synced: 22 Jan 2026
https://github.com/automeris-io/webplotdigitizer
Computer vision assisted tool to extract numerical data from plot images.
charts computer-vision data-mining html javascript reverse-engineering visualization webplotdigitizer
Last synced: 18 Dec 2025
https://github.com/tirthajyoti/papers-literature-ml-dl-rl-ai
Highly cited and useful papers related to machine learning, deep learning, AI, game theory, reinforcement learning
artificial-intelligence data-mining data-science deep-learning game-theory hardware learning-theory literature machine-learning machine-learning-algorithms neural-network paper pattern-recognition reinforcement-learning silicon statistical-learning statistics
Last synced: 24 Oct 2025
https://github.com/tirthajyoti/Papers-Literature-ML-DL-RL-AI
Highly cited and useful papers related to machine learning, deep learning, AI, game theory, reinforcement learning
artificial-intelligence data-mining data-science deep-learning game-theory hardware learning-theory literature machine-learning machine-learning-algorithms neural-network paper pattern-recognition reinforcement-learning silicon statistical-learning statistics
Last synced: 27 Mar 2025
https://github.com/dblalock/bolt
10x faster matrix and vector operations
compression data-mining database machine-learning
Last synced: 15 May 2025
https://github.com/Kanaries/graphic-walker
An open source alternative to Tableau. Embeddable visual analytics.
bi data data-analysis data-mining data-visualization eda k6s kanaries low-code pivot-table react tableau tableau-alternative typescript vega vega-lite visualization
Last synced: 14 Mar 2025
https://github.com/wzbsocialsciencecenter/pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
data-mining image-processing ocr pdf python tables
Last synced: 14 May 2025
https://github.com/WZBSocialScienceCenter/pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
data-mining image-processing ocr pdf python tables
Last synced: 26 Mar 2025
https://github.com/invoice-x/invoice2data
Extract structured data from PDF invoices
Last synced: 14 May 2025
https://github.com/safe-graph/graph-fraud-detection-papers
A curated list of Graph/Transformer-based fraud, anomaly, and outlier detection papers & resources
academic-publications anomaly-detection awsome-list data-mining data-science dataset deep-learning foundation-models fraud-detection graph-algorithms graph-convolutional-networks graph-neural-networks llm machine-learning outlier-detection papers security spam-detection survey transformer
Last synced: 26 Jan 2026
https://github.com/paddlepaddle/research
novel deep learning research works with PaddlePaddle
computer-vision data-mining deep-learning knowledge-graph nlp spatial-temporal
Last synced: 15 May 2025
https://github.com/PaddlePaddle/Research
novel deep learning research works with PaddlePaddle
computer-vision data-mining deep-learning knowledge-graph nlp spatial-temporal
Last synced: 30 Mar 2025
https://github.com/404notf0und/AI-for-Security-Learning
Security scenarios, AI-based security algorithms, and industry practice in security data analysis
data-analysis data-mining machine-learning security
Last synced: 27 Apr 2025
https://github.com/404notf0und/ai-for-security-learning
Security scenarios, AI-based security algorithms, and industry practice in security data analysis
data-analysis data-mining machine-learning security
Last synced: 26 Jan 2026
https://github.com/yimeng-zhang/feature-engineering-and-feature-selection
A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python.
data-mining feature-engineering feature-extraction feature-selection machine-learning python
Last synced: 16 May 2025
https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection
A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python.
data-mining feature-engineering feature-extraction feature-selection machine-learning python
Last synced: 06 May 2025
https://github.com/microsoft/RD-Agent
Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through our open source R&D automation tool RD-Agent, which lets AI drive data-driven AI.
agent ai automation data-mining data-science development llm research
Last synced: 24 Oct 2025
https://github.com/sepandhaghighi/pycm
Multi-class confusion matrix library in Python
accuracy ai artificial-intelligence classification confusion-matrix data data-analysis data-mining data-science deep-learning deeplearning evaluation machine-learning mathematics matrix ml multiclass-classification neural-network statistical-analysis statistics
Last synced: 13 May 2025
https://github.com/demidovakatya/vvedenie-mashinnoe-obuchenie
:memo: A collection of resources on machine learning
collections data-mining data-science deep-learning machine-learning mooc neural-networks nlp russian university
Last synced: 26 Jan 2026
https://github.com/ebay/tsv-utils
eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
cli command-line csv d data-mining data-science delimited-files dlang reservoir-sampling sampling shuffle statistics tabular-data tsv uniq
Last synced: 27 Jan 2026
https://github.com/eBay/tsv-utils
eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
cli command-line csv d data-mining data-science delimited-files dlang reservoir-sampling sampling shuffle statistics tabular-data tsv uniq
Last synced: 14 Apr 2025
https://github.com/circl/ail-framework
AIL framework - Analysis Information Leak framework. Project moved to https://github.com/ail-project
ail-framework analysis data-mining information-leak information-security leak privacy security security-incidents
Last synced: 14 May 2025
https://github.com/patmartin/dex
Dex : The Data Explorer -- A data visualization tool written in Java/Groovy/JavaFX capable of powerful ETL and publishing web visualizations.
d3 d3js data-analysis data-mining data-science data-visualization datavis datavisualization dataviz groovy java javafx visualization
Last synced: 16 May 2025
https://github.com/PatMartin/Dex
Dex : The Data Explorer -- A data visualization tool written in Java/Groovy/JavaFX capable of powerful ETL and publishing web visualizations.
d3 d3js data-analysis data-mining data-science data-visualization datavis datavisualization dataviz groovy java javafx visualization
Last synced: 04 May 2025
https://github.com/CIRCL/AIL-framework
AIL framework - Analysis Information Leak framework. Project moved to https://github.com/ail-project
ail-framework analysis data-mining information-leak information-security leak privacy security security-incidents
Last synced: 14 Apr 2025
https://github.com/alan-turing-institute/clevercsv
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
csv csv-converter csv-export csv-files csv-format csv-import csv-parser csv-parsing csv-reader csv-reading data-analysis data-mining data-science datascience machine-learning python python-library python3
Last synced: 13 May 2025
https://github.com/alan-turing-institute/CleverCSV
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
csv csv-converter csv-export csv-files csv-format csv-import csv-parser csv-parsing csv-reader csv-reading data-analysis data-mining data-science datascience machine-learning python python-library python3
Last synced: 26 Mar 2025
https://github.com/annoviko/pyclustering
pyclustering is a Python, C++ data mining library.
algorithms c-plus-plus clustering data-mining data-science machine-learning neural-networks oscillatory-networks python python3
Last synced: 14 May 2025
https://github.com/aeon-toolkit/aeon
A toolkit for machine learning from time series
data-mining data-science machine-learning scikit-learn time-series time-series-analysis time-series-anomaly-detection time-series-classification time-series-clustering time-series-regression time-series-segmentation
Last synced: 12 Dec 2025
https://github.com/lightaime/deep_gcns_torch
PyTorch repo for DeepGCNs (ICCV'2019 Oral, TPAMI'2021), DeeperGCN (arXiv'2020) and GNN1000 (ICML'2021): https://www.deepgcns.org
3d-point-clouds bioinformatics cheminformatics computer-vision data-mining deep-gcns deep-learning geometric-deep-learning graph-convolutional-networks graph-neural-networks pytorch science-research social-network
Last synced: 16 May 2025
https://github.com/nfstream/nfstream
NFStream: a Flexible Network Data Analysis Framework.
artificial-intelligence cybersecurity data-analysis data-mining data-science dataset-generation deep-packet-inspection machine-learning ndpi netflow network-analysis network-monitoring network-security packet-analyser packet-capture pcap python traffic-analysis traffic-classification
Last synced: 14 May 2025
https://github.com/k0lb3/unitypy
UnityPy is a Python module that makes it possible to extract/unpack and edit Unity assets
assetstudio data-mining python python3 unity unity-asset unity-asset-extractor unitypack
Last synced: 13 May 2025
https://github.com/K0lb3/UnityPy
UnityPy is a Python module that makes it possible to extract/unpack and edit Unity assets
assetstudio data-mining python python3 unity unity-asset unity-asset-extractor unitypack
Last synced: 24 Apr 2025
https://github.com/TheAlgorithms/R
Collection of various algorithms implemented in R.
algorithm algorithms classification clustering data-mining datamanipulation education hacktoberfest learning machine-learning practice preprocessing r r-language r-programming regression
Last synced: 29 Jul 2025
https://github.com/WenjieDu/PyPOTS
A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks of imputation/classification/clustering/forecasting/anomaly detection/cleaning on incomplete industrial (irregularly-sampled) multivariate TS with NaN missing values
classification clustering data-mining data-science deep-learning forecasting healthcare imputation incomplete industrial interpolation machine-learning missing-values missingness neural-network partially-observed-time-series pytorch science-research time-series time-series-analysis
Last synced: 01 Apr 2025
https://github.com/ipython-books/cookbook-2nd
IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018
computing data-analysis data-mining data-science data-visualization ipython jupyter jupyter-notebook machine-learning numerical-computation python visualization
Last synced: 16 May 2025
https://github.com/thealgorithms/r
Collection of various algorithms implemented in R.
algorithm algorithms classification clustering data-mining datamanipulation education hacktoberfest learning machine-learning practice preprocessing r r-language r-programming regression
Last synced: 14 May 2025
https://github.com/minqi824/adbench
Official implementation of "ADBench: Anomaly Detection Benchmark", NeurIPS 2022.
anomaly-detection benchmark data-mining data-sicence deep-learning ensemble-learning machine-learning neural-networks outlier-detection python semi-supervised-learning supervised-learning unsupervised-learning
Last synced: 15 May 2025
https://github.com/Minqi824/ADBench
Official implementation of "ADBench: Anomaly Detection Benchmark", NeurIPS 2022.
anomaly-detection benchmark data-mining data-sicence deep-learning ensemble-learning machine-learning neural-networks outlier-detection python semi-supervised-learning supervised-learning unsupervised-learning
Last synced: 07 Apr 2025
https://github.com/sunlabuiuc/PyHealth
A Deep Learning Python Toolkit for Healthcare Applications.
clinical-data clinical-research data-mining deep-learning electronic-health-record electronic-medical-record healthcare medical-code preprocessing
Last synced: 10 May 2025
https://github.com/anfederico/stocktalk
data-mining sentiment-analysis twitter
Last synced: 14 May 2025
https://github.com/GoogleCloudPlatform/DataflowJavaSDK
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
big-data data-analysis data-mining data-processing data-science google-cloud-dataflow
Last synced: 01 May 2025
https://github.com/googlecloudplatform/dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
big-data data-analysis data-mining data-processing data-science google-cloud-dataflow
Last synced: 03 Oct 2025
https://github.com/elki-project/elki
ELKI Data Mining Toolkit
anomalydetection cluster-analysis clustering data-analysis data-mining data-mining-algorithms data-science distance-functions index indexing java machine-learning outlier-detection outliers time-series visualization
Last synced: 14 May 2025
https://github.com/jerlendds/osintbuddy
Node graphs, OSINT data mining, and plugins. Connect unstructured and public data for transformative insights. The rewrite can be found @ osintbuddy/osintbuddy
data-mining data-visualization information-gathering node-graph ontology osint osint-python plugin-system plugins python3 reconnaissance typescript
Last synced: 18 Jul 2025
https://github.com/ipython-books/cookbook-2nd-code
Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]
computing data-analysis data-mining data-science data-visualization ipython jupyter jupyter-notebook machine-learning numerical-computation python visualization
Last synced: 12 Apr 2025
https://github.com/ashishpatel26/amazing-feature-engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
data-analysis data-mining data-science data-scientists data-visualization deep-learning feature-engineering feature-extraction feature-scaling feature-selection features machine-learning scikit-learn
Last synced: 16 May 2025
https://github.com/ail-project/ail-framework
AIL framework - Analysis Information Leak framework
ail-framework data-mining information-extraction information-security leak
Last synced: 15 May 2025
https://github.com/ashishpatel26/Amazing-Feature-Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
data-analysis data-mining data-science data-scientists data-visualization deep-learning feature-engineering feature-extraction feature-scaling feature-selection features machine-learning scikit-learn
Last synced: 10 Apr 2025
https://github.com/dataproofer/Dataproofer
A proofreader for your data
cli command-line csv data-analysis data-mining data-science excel nodejs spreadsheet
Last synced: 30 Mar 2025
https://github.com/jphall663/interpretable_machine_learning_with_python
Examples of techniques for training interpretable ML models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.
accountability data-mining data-science decision-tree fairness fatml gradient-boosting-machine h2o iml interpretability interpretable interpretable-ai interpretable-machine-learning interpretable-ml lime machine-learning machine-learning-interpretability python transparency xai
Last synced: 16 May 2025
https://github.com/yzhao062/combo
(AAAI' 20) A Python Toolbox for Machine Learning Model Combination
aggregation data-mining data-science ensemble-learning machine-learning machine-learning-pipelines model-combination pipeline-framework python
Last synced: 08 Apr 2025
https://business-science.github.io/timetk/
Time series analysis in the `tidyverse`
coercion coercion-functions data-mining dplyr forecast forecasting forecasting-models machine-learning r-package series-decomposition series-signature tibble tidy tidyquant tidyverse time time-series timeseries
Last synced: 23 Jul 2025
https://github.com/business-science/timetk
Time series analysis in the `tidyverse`
coercion coercion-functions data-mining dplyr forecast forecasting forecasting-models machine-learning r-package series-decomposition series-signature tibble tidy tidyquant tidyverse time time-series timeseries
Last synced: 15 May 2025
https://github.com/chris-greening/instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
beginner-friendly data-mining data-science instagram instagram-data instagram-scraper lightweight python python-scraper python3 webscraping
Last synced: 07 Apr 2025
https://github.com/McGill-DMaS/Kam1n0-Community
The Kam1n0 Assembly Analysis Platform
binary-analysis data-mining machine-learning reverse-engineering
Last synced: 09 May 2025
https://github.com/chaoss/grimoirelab
GrimoireLab: platform for software development analytics and insights
chaoss data-mining data-visualization grimoirelab insights metrics software-analytics
Last synced: 21 Jan 2026
https://github.com/holgerbrandl/krangl
krangl is a {K}otlin DSL for data w{rangl}ing
data-mining datascience java kotlin sql
Last synced: 11 Apr 2025
https://chaoss.github.io/grimoirelab/
GrimoireLab: platform for software development analytics and insights
chaoss data-mining data-visualization grimoirelab insights metrics software-analytics
Last synced: 03 Apr 2025
https://github.com/programminghistorian/jekyll
Jekyll-based static site for The Programming Historian
api data-management data-manipulation data-mining dh digital-humanities exhibits linked-open-data mapping multi-lingual network-analysis open-educational-resources open-source pedagogy programming-historian python r-studio scraping text-analysis web-scraping
Last synced: 14 Mar 2025
https://github.com/hackingmaterials/matminer
Data mining for materials science
condensed-matter data-mining machine-learning materials-science matminer
Last synced: 21 Oct 2025
https://github.com/jchao01/TradingView-data-scraper
Extract price and indicator data from TradingView charts to create ML datasets
algorithmic-trading data-mining json tradingview webscraping
Last synced: 26 Mar 2025
https://github.com/h2oai/mli-resources
H2O.ai Machine Learning Interpretability Resources
accountability data-mining data-science explainable-ml fairness fatml h2o iml interpretability interpretable-ai interpretable-machine-learning interpretable-ml jupyter-notebooks machine-learning machine-learning-interpretability mli python transparency xai xgboost
Last synced: 05 Apr 2025
https://github.com/serengil/chefboost
A Lightweight Decision Tree Framework supporting regular algorithms: ID3, C4.5, CART, CHAID and Regression Trees; some advanced techniques: Gradient Boosting, Random Forest and Adaboost w/categorical features support for Python
adaboost c45-trees cart categorical-features data-mining data-science decision-trees gbdt gbm gbrt gradient-boosting gradient-boosting-machine gradient-boosting-machines id3 kaggle machine-learning python random-forest regression-tree
Last synced: 14 May 2025
https://github.com/CogComp/cogcomp-nlp
CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.
big-data cogcomp data-mining dependency-parsing lemmatization lemmatizer named-entity-recognition natural-language-processing natural-language-understanding ner nlp parts-of-speech-tagging pos pos-tagging relation-extraction similarity tokenizer transliteration
Last synced: 27 Mar 2025
https://github.com/kk7nc/rmdl
RMDL: Random Multimodel Deep Learning for Classification
classification cnn convolutional-neural-networks data-mining deep-learning deep-neural-networks dnn ensemble-learning image-classification information-retrieval keras machine-learning multimodel recurrent-neural-networks rnn tensorflow text-classification text-mining
Last synced: 13 Apr 2025
https://github.com/desbordante/desbordante-core
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also supports running data cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.
anomaly-detection correlations data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-mining-algorithms data-preprocessing data-profiling data-science data-wrangling exploratory-data-analysis feature-engineering feature-extraction feature-selection knowledge-discovery spreadsheets tabular-data
Last synced: 22 Nov 2025
https://github.com/chuanconggao/PrefixSpan-py
The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, closed sequential pattern mining algorithm BIDE, and generator sequential pattern mining algorithm FEAT.
bide data-mining feat pattern-mining prefixspan
Last synced: 26 Mar 2025
https://github.com/ScriptSmith/instamancer
Scrape Instagram's API with Puppeteer
data-mining instagram instagram-api instagram-scraper puppeteer scrape
Last synced: 04 Apr 2025
https://github.com/scriptsmith/instamancer
Scrape Instagram's API with Puppeteer
data-mining instagram instagram-api instagram-scraper puppeteer scrape
Last synced: 04 Apr 2025
https://github.com/airbnb/artificial-adversary
🗣️ Tool to generate adversarial text examples and test machine learning models against them
adversarial-examples black-box-attacks black-box-benchmarking classification data-mining data-science machine-learning metrics python python2 python3 spam spam-classification spam-detection spam-filtering text text-analysis text-classification text-mining text-processing
Last synced: 08 Oct 2025
https://fraud-detection-handbook.github.io/fraud-detection-handbook/
Reproducible Machine Learning for Credit Card Fraud Detection - Practical Handbook
credit-card credit-card-fraud data-mining data-science fraud-detection machine-learning open-data
Last synced: 19 Nov 2025
https://github.com/Desbordante/desbordante-core
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also supports running data cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.
anomaly-detection correlations data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-mining-algorithms data-preprocessing data-profiling data-science data-wrangling exploratory-data-analysis feature-engineering feature-extraction feature-selection knowledge-discovery spreadsheets tabular-data
Last synced: 03 Apr 2025
https://github.com/matrix-profile-foundation/matrixprofile
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
algorithms anomaly-detection clustering data-mining data-science hacktoberfest matrixprofile motif-discovery python python2 python3 segmentation time-series time-series-analysis
Last synced: 16 May 2025
https://github.com/ScriptSmith/reaper
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
api data-collection data-mining data-scraping facebook gui pinterest reddit scraping socialmedia tumblr twitter youtube
Last synced: 04 Apr 2025
https://github.com/scriptsmith/reaper
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
api data-collection data-mining data-scraping facebook gui pinterest reddit scraping socialmedia tumblr twitter youtube
Last synced: 07 Apr 2025