Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with feature-engineering

A curated list of projects in awesome lists tagged with feature-engineering.

https://github.com/alibaba/alink

Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.

apriori classification clustering data-mining feature-engineering flink flink-machine-learning flink-ml fm graph-algorithms graph-embedding kafka machine-learning recommender recommender-system regression statistics word2vec xgboost

Last synced: 17 Dec 2024

https://github.com/alibaba/Alink

Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.

apriori classification clustering data-mining feature-engineering flink flink-machine-learning flink-ml fm graph-algorithms graph-embedding kafka machine-learning recommender recommender-system regression statistics word2vec xgboost

Last synced: 26 Oct 2024

https://github.com/apachecn/fe4ml-zh

:book: [Translation] Feature Engineering for Machine Learning

book feature-engineering machine-learning python

Last synced: 19 Dec 2024

https://github.com/salesforce/transmogrifai

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning

ai automated-machine-learning automl dsl einstein estimators feature-engineering features machine-learning ml pipelines salesforce scala spark sparkml structured-data transformations transformers transmogrification transmogrify

Last synced: 19 Dec 2024

https://github.com/salesforce/TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning

ai automated-machine-learning automl dsl einstein estimators feature-engineering features machine-learning ml pipelines salesforce scala spark sparkml structured-data transformations transformers transmogrification transmogrify

Last synced: 30 Oct 2024

https://github.com/visualize-ml/book6_first-course-in-data-science

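Book_6_《数据有道》 | Iris Book series: from basic arithmetic to machine learning. Feedback and corrections are welcome! Readers who report many errors will receive a free copy of the book as thanks.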

data data-science data-visualization feature-engineering machine-learning python

Last synced: 19 Dec 2024

https://github.com/metarank/metarank

A low code Machine Learning personalized ranking service for articles, listings, search results, recommendations that boosts user engagement. A friendly Learn-to-Rank engine

automl data-engineering data-science deep-learning feature-engineering feature-extraction kubernetes machine-learning neural-networks personalization ranking scala search

Last synced: 19 Dec 2024

https://github.com/dagworks-inc/hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

dag data-analysis data-engineering data-science dataframe etl etl-framework etl-pipeline feature-engineering hacktoberfest lineage llmops machine-learning mlops orchestration pandas python rag software-engineering

Last synced: 17 Dec 2024

https://github.com/DAGWorks-Inc/hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

dag data-analysis data-engineering data-science dataframe etl etl-framework etl-pipeline feature-engineering hacktoberfest lineage llmops machine-learning mlops orchestration pandas python rag software-engineering

Last synced: 29 Oct 2024

https://github.com/featureform/featureform

The Virtual Feature Store. Turn your existing data infrastructure into a feature store.

data-quality data-science embeddings embeddings-similarity feature-engineering feature-store hacktoberfest machine-learning ml mlops python vector-database

Last synced: 18 Dec 2024

https://github.com/4paradigm/OpenMLDB

OpenMLDB is an open-source machine learning database that provides a feature platform computing consistent features for training and inference.

database-for-ai database-for-machine-learning feature-engineering feature-extraction feature-store featureops featurestore in-memory-database machine-learning machine-learning-database mlops

Last synced: 06 Nov 2024

https://github.com/4paradigm/openmldb

OpenMLDB is an open-source machine learning database that provides a feature platform computing consistent features for training and inference.

database-for-ai database-for-machine-learning feature-engineering feature-extraction feature-store featureops featurestore in-memory-database machine-learning machine-learning-database mlops

Last synced: 17 Dec 2024

https://github.com/yimeng-zhang/feature-engineering-and-feature-selection

A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python.

data-mining feature-engineering feature-extraction feature-selection machine-learning python

Last synced: 15 Dec 2024

https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection

A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python.

data-mining feature-engineering feature-extraction feature-selection machine-learning python

Last synced: 13 Nov 2024

https://github.com/asavinov/intelligent-trading-bot

Intelligent Trading Bot: Automatically generating signals and trading based on machine learning and feature engineering

algorithmic-trading artificial-intelligence bitcoin crypto crypto-trading cryptocurrency feature-engineering machine-learning trading trading-bots

Last synced: 20 Dec 2024

https://github.com/nvidia-merlin/nvtabular

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

deep-learning feature-engineering feature-selection gpu machine-learning nvidia preprocessing recommendation-system recommender-system

Last synced: 17 Dec 2024

https://github.com/NVIDIA-Merlin/NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

deep-learning feature-engineering feature-selection gpu machine-learning nvidia preprocessing recommendation-system recommender-system

Last synced: 12 Nov 2024

https://github.com/functime-org/functime

Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data.

feature-engineering forecasting machine-learning panel-data polars python time-series

Last synced: 05 Nov 2024

https://github.com/fraunhoferportugal/tsfel

An intuitive library to extract features from time series.

classification colab-notebook data-science feature-engineering feature-extraction time-series

Last synced: 26 Oct 2024

https://github.com/stitchfix/hamilton

A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton

dag data-engineering data-platform data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hamilton hamiltonian machine-learning numpy pandas python software-engineering stitch-fix

Last synced: 26 Sep 2024

https://github.com/jeongyoonlee/kaggler

Code for Kaggle Data Science Competitions

automl feature-engineering kaggle kaggler machine-learning python

Last synced: 20 Dec 2024

https://github.com/ashishpatel26/amazing-feature-engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

data-analysis data-mining data-science data-scientists data-visualization deep-learning feature-engineering feature-extraction feature-scaling feature-selection features machine-learning scikit-learn

Last synced: 20 Dec 2024

https://github.com/google/temporian

Temporian is an open-source Python library for preprocessing ⚡ and feature engineering 🛠 temporal data 📈 for machine learning applications 🤖

cpp feature-engineering python temporal-data time-series

Last synced: 19 Dec 2024

https://github.com/autoviml/featurewiz

Use advanced feature engineering strategies and select best features from your data set with a single line of code. Created by Ram Seshadri. Collaborators welcome.

best-encoders categorical-variables feature-engg feature-engineering feature-extraction feature-selection featuretools rfe rfecv xgboost

Last synced: 20 Dec 2024

https://github.com/AutoViML/featurewiz

Use advanced feature engineering strategies and select best features from your data set with a single line of code. Created by Ram Seshadri. Collaborators welcome.

best-encoders categorical-variables feature-engg feature-engineering feature-extraction feature-selection featuretools rfe rfecv xgboost

Last synced: 25 Nov 2024

https://github.com/ashishpatel26/Amazing-Feature-Engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

data-analysis data-mining data-science data-scientists data-visualization deep-learning feature-engineering feature-extraction feature-scaling feature-selection features machine-learning scikit-learn

Last synced: 07 Nov 2024

https://github.com/v1tzor/TimePlanner

Mobile app for planning tasks for the day with multimodule architecture, MVI, Compose, Room, Voyager, AlarmManager, Notification, Charts

alarmmanager android charts clean-architecture compose feature-engineering flow jetpack-compose kotlin kotlin-coroutines material3 mvi mvi-clean-architecture notifications planner-app room unittest voyager

Last synced: 09 Nov 2024

https://github.com/Aura-healthcare/hrv-analysis

Package for Heart Rate Variability analysis in Python

feature-engineering heart-rate-variability python rr-interval

Last synced: 06 Nov 2024

https://github.com/solegalli/feature-engineering-for-machine-learning

Code repository for the online course Feature Engineering for Machine Learning

data-science feature-engineering feature-extraction machine-learning python

Last synced: 20 Dec 2024

https://github.com/evinism/mistql

A query / expression language for performing computations on JSON-like structures. Tuned for clientside ML feature extraction.

expression-language feature-engineering feature-extraction hacktoberfest javascript json machine-learning mistql python query typescript

Last synced: 31 Oct 2024

https://github.com/upgini/upgini

Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, including open & commercial LLMs

automated-feature-engineering automl automl-pipeline chatgpt data-enrichment data-science feature-engineering feature-extraction feature-selection features kaggle kaggle-solution large-language-models llm machine-learning open-data open-datasets public-data python-library scikit-learn

Last synced: 20 Dec 2024

https://github.com/jalajthanaki/nlpython

This repository contains code for Natural Language Processing using the Python scripting language. All of the code relates to my book, "Python Natural Language Processing".

deep-learning feature-engineering feature-extraction feature-selection natural-language-processing parsing part-of-speech python-scripting-language python2 text-mining

Last synced: 22 Dec 2024

https://github.com/jalajthanaki/NLPython

This repository contains code for Natural Language Processing using the Python scripting language. All of the code relates to my book, "Python Natural Language Processing".

deep-learning feature-engineering feature-extraction feature-selection natural-language-processing parsing part-of-speech python-scripting-language python2 text-mining

Last synced: 27 Nov 2024

https://github.com/alibaba/feathub

FeatHub - A stream-batch unified feature store for real-time machine learning

apache-flink data data-engineering data-quality data-science feature-engineering feature-store machine-learning mlops streaming

Last synced: 05 Nov 2024

https://github.com/nyanp/nyaggle

Code for Kaggle and Offline Competitions

experiment-tracking feature-engineering kaggle machine-learning ml

Last synced: 21 Dec 2024

https://github.com/howl-anderson/hanzi_char_featurizer

A Chinese character (hanzi) featurizer that extracts features of Chinese characters (pronunciation features, glyph features) for use as deep learning features

character-level-featurize chinese-char-feature chinese-char-feature-dataset chinese-feature-engineering feature-engineering hanzi hanzi-char-feature

Last synced: 21 Nov 2024

https://github.com/thinkingmachines/geomancer

Automated feature engineering for geospatial data

bigquery feature-engineering geospatial machine-learning openstreetmap

Last synced: 29 Sep 2024

https://github.com/databrickslabs/automl-toolkit

Toolkit for Apache Spark ML for feature clean-up, feature importance calculation, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.

apache-spark feature-engineering machinelearning ml pyspark scala spark

Last synced: 28 Sep 2024

https://github.com/vinta/albedo

A recommender system for discovering GitHub repos, built with Apache Spark

apache-spark elasticsearch feature-engineering machine-learning python recommender-system scala

Last synced: 19 Dec 2024

https://github.com/fuqiuai/sklearn-feature-engineering

Feature engineering with sklearn

feature-engineering kaggle sklearn

Last synced: 10 Oct 2024

https://github.com/risenw/datasist

A Python library for easy data analysis, visualization, exploration and modeling

data-analysis data-science data-visualization feature-engineering machine-learning python-3

Last synced: 22 Dec 2024

https://github.com/dominance-analysis/dominance-analysis

This package can be used for dominance analysis or Shapley Value Regression for finding relative importance of predictors on given dataset. This library can be used for key driver analysis or marginal resource allocation models.

classification-model dominance dominance-analysis dominance-statistics feature-engineering feature-importance feature-selection keydrivers logistic-regression multiple-regression predictor predictor-importance pseudo-r-square r-square regression-models relative-importance shapley-value

Last synced: 12 Nov 2024

https://github.com/paulescu/bytewax-hopsworks-example

Compute and store real-time features for crypto trading using Bytewax (stream processing) and Hopsworks (Feature Store)

bytewax feature-engineering feature-store hopsworks machine-learning real-time

Last synced: 13 Dec 2024

https://github.com/paulescu/build-and-deploy-real-time-feature-pipeline

Develop and deploy a real-time feature pipeline in Python, using Bytewax 🐝 and Hopsworks Feature Store.

bytewax feature-engineering hopsworks ml mlops python realtime streamlit

Last synced: 13 Dec 2024

https://github.com/shusentang/bdc2019

Third-place solution for the 2019 China Collegiate Computing Contest Big Data Challenge

competition data-mining deep-learning feature-engineering machine-learning

Last synced: 12 Dec 2024

https://github.com/nikolaydubina/go-featureprocessing

🔥 Fast, simple sklearn-like feature processing for Go

feature-engineering go machine-learning

Last synced: 15 Dec 2024

https://github.com/ajayarunachalam/msda

Library for multi-dimensional, multi-sensor, uni/multivariate time series data analysis, unsupervised feature selection, unsupervised deep anomaly detection, and prototype of explainable AI for anomaly detector

anamoly-detection-using-graphs anomaly-detection correlation data-analysis deep-learning deep-neural-networks explainable-artificial-intelligence feature-engineering feature-selection multidimensional-data multisensor python pytorch sensor sensor-data signal-processing tabular-data time-series variation visualization

Last synced: 18 Dec 2024

https://github.com/imsanjoykb/data-science-regular-bootcamp

Regular practice in data science, machine learning, and deep learning, including solving ML project problems and analytical issues, to regularly build up my knowledge. The goal is to help learners with learning resources in the data science field.

artificial-intelligence data-analysis data-science data-science-notebook data-science-projects data-visualization database-connection deep-learning etl-pipeline etl-process feature-engineering machine-learning mysql-database neural-network numpy pandas postgresql python python-automation sqlite

Last synced: 12 Oct 2024

https://github.com/NITRO-AI/NitroFE

NitroFE is a Python feature engineering engine which provides a variety of modules designed to internally save past dependent values for providing continuous calculation.

feature feature-engineering features indicator indicators machine-learning time-series timeseries

Last synced: 27 Nov 2024

https://github.com/ibis-project/ibis-ml

IbisML is a library for building scalable ML pipelines using Ibis.

feature-engineering ibis machine-learning sql

Last synced: 21 Dec 2024

https://github.com/asavinov/prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

business-intelligence data-preparation data-preprocessing data-processing data-science data-wrangling feature-engineering map-reduce olap pandas python spark workflow

Last synced: 07 Nov 2024

https://github.com/mratsim/home-credit-default-risk

Default risk prediction for Home Credit competition - Fast, scalable and maintainable SQL-based feature engineering pipeline

feature-engineering kaggle machine-learning xgboost

Last synced: 22 Oct 2024

https://github.com/IliaZenkov/sklearn-audio-classification

An in-depth analysis of audio classification on the RAVDESS dataset. Feature engineering, hyperparameter optimization, model evaluation, and cross-validation with a variety of ML techniques and MLP

audio audio-data classification deep-learning-tutorial deep-neural-networks dnns emotion emotion-detection emotion-recognition feature-engineering machine-learning machine-learning-tutorials mlp-model model-evaluation ravdess-dataset scikit-learn sklearn

Last synced: 28 Oct 2024

https://github.com/Desbordante/desbordante-core

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also supports running data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

anomaly-detection correlations data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-mining-algorithms data-preprocessing data-profiling data-science data-wrangling exploratory-data-analysis feature-engineering feature-extraction feature-selection knowledge-discovery spreadsheets tabular-data

Last synced: 04 Nov 2024

https://github.com/404notf0und/FXY

Security-Scenes-Feature-Engineering-Toolkit, Continuous Integration. A tool for featurizing security data.

data-analysis data-mining feature-engineering machine-learning security security-scenes

Last synced: 21 Nov 2024

https://github.com/404notf0und/fxy

Security-Scenes-Feature-Engineering-Toolkit, Continuous Integration. A tool for featurizing security data.

data-analysis data-mining feature-engineering machine-learning security security-scenes

Last synced: 07 Nov 2024