An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-exploration-and-preprocessing

A curated list of projects in awesome lists tagged with data-exploration-and-preprocessing.

https://github.com/AI-Northstar-Tech/vector-io

Comprehensive Vector Data Tooling. The universal interface for all vector databases, datasets and RAG platforms. Easily export, import, back up, re-embed (using any model) or access your vector data from any vector database or repository.

chromadb data-backup data-exploration-and-preprocessing data-export data-import datastax huggingface huggingface-datasets kdb lancedb milvus parquet pinecone qdrant turbopuffer vector-database vector-search-engine visualization zilliz

Last synced: 09 Mar 2025

https://github.com/sayamalt/company-bankruptcy-prediction

A machine learning model that predicts whether a firm will go bankrupt, based on features such as net value growth rate, borrowing dependency, cash/total assets, etc.

binary-classification cicd-deployment cross-validation data-exploration-and-preprocessing data-visualization docker-container exploratory-data-analysis feature-engineering github-actions hyperparameter-optimization machine-learning model-deployment model-training-and-evaluation

Last synced: 28 Dec 2024

https://github.com/nafisalawalidris/employee-attrition-control

The Employee Attrition Control project uses data analysis and predictive modeling to understand and address employee turnover. It provides insights and recommendations to reduce attrition and improve employee satisfaction and retention.

data-exploration-and-preprocessing data-visualization-and-storytelling feature-selection-and-engineering model-training-and-evaluation predictive-modeling-techniques python statistical-analysis time-series-analysis turnover-analysis

Last synced: 16 Mar 2025

https://github.com/andersoncrs/analisis_exploratorio_de_datos-eda-_rendimiento_estudiantil

This exploratory data analysis (EDA) of a student performance dataset aims to identify and understand the factors that influence students' academic performance. Through data cleaning, transformation and visualization, it seeks to uncover significant patterns and relationships.

data-analysis data-exploration data-exploration-and-preprocessing data-visualization seaborn

Last synced: 30 Mar 2025

https://github.com/sayamalt/taxi-trip-fare-prediction

Successfully created a machine learning model which can accurately predict the fare of a taxi trip based on several features such as trip duration, tip amount, etc.

cross-validation data-exploration-and-preprocessing data-visualization exploratory-data-analysis feature-engineering hyperparameter-optimization machine-learning model-deployment model-selection model-training-and-evaluation regression-modelling

Last synced: 19 Feb 2025

https://github.com/sayamalt/global-news-headlines-text-summarization

Successfully established a text summarization model using Seq2Seq modeling with Luong Attention, which can give a short and concise summary of the global news headlines.

attention-mechanism data-exploration-and-preprocessing luong-attention model-architecture-and-implementation model-inference natural-language-processing seq2seq-model text-generation text-summarization text-tokenization

Last synced: 19 Feb 2025

https://github.com/sayamalt/credit-card-approval-prediction

A machine learning model that predicts, with up to 100% accuracy, whether an applicant's credit card application would be approved, based on demographic features such as applicant age, total income, marital status, total years of work experience, etc.

binary-classification cicd-deployment cross-validation data-exploration-and-preprocessing data-visualization exploratory-data-analysis feature-engineering hyperparameter-optimization machine-learning model-deployment model-retraining model-selection model-testing model-training-and-evaluation

Last synced: 19 Feb 2025

https://github.com/sayamalt/employee-attrition-prediction

A machine learning model that predicts whether an employee of a given company will leave it in the near future, based on several employee details and employment metrics.

binary-classification continuous-deployment continuous-integration cross-validation data-exploration-and-preprocessing data-visualization exploratory-data-analysis feature-engineering hyperparameter-optimization machine-learning model-deployment model-training-and-evaluation

Last synced: 19 Feb 2025

https://github.com/aniket2021448/movie-recommender-system

A machine learning project implemented from scratch, involving web scraping, data engineering, exploratory data analysis, NLP and ML, that works as a content-based movie recommender system.

data-exploration data-exploration-and-preprocessing free-hosting-service machine-learning natural-language-processing nltk-python numpy pandas streamlit-webapp

Last synced: 23 Feb 2025

https://github.com/srosalino/prediction_of_seoul_bikes_demand

The objective of this project is to predict the number of bicycles that must be made available each hour in order to make the service as efficient as possible.

cross-validation data-exploration-and-preprocessing hyperparameter-tuning machine-learning regularization-methods scikit-learn

Last synced: 25 Feb 2025