Projects in Awesome Lists tagged with data-exploration-and-preprocessing
A curated list of projects in awesome lists tagged with data-exploration-and-preprocessing.
https://github.com/AI-Northstar-Tech/vector-io
Comprehensive vector data tooling. A universal interface for vector databases, datasets, and RAG platforms. Easily export, import, back up, re-embed (using any model), or access your vector data from any vector database or repository.
chromadb data-backup data-exploration-and-preprocessing data-export data-import datastax huggingface huggingface-datasets kdb lancedb milvus parquet pinecone qdrant turbopuffer vector-database vector-search-engine visualization zilliz
Last synced: 09 Mar 2025
https://github.com/sayamalt/company-bankruptcy-prediction
A machine learning model that predicts whether a firm will go bankrupt, based on features such as net value growth rate, borrowing dependency, and cash/total assets.
binary-classification cicd-deployment cross-validation data-exploration-and-preprocessing data-visualization docker-container exploratory-data-analysis feature-engineering github-actions hyperparameter-optimization machine-learning model-deployment model-training-and-evaluation
Last synced: 28 Dec 2024
https://github.com/nafisalawalidris/employee-attrition-control
The Employee Attrition Control project uses data analysis and predictive modeling to understand and address employee turnover. It provides insights and recommendations to reduce attrition and improve employee satisfaction and retention.
data-exploration-and-preprocessing data-visualization-and-storytelling feature-selection-and-engineering model-training-and-evaluation predictive-modeling-techniques python statistical-analysis time-series-analysis turnover-analysis
Last synced: 16 Mar 2025
https://github.com/andersoncrs/analisis_exploratorio_de_datos-eda-_rendimiento_estudiantil
This exploratory data analysis (EDA) of a student performance dataset aims to identify and understand the factors that influence students' academic performance. Through data cleaning, transformation, and visualization, it seeks to uncover meaningful patterns and relationships.
data-analysis data-exploration data-exploration-and-preprocessing data-visualization seaborn
Last synced: 30 Mar 2025
https://github.com/sayamalt/taxi-trip-fare-prediction
A machine learning model that predicts the fare of a taxi trip from features such as trip duration and tip amount.
cross-validation data-exploration-and-preprocessing data-visualization exploratory-data-analysis feature-engineering hyperparameter-optimization machine-learning model-deployment model-selection model-training-and-evaluation regression-modelling
Last synced: 19 Feb 2025
https://github.com/sayamalt/symptoms-disease-text-classification
A fine-tuned BERT transformer model that classifies symptoms into their corresponding diseases with an accuracy of up to 89%.
bert-fine-tuning data-exploration-and-preprocessing exploratory-data-analysis fine-tune-bert-tensorflow hugging-face-transformers model-architecture-and-implementation model-inference model-training-and-evaluation multiclass-classification natural-language-processing text-classification text-preprocessing text-tokenization
Last synced: 19 Feb 2025
https://github.com/sayamalt/financial-news-sentiment-analysis
A fine-tuned DistilBERT transformer model that predicts the overall sentiment of a piece of financial news with an accuracy of nearly 81.5%.
data-exploration-and-preprocessing distilbert-model fine-tune-bert-tensorflow hugging-face-transformers model-architecture-and-implementation model-inference model-training-and-evaluation multiclass-classification natural-language-processing sentiment-analysis text-preprocessing text-tokenization
Last synced: 19 Feb 2025
https://github.com/sayamalt/global-news-headlines-text-summarization
A text summarization model using Seq2Seq modeling with Luong attention that produces short, concise summaries of global news headlines.
attention-mechanism data-exploration-and-preprocessing luong-attention model-architecture-and-implementation model-inference natural-language-processing seq2seq-model text-generation text-summarization text-tokenization
Last synced: 19 Feb 2025
https://github.com/sayamalt/credit-card-approval-prediction
A machine learning model that predicts, with a stated accuracy of up to 100%, whether a given applicant's credit card application will be approved, based on demographic features such as applicant age, total income, marital status, and total years of work experience.
binary-classification cicd-deployment cross-validation data-exploration-and-preprocessing data-visualization exploratory-data-analysis feature-engineering hyperparameter-optimization machine-learning model-deployment model-retraining model-selection model-testing model-training-and-evaluation
Last synced: 19 Feb 2025
https://github.com/sayamalt/employee-attrition-prediction
A machine learning model that predicts whether an employee of a given company will leave in the near future, based on employee details and employment metrics.
binary-classification continuous-deployment continuous-integration cross-validation data-exploration-and-preprocessing data-visualization exploratory-data-analysis feature-engineering hyperparameter-optimization machine-learning model-deployment model-training-and-evaluation
Last synced: 19 Feb 2025
https://github.com/aniket2021448/movie-recommender-system
A machine learning project implemented from scratch, involving web scraping, data engineering, exploratory data analysis, NLP, and ML, that delivers a content-based movie recommender system.
data-exploration data-exploration-and-preprocessing free-hosting-service machine-learning natural-language-processing nltk-python numpy pandas streamlit-webapp
Last synced: 23 Feb 2025
https://github.com/venkat-a/happiness-prediction
Prediction of happy customers based on happiness survey data.
data-exploration-and-preprocessing hyperparameter-tuning machine-learning matplotlib metrics model-evaluation-and-tuning seaborn sklearn visualization
Last synced: 25 Feb 2025
https://github.com/srosalino/prediction_of_seoul_bikes_demand
This project predicts the number of bicycles that need to be available each hour to make the service as efficient as possible.
cross-validation data-exploration-and-preprocessing hyperparameter-tuning machine-learning regularization-methods scikit-learn
Last synced: 25 Feb 2025