Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists by MoinDalvs
A curated list of projects in awesome lists by MoinDalvs.
https://github.com/moindalvs/forecasting_airline_passengers_traffic
Forecast airline passenger traffic. Prepare a document for each model explaining how many dummy variables were created and the RMSE value for each model. Finally, state which model will be used for forecasting.
additive arima-forecasting data-science double-exponential-smoothing forecasting holt-winters holt-winters-forecasting multiplicative sarima-model seasonality-analysis simple-exponential-smoothing stationarity stationarity-test time-series-forecasting timeseries-analysis trend-analysis triple-exponential-smoothing
Last synced: 17 Nov 2024
https://github.com/moindalvs/excelr_data_science_assignments
Find all ExcelR Data Science assignments here
Last synced: 17 Nov 2024
https://github.com/moindalvs/neural_networks_forest_fire_classification
Predict the burned area of forest fires with neural networks
auc-roc-curve binary-classification classification dropout-layers epochs hyperparameter-optimization hyperparameter-tuning keras-classification-models neural-networks tensorflow weight-initialization
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment_knn_glass
Problem statement: Implement a KNN model to classify the different types of glass
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment_random_forest_1
Use Random Forest to prepare a model on fraud data, treating those who have taxable income <= 30000 as "Risky" and all others as "Good"
bagging-ensemble bagging-trees data-science hyperparameter-tuning random-forest-classifier
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment_2_set_1
Assignment 2 Set 2
barplot boxplot descriptive-statistics inference labels matplotlib outlier-detection outliers pie-chart probability scipy-stats seaborn-plots
Last synced: 17 Nov 2024
https://github.com/moindalvs/neural_network_regression_gas_turbines
Predicting Turbine Energy Yield (TEY) using ambient variables as features.
activation-functions dropout-keras dropout-layers hyperparameter-optimization hyperparameter-tuning keras-neural-networks neural-networks neurons regression weights-and-biases
Last synced: 17 Nov 2024
https://github.com/moindalvs/excelr_data_analyst_sql_assignment_part3
1. Write a stored procedure that accepts the month and year as inputs and prints the ordernumber, orderdate and status of the orders placed in that month.
Last synced: 17 Nov 2024
https://github.com/moindalvs/neural_networks_from_scratch
Neural_Networks_From_Scratch
activation-function activation-functions back-propagation classification deep-learning gradient-descent hyperparameter-optimization hyperparameter-tuning keras-neural-networks machine-learning neural-network neural-networks optimization-algorithms optimizer tensorflow weight-initialization
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment_random_forest_2
A cloth manufacturing company is interested in knowing which segments or attributes cause high sales. Approach: a Random Forest can be built with Sale as the target variable (first converted into a categorical variable) and all other variables treated as independent variables in the analysis.
data-science hyperparameter-tuning numpy pandas python random-forest-classifier sklearn
Last synced: 17 Nov 2024
https://github.com/moindalvs/resume_screening_and_parser
Business objective: The document classification solution should significantly reduce manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention. Sample data set details: resumes and financial documents
data-science doc2txt doc2vec docx-converter docx-to-pdf docx2txt pdf-document-processor pdf2txt streamlit text text-analysis text-classification text-mining text-processing unstructured-data
Last synced: 17 Nov 2024
https://github.com/moindalvs/a_guide_for_actuarial_science
What is an actuarial analyst? What are their responsibilities, required skills, and interview questions?
Last synced: 17 Nov 2024
https://github.com/moindalvs/survival_analysis_from_scratch
An introduction to the concepts of Survival Analysis and its implementation in lifelines package for Python.
additive-models cox-regression data-science hazard hazard-ratios kaplan-meier kaplanmeierfitter lifeline log-rank-test multiplicative-model survival survival-analysis survival-regression
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment_hypothesis_test
An F&B manager wants to determine whether there is any significant difference in the diameter of the cutlet between two units. A randomly selected sample of cutlets was collected from both units and measured. Analyze the data and draw inferences at the 5% significance level. Please state the assumptions and the tests that you carried out to check the validity of the assumptions.
2sample-2tail anova-test chisquare chisquare-test contingency-table hypothesis hypothesis-testing numpy numpy-arrays pandas scipy-stats testofindependence
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment_east-west_airlines
Problem statement: Perform clustering (hierarchical, K-means, and DBSCAN) on the airlines data to obtain the optimum number of clusters
clustering-algorithm data-science dbscan-clustering epsilon-greedy hierarchical-clustering kmeans-clustering
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment-basic-stats-level1
Assignment Basic Stats
bia binomial-distribution boxplot confidence-intervals datatypes descriptive-statistics inferential-statistics normal-distribution numpy pandas probability python3 scipy-stats statistics z-score
Last synced: 17 Nov 2024
https://github.com/moindalvs/forecasting_cocacola_prices.
Prepare a document for each model explaining how many dummy variables were created and the RMSE value for each model. Finally, state which model will be used for forecasting.
arima data-science double-exponential-smoothing forecasting holt-winters-forecasting sarima-models sarimax simple-exponential-smoothing time-series-analysis time-series-forecasting time-series-prediction triple-exponential-smoothing
Last synced: 17 Nov 2024
https://github.com/moindalvs/co2_emission_forecasting
P-140 Air quality forecasting (CO2 emissions). Business objective: To forecast CO2 levels for an organization so that the organization can follow government norms with respect to CO2 emission levels. Data set details: time parameter and levels of CO2 emission
arima-forecasting cyclic deployment exponential-smoothing forecasting-models holt-winters-forecasting holts-winter lstm-neural-networks moving-average pickle rnn-model sarima-model time-series time-series-analysis
Last synced: 17 Nov 2024
https://github.com/moindalvs/k-nearest_neigbor_knn
KNN (K_Nearest_Neighbor)
Last synced: 17 Nov 2024
https://github.com/moindalvs/how_to_convert_pdf_to_docx_in_python
How to convert .pdf files into .docx files in Python
docx pdf pdf-converter pdf-files pdf2docx
Last synced: 17 Nov 2024
https://github.com/moindalvs/excelr_data_analyst_sql_assignment_part2
1. Select all employees in department 10 whose salary is greater than 3000. [table: employee]
Last synced: 17 Nov 2024
https://github.com/moindalvs/excelr_data_analyst_python_assignment
Learn Python
Last synced: 17 Nov 2024
https://github.com/moindalvs/from-the-following-tables-write-a-sql-query-to-display-the-customer-name-customer-city-grade-deli
Assignment 3: From the following tables, write a SQL query to display the customer_name, customer city, grade, deliveryagent, and deliveryagent city. The result should be ordered by ascending customer_id.
customer table:
customer_id | customer_name  | city       | grade | deliveryagent_id
3002        | Nick Rimando   | New York   | 100   | 5001
3007        | Brad Davis     | New York   | 200   | 5001
3005        | Graham Zusi    | California | 200   | 5002
3008        | Julian Green   | London     | 300   | 5002
3004        | Fabian Johnson | Paris      | 300   | 5006
3009        | Geoff Cameron  | Berlin     | 100   | 5003
3003        | Jozy Altidor   | Moscow     | 200   | 5007
3001        | Brad Guzan     | London     |       | 5005
deliveryagent table:
deliveryagent_id | name       | city     | commission
5001             | James Hoag | New York | 0.15
5002             | Nail Knite | Paris    | 0.13
5005             | Pit Alex   | London   | 0.1
5006             | Mc Lyon    | Paris    | 0.44
5007             | Paul Adam  | Rome     | 0.13
Last synced: 17 Nov 2024
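One way the Assignment 3 query above could be written is sketched below. It is not taken from the repository: it loads the two tables from the assignment text into an in-memory SQLite database (an assumption made so the sketch runs anywhere) and uses a LEFT JOIN so that a customer whose deliveryagent_id has no match (5003 here) still appears. An INNER JOIN would be an equally defensible reading of the task.

```python
import sqlite3

# In-memory SQLite database used only for illustration; the assignment
# itself does not say which database engine is expected.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE deliveryagent (
        deliveryagent_id INTEGER PRIMARY KEY,
        name TEXT,
        city TEXT,
        commission REAL
    );
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        customer_name TEXT,
        city TEXT,
        grade INTEGER,
        deliveryagent_id INTEGER
    );
    """
)

agents = [
    (5001, "James Hoag", "New York", 0.15),
    (5002, "Nail Knite", "Paris", 0.13),  # shown as "013" in the listing; 0.13 is assumed
    (5005, "Pit Alex", "London", 0.1),
    (5006, "Mc Lyon", "Paris", 0.44),
    (5007, "Paul Adam", "Rome", 0.13),
]
customers = [
    (3002, "Nick Rimando", "New York", 100, 5001),
    (3007, "Brad Davis", "New York", 200, 5001),
    (3005, "Graham Zusi", "California", 200, 5002),
    (3008, "Julian Green", "London", 300, 5002),
    (3004, "Fabian Johnson", "Paris", 300, 5006),
    (3009, "Geoff Cameron", "Berlin", 100, 5003),
    (3003, "Jozy Altidor", "Moscow", 200, 5007),
    (3001, "Brad Guzan", "London", None, 5005),  # grade is blank in the source
]
conn.executemany("INSERT INTO deliveryagent VALUES (?, ?, ?, ?)", agents)
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", customers)

# Join each customer to their delivery agent and sort by customer_id.
query = """
    SELECT c.customer_name,
           c.city AS customer_city,
           c.grade,
           d.name AS deliveryagent,
           d.city AS deliveryagent_city
    FROM customer AS c
    LEFT JOIN deliveryagent AS d
           ON c.deliveryagent_id = d.deliveryagent_id
    ORDER BY c.customer_id ASC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the LEFT JOIN, Geoff Cameron is listed with empty agent columns because agent 5003 is absent from the deliveryagent table.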
https://github.com/moindalvs/excelr_data_analyst_sql_assignment_part1
1. Create a database called 'assignment'. (Note: please do the assignment tasks in this database.)
Last synced: 17 Nov 2024
https://github.com/moindalvs/connect_mysql_with_python
How to connect MySQL with Python and write queries that convert a table into a pd.DataFrame()
Last synced: 17 Nov 2024
https://github.com/moindalvs/clustering_techniques
Hierarchical, KMeans and DBSCAN clustering techniques
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment_multi_linear_regression_2
Consider only the below columns and prepare a prediction model for predicting Price. Corolla<-Corolla[c("Price","Age_08_04","KM","HP","cc","Doors","Gears","Quarterly_Tax","Weight")]
cooks-distance data-science data-structures data-visualization exploratory-data-analysis feature-engineering feature-selection influencers multi-collinearity-issue outlier-removal outliers-detection predictive-modeling
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment_web_scraping_emotion_mining
Extract reviews of any product from an e-commerce website such as Amazon
amazon data-science emotion-analysis multipe-page nltk review-sentiments scraper scraping scraping-websites sentiment-analysis text-analysis text-mining text-processing textblob webscraping wordcloud
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment_multi_linear_regression_1
Prepare a prediction model for profit of 50_startups data. Do transformations for getting better predictions of profit and make a table containing R^2 value for each prepared model.
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment_naive_bayes_salary_dataset
Problem statement: Prepare a classification model using Naive Bayes for salary data
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment_pca_wine_dataset
Case summary: Perform principal component analysis, then cluster using the first 3 principal component scores (both hierarchical and k-means clustering, using a scree plot or elbow curve) to obtain the optimum number of clusters. Check whether this matches the number of clusters in the original data (the class column, ignored at the beginning, shows 3 clusters).
data-science feature-selection jupyter-notebook pca pca-analysis python tsne
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment_knn_zoo
Problem statement: Implement a KNN model to classify the animals into categories
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment_logistic_regression
Predicting Customer Response to Telemarketing Campaigns for Term Deposit
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment_crime_data_clustering
Content: This data set contains statistics, in arrests per 100,000 residents, for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percent of the population living in urban areas. This is a systematic approach for identifying and analyzing patterns and trends in crime using the USArrest dataset.
clustering-algorithm data-science dbscan-clustering epsilon hierarchical-clustering kmeans-clustering
Last synced: 17 Nov 2024
https://github.com/moindalvs/assignment_2_set_2
Topics: Normal distribution, Functions of Random Variables
cdf confidence-intervals cumulative-distribution-function normal-distribution percentile random-variables scipy-stats stats survival-functions z-score
Last synced: 17 Nov 2024
https://github.com/moindalvs/book_recommendation_system
Build a Book Recommendation System
Last synced: 17 Nov 2024
https://github.com/moindalvs/python_for_data_science
1.Python Basics function
arithmetic-operations boolean-operations for-loop-in-python function if-else indexing jupyter-notebook list-comprehensions list-methods lists-python numpy-library pandas-tutorial python3 sequence slicing string tuples-in-python variables-python
Last synced: 17 Nov 2024
https://github.com/moindalvs/learn_about_pandas_series
How to install the Pandas package (or any other Python package) into Jupyter Notebook, and learn more about Series
dataframe dictionary import-csv pandas pandas-dataframe pandas-python pandas-series pandas-tutorial pandaslibrary pip-install python-lists series-objects
Last synced: 17 Nov 2024
https://github.com/moindalvs/how_to_convert_docx_or_doc_to_pdf_in_python
Convert any .docx or .doc file into a .pdf file
convert converter doc2pdf docs documents docx2pdf pdf pdf-converter pdf-document
Last synced: 17 Nov 2024
https://github.com/moindalvs/hypothesis_test_3
Sales of products in four different regions are tabulated for males and females. Find out whether male-female buyer ratios are similar across regions
chi2 chi2-contingency contingency-table hypothesis-testing hypothesis-tests matplotlib pandas-dataframe scipy-stats seaborn-plots testofindependence
Last synced: 17 Nov 2024
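The buyer-ratio question above is a chi-square test of independence on a 2 x 4 contingency table. The sketch below computes the statistic with the standard library only. The counts are hypothetical and chosen for easy arithmetic; they are not the repository's data. The statistic is compared with 7.815, the 5% critical value of the chi-square distribution with 3 degrees of freedom.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count under independence of row and column factors.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Hypothetical counts (illustration only): rows are gender, columns are regions.
observed = [
    [30, 45, 60, 65],  # males
    [70, 55, 40, 35],  # females
]

CRITICAL_VALUE_DF3_ALPHA05 = 7.815  # chi-square critical value, df = 3, alpha = 0.05

statistic, dof = chi_square_independence(observed)
reject_null = statistic > CRITICAL_VALUE_DF3_ALPHA05
print(f"chi2 = {statistic:.2f}, dof = {dof}, reject H0: {reject_null}")
```

For these made-up counts every expected cell is 50, so the statistic works out to 30 and the null hypothesis of equal ratios across regions would be rejected.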
https://github.com/moindalvs/interview_qna_statistics_data_science
Data Science Statistics Interview Questions
central-limit-theorem data-science distributions hypothesis-tests linear-regression sampling-theory statistics
Last synced: 17 Nov 2024
https://github.com/moindalvs/hypothesis_test_4
TeleCall uses 4 centers around the globe to process customer order forms. They audit a certain % of the customer order forms. Any error in order form renders it defective and has to be reworked before processing. The manager wants to check whether the defective % varies by centre. Please analyze the data at 5% significance level and help the manager draw appropriate inferences
anova-test f-statistics hypothesis-testing
Last synced: 17 Nov 2024
https://github.com/moindalvs/simple_linear_regression_1
Predicting Delivery Time Using Sorting Time
aic bic data-transformation f-statistics likelihood log-transformation matplotlib numpy ols-regression ordinary-least-squares pandas-dataframe pandas-library prediction predictive-modeling residuals rmse-score simple-linear-regression sklearn sklearn-library
Last synced: 17 Nov 2024
https://github.com/moindalvs/how_to_convert_pdf_doc_or_any_file_into_docx_in_python
Convert .pdf, .doc, or any other file into .docx using Python
converter doc2txt docx docx-converter file-converter pdf pdf2docx
Last synced: 17 Nov 2024
https://github.com/moindalvs/web_scraping_amazon_product_reviews
Web scraping Amazon reviews across multiple pages, looping until the last page
amazon beautifulsoup docker html nlp review scrapy splash webscraping
Last synced: 17 Nov 2024
https://github.com/moindalvs/mysql_ddl_dml_tcl_and_more
MYSQL_Basics
mssql mssql-database mysql mysql-database mysql-server sql sql-query sql-server sqlite-database
Last synced: 17 Nov 2024
https://github.com/moindalvs/gradient_boosting_algorithms_from_scratch
4 Boosting Algorithms You Should Know – GBM, XGBoost, LightGBM & CatBoost
boosting-algorithms catboost-algorithm data-science decision-trees gradient-boosting lightbgm random-forest xgboost-algorithm
Last synced: 17 Nov 2024
https://github.com/moindalvs/text_mining_nlp
Natural Language Processing
bag-of-words classifier data-science fake-news lemmatization nlp pipeline sentiment-analysis sentiment-classification spacy spacy-pipeline stemming text-classification text-mining tfidf tokenization vectorizer
Last synced: 17 Nov 2024
https://github.com/moindalvs/hypothesis_testing_2
A hospital wants to determine whether there is any difference in the average Turn Around Time (TAT) of reports of the laboratories on their preferred list. They collected a random sample and recorded TAT for reports of 4 laboratories. TAT is defined as sample collected to report dispatch. Analyze the data and determine whether there is any difference in average TAT among the different laboratories at 5% significance level.
anova anova-test fstatistics hypothesis-test hypothesis-testing matplotlib-pyplot numpy-library pandas-library scipy-stats seaborn-plots
Last synced: 17 Nov 2024
https://github.com/moindalvs/web_scraping_amazon_products_image_and_url
E-commerce companies use recommendation systems to provide suggestions to customers. They use item-item collaborative filtering, which scales to massive datasets and produces high-quality recommendations in real time. Such a system is a kind of information filtering system that seeks to predict the "rating" or preference a user would give to an item.
Last synced: 17 Nov 2024
https://github.com/moindalvs/data_analyst_challenge
Overview: Please understand the real-life scenario described below and try to solve the challenge.
Last synced: 17 Nov 2024
https://github.com/moindalvs/pca_dimensionality_reduction
Principal Component Analysis: Let's discuss PCA! Since this isn't exactly a full machine learning algorithm, but instead an unsupervised learning algorithm, we will just have a lecture on this topic, but no full machine learning project (although we will walk through the cancer set with PCA).
data-science pca pca-analysis principle-component-analysis
Last synced: 17 Nov 2024
https://github.com/moindalvs/svm_hyperparameter_tuning_kernel_tricks
Visualization of the effect of gamma and C values on a dataset and on errors/misclassification
data-science gamma hyperparameter-tuning kernel regularization sklearn svm-classifier svm-kernel visualization
Last synced: 17 Nov 2024
https://github.com/moindalvs/simple_linear_regression_2
Building a prediction model for Salary hike using Years of Experience
data-transformation log-transformation ols-regression ordinary-least-squares prediction-model scipy-stats simple-linear-regression sklearn-library
Last synced: 17 Nov 2024
https://github.com/moindalvs/gradient_descent_for_beginners
Gradient descent, complete and in depth, for beginners
adagrad adam-optimizer back-propagation gradient-descent hyperparameter-optimization hyperparameter-tuning mini-batch-gradient-descent momentum nadam optimization optimization-methods optimizer rmsprop stochastic-gradient-descent
Last synced: 17 Nov 2024
https://github.com/moindalvs/multi_class_classification_iris
Multiple Class Iris Dataset Classification Model
Last synced: 17 Nov 2024
https://github.com/moindalvs/t-sne_and_umap_visuals
t-SNE (pronounced tiz-knee), which stands for t-distributed Stochastic Neighbor Embedding, was proposed much more recently by Laurens van der Maaten and Geoffrey Hinton in their 2008 paper. It works in a similar way to PCA but has some key differences. First, it is a stochastic method, so multiple t-SNE plots of the same dataset can look different. Second, it is an iterative method: it works by repeatedly moving datapoints closer to or further away from each other depending on how 'similar' they are. The new representation is non-linear, which makes it harder to interpret but can be very effective at 'unravelling' highly non-linear data. The main downside of t-SNE is that it is very slow compared to other dimensionality reduction techniques, because it makes calculations on a pair-wise basis, which does not scale well to large datasets.
Last synced: 17 Nov 2024
https://github.com/moindalvs/sentiment_analysis_on_-elon_musk_tweets
Perform sentiment analysis on the Elon Musk tweets (Elon-musk.csv)
bag-of-words cleaning-data elon-musk feature-engineering nlp nltk polarity sentiment-analysis sentiment-intensity sentiment-polarity spacy subjectivity text-mining text-processing textblob-sentiment-analysis tfidf tfidf-vectorizer tokenizer tweet-analysis twitter-sentiment-analysis
Last synced: 17 Nov 2024
https://github.com/moindalvs/learn_visualization_on_matplotlib
Matplotlib: The Figure is the overall window or page that everything is drawn on. It’s the top-level component. To the Figure you add Axes. The Axes is the area on which the data is plotted, and a Figure can have multiple Axes. Note: when you call, for example, plt.xlim, you are calling ax.set_xlim() on the current Axes behind the scenes. Most plotting methods of an Axes object have a counterpart function in the pyplot module. Mostly, you’ll use the functions of the pyplot module because they’re much cleaner, at least for simple plots!
barchart bivariate-analysis boxplot horizontal-bar-charts matplotlib matplotlib-pyplot matplotlib-python matplotlib-tutorial piechart subplots univariate-analysis
Last synced: 17 Nov 2024
https://github.com/moindalvs/logistic_regression_claimants
Overview
CASENUM: Case number to identify the claim, a numeric vector
ATTORNEY: Whether the claimant is represented by an attorney (=1 if yes and =2 if no)
CLMSEX: Claimant’s gender (=1 if male and =2 if female), a numeric vector
CLMINSUR: Whether or not the driver of the claimant’s vehicle was uninsured (=1 if yes, =2 if no)
SEATBELT: Whether or not the claimant was wearing a seatbelt/child restraint (=1 if yes, =2 if no)
CLMAGE: Claimant’s age, a numeric vector
LOSS: The claimant’s total economic loss (in thousands)
Last synced: 17 Nov 2024
https://github.com/moindalvs/resume_classification
Business objective: The document classification solution should significantly reduce manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention
classification classification-algorithm data-science docx docx2txt ensemble-machine-learning pdfplumber resume-app resume-parser text-analysis text-classification text-mining text-processing textract
Last synced: 17 Nov 2024
https://github.com/moindalvs/learn_seaborn_visualization_python
Seaborn Datasets
Last synced: 17 Nov 2024
https://github.com/moindalvs/learn_simple_linear_regression
Learn about Simple Linear Regression for Data Science
box-cox-transformation correlation data-science data-transformation log-transformation model-validation ols-regression prediction-model simple-linear-regression sklearn statsmodels
Last synced: 17 Nov 2024
https://github.com/moindalvs/learn_microsoft_sql_server
Microsoft SQL Server Management System
Last synced: 17 Nov 2024
https://github.com/moindalvs/learn_multi_linear_regression
Prediction of Miles per gallon (MPG) Using Cars Dataset
cooks-distance correlation-matrix data-preprocessing data-science feature feature-engineering feature-selection high-leverage i multi-linear-regression normality-test outliers-detection outliers-influence partial-least-squares-regression residual-analysis variance-inflation-factor
Last synced: 17 Nov 2024
https://github.com/moindalvs/learn_hypothesis_testing_for_data_science
Hypothesis for Data Science
anova-test chi2 chi2-contingency goodness-of-fit independent-sample-t-test one-sample-t-test paired-sample-t-test proportion-test testofindependence two-sample-t-test
Last synced: 17 Nov 2024
https://github.com/moindalvs/model_validation_techniques
1. train_test_split 2. K-fold 3. Leave-one-out 4. Cross-validation score 5. Logistic regression
cross-validation data-science kfold-cross-validation leave-one-out-cross-validation logistic-regression train-test-split train-test-validation
Last synced: 17 Nov 2024
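The K-fold and leave-one-out schemes named above differ only in how many folds the data is cut into. The following is a minimal standard-library sketch of the index splitting, assuming contiguous, unshuffled folds. The helper name k_fold_indices is hypothetical and this is not the repository's code, which per its topics appears to rely on scikit-learn.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# 3-fold split of 10 samples: each sample lands in exactly one test fold.
folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)

# Leave-one-out is the special case k == n_samples.
loo_folds = list(k_fold_indices(5, 5))
```

A model would be fitted on each train list and scored on the matching test list, and the scores averaged to give a cross-validation score.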
https://github.com/moindalvs/learn_feature_engineering
Data Set: House Prices: Advanced Regression Techniques Feature Engineering with 80+ Features
data-science data-transformation handling-missing-value label-encoding log-transformation minmaxscaling missing-values
Last synced: 17 Nov 2024
https://github.com/moindalvs/learn_feature_selection_house_price
Data Set: House Prices: Advanced Regression Techniques
data-science feature-selection lasso-regression
Last synced: 17 Nov 2024
https://github.com/moindalvs/learn_feature_engg_time_series
Feature Engineering on Time Series Dataset (Flight Price Prediction)
data-science data-structures data-transformation feature-engineering feature-extraction feature-selection label-encoder onehot-encoding time-series-analysis
Last synced: 17 Nov 2024
https://github.com/moindalvs/learn_eda_on_zomato_dataset
Zomato dataset: What are the top 10 most preferred cuisines?
Last synced: 17 Nov 2024
https://github.com/moindalvs/learn_eda_for_data_science
Univariate, Bivariate and Multi-variate Analysis
bivariate-analysis correlation-analysis data-science data-transformation data-type-conversion data-types-and-structures data-visualization duplicates-removal exploratory-data-analysis imputation missing-values multi-variate-analysis normalization outlier-detection pandas-profiling standardization univariate-analysis
Last synced: 17 Nov 2024
https://github.com/moindalvs/learn_eda_house_price_dataset
Data Set: House Prices: Advanced Regression Techniques Exploratory Data Analysis on more than 80 features
cardinality data-analysis data-science data-structures data-visualization missing-values
Last synced: 17 Nov 2024
https://github.com/moindalvs/learn_about_python_dataframes
Learn about Pandas Dataframe
clipboard-copy dataframe dataframes dropna duplicates duplicates-removal fillna gif import-csv ipython-display merge-dataframe missing-data pandas-dataframe pandas-dataframes pandas-python summary-statistics tocsv youtube-video
Last synced: 17 Nov 2024
https://github.com/moindalvs/learn_probability_for_data_science
Time series on stock datasets
cdf continuous cumulative-distribution-function gains loss pandas-dataframe pandas-python percent-change risk-analysis stock-analysis stocks stocks-data survival-functions time-series time-series-analysis variability-analysis
Last synced: 17 Nov 2024
https://github.com/moindalvs/learn_about_python_dictionary
In a dictionary, keys have to be immutable values such as tuples, strings, and numbers, while values can be anything, mutable or immutable
from-zero-to-hero jupyter-notebook literals python3 string tuples-in-python zip
Last synced: 17 Nov 2024
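The dictionary rule described above can be checked directly. The short sketch below uses made-up keys and values. Strictly speaking Python requires keys to be hashable, which the built-in immutable types are, whereas a list is not.

```python
# Tuples, strings, and numbers are hashable, so they can serve as keys.
lookup = {
    (19.07, 72.87): "tuple key",
    "name": "string key",
    42: "integer key",
}

# Values may be mutable: here a list is stored and later modified in place.
lookup["scores"] = [1, 2, 3]
lookup["scores"].append(4)

# A list is mutable and unhashable, so using it as a key raises TypeError.
try:
    lookup[[1, 2]] = "list key"
except TypeError as exc:
    error_message = str(exc)
    print("Rejected:", error_message)
```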
https://github.com/moindalvs/learn_confidence_inteval_for_data_science
Reverse lookup for z test
confidence-intervals data-science normal-distribution ppf scipy-stats t-distribution
Last synced: 17 Nov 2024
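A "reverse lookup" for a z test means going from a probability back to the z value, which is what an inverse CDF (ppf) does. The sketch below uses the standard library's statistics.NormalDist rather than scipy-stats, which the repository's topics suggest it uses. The helper name z_critical is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical z value for a confidence level such as 0.95."""
    tail = (1 - confidence) / 2  # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)


z90 = z_critical(0.90)
z95 = z_critical(0.95)
z99 = z_critical(0.99)
print(f"90%: {z90:.3f}  95%: {z95:.3f}  99%: {z99:.3f}")
```

The 95% case reproduces the familiar value of about 1.96 used in confidence intervals of the form mean ± z * sigma / sqrt(n).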
https://github.com/moindalvs/sql_basic-queries_
MySQL
microsoft-sql-server mysql mysql-database mysql-workbench sql sql-server
Last synced: 12 Oct 2024