Projects in Awesome Lists by MoinDalvs

https://github.com/moindalvs/forecasting_airline_passengers_traffic

Forecast airline passenger traffic. Prepare a document for each model explaining how many dummy variables you created and the RMSE value for each model. Finally, state which model you will use for forecasting.

additive arima-forecasting data-science double-exponential-smoothing forecasting holt-winters holt-winters-forecasting multiplicative sarima-model seasonality-analysis simple-exponential-smoothing stationarity stationarity-test time-series-forecasting timeseries-analysis trend-analysis triple-exponential-smoothing

Last synced: 17 Nov 2024

https://github.com/moindalvs/excelr_data_science_assignments

Find all EXCELR Data Science assignments here

Last synced: 17 Nov 2024

https://github.com/moindalvs/neural_networks_forest_fire_classification

Predict the burned area of forest fires with neural networks

auc-roc-curve binary-classification classification dropout-layers epochs hyperparameter-optimization hyperparameter-tuning keras-classification-models neural-networks tensorflow weight-initialization

Last synced: 17 Nov 2024

https://github.com/moindalvs/time_series_forecasting_from_scratch

arima-forecasting autoregressive exponential-smoothing forecasting forecasting-models holt-winters holt-winters-forecasting moving-average sarima-model time-series time-series-analysis

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_decision_tree_1

cost-complexity-pruning data-science decision decision-tree-classifier hyper-parameter-optimization hyperparameter-tuning post-pruning pruning-optimization

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_knn_glass

Problem statement: implement a KNN model to classify the different types of glass

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_random_forest_1

Use Random Forest to prepare a model on fraud data, treating those who have taxable income <= 30000 as "Risky" and all others as "Good"
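
The labelling rule in this description can be sketched as a small helper. This is only an illustration of the threshold rule, not the repository's code: the function name `risk_label` and the sample incomes are hypothetical.

```python
def risk_label(taxable_income, threshold=30000):
    """Return "Risky" when taxable income is at or below the threshold, else "Good"."""
    return "Risky" if taxable_income <= threshold else "Good"


# Hypothetical incomes, invented for illustration only.
incomes = [25000, 30000, 30001, 98000]
labels = [risk_label(income) for income in incomes]
```

The resulting labels would then serve as the categorical target for a classifier such as a random forest.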

bagging-ensemble bagging-trees data-science hyperparameter-tuning random-forest-classifier

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_2_set_1

Assignment 2 Set 2

barplot boxplot descriptive-statistics inference labels matplotlib outlier-detection outliers pie-chart probability scipy-stats seaborn-plots

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_decision_tree_2

adasyn-sampling data-science decision-trees imbalanced-data smote-oversampler smoteenn smotetomek

Last synced: 17 Nov 2024

https://github.com/moindalvs/neural_network_regression_gas_turbines

Predicting Turbine Energy Yield (TEY) using ambient variables as features.

activation-functions dropout-keras dropout-layers hyperparameter-optimization hyperparameter-tuning keras-neural-networks neural-networks neurons regression weights-and-biases

Last synced: 17 Nov 2024

https://github.com/moindalvs/excelr_data_analyst_sql_assignment_part3

1. Write a stored procedure that accepts the month and year as inputs and prints the ordernumber, orderdate and status of the orders placed in that month.

Last synced: 17 Nov 2024

https://github.com/moindalvs/neural_networks_from_scratch

Neural_Networks_From_Scratch

activation-function activation-functions back-propagation classification deep-learning gradient-descent hyperparameter-optimization hyperparameter-tuning keras-neural-networks machine-learning neural-network neural-networks optimization-algorithms optimizer tensorflow weight-initialization

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_random_forest_2

A cloth manufacturing company wants to know which segments or attributes drive high sales. Approach: a Random Forest can be built with Sale as the target variable (first converted into a categorical variable) and all other variables treated as independent variables in the analysis.

data-science hyperparameter-tuning numpy pandas python random-forest-classifier sklearn

Last synced: 17 Nov 2024

https://github.com/moindalvs/resume_screening_and_parser

Business objective: The document classification solution should significantly reduce manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention. Sample data set details: resumes and financial documents.

data-science doc2txt doc2vec docx-converter docx-to-pdf docx2txt pdf-document-processor pdf2txt streamlit text text-analysis text-classification text-mining text-processing unstructured-data

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_svm_forest_fire_prediction

adasyn-sampling auc-roc-curve data-science hyperparameter-tuning kernel-trick regularization smote-sampling smoteenn smotetomek svm-classifier svm-kernel

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_svm_salary_dataset

Last synced: 17 Nov 2024

https://github.com/moindalvs/a_guide_for_actuarial_science

What is an actuarial analyst? What are their responsibilities, required skills and interview questions?

Last synced: 17 Nov 2024

https://github.com/moindalvs/survival_analysis_from_scratch

An introduction to the concepts of Survival Analysis and its implementation in the lifelines package for Python.

additive-models cox-regression data-science hazard hazard-ratios kaplan-meier kaplanmeierfitter lifeline log-rank-test multiplicative-model survival survival-analysis survival-regression

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_hypothesis_test

An F&B manager wants to determine whether there is any significant difference in the diameter of the cutlet between two units. A randomly selected sample of cutlets was collected from both units and measured. Analyze the data and draw inferences at the 5% significance level. Please state the assumptions and the tests that you carried out to check the validity of the assumptions.

2sample-2tail anova-test chisquare chisquare-test contingency-table hypothesis hypothesis-testing numpy numpy-arrays pandas scipy-stats testofindependence

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_east-west_airlines

Problem statement: perform clustering (hierarchical, K-means and DBSCAN) on the airlines data to obtain the optimum number of clusters

clustering-algorithm data-science dbscan-clustering epsilon-greedy hierarchical-clustering kmeans-clustering

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment-basic-stats-level1

Assignment Basic Stats

bia binomial-distribution boxplot confidence-intervals datatypes descriptive-statistics inferential-statistics normal-distribution numpy pandas probability python3 scipy-stats statistics z-score

Last synced: 17 Nov 2024

https://github.com/moindalvs/forecasting_cocacola_prices.

Prepare a document for each model explaining how many dummy variables you created and the RMSE value for each model. Finally, state which model you will use for forecasting.

arima data-science double-exponential-smoothing forecasting holt-winters-forecasting sarima-models sarimax simple-exponential-smoothing time-series-analysis time-series-forecasting time-series-prediction triple-exponential-smoothing

Last synced: 17 Nov 2024

https://github.com/moindalvs/recommendation_system_for_beginners

Last synced: 17 Nov 2024

https://github.com/moindalvs/co2_emission_forecasting

P-140 Air quality forecasting (CO2 emissions). Business objective: to forecast CO2 levels for an organization so that the organization can follow government norms with respect to CO2 emission levels. Data set details: time parameter and levels of CO2 emission.

arima-forecasting cyclic deployment exponential-smoothing forecasting-models holt-winters-forecasting holts-winter lstm-neural-networks moving-average pickle rnn-model sarima-model time-series time-series-analysis

Last synced: 17 Nov 2024

https://github.com/moindalvs/k-nearest_neigbor_knn

KNN (K_Nearest_Neighbor)

Last synced: 17 Nov 2024

https://github.com/moindalvs/how_to_convert_pdf_to_docx_in_python

How to convert .pdf files into .docx files in Python

docx pdf pdf-converter pdf-files pdf2docx

Last synced: 17 Nov 2024

https://github.com/moindalvs/excelr_data_analyst_sql_assignment_part2

1. select all employees in department 10 whose salary is greater than 3000. [table: employee]

Last synced: 17 Nov 2024

https://github.com/moindalvs/excelr_data_analyst_python_assignment

Learn Python

Last synced: 17 Nov 2024

https://github.com/moindalvs/association_rule

Last synced: 17 Nov 2024

https://github.com/moindalvs/from-the-following-tables-write-a-sql-query-to-display-the-customer-name-customer-city-grade-deli

Assignment 3: From the following tables, write a SQL query to display the customer_name, customer city, grade, deliveryagent and deliveryagent city. The result should be ordered by ascending customer_id.

customer table:
customer_id | customer_name | city | grade | deliveryagent_id
3002 | Nick Rimando | New York | 100 | 5001
3007 | Brad Davis | New York | 200 | 5001
3005 | Graham Zusi | California | 200 | 5002
3008 | Julian Green | London | 300 | 5002
3004 | Fabian Johnson | Paris | 300 | 5006
3009 | Geoff Cameron | Berlin | 100 | 5003
3003 | Jozy Altidor | Moscow | 200 | 5007
3001 | Brad Guzan | London | | 5005

deliveryagent table:
deliveryagent_id | name | city | commission
5001 | James Hoag | New York | 0.15
5002 | Nail Knite | Paris | 0.13
5005 | Pit Alex | London | 0.1
5006 | Mc Lyon | Paris | 0.44
5007 | Paul Adam | Rome | 0.13
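
One possible shape for the requested query, sketched with Python's built-in sqlite3 module so it can run without a database server. The assignment itself targets SQL in general; the in-memory database, the column types and the choice of an inner join are assumptions made for illustration, and only a few rows from the tables above are loaded.

```python
import sqlite3

# In-memory database: an assumption for illustration, not the assignment's setup.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE deliveryagent (
        deliveryagent_id INTEGER PRIMARY KEY, name TEXT, city TEXT, commission REAL
    );
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY, customer_name TEXT, city TEXT,
        grade INTEGER, deliveryagent_id INTEGER
    );
    """
)

# A subset of the rows listed in the assignment, inserted out of order on purpose.
conn.executemany(
    "INSERT INTO deliveryagent VALUES (?, ?, ?, ?)",
    [(5001, "James Hoag", "New York", 0.15), (5006, "Mc Lyon", "Paris", 0.44)],
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?)",
    [
        (3007, "Brad Davis", "New York", 200, 5001),
        (3004, "Fabian Johnson", "Paris", 300, 5006),
        (3002, "Nick Rimando", "New York", 100, 5001),
    ],
)

# Join each customer to its delivery agent and sort by customer_id ascending.
query = """
    SELECT c.customer_name, c.city AS customer_city, c.grade,
           d.name AS deliveryagent, d.city AS deliveryagent_city
    FROM customer AS c
    JOIN deliveryagent AS d ON c.deliveryagent_id = d.deliveryagent_id
    ORDER BY c.customer_id ASC
"""
rows = conn.execute(query).fetchall()
conn.close()
```

An inner join drops customers whose delivery agent is missing from the deliveryagent table; a LEFT JOIN would keep them with empty agent columns, if that is what the assignment intends.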

Last synced: 17 Nov 2024

https://github.com/moindalvs/excelr_data_analyst_sql_assignment_part1

1. Create a database called 'assignment' (Note: please do the assignment tasks in this database)

Last synced: 17 Nov 2024

https://github.com/moindalvs/connect_mysql_with_python

How to connect MySQL with Python and write queries that load a table into a pd.DataFrame()

Last synced: 17 Nov 2024

https://github.com/moindalvs/clustering_techniques

Hierarchical, KMeans and DBSCAN clustering techniques

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_multi_linear_regression_2

Consider only the below columns and prepare a prediction model for predicting Price. Corolla<-Corolla[c("Price","Age_08_04","KM","HP","cc","Doors","Gears","Quarterly_Tax","Weight")]

cooks-distance data-science data-structures data-visualization exploratory-data-analysis feature-engineering feature-selection influencers multi-collinearity-issue outlier-removal outliers-detection predictive-modeling

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_web_scraping_emotion_mining

Extract reviews of any product from an e-commerce website such as Amazon

amazon data-science emotion-analysis multipe-page nltk review-sentiments scraper scraping scraping-websites sentiment-analysis text-analysis text-mining text-processing textblob webscraping wordcloud

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_recommendation_system_books

collaborative-filtering correlation-filters cosine-similarity data-science recommendation-engine recommender-system

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_multi_linear_regression_1

Prepare a prediction model for the profit in the 50_startups data. Apply transformations to get better predictions of profit, and make a table containing the R^2 value for each prepared model.

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_naive_bayes_salary_dataset

Problem statement: prepare a classification model using Naive Bayes for salary data

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_pca_wine_dataset

Case summary: perform principal component analysis, then perform clustering using the first 3 principal component scores (both hierarchical and k-means clustering, using a scree plot or elbow curve) to obtain the optimum number of clusters. Check whether this matches the number of clusters in the original data (the class column, ignored at the beginning, shows 3 clusters).

data-science feature-selection jupyter-notebook pca pca-analysis python tsne

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_knn_zoo

Problem statement: implement a KNN model to classify the animals into categories

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_logistic_regression

Predicting Customer Response to Telemarketing Campaigns for Term Deposit

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_association_rules

apriori-algorithm association-rules data data-science data-visualization

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_crime_data_clustering

Content: this data set contains statistics, in arrests per 100,000 residents, for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percentage of the population living in urban areas. This is a systematic approach for identifying and analyzing patterns and trends in crime using the USArrests dataset.

clustering-algorithm data-science dbscan-clustering epsilon hierarchical-clustering kmeans-clustering

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_2_set_2

Topics: Normal distribution, Functions of Random Variables

cdf confidence-intervals cumulative-distribution-function normal-distribution percentile random-variables scipy-stats stats survival-functions z-score

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_-2_set_4

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignement_2_set_3

Last synced: 17 Nov 2024

https://github.com/moindalvs/book_recommendation_system

Build a Book Recommendation System

Last synced: 17 Nov 2024

https://github.com/moindalvs/python_for_data_science

1. Python Basics function

arithmetic-operations boolean-operations for-loop-in-python function if-else indexing jupyter-notebook list-comprehensions list-methods lists-python numpy-library pandas-tutorial python3 sequence slicing string tuples-in-python variables-python

Last synced: 17 Nov 2024

https://github.com/moindalvs/learn_about_pandas_series

How to install the Pandas package (or any other Python package) in Jupyter Notebook, and learn more about Series

dataframe dictionary import-csv pandas pandas-dataframe pandas-python pandas-series pandas-tutorial pandaslibrary pip-install python-lists series-objects

Last synced: 17 Nov 2024

https://github.com/moindalvs/how_to_convert_docx_or_doc_to_pdf_in_python

Convert any .docx or .doc file into a .pdf file

convert converter doc2pdf docs documents docx2pdf pdf pdf-converter pdf-document

Last synced: 17 Nov 2024

https://github.com/moindalvs/hypothesis_test_3

Sales of products in four different regions are tabulated for males and females. Find out whether male-female buyer ratios are similar across regions.

chi2 chi2-contingency contingency-table hypothesis-testing hypothesis-tests matplotlib pandas-dataframe scipy-stats seaborn-plots testofindependence

Last synced: 17 Nov 2024

https://github.com/moindalvs/interview_qna_statistics_data_science

Data Science Statistics Interview Questions

central-limit-theorem data-science distributions hypothesis-tests linear-regression sampling-theory statistics

Last synced: 17 Nov 2024

https://github.com/moindalvs/hypothesis_test_4

TeleCall uses 4 centers around the globe to process customer order forms. They audit a certain % of the customer order forms. Any error in an order form renders it defective, and it has to be reworked before processing. The manager wants to check whether the defective % varies by center. Please analyze the data at the 5% significance level and help the manager draw appropriate inferences.
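
One common way to check whether a defect rate differs between groups is a chi-square test of independence on a contingency table. The sketch below computes the Pearson statistic by hand and compares it with the 5% critical value for 3 degrees of freedom (about 7.815). It is not necessarily the repository's method, and the counts are hypothetical, invented only to make the example run.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the hypothesis that rows and columns are independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical audit counts: rows are [defective, error-free], columns are the 4 centers.
observed = [
    [30, 25, 35, 30],
    [270, 275, 265, 270],
]

stat = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Critical value of the chi-square distribution for dof = 3 at the 5% level.
CRITICAL_5PCT_DF3 = 7.815
reject_null = stat > CRITICAL_5PCT_DF3
```

With these made-up counts the statistic stays below the critical value, so the hypothesis that the defective % is the same across centers would not be rejected.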

anova-test f-statistics hypothesis-testing

Last synced: 17 Nov 2024

https://github.com/moindalvs/simple_linear_regression_1

Predicting Delivery Time Using Sorting Time

aic bic data-transformation f-statistics likelihood log-transformation matplotlib numpy ols-regression ordinary-least-squares pandas-dataframe pandas-library prediction predictive-modeling residuals rmse-score simple-linear-regression sklearn sklearn-library

Last synced: 17 Nov 2024

https://github.com/moindalvs/how_to_convert_pdf_doc_or_any_file_into_docx_in_python

Convert .pdf, .doc or any other file into .docx using Python

converter doc2txt docx docx-converter file-converter pdf pdf2docx

Last synced: 17 Nov 2024

https://github.com/moindalvs/web_scraping_amazon_product_reviews

Web scraping Amazon reviews across multiple pages, looping until the last page

amazon beautifulsoup docker html nlp review scrapy splash webscraping

Last synced: 17 Nov 2024

https://github.com/moindalvs/mysql_ddl_dml_tcl_and_more

MYSQL_Basics

mssql mssql-database mysql mysql-database mysql-server sql sql-query sql-server sqlite-database

Last synced: 17 Nov 2024

https://github.com/moindalvs/naive_bayes_multinomial_gaussian_bernoulli

Last synced: 17 Nov 2024

https://github.com/moindalvs/decision_tree_cart

Last synced: 17 Nov 2024

https://github.com/moindalvs/gradient_boosting_algorithms_from_scratch

4 Boosting Algorithms You Should Know – GBM, XGBoost, LightGBM & CatBoost

boosting-algorithms catboost-algorithm data-science decision-trees gradient-boosting lightbgm random-forest xgboost-algorithm

Last synced: 17 Nov 2024

https://github.com/moindalvs/text_mining_nlp

Natural Language Processing

bag-of-words classifier data-science fake-news lemmatization nlp pipeline sentiment-analysis sentiment-classification spacy spacy-pipeline stemming text-classification text-mining tfidf tokenization vectorizer

Last synced: 17 Nov 2024

https://github.com/moindalvs/hypothesis_testing_2

A hospital wants to determine whether there is any difference in the average Turn Around Time (TAT) of reports of the laboratories on their preferred list. They collected a random sample and recorded TAT for reports of 4 laboratories. TAT is defined as sample collected to report dispatch. Analyze the data and determine whether there is any difference in average TAT among the different laboratories at 5% significance level.

anova anova-test fstatistics hypothesis-test hypothesis-testing matplotlib-pyplot numpy-library pandas-library scipy-stats seaborn-plots

Last synced: 17 Nov 2024

https://github.com/moindalvs/different_types_linear_regressions

data-science l1-regularization l2-regularization lasso-regression linear linear-regression polynomial-regression ridge-regression

Last synced: 17 Nov 2024

https://github.com/moindalvs/web_scraping_amazon_products_image_and_url

E-commerce companies use recommendation systems to provide suggestions to customers. They use item-item collaborative filtering, which scales to massive datasets and produces high-quality recommendations in real time. Such a system is a kind of information filtering system that seeks to predict the "rating" or preference a user would give to an item.
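
A minimal sketch of the item-item idea, assuming cosine similarity between item rating vectors and a similarity-weighted average for prediction. The ratings, the user and item names, and the helper functions are all hypothetical; production systems use far larger, sparse data structures.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data, invented purely for illustration.
ratings = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 4, "B": 5, "C": 1},
    "u3": {"B": 1, "C": 5},
}


def item_vector(item):
    """Ratings given to one item, keyed by user."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def cosine(item_a, item_b):
    """Cosine similarity between two items over the users who rated both."""
    va, vb = item_vector(item_a), item_vector(item_b)
    shared = set(va) & set(vb)
    if not shared:
        return 0.0
    dot = sum(va[u] * vb[u] for u in shared)
    norm_a = sqrt(sum(x * x for x in va.values()))
    norm_b = sqrt(sum(x * x for x in vb.values()))
    return dot / (norm_a * norm_b)


def predict(user, item):
    """Similarity-weighted average of the user's ratings on other items."""
    num = den = 0.0
    for other, rating in ratings[user].items():
        if other == item:
            continue
        sim = cosine(item, other)
        num += sim * rating
        den += abs(sim)
    return num / den if den else None


sim_ab = cosine("A", "B")
sim_ac = cosine("A", "C")
predicted = predict("u1", "C")  # u1 has not rated item C
```

Because item similarities depend only on the items, they can be precomputed offline, which is one reason this approach is often described as scaling well.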

Last synced: 17 Nov 2024

https://github.com/moindalvs/data_analyst_challenge

Overview: please study the real-life scenario described below and try to solve the challenge.

Last synced: 17 Nov 2024

https://github.com/moindalvs/pca_dimensionality_reduction

Principal Component Analysis. Let's discuss PCA! Since this isn't exactly a full machine learning algorithm, but instead an unsupervised learning algorithm, we will just have a lecture on this topic and no full machine learning project (although we will walk through the cancer data set with PCA).

data-science pca pca-analysis principle-component-analysis

Last synced: 17 Nov 2024

https://github.com/moindalvs/svm_hyperparameter_tuning_kernel_tricks

Visualization of the effect of gamma and C values on a dataset and on errors/misclassification

data-science gamma hyperparameter-tuning kernel regularization sklearn svm-classifier svm-kernel visualization

Last synced: 17 Nov 2024

https://github.com/moindalvs/simple_linear_regression_2

Building a prediction model for Salary hike using Years of Experience

data-transformation log-transformation ols-regression ordinary-least-squares prediction-model scipy-stats simple-linear-regression sklearn-library

Last synced: 17 Nov 2024

https://github.com/moindalvs/gradient_descent_for_beginners

Gradient_descent_Complete_In_Depth_for beginners

adagrad adam-optimizer back-propagation gradient-descent hyperparameter-optimization hyperparameter-tuning mini-batch-gradient-descent momentum nadam optimization optimization-methods optimizer rmsprop stochastic-gradient-descent

Last synced: 17 Nov 2024

https://github.com/moindalvs/multi_class_classification_iris

Multiple Class Iris Dataset Classification Model

Last synced: 17 Nov 2024

https://github.com/moindalvs/t-sne_and_umap_visuals

t-SNE (pronounced tiz-knee), which stands for t-distributed Stochastic Neighbor Embedding, was proposed much more recently by Laurens van der Maaten and Geoffrey Hinton in their 2008 paper. It works in a similar way to PCA but has some key differences. Firstly, it is a stochastic method, so if you run t-SNE several times on the same dataset the plots can look different. Secondly, it is an iterative method: it works by repeatedly moving datapoints closer to or further away from each other depending on how 'similar' they are. The new representation is non-linear, which makes it harder to interpret, but it can be very effective at 'unravelling' highly non-linear data. The main downside of t-SNE is that it is very slow compared to other dimensionality reduction techniques. This is because it makes calculations on a pair-wise basis, which does not scale well with large datasets.
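
To make the scaling remark concrete, the sketch below counts how many distinct point pairs a naive pair-wise computation would have to consider. It assumes every pair is visited once; the dataset sizes are arbitrary examples, and the helper name `pairwise_count` is hypothetical.

```python
from math import comb


def pairwise_count(n_points):
    """Number of distinct point pairs among n_points items, i.e. n * (n - 1) / 2."""
    return comb(n_points, 2)


# Arbitrary dataset sizes chosen for illustration.
sizes = [1_000, 10_000, 100_000]
pairs = {n: pairwise_count(n) for n in sizes}
```

Each tenfold increase in the number of points raises the number of pairs roughly a hundredfold, which is why pair-wise methods slow down quickly on large datasets.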

Last synced: 17 Nov 2024

https://github.com/moindalvs/moindalvs

Config files for my GitHub profile.

config github-config

Last synced: 17 Nov 2024

https://github.com/moindalvs/sentiment_analysis_on_-elon_musk_tweets

Perform sentiment analysis on the Elon Musk tweets (Elon-musk.csv)

bag-of-words cleaning-data elon-musk feature-engineering nlp nltk polarity sentiment-analysis sentiment-intensity sentiment-polarity spacy subjectivity text-mining text-processing textblob-sentiment-analysis tfidf tfidf-vectorizer tokenizer tweet-analysis twitter-sentiment-analysis

Last synced: 17 Nov 2024

https://github.com/moindalvs/learn_visualization_on_matplotlib

Matplotlib: the Figure is the overall window or page that everything is drawn on. It is the top-level component. To the Figure you add Axes. The Axes is the area on which the data is plotted. A Figure can have multiple Axes. Note: when you call, for example, plt.xlim, pyplot calls ax.set_xlim() on the current Axes behind the scenes. Most methods of an Axes object have a counterpart function in the pyplot module and vice versa. Mostly, you'll use the functions of the pyplot module because they're much cleaner, at least for simple plots!
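
The relationship between the Figure, its Axes and the pyplot shortcuts can be sketched as follows. This assumes matplotlib is installed; the plotted numbers are arbitrary, and the Agg backend is chosen only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# One Figure (the page) holding one Axes (the plotting area).
fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])  # arbitrary sample data

# Object-oriented style: set the x-limits directly on the Axes.
ax.set_xlim(0, 2)
xlim_via_axes = ax.get_xlim()

# pyplot style: plt.xlim acts on the current Axes, which here is the same object.
same_axes = plt.gca() is ax
plt.xlim(1, 3)
xlim_via_pyplot = ax.get_xlim()

plt.close(fig)
```

For figures with several Axes, calling methods on each Axes object is usually less ambiguous than relying on whichever Axes pyplot currently treats as active.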

barchart bivariate-analysis boxplot horizontal-bar-charts matplotlib matplotlib-pyplot matplotlib-python matplotlib-tutorial piechart subplots univariate-analysis

Last synced: 17 Nov 2024

https://github.com/moindalvs/logistic_regression_claimants

Overview:
CASENUM: case number to identify the claim, a numeric vector
ATTORNEY: whether the claimant is represented by an attorney (=1 if yes and =2 if no)
CLMSEX: claimant's gender (=1 if male and =2 if female), a numeric vector
CLMINSUR: whether or not the driver of the claimant's vehicle was uninsured (=1 if yes, =2 if no)
SEATBELT: whether or not the claimant was wearing a seatbelt/child restraint (=1 if yes, =2 if no)
CLMAGE: claimant's age, a numeric vector
LOSS: the claimant's total economic loss (in thousands)

Last synced: 17 Nov 2024

https://github.com/moindalvs/resume_classification

Business objective: The document classification solution should significantly reduce manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.

classification classification-algorithm data-science docx docx2txt ensemble-machine-learning pdfplumber resume-app resume-parser text-analysis text-classification text-mining text-processing textract