Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by MoinDalvs

A curated list of projects in awesome lists by MoinDalvs.

https://github.com/moindalvs/forecasting_airline_passengers_traffic

Forecast airline passenger traffic. Prepare a document for each model explaining how many dummy variables you created and the RMSE value for each model. Finally, state which model you will use for forecasting.

additive arima-forecasting data-science double-exponential-smoothing forecasting holt-winters holt-winters-forecasting multiplicative sarima-model seasonality-analysis simple-exponential-smoothing stationarity stationarity-test time-series-forecasting timeseries-analysis trend-analysis triple-exponential-smoothing

Last synced: 17 Nov 2024
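The assignment above compares forecasting models by their RMSE. As a rough illustration of that comparison, here is a minimal sketch of simple exponential smoothing (one of the techniques in the topic list) scored by one-step-ahead RMSE. The series values and the smoothing factors are made up for illustration; they are not taken from the airline dataset or from the repository.

```python
import math

def simple_exp_smoothing(series, alpha):
    """Return one-step-ahead forecasts, where forecasts[t] predicts series[t]."""
    level = series[0]              # initialise the level with the first observation
    forecasts = [level]
    for value in series[:-1]:
        # Blend the newest observation with the previous level.
        level = alpha * value + (1 - alpha) * level
        forecasts.append(level)
    return forecasts

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical monthly passenger counts (illustrative only).
passengers = [112, 118, 132, 129, 121, 135, 148, 148]

for alpha in (0.2, 0.5, 0.8):
    score = rmse(passengers, simple_exp_smoothing(passengers, alpha))
    print(f"alpha={alpha}: RMSE={score:.2f}")
```

The same `rmse` helper could be reused to score any other model's forecasts, which is what makes a per-model RMSE table straightforward to assemble.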

https://github.com/moindalvs/excelr_data_science_assignments

Find all ExcelR Data Science assignments here

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_knn_glass

Problem statement: Implement a KNN model to classify the different types of glass

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_random_forest_1

Use Random Forest to prepare a model on fraud data, treating those who have taxable income <= 30000 as "Risky" and all others as "Good"

bagging-ensemble bagging-trees data-science hyperparameter-tuning random-forest-classifier

Last synced: 17 Nov 2024

https://github.com/moindalvs/excelr_data_analyst_sql_assignment_part3

1. Write a stored procedure that accepts the month and year as inputs and prints the ordernumber, orderdate and status of the orders placed in that month.

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_random_forest_2

A cloth manufacturing company wants to know which segments or attributes drive high sales. Approach: a Random Forest can be built with Sale as the target variable (first converted into a categorical variable) and all other variables as independent variables in the analysis.

data-science hyperparameter-tuning numpy pandas python random-forest-classifier sklearn

Last synced: 17 Nov 2024

https://github.com/moindalvs/resume_screening_and_parser

Business objective: The document classification solution should significantly reduce the manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention. Sample data set details: resumes and financial documents

data-science doc2txt doc2vec docx-converter docx-to-pdf docx2txt pdf-document-processor pdf2txt streamlit text text-analysis text-classification text-mining text-processing unstructured-data

Last synced: 17 Nov 2024

https://github.com/moindalvs/a_guide_for_actuarial_science

What is an actuarial analyst? What are their responsibilities, required skills, and interview questions?

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_hypothesis_test

An F&B manager wants to determine whether there is any significant difference in the diameter of the cutlet between two units. A randomly selected sample of cutlets was collected from both units and measured. Analyze the data and draw inferences at the 5% significance level. Please state the assumptions and the tests that you carried out to check the validity of the assumptions.

2sample-2tail anova-test chisquare chisquare-test contingency-table hypothesis hypothesis-testing numpy numpy-arrays pandas scipy-stats testofindependence

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_east-west_airlines

Problem statement: Perform clustering (hierarchical, K-means, and DBSCAN) on the airlines data to obtain the optimum number of clusters

clustering-algorithm data-science dbscan-clustering epsilon-greedy hierarchical-clustering kmeans-clustering

Last synced: 17 Nov 2024

https://github.com/moindalvs/forecasting_cocacola_prices.

Prepare a document for each model explaining how many dummy variables you created and the RMSE value for each model. Finally, state which model you will use for forecasting.

arima data-science double-exponential-smoothing forecasting holt-winters-forecasting sarima-models sarimax simple-exponential-smoothing time-series-analysis time-series-forecasting time-series-prediction triple-exponential-smoothing

Last synced: 17 Nov 2024

https://github.com/moindalvs/co2_emission_forecasting

P-140 Air quality forecasting (CO2 emissions). Business objective: To forecast CO2 levels for an organization so that the organization can follow government norms with respect to CO2 emission levels. Data set details: time parameter and levels of CO2 emission

arima-forecasting cyclic deployment exponential-smoothing forecasting-models holt-winters-forecasting holts-winter lstm-neural-networks moving-average pickle rnn-model sarima-model time-series time-series-analysis

Last synced: 17 Nov 2024

https://github.com/moindalvs/k-nearest_neigbor_knn

KNN (K_Nearest_Neighbor)

Last synced: 17 Nov 2024

https://github.com/moindalvs/how_to_convert_pdf_to_docx_in_python

How to convert .pdf files into .docx files in Python?

docx pdf pdf-converter pdf-files pdf2docx

Last synced: 17 Nov 2024

https://github.com/moindalvs/excelr_data_analyst_sql_assignment_part2

1. select all employees in department 10 whose salary is greater than 3000. [table: employee]

Last synced: 17 Nov 2024

https://github.com/moindalvs/from-the-following-tables-write-a-sql-query-to-display-the-customer-name-customer-city-grade-deli

Assignment 3: From the following tables, write a SQL query to display the customer_name, customer city, grade, deliveryagent, and deliveryagent city. The result should be ordered by customer_id, ascending.
customer table:
customer_id | customer_name | city | grade | deliveryagent_id
3002 | Nick Rimando | New York | 100 | 5001
3007 | Brad Davis | New York | 200 | 5001
3005 | Graham Zusi | California | 200 | 5002
3008 | Julian Green | London | 300 | 5002
3004 | Fabian Johnson | Paris | 300 | 5006
3009 | Geoff Cameron | Berlin | 100 | 5003
3003 | Jozy Altidor | Moscow | 200 | 5007
3001 | Brad Guzan | London | | 5005
deliveryagent table:
deliveryagent_id | name | city | commission
5001 | James Hoag | New York | 0.15
5002 | Nail Knite | Paris | 0.13
5005 | Pit Alex | London | 0.1
5006 | Mc Lyon | Paris | 0.44
5007 | Paul Adam | Rome | 0.13

Last synced: 17 Nov 2024
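One way to approach the query described in Assignment 3 is sketched below using Python's built-in sqlite3 module with an in-memory database, so it runs without a database server. The choice of SQLite, the column types, and the use of an inner join are assumptions, not the repository's solution; with an inner join, customer 3009 is dropped because agent 5003 does not appear in the deliveryagent table (a LEFT JOIN would keep that row with empty agent columns).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        customer_name TEXT, city TEXT, grade INTEGER, deliveryagent_id INTEGER
    );
    CREATE TABLE deliveryagent (
        deliveryagent_id INTEGER PRIMARY KEY,
        name TEXT, city TEXT, commission REAL
    );
""")

# Rows copied from the assignment text; the missing grade is stored as NULL.
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", [
    (3002, "Nick Rimando", "New York", 100, 5001),
    (3007, "Brad Davis", "New York", 200, 5001),
    (3005, "Graham Zusi", "California", 200, 5002),
    (3008, "Julian Green", "London", 300, 5002),
    (3004, "Fabian Johnson", "Paris", 300, 5006),
    (3009, "Geoff Cameron", "Berlin", 100, 5003),
    (3003, "Jozy Altidor", "Moscow", 200, 5007),
    (3001, "Brad Guzan", "London", None, 5005),
])
conn.executemany("INSERT INTO deliveryagent VALUES (?, ?, ?, ?)", [
    (5001, "James Hoag", "New York", 0.15),
    (5002, "Nail Knite", "Paris", 0.13),   # source shows "013"; read here as 0.13
    (5005, "Pit Alex", "London", 0.1),
    (5006, "Mc Lyon", "Paris", 0.44),
    (5007, "Paul Adam", "Rome", 0.13),
])

query = """
    SELECT c.customer_name, c.city, c.grade, d.name, d.city
    FROM customer AS c
    JOIN deliveryagent AS d ON c.deliveryagent_id = d.deliveryagent_id
    ORDER BY c.customer_id ASC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```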

https://github.com/moindalvs/excelr_data_analyst_sql_assignment_part1

1. create a database called 'assignment' (Note please do the assignment tasks in this database)

Last synced: 17 Nov 2024

https://github.com/moindalvs/connect_mysql_with_python

How to connect MySQL with Python and write queries to convert a table into a pd.DataFrame()

Last synced: 17 Nov 2024

https://github.com/moindalvs/clustering_techniques

Hierarchical, KMeans and DBSCAN clustering techniques

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_multi_linear_regression_2

Consider only the below columns and prepare a prediction model for predicting Price. Corolla<-Corolla[c("Price","Age_08_04","KM","HP","cc","Doors","Gears","Quarterly_Tax","Weight")]

cooks-distance data-science data-structures data-visualization exploratory-data-analysis feature-engineering feature-selection influencers multi-collinearity-issue outlier-removal outliers-detection predictive-modeling

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_multi_linear_regression_1

Prepare a prediction model for profit of 50_startups data. Do transformations for getting better predictions of profit and make a table containing R^2 value for each prepared model.

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_naive_bayes_salary_dataset

Problem statement: Prepare a classification model using Naive Bayes for salary data

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_pca_wine_dataset

Case summary: Perform principal component analysis, then perform clustering using the first 3 principal component scores (both hierarchical and k-means clustering, using a scree plot or elbow curve) to obtain the optimum number of clusters. Check whether this matches the number of clusters in the original data (the class column, ignored at the beginning, shows that it has 3 clusters).

data-science feature-selection jupyter-notebook pca pca-analysis python tsne

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_knn_zoo

Problem statement: Implement a KNN model to classify the animals into categories

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_logistic_regression

Predicting Customer Response to Telemarketing Campaigns for Term Deposit

Last synced: 17 Nov 2024

https://github.com/moindalvs/assignment_crime_data_clustering

Content: This data set contains statistics, in arrests per 100,000 residents, for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percentage of the population living in urban areas. This is a systematic approach for identifying and analyzing patterns and trends in crime using the USArrests dataset.

clustering-algorithm data-science dbscan-clustering epsilon hierarchical-clustering kmeans-clustering

Last synced: 17 Nov 2024

https://github.com/moindalvs/book_recommendation_system

Build a Book Recommendation System

Last synced: 17 Nov 2024

https://github.com/moindalvs/learn_about_pandas_series

How to install the Pandas package (or any other Python package) in Jupyter Notebook, and learn more about Series

dataframe dictionary import-csv pandas pandas-dataframe pandas-python pandas-series pandas-tutorial pandaslibrary pip-install python-lists series-objects

Last synced: 17 Nov 2024

https://github.com/moindalvs/hypothesis_test_3

Sales of products in four different regions are tabulated for males and females. Find out whether male-female buyer ratios are similar across regions.

chi2 chi2-contingency contingency-table hypothesis-testing hypothesis-tests matplotlib pandas-dataframe scipy-stats seaborn-plots testofindependence

Last synced: 17 Nov 2024
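The buyer-ratio question above is a chi-square test of independence on a gender-by-region contingency table. The sketch below computes the test statistic in plain Python to show the arithmetic; the counts are hypothetical, and the repository itself appears to rely on scipy.stats (per its topic tags) rather than a hand-rolled function like this one.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees_of_freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof

# Hypothetical buyer counts: rows are males/females, columns are four regions.
buyers = [
    [60, 140, 130, 70],
    [440, 1500, 1350, 760],
]

statistic, dof = chi_square_independence(buyers)

# Chi-square critical value for 3 degrees of freedom at the 5% level,
# taken from standard tables.
CRITICAL_5_PERCENT_DOF_3 = 7.815

print(f"chi-square = {statistic:.3f}, dof = {dof}")
if statistic > CRITICAL_5_PERCENT_DOF_3:
    print("Reject H0: buyer ratios differ across regions")
else:
    print("Fail to reject H0: no evidence that buyer ratios differ")
```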

https://github.com/moindalvs/hypothesis_test_4

TeleCall uses 4 centers around the globe to process customer order forms. They audit a certain % of the customer order forms. Any error in an order form renders it defective, and the form has to be reworked before processing. The manager wants to check whether the defective % varies by center. Please analyze the data at the 5% significance level and help the manager draw appropriate inferences.

anova-test f-statistics hypothesis-testing

Last synced: 17 Nov 2024

https://github.com/moindalvs/web_scraping_amazon_product_reviews

Web Scraping Amazon Reviews with Multiple Pages loop till the Last Page

amazon beautifulsoup docker html nlp review scrapy splash webscraping

Last synced: 17 Nov 2024

https://github.com/moindalvs/hypothesis_testing_2

A hospital wants to determine whether there is any difference in the average Turn Around Time (TAT) of reports of the laboratories on their preferred list. They collected a random sample and recorded TAT for reports of 4 laboratories. TAT is defined as sample collected to report dispatch. Analyze the data and determine whether there is any difference in average TAT among the different laboratories at 5% significance level.

anova anova-test fstatistics hypothesis-test hypothesis-testing matplotlib-pyplot numpy-library pandas-library scipy-stats seaborn-plots

Last synced: 17 Nov 2024
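Comparing average turn-around time across four laboratories, as described above, is a one-way ANOVA. The sketch below shows how the F-statistic is assembled from between-group and within-group variation. The TAT values are invented for illustration, and the function name is hypothetical; turning the F-statistic into a p-value would need an F-distribution, for example from scipy.stats, which is not used here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical TAT samples (minutes) for four laboratories.
labs = [
    [185, 170, 176, 173, 193],
    [165, 185, 198, 183, 200],
    [176, 198, 201, 199, 193],
    [166, 160, 163, 165, 172],
]

f_stat, df_b, df_w = one_way_anova_f(labs)
print(f"F = {f_stat:.3f} with ({df_b}, {df_w}) degrees of freedom")
```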

https://github.com/moindalvs/web_scraping_amazon_products_image_and_url

E-commerce companies use recommendation systems to provide suggestions to customers. They use item-item collaborative filtering, which scales to massive datasets and produces high-quality recommendations in real time. Such a system is a kind of information filtering system that seeks to predict the "rating" or preference a user would give to an item.

Last synced: 17 Nov 2024
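The item-item collaborative filtering idea in the description above can be sketched in a few lines of plain Python: items are compared by the cosine similarity of their rating vectors, and a missing rating is predicted as a similarity-weighted average of the user's other ratings. The users, items, ratings, and function names below are all hypothetical, and production systems typically add mean-centering and neighbourhood pruning that this sketch omits.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "alice": {"book_a": 5, "book_b": 4, "book_c": 1},
    "bob": {"book_a": 4, "book_b": 5},
    "carol": {"book_a": 1, "book_c": 5},
}

def item_vector(data, item):
    """Collect every user's rating for one item."""
    return {user: r[item] for user, r in data.items() if item in r}

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def predict(data, user, item):
    """Predict a rating as a similarity-weighted average of the user's ratings."""
    target = item_vector(data, item)
    weighted_sum = 0.0
    weight_total = 0.0
    for other_item, rating in data[user].items():
        if other_item == item:
            continue
        sim = cosine(target, item_vector(data, other_item))
        weighted_sum += sim * rating
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else None

print(predict(ratings, "bob", "book_c"))
```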

https://github.com/moindalvs/data_analyst_challenge

Overview: Please read the real-life scenario described below and try to solve the challenge.

Last synced: 17 Nov 2024

https://github.com/moindalvs/pca_dimensionality_reduction

Principal Component Analysis Let's discuss PCA! Since this isn't exactly a full machine learning algorithm, but instead an unsupervised learning algorithm, we will just have a lecture on this topic, but no full machine learning project (although we will walk through the cancer set with PCA).

data-science pca pca-analysis principle-component-analysis

Last synced: 17 Nov 2024

https://github.com/moindalvs/svm_hyperparameter_tuning_kernel_tricks

Effect of Gamma values and C values visualization on dataset and errors/misclassification

data-science gamma hyperparameter-tuning kernel regularization sklearn svm-classifier svm-kernel visualization

Last synced: 17 Nov 2024

https://github.com/moindalvs/multi_class_classification_iris

Multiple Class Iris Dataset Classification Model

Last synced: 17 Nov 2024

https://github.com/moindalvs/t-sne_and_umap_visuals

t-SNE (pronounced tiz-knee), which stands for t-distributed Stochastic Neighbor Embedding, was proposed much more recently by Laurens van der Maaten and Geoffrey Hinton in their 2008 paper. It works in a similar way to PCA but has some key differences. Firstly, it is a stochastic method, so multiple t-SNE plots of the same dataset can look different. Secondly, it is an iterative method: it works by repeatedly moving datapoints closer to or further away from each other depending on how 'similar' they are. The new representation is non-linear, which makes it harder to interpret, but it can be very effective at 'unravelling' highly non-linear data. The main downside to t-SNE is that it is very slow compared to the other dimensionality reduction techniques. This is because it makes calculations on a pair-wise basis, which does not scale well to large datasets.

Last synced: 17 Nov 2024

https://github.com/moindalvs/moindalvs

Config files for my GitHub profile.

config github-config

Last synced: 17 Nov 2024

https://github.com/moindalvs/learn_visualization_on_matplotlib

Matplotlib: The Figure is the overall window or page that everything is drawn on. It’s the top-level component. To the Figure you add Axes. The Axes is the area on which the data is plotted. A Figure can have multiple Axes. Note: when you call, for example, plt.xlim, pyplot calls ax.set_xlim() on the current Axes under the hood. Most methods of an Axes object have a counterpart function in the pyplot module. Mostly, you’ll use the functions of the pyplot module because they’re much cleaner, at least for simple plots!

barchart bivariate-analysis boxplot horizontal-bar-charts matplotlib matplotlib-pyplot matplotlib-python matplotlib-tutorial piechart subplots univariate-analysis

Last synced: 17 Nov 2024
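The Figure/Axes relationship described above can be seen by drawing the same limits through both interfaces. This is a minimal sketch, assuming matplotlib is installed; the data and titles are made up, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no window is needed
import matplotlib.pyplot as plt

# Object-oriented interface: create a Figure and one Axes explicitly.
fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
ax.set_xlim(0, 5)
ax.set_title("Axes methods")

# pyplot interface: the same operations act on the "current" Axes.
plt.figure()
plt.plot([0, 1, 2, 3], [0, 1, 4, 9])
plt.xlim(0, 5)                     # forwards to set_xlim on the current Axes
plt.title("pyplot functions")
current_ax = plt.gca()             # the Axes that pyplot has been drawing on

# A single Figure can hold several Axes.
fig_multi, axes = plt.subplots(1, 2)
axes[0].bar(["a", "b"], [3, 5])
axes[1].boxplot([1, 2, 2, 3, 9])

print(ax.get_xlim(), current_ax.get_xlim(), len(fig_multi.axes))
```

For quick one-off plots the pyplot calls are shorter; holding on to `fig` and `ax` tends to be clearer once a figure has more than one Axes.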

https://github.com/moindalvs/logistic_regression_claimants

Overview
CASENUM: Case number to identify the claim, a numeric vector
ATTORNEY: Whether the claimant is represented by an attorney (=1 if yes, =2 if no)
CLMSEX: Claimant’s gender (=1 if male, =2 if female), a numeric vector
CLMINSUR: Whether or not the driver of the claimant’s vehicle was uninsured (=1 if yes, =2 if no)
SEATBELT: Whether or not the claimant was wearing a seatbelt/child restraint (=1 if yes, =2 if no)
CLMAGE: Claimant’s age, a numeric vector
LOSS: The claimant’s total economic loss (in thousands)

Last synced: 17 Nov 2024

https://github.com/moindalvs/resume_classification

Business objective: The document classification solution should significantly reduce the manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.

classification classification-algorithm data-science docx docx2txt ensemble-machine-learning pdfplumber resume-app resume-parser text-analysis text-classification text-mining text-processing textract

Last synced: 17 Nov 2024

https://github.com/moindalvs/learn_seaborn_visualization_python

Seaborn Datasets

Last synced: 17 Nov 2024

https://github.com/moindalvs/learn_microsoft_sql_server

Microsoft SQL Server Management System

Last synced: 17 Nov 2024

https://github.com/moindalvs/learn_feature_engineering

Data Set: House Prices: Advanced Regression Techniques Feature Engineering with 80+ Features

data-science data-transformation handling-missing-value label-encoding log-transformation minmaxscaling missing-values

Last synced: 17 Nov 2024

https://github.com/moindalvs/learn_statistics_for_data_science

Central tendency, distribution, skewness and kurtosis

kurtosis mean median model numpy pandas skewness

Last synced: 17 Nov 2024

https://github.com/moindalvs/learn_feature_selection_house_price

Data Set: House Prices: Advanced Regression Techniques

data-science feature-selection lasso-regression

Last synced: 17 Nov 2024

https://github.com/moindalvs/learn_eda_on_zomato_dataset

Zomato dataset: What are the top 10 most preferred cuisines?

eda exploratory-data-analysis

Last synced: 17 Nov 2024

https://github.com/moindalvs/learn_eda_house_price_dataset

Data Set: House Prices: Advanced Regression Techniques Exploratory Data Analysis on more than 80 features

cardinality data-analysis data-science data-structures data-visualization missing-values

Last synced: 17 Nov 2024

https://github.com/moindalvs/learn_about_python_dictionary

In a dictionary, keys have to be immutable values such as tuples, strings, and numbers, while values can be anything, mutable or immutable.

from-zero-to-hero jupyter-notebook literals python3 string tuples-in-python zip

Last synced: 17 Nov 2024
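The rule about dictionary keys stated above can be checked directly. The sketch below uses made-up keys and values; strictly speaking Python requires keys to be hashable, which for built-in types coincides with being immutable (a tuple that contains a list is still rejected).

```python
# Strings, numbers, and tuples all work as keys.
inventory = {
    "apples": 3,                  # string key
    42: "answer",                 # numeric key
    (51.5, -0.1): "London",       # tuple key
}

# Values may be mutable objects such as lists.
inventory["basket"] = ["bread", "milk"]
inventory["basket"].append("eggs")

# A list is mutable and unhashable, so it cannot be used as a key.
try:
    inventory[["a", "list"]] = 1
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# zip pairs up two sequences, which dict() turns into key/value entries.
lookup = dict(zip(["a", "b"], [1, 2]))
print(lookup)
```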