Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists by Raksh710
A curated list of projects in awesome lists by Raksh710.
https://github.com/raksh710/king_county_house_price_regression
Compared CatBoostRegressor against a Keras neural network to see which model performed better on the King County house price regression dataset from Kaggle; a hedged comparison sketch follows this entry. Link to the notebook: https://www.kaggle.com/raksh710/catboost-vs-keras-cb-wins
Last synced: 10 Nov 2024
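The notebook code is not reproduced in this listing, so below is a minimal sketch of that kind of comparison; synthetic regression data stands in for the King County features, and the model sizes are assumptions rather than the notebook's actual setup.

```python
# Sketch only: synthetic data in place of the King County features.
import numpy as np
from catboost import CatBoostRegressor
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from tensorflow import keras

X, y = make_regression(n_samples=2000, n_features=15, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Gradient-boosted trees.
cb = CatBoostRegressor(iterations=500, learning_rate=0.05, verbose=0)
cb.fit(X_train, y_train)
rmse_cb = np.sqrt(mean_squared_error(y_test, cb.predict(X_test)))

# Small fully connected Keras network on the same split.
nn = keras.Sequential([
    keras.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
nn.compile(optimizer="adam", loss="mse")
nn.fit(X_train, y_train, epochs=30, batch_size=32, verbose=0)
rmse_nn = np.sqrt(mean_squared_error(y_test, nn.predict(X_test).ravel()))

print(f"CatBoost RMSE: {rmse_cb:.2f}  Keras RMSE: {rmse_nn:.2f}")
```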
https://github.com/raksh710/plotly-dash-stock-dashboard
Dynamic Dashboard created using plotly-dash for stock price historical values.
Last synced: 10 Nov 2024
https://github.com/raksh710/data_scientist_salaries
Predicting the salary (in USD) of data science jobs (for example Data Scientist, Data Engineer, Machine Learning Engineer, Data Analyst, BI Engineer) based on factors such as work year (the year in which you are looking for a job), pay grade, average pay scale in the country where the job is located, experience level, employment type, etc. A sketch of a Flask prediction endpoint follows this entry.
flask-application heroku-deployment random-forest regression regression-models salary-prediction
Last synced: 10 Nov 2024
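Given the flask-application and heroku-deployment tags, here is a hypothetical sketch of the kind of serving endpoint such a project typically exposes; the model artifact name, route, and feature keys are placeholders, not the repo's actual schema.

```python
# Hypothetical serving sketch: "model.pkl" and the JSON keys are placeholders.
import pickle

import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
with open("model.pkl", "rb") as f:   # assumed pre-trained Random Forest artifact
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON such as {"work_year": 2023, "experience_level": 2, ...} (illustrative keys).
    payload = request.get_json()
    features = pd.DataFrame([payload])
    salary_usd = float(model.predict(features)[0])
    return jsonify({"predicted_salary_usd": salary_usd})

if __name__ == "__main__":
    app.run(debug=True)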
https://github.com/raksh710/building-efficient-portfolio-using-various-trade-strategies
Building an efficient active portfolio of 8 instruments that yields a high Sharpe ratio, using various trade strategies; a Sharpe-ratio calculation sketch follows this entry.
capm financial-information monte-carlo-simulation portfolio-construction quantitative-finance yfinance-library
Last synced: 10 Nov 2024
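As a rough illustration of the Sharpe-ratio measurement mentioned above, here is a minimal sketch using the yfinance library; the tickers, equal weights, and 2% risk-free rate are illustrative assumptions, not the repo's actual instruments or strategies.

```python
# Illustrative tickers and equal weights; the repo's 8 instruments are not listed here.
import numpy as np
import yfinance as yf

tickers = ["AAPL", "MSFT", "GOOG", "AMZN"]
weights = np.array([0.25, 0.25, 0.25, 0.25])

prices = yf.download(tickers, start="2020-01-01", end="2023-01-01")["Close"]
daily_returns = prices.pct_change().dropna()
portfolio_returns = daily_returns @ weights

risk_free_daily = 0.02 / 252          # assumed 2% annual risk-free rate
excess = portfolio_returns - risk_free_daily
sharpe = np.sqrt(252) * excess.mean() / excess.std()
print(f"Annualized Sharpe ratio: {sharpe:.2f}")
```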
https://github.com/raksh710/pdf-bot
Customized chatbot for a particular PDF file
Last synced: 10 Nov 2024
https://github.com/raksh710/malware_attack_classification
Work for UMD's Info Challenge; the dataset is the ISCXIDS2012 cybersecurity dataset.
Last synced: 10 Nov 2024
https://github.com/raksh710/sentiment-analysis-youtube-comments
Uses Google's YouTube Data API v3 to fetch the top 100 relevant comments, runs sentiment analysis on each comment, and returns the average sentiment. The application is hosted on Heroku, Salesforce's PaaS. A hedged sketch of the flow follows this entry.
Last synced: 10 Nov 2024
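A sketch of the comments-to-average-sentiment flow described above; the API key and video id are placeholders, and since the repo does not state which sentiment model it uses, NLTK's VADER stands in here as an assumption.

```python
# API key, video id, and the VADER sentiment model are assumptions for illustration.
from googleapiclient.discovery import build
from nltk.sentiment import SentimentIntensityAnalyzer

API_KEY = "YOUR_API_KEY"   # placeholder
VIDEO_ID = "VIDEO_ID"      # placeholder

youtube = build("youtube", "v3", developerKey=API_KEY)
response = youtube.commentThreads().list(
    part="snippet", videoId=VIDEO_ID, order="relevance", maxResults=100
).execute()

comments = [
    item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
    for item in response["items"]
]

sia = SentimentIntensityAnalyzer()   # requires nltk.download("vader_lexicon") once
scores = [sia.polarity_scores(c)["compound"] for c in comments]
print(f"Average sentiment over {len(scores)} comments: {sum(scores) / len(scores):.3f}")
```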
https://github.com/raksh710/space-titanic-kaggle
Rank 121 as of March 20, 2023. The task is to predict whether a passenger was transported to an alternate dimension during the Spaceship Titanic's collision with a spacetime anomaly. To make these predictions, we're given a set of personal records recovered from the ship's damaged computer system.
Last synced: 10 Nov 2024
https://github.com/raksh710/predict_future_sale_coursera
This repository contains my assignment solutions for the Coursera course "How to Win a Kaggle Data Challenge" (challenge: Predict Future Sales). Please feel free to review and leave your feedback.
Last synced: 10 Nov 2024
https://github.com/raksh710/shark_tank_analysis
The goal is to predict whether a company that pitched on ABC's popular program "Shark Tank" accepted an offer. On the show, which debuted in 2009, entrepreneurs present their ideas to a panel of investors and ask them to invest money in exchange for equity.
behavioral-analytics consumer-analytics flask-application heroku-deployment kaggle knn-classification python shark-tank-analysis
Last synced: 10 Nov 2024
https://github.com/raksh710/predict-sleep-rob-mulla-kaggle-competition
Predict My Sleep is a Kaggle competition hosted by Rob Mulla (YouTuber and Twitch streamer). The task is to predict his sleep patterns since 2022 using historic data.
Last synced: 10 Nov 2024
https://github.com/raksh710/flower_detection_using_cnn
Flower detection using CNN
Last synced: 10 Nov 2024
https://github.com/raksh710/cnn_for_detecting_pneumonia
Using a CNN to detect and classify which chest X-ray images show pneumonia and which are normal. The data is taken from Kaggle: https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
Last synced: 10 Nov 2024
https://github.com/raksh710/medical_personal_cost
The task was to forecast the medical cost associated with each patient given their medical parameters and health history. The CatBoost algorithm was applied after the data was scaled (standardization).
Last synced: 10 Nov 2024
https://github.com/raksh710/healthcare_analytics
The task is to correctly predict the number of days a patient will stay in hospital, out of 10 categories, given 16 parameters. EDA, feature engineering and resampling were performed for data preprocessing, and a CatBoost classification model was ultimately used to reach more than 41% accuracy.
Last synced: 10 Nov 2024
https://github.com/raksh710/malicious_website_recognition
Classifying malicious websites from benign ones with a CatBoost classifier. The process covers data exploration, data cleaning, resampling (to handle a highly imbalanced dataset), model implementation and evaluation; a resampling sketch follows this entry.
Last synced: 10 Nov 2024
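A small sketch of the resampling-plus-CatBoost step described above, on synthetic data; RandomOverSampler is one possible resampling choice, not necessarily the one the repo uses.

```python
# Synthetic stand-in for the cleaned website features (roughly 9:1 benign:malicious).
from catboost import CatBoostClassifier
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Oversample only the training split so the test set keeps the real class balance.
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X_train, y_train)

clf = CatBoostClassifier(iterations=300, verbose=0)
clf.fit(X_res, y_res)
print(classification_report(y_test, clf.predict(X_test)))
```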
https://github.com/raksh710/heart_attack_analysis_and_prediction
Analysed a dataset and predicted which patients are more likely to suffer a heart attack. The dataset is available on Kaggle, and so is my notebook: https://www.kaggle.com/raksh710/87-accuracy-85-f1-score-knn-14-lr-svc-rf-cbc
Last synced: 10 Nov 2024
https://github.com/raksh710/covid-19_tweets_sentiment_analysis
Predicted the sentiment of tweets about the Covid-19 pandemic, classified as "Positive", "Extremely Positive", "Neutral", "Negative" and "Extremely Negative". TF-IDF vectorization was used to vectorize the tweet tokens and a CatBoost classifier was trained on top, ultimately reaching around 57% accuracy. A sketch of the pipeline follows this entry.
Last synced: 10 Nov 2024
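A minimal sketch of the TF-IDF to CatBoost text pipeline described above; the example tweets and labels are made up for illustration, not the competition data.

```python
# Made-up tweets and labels; the real data uses five sentiment classes.
from catboost import CatBoostClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

train_tweets = [
    "stay safe and wash your hands",
    "panic buying has emptied the shelves",
    "grateful for the health workers",
    "lockdown again, this is exhausting",
]
train_labels = ["Positive", "Negative", "Extremely Positive", "Negative"]

vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
X_train = vectorizer.fit_transform(train_tweets)   # sparse TF-IDF matrix

clf = CatBoostClassifier(iterations=200, verbose=0)
clf.fit(X_train, train_labels)

new = vectorizer.transform(["grateful the shelves are full again"])
print(clf.predict(new))
```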
https://github.com/raksh710/loan_default_prediction
A major chunk of bank revenue is generated by credit cards. Customers who fail to pay their credit card dues on time can cost banks a lot of revenue, so issuing cards to customers with a higher likelihood of default carries higher risk; issuing those customers cards at a higher interest rate works in the bank's favor. In order to make an informed decision about which customers are high risk and which are low risk, the bank would benefit from a prediction model that accurately predicts whether a customer will default. Predictions can be based on factors like job, education, balance, loans and home ownership, and identifying the most common factors among defaulters also helps the bank be cautious before issuing credit cards to customers who fall into those categories.
Last synced: 10 Nov 2024
https://github.com/raksh710/anime_recommender_system
Recommends anime using content-based filtering (TF-IDF vectorization with a sigmoid kernel) and collaborative filtering (KNN); a content-based sketch follows this entry.
anime collaborative-filtering content-based-recommendation kaggle recommendation-system tfidf-text-analysis
Last synced: 10 Nov 2024
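A sketch of the content-based half of the recommender on a tiny made-up catalogue; the titles and genre strings are placeholders for the Kaggle anime metadata.

```python
# Tiny made-up catalogue; the Kaggle data has thousands of titles and richer genre text.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import sigmoid_kernel

anime = pd.DataFrame({
    "name": ["Fullmetal Alchemist", "Steins;Gate", "Haikyuu!!"],
    "genre": ["action adventure fantasy", "sci-fi thriller drama", "sports comedy drama"],
})

tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(anime["genre"])
similarity = sigmoid_kernel(matrix, matrix)   # pairwise title-to-title scores

def recommend(title, top_n=2):
    idx = anime.index[anime["name"] == title][0]
    ranked = similarity[idx].argsort()[::-1]   # most similar first
    return [anime["name"][i] for i in ranked if i != idx][:top_n]

print(recommend("Steins;Gate"))
```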
https://github.com/raksh710/mnist
The input data are 28×28 grayscale images of the digits 0 to 9, which the model has to identify. I implemented a CNN consisting of convolutional layers as well as MaxPool layers and achieved 99.6% accuracy on the test set; a layer-stack sketch follows this entry. Link to my notebook: https://www.kaggle.com/raksh710/mnist-using-cnn-99-6-test-accuracy
Last synced: 10 Nov 2024
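A sketch of a Conv/MaxPool stack in the spirit of the description above; the layer sizes and epoch count are illustrative rather than the exact architecture behind the 99.6% score.

```python
# Layer sizes and epoch count are illustrative, not the notebook's exact architecture.
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1) / 255.0   # grayscale, scaled to [0, 1]
x_test = x_test.reshape(-1, 28, 28, 1) / 255.0

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```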
https://github.com/raksh710/bike_sharing_data
Implementing various ML regression models on bike-sharing data shared by Capital Bikeshare (Washington D.C.).
Last synced: 10 Nov 2024
https://github.com/raksh710/stroke_prediction_raksh710
Predicting whether a patient will have a stroke (1) or not (0). The dataset is highly unbalanced: around 95% of the dependent column (stroke) is 0 and only around 5% is 1. It also has a lot of missing data, which I fixed by turning the categorical columns into dummies, figuring out which features the missing values depended on most, and mapping them accordingly. Because the dataset was so unbalanced, I rebuilt a dataset with roughly 80% stroke=0 and 20% stroke=1, then tried various classification algorithms to find the best fit; Random Forest won. The metric used was F1 score rather than accuracy, because on a highly unbalanced dataset accuracy can be misleading (it can be very high while predicting only one class); a toy illustration of that point follows this entry. Please have a look and let me know your thoughts: https://www.kaggle.com/raksh710/stroke-prediction-raksh710
Last synced: 10 Nov 2024
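A toy illustration of why F1 was preferred over accuracy on a roughly 95:5 split, as argued above: a model that always predicts "no stroke" scores 95% accuracy but an F1 of zero.

```python
# Dummy labels only; this demonstrates the metric choice, not the repo's model.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

y_true = np.array([0] * 95 + [1] * 5)   # 95% no-stroke, 5% stroke
y_pred = np.zeros(100, dtype=int)       # "always predict no stroke"

print(accuracy_score(y_true, y_pred))               # 0.95 -- looks great
print(f1_score(y_true, y_pred, zero_division=0))    # 0.0  -- reveals the failure
```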
https://github.com/raksh710/diet_recommendation_fastembed
Recommends a recipe along with its full description using fastembed library and
docker docker-image dockerfile dockerhub-image fastapi fastembed heroku-app heroku-deployment nlp-machine-learning numpy pandas recommender-system uvicorn
Last synced: 20 Dec 2024
https://github.com/raksh710/whatsapp_chat_analysis
I did a thorough analysis (word count and message count per person, per year) of the chats in one of my WhatsApp groups with my friends. Briefly, this is how I proceeded (a condensed pandas sketch follows this entry):
1) Exported the chat data from WhatsApp (excluding media files) and used Excel to turn it into a CSV (delimiters: semicolon, tab, and colon for one column).
2) Imported the data into a Jupyter notebook as a pandas DataFrame; at this stage it was like a sparse matrix.
3) The data was full of junk values, so I cleaned it: I wrote a function that replaced anything non-alphanumeric (which removed all the emojis) with a pipe (|) symbol, ran it across the DataFrame, and later filtered the pipe symbols out.
4) In total there were around 6,100+ samples across 36 columns, all containing text.
5) Many columns were mostly N/A; I removed every column with more than 95% N/A values, treating them as junk.
6) Converted the date column to datetime and extracted month and year.
7) That left a large text column plus year, month and contact-info columns.
8) Computed the length of each text entry to count how many letters or alphanumeric characters were used per message.
9) Grouped the DataFrame by parameters like contact and year.
10) Made various plots and presented the data by category.
NOTE: you can reference the code to understand the entire process in detail.
Last synced: 10 Nov 2024
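A condensed pandas sketch of steps 3 to 9 above on a tiny made-up chat table; the column names and values are assumptions about the export format, not the actual chat data.

```python
# Tiny made-up chat table; real exports have thousands of rows and extra columns.
import re
import pandas as pd

df = pd.DataFrame({
    "date": ["2021-01-05", "2021-01-06", "2022-03-10"],
    "contact": ["Alice", "Bob", "Alice"],
    "text": ["hey!! 😀 how are you", "all good 👍", "see you at 5pm"],
})

# Step 3: drop anything non-alphanumeric (this removes emojis and punctuation).
df["text"] = df["text"].apply(lambda s: re.sub(r"[^0-9A-Za-z ]+", "", s).strip())

# Steps 6-8: datetime features and per-message character counts.
df["date"] = pd.to_datetime(df["date"])
df["year"], df["month"] = df["date"].dt.year, df["date"].dt.month
df["chars"] = df["text"].str.len()

# Step 9: aggregate message and character counts per contact and year.
summary = df.groupby(["contact", "year"]).agg(messages=("text", "count"),
                                              characters=("chars", "sum"))
print(summary)
```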
https://github.com/raksh710/landscape_classification
Given an input image, classify it into one of the following categories: 'buildings': 0, 'forest': 1, 'glacier': 2, 'mountain': 3, 'sea': 4, 'street': 5 (keys listed with their tag/value). A CNN with 3 Conv2D layers, 3 MaxPool2D layers, 1 Flatten layer, one dropout layer and 2 Dense layers was used. After training on 14,034 images belonging to the 6 classes, the model was validated on a set of 3,000 images, reaching 84.17% accuracy. Steps (a Keras sketch follows this entry):
1) Specify the train, validation and test directories where the images are stored.
2) Use an image generator to create more samples from the given training images (in order to detect the classes more accurately); images are zoomed in/out, sheared, rotated, etc.
3) Feed the train and validation images through the generator from step 2. Note that shuffle was True for training and False for validation, because the validation set must stay in order to evaluate accuracy.
4) Train the CNN on the train directory and evaluate it on the validation directory.
5) Predict the images from the test directory and evaluate them manually.
Last synced: 10 Nov 2024
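A Keras sketch of steps 1 to 4 above; the directory paths, augmentation values and layer widths are placeholders rather than the repo's exact configuration.

```python
# Directory names and augmentation values are placeholders for the actual dataset layout.
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1.0 / 255, zoom_range=0.2, shear_range=0.2,
                               rotation_range=20).flow_from_directory(
    "seg_train", target_size=(150, 150), class_mode="sparse", shuffle=True)
val_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "seg_test", target_size=(150, 150), class_mode="sparse", shuffle=False)

model = keras.Sequential([
    keras.Input(shape=(150, 150, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(64, (3, 3), activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(128, (3, 3), activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.3),
    layers.Dense(128, activation="relu"),
    layers.Dense(6, activation="softmax"),   # six landscape classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_gen, validation_data=val_gen, epochs=10)
```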