https://github.com/oya163/datascience101

Pushing things as I do data science stuff

data-science dataset embeddings quora text-preprocessing traditional-machine-learning

Host: GitHub
URL: https://github.com/oya163/datascience101
Owner: oya163
Created: 2018-11-08T22:54:20.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2019-01-01T02:31:56.000Z (over 7 years ago)
Last Synced: 2025-02-24T11:37:08.301Z (over 1 year ago)
Topics: data-science, dataset, embeddings, quora, text-preprocessing, traditional-machine-learning
Language: Jupyter Notebook
Homepage:
Size: 4.48 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Data Science 101

We formed a group to discuss and learn about data science. We meet every other day to work on Kaggle problems, covering the theoretical and practical knowledge we need to prepare for data science internships.

- Titanic
  - Kaggle Dataset kernel
  - Traditional machine learning
  - Data preprocessing and data visualization
  - [Kaggle Link](https://www.kaggle.com/dejavu23/titanic-survival-seaborn-and-ensembles/notebook)

- Movie Sentiment Analysis
  - Text preprocessing (removal of punctuation and HTML tags)
  - Creation of the dataset as a properly formatted CSV file
  - NLTK tokenization for TfidfVectorizer
  - Machine learning algorithms
    - Logistic Regression
    - Support Vector Machine
    - Naive Bayes
    - KNN
    - Perceptron, MLP
    - GridSearchCV for all of the above
  - Visualization
    - Word cloud (top unigrams, bigrams, trigrams)
    - Confusion matrix
    - ROC curve with AUC
    - Histogram of top negative/positive features
  - Corpus creation for word embeddings
  - Use of gensim for word2vec word embeddings
  - [Dataset](http://ai.stanford.edu/~amaas/data/sentiment/)
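
The text preprocessing and n-gram steps listed above can be sketched roughly as follows. This is a minimal illustration that uses only the Python standard library and is not the code from the notebooks: the function names `clean_review` and `top_ngrams` and the two sample reviews are made up for the example, and plain whitespace splitting stands in for NLTK tokenization.

```python
import html
import re
import string
from collections import Counter

# Matches HTML tags such as "<br />", which are common in raw movie reviews.
TAG_RE = re.compile(r"<[^>]+>")
# Maps every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_review(text):
    """Drop HTML tags and punctuation, lowercase, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)        # "<br />" becomes a space
    text = html.unescape(text)          # "&amp;" becomes "&"
    text = text.translate(PUNCT_TABLE)  # punctuation becomes spaces
    return " ".join(text.lower().split())


def top_ngrams(docs, n, k=3):
    """Return the k most frequent n-grams (as tuples) across the cleaned docs."""
    counts = Counter()
    for doc in docs:
        tokens = clean_review(doc).split()  # assumption: whitespace tokens, not NLTK
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


# Hypothetical sample reviews, invented for this example.
reviews = [
    "A great movie!<br /><br />Great cast &amp; a great story.",
    "Not a great movie... the story was dull.",
]

print(clean_review(reviews[0]))
print(top_ngrams(reviews, n=1, k=2))
print(top_ngrams(reviews, n=2, k=1))
```

Counts like these could feed a word cloud of top unigrams, bigrams, or trigrams. Stop-word removal is left out here, so frequent words such as "a" will rank highly.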

- Quora Insincere Questions Analysis
  - [ ] Text preprocessing
  - [ ] Traditional machine learning
  - [ ] Deep learning
  - [Dataset](https://www.kaggle.com/c/quora-insincere-questions-classification/data)
