https://github.com/oya163/datascience101

Pushing things as I do data science stuff

data-science dataset embeddings quora text-preprocessing traditional-machine-learning

# Data Science 101

We formed a group to discuss and learn about data science. We meet every other day to work on Kaggle problems. We cover the theoretical and practical knowledge needed to prepare for data science internships.

- Titanic
  - Kaggle Dataset kernel
  - Traditional machine learning
  - Data preprocessing and data visualization
  - [Kaggle Link](https://www.kaggle.com/dejavu23/titanic-survival-seaborn-and-ensembles/notebook)

- Movie Sentiment Analysis
  - Text preprocessing (removal of punctuation and HTML tags)
  - Creation of the dataset as a properly formatted CSV file
  - Usage of NLTK tokenization with TfidfVectorizer
  - Machine learning algorithms
    - Logistic Regression
    - Support Vector Machine
    - Naive Bayes
    - KNN
    - Perceptron, MLP
    - GridSearchCV for all of the above
  - Visualization
    - Word cloud (top unigrams, bigrams, trigrams)
    - Confusion matrix
    - ROC AUC curve
    - Histogram of top negative/positive features
  - Corpus creation for word embeddings
  - Use of gensim for word2vec word embeddings
  - [Dataset](http://ai.stanford.edu/~amaas/data/sentiment/)
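The TF-IDF, classifier, and GridSearchCV steps listed for the movie sentiment project can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: the toy reviews, the parameter grid, and the variable names are assumptions made up for the example, and it uses the default `TfidfVectorizer` tokenizer rather than NLTK to stay self-contained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy reviews invented for illustration; the project itself uses the
# Stanford movie review dataset linked above.
texts = [
    "a wonderful and moving film",
    "great acting and a great story",
    "i loved every minute of it",
    "brilliant funny and heartfelt",
    "a dull and boring mess",
    "terrible acting and a weak plot",
    "i hated every minute of it",
    "awful slow and forgettable",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = positive, 0 = negative

# Chain vectorization and classification so the grid search refits
# the TF-IDF vocabulary inside every cross-validation fold.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid: unigrams vs. unigrams+bigrams, and three
# regularization strengths for logistic regression.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# cv=2 only because the toy dataset is tiny; real data would use more folds.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
predictions = search.predict(["a great and moving story", "a boring and weak film"])
print(predictions)
```

Swapping `LogisticRegression` for another scikit-learn estimator (for example `LinearSVC`, `MultinomialNB`, or `KNeighborsClassifier`) and adjusting the `clf__` entries in the grid would cover the other algorithms in the list with the same pipeline shape.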

- Quora Insincere Questions Analysis
  - [ ] Text preprocessing
  - [ ] Traditional machine learning
  - [ ] Deep learning
  - [Dataset](https://www.kaggle.com/c/quora-insincere-questions-classification/data)
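Text preprocessing appears in both the movie review and Quora projects, and the word clouds rely on top unigram, bigram, and trigram counts. A minimal standard-library sketch of those two steps might look like the following. The helper names `clean_text` and `top_ngrams` are hypothetical and not taken from this repository, and splitting on whitespace stands in for the NLTK tokenizer.

```python
import html
import re
import string
from collections import Counter

TAG_RE = re.compile(r"<[^>]+>")  # matches HTML tags such as <br /> or </b>
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(raw):
    """Strip HTML tags, unescape entities, drop punctuation, and lowercase."""
    no_tags = TAG_RE.sub(" ", raw)
    unescaped = html.unescape(no_tags)
    no_punct = unescaped.translate(PUNCT_TABLE)
    # Collapse the runs of whitespace left behind by the removals.
    return " ".join(no_punct.lower().split())


def top_ngrams(texts, n, k=3):
    """Return the k most frequent n-grams (as tuples of tokens) across texts."""
    counts = Counter()
    for text in texts:
        tokens = clean_text(text).split()
        # zip over n shifted copies of the token list yields consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


reviews = ["Great movie, great cast", "Great movie!"]
print(clean_text("This movie was <br /><b>great</b>, truly great!"))
print(top_ngrams(reviews, n=2, k=1))
```

Note that removing punctuation also removes apostrophes, so contractions such as "don't" become "dont"; whether that is acceptable depends on the downstream model.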