https://github.com/omarsar/text_mining_lab_2017

Requirements for Text Mining Summer Course (Lab Session)

ai data-minig data-science deep-nlp machine-learning nlp text-mining word2vec

Host: GitHub
URL: https://github.com/omarsar/text_mining_lab_2017
Owner: omarsar
Created: 2017-06-26T05:30:52.000Z (about 9 years ago)
Default Branch: master
Last Pushed: 2017-07-05T02:53:41.000Z (about 9 years ago)
Last Synced: 2025-03-24T18:13:09.211Z (over 1 year ago)
Topics: ai, data-minig, data-science, deep-nlp, machine-learning, nlp, text-mining, word2vec
Language: Jupyter Notebook
Size: 14.3 MB
Stars: 4
Watchers: 1
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.md


README

Hello Everyone,

Here is the list of packages needed for our Text Mining Lab Session scheduled for 6/29/2017 (2:00-5:00 p.m.)

#### Updates:

------------------

* I have uploaded some example posters from past students. (Check the `posters` folder)

* For those interested in the Slack community, send your email address to `ellfae@gmail.com` and I will send you an invite

* If you have any other questions or technical problems, feel free to stop by Idea Lab Delta 701. I will be more than happy to assist. 

* I may extend the Python notebook based on the excellent questions you guys asked (e.g., more statistics, visuals, etc.)

* Lastly, good luck and enjoy your stay here. 

#### Software:

------------------

* Python 3 (coding will be done strictly using Python 3)

* Anaconda Environment (recommended but not mandatory) (https://www.continuum.io/downloads)

* Jupyter (http://jupyter.org/)

* Google's word2vec (download the file; warning: it is really huge) (https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing)

* Gensim (https://radimrehurek.com/gensim/)

* scikit-learn (http://scikit-learn.org/stable/) (get the latest version)

* Pandas (http://pandas.pydata.org/)

* Matplotlib (https://matplotlib.org/)

* NLTK (for stopwords) (http://www.nltk.org/)
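
Once Gensim and the word2vec download are in place, loading the vectors might look like the sketch below. It assumes the file is in the binary word2vec format and was saved under the name `GoogleNews-vectors-negative300.bin.gz`; both the file name and the `load_vectors` helper are illustrative, so adjust the path to wherever you stored the download. The `limit` argument is included because the full file may be too large for a machine with little RAM.

```python
from pathlib import Path

# Assumed file name; change it to match the file you downloaded.
DEFAULT_PATH = "GoogleNews-vectors-negative300.bin.gz"


def load_vectors(path=DEFAULT_PATH, limit=200_000):
    """Load word vectors stored in the binary word2vec format.

    `limit` caps how many vectors are read from the file so the
    model uses less memory; pass None to read everything.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"word2vec file not found: {path}")
    # Imported here so a missing file is reported before gensim is loaded.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(str(path), binary=True, limit=limit)


# Example usage (needs the downloaded file and gensim installed):
# vectors = load_vectors()
# print(vectors.most_similar("text", topn=5))
```

Loading fewer vectors trades vocabulary coverage for memory, which is usually a reasonable trade for a lab exercise.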

#### Computing Resources:

-------------------

* Operating System: Preferably Linux or macOS (things may break on Windows, but you can try it out)

* RAM: 4GB 

* Disk Space: 8GB (mostly to store word embeddings)
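
If you are unsure whether your machine has the disk space listed above, a quick check along these lines may help. This is only a sketch using the standard library; the helper names are made up for illustration, and it measures the disk that holds the current directory, in decimal gigabytes.

```python
import shutil


def free_disk_gb(path="."):
    """Return the free space, in decimal GB, on the disk that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_disk(required_gb=8, path="."):
    """Return True if at least `required_gb` GB are free where `path` lives."""
    return free_disk_gb(path) >= required_gb


print(f"Free disk space: {free_disk_gb():.1f} GB")
print("Enough for the lab:", has_enough_disk())
```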

#### Test:

-------------------

Once you have installed all the necessary packages, you can check that everything is working by running the following Python code in a Jupyter notebook:

```python
import logging

logging.root.handlers = []  # Jupyter messes up logging, so reset the handlers
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

# Newer smart_open releases expose this function as `smart_open.open`
from smart_open import smart_open
import pandas as pd
import numpy as np
from numpy import random
import gensim
import nltk
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt
from gensim.models import Word2Vec
from nltk.corpus import stopwords

# Jupyter magic: render plots inside the notebook
%matplotlib inline
```
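
The `%matplotlib inline` line only works inside Jupyter. If you would rather check your installation from a plain Python script, something like the following sketch could be used instead. The module list is taken from the imports above, and `find_missing` is a hypothetical helper, not part of the lab material.

```python
import importlib

# Top-level modules imported by the test code above.
REQUIRED_MODULES = ["smart_open", "pandas", "numpy", "gensim", "nltk", "sklearn", "matplotlib"]


def find_missing(modules=REQUIRED_MODULES):
    """Return the names of the modules that cannot be imported."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception:
            # Covers both packages that are absent and broken installs.
            missing.append(name)
    return missing


missing = find_missing()
print("All packages found." if not missing else f"Missing: {', '.join(missing)}")
```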

If you have any further questions, please feel free to contact me at ellfae@gmail.com.

Have Fun,

Elvis Saravia (Text Mining TA)
