{"id":16778624,"url":"https://github.com/omarsar/text_mining_lab_2017","last_synced_at":"2025-04-10T20:43:47.805Z","repository":{"id":151625615,"uuid":"95411615","full_name":"omarsar/text_mining_lab_2017","owner":"omarsar","description":"Requirements for Text Mining Summer Course  (Lab Session)","archived":false,"fork":false,"pushed_at":"2017-07-05T02:53:41.000Z","size":14978,"stargazers_count":4,"open_issues_count":0,"forks_count":4,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-24T18:13:09.211Z","etag":null,"topics":["ai","data-minig","data-science","deep-nlp","machine-learning","nlp","text-mining","word2vec"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/omarsar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-06-26T05:30:52.000Z","updated_at":"2024-07-25T15:11:21.000Z","dependencies_parsed_at":"2023-05-25T07:30:12.097Z","dependency_job_id":null,"html_url":"https://github.com/omarsar/text_mining_lab_2017","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/omarsar%2Ftext_mining_lab_2017","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/omarsar%2Ftext_mining_lab_2017/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/omarsar%2Ftext_mining_lab_2017/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/omarsar%2Ftext_mining_lab_2017
/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/omarsar","download_url":"https://codeload.github.com/omarsar/text_mining_lab_2017/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248294110,"owners_count":21079784,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","data-minig","data-science","deep-nlp","machine-learning","nlp","text-mining","word2vec"],"created_at":"2024-10-13T07:28:16.368Z","updated_at":"2025-04-10T20:43:47.796Z","avatar_url":"https://github.com/omarsar.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"Hello everyone,\n\nHere is the list of packages needed for our Text Mining Lab Session scheduled for 6/29/2017 (2:00-5:00 p.m.).\n\n#### Updates:\n------------------\n* I have uploaded some poster examples from past students. (Check the `posters` folder.)\n* If you are interested in the Slack community, send your email address to `ellfae@gmail.com` and I will send you an invite.\n* If you have any other questions or technical problems, feel free to stop by Idea Lab Delta 701. I will be more than happy to assist.\n* I may extend the Python notebook based on the excellent questions you asked (e.g., more statistics, visuals, etc.).\n* Lastly, good luck and enjoy your stay here. 
\n\n#### Software:\n------------------\n\n* Python 3 (coding will be done strictly using Python 3)\n* Anaconda Environment (recommended but not mandatory) (https://www.continuum.io/downloads)\n* Jupyter (http://jupyter.org/)\n* Google's word2vec (download the file; warning: it is really huge) (https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing)\n* Gensim (https://radimrehurek.com/gensim/)\n* Scikit Learn (http://scikit-learn.org/stable/) (get the latest version)\n* Pandas (http://pandas.pydata.org/)\n* Matplotlib (https://matplotlib.org/)\n* NLTK (for stopwords) (http://www.nltk.org/)\n\n#### Computing Resources:\n-------------------\n* Operating System: Preferably Linux or macOS (things may break on Windows, but you can try it out)\n* RAM: 4GB\n* Disk Space: 8GB (mostly to store word embeddings)\n\n\n#### Test:\n-------------------\nOnce you have installed all the necessary packages, you can check that everything is working by running the following Python code in a Jupyter notebook:\n\n```python\nimport logging\nlogging.root.handlers = []  # Jupyter messes up logging so needs a reset\nlogging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)\nfrom smart_open import smart_open\nimport pandas as pd\nimport numpy as np\nfrom numpy import random\nimport gensim\nimport nltk\n# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead\nfrom sklearn.model_selection import train_test_split\nfrom sklearn import linear_model\nfrom sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer\nfrom sklearn.metrics import accuracy_score, confusion_matrix\nimport matplotlib.pyplot as plt\nfrom gensim.models import Word2Vec\nfrom sklearn.neighbors import KNeighborsClassifier\nfrom nltk.corpus import stopwords\n# Jupyter magic: only works inside a notebook\n%matplotlib inline\n\n```\n\nIf you have any further questions, please feel free to contact me at ellfae@gmail.com\n\nHave Fun,\n\nElvis Saravia (Text Mining 
TA)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fomarsar%2Ftext_mining_lab_2017","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fomarsar%2Ftext_mining_lab_2017","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fomarsar%2Ftext_mining_lab_2017/lists"}