{"id":15014153,"url":"https://github.com/tiesdekok/python_nlp_tutorial","last_synced_at":"2025-04-12T06:04:35.153Z","repository":{"id":50631202,"uuid":"108233339","full_name":"TiesdeKok/Python_NLP_Tutorial","owner":"TiesdeKok","description":"This repository provides everything to get started with Python for Text Mining / Natural Language Processing (NLP)","archived":false,"fork":false,"pushed_at":"2020-06-05T03:28:50.000Z","size":454,"stargazers_count":120,"open_issues_count":1,"forks_count":65,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-10-14T04:03:19.242Z","etag":null,"topics":["computational-linguistics","natural-language-processing","nlp","nltk","python","research","spacy","text-mining","textblob","textual-analysis"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TiesdeKok.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-10-25T07:10:26.000Z","updated_at":"2024-10-10T12:38:29.000Z","dependencies_parsed_at":"2022-09-13T21:01:16.464Z","dependency_job_id":null,"html_url":"https://github.com/TiesdeKok/Python_NLP_Tutorial","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TiesdeKok%2FPython_NLP_Tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TiesdeKok%2FPython_NLP_Tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TiesdeKok%2FPython_NLP_Tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TiesdeKo
k%2FPython_NLP_Tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TiesdeKok","download_url":"https://codeload.github.com/TiesdeKok/Python_NLP_Tutorial/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":219848897,"owners_count":16556331,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computational-linguistics","natural-language-processing","nlp","nltk","python","research","spacy","text-mining","textblob","textual-analysis"],"created_at":"2024-09-24T19:45:15.923Z","updated_at":"2024-10-14T04:03:25.526Z","avatar_url":"https://github.com/TiesdeKok.png","language":"Jupyter Notebook","funding_links":["https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick\u0026hosted_button_id=2UKM4JREAPTBG"],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\n   \u003cimg src=\"https://i.imgur.com/IAj2Koq.png\" alt=\"Get started with Python for Text Mining (NLP)\" title=\"Get started with Python for Text Mining (NLP)\" /\u003e\n\u003c/h1\u003e\n\u003cp align=\"center\"\u003e  \n \u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-MIT-blue.svg\"\u003e\u003c/a\u003e\n \u003ca href=\"https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick\u0026hosted_button_id=2UKM4JREAPTBG\"\u003e\u003cimg src=\"https://img.shields.io/badge/buy%20me%20a-coffee-yellow.svg\"\u003e\u003c/a\u003e\n \u003cimg src=\"https://img.shields.io/badge/last%20updated-June%202020-3d62d1\"\u003e\n\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  Want 
to learn how to use \u003cstrong\u003ePython for Text Mining / Natural Language Processing (NLP)\u003c/strong\u003e? \u003cbr\u003e\n  This repository has everything that you need to get started! \u003cbr\u003e\u003cbr\u003e\n  \u003cspan style='font-size: 15pt'\u003e\u003cstrong\u003eAuthor:\u003c/strong\u003e Ties de Kok (\u003ca href=\"https://www.TiesdeKok.com\"\u003ePersonal Page\u003c/a\u003e)\u003c/span\u003e\n  \u003ch4 align=\"center\"\u003e These materials accompany a PhD session on NLP for Accounting Research: \u003ca href=\"http://www.tiesdekok.com/AccountingNLP_Slides/\", target=\"_blank\"\u003eslides\u003c/a\u003e\n  \u003cbr\u003e\n  \u003ch4 align=\"center\"\u003eQuick link to the notebook: \u003ca href=\"https://nbviewer.jupyter.org/github/TiesdeKok/Python_NLP_Tutorial/blob/master/NLP_Notebook.ipynb\", target=\"_blank\"\u003eopen notebook\u003c/a\u003e\u003c/h4\u003e\n \u003c/p\u003e\n\n\n## Table of contents\n\n  * [Introduction](#introduction)\n    * [Who is this repository for?](#audience)\n    * [How to use this repository?](#howtouse)\n  * [Using Jupyter](#usingpython)\n  * [Code along](#codealong)\n     * [Clone repository](#clonerepo)\n     * [Install environment](#installenv)\n  * [Packages](#packages)\n  * [Questions?](#questions)\n  * [License](#license)\n  * [Special thanks](#specialthanks)\n\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\nThe goal of this GitHub page is to provide you with everything you need to get started with Python and Natural Language Processing (NLP)  \n\nThe following topics are discussed:  \n\n\u003cimg src=\"https://i.imgur.com/c3aCZLA.png\" width=\"60%\" /\u003e \n\n(*Note: the neural network part is only a reference to the Stanford course CS224n*)\n\n\u003ch3 id=\"audience\"\u003eWho is this repository for?\u003c/h3\u003e\n\nThe topics and techniques demonstrated in this repository are primarily oriented towards empirical research projects in fields such as Accounting, Finance, Marketing, 
Political Science, and other Social Sciences. \n\nHowever, many of the basics are also perfectly applicable if you are looking to use Python for any other type of Data Science!\n\n\u003ch3 id=\"howtouse\"\u003eHow to use this repository?\u003c/h3\u003e\n\nThis repository is written to facilitate learning by doing. \n\nAll the material is written up in a Jupyter Notebook. See: `NLP_Notebook.ipynb`.    \nThe topics are split up by task description.\n\nIt is best to view the notebook locally or on nbviewer using this link: [click here](https://nbviewer.jupyter.org/github/TiesdeKok/Python_NLP_Tutorial/blob/master/NLP_Notebook.ipynb)\n\nAn `environment.yml` file is provided that you can install using `conda`; this will automatically install all the packages used in the notebook. \n\nInstructions on how to install the environment are provided here: [Install environment](#installenv)\n\n### Not yet familiar with the basic Python syntax?\n\nPlease check out my \"Getting started with Python for Research\" repository: [click here](https://github.com/TiesdeKok/LearnPythonforResearch)\n\n\n\u003ch2 id=\"usingpython\"\u003eUsing Jupyter\u003c/h2\u003e\n\nTo run the provided notebook file you need to use Jupyter Lab or Jupyter Notebook. \n\n[Jupyter](http://jupyter.org/) comes pre-installed with the Anaconda distribution so you should have everything already installed and ready to go. The `environment.yml` will also install Jupyter Lab if you prefer to use that. \n\n***What is the Jupyter Notebook?***\n\nFrom the [Jupyter](http://jupyter.org/) website:\n\u003e The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. 
\n\nIn other words, the Jupyter Notebook allows you to program Python code straight from your browser!\n\n***How does the Jupyter Notebook work in the background?***\n\nThe diagram below sums up the basic components of Jupyter:\n\n\u003cimg src=\"https://i.imgur.com/1zFzbyw.png\" title=\"Jupyter Notebook\" width = 400px/\u003e\n\nAt the heart there is the *Jupyter Server* that handles everything, the *Jupyter Notebook* which is accessed and used through your browser, and the *kernel* that executes the code. We will be focusing on the natively included *Python Kernel* but Jupyter is language agnostic so you can also use it with other languages/software such as 'R'.\n\nIt is worth noting that in most cases you will be running the `Jupyter Server` on your own computer and will connect to it locally in your browser (i.e. you don't need to be connected to the internet). However, it is also possible to run the Jupyter Server on a different computer, for example a high performance computation server in the cloud, and connect to it over the internet.\n\n***How to start a Jupyter Notebook?***\n\nThe primary method that I would recommend to start a Jupyter Notebook is to use the command line (terminal) directly:\n\n1. Open your command prompt / terminal (on Windows I recommend the Anaconda Prompt)\n2. Activate the environment `conda activate PythonNLPTutorial`  \n3. `cd` (i.e. Change) to the desired starting directory   \n   for example: `cd \"C:\\Files\\Work\\Project_1\"`  \n   *Note:* if you are changing to a folder on another drive you might have to also switch drives by typing, for example, `E:` \n4. 
Start the Jupyter Notebook server by typing: `jupyter notebook` or `jupyter lab`\n\nThis should automatically open up the corresponding Jupyter Notebook/Lab in your default browser.\nYou can also manually go to the Jupyter Notebook/Lab by going to `localhost:8888` with your browser.\n\n***How to close a Jupyter Notebook/Lab server?***\n\nIf you want to close down the Jupyter Server: open up the command prompt window that runs the server and press `CTRL + C` twice.   \nMake sure that you have saved any open Jupyter Notebooks!\n\n***How to use the Jupyter Notebook?***\n\n*Some shortcuts are worth mentioning for reference purposes:*\n\n`command mode` --\u003e enable by pressing `esc`   \n `edit mode` --\u003e enable by pressing `enter`   \n\n|  `command mode` |`edit mode`  | `both modes`\n|---  |---  |---\n|  `Y` : cell to code |  `Tab` : code completion or indent | `Shift-Enter` : run cell, select below\n| `M` : cell to markdown  |   `Shift-Tab` : tooltip | `Ctrl-Enter` : run cell \n| `A` : insert cell above   |     `Ctrl-A` : select all | \n| `B` : insert cell below   |   `Ctrl-Z` : undo | \n| `X`: cut selected cell |   \n\n\u003ch2 id=\"codealong\"\u003eCode along!\u003c/h2\u003e\n\n\n\u003ch3 id=\"clonerepo\"\u003e\u003cstrong\u003eOption 1:\u003c/strong\u003e clone repository\u003c/h3\u003e\n\nYou can essentially \"download\" the contents of this repository by cloning the repository. \n\nYou can do this by clicking the \"Clone or download\" button and then \"Download ZIP\":\n\n\u003cimg src=\"https://i.imgur.com/Ysak4s3.png\" title=\"Jupyter Notebook\" width = 300px/\u003e\n\nIf you extract the downloaded ZIP to a folder you can start the Jupyter Notebook/Lab in that folder and access the notebook.\n\n\u003ch2 id=\"installenv\"\u003eEnvironment\u003c/h2\u003e\n\nYou can install the environment by following these steps:\n\n1. Make sure you have Anaconda installed ([link](https://docs.anaconda.com/anaconda/install/))\n2. 
Open your command prompt / terminal (on Windows I recommend the Anaconda Prompt)   \n3. `cd` (i.e. Change) to the folder where you extracted the ZIP file   \n   for example: `cd \"C:\\Files\\Work\\Project_1\"`  \n   *Note:* if you are changing to a folder on another drive you might have to also switch drives by typing, for example, `E:` \n4. Run the following command: `conda env create -f environment.yml`  \n5. Activate the environment with: `conda activate PythonNLPTutorial`\n\nA full list of all the packages used is provided in the `environment.yml` file. \n\n\u003ch3 id=\"binder\"\u003e\u003cstrong\u003eOption 2:\u003c/strong\u003e use Binder\u003c/h3\u003e\n\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/TiesdeKok/Python_NLP_Tutorial/master?urlpath=lab)\n\n\n*Note:* some functionality might not work on Binder. \n\n\u003ch2 id=\"questions\"\u003eQuestions?\u003c/h2\u003e\n\nIf you have questions or experience problems please use the `issues` tab of this repository.\n\n\u003ch2 id=\"license\"\u003eLicense\u003c/h2\u003e\n\n[MIT](LICENSE) - Ties de Kok - 2020\n\n\u003ch2 id=\"specialthanks\"\u003eSpecial Thanks\u003c/h2\u003e\n\nhttps://github.com/teles/array-mixer for having an awesome readme that I used as a template. ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftiesdekok%2Fpython_nlp_tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftiesdekok%2Fpython_nlp_tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftiesdekok%2Fpython_nlp_tutorial/lists"}