{"id":13934881,"url":"https://github.com/datacamp/datacamp_facebook_live_nlp","last_synced_at":"2025-08-12T02:17:03.107Z","repository":{"id":79996706,"uuid":"105057981","full_name":"datacamp/datacamp_facebook_live_nlp","owner":"datacamp","description":"DataCamp Facebook Live Code Along Session 1: Enjoy.","archived":false,"fork":false,"pushed_at":"2017-11-26T23:34:23.000Z","size":1380,"stargazers_count":126,"open_issues_count":1,"forks_count":84,"subscribers_count":27,"default_branch":"master","last_synced_at":"2025-07-05T06:06:02.947Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datacamp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2017-09-27T19:03:36.000Z","updated_at":"2022-11-26T20:03:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"ce611d82-1895-4bb9-abba-23b88fbe34b3","html_url":"https://github.com/datacamp/datacamp_facebook_live_nlp","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/datacamp/datacamp_facebook_live_nlp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacamp%2Fdatacamp_facebook_live_nlp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacamp%2Fdatacamp_facebook_live_nlp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacamp%2Fdatacamp_facebook_live_nlp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacamp%2Fdatacamp_facebook_live_nlp/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datacamp","download_url":"https://codeload.github.com/datacamp/datacamp_facebook_live_nlp/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacamp%2Fdatacamp_facebook_live_nlp/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269987170,"owners_count":24508185,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-12T02:00:09.011Z","response_time":80,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-07T23:01:17.555Z","updated_at":"2025-08-12T02:17:03.066Z","avatar_url":"https://github.com/datacamp.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"\n# Frequencies of words in novels: a Data Science pipeline\n\nwith DataCamp's very own Hugo Bowne-Anderson. 
Follow him on Twitter [@hugobowne](https://twitter.com/hugobowne).\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"img/live_preview.jpeg\" width=\"550\"\u003e\n\u003c/p\u003e\n\n\n## Description\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"img/fb_live_schematic.png\" width=\"550\"\u003e\n\u003c/p\u003e\n\nIn this live code-along session, you'll learn how to build a Data Science pipeline to plot frequency distributions of words in *Moby Dick*, among many other novels.\nWe won't give you the novels: you'll learn how to scrape them from [Project Gutenberg](https://www.gutenberg.org/) (a large corpus of books) using the Python package `requests`, and how to extract the text of each novel from this web data using `BeautifulSoup`. Then you'll dive into analyzing the novels using the Natural Language Toolkit (`nltk`).\nIn the process, you'll learn about important aspects of Natural Language Processing (NLP), such as tokenization and stopwords.\nYou'll come away able to visualize the word frequency distribution of any novel that you can find on Project Gutenberg.\nThe NLP skills you develop, however, will apply to much of the data that Data Scientists encounter, because much of the world's data is unstructured, and a great deal of it is text.\n\nFor example, which novel do you think the following word frequency distribution comes from?\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"img/d-x.png\" width=\"450\"\u003e\n\u003c/p\u003e\n\n## Prerequisites\n\nNot a lot. 
It would help if you knew\n\n* programming fundamentals and the basics of the Python programming language (e.g., variables, for loops);\n* a bit about Jupyter Notebooks;\n* your way around the terminal/shell.\n\n\n**However, I have always found that the most important and beneficial prerequisite is a will to learn new things, so if you have this quality, you'll definitely get something out of this code-along session.**\n\nIf you'd like to watch and **not** code along, you'll still have a great time, and these notebooks will be downloadable afterwards.\n\nIf you are going to code along using the [Anaconda distribution](https://www.anaconda.com/download/) of Python 3 (see below), please install it before the session.\n\n\n## Getting set up computationally\n\n### 1. Clone the repository\n\nTo get set up for this live coding session, clone this repository by executing the following in your terminal:\n\n```\ngit clone https://github.com/datacamp/datacamp_facebook_live_nlp\n```\n\nAlternatively, you can download a zip file of the repository from the top of the repository's main page. If you prefer not to use git or don't have experience with it, this is a good option.\n\n### 2. Download Anaconda (if you haven't already)\n\nIf you do not already have the [Anaconda distribution](https://www.anaconda.com/download/) of Python 3, go get it (n.b., you can also do this without Anaconda by using `pip` to install the required packages; however, Anaconda is great for Data Science and I encourage you to use it).\n\n### 3. Create your conda environment for this session\n\nNavigate to the repository directory `datacamp_facebook_live_nlp` and install the required packages in a new conda environment:\n\n```\nconda env create -f environment.yml\n```\n\nThis will create a new environment called `fb_live_nlp`. To activate the environment on OSX/Linux, execute\n\n```\nsource activate fb_live_nlp\n```\n\nOn Windows, execute\n\n```\nactivate fb_live_nlp\n```\n\nWith conda 4.4 or later, you can instead run `conda activate fb_live_nlp` on any platform.\n\n\n### 4. 
Open your Jupyter notebook\n\nIn the terminal, with the `fb_live_nlp` environment activated, execute `jupyter notebook`.\n\nThen open the notebook `NLP_FB_live_coding.ipynb`, and we're ready to get coding. Enjoy.\n\n\n### Code\n\nThe code in this repository is released under the [MIT license](LICENSE). Read more at the [Open Source Initiative](https://opensource.org/licenses/MIT). All text remains the Intellectual Property of DataCamp. If you wish to reuse, adapt, or remix it, get in touch with me at hugo at datacamp com to request permission.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatacamp%2Fdatacamp_facebook_live_nlp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatacamp%2Fdatacamp_facebook_live_nlp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatacamp%2Fdatacamp_facebook_live_nlp/lists"}