https://github.com/EmilHvitfeldt/useR2020-text-modeling-tutorial



# Predictive modeling with text using tidy data principles

**Authors:** [Emil Hvitfeldt](https://www.hvitfeldt.me/), [Julia Silge](https://juliasilge.com/)

**Materials for our [useR! 2020](https://user2020.r-project.org/) online tutorial on 24 July 2020**

This tutorial was hosted by [R-Ladies en Argentina](https://github.com/RLadiesEnArgentina/user2020tutorial).

- Have you ever encountered text data and suspected there was useful insight latent within it but felt frustrated about how to find that insight?
- Are you familiar with the basics of predictive modeling, and ready to learn how unstructured text data can be used for prediction within the tidyverse and tidymodels ecosystems?
- Do you need a flexible framework for handling text data that empowers you to build supervised predictive models?

Text data is increasingly important in many domains, and tidy modeling principles can be applied to natural language processing tasks. This presentation is designed to provide practical guidance and directly applicable knowledge for data scientists and analysts who want to integrate text into their modeling pipelines.

In [this 90-minute tutorial](https://youtu.be/Sz8RB_fPYOk), learn how to preprocess text data for modeling, train models, and evaluate model performance. We use slides and live coding during the tutorial to walk through a realistic case study. The tutorial was streamed, recorded, and captioned, and the supporting materials and code are in this repo for you to work through afterward.
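To give a rough feel for the preprocess, train, and evaluate steps mentioned above, here is a minimal sketch using tidymodels and textrecipes. It is not the tutorial's case study: the toy data, the word lists, and every object name (`toy`, `make_text`, `text_rec`, `text_wf`, `toy_metrics`) are hypothetical and invented for illustration, and the choice of a plain logistic regression with tf-idf features is an assumption made to keep dependencies small.

```r
library(tidymodels)
library(textrecipes)

set.seed(2020)

# Hypothetical toy corpus: short texts that lean positive or negative.
positive_words <- c("great", "excellent", "love", "wonderful", "happy")
negative_words <- c("terrible", "awful", "hate", "broken", "sad")
filler_words   <- c("the", "product", "was", "and", "it", "really")
vocab <- c(positive_words, negative_words, filler_words)

# Sample 8 words, making the favored words three times as likely as the rest,
# so the two classes overlap and are not trivially separable.
make_text <- function(favored) {
  weights <- ifelse(vocab %in% favored, 3, 1)
  paste(sample(vocab, size = 8, replace = TRUE, prob = weights), collapse = " ")
}

toy <- tibble(
  sentiment = factor(rep(c("positive", "negative"), each = 100)),
  text = c(
    replicate(100, make_text(positive_words)),
    replicate(100, make_text(negative_words))
  )
)

# Hold out a test set, stratified by the outcome.
toy_split <- initial_split(toy, strata = sentiment)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

# Preprocess: tokenize, keep the 10 most frequent tokens, weight by tf-idf.
text_rec <- recipe(sentiment ~ text, data = toy_train) %>%
  step_tokenize(text) %>%
  step_tokenfilter(text, max_tokens = 10) %>%
  step_tfidf(text)

# Train: bundle the recipe and a logistic regression into one workflow.
text_wf <- workflow() %>%
  add_recipe(text_rec) %>%
  add_model(logistic_reg() %>% set_engine("glm"))

text_fit <- fit(text_wf, data = toy_train)

# Evaluate: predict on the held-out texts and compute accuracy.
toy_metrics <- predict(text_fit, new_data = toy_test) %>%
  bind_cols(toy_test) %>%
  accuracy(truth = sentiment, estimate = .pred_class)

toy_metrics
```

Keeping the recipe inside the workflow means the tokenization and tf-idf weights are estimated on the training data only and then reapplied to new text at prediction time. For real text you would likely want a regularized model and resampling rather than a single split.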

## Expected level of audience's R background

Intermediate familiarity with R, RStudio, basics of regression and classification modeling, and tidyverse packages such as dplyr and ggplot2.

## What is in this repo

There are two main resources in this repo:

- Slides, which you can [see rendered here](https://emilhvitfeldt.github.io/useR2020-text-modeling-tutorial/), with the [source here](https://github.com/EmilHvitfeldt/useR2020-text-modeling-tutorial/blob/master/index.Rmd)
- An [R Markdown file to work through](https://github.com/EmilHvitfeldt/useR2020-text-modeling-tutorial/blob/master/text_modeling.Rmd)

If you get stuck, you can [post a question as an issue on this repo](https://github.com/EmilHvitfeldt/useR2020-text-modeling-tutorial/issues) or [post on RStudio Community](https://rstd.io/tidymodels-community).