https://github.com/oxford-cs-deepnlp-2017/practical-3
Oxford Deep NLP 2017 course - Practical 3: Text Classification with RNNs
deep-learning machine-learning natural-language-processing nlp oxford
- Host: GitHub
- URL: https://github.com/oxford-cs-deepnlp-2017/practical-3
- Owner: oxford-cs-deepnlp-2017
- Created: 2017-02-06T00:00:10.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2017-02-06T00:04:40.000Z (over 9 years ago)
- Last Synced: 2025-01-17T11:13:02.264Z (over 1 year ago)
- Topics: deep-learning, machine-learning, natural-language-processing, nlp, oxford
- Size: 139 KB
- Stars: 97
- Watchers: 16
- Forks: 76
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Practical 3: Text Classification with RNNs
[Chris Dyer, Phil Blunsom, Yannis Assael, Brendan Shillingford, Yishu Miao]
In this practical, you can explore one of two applications of RNNs: text classification or language modelling (you are welcome to try both, too). We will be using the training/dev/test splits that we created in Practical 2.
## Text classification (Task 1)
Last week’s practical introduced text classification as a problem that could be solved with deep learning. The document representation function we used was very simple: an average over the word embeddings in the document. This week, you will use RNNs to compute the representations of the documents.
In the figure below, on the left we show the document representation function that was used in last week’s practical. Your goal in this task is to adapt your code to use the architecture on the right.

Note that in Practical 3, **x** is defined to be the average of the RNN hidden states (the **h_t**’s), not just the sum.
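To make the definition of **x** concrete, here is a minimal NumPy sketch of the forward pass only. It assumes a simple Elman RNN with a tanh nonlinearity; the function names, parameter names (`W_xh`, `W_hh`, `b_h`), and toy dimensions are illustrative and not part of the practical's code.

```python
import numpy as np

def elman_hidden_states(embeddings, W_xh, W_hh, b_h):
    """Run a simple Elman RNN over a (T, d_in) array of word embeddings.

    Returns a (T, d_h) array holding the hidden state h_t for every time step.
    Assumes T >= 1 (a non-empty document).
    """
    h = np.zeros(W_hh.shape[0])  # h_0 = 0 (an assumption of this sketch)
    states = []
    for x_t in embeddings:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

def document_representation(embeddings, W_xh, W_hh, b_h):
    """x = average of the hidden states h_1 .. h_T."""
    return elman_hidden_states(embeddings, W_xh, W_hh, b_h).mean(axis=0)

# Toy example with random parameters (hypothetical sizes).
rng = np.random.default_rng(0)
T, d_in, d_h = 5, 4, 3
embeddings = rng.normal(size=(T, d_in))
W_xh = rng.normal(scale=0.1, size=(d_h, d_in))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
b_h = np.zeros(d_h)

x = document_representation(embeddings, W_xh, W_hh, b_h)
print(x.shape)  # one vector of size d_h per document
```

In a real model, **x** would then be fed to the same classifier layers used in Practical 2, and the parameters would be learned by backpropagation rather than drawn at random.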
### Questions
1. What are the benefits and downsides of the RNN-based representation over the bag of words representation used last week? How would availability of data affect your answer?
2. One possible architectural variant is to use only the final hidden state of the RNN as the document representation (i.e., x) rather than the average of the hidden states over time. How does this work? What are the potential benefits and downsides to this representation?
3. Try different RNN architectures, e.g., simple Elman RNNs or GRUs or LSTMs. Which ones work best?
4. What happens if you use a bidirectional LSTM (i.e., the dashed arrows in the figure)?
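As a starting point for questions 2 and 4, the sketch below shows the two alternative pooling choices on top of a forward pass. For brevity it uses a simple Elman cell rather than an LSTM, and every name in it (`run_rnn`, `init_params`, the toy sizes) is a hypothetical choice for illustration, not the practical's reference code.

```python
import numpy as np

def run_rnn(inputs, W_xh, W_hh, b_h):
    """Simple Elman RNN; returns a (T, d_h) array of hidden states."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

def init_params(rng, d_in, d_h):
    """Random (untrained) parameters, for demonstration only."""
    return (rng.normal(scale=0.1, size=(d_h, d_in)),
            rng.normal(scale=0.1, size=(d_h, d_h)),
            np.zeros(d_h))

def final_state_representation(inputs, params):
    """Question 2 variant: x is the last hidden state h_T only."""
    return run_rnn(inputs, *params)[-1]

def bidirectional_mean_representation(inputs, fwd_params, bwd_params):
    """Question 4 variant: average of concatenated forward/backward states."""
    fwd = run_rnn(inputs, *fwd_params)
    # Run a second RNN right-to-left, then flip its states back so that
    # row t of `bwd` lines up with row t of `fwd`.
    bwd = run_rnn(inputs[::-1], *bwd_params)[::-1]
    return np.concatenate([fwd, bwd], axis=1).mean(axis=0)

rng = np.random.default_rng(1)
doc = rng.normal(size=(6, 4))          # 6 words, 4-dim embeddings
fwd = init_params(rng, d_in=4, d_h=3)
bwd = init_params(rng, d_in=4, d_h=3)

print(final_state_representation(doc, fwd).shape)
print(bidirectional_mean_representation(doc, fwd, bwd).shape)
```

Note that the bidirectional representation is twice as wide as the unidirectional one, so the classifier's input layer has to be resized accordingly.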
**(Optional, for enthusiastic students)** RNNs are expensive to use as “readers” on long sequences. Truncated backpropagation through time (truncated BPTT) can be used to get better parallelism. You are encouraged to use this to get better computational efficiency.
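The sketch below illustrates only the windowing step of truncated BPTT: cutting a long sequence into fixed-length chunks. How the gradient is actually cut between chunks depends on your framework and is described in comments rather than implemented; the helper name `bptt_windows` and the window size are assumptions made for this example.

```python
def bptt_windows(token_ids, window):
    """Split a sequence into consecutive windows of at most `window` tokens."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [token_ids[i:i + window] for i in range(0, len(token_ids), window)]

document = list(range(10))  # stand-in for a document of 10 token ids

for chunk in bptt_windows(document, window=4):
    # In a real training loop (framework-specific, not shown here):
    #   1. run the RNN over `chunk`, starting from the carried-over hidden state;
    #   2. backpropagate through the time steps of this chunk only;
    #   3. keep the value of the final hidden state, but drop its gradient
    #      history before moving on to the next chunk.
    print(chunk)
```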
## Language Modelling with RNNs (Task 2)
As covered in lecture last week, RNN language models use the chain rule to decompose the probability of a sequence into the product of probabilities of words, conditional on previously generated words:

$$p(\mathbf{w}) = p(w_1, \ldots, w_T) = \prod_{t=1}^{T} p(w_t \mid \mathbf{w}_{<t})$$

To avoid problems with floating point underflow, it is customary to model this in log space.
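A small worked example shows why log space matters. The per-word probabilities here are made up for illustration; the point is only that a product of many small numbers underflows in double precision, while the equivalent sum of logs does not.

```python
import math

def sequence_log_prob(word_probs):
    """log p(w) = sum over t of log p(w_t | w_<t)."""
    return sum(math.log(p) for p in word_probs)

# Hypothetical model output: 100 words, each assigned probability 1e-4.
word_probs = [1e-4] * 100

naive_product = 1.0
for p in word_probs:
    naive_product *= p  # true value is 1e-400, too small for a float64

log_prob = sequence_log_prob(word_probs)

print(naive_product)  # underflows to 0.0
print(log_prob)       # a finite negative number, 100 * log(1e-4)
```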
Given a training sequence, the training graph for a language model looks like this:

Your task is to train an RNN language model on the training portion of the TED data, using the validation set to determine when to stop optimising the parameters of the model.
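One common way to use the validation set for this is early stopping with a patience counter. The sketch below assumes you record one validation loss per epoch; the `patience` parameter and the loss values are hypothetical, and other stopping criteria are equally valid.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch whose parameters should be kept.

    Training is considered stopped once `patience` consecutive epochs have
    passed without improving on the best validation loss seen so far.
    Returns None if no losses are given.
    """
    best_epoch, best_loss = None, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif best_epoch is not None and epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch

# Made-up validation losses, one per epoch.
val_losses = [5.0, 4.2, 3.9, 4.0, 4.1, 3.0]
print(early_stopping_epoch(val_losses, patience=2))
```

With these numbers, training stops before the final epoch is ever reached, which shows the trade-off in choosing `patience`: a small value saves compute but can stop before a later improvement.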
A language model can be evaluated quantitatively by computing the (per-word) perplexity of the model on a held-out test corpus,

$$\text{perplexity} = \exp\left(-\frac{1}{|\textit{test set}|} \sum_{t=1}^{|\textit{test set}|} \log p(w_t \mid \mathbf{w}_{<t})\right)$$

where |*test set*| is the length of the test set in words, including any `</s>` (end-of-sentence) tokens. (Note: you can measure length in terms of any units, including characters, words, or sentences; these are just ways of quantifying how much uncertainty the model has about different units.)
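As a rough illustration of this evaluation, the sketch below computes per-token perplexity under the standard definition: the exponential of the negative average log-probability. It assumes natural logarithms and that you have already collected the model's log-probability for each test token; the function name and the example values are illustrative.

```python
import math

def perplexity(token_log_probs):
    """Per-token perplexity from natural-log probabilities of each test token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    avg_neg_log_prob = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_neg_log_prob)

# Sanity check: a model that is uniformly uncertain over 10 choices at every
# step should have a perplexity of (approximately) 10.
uniform_log_probs = [math.log(1 / 10)] * 50
print(perplexity(uniform_log_probs))
```

The uniform case is a useful sanity check for your own implementation: perplexity can be read as the effective number of equally likely choices the model faces at each step.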
To evaluate the model qualitatively, generate random samples from the model by sampling from p(w\_t | **w**\_{\