https://github.com/tjmahr/fillgaze
Helper functions for interpolating missing eyetracking data
- Host: GitHub
- URL: https://github.com/tjmahr/fillgaze
- Owner: tjmahr
- License: gpl-3.0
- Created: 2017-08-01T15:52:30.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2017-08-10T14:27:30.000Z (almost 8 years ago)
- Last Synced: 2025-02-24T13:56:36.785Z (3 months ago)
- Language: R
- Homepage:
- Size: 454 KB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
README
---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "fig/README-"
)
```

# fillgaze

[Travis-CI Build Status](https://travis-ci.org/tjmahr/fillgaze)

The goal of fillgaze is to provide helper functions for interpolating missing
eyetracking data.

## Installation

You can install fillgaze from GitHub with:

```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("tjmahr/fillgaze")
```

## Package overview

This package was created in response to a very strange file of eyetracking data.
```{r example, message = FALSE}
df <- readr::read_csv("inst/test-gaze.csv")
```

Here is the problem with this file:

```{r original data, fig.height = 3}
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

ggplot(head(df, 40)) +
  aes(x = Time - min(Time)) +
  geom_hline(yintercept = 0, size = 2, color = "white") +
  geom_point(aes(y = GazeX, color = "GazeX")) +
  geom_point(aes(y = GazeY, color = "GazeY")) +
  labs(x = "Time (ms)", y = "Screen location (pixels)",
       color = "Variable")
```

Every second or third point is incorrectly placed offscreen, as indicated by
negative pixel values for the gaze locations. It is physiologically impossible
for a person's gaze to oscillate so quickly and with such magnitude (the gaze
is tracked on a large screen display).

We would like to interpolate spans of missing data using
neighboring points. That's the point of this package.

The steps to solve the problem involve:

- [x] Converting offscreen values into proper `NA` values.
- [x] Identifying and describing gaps of missing values (streaks of
successive `NA`s).
- [x] Interpolating the values in a gap.
- [x] Writing tests to confirm that everything works as expected. :innocent:

### Setting values in several columns to `NA`

We need to mark offscreen points as properly missing data.
`set_values_to_na()` takes a dataframe and named filtering predicates.
Here's the basic usage.

```{r, eval = FALSE}
set_values_to_na(dataframe, {col_name} = {function to determine NA values})
```

The values for which each function returns `TRUE` are replaced with `NA`.
For example, `set_values_to_na(df, var1 = ~ .x < 0)` would:

* look for the column `var1` in the dataframe,
* check which values of `.x < 0` are true, where `.x` is a placeholder/pronoun
  for the values in `df$var1`,
* and replace those values where the test is `TRUE` with `NA`.

```{r}
library(fillgaze)
original_df <- df

df <- df %>%
  set_values_to_na(
    GazeX = ~ .x < -100,
    GazeY = ~ .x < -100,
    LEyeCoordX = ~ .x < -.1,
    LEyeCoordY = ~ .x < -.1,
    REyeCoordX = ~ .x < -.1,
    REyeCoordY = ~ .x < -.1)

# Before and after on some of the GazeX values
# (tibble() replaces the deprecated data_frame())
tibble::tibble(before = head(original_df$GazeX), after = head(df$GazeX))
```

Now, those offscreen points will not be plotted because they are `NA`.

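As an aside, the replacement rule itself is simple enough to sketch in base R. The helper below, `mark_as_na()`, is a hypothetical name made up for this illustration and is not part of fillgaze. It takes ordinary functions rather than the `~ .x < 0` formula shorthand, and the toy data frame is invented.

```r
# Hypothetical helper for illustration only; this is not fillgaze's code.
# `predicates` is a named list of functions, one per column to clean.
mark_as_na <- function(data, predicates) {
  for (col in names(predicates)) {
    # Run the test on the column's values
    hits <- predicates[[col]](data[[col]])
    # which() keeps only the positions where the test is TRUE
    data[[col]][which(hits)] <- NA
  }
  data
}

# Invented toy data: large negative values stand in for offscreen points
toy <- data.frame(
  GazeX = c(512, -1920, 530, -1920),
  GazeY = c(300, -1080, 310, 305)
)

cleaned <- mark_as_na(toy, list(
  GazeX = function(x) x < -100,
  GazeY = function(x) x < -100
))
cleaned
```

Because `which()` drops `NA` test results, values that are already missing are left untouched.
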
```{r trimmed data, fig.height = 3}
last_plot() %+% head(df, 40)
```

### Finding gaps in the data

We can use `find_gaze_gaps()` to locate the gaps in a column of data. This
function is mostly used internally. Users are not expected to use it
routinely, but I cover it here because the function for filling gaps relies
on the data in this dataframe.

```{r, width = 120}
find_gaze_gaps(df, GazeX) %>%
  print(width = 120)
```

Each row describes a gap in the column.

* `start_row` and `end_row` contain the row numbers of the nearest non-`NA` values.
* `na_rows` is the number of successive `NA`s in the gap.
* `start_value` and `end_value` contain the nearest non-`NA` values.
* `change_value` is the difference between `start_value` and `end_value`.

The function also measures the duration of the gap (`change_time`). By default,
it uses row numbers (`.rowid`) to measure duration. We can instead name an
explicit column to use as the measure of time.

```{r, width = 120}
find_gaze_gaps(df, GazeX, time_var = Time) %>%
  print(width = 120)
```

The function also respects dplyr grouping so that, e.g., false gaps are not found
between trials.

```{r}
df %>%
  group_by(Trial) %>%
  find_gaze_gaps(GazeX)
```

### Interpolating values in gaps

`fill_gaze_gaps()` will fill in the gaps in selected columns. We can set limits
on which gaps are filled:

- `max_na_rows`: don't fill gaps with more successive `NA` rows than `max_na_rows`
- `max_duration`: don't fill gaps with a duration larger than `max_duration`
- `max_sd`: don't fill gaps where the relative change in the variable is more than `max_sd` standard deviations in magnitude

```{r}
df <- df %>%
  group_by(Trial) %>%
  fill_gaze_gaps(GazeX, time_var = Time, max_na_rows = 5)
```

In this example, only `GazeX` has been interpolated. The median value is used.
We can compare the results with the `GazeY` column.

```{r}
df %>% select(Time:GazeY)
```

`fill_gaze_gaps()` also works with the variable selection helpers from dplyr/tidyselect.

```{r}
df <- df %>%
  fill_gaze_gaps(GazeX, GazeY, matches("EyeCoord"),
                 time_var = Time, max_na_rows = 5, max_sd = 2) %>%
  ungroup()
df
```

```{r interpolated, fig.height = 3}
was_offscreen <- (original_df$GazeX < -100) %>%
  ifelse("Interpolated", "Original\nRaw Data")

last_plot() %+%
  head(df, 40) +
  aes(alpha = head(was_offscreen, 40), shape = head(was_offscreen, 40)) +
  scale_alpha_discrete(name = "Point", range = c(.3, 1)) +
  labs(shape = "Point")
```
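
To make the gap bookkeeping and the `max_na_rows` limit from the sections above concrete, here is a rough base R sketch on a single toy vector. `find_gaps()` and `fill_gaps()` are hypothetical names for illustration, not fillgaze functions. The sketch assumes that a gap is an interior run of `NA`s, that `change_value` is `end_value - start_value`, and that every row in a gap receives the median of the two bounding values. The package's actual rules may differ, and grouping, `max_duration`, and `max_sd` are left out.

```r
# Hypothetical helpers for illustration only; not fillgaze's implementation.

# Describe each interior run of NAs in a vector, one row per gap
find_gaps <- function(x) {
  runs <- rle(is.na(x))
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  gaps <- data.frame(
    start_row = starts[runs$values] - 1,  # last non-NA row before the gap
    end_row = ends[runs$values] + 1,      # first non-NA row after the gap
    na_rows = runs$lengths[runs$values]
  )
  # Drop runs that touch either end of the vector: they have no neighbor
  # on one side, so there is nothing to interpolate between
  gaps <- gaps[gaps$start_row >= 1 & gaps$end_row <= length(x), ]
  gaps$start_value <- x[gaps$start_row]
  gaps$end_value <- x[gaps$end_row]
  gaps$change_value <- gaps$end_value - gaps$start_value  # assumed sign
  gaps
}

# Fill gaps of at most `max_na_rows` rows with the median of their neighbors
fill_gaps <- function(x, max_na_rows = Inf) {
  gaps <- find_gaps(x)
  gaps <- gaps[gaps$na_rows <= max_na_rows, ]
  for (i in seq_len(nrow(gaps))) {
    rows <- (gaps$start_row[i] + 1):(gaps$end_row[i] - 1)
    x[rows] <- median(c(gaps$start_value[i], gaps$end_value[i]))
  }
  x
}

# Invented toy data: a one-row gap, a three-row gap, and a trailing NA
gaze <- c(500, NA, 510, NA, NA, NA, 540, NA)
find_gaps(gaze)
fill_gaps(gaze, max_na_rows = 2)
```

With `max_na_rows = 2`, only the one-row gap is filled. The three-row gap exceeds the limit, and the trailing `NA` has no value after it, so neither is filled.
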