# EEG analysis with MNE-Python and MNELAB
## Description
In this workshop, we will analyze EEG data in Python. We will use [MNE-Python](https://mne.tools), which is currently the most popular Python package for EEG/MEG analysis. In addition, we will showcase how simple tasks can be performed with [MNELAB](https://github.com/cbrnr/mnelab), a graphical user interface for MNE-Python. Using real-world EEG data, we will investigate both induced and evoked activity: event-related desynchronization/synchronization (ERD/ERS) quantifies induced activity in specific frequency bands, whereas simple averaging over epochs yields event-related potentials (ERPs). We will learn how to perform both types of analyses in this workshop. Some experience with Python is useful but not required, because we will start from scratch and learn how to set up a working Python environment for EEG analysis.

## Setting up Python
### Installing Python
Installing Python is simple:

- On Windows and macOS, use the [official installers](https://www.python.org/) (make sure to add Python to the path).
- On Linux, use your package manager (you probably have Python already installed).

### Managing packages
This gives us a bare-bones Python environment with hardly any packages installed. We can use the [`pip` command line tool](https://pip.pypa.io/en/stable/) to manage Python packages. Let's find out which packages are currently installed. Open a terminal and enter the following command:

```
pip list
```

This list will probably be very short and only include `pip` and `setuptools`, two essential packages that are shipped with Python. It is likely that these packages are also already outdated. You can get a list of all outdated packages with:

```
pip list --outdated
```

If there are outdated packages, you can upgrade each package individually. For example, assuming that `setuptools` is outdated, you can update it with:

```
pip install --upgrade setuptools
```
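
The `pip` tool itself can be upgraded in the same way; on some platforms it is safer to invoke it through the Python interpreter:

```
python -m pip install --upgrade pip
```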

### Packages for EEG analysis
Now we need to install packages that allow us to perform EEG analysis. We use `pip install` followed by the package name(s) we would like to install. Specifically, we need the following packages:

```
pip install mne mnelab pyxdf scikit-learn ipython
```

Of course, `mne` and `mnelab` are required for MNE-Python and MNELAB. We also need `pyxdf` to add support for reading [XDF](https://github.com/sccn/xdf/wiki/Specifications) files and `scikit-learn` for training a classifier. The `ipython` package installs [IPython](https://ipython.org/), an enhanced interactive Python interpreter which is much more convenient to use than the default `python` interpreter.
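
To verify the installation, we can try importing the packages in Python; if the imports succeed without errors, everything is in place (printing the MNE-Python version is just an optional sanity check):

```python
import mne
import mnelab
import pyxdf
import sklearn
import IPython

print(mne.__version__)  # prints the installed MNE-Python version
```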

## Setting up Visual Studio Code
A good editor is essential for writing Python code. Although you can use your favorite editor, this workshop uses Visual Studio Code with its Python extension.

### Installing Visual Studio Code
Head over to the [Visual Studio Code website](https://code.visualstudio.com/), download the latest version for your platform, and hit install.

### Installing the Python extension
Start Visual Studio Code and install the Python extension (available in the Extensions section in the left sidebar). After that, open the command palette (Ctrl+Shift+P on Windows/Linux and Cmd+Shift+P on macOS) and type in "linter". Select the "Python: Select Linter" command and choose "flake8". Then a confirmation popup should appear, and you need to click on "Install" (this runs `pip install flake8` in the background, which you could also do manually on the command line if you want).

Now we have everything we need to analyze EEG data! In Visual Studio Code, open a new terminal with the command "Terminal: Create New Terminal" (or "View: Toggle Terminal") and run `ipython`. We will use this interactive interpreter window to run our code.

## ERD/ERS analysis with MNE-Python
### Loading the data
The example motor imagery (MI) data set we will be analyzing is available in multiple [XDF](https://github.com/sccn/xdf/wiki/Specifications) files. MNE-Python does not support this file format out of the box, but we can use the [pyxdf](https://github.com/xdf-modules/pyxdf/tree/main) package and MNELAB to import the data.

```python
from pyxdf import match_streaminfos, resolve_streams
```

First, let's list all streams contained in a specific XDF file:

```python
resolve_streams("MI_BCI2021_AK_01.xdf")
```

In this particular case, there are only two streams. The first stream (with a `stream_id` of `1`) contains 35 EEG channels (sampled at 500 Hz), whereas the second stream (with a `stream_id` of `2`) contains markers.

Although each stream ID is unique within a given XDF file, this might not be the case across multiple files. Indeed, the EEG stream is associated with a `stream_id` of `2` (not `1` as before) in the third example file:

```python
resolve_streams("MI_BCI2021_AK_03.xdf")
```
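
Instead of reading the printed output, we can also inspect the stream metadata programmatically. `resolve_streams()` returns a list of dictionaries, which should contain fields such as `stream_id`, `type`, and `channel_count` (a minimal sketch, assuming these keys are present):

```python
for stream in resolve_streams("MI_BCI2021_AK_03.xdf"):
    print(stream["stream_id"], stream["type"], stream["channel_count"])
```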

We will use the `read_raw_xdf()` function from MNELAB to import XDF files and convert the data to a standard format that MNE-Python understands (a so-called [`Raw` object](https://mne.tools/stable/generated/mne.io.Raw.html) which represents continuous data).

```python
from mnelab.io.xdf import read_raw_xdf
```

However, we need to specify the stream ID we would like to load, for example:

```python
raw = read_raw_xdf("MI_BCI2021_AK_01.xdf", stream_ids=[1])
raw = read_raw_xdf("MI_BCI2021_AK_03.xdf", stream_ids=[2])
```

This is cumbersome if all we really want to import is the EEG stream (and we don't care about its stream ID). In this case, we can use `pyxdf.match_streaminfos()` to automatically query the XDF file for the stream ID of the first EEG stream:

```python
streams = resolve_streams("MI_BCI2021_AK_01.xdf")
stream_id = match_streaminfos(streams, [{"type": "EEG"}])[0]
raw = read_raw_xdf("MI_BCI2021_AK_01.xdf", stream_ids=[stream_id])
```

Note that the marker stream is automatically converted to annotations, which are available in the `annotations` attribute:

```python
raw.annotations
```

Here's what the different annotation values mean for this particular file (this information is not standardized and needs to be retrieved from the documentation or someone familiar with the particular recording):
- 1: trial start
- 2: arrow pointing left
- 3: arrow pointing right
- 4: arrow pointing down
- 8: trial end
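
We will need the cue codes (2, 3, and 4) again when creating epochs; for example, a mapping from condition names to these codes (as used later in this workshop) looks like this:

```python
# map condition names to annotation/event codes (left hand, right hand, feet)
event_id = dict(left=2, right=3, feet=4)
```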

Other metadata associated with the `raw` object is available in the `info` attribute:

```python
raw.info
```

Before we continue to inspect the data, let's load and concatenate all four example files into a single `Raw` object:

```python
import mne

raws = []
for fname in ("MI_BCI2021_AK_01.xdf", "MI_BCI2021_AK_02.xdf",
"MI_BCI2021_AK_03.xdf", "MI_BCI2021_AK_04.xdf"):
stream_id = match_streaminfos(resolve_streams(fname), [{"type": "EEG"}])[0]
raws.append(read_raw_xdf(fname, stream_ids=[stream_id], fs_new=500))

raw = mne.concatenate_raws(raws)
del raws
```

Now `raw` contains the data from all four files. Note that we pass `fs_new=500` to resample all EEG streams to exactly 500 Hz. This is necessary because XDF stores EEG (and other time series) with individual time stamps and does not rely on an a priori fixed sampling frequency.
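
We can verify the resulting sampling frequency in the `info` attribute:

```python
raw.info["sfreq"]  # should now be exactly 500.0
```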

### Preprocessing
Concatenating our example files introduced new `"BAD boundary"` and `"EDGE boundary"` annotations:

```python
raw.annotations
```

We can count the occurrence of each annotation type:

```python
import numpy as np

np.unique(raw.annotations.description, return_counts=True)
```

These new annotations indicate the boundaries between the original recordings, and we can safely ignore them in our analysis (meaning that we do not have to remove them explicitly).

We already saw that the data contains 35 channels. Let's take a look at the associated channel names:

```python
raw.info["ch_names"]
```

The last three channels are labeled `ACC_X`, `ACC_Y`, and `ACC_Z`, which implies that they contain acceleration data and not EEG. Therefore, we will remove them from further analysis:

```python
raw.drop_channels(["ACC_X", "ACC_Y", "ACC_Z"])
```

The reference used in this recording is FCz, which is not part of the data channels (because by definition it is always equal to zero). Let's add this reference channel to the data, which will be useful when we later re-reference to other channels:

```python
raw.add_reference_channels("FCz")
```

Because we have standardized channel labels, we can assign a so-called montage (which contains channel locations in 3D space). We can use the built-in `"easycap-M1"` montage for this purpose:

```python
raw.set_montage("easycap-M1")
```

Here's a visualization of all channel locations on a cartoon head:

```python
raw.plot_sensors(show_names=True)
```

Let's now re-reference to the average of TP9 and TP10:

```python
raw.set_eeg_reference(["TP9", "TP10"])
```

Just as a sanity check, the average of these two channels (which we just set as the new reference) should be zero:

```python
np.allclose(raw.get_data(["TP9", "TP10"]).mean(axis=0), 0)
```

Let's take a look at the continuous EEG data:

```python
raw.plot(n_channels=33)
```

We could use this browser to manually mark segments containing artifacts, but we'll skip that in the interest of time. Luckily, the data looks pretty clean anyway.
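
For reference, if we did want to exclude a noisy segment, we could add an annotation whose description starts with `"BAD"` (epochs overlapping such segments are then rejected by default). A minimal sketch with hypothetical onset and duration values:

```python
# hypothetical example: mark 2 s starting at t=10 s as bad
raw.annotations.append(onset=10, duration=2, description="BAD_manual")
```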

### Epoching
Next, we are going to epoch the data around the events of interest (2, 3, and 4). First, we have to create events from the existing annotations (events and annotations are two very similar concepts, but mostly for historical reasons creating epochs requires events and does not work with annotations).

```python
events, _ = mne.events_from_annotations(raw)
```

Events are represented as a NumPy array with three columns. The first column contains event onsets (in samples), the second column can almost always be ignored, and the last column contains event types. Because we are only concerned with event types 2, 3, and 4, we filter these events with a standard indexing operation:

```python
events = events[np.isin(events[:, 2], (2, 3, 4)), :]
```
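
To double-check, we can peek at the first few rows of the filtered array:

```python
events[:5]  # each row: [sample index, (unused), event type]
```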

We should be left with 120 events, one per trial:

```python
events.shape
```

Using this event array, we can take the continuous data and create epochs ranging from 2 seconds before to 6 seconds after each event. In other words, an epoch comprises data from -2 to 6 seconds relative to each event. In addition, we will only consider the three channels C3, Cz, and C4 in our analysis.

```python
tmin, tmax = -2, 6
epochs = mne.Epochs(
    raw,
    events,
    dict(left=2, right=3, feet=4),
    tmin,
    tmax,
    picks=("C3", "Cz", "C4"),
    baseline=None,
    preload=True
)
```

### ERD/ERS maps
Finally, we compute time/frequency ERD/ERS maps using `tfr_multitaper()` as follows:

```python
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm
from mne.time_frequency import tfr_multitaper

freqs = np.arange(2, 31)
baseline = -1.5, -0.5
vmin, vmax = -1, 1.5
cnorm = TwoSlopeNorm(vmin=vmin, vcenter=0, vmax=vmax)
for event in epochs.event_id:  # for each condition
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    tfr, itc = tfr_multitaper(epochs[event], freqs=freqs, n_cycles=freqs)
    tfr.crop(-1.5, 5.5).apply_baseline(baseline, mode="percent")
    itc.crop(-1.5, 5.5)
    tfr.plot(
        vmin=vmin,
        vmax=vmax,
        title=event,
        axes=axes,
        cmap="RdBu",
        cnorm=cnorm,
        show=False
    )
    for i, ax in enumerate(axes):
        ax.set_title(epochs.info["ch_names"][i])
    plt.show()
```

This includes frequencies from 2 to 30 Hz. The baseline interval ranges from -1.5 to -0.5 s. To avoid boundary effects, we crop the original time interval (-2 to 6 s) to -1.5 to 5.5 s. We also set the color range to values from -1 (maximum ERD) to 1.5 (maximum ERS). The `cnorm` object makes sure that white (the center color in the color map `"RdBu"`) is mapped to the value 0 (neither ERD nor ERS).

### Classifying motor imagery
In BCI applications, brain activity needs to be classified in real time. We will try to train a simple classifier on our example data using only two of the three motor imagery conditions, namely left hand versus feet. We will create our feature matrix `X` and label vector `y` from epoched data, this time retaining all channels:

```python
epochs = mne.Epochs(
    raw,
    events,
    dict(left=2, feet=4),
    tmin,
    tmax,
    baseline=None,
    preload=True
)
X = epochs.get_data()
y = epochs.events[:, 2]
```

As an instructive example, we will use common spatial patterns (CSP) on bandpass-filtered epochs in combination with a logistic regression classifier. Note that although CSP has been shown to work well with motor imagery data, there are improved methods that might yield better results. Furthermore, we will not tune any (hyper)parameters here, but simply use some default values. Finally, as we could already observe in the ERD/ERS maps, the imagery conditions do not seem to be particularly distinct from each other (but this is not surprising, because patterns vary considerably across individuals).

With that out of the way, let's create a pipeline consisting of a bandpass filter (8–30 Hz), PCA (which retains the 30 largest components), CSP, and logistic regression. We are going to need the following imports:

```python
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA
from mne.decoding import (
    CSP,
    FilterEstimator,
    UnsupervisedSpatialFilter,
    cross_val_multiscore
)
```

```python
clf = make_pipeline(
    FilterEstimator(epochs.info, 8, 30),
    UnsupervisedSpatialFilter(PCA(30)),
    CSP(),
    LogisticRegression()
)
```

Now we can train and evaluate this pipeline in a 10-fold cross-validation scheme:

```python
scores = cross_val_multiscore(clf, X, y, cv=10)
```

The array `scores` contains the classification accuracy for each fold, and we can compute the mean as follows:

```python
scores.mean()
```
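
It can also be instructive to look at the spread across folds:

```python
print(f"{scores.mean():.2f} ± {scores.std():.2f} (mean ± SD across folds)")
```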

A mean cross-validation accuracy of around 72% is not too shabby given that we used default parameters everywhere.

## ERD/ERS analysis with MNELAB
Alright, let's try to reproduce this workflow with the graphical user interface of MNELAB. Type `mnelab` or `python -m mnelab` in a terminal to start MNELAB.

### Loading the data
The main window looks pretty empty initially. In fact, almost all commands are disabled until you load a data set. Let's start with loading the first example file. Click on the "Open" icon in the toolbar or select "File" – "Open..." and select the file "MI_BCI2021_AK_01.xdf" in the dialog window. Because XDF files can contain multiple streams, another dialog window appears listing all streams contained in the file (along with some basic information such as the name, type, number of channels, data format, and sampling frequency). Only the EEG stream can be selected (marker streams are automatically imported), so click on "OK".

Now the MNELAB main window shows information on the currently active file (which is highlighted in the sidebar on the left).

Let's now repeat this process for the remaining three example files. When you are done, you should see these four files in the sidebar (remember that only one file is active at a time).

We will now concatenate these data sets. First, let's select the first file in the sidebar. Then, select "Edit" – "Append data...", drag the three listed data sets from "Source" to "Destination", and confirm with "OK". A new data set in the sidebar appears named "MI_BCI2021_AK_01 (appended)". Let's rename it to "MI_BCI2021_AK" by double-clicking on the entry in the sidebar and editing the name accordingly.

Since we don't need the four individual example data sets anymore, we can select and close each one of them ("File" – "Close").

### Preprocessing
Next, we drop the three acceleration channels by selecting "Edit" – "Pick channels...". In the dialog window, we select only those channels that we would like to pick. Or rather, since all channels are already selected, we deselect the last three channels by holding down Ctrl (on Windows and Linux) or Cmd (on macOS) and clicking on each channel. After confirming with "OK", we are asked if we want to overwrite the existing data set – let's click on "Yes" (we don't need the previous data set anymore).

The data set name changed to "MI_BCI2021_AK (channels dropped)" – if you don't like the name, feel free to change it!

Next up is re-referencing the data. First, we will add the current reference FCz as a normal channel, and then we will re-reference all channels to the mean of TP9 and TP10. To add FCz, select "Edit" – "Set reference...", select "Channel(s)", enter "FCz", and hit "OK" (overwriting the existing data set). Then open the same dialog again and enter "TP9,TP10" in the "Channel(s)" field. Now the data is referenced to the average of those two channels.

Because we have channel labels according to the standardized extended 10–20 system, we can assign channel locations to these labels. Click on "Edit" – "Set montage..." and select "easycap-M1". To confirm that the assigned locations are correct, click on "Plot" – "Channel locations" to create a two-dimensional cartoon of the sensor layout.

### Epoching
Our next step is to epoch the data. Historically, epoching requires events, but our data contains only annotations. Since events and annotations are two implementations of the same underlying idea, we can convert between these representations. In our case, we can select "Tools" – "Create events from annotations", which adds 360 events to the data. We can use these events to create epochs by selecting "Tools" – "Create epochs...". In the dialog window, we select events 2, 3, and 4, enter the values -2 and 6 in the fields labeled "Interval around events", and deselect "Baseline correction". Because we are only interested in C3, Cz, and C4, we then pick these channels using "Edit" – "Pick channels...".

### ERD/ERS maps
Using these epochs, we can compute ERD/ERS maps by clicking on "Plot" – "ERDS maps...". In the dialog window, we enter 1 Hz to 31 Hz for the frequency range (step size 1 Hz), -1.5 s to 5.5 s for the time range (because we want to crop 0.5 s from the start and end to avoid boundary effects), and -1.5 s to -0.5 s for the baseline. This creates ERD/ERS maps for each event type (2, 3, and 4 in our case).

## ERP analysis with MNE-Python
### Loading the data
This time, our data set is stored in the BrainVision file format, which consists of three files (.eeg, .vhdr, and .vmrk). The .eeg file contains the EEG data, the .vhdr header file contains metadata such as channel names and sampling frequency, and the .vmrk file contains markers. We will use the function `mne.io.read_raw_brainvision()` to import the data, and we pass the name of the .vhdr file:

```python
raw = mne.io.read_raw_brainvision("VisualOddball_BCI2021_AK_01.vhdr")
```

### Preprocessing
Let's take a look at some properties of the data. We can get a quick overview by inspecting the `info` attribute and calling the `describe()` method:

```python
raw.info
raw.describe()
```

We can see that the data consists of 31 EEG channels, and sensor locations have been added automatically (because the `dig` field is populated). We also learn that the data was lowpass-filtered at 140 Hz, but apparently no highpass filter was used. The sampling frequency is 500 Hz. Let's check out the sensor locations (notice that signals are referenced to Cz, which is not part of the data array and therefore this channel also doesn't show up in the montage plot):

```python
raw.plot_sensors(show_names=True)
```
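
The filter settings mentioned above can also be queried directly from `info`:

```python
raw.info["highpass"], raw.info["lowpass"]  # a highpass of 0.0 means no highpass filter was applied
```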

The paradigm is a classic visual oddball paradigm with frequent (the letter "o") and rare (the letter "x") visual stimuli. The task of the participant was to count the rare letters "x". This should elicit a P300, which we will try to visualize in this analysis. Corresponding markers should already be available in the `annotations` attribute:

```python
raw.annotations
```

Here, "Stimulus/S 1" corresponds to the frequent event "o", whereas "Stimulus/S 2" corresponds to the rare event "x". We do not need the first annotation (which indicates the start of the paradigm), so we delete it:

```python
raw.annotations.delete(0)
raw.annotations
```

Before we epoch the data, let's apply a 0.1 Hz highpass filter to remove slow drifts. Because MNE does not load the data into memory by default, we need to explicitly do that now (or pass `preload=True` when loading the file):

```python
raw.load_data()
raw.filter(0.1, None)
```

In the interest of time, we skip the artifact correction step here, but the continuous data looks pretty nice already:

```python
raw.plot()
```

### Epoching
Next, we'll create epochs from -200 ms to 800 ms relative to stimulus onset. This step also includes baseline correction using the time segment from -200 ms to 0 ms (the default behavior of `mne.Epochs` is to use everything up to time 0 as the baseline). Epoching requires events, which we need to generate from the existing annotations:

```python
events, event_id = mne.events_from_annotations(raw)
```

In addition to `events`, we also get a mapping of annotation names to event types in `event_id`:

```python
event_id
```

However, we can take this opportunity to create epochs with more descriptive event names. Instead of passing `event_id` to the `Epochs` initializer, we use the following more informative mapping:

```python
epochs = mne.Epochs(raw, events, dict(o=1, x=2), tmin=-0.2, tmax=0.8, preload=True)
```

Let's take a look at this object:

```python
epochs
```

This looks correct: we have 120 frequent and 30 rare epochs. Let's plot the epoched data:

```python
epochs.plot(events=events)
```

Now we should perform another round of artifact correction and drop epochs with large artifacts. To demonstrate one way to find such bad epochs automatically, we will drop all epochs that contain signals larger than 150 µV (peak to peak):

```python
epochs.drop_bad(reject=dict(eeg=150e-6))
```
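
To see how many epochs survived and why the others were dropped, we can inspect the object and its drop log:

```python
print(epochs)           # remaining epochs per condition
epochs.plot_drop_log()  # summarizes the reasons for dropped epochs
```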

### Averaging
We are now ready to average epochs within the two conditions to compute evoked potentials using the `average()` method on a subset of epochs:

```python
frequent = epochs["o"].average()
rare = epochs["x"].average()
```

There are several plotting methods for these evoked objects, for example:

```python
frequent.plot(gfp=True)
rare.plot_joint()
```

Let's focus on channel Pz and visualize both conditions in one plot:

```python
mne.viz.plot_compare_evokeds(dict(rare=rare, frequent=frequent), picks="Pz")
```

This shows a nice P300 for the rare condition. To improve this visualization further, we could add a 95% confidence interval around each evoked time course:

```python
mne.viz.plot_compare_evokeds(
    dict(rare=list(epochs["x"].iter_evoked()), frequent=list(epochs["o"].iter_evoked())),
    picks="Pz"
)
```