The Neurophysiological Biomarker Toolbox (NBT)

# Part III: Fitting a VAR model to real EEG data

In Part II we learned how to create a VAR model with a known topology but otherwise random coefficients. In this part we are going to learn how to fit a VAR model to some real EEG data, i.e. how to create a VAR model whose coefficients are such that the dynamics of the model match the dynamics of the EEG data as closely as possible.

The first thing that you need to do is to clear your workspace and close any figure that may still be open:

  close all;
  clear classes;


Unless you have already done so, you should also run the command:

  import external.eeglab.eegplot;

### A real EEG dataset

The EEG data is stored in a .mat file located under your tutorial_sync directory. Let's load it into MATLAB's workspace:

  load('eeg.mat');

where I have assumed that you are currently in the tutorial_sync directory. If you are not, you can either change your current working directory (using MATLAB's command cd) or specify the full path of the file eeg.mat when calling load.

The EEG is sampled at 1000 Hz so let's inspect the data with the command:

  eegplot(eeg, 'srate', 1000);


### Fitting a VAR model

Fitting a model to this EEG data is the same as saying that we are going to learn what coefficient matrices $B_\tau$ would generate VAR observations that look as close as possible to the EEG dataset. We can perform this learning task using the command:

  >> arfitObj = learn(var.arfit, eeg)
  arfitObj =
  var.arfit
  Package: var
  Properties:
                MinOrder: 1
                MaxOrder: 15
          OrderCriterion: 'sbc'
      AsymptoticMinOrder: 1
                DataMean: [7x1 double]
                  Coeffs: [7x77 double]
                  ResCov: [7x7 double]
               Residuals: [7x23258 double]
                   Order: 11
                  NbDims: 7

Notice that variable arfitObj is not of class dynamics.var but of a class called var.arfit:

  >> whos
    Name          Size                 Bytes  Class        Attributes

    arfitObj      1x1                1307335  var.arfit
    eeg           7x23258            1302448  double

The reason for this is that var.arfit objects are used to _estimate_ VAR models from data while dynamics.var objects are simply used to generate artificial VAR models of known characteristics.

##### Minimum and maximum model order

Properties MinOrder and MaxOrder of object arfitObj are the minimum and maximum model orders considered by the VAR estimation algorithm. Property OrderCriterion indicates that the Schwarz Bayesian Criterion (SBC) was used to automatically select, between the minimum and maximum allowed orders, the model order that best matched the EEG data. Property Order shows that the SBC criterion decided that the correct model order was p=11.
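For reference, a common form of the Schwarz (Bayesian) criterion for a $d$-dimensional VAR of order $p$ fitted to $N$ samples is (this exact normalization is an assumption; the implementation behind var.arfit may differ in details):

$$\mathrm{SBC}(p) = \log \det \hat{\Sigma}_p + \frac{\ln N}{N}\, p\, d^2$$

where $\hat{\Sigma}_p$ is the residual covariance of the order-$p$ fit. The selected order is the $p$ that minimizes $\mathrm{SBC}(p)$, trading off residual variance (first term) against the number of fitted coefficients (second term).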

##### Model innovations (aka residuals)

Let's plot our EEG data together with the innovation process of its VAR model:

  eegplot(eeg, 'srate', 1000, 'data2', arfitObj.Residuals);


Notice how the innovation process of the model is now stored in a property called Residuals. The reason is that most VAR estimation algorithms estimate the model coefficients by minimizing the variance of the innovation process. Hence the name Residuals: it is what remains after fitting the VAR coefficients to the data.
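As a quick sanity check, the ResCov property should be close to the sample covariance of the stored residuals. A minimal sketch, assuming only the property names shown in the arfitObj display above:

  % Sanity check (sketch): ResCov vs. the sample covariance of the residuals.
  C = cov(arfitObj.Residuals');        % 7x7 sample covariance (channels in rows)
  maxAbsDiff = max(abs(C(:) - arfitObj.ResCov(:)));
  disp(maxAbsDiff);                    % should be small, up to estimator details
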

### Is the model fit good enough?

You may be tempted to think that the smaller the variance of the residuals (i.e. of the estimated innovation process) the better the model fit. However, you would be wrong. Remember that when we defined the VAR model we did not say anything about the variance of the innovation process. That is, in the model equation:

$$\mathbf{x}(t) = \sum_{\tau=1}^{p} B_\tau\, \mathbf{x}(t-\tau) + \mathbf{h}(t)$$

where $h(t)=[h_1(t),...,h_d(t)]$, we did not say anything about the variance of $h_1(t), h_2(t), \ldots$ The only thing that we did say was that the innovation process is an independent and identically distributed (i.i.d.) process. This means that $h_1(t)$ is completely unrelated to (independent from) $h_1(t+\Delta)$ for any $\Delta\ne 0$, and the same should hold for the other components of the innovation process $h_2(t),\ldots,h_d(t)$. An obvious consequence of this requirement is that the individual innovation processes should not be autocorrelated. We can test whether this model requirement is fulfilled by plotting their corresponding autocorrelation functions:

  for i = 1:7
      figure;
      misc.acf(arfitObj.Residuals(i,:), 50);
  end

which should create one autocorrelation figure per channel:

The dashed-dotted lines mark the 95% confidence interval for the autocorrelation of an i.i.d. Gaussian process with the same number of data samples. Ideally, all the autocorrelation values would lie within those two lines (points outside that margin are marked with asterisks). It seems that the fit is not perfect, but it (almost) never will be when you work with real data. You can get a qualitative idea of how good your model fit is by comparing the autocorrelation functions of the model residuals with the autocorrelation of the EEG data that you are attempting to model:
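The 95% band itself is easy to reproduce: for an i.i.d. process with $N$ samples, the sample autocorrelation at nonzero lags is approximately normal with variance $1/N$, so the band is $\pm 1.96/\sqrt{N}$. A minimal sketch in plain MATLAB (no toolbox calls; misc.acf presumably does something equivalent internally):

  % Sample autocorrelation of one residual channel, plus the 95% i.i.d. band.
  x = arfitObj.Residuals(1, :);
  x = x - mean(x);
  N = numel(x);
  maxLag = 50;
  acf = zeros(1, maxLag);
  for k = 1:maxLag
      acf(k) = sum(x(1+k:end) .* x(1:end-k)) / sum(x.^2);
  end
  band = 1.96 / sqrt(N);               % 95% band for an i.i.d. process
  nOutside = sum(abs(acf) > band);     % lags flagged as significantly autocorrelated
  fprintf('%d of %d lags fall outside the 95%% band\n', nOutside, maxLag);
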

  for i = 1:7
      figure;
      misc.acf(eeg(i,:), 50);
      print('-dpng', ['eeg-acf-' num2str(i)]);
  end

which will generate the figures:

As you can see, the autocorrelations of the original EEG data are much larger than the autocorrelations of the model residuals. Therefore you may say that the model fits your data reasonably well. But to really consider it a good fit we also have to take into consideration how large our model order is relative to the amount of data that we used to _learn_ the model coefficients. The question below is related to this issue.
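To put the last point in numbers: a VAR(p) model of a $d$-dimensional signal has $p \cdot d^2$ coefficients. Using the values reported by arfitObj above (d = 7, p = 11, 23258 samples per channel):

  % Rough complexity check: number of fitted coefficients vs. available data.
  d = 7;  p = 11;  N = 23258;
  nCoeffs  = p * d^2;           % 539 coefficients (matches the 7x77 Coeffs matrix)
  nScalars = N * d;             % 162806 scalar observations
  fprintf('about %.0f data points per coefficient\n', nScalars / nCoeffs);

With roughly 300 data points per coefficient the fit is comfortably determined here, but raising the order (or shortening the data) shrinks that ratio quickly.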

### A couple of questions for you

What will happen with the autocorrelation functions of the model residuals if you increase or decrease the model order? Will the residual autocorrelations increase or decrease? Why?

##### Hint:

You can force the VAR estimator to use a specific model order (instead of the order chosen by the SBC criterion) by learning the model with the following command:

  arfitObj = var.arfit('MinOrder', 30, 'MaxOrder', 30); % will use order 30
  arfitObj = learn(arfitObj, eeg);
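One way to explore this question is to sweep over a few fixed orders and compare the residuals. A sketch using the same var.arfit API shown above (the particular orders are arbitrary choices for illustration):

  % Fit the model at several fixed orders and compare the residual variance.
  for p = [2 11 30]
      obj = learn(var.arfit('MinOrder', p, 'MaxOrder', p), eeg);
      fprintf('order %2d: mean residual variance = %g\n', ...
          p, mean(var(obj.Residuals, 0, 2)));
  end
  % Then re-run the misc.acf loop on obj.Residuals to inspect the autocorrelations.
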

### What now?

Now you are ready to go to Part IV where we will use the model that we just fitted to our EEG data to generate EEG surrogates with a known information flow topology. Then, you will challenge a fellow student to discover the underlying topology that you generated.