In this demo the following features in LatentiX will be demonstrated:

• addition of variables from external files via the clipboard
• renaming variables
• deleting variables
• handling category variables
• colouring plots by variables and sets
• creating calibration- and validation-sets (using “Set composer”)
• creating object- and variable sets (using “Create sets”)
• variable selection (Principal Variables)
• making predictions
• plotting the prediction results
• transferring results (tables and plots) to reports

and more ...

The demo data is available from the internet:

The original paper

See also: http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp

Comments on the paper

There might be some problems with the experimental design ...

Response to comments

From: http://www.mathworks.com/products/bioinfo/demos.html?file=/products/demos/shipping/bioinfo/mspreprodemo.html

With the MATLAB® Bioinformatics Toolbox® the data are pre-processed:

Let’s look at the data in LatentiX:

Load the dataset from the “Demo datasets” menu.

Note: the number of available datasets can vary

The dataset consists of 216 objects and 4000 measured variables.

The category variable is found in a separate Excel-file, and is added to the instrumental data.

Open the Excel file Ovarian_cancer.xls, mark the range and select Copy.

Return to LatentiX and select “Add variables from clipboard ...”

Change the numeric value corresponding to “Normal” to “0” (zero)

Because some of the imported data are non-numeric, you can automatically create a category variable
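
The idea behind a category variable can be sketched in a few lines of Python: each text label is mapped to a number, with “Normal” given the value 0. This is only an illustration of the principle. The label “Cancer” and the function name encode_categories are assumptions made for the example, not names taken from LatentiX or from the Excel file.

```python
def encode_categories(labels, zero_label="Normal"):
    """Map text labels to integer codes, giving zero_label the code 0."""
    codes = {zero_label: 0}
    for label in labels:
        if label not in codes:
            codes[label] = len(codes)  # next unused code: 1, 2, ...
    return [codes[label] for label in labels], codes


# Hypothetical labels; the real ones come from Ovarian_cancer.xls.
labels = ["Cancer", "Normal", "Cancer", "Normal"]
values, codes = encode_categories(labels)
```

Forcing “Normal” to 0 means the numeric variable can be read as “0 = healthy, 1 = sick” no matter which label happens to come first in the file.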

Give the variable a better name:

NOTE: Due to a bug in LatentiX, you have to import at least two variables, and then delete the unnecessary variable afterwards.

Delete the variable “obj. no.”:

We now have 216 objects and 4001 variables, the last one being the category variable “Cancer”.

This is a good time to save the data to disk – here the data are saved in the folder “C:\temp\My Latentix files”

De-select the variable “Cancer” by holding the Ctrl key down while clicking on its name in the variable list box.

Next, autoscale the instrumental variables.
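
For readers who want to see what autoscaling does to the numbers, here is a minimal Python sketch: each variable is mean-centred and divided by its standard deviation. It assumes the sample standard deviation (n - 1 in the denominator); whether LatentiX uses exactly this convention is not stated in the demo. The function name autoscale and the toy data are made up for the example.

```python
import statistics


def autoscale(columns):
    """Mean-centre each variable (column) and scale it to unit standard deviation."""
    scaled = []
    for col in columns:
        mean = statistics.fmean(col)
        sd = statistics.stdev(col)  # sample standard deviation (n - 1), an assumption
        if sd == 0:
            # A constant variable carries no information: return zeros
            # instead of dividing by zero.
            scaled.append([0.0 for _ in col])
        else:
            scaled.append([(v - mean) / sd for v in col])
    return scaled


# Toy data: two variables measured on four objects.
data = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 30.0, 30.0]]
scaled = autoscale(data)
```

After autoscaling, all variables have the same weight in the model, regardless of their original units or intensity.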

The plot is immediately updated

Select PLS as model type

Click on “Y”

Select “Cancer” as the Y-variable

Click “Calculate”. Choose “CV: Random (repeated)” as validation method.
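
The principle of repeated random cross-validation can be sketched as follows: the objects are shuffled and divided into segments, each segment is left out once, and the whole procedure is repeated with a new shuffle. This is a generic sketch, not LatentiX code. The number of segments (6) and repeats (3) are arbitrary values chosen for the example; the demo does not state the settings used.

```python
import random


def repeated_random_cv(n_objects, n_segments, n_repeats, seed=0):
    """Yield (train, test) index lists for repeated random cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_objects))
        rng.shuffle(order)  # new random order for every repeat
        for s in range(n_segments):
            test = sorted(order[s::n_segments])  # every n_segments-th shuffled object
            held_out = set(test)
            train = [i for i in range(n_objects) if i not in held_out]
            yield train, test


# 216 objects as in the demo; segment and repeat counts are assumptions.
splits = list(repeated_random_cv(216, n_segments=6, n_repeats=3))
```

Within one repeat every object is left out exactly once, so each object gets one cross-validated prediction per repeat.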

Give the model a good name: click on “Name”

Let’s have a look at the scores plot

It’s a good idea to take a note!

There seems to be some discriminating power in the 4000 variables

We now create two object sets “Healthy” and “Sick”

Select “Color according to” “Continuous” and select “Cancer”

Go to the workbench and select Objects, Create sets (shortcut: ALT+O, C):

In “Criteria 1” select “Where Cancer == 0”, click “Create sets” and give the set a name.

“Cancer == 0” is suggested, but change it to e.g. “Healthy” and click OK.

In “Criteria 1” now select “Where Cancer == 1” and follow the same procedure ...
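
What “Create sets” does with a criterion can be illustrated in Python: a set is simply the list of objects for which the criterion is true. The function name create_set and the six category values below are hypothetical and only serve the example.

```python
def create_set(values, criterion):
    """Return the indices of the objects whose value fulfils the criterion."""
    return [i for i, v in enumerate(values) if criterion(v)]


# Hypothetical values of the category variable "Cancer" for six objects.
cancer = [0, 1, 1, 0, 1, 0]

healthy = create_set(cancer, lambda v: v == 0)  # "Where Cancer == 0"
sick = create_set(cancer, lambda v: v == 1)     # "Where Cancer == 1"
```

Every object ends up in exactly one of the two sets, which is why colouring by the sets gives the same picture as colouring by the variable.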

We look at the scores again now colouring by the two sets:

We get, of course, exactly the same plot, but the legends are more meaningful

We have used all 216 objects and all 4000 variables until now.

To avoid over-fitting when selecting “good” subsets of variables, we will split the objects into a calibration set and a validation set.

For that purpose, we use the “Set composer ...”

Select “Color according to” “Sets” and select the two new sets “Healthy” and “Sick”

Go to the workbench and select Objects, Set composer (shortcut: ALT+O, O):

Sort by “Data value”

Select “Cancer”

Right-click in “Search result”

NOTE: You might have to select “Sort method: Alphabetic” once and then “Sort method: Data value” again to get the picture shown.

Click “Exit”
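
One simple way to obtain a calibration and a validation set from objects sorted by “Cancer” is to send every fourth object in the sorted list to the validation set. With 216 objects this gives 54 validation and 162 calibration objects, which matches the size of “CAL” quoted later in the demo. Whether the Set composer was used in exactly this way is an assumption, and the category values below are made up.

```python
def split_every_nth(values, n=4):
    """Sort objects by value and send every n-th one to the validation set."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    val = [idx for pos, idx in enumerate(order) if pos % n == n - 1]
    val_set = set(val)
    cal = [idx for idx in order if idx not in val_set]
    return sorted(cal), sorted(val)


# Made-up category values for 216 objects (the real ones come from the data).
cancer = [i % 2 for i in range(216)]
cal, val = split_every_nth(cancer)
```

Because the objects are sorted by the category variable first, both groups are represented in both sets in roughly the same proportion.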

Calculate the Principal Variables.

It is very important that this is based on the calibration set only – beware of over-fitting!

Select matrix X to enable the menu “Principal Variables” (only available in the full version)

Select the 16 variables which are most descriptive for “Cancer”.

These 16 variables describe 90% of the total variation.

Click “Select in workbench” and close the window

When the “Principal variables” window is closed, the 16 important variables are highlighted in the variables box.
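
The demo does not show how the Principal Variables are calculated. The sketch below shows one common forward-selection scheme with the same flavour: pick the variable that explains most of y, remove (deflate) its contribution from X and y, and repeat. It is an assumption that this resembles what LatentiX does; the function name principal_variables and the toy data are hypothetical, and NumPy is required.

```python
import numpy as np


def principal_variables(X, y, n_select):
    """Forward selection: repeatedly pick the variable that explains most of y."""
    X = X - X.mean(axis=0)  # centre (returns a new array, input is untouched)
    y = y - y.mean()
    selected = []
    for _ in range(n_select):
        norms = (X ** 2).sum(axis=0)
        cov = X.T @ y
        # Squared covariance with y relative to the variable's own variation;
        # variables that are already deflated to ~zero get score 0.
        score = np.where(norms > 1e-12, cov ** 2 / np.maximum(norms, 1e-12), 0.0)
        j = int(np.argmax(score))
        selected.append(j)
        xj = X[:, j].copy()
        denom = xj @ xj
        # Deflate: remove everything that can be described by the chosen variable.
        y = y - xj * (xj @ y) / denom
        X = X - np.outer(xj, xj @ X) / denom
    return selected


# Toy data: y depends on variables 3 and 5 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
y = 2.0 * X[:, 3] - X[:, 5]
selected = principal_variables(X, y, n_select=2)
```

In the demo the selection must be run on the calibration objects only, i.e. with the rows of X and y restricted to “CAL”.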

It is convenient to define a set of these variables.

Select Variables, Define set ... in the workbench and write a name, e.g. “PV-16”

Select “Color by Cancer”

Calculate a new PLS-model using only the calibration set “CAL” (162 objects) and only the 16 principal variables “PV-16”.

Use the same settings (validation method etc.) as before and give the model a good name.

Choose Plot, Scores to see the scores plot

The subset of 16 principal variables gives a better discrimination between sick and healthy people than did the 4000 original variables.

A lot of noisy and irrelevant variables have been removed.

We will now test the model on the independent objects, i.e. objects that have participated neither in the selection of variables nor in the PLS-model.

Go to the workbench and make a prediction of the set “VAL” using the PLS-model based on the set “CAL”:
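
The calibrate-then-predict step can be sketched with a small PLS1 implementation (NIPALS algorithm) in Python with NumPy. This is a textbook version shown for illustration, not the LatentiX implementation. The simulated data, the split into 45 calibration and 15 validation objects, and the choice of two components are assumptions made for the example.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with NIPALS; returns (x_mean, y_mean, regression vector)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)   # weight vector
        t = E @ w                # scores
        tt = t @ t
        p = E.T @ t / tt         # X-loadings
        c = f @ t / tt           # y-loading
        E = E - np.outer(t, p)   # deflate X
        f = f - c * t            # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients
    return x_mean, y_mean, b


def pls1_predict(model, X):
    """Predict y for new objects using the calibration means and coefficients."""
    x_mean, y_mean, b = model
    return (X - x_mean) @ b + y_mean


# Simulated data: variable 0 differs between the two groups (0 and 1).
rng = np.random.default_rng(0)
labels = np.arange(60) % 2
X = rng.normal(size=(60, 5))
X[:, 0] += 4.0 * labels

cal, val = np.arange(45), np.arange(45, 60)  # stand-ins for "CAL" and "VAL"
model = pls1_fit(X[cal], labels[cal].astype(float), n_components=2)
pred = pls1_predict(model, X[val])
```

Note that the validation objects are centred with the calibration means; nothing from “VAL” is used when the model is built.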

Select “Options, Lines on selected sets” to emphasise the grouping

A clear distinction between sick and healthy is also seen among the independent validation objects.

Thus, the selected variables are interesting and may be worth a closer study.

You might want to make plots of loadings or regression coefficients and copy them to the Windows clipboard or directly to PowerPoint – see the next slides:

[Figure: PLS Loadings [Model 2: PLS on CAL using 16 principal varia ...]. Horizontal axis: Loadings PC#1 (14.016%); vertical axis: Loadings PC#2 (15.881%). Labelled points: Cancer, Var #2332, Var #3734, Var #1891, Var #1045, Var #2734, Var #3502, Var #2162, Var #919, Var #2036, Var #3136, Var #2588, Var #2351, Var #3474, Var #905, Var #2762, Var #2814.]

Make a Loadings plot and select “Tables, Current plot” (ALT+T, C). Paste into Excel.

Select “Plots, Copy plot to PowerPoint” (ALT+P, O)

or “Plots, Copy plot to Clipboard” (ALT+P, D)

You could also look at the regression coefficients (the plot is pasted into Excel from the clipboard):

THE END

Note that the group Healthy was measured on day #1, whereas the group Sick was measured on days #2 and #3.

Thus, we cannot tell whether the revealed effects are due to human disease or to changes in the instrument. This is a problem.

The principles shown in this demo are, however, valid.
