
Tutorial on Expression Profile Filters


Table of Contents

Introduction
1. Absolute Value Filter
2. MAS5.0 Call Filter
3. Variation Filter
4. Fold Change Filter
5. Statistical Test Filter
6. ANOVA Filter
7. Variation Rank Filter
8. Composite Filters Customized
9. Usage of Expression Profile Filters


Introduction

Microarray GeneChips measure many probe sets at once, usually more than 20K genes for Affymetrix oligonucleotide chips. Most genes remain invariant within a single experiment. To analyze such high-dimensional data effectively, the invariant genes need to be removed so that only the differentially expressed genes are retained for further analysis. This step also helps to overcome memory and computation-time limits during analysis. In addition, probe sets with very low or extremely high expression often contribute noise that harms statistical analysis, so they should be excluded from subsequent statistical analysis.

Most filters are designed to find the most variable genes. Variation can be measured in different ways, so the filters differ, each targeting a different aspect of variation. The "Absolute Value Filter" and the "MAS5.0 Call Filter" are mainly designed to remove genes that do not show reliable expression.

Different data filters can be combined at the filter page before filtering, or at the "My Gene List" page after filtering.



1. Absolute Value Filter

Select probe sets whose absolute values fall within the desired range, using option A (in all arrays from the whole experiment) or option B (in selected arrays). For Robust Multi-array Average (RMA) estimates, use the log2-transformed values here.

This filter lets users remove genes, or more precisely probe sets, whose expression is too low to be regarded as expressed or too high to be regarded as specific. Low-expression probe sets are highly variable; including them in the analysis may add considerable noise to the data set and bias the results. Extremely high expression values may reflect cross-hybridization or other types of non-specific hybridization.
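
The range check described above can be sketched in a few lines of Python. This is only an illustration, not the site's implementation: the function name `absolute_value_filter`, the dictionary layout, and the example values are all assumptions made for this sketch.

```python
def absolute_value_filter(expr, low, high, arrays=None):
    """Keep probe sets whose values lie in [low, high] in every checked array.

    expr   : dict mapping probe-set ID -> list of expression values, one per array
    arrays : indices of the arrays to check (option B); None checks all (option A)
    """
    kept = []
    for probe_set, values in expr.items():
        checked = values if arrays is None else [values[i] for i in arrays]
        if all(low <= v <= high for v in checked):
            kept.append(probe_set)
    return kept


# Hypothetical log2-scale values for three probe sets on three arrays.
expr = {
    "ps_low": [2.1, 2.4, 6.0],
    "ps_mid": [7.5, 8.0, 8.2],
    "ps_high": [15.2, 15.5, 15.1],
}
print(absolute_value_filter(expr, low=5.0, high=14.0))              # option A
print(absolute_value_filter(expr, low=5.0, high=14.0, arrays=[2]))  # option B
```

With option A only the mid-range probe set passes; restricting the check to the third array (option B) also lets the low probe set through, because its value on that array is inside the range.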



2. MAS5.0 Call Filter

This filter selects probe sets declared "Present" by the Affymetrix MAS5.0 Suite, using option A (in all arrays from the whole experiment) or option B (in selected arrays). The MAS5.0 Suite determines the Presence/Absence call based on the absolute values and variations in the raw probe-level intensities of a probe set.

This filter removes probe sets that Affymetrix MAS5.0 regards as not expressed (absent) in the required number of hybridizations from the experiment. To fine-tune the filter, use option B, which lets the user specify the hybridizations in which probe sets are required to be expressed.
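
A minimal sketch of such a call-based filter follows. It assumes the detection calls have already been exported as the letters "P" (present), "M" (marginal) and "A" (absent); the function name `present_call_filter` and the `min_present` parameter are hypothetical and only stand in for the "required number of hybridizations".

```python
def present_call_filter(calls, arrays=None, min_present=None):
    """Keep probe sets called "P" in at least `min_present` of the checked arrays.

    calls       : dict mapping probe-set ID -> list of calls ("P", "M" or "A")
    arrays      : indices of arrays to check (option B); None checks all (option A)
    min_present : required number of "P" calls; None requires every checked array
    """
    kept = []
    for probe_set, flags in calls.items():
        checked = flags if arrays is None else [flags[i] for i in arrays]
        required = len(checked) if min_present is None else min_present
        if sum(flag == "P" for flag in checked) >= required:
            kept.append(probe_set)
    return kept


# Hypothetical calls for three probe sets on four arrays.
calls = {
    "ps1": ["P", "P", "P", "P"],
    "ps2": ["P", "A", "M", "P"],
    "ps3": ["A", "A", "A", "P"],
}
print(present_call_filter(calls))                  # present in all arrays
print(present_call_filter(calls, arrays=[0, 3]))   # present in arrays 0 and 3
print(present_call_filter(calls, min_present=1))   # present in at least one array
```

Marginal calls are treated as not present here; that is a choice made for the sketch, not something the tutorial specifies.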



3. Variation Filter

The variation filter selects probe sets whose absolute values satisfy max - min >= x1 AND max/min >= x2 (fold), using option A (in all arrays from the whole experiment) or option B (in selected arrays).

This filter aims to find genes with high variability across all (option A) or selected (option B) hybridizations, in terms of both absolute and relative variation. The absolute-variation threshold ensures that the selected genes show both high relative variation and reliable expression estimates, and it excludes genes with very low expression values, whose estimates are not reliable.
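
The two conditions can be illustrated with a short Python sketch. The function name `variation_filter`, the data layout, and the example values (taken to be on a linear, non-log scale) are assumptions for illustration only.

```python
def variation_filter(expr, min_diff, min_ratio, arrays=None):
    """Keep probe sets with max - min >= min_diff AND max / min >= min_ratio.

    expr   : dict mapping probe-set ID -> list of expression values, one per array
    arrays : indices of arrays to check (option B); None checks all (option A)
    """
    kept = []
    for probe_set, values in expr.items():
        checked = values if arrays is None else [values[i] for i in arrays]
        highest, lowest = max(checked), min(checked)
        # A non-positive minimum makes the ratio meaningless, so skip the probe set.
        if lowest <= 0:
            continue
        if highest - lowest >= min_diff and highest / lowest >= min_ratio:
            kept.append(probe_set)
    return kept


# Hypothetical linear-scale values for three probe sets on three arrays.
expr = {
    "ps_flat": [100.0, 110.0, 105.0],      # small difference, small ratio
    "ps_low_noisy": [5.0, 20.0, 8.0],      # 4-fold ratio, but tiny difference
    "ps_variable": [100.0, 450.0, 220.0],  # large difference and large ratio
}
print(variation_filter(expr, min_diff=100.0, min_ratio=2.0))
```

The second probe set shows why both thresholds are used together: its ratio is 4-fold, but the absolute difference of 15 units is too small to trust, so it is excluded.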



4. Fold Change Filter

The fold change filter selects probe sets whose mean expression value in group A increases, decreases, or changes in either direction (Increase/Decrease/Both) relative to that of group B by at least Y fold. Users can apply it to find probe sets (genes) with a desired relative change between two groups. The groups can be defined flexibly according to the goal of the data analysis.

Although some statisticians caution against the use of this filter, it can be very useful when applied with care. You can use it to find genes that are differentially expressed between two hybridizations by assigning each hybridization to its own group. The filter can also find genes that show a credible expression change between treatments but have large variation under both treatments; such genes may be missed by statistical tests because of the large variance.

Check the "Fold Change Filter" checkbox and define the two groups to be compared. Enter the fold threshold by which the expression of genes in one group should exceed that of the other group. Select "Increase" or "Decrease" to define which group should have the higher expression. Select "Both" to keep genes for which either group's expression level exceeds the other's by the defined fold threshold.
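
The following Python sketch shows one way the group-mean comparison could work. It assumes linear-scale (not log-transformed) values, and the function name `fold_change_filter`, the `direction` strings, and the example data are hypothetical.

```python
from statistics import mean


def fold_change_filter(expr, group_a, group_b, fold, direction="both"):
    """Keep probe sets whose group means differ by at least `fold`.

    expr      : dict mapping probe-set ID -> list of linear-scale values per array
    group_a/b : lists of array indices defining the two groups
    direction : "increase" (A over B), "decrease" (B over A) or "both"
    """
    kept = []
    for probe_set, values in expr.items():
        mean_a = mean(values[i] for i in group_a)
        mean_b = mean(values[i] for i in group_b)
        # Ratios are undefined for non-positive means, so skip those probe sets.
        if mean_a <= 0 or mean_b <= 0:
            continue
        increased = mean_a / mean_b >= fold
        decreased = mean_b / mean_a >= fold
        keep = {
            "increase": increased,
            "decrease": decreased,
            "both": increased or decreased,
        }[direction]
        if keep:
            kept.append(probe_set)
    return kept


# Hypothetical data: arrays 0-1 form group A, arrays 2-3 form group B.
expr = {
    "ps_up": [400.0, 420.0, 100.0, 110.0],
    "ps_down": [50.0, 60.0, 200.0, 240.0],
    "ps_same": [100.0, 105.0, 98.0, 102.0],
}
for direction in ("increase", "decrease", "both"):
    print(direction, fold_change_filter(expr, [0, 1], [2, 3], fold=2.0, direction=direction))
```

To compare two single hybridizations, as suggested above, each group would simply contain one array index.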



5. Statistical Test Filter

Select genes whose raw or FWER-adjusted p-value in the statistical test is <= p1 and whose FDR is <= p2. The test is carried out on each gene across the specified groups of hybridizations.

The statistical filters perform two-sample or multiple-sample tests, parametric or non-parametric, on user-defined groups of hybridizations. The groups may be collections of hybridizations that share a meaningful similarity in treatment. Because microarray experiments usually have few replicates, the assumptions of the tests are often violated, so it is advisable to interpret the results with caution.

For two-sample tests, the two groups of the t-test can be assumed to have unequal variances (Welch t-test) or equal variances (classical Student's t-test). The corresponding non-parametric method is the Wilcoxon rank sum test, also known as the Mann-Whitney test. The Local Pooled Error (LPE) test is one of the methods proposed to compensate for the low number of replicates in most microarray experiments. It divides the gene list, ordered by expression value, into quantiles and calculates a pooled variance for each quantile. This value is used as the variance for all genes in that quantile.
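
To make the difference between the two t-tests concrete, the sketch below computes the test statistic and degrees of freedom for one gene under each variance assumption. It uses only the Python standard library and stops short of the p-value, which would come from the t distribution (for example via `scipy.stats.ttest_ind`, with `equal_var=False` for the Welch variant). The function names and the example values are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Classical Student's t: both groups share one pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch t: each group keeps its own variance; df by Welch-Satterthwaite."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical log2 values for one probe set: three replicates per treatment.
treated = [8.1, 8.4, 8.3]
control = [6.9, 7.6, 7.1]
print("Student:", student_t(treated, control))
print("Welch:  ", welch_t(treated, control))
```

With equal group sizes the two statistics coincide and only the degrees of freedom differ; the Welch version has fewer, which gives a larger p-value when the variances are unequal.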

The parametric one-way ANOVA and its non-parametric counterpart, the Kruskal-Wallis rank sum test, are used when multiple groups need to be compared together.

Currently, all these tests assume that the hybridizations in each group come from the same random population. Please check that this assumption holds for your defined groups before using the tests.

Each Barley1 or ATH1 GeneChip has about 22K probe sets, so there is a multiple-testing problem that must be corrected. We offer adjusted p-values for simple multiple testing procedures using functions from Bioconductor's multtest package, which compute adjusted p-values from a vector of raw (unadjusted) p-values. The procedures include the Bonferroni and Holm (1979) procedures for strong control of the family-wise Type I error rate (FWER), and the Benjamini & Hochberg (1995) and Benjamini & Yekutieli (2001) procedures for (strong) control of the false discovery rate (FDR). FDR-based procedures are more powerful, because they control the false discovery rate at the desired level rather than the probability of any false positive. The FWER methods are simply too conservative for most microarray data sets.
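
For readers who want to see what the adjustments do, here is a plain-Python sketch of three of the named procedures (Bonferroni, Holm, and Benjamini & Hochberg). It is written from the textbook definitions and is not the multtest code; the function names and the four example p-values are assumptions.

```python
def bonferroni(pvals):
    """Multiply every raw p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Step-down Holm adjustment (controls the FWER)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Step-up Benjamini-Hochberg adjustment (controls the FDR)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from the largest p downwards
        i = order[rank]
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four probe sets.
raw = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", [round(p, 4) for p in bonferroni(raw)])
print("Holm:      ", [round(p, 4) for p in holm(raw)])
print("BH (FDR):  ", [round(p, 4) for p in benjamini_hochberg(raw)])
```

For any input, the Benjamini & Hochberg values are never larger than the Holm values, which are never larger than the Bonferroni values. This ordering is what makes the FWER procedures more conservative at the same cutoff.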


6. ANOVA Filter

This filter selects probe sets with differences among treatments. Each group represents replicate hybridizations receiving the same treatment of the experimental factors. A one-way ANOVA is performed. The treatment (group) designations are listed in the hybridization summary table at the bottom of the expression profile query page.
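
As an illustration of the test behind this filter, the sketch below computes the one-way ANOVA F statistic for a single probe set from its treatment groups. The function name `one_way_anova_f` and the example values are assumptions; the p-value lookup from the F distribution is left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of replicate groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical log2 values: three treatments with three replicates each.
groups = [
    [7.0, 8.0, 9.0],
    [8.0, 9.0, 10.0],
    [11.0, 12.0, 13.0],
]
print(one_way_anova_f(groups))
```

A large F means the treatment means differ by more than the replicate-to-replicate scatter would explain.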



7. Variation Rank Filter: Select the Probe Sets Most Variable in an Experiment

This filter selects the X% of probe sets with the highest variation across hybridizations. Variation is measured with the coefficient of variation (CV) of expression, which is independent of scale. Unlike the ANOVA filter, it does not consider the group membership of individual hybridizations.
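
A minimal sketch of ranking by CV, assuming linear-scale values with positive means. The function name `variation_rank_filter`, the rounding-up of the cutoff, and the example data are choices made for this illustration only.

```python
from math import ceil
from statistics import mean, stdev


def variation_rank_filter(expr, top_percent):
    """Return the top `top_percent` % of probe sets ranked by CV = sd / mean."""
    cv = {}
    for probe_set, values in expr.items():
        m = mean(values)
        if m > 0:  # CV is only meaningful for a positive mean
            cv[probe_set] = stdev(values) / m
    n_keep = ceil(len(cv) * top_percent / 100)
    ranked = sorted(cv, key=cv.get, reverse=True)  # most variable first
    return ranked[:n_keep]


# Hypothetical linear-scale values for four probe sets on three arrays.
expr = {
    "ps_a": [100.0, 100.0, 100.0],
    "ps_b": [100.0, 200.0, 300.0],
    "ps_c": [10.0, 20.0, 60.0],
    "ps_d": [50.0, 55.0, 60.0],
}
print(variation_rank_filter(expr, top_percent=50))
```

Because the standard deviation is divided by the mean, a weakly expressed probe set such as `ps_c` can outrank a strongly expressed one such as `ps_b`, so this filter is often worth combining with an absolute value filter.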


8. Composite Filters Customized

The composite filter combines the power of several filters. One example use is to select genes with significant variation, as obtained from filters 3 to 8, that at the same time meet the requirements specified in filters 1 or 2.
Complex combinations can be constructed with the Boolean operators "AND" and "OR", with precedence defined by parentheses, e.g. (1 OR 2) AND (3 OR 7).
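
One way to read such an expression is as set algebra on the probe-set lists returned by the individual filters: "AND" as intersection and "OR" as union. The sketch below assumes that interpretation; the result sets are made up, and the dictionary keys simply reuse the filter numbers from the example expression.

```python
# Hypothetical results of four individual filters, keyed by filter number.
results = {
    1: {"ps1", "ps2", "ps3"},
    2: {"ps3", "ps4"},
    3: {"ps2", "ps5"},
    7: {"ps4", "ps6"},
}

# (1 OR 2) AND (3 OR 7): union inside each parenthesis, then intersection.
with_parentheses = (results[1] | results[2]) & (results[3] | results[7])
print(sorted(with_parentheses))

# Without parentheses, Python evaluates & before |, i.e. 1 OR (2 AND 3) OR 7.
# The site's own default precedence is not stated, so parentheses are safer.
without_parentheses = results[1] | results[2] & results[3] | results[7]
print(sorted(without_parentheses))
```

The two results differ, which is why the grouping should be spelled out explicitly whenever "AND" and "OR" are mixed.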


9. Usage of Expression Profile Filters

Search by Gene Expression Profiles: the user can select one experiment and find probe sets showing a desired expression profile, in absolute value and variation, using multiple filters.

1. Select "Expression Profile" item from the menu under "Analysis & Viz." --> "Create Gene List".

2. Select the experiment and the expression estimation/normalization method from the drop-down lists.

3. Select one or more filters by checking the checkbox to the left of each filter. The "Multiple filter" or "Composite Filter" button must be checked if you want to run several filters at a time; otherwise, only the first checked filter will be used.

4. On the "Composite Filter" page, define a combined filter in the textbox of filter 9 using Boolean operators. If none is specified, each checked filter is run separately. If one is specified, the combined filter is run along with the separate filters.

5. Press "Run" button to conduct the query.

6. On the resulting page, choose ONE filter result you are interested in by checking the checkbox. Note: If more than one is checked, only the first checked one will be used.

7. To save the results to a gene list or download them in tab-delimited format, press the "Save" button.

8. To analyze the data set, click "Analyze" on the result page to perform hierarchical clustering, K-means partitioning, SOM, Sammon's multidimensional scaling (MDS), or principal component analysis (PCA).


 

Copyright@2001-2005 The BarleyBase Group
All rights reserved.

For problems with the webpages, contact barleybasewebmaster