Data Navigation, Visualization and Analysis


General Steps and Introduction: 

Analysis of a microarray experiment involves checking the general design of the experiment, navigating and quality-checking the hybridizations, selecting qualifying genes or probe sets (filtering) for further analysis, and multivariate data analysis with unsupervised and supervised learning methods. Navigation and visualization at the probe set level or probe level usually occur after probe set filtering.

For most users, the steps of using BarleyBase can be divided as follows:

  1. Log in to get access to all experiments open to you, then select an experiment and an estimation method.
  2. Data exploration, visualization, summary statistics and quality checking.
  3. Probe set (gene) filtering to create gene lists for further analysis. Here a resulting gene list includes both the gene names and their expression values in a given experiment.
  4. Gene list management and visualization.
  5. Data analysis -- Pattern Recognition
    1. Data transformation
    2. Setup analysis
    3. Analysis results visualization
    4. Management of analysis results: saving and deleting results, and saving sub-clusters as data sets for further analysis.

In general, BarleyBase data analysis flow is designed to:

  • Organize all data navigation, visualization and analysis in a coherent and unified fashion.

  • Provide a general guideline for the steps of microarray data analysis in BarleyBase, so that users can go through the process step by step.

  • Let users save and manipulate gene lists resulting from filtering and from analysis.

  • Make gene lists the starting point of pattern recognition analysis.

  • Let users save analysis results, as links to the web pages showing the results, together with the analysis parameters.


1. Select an Experiment

 After logging in, go to the browse experiment page to search for experiments of interest. Alternatively, you may skip this experiment query step and simply choose an experiment from the dropdown list.


2. Data Navigation and Visualization

Navigation and visualization for microarray data are available at different levels. This helps users choose suitable parameters for gene filtering and analysis. It also aids in diagnosing the quality of a microarray experiment.

2.1 At experiment level: 

  • Links are available on the experiment overview page. The page also shows the experiment description and the list of hybridizations.

  • Histograms of all perfect match (PM) probe intensities in all chips. These aid quick inspection of signal distributions across hybridizations.

  • Boxplots of all PM intensities in all chips. Useful for checking the distribution and the extent of outliers. Mismatch (MM) intensities will also be shown, side by side or on a separate graph.

  • RNA degradation plots and statistics. These show the general trend in RNA preparation and cRNA synthesis quality.

  • Boxplots of normalized RMA or MAS 5.0 expression estimates in all chips. Useful for checking the expression distribution and the extent of outliers after normalization.

2.2 At hybridization level:

  • Use the hybridization overview page to show the experiment description and the list of hybridization details. The experiment factor and factor level information here is key to designing the statistical analysis.

  • The hybridization overview page links to the hybridization detail pages and sample detail pages.

  • Image of PM intensities from the full-detail page, for visually checking global quality and any spatial abnormality.

  • Histogram of the PM and MM intensities of the hybridization. Usually, we can expect the PM distribution to be shifted toward higher intensities.

  • Statistics for raw intensity data (PM and MM).

  • Statistics from the Affymetrix MAS 5.0 report (*.RPT files), which contains Affymetrix's quality measures.

  • Variation at probe level within treatment: summary and histogram?

  • On-demand scatter plots and MvA plots for any two hybridizations or treatments. Ideally, differentially expressed (DE) genes would be highlighted when the user clicks on or mouses over the gene list. This uses the visualization page, and JavaScript-based interaction capabilities will be added.

  • On-demand MvA pairs plot matrix for all replicates per treatment, which shows the reproducibility among replicates.

All of the results mentioned above, except for the on-demand scatter plots and MvA plots, are pre-computed and retrieved from the database for speed.
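As a rough illustration of what an MvA plot displays, the sketch below computes the M (log ratio) and A (average log intensity) values for two hybridizations. It uses the conventional definitions M = log2(x) - log2(y) and A = (log2(x) + log2(y)) / 2; the function name `mva` and the sample intensities are made up for this example and do not describe how BarleyBase computes its plots internally.

```python
import math


def mva(x, y):
    """Return (M, A) lists for two equally long lists of positive intensities."""
    if len(x) != len(y):
        raise ValueError("both hybridizations must cover the same probe sets")
    # M: log2 ratio of the two intensities for each probe set
    m = [math.log2(xi) - math.log2(yi) for xi, yi in zip(x, y)]
    # A: average log2 intensity for each probe set
    a = [0.5 * (math.log2(xi) + math.log2(yi)) for xi, yi in zip(x, y)]
    return m, a


# Hypothetical expression values for three probe sets in two hybridizations
hyb1 = [128.0, 512.0, 64.0]
hyb2 = [64.0, 512.0, 256.0]
m_values, a_values = mva(hyb1, hyb2)
```

Plotting M against A for replicate hybridizations should give a cloud centered on M = 0; systematic departures from that line suggest intensity-dependent bias.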

2.3 At Gene List Level

  • Choose a gene list from the gene list management page. Line graphs and heatmaps are available for selected probe sets from the gene list.

2.4 At probe set (gene) level

  • Usually, users come here after conducting probe set filtering or after searching. Users can also come here from the exemplar and probe set detail pages.

  • A line graph is pre-computed for expression (MAS5 and RMA estimates) across the hybridizations in an experiment.

  • Expression view (heatmap) of expression profile neighbors, which may represent co-regulated genes.

  • For an example, please view the Contig15950_at page.

2.5 At probe level

  • Users can also come here from the exemplar and probe set detail pages by following the probe set-level links.

  • Bar plots with standard deviations shown allow comparison of intensities across hybridizations for the same probe, or across probe pairs for the same hybridization.

  • For an example, click the "Get barplot for Contig15950_at" link on the Contig15950_at page.


3. Gene or Probe Set Filtering,  and Identification of Differentially Expressed Genes  

This is the starting point for the real data analysis. It lets users select a subset of probe sets. Probe sets are also called microarray elements, features or genes in the literature. BarleyBase provides 3 types of filters:

  • Expression-based filters on expression profiles, all of which use the untransformed (not log-transformed) data.

    • Experiment level: rank of coefficient of variation filter, ANOVA p-value filter, fold change (FC) over reference treatment filter, absolute value filter, presence call filter and variation filter.

    • Subset level: statistical test filter, fold change (FC) filter, value filter, presence call filter and variation filter in user-defined groups or hybridizations.

    • More statistical methods for identifying DE genes will be added. These methods need to be able to reduce Type II error, combined with FDR control.

  • Biological context-based: sequence similarity (batch BLAST-based queries will be accepted in future), keyword, gene family and pathway.

  • Arbitrary: users can input their own list of probe set or exemplar names and retrieve the expression data for them. Probe set names, exemplar names or a mixture of both are accepted.

The results from the 3 types of filters can be combined through the "My Gene Lists" page.
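To make the experiment-level "rank of coefficient of variation" filter concrete, here is a minimal sketch that ranks probe sets by their coefficient of variation (standard deviation divided by mean, on untransformed values) and keeps the top n. The function names, the probe set names and the choice of the sample standard deviation are assumptions for illustration; the exact definition used by BarleyBase is not given here.

```python
import statistics


def cv(values):
    """Coefficient of variation of untransformed expression values."""
    mean = statistics.mean(values)
    if mean == 0:
        return 0.0
    return statistics.stdev(values) / mean


def top_by_cv(expression, n):
    """Return the n probe set names with the highest CV, highest first."""
    ranked = sorted(expression, key=lambda name: cv(expression[name]), reverse=True)
    return ranked[:n]


# Hypothetical probe sets with expression values across four hybridizations
expression = {
    "probe_a": [100.0, 110.0, 90.0, 100.0],
    "probe_b": [50.0, 200.0, 400.0, 150.0],
    "probe_c": [10.0, 10.0, 10.0, 10.0],
}
selected = top_by_cv(expression, 2)
```

A rank-based cut-off like this keeps a fixed number of the most variable probe sets, regardless of the absolute CV values in a given experiment.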


To identify genes differentially expressed under different conditions in microarray experiments, two methods are provided as part of the expression-based filters:

  • Fold change (FC) filter: examines the fold change in the expression level of a gene (probe set) between two conditions. If the ratio is above a predefined cut-off threshold (e.g. a two- or four-fold change), the gene is declared differentially expressed and is selected for further analysis. This approach is convenient but also problematic, because the cut-off value is set arbitrarily, and it is difficult to assess the rate of false positives (unchanged genes declared differentially expressed) and the rate of false negatives (missed differentially expressed genes).

  • Statistical test filter: currently, users can run a regular parametric t-test and the non-parametric Kruskal-Wallis rank sum test. More advanced statistical methods that can be used in conjunction with permutation tests to identify differentially expressed genes will be added when hardware allows.
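The two approaches above can be sketched as follows. The fold change filter compares the ratio of group means against a cut-off; treating a decrease (ratio at or below 1/cut-off) as a change too is an assumption of this sketch, as are the function and probe set names. The second function computes only the pooled-variance Student's t statistic; converting it to a p-value requires the t distribution, which is left to a statistics package.

```python
import math
import statistics


def fold_change(treatment, reference):
    """Ratio of mean untransformed expression, treatment over reference."""
    return statistics.mean(treatment) / statistics.mean(reference)


def passes_fc(treatment, reference, cutoff=2.0):
    """True if expression changed by at least `cutoff`-fold in either direction."""
    fc = fold_change(treatment, reference)
    return fc >= cutoff or fc <= 1.0 / cutoff


def student_t(group1, group2):
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * statistics.variance(group1)
              + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    diff = statistics.mean(group1) - statistics.mean(group2)
    return diff / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))


# Hypothetical data: (treatment replicates, reference replicates) per probe set
data = {
    "probe_up": ([400.0, 420.0, 380.0], [100.0, 110.0, 90.0]),
    "probe_flat": ([105.0, 95.0, 100.0], [100.0, 110.0, 90.0]),
    "probe_down": ([25.0, 30.0, 20.0], [100.0, 110.0, 90.0]),
}
selected = [name for name, (trt, ref) in data.items() if passes_fc(trt, ref)]
t_up = student_t(*data["probe_up"])
```

Unlike the fold change, the t statistic takes the variation among replicates into account, which is why a statistical test allows the false positive rate to be assessed.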

4. Gene List Management 

Management of gene lists:

  • After gene filtering, users may save the qualifying gene list for future analysis.

  • After analysis, users may save the data set for a given sub-cluster for future use.

  • Gene lists are saved as lists of probe set names, together with the filtering or analysis parameters and the expression values in an experiment.

  • Gene list management can be used for comparative analysis between experiments, between analysis methods, or between expression estimation methods.

A gene list includes the following information:

  • User ID and date.

  • Experiment ID and expression estimation method.

  • Filtering or analysis method and parameters.

  • List of probe sets (limited to 4000 probe sets).

  • Each saved data set can be renamed by the user, and will be shown in a dropdown list.

  • Registered users can save gene lists under their own ID; these lists are hidden from other users. For general users, "guest" is used as the ID, and their gene lists are visible to and editable by all users. Registration is therefore recommended to protect your work.

  • Data sets will be kept for 10 days.
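The information stored with a gene list can be pictured as a simple record. The sketch below is only an illustration of the fields listed above; the class name, field names, example values and the `EXP1` experiment ID are all hypothetical and do not reflect BarleyBase's actual database schema.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

MAX_PROBE_SETS = 4000   # limit stated in the list above
RETENTION_DAYS = 10     # retention period stated in the list above


@dataclass
class GeneList:
    name: str                 # user-editable name shown in the dropdown list
    user_id: str              # "guest" for unregistered users
    created: date
    experiment_id: str
    estimation_method: str    # e.g. "RMA" or "MAS5"
    parameters: dict = field(default_factory=dict)
    probe_sets: list = field(default_factory=list)

    def __post_init__(self):
        if len(self.probe_sets) > MAX_PROBE_SETS:
            raise ValueError("a gene list is limited to 4000 probe sets")

    def is_private(self):
        """Lists saved under the shared guest ID are visible to everyone."""
        return self.user_id != "guest"

    def expires_on(self):
        return self.created + timedelta(days=RETENTION_DAYS)


example = GeneList(
    name="cv_top_100",
    user_id="guest",
    created=date(2005, 1, 1),
    experiment_id="EXP1",      # hypothetical experiment ID
    estimation_method="RMA",
    parameters={"filter": "cv_rank", "top": 100},
    probe_sets=["Contig15950_at"],
)
```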

Creation of new gene list: 

  • Method 1. Create a new gene list with the same genes or probe sets as an existing data set, retrieving their expression values for another experiment and estimation method. This is most useful for cross-experiment comparison of probe set behavior.

  • Method 2. Users may combine several gene lists using the Boolean operators AND/OR/NOT/XOR/XOR2. This can be useful for comparing gene lists from different analysis results, or for constructing a composite filter for genes.
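Combining gene lists with Boolean operators amounts to set operations on probe set names. The sketch below covers AND, OR, NOT and XOR; reading NOT as "in the first list but not in the second" is an assumption, and XOR2 is left out because its definition is not given here. The function name and probe set names are made up for illustration.

```python
def combine(list_a, list_b, operator):
    """Combine two lists of probe set names with a Boolean operator."""
    a, b = set(list_a), set(list_b)
    if operator == "AND":
        result = a & b          # in both lists
    elif operator == "OR":
        result = a | b          # in either list
    elif operator == "NOT":
        result = a - b          # in the first list but not the second (assumed meaning)
    elif operator == "XOR":
        result = a ^ b          # in exactly one of the two lists
    else:
        raise ValueError("unsupported operator: " + operator)
    return sorted(result)


# Hypothetical gene lists, e.g. from a fold change filter and a t-test filter
fc_list = ["probe_a", "probe_b", "probe_c"]
ttest_list = ["probe_b", "probe_c", "probe_d"]

both = combine(fc_list, ttest_list, "AND")
either = combine(fc_list, ttest_list, "OR")
fc_only = combine(fc_list, ttest_list, "NOT")
exactly_one = combine(fc_list, ttest_list, "XOR")
```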

Gene list deletion

  • Users may delete their data sets.

Visualization and simple analysis for gene lists:

  • Tables of annotation or expression.

  • Expression Profile Anti-Neighbor plot – TO DO.

  • Expression profile line graphs and heatmaps for user-chosen genes from the gene list.

Gene list data download:

  • Tab-delimited ASCII text files for annotation and expression.

  • Prepared data for TMEV, CLUSTER, GENECLUSTER, GENESPRING -- TO DO.


5. Data analysis -- Pattern Recognition

Unsupervised pattern recognition methods are implemented for online analysis. Methods include hierarchical clustering, k-means partitioning, SOM, PCA and Sammon's non-linear mapping. The analysis help page describes each method in detail.

All analyses start from a saved gene list.
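As an illustration of one of these methods, the following is a minimal k-means partitioning sketch over expression profiles. It uses Euclidean distance and caller-supplied starting centers so that the result is reproducible; these choices, the function name and the sample profiles are assumptions for the example and are not a description of the BarleyBase implementation.

```python
import math


def kmeans(profiles, centers, max_iter=100):
    """Assign each named profile to the nearest center; return {name: cluster index}."""
    names = list(profiles)
    centers = [list(c) for c in centers]
    assignment = {}
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance
        new_assignment = {
            name: min(range(len(centers)),
                      key=lambda k: math.dist(profiles[name], centers[k]))
            for name in names
        }
        if new_assignment == assignment:
            break  # converged: no profile changed cluster
        assignment = new_assignment
        # Update step: move each center to the mean of its members
        for k in range(len(centers)):
            members = [profiles[n] for n in names if assignment[n] == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical centered log2 profiles across three hybridizations
profiles = {
    "g1": [1.0, 0.0, -1.0],
    "g2": [0.9, 0.1, -1.0],
    "g3": [-1.0, 0.0, 1.0],
    "g4": [-0.8, -0.2, 1.0],
}
clusters = kmeans(profiles, centers=[profiles["g1"], profiles["g3"]])
```

Here the two falling profiles (g1, g2) end up in one partition and the two rising profiles (g3, g4) in the other.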

  • Setup analysis

  • Data transformation

  • Analysis and results visualization

  • Management of analysis results: saving, deletion, combination.

5.1 Setup analysis

Use the "Analysis Set Up" page. Here users can select a data set for analysis and run one or more analyses.

5.2 Data transformation

Transformations are carried out in the following order:

  • Log2 transformation.

  • Mean or median centering, after log2 transformation.

  • Scaling by dividing by the standard deviation of the probe set in the experiment.

  • For Affymetrix GeneChip single-channel expression data, the centering needs to be done on log2-transformed data.
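The transformation order above can be sketched for a single probe set as follows. The function name, its parameters and the use of the sample standard deviation of the log2 values are assumptions made for this illustration.

```python
import math
import statistics


def transform(values, center="mean", scale=True):
    """Log2-transform, center, then optionally scale one probe set's values."""
    # Step 1: log2 transformation of the untransformed expression values
    logged = [math.log2(v) for v in values]
    # Step 2: mean or median centering on the log2 scale
    if center == "mean":
        middle = statistics.mean(logged)
    elif center == "median":
        middle = statistics.median(logged)
    else:
        raise ValueError("center must be 'mean' or 'median'")
    result = [v - middle for v in logged]
    # Step 3: divide by the probe set's standard deviation across hybridizations
    if scale:
        sd = statistics.stdev(logged)
        if sd > 0:
            result = [v / sd for v in result]
    return result


# Hypothetical expression values for one probe set in three hybridizations
transformed = transform([2.0, 8.0, 32.0])
centered_only = transform([2.0, 8.0, 32.0], center="median", scale=False)
```

After centering and scaling, each probe set has mean 0 and standard deviation 1, so clustering groups probe sets by the shape of their profile rather than by absolute expression level.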

5.3 Analysis results visualization information

  • General information: analysis method and parameters, and data set name.

  • Cluster information: number of clusters and per-cluster statistics.

  • Expression graphs for the whole gene list: dendrogram and heatmap (hierarchical clustering), partition (PAM), grid (SOM), 3D scatter plot (PCA or Sammon's MDS).

  • Expression graphs for sub-clusters: line graphs and heatmaps per cluster.

  • Users may save the probe sets in a cluster or partition of interest as a new gene list for use in:

    • Refined analysis.

    • Method comparison: compare with similar clusters obtained by other methods, or using other analysis parameters.

    • Construction of a new combined gene list.

5.4 Management of analysis results: saving, deletion, combination

  • Users may delete their results; otherwise the results will be kept for 2 weeks.

  • View any one, several or all clusters from a saved analysis.

  • Combine the associated data sets from selected clusters.

 

Copyright@2001-2005 The BarleyBase Group
All rights reserved.

For problems with the webpages, contact barleybasewebmaster