
Functionalize Expression (FuncExpression)

Lishuang Shen


1. Introduction

FuncExpression is a web-based resource for the functional interpretation of high-throughput genomics data. It focuses on the two-way integration of PLANT gene functional information and large-scale gene expression data. Major animal and fungal species are also supported.

FuncExpression consists of two major function classes:

(1). Expression2Function provides gene function information for gene lists and allows further cross-validation with gene expression data from BarleyBase/PLEXdb. The gene lists are obtained from microarray and other non-microarray genomics experiments.

(2). Function2Expression retrieves plant gene expression profiles according to gene functional annotations.

Both function classes are fully integrated with our microarray data numerical analysis tools when applicable.

The gene function information includes the well-structured Gene Ontology classification, InterPro functional domain prediction/annotation, metabolic pathways, and gene family information.

In addition to interpreting microarray data, FuncExpression is a general-purpose tool for the functional comparison of other types of PLANT, FUNGAL, and ANIMAL gene name lists generated by genomics, proteomics, or EST projects. This module can be used independently of microarray data.

2. Gene Functional Annotations and Preparation

(1). Gene Ontology: FuncExpression supports four types of GO annotations (microarray, protein, Gene Index, and EST/cDNA), totaling 138 databases compiled by BarleyBase. This is the major component of FuncExpression.

Microarray:


All Affymetrix human, animal, and prokaryotic gene expression analysis arrays listed on the Affymetrix support page are supported with Affymetrix Gene Ontology annotations.

All Affymetrix plant arrays except Citrus are supported with BarleyBase in-house annotations based on TAIR, Gramene, GOA, and UniProt annotations. The 22K ATH1 (TAIR), 8K AG (TAIR), and 57K Rice (Gramene) arrays are annotated by direct mapping to proteins of the same species. The others, including the Barley1 (BarleyBase), 16K Grape (TIGR), Maize 18K, and 61K Soybean (BarleyBase) GeneChips, carry transitive annotations based on sequence similarity between GeneChip exemplars and UniProt protein entries in GOA. Users can choose stringent (Expect <= 1e-20 in BLASTX) or loose (Expect <= 1e-5 in BLASTX) annotations; stringent annotation is recommended.

The NSF 58K maize and 20K rice spotted arrays and the 18K Fusarium (fungal) GeneChip are also annotated with the second, transitive method.
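The transitive annotation step for BLASTX-annotated platforms can be pictured with the minimal Python sketch below. It is not BarleyBase's pipeline: the tuple format for BLASTX hits, the dictionary of UniProt GO terms, the placeholder IDs, and the choice to transfer terms from every hit that passes the cutoff (rather than only the best hit) are all assumptions made for illustration. Only the two E-value cutoffs come from the text.

```python
from collections import defaultdict

# Expect-value cutoffs quoted above for BLASTX-based annotation.
THRESHOLDS = {"stringent": 1e-20, "loose": 1e-5}


def transfer_go(blastx_hits, uniprot_go, mode="stringent"):
    """Transfer GO terms from matched UniProt entries to GeneChip exemplars.

    blastx_hits: iterable of (exemplar_id, uniprot_id, evalue) tuples
                 (hypothetical input format).
    uniprot_go:  dict mapping uniprot_id -> set of GO IDs (hypothetical).
    mode:        "stringent" or "loose".
    """
    cutoff = THRESHOLDS[mode]
    annotations = defaultdict(set)
    for exemplar, uniprot, evalue in blastx_hits:
        terms = uniprot_go.get(uniprot)
        # Assumption: every hit at or below the cutoff contributes its terms.
        if evalue <= cutoff and terms:
            annotations[exemplar] |= terms
    return dict(annotations)


hits = [("EXEMPLAR_1", "UNIPROT_A", 1e-45), ("EXEMPLAR_2", "UNIPROT_B", 1e-8)]
go_terms = {"UNIPROT_A": {"GO:AAAA"}, "UNIPROT_B": {"GO:BBBB"}}
print(transfer_go(hits, go_terms))           # only EXEMPLAR_1 passes 1e-20
print(transfer_go(hits, go_terms, "loose"))  # both hits pass 1e-5
```

Switching from stringent to loose admits the weaker EXEMPLAR_2 hit (E = 1e-8), which illustrates why the stringent setting yields fewer but more reliable annotations.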

Protein:

All 33 species listed on the Gene Ontology Consortium GO Annotation page are supported, including Arabidopsis (TAIR, TIGR, and GOA), rice (Gramene), Pseudomonas syringae DC3000, budding yeast, and fission yeast, as well as animals such as human, mouse, rat, C. elegans, fly, and zebrafish. The annotations are current as of January 2006.

Gene Index: 

Electronic annotations for 16 plant species (TIGR) and 9 fungal species (TIGR), current as of March 2005.

ESTs and cDNAs:

Electronic annotations for 16 plant species (BarleyBase) and 9 fungal species (BarleyBase), using GenBank accession numbers as input. These are transitive annotations based on TIGR Gene Index annotations and the membership of ESTs in the Gene Index, current as of March 2005.
Notice: This input type cannot be meaningfully compared for p-values because the IDs are highly redundant. Use it only for roughly assigning sequences to GO classes.
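The EST/cDNA route and the reason for the redundancy notice can be seen in the small Python sketch below. The lookup tables and identifiers are hypothetical stand-ins for TIGR Gene Index membership and TC annotations; the sketch only shows how several accessions that belong to one TC all inherit the same GO terms.

```python
def annotate_ests(accessions, est_to_tc, tc_go):
    """Assign GO terms to GenBank accessions through Gene Index membership.

    est_to_tc: dict mapping GenBank accession -> TC identifier (hypothetical).
    tc_go:     dict mapping TC identifier -> set of GO IDs (hypothetical).
    Accessions without a TC, or whose TC has no GO terms, are left out.
    """
    result = {}
    for acc in accessions:
        tc = est_to_tc.get(acc)
        if tc in tc_go:
            result[acc] = set(tc_go[tc])
    return result


est_to_tc = {"EST_A": "TC100", "EST_B": "TC100", "EST_C": "TC200"}
tc_go = {"TC100": {"GO:CCCC"}}
annotated = annotate_ests(["EST_A", "EST_B", "EST_C"], est_to_tc, tc_go)
# EST_A and EST_B come from the same TC, so one transcript is counted twice.
distinct_tcs = {est_to_tc[acc] for acc in annotated}
print(len(annotated), len(distinct_tcs))
```

Two annotated accessions collapse onto a single TC here; inflated counts of this kind are why p-values for this input type are not meaningful.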

(2). Metabolic pathways: Annotations for proteins of 14 species are downloaded from KEGG. 56 Affymetrix platforms for Barley1, human, mouse, and other species are electronically annotated by BarleyBase from KEGG annotations, either by same-species mapping of exemplars to matching proteins or by sequence similarity to model-species genes associated with pathways.

(3). Gene family (Discontinued): annotation for Arabidopsis (TAIR).

(4). InterPro functional domain (Discontinued): annotation for Arabidopsis proteins (TAIR).

3. Expression2Function -- Interpreting Expression Profiles in a Gene Function Context

Multiple gene lists can be classified, compared, and visualized according to the gene ontology, metabolic pathway, and gene family information of their member genes. The results can be further cross-validated with expression data from related experiments, backed by our comprehensive plant microarray expression data repository at BarleyBase/PLEXdb.

(1). Modules

A. Expression2GO -- Compare several types of gene lists by their distribution among GO classes.

B. Expression2Pathway -- Compare ATH1 gene lists by their distribution among metabolic pathways.

C. Expression2GeneFamily -- Compare ATH1 gene lists by their distribution among gene families.

D. Expression2Domain -- Compare ATH1 gene lists by their distribution among InterPro functional domains.

(2). Input-- Data Sources and Formats

Source A. Gene lists from microarray experiments conducted on BarleyBase-supported platforms. The inputs are microarray element names, including exemplar and probe set names. Supports GO, Pathway, and Gene Family.

Source B. Gene lists from non-microarray experiments, including genomics, proteomics, EST, and other high-throughput experiments. The inputs are Gene Index TC numbers, protein accession numbers, and protein names. Supports GO, Pathway, InterPro domain, and Gene Family.

Input Preparation Method 1: Using gene lists and data pre-saved within BarleyBase. You can (1) compare multiple saved lists; (2) compare two pre-saved gene lists and their gene subsets, including the intersection and difference subsets; and (3) compare gene lists from the clusters of a BarleyBase clustering/partitioning result.
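The intersection and difference subsets mentioned in option (2) are ordinary set operations. The Python sketch below is only an illustration of what those subsets contain; the function name and the subset labels are hypothetical, and the probe set names are reused from the sample input.

```python
def compare_two_lists(list_a, list_b):
    """Return the intersection and the two difference subsets of two gene lists."""
    a, b = set(list_a), set(list_b)
    return {
        "A_and_B": sorted(a & b),  # genes present in both lists
        "A_only": sorted(a - b),   # genes only in list A
        "B_only": sorted(b - a),   # genes only in list B
    }


subsets = compare_two_lists(
    ["245306_at", "245628_at", "252102_at"],
    ["252102_at", "252123_at"],
)
print(subsets)
```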

Input Preparation Method 2: Importing gene lists from outside BarleyBase. Multiple gene lists (up to 10) can be input for comparison against the reference list and among the lists themselves.

Data Formats:

  • No data formatting is needed for Input Preparation Method 1.

  • For Input Preparation Method 2, please refer to the Sample Input for the ATH1 GeneChip. The input consists of multiple lists, each introduced by a header line "MY_LIST:###LIST_Name###", where LIST_Name can be any name the user prefers. After each header line, users can enter multiple gene names separated by commas, tabs, or white space. Other free-text input is supported, though it may not always be parsed accurately.

MY_LIST:###LIST1###
245306_at
245628_at
245637_at
......

MY_LIST:###LIST2###
252102_at
252123_at
252265_at
......
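For readers preparing input programmatically, the Python sketch below shows one way to read the MY_LIST format described above. It is an assumption-based illustration, not FuncExpression's own parser: it ignores text before the first header, does not attempt the free-text handling mentioned above, and would treat the "......" placeholders in the sample (which only mark omitted entries) as gene names.

```python
import re

# Header line of the form MY_LIST:###LIST_Name###
HEADER = re.compile(r"^MY_LIST:###(.+?)###\s*$")


def parse_gene_lists(text):
    """Parse MY_LIST-formatted text into {list_name: [gene names]}.

    Lines before the first header are ignored; names on the following lines
    may be separated by commas, tabs, or other white space.
    """
    lists = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            lists.setdefault(current, [])
        elif current is not None:
            lists[current].extend(name for name in re.split(r"[,\s]+", line) if name)
    return lists


sample = """MY_LIST:###LIST1###
245306_at
245628_at, 245637_at

MY_LIST:###LIST2###
252102_at\t252123_at 252265_at
"""
print(parse_gene_lists(sample))
```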

(3). Output-- Visualization and Tables

Detailed classification for each gene list in each functional class is output as color-highlighted HTML tables, including the number of matches, the enrichment fold, p-values, and the names of matching genes.
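The table columns can be related through the usual definition of enrichment fold: the fraction of a list's annotated genes that fall in a class, divided by the same fraction in the reference. The exact formula FuncExpression uses is not spelled out here, so the Python sketch below is an assumption; it uses annotated (rather than total) counts as denominators, in line with the Reference settings that distinguish total from annotated sequence numbers. The counts in the example are hypothetical.

```python
def class_summary(k, n, K, N):
    """Match count, percentage, and enrichment fold for one functional class.

    k: genes in the user's list that fall in the class
    n: annotated genes in the user's list
    K: genes in the reference that fall in the class
    N: annotated genes in the reference
    """
    percent = 100.0 * k / n
    fold = (k / n) / (K / N)
    return k, percent, fold


# Hypothetical numbers: 12 of 200 annotated list genes vs. 300 of 20,000 in the reference.
print(class_summary(12, 200, 300, 20000))
```

A fold above 1 means the class is over-represented relative to the reference; here 6% of the list versus 1.5% of the reference gives a 4-fold enrichment.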

Fisher's exact test and the hypergeometric distribution are used to find significantly enriched and depleted gene functional classes. The Benjamini-Hochberg (BH) multiple testing correction is used to estimate the FDR (false discovery rate). The p-values and FDRs are saved as tab-delimited text files.
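The statistics above can be sketched in plain Python (3.8 or later, for math.comb). This is a textbook version, not FuncExpression's code: enrichment uses the upper hypergeometric tail P(X >= k) and depletion the lower tail P(X <= k), where n list genes are drawn from N reference genes of which K belong to the class. For a list drawn from the reference, this tail probability equals the one-sided Fisher's exact test p-value. The BH step then adjusts the p-values across all classes tested.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided P(X >= k), X ~ hypergeometric: n draws from N genes, K in the class."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, n + 1)) / comb(N, n)


def depletion_p(k, n, K, N):
    """One-sided P(X <= k) under the same hypergeometric model."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(0, k + 1)) / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(enrichment_p(2, 3, 4, 10))  # toy case: 2 of 3 list genes hit a class of 4 in 10
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

math.comb returns 0 when asked to choose more items than exist, so impossible terms in the sums drop out without special cases.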

Two types of barplots are used to visualize the comparison results:

A. Plot by gene functional class for all gene lists, including the population reference list. The enrichment fold, number, and percentage of matches in each gene list are plotted side by side with the reference list. The actual match number, percentage, and enrichment fold of each list against the reference list are shown in the plot legend.

B. Plot by gene list for represented gene functional classes. Barplots show either Enrichment Fold vs. Reference List or Percentage in Functional Classes. The enrichment fold, number, and percentage of matches in each GO class are plotted side by side, optionally sorted by enrichment fold. The actual match number, percentage, and enrichment fold of each list against the reference list are shown in the plot legend.

(4). Usage of Expression2Function

-> Choose Gene Function Type (for example, GO)

-> Choose your top-level GO term

-> Select input type

-> Select species/platform combination from list

-> If applicable, change the annotation threshold for BLASTX-based annotations

-> If applicable, change the total and annotated numbers of sequences in the Reference. These values are needed for accurate enrichment quantification and p-value calculation of your own lists versus the reference, but not for comparisons among your own input lists. The Reference is defined as the global sequence population from which your input lists are drawn. If you have your own information about the Reference size and the number of GO-annotated sequences in it, you can override the values provided by FuncExpression.

-> Select gene list input type and provide input

-> Press "Run" button

-> Check and save results. If microarray data are available, cross-validate the gene list classification results with expression values by (a) selecting the target microarray experiment and (b) clicking in the text boxes for gene names.

4. Function2Expression -- From Gene Functional Annotation to Expression Profile

This is a collection of gene list creation methods for retrieval of gene expression data, based on several types of gene functional annotations.

(1). Modules

A. GO2Expression -- Browse and search the Gene Ontology tree, and retrieve probe sets or genes from selected GO classes.

B. Pathway2Expression -- Find probe sets on the Arabidopsis ATH1 GeneChip corresponding to enzymes in metabolic or regulatory pathways of interest, based on KEGG and TAIR pathway data.

C. GeneFamily2Expression -- Find probe sets on the Arabidopsis ATH1 GeneChip corresponding to a given gene family (Discontinued).
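Retrieving probe sets for a selected GO class raises the question of whether genes annotated to more specific child terms are included. The Python sketch below assumes they are (following the GO "true path" convention) and walks the term's descendants breadth-first; whether GO2Expression actually does this is not stated here, and the term IDs and probe set mapping are a hypothetical toy example.

```python
from collections import deque


def probe_sets_for_term(term, children, term_to_probes):
    """Collect probe sets annotated to `term` or to any of its descendants.

    children:       dict mapping GO ID -> list of child GO IDs (hypothetical).
    term_to_probes: dict mapping GO ID -> set of probe set names (hypothetical).
    """
    seen = {term}
    queue = deque([term])
    probes = set()
    while queue:  # breadth-first walk down the GO graph
        current = queue.popleft()
        probes |= term_to_probes.get(current, set())
        for child in children.get(current, []):
            if child not in seen:  # GO is a DAG: a term can be reached by two paths
                seen.add(child)
                queue.append(child)
    return probes


children = {"TERM_ROOT": ["TERM_X", "TERM_Y"], "TERM_X": ["TERM_Z"], "TERM_Y": ["TERM_Z"]}
term_to_probes = {"TERM_ROOT": {"245306_at"}, "TERM_X": {"252102_at"}, "TERM_Z": {"252123_at"}}
print(sorted(probe_sets_for_term("TERM_ROOT", children, term_to_probes)))
```

The `seen` set matters because GO is a directed acyclic graph rather than a strict tree: TERM_Z has two parents in the toy example but is visited only once.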

(2). Input

The inputs are selected GO terms, pathways, or gene families. A target experiment must be defined or selected to retrieve expression values. Please follow the instructions on the corresponding pages.

(3). Output

The outputs are the qualifying gene lists, which can be fed into the microarray data numerical analysis and visualization tools.

5. Change Log

June 11, 2006:

Added GO support for all Affymetrix plant GeneChips (except Citrus) and the NSF maize 58K and rice 20K spotted arrays. Added the Fusarium 18K GeneChip.

For BLASTX-based annotations in plants and Fusarium, added an option to choose stringent or loose BLAST thresholds.

Added an option to override the total and annotated sequence numbers of the Reference gene list for accurate enrichment and p-value calculation versus the reference list.

December 29, 2005:

Added pathways for multiple animal and microbe species with KEGG annotations.

Added GO support for all Affymetrix animal and microbe platforms. More plant species are supported.

Added GO support for all species annotations on the Gene Ontology Consortium annotation page.

March 2005:

FuncExpression was added to the GO tool list on the Gene Ontology Consortium website.

Added GO and pathway support for animal and fungal proteins.

November 14, 2004:

The FuncExpression prototype was released. It supported GO, gene family, InterPro, and KEGG pathway analysis for the Arabidopsis ATH1 22K and Barley1 22K GeneChips.


FuncExpression is under ACTIVE development. Please regard it as a Beta test version, and use caution in interpreting results.

Please send questions, feature requests, bug reports, and comments about this tool to Lishuang Shen: lshen@iastate.edu or Shen_Lishang@yahoo.com.


 

Copyright © 2001-2005 The BarleyBase Group
All rights reserved.

For problems with the webpages, contact barleybasewebmaster