I. DAT file

1. What is a DAT file?
-When a hybridized Affymetrix GeneChip is scanned, the resulting image is saved as a file with the extension DAT.
-A DAT file is not plain text and cannot be opened by generic image viewers. It can be opened with Affymetrix Microarray Suite (MAS), and possibly with Adobe Photoshop.

II. CEL file

1. What is a CEL file?
-A CEL file is derived from a DAT file, the scanned image of a chip, and is one of the Affymetrix data formats. For example, 0308-6.CEL is obtained from 0308-6.DAT using the Affymetrix algorithms.
-A CEL file is plain text. It gives the position and intensity of each probe cell on one GeneChip, and it also lists the positions of masked cells and outliers.
-The fields of each cell in the [INTENSITY] section of a typical CEL file (CellHeader=X Y MEAN STDV NPIXELS) are:
  -x : The x coordinate of the cell.
  -y : The y coordinate of the cell.
  -intensity : The intensity (MEAN) of the cell.
  -sd : The standard deviation (STDV) of the pixel intensities in the cell.
  -pixels : The number of pixels (NPIXELS) in the cell.
-Example CEL file:

[CEL]
Version=3

[HEADER]
Cols=712
Rows=712
TotalX=712
TotalY=712
OffsetX=0
OffsetY=0
GridCornerUL=227 233
GridCornerUR=4486 237
GridCornerLR=4475 4507
GridCornerLL=216 4503
Axis-invertX=0
AxisInvertY=0
swapXY=0
DatHeader=[0..46119] 0308-5:CLS=4733 RWS=4733 XIN=3 YIN=3 VE=17 2.0 05/19/03 13:56:51   Barley1.1sq          6
Algorithm=Percentile
AlgorithmParameters=Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierLow:1.004

[INTENSITY]
NumberCells=506944
CellHeader=X Y MEAN STDV NPIXELS
0 0 156.8 16.0 16
1 0 12911.0 1434.1 16
2 0 142.3 18.8 16
3 0 12829.3 1500.8 16
4 0 52.3 9.0 16
5 0 170.5 17.2 16
6 0 11867.0 1595.1 16
7 0 149.5 24.7 16
...
707 711 59.5 10.5 16
708 711 11769.0 1920.5 16
709 711 191.8 30.3 16
710 711 11500.3 2001.5 16
711 711 248.0 43.7 16

[MASKS]
NumberCells=0
CellHeader=X Y

[OUTLIERS]
NumberCells=2198
CellHeader=X Y
65 0
67 0
69 0
71 0
...
503 711
561 711
563 711

[MODIFIED]
NumberCells=0
CellHeader=X Y ORIGMEAN

III. CSV file

1. What is a CSV file?
-In BarleyBase, a CSV file is a file with the extension 'csv' (comma-separated values).
-A CEL file contains no probe set names, which are essential for further analysis, so we created the CSV file by combining a CEL file with the corresponding CDF file. A CSV file has the same prefix as the CEL file it was built from.
For example, 0308-6.csv is generated from 0308-6.cel and Barley1.CDF.

2. The meaning of the fields in a typical CSV file:
-probeset_name : The name of the probe set.
-probe_pair : The number (ID) of the probe pair within its probe set, e.g. 1, 2, ..., 11.
-pm_x : The x coordinate of the perfect-match (PM) cell.
-pm_y : The y coordinate of the perfect-match cell.
-pm_intensity : The intensity of the perfect-match cell.
-pm_sd : The standard deviation of the perfect-match cell.
-pm_pixels : The number of pixels in the perfect-match cell.
-mm_x : The x coordinate of the mismatch (MM) cell.
-mm_y : The y coordinate of the mismatch cell.
-mm_intensity : The intensity of the mismatch cell.
-mm_sd : The standard deviation of the mismatch cell.
-mm_pixels : The number of pixels in the mismatch cell.

3. Example CSV file
probeset_name, probe_pair, pm_x, pm_y, pm_intensity, pm_sd, pm_pixels, mm_x, mm_y, mm_intensity, mm_sd, mm_pixels
1200459_Reg_88-1740_at,1,80,546,65.5,16.4,16,80,545,54.3,8.2,16
1200459_Reg_88-1740_at,2,108,172,93.3,9,12,108,171,100.8,21.4,16
1200459_Reg_88-1740_at,3,51,36,143,29.6,16,51,35,357.8,103,16
1200459_Reg_88-1740_at,4,154,288,101.3,48.8,16,154,287,114.8,66.3,16
1200459_Reg_88-1740_at,5,368,152,960.5,54.1,16,368,151,1564.8,58.4,16
1200459_Reg_88-1740_at,6,418,662,520,67.3,16,418,661,1733,282.7,16
1200459_Reg_88-1740_at,7,653,304,88.8,15.8,16,653,303,84,13.8,16
1200459_Reg_88-1740_at,8,469,4,140,21.7,16,469,3,257,37.2,16
1200459_Reg_88-1740_at,9,572,218,119.3,19.8,16,572,217,102.3,15,16
1200459_Reg_88-1740_at,10,599,122,439,62.8,16,599,121,699.8,95.3,16
1200459_Reg_88-1740_at,11,369,640,97.5,20.5,16,369,639,119.3,38.6,16
1289374_Reg_826-1545_at,1,268,128,89.3,8.8,16,268,127,226.3,41.8,12
1289374_Reg_826-1545_at,2,667,588,799.3,139.8,16,667,587,437,111,16
1289374_Reg_826-1545_at,3,11,340,85,16.1,16,11,339,84.3,16.8,16
1289374_Reg_826-1545_at,4,509,186,571.3,135.7,16,509,185,304.3,116.7,16
1289374_Reg_826-1545_at,5,568,496,93.3,18.1,16,568,495,88.3,13.7,16
1289374_Reg_826-1545_at,6,70,134,127.5,34.9,16,70,133,63,8,16
1289374_Reg_826-1545_at,7,488,126,56,8.8,16,488,125,77,17.1,16
1289374_Reg_826-1545_at,8,255,588,143.3,28.3,16,255,587,154,26.6,16
1289374_Reg_826-1545_at,9,272,146,195,24,16,272,145,104.5,14.9,16
1289374_Reg_826-1545_at,10,59,126,70.3,11.4,16,59,125,67.8,9.9,16
1289374_Reg_826-1545_at,11,557,138,128.3,30.2,16,557,137,657.8,61.6,16
149174_Reg_66-1115_at,1,657,510,100.5,22.1,16,657,509,126.8,20.2,16
149174_Reg_66-1115_at,2,131,672,420.5,43.3,16,131,671,237.8,28.8,16
149174_Reg_66-1115_at,3,617,710,312.3,57.4,16,617,709,709.8,142.9,16
149174_Reg_66-1115_at,4,135,34,238.5,30.7,16,135,33,120,16.7,16
149174_Reg_66-1115_at,5,541,382,275.3,43.3,16,541,381,189.5,30.8,16
149174_Reg_66-1115_at,6,88,186,3207.5,1051.8,16,88,185,394,55.6,16
149174_Reg_66-1115_at,7,676,36,299.5,24.5,16,676,35,148.8,25.3,16
149174_Reg_66-1115_at,8,678,398,75.8,15.1,16,678,397,71,12,16
149174_Reg_66-1115_at,9,338,436,152.5,22.1,16,338,435,91,16.8,16
149174_Reg_66-1115_at,10,321,538,149.3,21,16,321,537,253.5,26.6,16
...
Y10834_at,9,264,128,192,36,16,264,127,204.3,58.2,16
Y10834_at,10,7,210,217.5,24,16,7,209,181.3,12.5,16
Y10834_at,11,56,516,74,12.7,16,56,515,101.8,22.3,16
Z48624_x_at,1,661,246,119.3,19.9,16,661,245,334.3,40.6,16
Z48624_x_at,2,688,490,598.3,71.5,16,688,489,457,91.8,16
Z48624_x_at,3,482,308,201.8,23.1,16,482,307,184,19.5,16
Z48624_x_at,4,330,128,539,74.9,16,330,127,278.8,49.4,16
Z48624_x_at,5,225,230,198.3,26,16,225,229,396.3,99,16
Z48624_x_at,6,131,648,502,53.2,16,131,647,206.8,31.5,16
Z48624_x_at,7,316,262,47.5,8.1,16,316,261,66.3,10.9,16
Z48624_x_at,8,384,464,74.3,12.5,16,384,463,74.3,15.3,16
Z48624_x_at,9,298,628,68.5,12.6,16,298,627,82.3,18.8,16
Z48624_x_at,10,416,226,165.8,28.4,16,416,225,104.5,15.7,12
Z48624_x_at,11,79,104,85.5,18.9,16,79,103,139.3,35,16

IV. Expression data

1. What is expression data?
-Expression data are the values obtained by normalizing the CEL files.
-Two normalization methods are currently used:
  -MAS5
  -RMA
-Normalized expression values are stored as tab-delimited plain text. All arrays of one experiment are included together in the same file.

2. Details of normalization methods and parameters
-The first is the Affymetrix Microarray Suite 5.0 (MAS 5.0) absolute statistical estimate, obtained with the Affymetrix statistical expression algorithm. Two tab-delimited text files of MAS 5.0 estimates are provided for each experiment. The brief version (for example, MAS5_brief.txt) contains only the expression values (signals) of all arrays in an experiment. The extended version (for example, MAS5_detailed.txt) contains five types of information for each probe set: Stat Pairs, Stat Pairs Used, Signal, Detection, and Detection p-value.
 For most analyses the default parameters are used, with all chips scaled to a target signal (TGT) of 500: Alpha1 = 0.05, Alpha2 = 0.065, Tau = 0.015, Gamma1H = 0.0045, Gamma1L = 0.0045, Gamma2H = 0.006, Gamma2L = 0.006, Perturbation = 1.1, and TGT = 500. For the meaning of the parameters and output fields, please refer to the Affymetrix MAS 5.0 manual.
 When biological knowledge indicates that the arrays of one experiment have different total signals, the native or control-group arrays are normalized to TGT = 500, and their mean scaling factor (SF) is then applied to all arrays of the experiment. For example, in the "Cross-species Detection in Barley1 GeneChip Array" experiment, the barley chips are used as the reference and scaled to TGT = 500, giving SF = 3.418; all other chips are scaled with the same SF.
-The other estimate is the Robust Multi-array Average, or Robust Multi-chip Average (RMA). It is calculated with the rma function of the affy package from the Bioconductor project.
 This function implements RMA in three steps. First, the perfect match (PM) probe intensities are background-corrected using a model that assumes each observed intensity is the sum of signal and noise. Second, the corrected PM intensities are normalized across arrays with quantile normalization. Finally, an expression measure for each probe set is computed by median polish on the base-2 logarithms of the normalized PM intensities. Further details are available from the Bioconductor website (www.bioconductor.org).

3. MAS normalization results example
Each row starts with the probe set ID, followed by five values per array (Stat Pairs, Stat Pairs Used, Signal, Detection, Detection p-value). Detection calls are P (present), M (marginal), or A (absent).
0308-1_Stat Pairs 0308-1_Stat Pairs Used 0308-1_Signal 0308-1_Detection 0308-1_Detection p-value 0308-2_Stat Pairs 0308-2_Stat Pairs Used 0308-2_Signal 0308-2_Detection 0308-2_Detection p-value 0308-3_Stat Pairs 0308-3_Stat Pairs Used 0308-3_Signal 0308-3_Detection 0308-3_Detection p-value 0308-4_Stat Pairs 0308-4_Stat Pairs Used 0308-4_Signal 0308-4_Detection 0308-4_Detection p-value 0308-5_Stat Pairs 0308-5_Stat Pairs Used 0308-5_Signal 0308-5_Detection 0308-5_Detection p-value 0308-6_Stat Pairs 0308-6_Stat Pairs Used 0308-6_Signal 0308-6_Detection 0308-6_Detection p-value 0315-1_Stat Pairs 0315-1_Stat Pairs Used 0315-1_Signal 0315-1_Detection 0315-1_Detection p-value 0315-2_Stat Pairs 0315-2_Stat Pairs Used 0315-2_Signal 0315-2_Detection 0315-2_Detection p-value 0315-3_Stat Pairs 0315-3_Stat Pairs Used 0315-3_Signal 0315-3_Detection 0315-3_Detection p-value 0315-4_Stat Pairs 0315-4_Stat Pairs Used 0315-4_Signal 0315-4_Detection 0315-4_Detection p-value 0315-5_Stat Pairs 0315-5_Stat Pairs Used 0315-5_Signal 0315-5_Detection 0315-5_Detection p-value 0315-6_Stat Pairs 0315-6_Stat Pairs Used 0315-6_Signal 0315-6_Detection 0315-6_Detection p-value Descriptions
AFFX-BioB-5_at 20 20 461.7 P 0.002867 20 20 494.2 P 0.004484 20 20 373.7 P 0.013811 20 20 334.3 M 0.058444 20 20 468.2 P 0.003212 20 20 433.0 P 0.006187 20 20 489.9 P 0.001410 20 20 426.6 P 0.015183 20 20 339.3 M 0.058444 20 20 325.3 P 0.004484 20 20 402.4 P 0.000581 20 20 506.2 P 0.006187
AFFX-BioB-M_at 20 20 218.0 P 0.001248 20 20 295.2 P 0.000662 20 20 468.9 P 0.000390 20 20 275.7 P 0.002556 20 20 300.8 P 0.000662 20 20 401.5 P 0.000225 20 20 507.6 P 0.000060 20 20 438.0 P 0.000340 20 20 297.7 P 0.008440 20 20 331.2 P 0.000754 20 20 427.7 P 0.000081 20 20 514.2 P 0.000340
AFFX-BioB-3_at 20 20 307.9 P 0.002867 20 20 275.8 P 0.000662 20 20 268.7 P 0.001102 20 20 246.1 P 0.005565 20 20 253.0 P 0.000754 20 20 199.0 P 0.010317 20 20 469.9 P 0.000070 20 20 428.0 P 0.000258 20 20 309.0 P 0.023929 20 20 298.7 P 0.000340 20 20 320.9 P 0.000662 20 20 275.9 P 0.002867
...
AF509747.1_at 11 11 260.2 A 0.067627 11 11 114.9 A 0.432373 11 11 106.1 A 0.432373 11 11 92.8 A 0.432373 11 11 30.7 A 0.432373 11 11 23.3 A 0.696289 11 11 45.7 A 0.334473 11 11 143.0 A 0.334473 11 11 146.3 A 0.366211 11 11 148.5 A 0.194580 11 11 187.9 P 0.037598 11 11 154.8 A 0.303711

4. RMA normalization results example
Each row starts with the probe set ID, followed by one RMA expression value (base-2 log scale) per array, in the column order below.
0308-6.CEL 0315-6.CEL 0308-1.CEL 0315-5.CEL 0308-5.CEL 0315-4.CEL 0308-3.CEL 0315-2.CEL 0308-4.CEL 0315-3.CEL 0308-2.CEL 0315-1.CEL
1200459_Reg_88-1740_at 5.29895702911779 5.41159115433385 5.41536050372796 5.32219492634803 5.64442745484244 5.68591949675641 5.84217315426786 6.01243623139809 6.03511347086101 5.89685381397693 5.59969697047079 5.5341926449814
1289374_Reg_826-1545_at 4.58914813258642 4.66553370776183 5.32167085564938 5.20899724953439 4.86413445701439 5.12654445940467 4.77461991107395 4.59119569697995 5.53614129758275 5.82251394721211 4.71727562940295 4.95870716630996
149174_Reg_66-1115_at 6.22856562347685 6.17827307064106 5.99943809259409 5.92394266009905 5.96916497843898 6.02275978962534 6.5134121104271 6.62134571919263 6.730438212663 6.94280720907981 5.70179883976429 5.91788740265311
...
Y09233_at 6.09024996692518 5.80201396037666 6.03791916201337 5.84894452007608 6.03000628173444 6.02795873166883 6.66170173523807 6.51023991510055 7.18510402682977 6.8485553647149 7.0924138902899 6.89840647341499
Y09748_at 3.92494007417609 3.97075061114701 3.18455542869903 3.14436352747130 2.80449666849437 2.94561876308592 3.18211209252081 3.11533554476063 4.1761473524174 4.39487758771256 2.75560037833073 2.76935456718945
Y10834_at 6.0169310463306 5.89024956026502 5.89134418634033 6.0116048982413 5.83873718434652 5.76573618134603 6.38480199690379 6.15875072153005 6.74818403402201 6.73127004627679 5.8323598895204 6.10226632998493
Z48624_x_at 9.03518845452556 8.22418841005502 6.99667633607508 6.93888805916397 6.6968399048599 6.75123087816228 6.6457232892968 6.8530322108439 6.37398321886739 6.55604557172551 6.40082780227334 6.39226697077772
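The plain-text CEL layout described in section II can be read with a short parser. The following is a minimal sketch, assuming the version-3 text format shown in the example (binary CEL formats are not handled); the function name parse_cel_v3 and the tiny SAMPLE excerpt, built from the example values with NumberCells adjusted, are hypothetical.

```python
from typing import Dict, List, Tuple

# One [INTENSITY] record: (MEAN, STDV, NPIXELS)
Cell = Tuple[float, float, int]


def parse_cel_v3(text: str) -> Tuple[Dict[str, str], Dict[Tuple[int, int], Cell], List[Tuple[int, int]]]:
    """Parse the plain-text (version 3) CEL layout.

    Returns the [HEADER] key/value pairs, the [INTENSITY] cells keyed by (x, y),
    and the (x, y) coordinates listed under [OUTLIERS].
    """
    header: Dict[str, str] = {}
    cells: Dict[Tuple[int, int], Cell] = {}
    outliers: List[Tuple[int, int]] = []
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].upper()  # e.g. "HEADER", "INTENSITY"
            continue
        if "=" in line:
            # Key=value lines: only [HEADER] entries are kept;
            # NumberCells= and CellHeader= lines are skipped.
            key, value = line.split("=", 1)
            if section == "HEADER":
                header[key] = value
            continue
        fields = line.split()
        if section == "INTENSITY":
            x, y = int(fields[0]), int(fields[1])
            cells[(x, y)] = (float(fields[2]), float(fields[3]), int(fields[4]))
        elif section == "OUTLIERS":
            outliers.append((int(fields[0]), int(fields[1])))
    return header, cells, outliers


# Hypothetical, hand-made excerpt using values from the example CEL file.
SAMPLE = """[CEL]
Version=3

[HEADER]
Cols=712
Rows=712

[INTENSITY]
NumberCells=2
CellHeader=X\tY\tMEAN\tSTDV\tNPIXELS
0\t0\t156.8\t16.0\t16
1\t0\t12911.0\t1434.1\t16

[OUTLIERS]
NumberCells=1
CellHeader=X\tY
65\t0
"""

header, cells, outliers = parse_cel_v3(SAMPLE)
print(cells[(1, 0)])  # (12911.0, 1434.1, 16)
```

Keying cells by (x, y) matches how the CSV file in section III refers to PM and MM cells by their coordinates.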
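Section III describes the CSV file as a CEL file combined with the CDF file, so that each row pairs a perfect-match cell with its mismatch cell. The sketch below shows that join, assuming the CDF has already been turned into a list of (probeset_name, probe_pair, pm_x, pm_y, mm_x, mm_y) tuples; that layout structure and the name build_csv are hypothetical, and CDF parsing itself is not shown.

```python
import csv
import io
from typing import Dict, List, Tuple

Cell = Tuple[float, float, int]  # (MEAN, STDV, NPIXELS) of one CEL cell

# Hypothetical probe layout, standing in for what a CDF parser would return:
# (probeset_name, probe_pair, pm_x, pm_y, mm_x, mm_y)
Probe = Tuple[str, int, int, int, int, int]

CSV_FIELDS = [
    "probeset_name", "probe_pair",
    "pm_x", "pm_y", "pm_intensity", "pm_sd", "pm_pixels",
    "mm_x", "mm_y", "mm_intensity", "mm_sd", "mm_pixels",
]


def build_csv(cells: Dict[Tuple[int, int], Cell], layout: List[Probe]) -> str:
    """Join CEL cell values with a PM/MM probe layout, one CSV row per probe pair."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(CSV_FIELDS)
    for name, pair, pm_x, pm_y, mm_x, mm_y in layout:
        pm = cells[(pm_x, pm_y)]  # (intensity, sd, pixels) of the perfect match
        mm = cells[(mm_x, mm_y)]  # (intensity, sd, pixels) of the mismatch
        writer.writerow([name, pair, pm_x, pm_y, *pm, mm_x, mm_y, *mm])
    return out.getvalue()


# Values taken from the first row of the example CSV file in section III.
EXAMPLE_CELLS = {(80, 546): (65.5, 16.4, 16), (80, 545): (54.3, 8.2, 16)}
EXAMPLE_LAYOUT = [("1200459_Reg_88-1740_at", 1, 80, 546, 80, 545)]
print(build_csv(EXAMPLE_CELLS, EXAMPLE_LAYOUT))
```

With these inputs the data row reproduces the first example row, 1200459_Reg_88-1740_at,1,80,546,65.5,16.4,16,80,545,54.3,8.2,16; the header here is written without the spaces after commas.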
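Section IV mentions scaling all chips to TGT = 500 and, for experiments with biologically different total signals, applying the control arrays' mean scaling factor to every array. A minimal sketch of that arithmetic follows, assuming the MAS 5.0 global scaling rule in which an array's SF is TGT divided by the trimmed mean of its signals with the lowest and highest 2% removed; the function names are hypothetical, and this is not the Affymetrix implementation.

```python
from statistics import mean
from typing import Dict, List, Sequence


def trimmed_mean(values: Sequence[float], trim: float = 0.02) -> float:
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    return mean(ordered[k:len(ordered) - k])


def scaling_factor(signals: Sequence[float], tgt: float = 500.0, trim: float = 0.02) -> float:
    """SF that brings the trimmed mean signal of one array to TGT."""
    return tgt / trimmed_mean(signals, trim)


def scale_experiment(arrays: Dict[str, List[float]], control: List[str],
                     tgt: float = 500.0) -> Dict[str, List[float]]:
    """Scale every array by the mean SF of the control (reference) arrays."""
    sf = mean(scaling_factor(arrays[name], tgt) for name in control)
    return {name: [s * sf for s in signals] for name, signals in arrays.items()}


# Hypothetical unscaled signals: the control array sets SF = 500 / 100 = 5.0,
# and the same SF is applied to the other array.
scaled = scale_experiment({"control": [100.0] * 100, "treated": [200.0] * 100}, ["control"])
print(scaled["treated"][0])  # 1000.0
```

Using the control group's SF for all arrays keeps real differences in total signal between groups, which per-array scaling to TGT = 500 would erase.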
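The last two RMA steps listed in section IV, quantile normalization and median polish of log2 PM values, can be illustrated in pure Python. This is a simplified sketch, not the affy code: it starts from already background-corrected PM intensities (the convolution background model is omitted), breaks ties in quantile normalization by position, and uses hypothetical names (quantile_normalize, median_polish, rma_like).

```python
import math
from statistics import median
from typing import Dict, List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Give every array the same distribution: the rank-wise mean of the sorted arrays.

    arrays[j][i] is the intensity of probe i on array j. Ties are broken by position.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices, lowest value first
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        result.append(out)
    return result


def median_polish(y: List[List[float]], max_iter: int = 10, eps: float = 0.01) -> List[float]:
    """Tukey median polish of a probe-by-array matrix; returns overall + array effects."""
    r = [[float(v) for v in row] for row in y]
    n_rows, n_cols = len(r), len(r[0])
    overall, old_sum = 0.0, 0.0
    row_eff, col_eff = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(max_iter):
        for i in range(n_rows):  # sweep row (probe) medians out of the residuals
            m = median(r[i])
            r[i] = [v - m for v in r[i]]
            row_eff[i] += m
        delta = median(col_eff)
        col_eff = [c - delta for c in col_eff]
        overall += delta
        for j in range(n_cols):  # sweep column (array) medians out of the residuals
            m = median(r[i][j] for i in range(n_rows))
            for i in range(n_rows):
                r[i][j] -= m
            col_eff[j] += m
        delta = median(row_eff)
        row_eff = [e - delta for e in row_eff]
        overall += delta
        new_sum = sum(abs(v) for row in r for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return [overall + c for c in col_eff]


def rma_like(pm: List[List[float]], probesets: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Quantile-normalize PM values, take log2, and median-polish each probe set.

    pm[j][i] is the background-corrected PM intensity of probe i on array j;
    probesets maps a probe set name to the indices of its probes.
    """
    norm = quantile_normalize(pm)
    expression = {}
    for name, probes in probesets.items():
        y = [[math.log2(norm[j][i]) for j in range(len(norm))] for i in probes]
        expression[name] = median_polish(y)
    return expression


print(rma_like([[2.0, 4.0], [2.0, 4.0]], {"ps": [0, 1]}))  # {'ps': [1.5, 1.5]}
```

Because median polish works on medians rather than means, a single aberrant probe (such as pair 6 of 149174_Reg_66-1115_at in the CSV example) has limited influence on the probe set's expression value.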