Material on the analysis of (micro)array data


I'm currently analyzing cytokine array data. The available material on the statistical analysis of these data is far from satisfactory. Since a lot of effort has gone into the analysis of gene microarray and MALDI-TOF proteome data, I would like to apply those methods to my cytokine arrays.

What is a good introductory text on microarray analysis, especially quality control and error estimation?

A good place to start would be Statistical Methods for Microarray Data Analysis. I'd also suggest papers from the labs of Terry Speed, Gary Churchill, John Quackenbush, and Gordon Smyth.

Also, I found some papers that specifically reflect on your exact issue: how to apply the methods developed for DNA microarrays to analyze protein arrays.

  • Eckel-Passow et al. Experimental design and analysis of antibody microarrays: applying methods from cDNA arrays. Cancer Res. 2005 Apr 15;65(8):2985-9.
  • Royce et al. Extrapolating traditional DNA microarray statistics to tiling and protein microarray technologies. Methods Enzymol. 2006;411:282-311.

Best Microarray Data Analysis Software

High quality image processing and appropriate data analysis are important steps of a microarray experiment. This BiologyWise article outlines some of the best microarray data analysis software available to extract statistically and biologically significant information from microarray experiments.

Did You Know?
A single microarray generates about 10^5 to 10^6 fragments of data.

Microarray experiments followed by accurate analysis of the enormous amount of data generated have developed to be a rich source of information with respect to several aspects of biology, including gene function, gene expression, pathway analysis, genomic comparisons, etc.

Given below are some of the best and most used comprehensive software that enable preprocessing, normalization, filtering, clustering, and finally, the biological interpretation and analysis of microarray data. In addition, specific software that provide tools for a particular type of analysis have also been described.

Type: Free and open source
Maintained by: Fred Hutchinson Cancer Research Center
Operating System: Windows, Linux, Mac OS X
Functionality: Comprehensive

Bioconductor is an open development project that was initiated in 2001 and is based on the R programming language. It comprises R packages that provide statistical, graphical, and other computational tools for DNA microarray image processing and data analysis, sequence analysis, as well as SNP (Single Nucleotide Polymorphism) data analysis. Specific packages are available that cater to several commercial microarray platforms like Affymetrix.

Type: Free and open source
Maintained by: Institute for Genomic Research and other contributors
Operating System: Windows, Linux, Mac OS X
Functionality: Comprehensive

The TM4 Microarray Software Suite provides the following applications that have been developed in Java and C/C++.

  1. MADAM (Microarray Data Manager) is developed to manage and store microarray data as well as the associated information, like experimental design, parameters, protocols, etc. This data is stored in a MySQL database in accordance with the MIAME (Minimal Information About a Microarray Experiment) standards.
  2. MIDAS (Microarray Data Analysis System) is developed for normalizing and filtering the data obtained. The resultant output is stored in .tav format in the MADAM associated database.
  3. Spotfinder is designed for rapid image processing and quantification of signals at each spot to quantify gene expression.
  4. MeV (MultiExperiment Viewer) enables the analysis of the normalized and filtered microarray data. It provides tools for clustering and classification, graphical visualization, statistical analysis, as well as annotation.
  5. AMP (Automated Microarray Pipeline) is a web-based application where microarray data in the form of Affymetrix CEL files can be submitted for further analysis. The workflow or pipeline can be specified for normalization and statistical analysis followed by gene classification and annotation. AMP is comparatively user-friendly, and the results are displayed in a web-based format which can be easily stored and used for further analysis.

Type: Free and open source, as well as in the form of a public web server
Maintained by: Broad Institute
Operating System: Windows, Linux, Ubuntu, SuSE, CentOS, Mac OS 10.7 and later
Functionality: Comprehensive

Developed using the R programming language, GenePattern is a highly user-friendly system that comprises several analysis modules which can be easily arranged and interconnected to form a customized pipeline. Microarray data can be normalized, preprocessed, and analyzed for gene expression patterns, predicting the class of desired genes, clustering and discovering the gene class, as well as pathway analysis.

Type: Free and open source
Maintained by:
Operating System: Windows
Functionality: Specific

Developed using Visual Basic 6.0, GenMAPP or the Gene Map Annotator and Pathway Profiler is specifically designed for the analysis of genomic microarray data for understanding and identifying biological pathways, like anabolic and catabolic pathways as well as signaling pathways.

It contains gene databases for selected model organisms, including E.coli, humans, mouse, zebrafish, etc. The gene expression data obtained from custom as well as commercial microarrays can be analyzed, and the desired genes can be visualized in the form of a pathway by using a color-coded format as per the criteria indicated by the user. It provides tools to construct and modify pathways using earlier information about gene annotations.

Type: Commercial
Provided by: BioDiscovery
Operating System: Windows, Mac, and Linux
Functionality: Comprehensive

This is a Java-based commercial software for analyzing data from almost any platform and type of microarray. It even provides ready-to-use templates for standard microarray platforms, like Agilent 244K, 4x44K, 44K arrays, etc.

Analyzing microarray data depends on the type of microarray as well as the design of the study. In addition to convenience, the choice of microarray data analysis software and the statistical analysis tools should be made after careful consideration of the experimental conditions and precise objective.

Microarray structure

A typical microarray consists of oligonucleotides, several dozen nucleotides (nt) long, attached to the surface of a glass slide. Using appropriate photolithographic masks, a single nucleotide (A, C, T, or G) is attached at a time, which makes it possible to construct a microarray with hundreds of thousands of different oligonucleotide sequences that are complementary to characteristic fragments of known DNA or RNA sequences. These characteristic fragments are arranged in sets called probes [49]. A sample containing DNA or RNA molecules is spread on the surface of a microarray and its components hybridize specifically with their complementary probes, which are located in multiple copies across the microarray (Fig. 1). The amount of material hybridized to a given probe is determined by a fluorescence-based method and, although the relationship is not linear, the fluorescence intensity reflects the amount of DNA or RNA of a given gene in the sample [50]. This approach allows quantifying the level of transcripts of thousands of genes in a relatively short time.

Microarray schematics. (a) Probes corresponding to the characteristic fragments of a given gene are placed in different locations across the array. (b) Single probes are arranged in sets corresponding to the same region of the gene. DNA that hybridizes to the probe can be detected using a fluorescent reporter system. Increasing the number of probes to which cDNA hybridizes correctly increases the contrast between this probe set (probe set A) and any other probe set (probe set B).

The most widespread microarray is the Affymetrix 3′IVT (3′ in vitro transcription), i.e. HG-U133A or HG-U133_Plus_2, which is assembled as 11 sets of perfect match (PM) probes consisting of 25 nt sequences, which in most cases were chosen out of 600 nt sequence fragments located near the 3′ end of a specific transcript. For every PM probe on the microarray, a MM (mismatch) probe exists in which all nucleotides but one are identical to those on the corresponding PM probe but the original 13th nucleotide is replaced by a non-complementary one. The rationale behind the MM probes is to gauge the level of nonspecific hybridization [51], although the usefulness of this concept has been doubted (see further on).
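The PM/MM pairing described above can be illustrated with a minimal sketch. It assumes that the substituted 13th base is the Watson-Crick complement of the original one, which is the usual convention but is not stated in the text; the function name `mismatch_probe` and the example sequence are made up for illustration.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm_seq: str, position: int = 13) -> str:
    """Return the MM counterpart of a PM probe.

    Assumption: the base at the given 1-based position is swapped for its
    Watson-Crick complement; every other base is kept unchanged.
    """
    seq = pm_seq.upper()
    if not 1 <= position <= len(seq):
        raise ValueError("position lies outside the probe sequence")
    idx = position - 1
    return seq[:idx] + COMPLEMENT[seq[idx]] + seq[idx + 1:]


# Hypothetical 25 nt PM probe (not a real Affymetrix sequence).
pm = "ACGTACGTACGTACGTACGTACGTA"
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

The two sequences differ only in the central base, so any signal on the MM probe is expected to come mostly from non-specific binding.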

The most recent generation of Affymetrix microarrays, such as the HuGene 1.0ST, is constructed using probes similar to the standard PM probes but with affinity not to the noncoding part of the 3′ end but rather to the individual exons in a given transcript. In this design the MM probes are replaced by the Background Intensity Probes (BGP), which are designed to evaluate background intensity levels for probes of different sequence characteristics. BGP are a set of about 1000 probes, non-complementary to any human gene sequence, with a variable ratio of GC nucleotides in the sequence. This approach enables a better evaluation of non-specific hybridization across the microarray compared with MM probes, for which the signal often exceeds the PM signal due to probe-specific effects [52]. Additionally, lowering the number of probes which evaluate non-specific hybridization allows inserting of a much higher number of PM probes. The probe set in the new generation of whole transcript microarrays is constructed with two levels, exon and gene level. The exon probe set includes 4 probes on average, which are tailored for individual exons, and then these are clustered, usually in groups of around 25, creating sets for individual genes. Using this approach it is possible to determine levels of individual differently-spliced transcripts.
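One way to picture how background probes with a variable GC ratio could be used is the sketch below. It is a simplified assumption, not the manufacturer's algorithm: background probes are binned by their number of G and C bases, the median intensity of each bin is taken as the background, and that value is subtracted from a PM probe with the same (or nearest) GC count. All names, sequences, and intensities are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_count(seq):
    """Number of G and C bases in a probe sequence."""
    return sum(base in "GC" for base in seq.upper())


def gc_background(bg_probes):
    """Median background intensity per GC count.

    bg_probes: iterable of (sequence, intensity) pairs for background probes.
    """
    bins = defaultdict(list)
    for seq, intensity in bg_probes:
        bins[gc_count(seq)].append(intensity)
    return {gc: median(values) for gc, values in bins.items()}


def correct_pm(pm_seq, pm_intensity, background, floor=1.0):
    """Subtract the background of the nearest GC bin, never going below floor."""
    gc = gc_count(pm_seq)
    nearest = min(background, key=lambda g: abs(g - gc))
    return max(pm_intensity - background[nearest], floor)


# Made-up short sequences and intensities, for illustration only.
bg = [("ATATAT", 40), ("ATATAA", 60),
      ("GCGCGC", 200), ("GCGCGG", 220), ("GCGCCC", 240)]
background = gc_background(bg)
corrected_rich = correct_pm("GCGCGC", 1000, background)
corrected_poor = correct_pm("ATATAT", 45, background)
print(background, corrected_rich, corrected_poor)
```

The floor value keeps corrected intensities positive so that they can still be log-transformed; its value here is an arbitrary choice.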

Another popular system is the Agilent microarray platform, which was built using the SurePrint technology that allows using considerably longer, 60 nt-long probes. While probes are longer than in the Affymetrix system, the number of probes per gene is considerably lower: 8 on average in the most expensive set of exon microarrays (2 × 400 k) or 2 in the least expensive platform (8 × 60 k). As the Agilent probes are longer than those in the Affymetrix microarrays, the system tends to be more specific, which is an obvious advantage, but on the other hand the lower number of probes per gene makes Agilent microarrays more sensitive to single nucleotide variations. The latter should not affect the signal if they result from amplification errors [53], but they may influence the expression estimates if they result from characteristic features of the sample analyzed. In the case of the Affymetrix microarray system these sources of error will only have a minor impact, as they influence the signal only in an individual probe for a transcript or a transcript-specific probe-set. Single nucleotide polymorphisms do not block the hybridization but lower its efficiency, which can be interpreted as a significant decrease of gene expression, a feature which is used to estimate the level of nonspecific hybridization using mismatch probes [54, 55] or to assess allelic frequencies using SNP microarrays [56]. In the Affymetrix systems the signal from one badly designed probe, which may be based on inaccurate data from a sequence database, can be easily eliminated from further analysis [41] without a significant decrease in the precision of the gene expression estimate, while in the Agilent systems the same design glitch might cause significant difficulties in the evaluation of gene expression levels.

Microarrays provide expression data for thousands of genes, but platform differences contribute to low accuracy of microarrays and for this reason they are only used to identify potentially significant genes in the experimental conditions studied. Precise assessment of the expression level of these presumably significant genes requires additional studies using more accurate methods such as real-time PCR (polymerase chain reaction) which, in turn, are not suitable for large-scale analyses. However, some steps of the microarray protocol are shared by the validation methods, affecting data quality in a similar manner.

Biological background of microarray experiments

The microarray experiment is a multi-stage process in which the accuracy of each individual step may influence the gene expression estimates. Precise understanding of each step is very important not only for the experimenter but also for the person performing data pre-processing. In order to avoid mistakes that occur during the experiment, its accuracy and the condition of the biological material are controlled in various steps, as shown in Fig.  2 . The procedure used in a microarray experiment is very similar across different platforms, and therefore for simplicity the following description is based on the procedure used for the Affymetrix microarrays.

Individual steps of a microarray experiment. After isolation of sample mRNA (1), synthesis of cDNA (complementary DNA) chains begins with the addition of oligo(dT) primers (2); the cDNA is then amplified, producing cRNA (complementary RNA), which is labelled with biotin (3) and later fragmented (4). After such preparation the sample cRNA is ready for hybridization with microarray probes (5) and for the final staining process (6).

Step I: RNA isolation

In the first step RNA is isolated from the cells, and its concentration and extent of degradation are checked using a spectrophotometer (quantity) and a bioanalyzer (quality). In a high-quality RNA sample ribosomal RNA (rRNA) constitutes over 80 % of the entire RNA, and despite the fact that it is rarely a target of a study in fields other than bacteriology and phylogenetics, its concentration is a good indicator of the overall RNA quality, both before and after the experiment. Prior to hybridization the extent of RNA degradation can be assessed by RNA electrophoresis or a bioanalyzer using the RNA integrity number (RIN) as a benchmark [57]. Fig. 3 shows an example of an image made after electrophoresis in agarose gel. The first two lanes show RNA directly after isolation. The two most distinguishable bands correspond to the 18S and 28S rRNA, and their absence would indicate that the RNA was highly degraded [58].

Electrophoretic analysis of the products obtained at various stages of a microarray experiment. Lanes 1 and 2, isolated RNA; lanes 3 and 4, purified cRNA; lanes 5 and 6, fragmented cRNA (Image courtesy of Herok R. - unpublished data)

RNA quality can also be evaluated after the experiment by analyzing results from a specific group of control probe-sets (as described in Table  1 ) designed to target certain housekeeping genes (group 1) or rRNA (group 2). As with other probe-sets a single expression intensity is however non-informative since its value, expressed in arbitrary units, depends on the characteristics of the sample [59], the experimental conditions, such as for example the ozone level in the laboratory [60], and the data pre-processing methods used [61]. For this reason arbitrary criteria based solely on single expression intensities are usually ineffective and a comparison between arrays or between probe-sets is required.

Table 1

Reference genes found on a typical Affymetrix 3′IVT microarray. Amplification and hybridization control RNAs are added in various proportions and quantities as indicated in the last column. The amplification control transcripts are added using various dilutions, which results in estimated copy numbers ranging from one copy per 6,667 to one copy per 100,000 transcripts in the studied RNA sample. The hybridization control consists of biotinylated and fragmented cRNAs added in various amounts that result in final concentrations ranging from 1.5 to 100 pM

| Group | Control type | Probe-set / accession | Target and amount added |
|---|---|---|---|
| 1 | Housekeeping genes | AFFX-HSAC07 / X00351 | ACTB - β-actin gene responsible for the structure of the cell |
| | | AFFX-HUMGAPDH / M33197 | GAPDH – enzyme which takes part in glycolysis |
| | | AFFX-HUMISGF3A / M97935 | STAT1 – transcription factor |
| 2 | Ribosomal RNA | AFFX-HUMRGE / M10098 | Gene coding for 18S rRNA subunit |
| | | AFFX-M27830 | Gene coding for 28S rRNA subunit |
| | | AFFX-r2-Hs18SrRNA | Gene coding for 18S rRNA subunit - version 2 |
| | | AFFX-r2-Hs28SrRNA | Gene coding for 28S rRNA subunit - version 2 |
| 3 | Amplification control (Poly-A spike) | AFFX-DapX / AFFX-r2-Bs-dap | Dap gene of B. subtilis bacteria - proportions 1:6,667 |
| | | AFFX-ThrX / AFFX-r2-Bs-thr | Thr gene of B. subtilis bacteria - proportions 1:25,000 |
| | | AFFX-PheX / AFFX-r2-Bs-phe | Phe gene of B. subtilis bacteria - proportions 1:50,000 |
| | | AFFX-LysX / AFFX-r2-Bs-lys | Lys gene of B. subtilis bacteria - proportions 1:100,000 |
| 4 | Hybridization control (Bacterial spike) | AFFX-BioB / AFFX-r2-Ec-bioB | BioB gene of E. coli bacteria – quantity 1.5 pM |
| | | AFFX-BioC / AFFX-r2-Ec-bioC | BioC gene of E. coli bacteria – quantity 5 pM |
| | | AFFX-BioDn / AFFX-r2-Ec-bioD | BioD gene of E. coli bacteria – quantity 25 pM |
| | | AFFX-CreX / AFFX-r2-P1-cre | Cre gene of P1 bacteriophage – quantity 100 pM |

Each of the control probe-sets exists in three variants, each targeting a different region of the selected transcript - its central section and the 3′- and 5′-ends. This allows assessing the degradation rate of individual transcripts by examining the 3′/5′ probe-set signal ratios, which can be compared to the threshold defined by the manufacturer and ratios obtained for other microarrays, in order to assess the homogeneity of degradation level across individual samples. In order to aid the assessment of post-experimental RNA degradation, more complex methods have been developed including RNA degradation plots [62] or mixed effect models based on individual probe and transcript characteristics [63].
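The 3′/5′ ratio check can be sketched as follows. The threshold of 3.0 and the allowed two-fold deviation from the experiment median are placeholder assumptions (the manufacturer's actual threshold should be used instead), and the array names and signal values are invented.

```python
from statistics import median


def three_to_five_ratios(signals):
    """signals: {array_name: (signal_3p, signal_5p)} on a linear scale."""
    return {name: s3 / s5 for name, (s3, s5) in signals.items()}


def flag_arrays(ratios, threshold=3.0, max_fold_from_median=2.0):
    """Flag arrays whose 3'/5' ratio exceeds the threshold or lies far
    above the median ratio of the experiment (both limits are assumed)."""
    med = median(ratios.values())
    flagged = []
    for name, ratio in ratios.items():
        if ratio > threshold or ratio / med > max_fold_from_median:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical signals of the 3' and 5' probe-sets of one control gene.
signals = {
    "s1": (1200.0, 1000.0),
    "s2": (1500.0, 400.0),
    "s3": (1100.0, 1000.0),
}
ratios = three_to_five_ratios(signals)
flagged = flag_arrays(ratios)
print(ratios, flagged)
```

Comparing each array with the experiment median, in addition to a fixed threshold, reflects the point made above that homogeneity of degradation across samples matters as much as its absolute level.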

Step II: cDNA synthesis

At the beginning of this stage external RNA controls (ERCs) are added which serve as a control of cDNA synthesis independently of the volume and condition of the input material. For this purpose bacterial RNA is used (the so-called poly-A spike) with no homology to known human genes. Following this, the cDNA synthesis process is performed by the use of oligo-dT (a primer with a short sequence of deoxy-thymine nucleotides) or random primers. Oligo-dT binds to the poly-A tails of mRNAs, initiating the synthesis of the complementary strand in a process of reverse transcription (Fig.  4 ). This process does not work for rRNA molecules since unlike mRNA molecules they do not have a poly-A tail, and for this reason it is not necessary to remove the rRNA prior to this process. However, in some cases rRNA can be polyadenylated in human cells [64], and conversely, not all mRNAs have a poly-A tail, and there are also reports of mRNAs that exist in two forms both polyadenylated and non-polyadenylated [65].

cDNA synthesis based on a commercial T7-oligo (dT) primer. The arrow indicates the direction of synthesis, red font indicates the promoter sequence used in the amplification. The green region is the sequence spacer that separates the primer from the (T)24 motif

The second strand of the cDNA is then created by using the first strand as a template. Addition of ribonuclease causes RNA cleavage at nonspecific sites, leaving only short fragments attached to the cDNA (Fig. 2). These fragments are then used as primers for the polymerase which synthesizes the second strand of the cDNA, removing the remaining mRNA fragments found on its way. Measurement of cDNA concentration, which allows standardizing it across various samples, is not a part of the standard experimental procedure for eukaryotic cells, due to the presence of other nucleic acid species that affect the spectrophotometric measurement, whose removal requires additional cDNA purification. This step is strongly influenced by any previous RNA degradation, which leads to the creation of truncated mRNAs (from the 5′-end) [66]. When oligo-dT primers are used during cDNA synthesis these truncated mRNAs are read from the 3′-end only to the position of truncation, and the remaining part is lost due to the lack of poly-A. In such a situation probes located further from the 3′-end usually show lower signal intensity, a phenomenon which is the basis of RNA degradation plots used to assess the mRNA quality [62]. In order to reduce this effect, on 3′IVT microarrays probes from a single set are selected based on a very small region of 600 bp located close to the 3′-end of the mRNA. To further reduce this bias sophisticated methods have been developed that take into account the location of regions targeted by probes in order to correct the signal intensities [67, 68].
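The idea behind an RNA degradation plot can be sketched in a few lines. This is a simplified illustration, not the published method of ref. [62]: it assumes that every probe set holds the same number of probes, ordered from the 5′ to the 3′ end, and it uses made-up log2 intensities.

```python
def position_means(probe_sets):
    """Mean log2 intensity at each probe position across all probe sets."""
    n_positions = len(probe_sets[0])
    n_sets = len(probe_sets)
    return [sum(ps[i] for ps in probe_sets) / n_sets
            for i in range(n_positions)]


def slope(values):
    """Least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Hypothetical log2 intensities, probes ordered 5' -> 3' within each set.
probe_sets = [
    [6.0, 7.0, 8.0, 9.0],
    [7.0, 8.0, 9.0, 10.0],
]
means = position_means(probe_sets)
deg_slope = slope(means)
print(means, deg_slope)
```

A steeper positive slope means that probes near the 3′-end are brighter than those near the 5′-end, which is the signature of degraded RNA described above; arrays within one experiment would be compared by their slopes.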

The 3′-end bias does not occur when random primers are used for the cDNA synthesis. Random primers do not require a poly-A tail since they can attach to any region of the mRNA and not only to its 3′-end, promoting synthesis in a 3′ ➔ 5′ direction. In this case, however, a very strong 5′-end bias can be observed, as shown in ref. [69].

Although many of the available cDNA synthesis kits include a combination of oligo-dT and random primers, kits based solely on oligo-dT are commonly used especially for the 3′-IVT platform where 3′-UTR sequences are of the highest importance since they are targeted by oligonucleotide probes.

Oligo-dT-based cRNA synthesis introduces an additional bias that may affect the results of a microarray experiment. First of all, because of the mRNA degradation problem, oligo-dT primers are a good choice only if the region of interest is located in the vicinity of the 3′-UTR, since large distances between the region targeted by probes and the poly-A tail can decrease the precision of expression level estimates [70]. If the analysis requires the entire transcript, as in the case of WT (whole-transcript) microarray platforms where individual exons are analyzed, random primers are required. Additionally, oligo-dT is assumed to bind only to the poly-A tail of the transcript, requiring a long continuous strand of A nucleotides, as shown in Fig. 2. However, partial primer complementarity (i.e. complementarity of only 8 adenine nucleotides in the primer's sequence) is sufficient for the reaction initiation, and due to the random nature of the attachment the primer can also bind to the A-strands found commonly in the UTRs [71]. Further, with increasing concentration of oligo-dT the chance of attaching multiple oligonucleotides to a single mRNA is increased. In such a situation the synthesis may start from two distinct regions, but the reaction located closer to the 3′-end might be blocked by the second reaction, again producing truncated cDNA products [71]. This phenomenon can therefore affect the entire probe-set signal intensity of the targeted transcript if its sequence includes simple repeats built predominantly of A nucleotides.

Step III: Amplification and labeling

In this step the newly-synthesized cDNA is replicated (amplified) in a process of in vitro transcription. The goal of this step is to obtain a large quantity of cRNA containing biotinylated C and U nucleotides that will be required in the subsequent steps [58]. For this purpose another fragment of oligo-dT is used, marked in red in Fig.  2 , which serves as a promoter for the T7 bacteriophage polymerase.

The efficiency of this reaction and its consistency between samples have a decisive impact on the final experimental outcomes [72]. There are many factors which influence the efficiency of this reaction, including the structural properties of the cDNA itself which, depending on the GC content, can affect the efficiency of the polymerase [73] and form secondary structures [74]. This step is completed with a cleanup and quantification of the cRNA, which allows for control of the total reaction yield and purity of the sample. The product of the amplification reaction can be observed in lanes three and four of the electrophoresis gel (Fig. 3). rRNA is no longer visible, and due to the variability in length of the cRNAs there are no easily distinguishable bands visible on the gel.

Post-experimental control of cRNA level variations utilizes the signals of probes targeting a reference RNA (poly-A spike) added prior to cDNA synthesis and signals of housekeeping genes which should be on a similar level across all samples. The poly-A spike contains transcripts of five B. subtilis genes (Dap, Lys, Phe, Thr, and Trp) which are added in various proportions to the isolated RNA. Since they all include a poly-A tail they undergo the same procedure as the RNA analyzed, independently of its condition. Lys gene RNA is added at the lowest concentration (1:100,000 of the total RNA) which is close to the sensitivity level of the microarray. Its detection in at least half of the microarrays of a given experiment is a good indicator of a properly conducted procedure. The remaining reference RNAs are added in increasing concentrations, Lys < Phe < Thr < Dap, with Dap being the highest and close to the probe signal intensity saturation level.
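The two poly-A spike checks described above (the expected ordering Lys < Phe < Thr < Dap and the detection of Lys in at least half of the arrays) can be expressed as a short sketch. The dictionary keys, signal values, and detection flags are hypothetical; how a "detected" call is obtained for each array is left outside the example.

```python
# Spike transcripts listed from the lowest to the highest concentration.
EXPECTED_ORDER = ["lys", "phe", "thr", "dap"]


def spikes_in_order(signals):
    """True if spike signals increase strictly with spiked-in concentration.

    signals: {spike_name: probe-set signal} for a single array.
    """
    values = [signals[name] for name in EXPECTED_ORDER]
    return all(a < b for a, b in zip(values, values[1:]))


def lys_detected_in_enough_arrays(detected_flags):
    """True if Lys was detected in at least half of the arrays."""
    return 2 * sum(detected_flags) >= len(detected_flags)


# Made-up signals for two arrays of one experiment.
good_array = {"lys": 150.0, "phe": 400.0, "thr": 900.0, "dap": 6000.0}
bad_array = {"lys": 150.0, "phe": 950.0, "thr": 900.0, "dap": 6000.0}
order_ok = [spikes_in_order(good_array), spikes_in_order(bad_array)]
lys_ok = lys_detected_in_enough_arrays([True, True, False, False])
print(order_ok, lys_ok)
```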

The amplification products no longer have the T7 promoter, although the spacer sequence between the promoter and the (T)24 primer (green in Fig.  3 ) is also amplified [75]. Since this fragment is copied with each cRNA its quantity is very large, and since it can bind to probes having a similar sequence it might affect their signal intensity [69]. It is believed that the process of amplification might be the source of inconsistent signals among samples, as it depends highly on the experiment conditions and the transcript structure [74, 76], becoming the main motivation for the development of microarray protocols that do not require RNA amplification [77].

Step IV and V: cRNA fragmentation and hybridization

cRNAs obtained in the previous step are cut into 50–200 nt fragments, shown in lanes five and six of the electrophoresis gel (Fig. 3). After this, another set of external RNA controls (ERCs) that originates from P1 bacteriophage and E. coli bacteria (termed bacterial spikes) is added to the RNA pool. Similarly to the poly-A spike, bacterial RNA is added in various concentrations with the following relations satisfied: bioB < bioC < bioD < Cre (group 4 in Table 1). BioB, bioC and bioD originate from the E. coli genes used in the synthesis of biotin, while Cre is isolated from P1 bacteriophage where its gene product serves as a recombinase [78]. This bacterial spike is already converted to cRNA and fragmented, which allows the hybridization process to be controlled independently of the efficiency of labeling and amplification used in the previous steps to obtain cRNA [58]. After this the mixture of various cRNAs is transferred onto the microarray chip, initiating the hybridization process.

Hybridization is the most time-consuming step of the entire microarray procedure. During approximately 16 h, in which microarrays are incubated in a hybridization oven set to 45 °C, the cRNA binds to the specific probes attached to the glass surface of the microarray chip. The dynamics of the hybridization process depends on many factors which, as in the amplification step, depend on both the reaction conditions and structural properties of the individual cRNA molecules which may significantly affect the experimental outcomes [79, 80]. Prolonged hybridization can cause sample drying and uneven distribution of the material on the surface of the chip. Additionally, evaporation of some of the water can change the salt concentration in the buffers and significantly affect the efficiency of the process [81].

The main purpose of the bacterial spikes added before the hybridization step is to control the consistency of hybridization conditions across all samples, assessing the overall microarray performance [82]. Flaws in the experimental procedure cause either variations in expression intensity range or in the relations among individual bioB, bioC, bioD and Cre transcripts, although one has to remember that flaws in the hybridization process affect other transcripts as well. For this reason, hybridization inconsistencies should be also visible in probe-sets targeting other cRNAs, including the poly-A spike controls. If variations are only present in the bacterial spikes, the problem most likely originates from inaccuracies in their preparation or their concentration in the pre-hybridized cRNA. All of the possible scenarios for housekeeping genes, poly-A, and bacterial spike controls are summarized in Table  2 .

Table 2

Problems detected by different control probe-sets and their possible reasons (a)

| Housekeeping genes | Poly-A spike | Bacterial spike | Possible reason |
|---|---|---|---|
| error | ok | ok | Poor quality of the mRNA analyzed |
| error | error | ok | Problems during amplification/labeling |
| error | error | error | Problems during hybridization/washing |
| ok | ok | error | Inaccurate preparation of bacterial spike |
| ok | error | ok | Inaccurate preparation of bacterial poly-A spike |

(a) Other possible combinations of errors rarely occur in practice
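The scenarios of Table 2 amount to a small lookup table, which could be encoded as in the sketch below. The function name `diagnose` and the wording of the two fallback messages are assumptions added for illustration; the five mapped combinations are taken directly from the table.

```python
# Keys: status of (housekeeping genes, poly-A spike, bacterial spike).
DIAGNOSIS = {
    ("error", "ok", "ok"): "Poor quality of the mRNA analyzed",
    ("error", "error", "ok"): "Problems during amplification/labeling",
    ("error", "error", "error"): "Problems during hybridization/washing",
    ("ok", "ok", "error"): "Inaccurate preparation of bacterial spike",
    ("ok", "error", "ok"): "Inaccurate preparation of bacterial poly-A spike",
}


def diagnose(housekeeping, poly_a, bacterial):
    """Map the status of the three control groups to a possible reason."""
    key = (housekeeping, poly_a, bacterial)
    if key == ("ok", "ok", "ok"):
        return "No problem detected"
    # Combinations missing from Table 2 rarely occur in practice.
    return DIAGNOSIS.get(key, "Combination not covered by Table 2")


result_a = diagnose("error", "error", "ok")
result_b = diagnose("ok", "error", "error")
print(result_a)
print(result_b)
```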

Bacterial spike controls are a good indicator of problems that may occur during the hybridization procedure, although they fail to detect uneven hybridization, since the probe-set intensity is obtained after summarizing signals of over 20 individual probes, spread over the entire surface of the microarray (3′IVT arrays) or located in a small region at the middle of the array (WT arrays). For this purpose the quality control of each sample should include the analysis of an image of the microarray surface, which is either a complete scan saved in a DAT file, or more commonly a recreated image based on the individual probe intensities stored in a CEL file [83, 84].

The main assumption made in design of a microarray is that probes targeting a single transcript are placed randomly on its surface. For this reason, variations in the signal intensity of specific regions suggest reasons other than the biological variation between the analyzed mRNAs. Such differences among regions, termed image artifacts, are mostly caused by bubbles of air or small levels of impurities, which were added into the microarray cartridge with the experimental solutions [85]. Such artifacts appear very commonly, although they usually have a very small size and are handled efficiently by summarization methods, which are insensitive to a small number of outlying values. The main problem occurs when the artifact covers a significant percentage of the array surface or its intensity is extremely high and close to the saturation level of the probes. Such artifacts are mainly caused by uneven hybridization and affect not only the expression estimates from probes located in its region, but also the remaining probe signals. This latter effect is due to data processing, which utilizes expression levels of all or of a significant fraction of the probes on the microarray [38].

Microarray surface artifacts can be visualized by either creating an image, based on single probe expression intensities in a convenient (usually logarithmic) scale, or by analyzing differential images created by subtracting the signal of each probe on a single microarray from that on another reference array created by, for example, calculating the median intensity level of each probe across all microarrays in a single experiment [37]. If a defective array is found, probes affected by an aberration may be separated and removed from the subsequent data analysis or even recreated using imputation techniques [38, 37, 85, 86]. Microarrays affected by a very large aberration should be removed from the study, as they no longer serve as a reliable source of information.
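The differential-image approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: the probe intensities are simulated rather than read from CEL files, the function names and the 1.0 log2-unit threshold are hypothetical, and the sign convention (array minus reference) is an arbitrary choice.

```python
import numpy as np

def differential_images(log_intensity):
    """Subtract a per-probe median reference from every array.

    log_intensity: array of shape (n_arrays, rows, cols) holding log2 probe
    intensities laid out as on the microarray surface.
    """
    reference = np.median(log_intensity, axis=0)  # median of each probe across arrays
    return log_intensity - reference

def artifact_fraction(diff_image, threshold=1.0):
    """Fraction of probes deviating from the reference by more than `threshold` (log2 units)."""
    return float(np.mean(np.abs(diff_image) > threshold))

# Simulated experiment: five 50 x 50 arrays, one with a bright square artifact.
rng = np.random.default_rng(0)
arrays = rng.normal(8.0, 0.1, size=(5, 50, 50))
arrays[2, 10:20, 10:20] += 3.0  # hypothetical bubble on the third array

diff = differential_images(arrays)
fractions = [artifact_fraction(d) for d in diff]
print(fractions)
```

An array whose flagged fraction is large would be a candidate for removal, while a small flagged region could instead be masked or imputed, as discussed above.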

Step VI: Washing and staining

Washing follows the cRNA hybridization and is used to remove cRNA non-specifically bound to the microarray surface. Again, in this step small variations in the reaction conditions may affect the expression estimates [87]. Depending on the conditions of the washing process (temperature, salt concentration, calcium and magnesium ion levels in the buffer), non-specifically bound cRNA is removed with varying efficiency, affecting the sensitivity and background level of the entire microarray. The binding strength of cRNAs depends not only on their level of complementarity but also on the temperature of the hybridization and their sequence characteristics, mainly the GC content [88] and specific base positions inside the sequence [44]. The separation between the binding strength of non-specifically bound GC-rich cRNA and that of GC-poor cRNA with perfect complementarity is not very sharp, so the final intensity level of a cRNA depends on its sequence characteristics [50]. This effect can only be reduced using sequence-based normalization approaches during the data pre-processing step [89, 90].
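Since GC content is named above as the main sequence characteristic affecting binding strength, a minimal sketch of how it could be computed per probe is given here. The probe names and sequences are made up for illustration, and the sketch does not implement any of the sequence-based normalization methods cited.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)

# Hypothetical 25-mer probes, not taken from any real array design.
probes = {
    "probe_a": "ATATTTAAATTGTAATTTAACATAT",
    "probe_b": "ATGCATGCATGCATGCATGCATGCA",
    "probe_c": "GCGGCCGCGCCGGAGCGGCCGCGGC",
}

gc = {name: gc_content(seq) for name, seq in probes.items()}

# List probes from GC-poor to GC-rich.
for name in sorted(gc, key=gc.get):
    print(name, round(gc[name], 2))
```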

The washing process is followed by staining of the hybridized cRNA using a streptavidin-phycoerythrin complex (Fig. 2). Streptavidin is a protein with high binding affinity to the biotinylated nucleotides used in the cRNA preparation, while phycoerythrin is a fluorescent dye used for quantitation of the hybridized cRNA. The quality of the fluorophore used significantly affects the fluorescence intensity of the microarray, decreasing its sensitivity if it is exposed for too long to daylight [91].

Step VII: Scanning

In this step the microarray cartridge is placed in the microarray scanner where the fluorescence of the phycoerythrin bound to the cRNA is excited using a laser. The level of fluorescence is measured by the scanner’s detector and is assumed to be proportional to the amount of cRNA bound to the corresponding probe. The length of this process depends on the size of the microarray and in most cases lasts around 10 min for a single array. During the scanning process all arrays are placed inside the scanner’s chamber so that the fluorescence intensity is not affected by differences in the length of exposure to daylight, which could increase the differences among microarrays in both the scale of the measurements and the sensitivity level. It is advised to scan each microarray only once, since each subsequent scan decreases the fluorescence intensity by 10 %, due to decay of the fluorophore [92]. The fluorescence intensity of cyanine-based dyes also used in microarray experiments, such as Cy5, can be further affected by the ozone concentration in the laboratory, a factor which is both time- and location-dependent, and can become a major source of among-experiment inconsistencies [60, 93].

Step VIII: Data pre-processing

The last stage is data pre-processing, which starts with the analysis of the microarray image stored in the DAT file. The goal is to obtain a single fluorescence intensity for each probe based on the 16 pixels of the original microarray image. This step is performed by the Affymetrix software and returns a CEL file as an output, in which each probe, at a specific position on the microarray, has a signal intensity assigned to it. These individual probe intensities are used in the subsequent pre-processing steps, during which each array is standardized by first estimating and then subtracting the background signal in order to reduce the effect of non-specific hybridization [44]. The following step is normalization, which reduces the differences in probe intensities that originate from differences in experimental conditions and cRNA concentration [31, 94]. The final step of pre-processing is summarization, in which a single expression estimate is calculated for each probe-set based on the intensity of the individual probe signals [95]. The summarization step is highly dependent on the quality of the probe and probe-set definitions, which are in many cases low due to inaccurate transcriptome data at the time of microarray design. This can result in probe-sets targeting transcripts of multiple genes due to low probe specificity, probes that do not map to any of the known transcripts [41, 42] or multiple probe-sets that map to the same gene [39, 40], requiring the development of methods for the validation of existing probes and for probe-set redefinition [41, 42, 96].
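The three pre-processing stages named above (background subtraction, normalization, summarization) can be illustrated with a deliberately simplified sketch. It is not the Affymetrix software or any published algorithm: the background estimate is a crude low percentile, normalization is plain median scaling, summarization is a median of log2 intensities, and all function names, parameters and data are hypothetical.

```python
import numpy as np

def background_correct(intensity, percentile=2.0, floor=1.0):
    """Subtract a crude per-array background, estimated as a low percentile."""
    background = np.percentile(intensity, percentile, axis=0, keepdims=True)
    return np.maximum(intensity - background, floor)  # keep values positive for log2

def median_scale(intensity):
    """Scale each array (column) so that all arrays share the same median."""
    medians = np.median(intensity, axis=0, keepdims=True)
    return intensity * (medians.mean() / medians)

def summarize(intensity, probeset_ids):
    """Median of the log2 probe intensities of each probe-set, per array."""
    log_intensity = np.log2(intensity)
    ids = np.asarray(probeset_ids)
    return {name: np.median(log_intensity[ids == name], axis=0)
            for name in sorted(set(probeset_ids))}

# Simulated data: 33 probes (rows) on 2 arrays (columns); array 2 is 1.5x brighter.
rng = np.random.default_rng(1)
probeset_ids = ["bg"] * 11 + ["ps1"] * 11 + ["ps2"] * 11
signal = np.concatenate([np.zeros(11), np.full(11, 200.0), np.full(11, 800.0)])
raw = np.column_stack([signal * 1.0 + 50.0, signal * 1.5 + 50.0])
raw = raw * rng.lognormal(0.0, 0.05, size=raw.shape)  # multiplicative probe noise

scaled = median_scale(background_correct(raw))
expression = summarize(scaled, probeset_ids)
for name, values in expression.items():
    print(name, np.round(values, 2))
```

Median-based choices are used here only because they are easy to follow and tolerate a few outlying probes; an actual analysis would rely on an established method.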

Selection of the pre-processing strategy can have a very large impact on the experimental outcomes [94] and often requires a few assumptions which are not always acceptable. The main assumption made by pre-processing methods is that the total level of mRNA in the cell does not vary significantly among samples, regardless of the experimental conditions and cell lines used. This assumption is required for the standardization approaches based on mean and median scaling or more complex approaches, such as quantile normalization [31], and its natural consequence is that the numbers of differentially expressed features with increased and with decreased levels will always be similar. For example, in the case of global transcript level changes in cells with inhibited transcription, one might expect to detect predominantly transcript down-regulation, whereas after applying quantile normalization it is very probable that a significant number of up-regulated transcripts will be observed, due to intensity distribution transformations.
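The effect described above can be reproduced on simulated data. In this sketch, which assumes log2-scale intensities and an arbitrary 0.3 log2-unit cut-off, 60 % of the transcripts are truly down-regulated and none are up-regulated, yet quantile normalization makes many unchanged transcripts appear up-regulated. The function is a straightforward textbook implementation without tie handling, not the code of any particular package.

```python
import numpy as np

def quantile_normalize(matrix):
    """Quantile-normalize the columns of a (features x samples) matrix."""
    order = np.argsort(matrix, axis=0)
    ranks = np.argsort(order, axis=0)                    # rank of each value within its column
    mean_quantiles = np.sort(matrix, axis=0).mean(axis=1)
    return mean_quantiles[ranks]                         # replace each value by the mean quantile

# Simulated log2 expression of 1000 transcripts in a control and a treated sample.
rng = np.random.default_rng(42)
control = rng.normal(8.0, 1.0, size=1000)
treated = control.copy()
treated[:600] -= 1.0  # global down-regulation; the remaining 400 transcripts are unchanged

normalized = quantile_normalize(np.column_stack([control, treated]))
log_fc = normalized[:, 1] - normalized[:, 0]

n_up = int(np.sum(log_fc > 0.3))
n_down = int(np.sum(log_fc < -0.3))
print("apparently up-regulated:", n_up, "apparently down-regulated:", n_down)
```

Because the normalization forces both samples onto the same intensity distribution, the unchanged transcripts rise in rank within the treated sample and acquire positive log fold changes.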

Another important assumption is forced by the massively parallel nature of the microarray technique, which allows the expression level of thousands of genes to be assessed simultaneously. We have to assume that the reaction conditions for each individual gene were similar, while knowing that, due to the various molecular properties of the analyzed RNA/DNA fragments, it is impossible to properly optimize each of the individual reactions. Most data processing methods make this assumption, although some standardization methods utilize probe and RNA/DNA sequence information in order to reduce the signal differences resulting from sub-optimal amplification and hybridization conditions, which affect gene expression estimates to a varying degree [89, 90].

Advanced Analysis of Gene Expression Microarray Data

This book focuses on the development and application of the latest advanced data mining, machine learning, and visualization techniques for the identification of interesting, significant, and novel patterns in gene expression microarray data.

Biomedical researchers will find this book invaluable for learning the cutting-edge methods for analyzing gene expression microarray data. Specifically, the coverage includes the following state-of-the-art methods:

• Gene-based analysis: the latest novel clustering algorithms to identify co-expressed genes and coherent patterns in gene expression microarray data sets

• Sample-based analysis: supervised and unsupervised methods for the reduction of the gene dimensionality to select significant genes. A series of approaches to disease classification and discovery are also described

• Pattern-based analysis: methods for ascertaining the relationship between (subsets of) genes and (subsets of) samples. Various novel pattern-based clustering algorithms to find the coherent patterns embedded in the sub-attribute spaces are discussed

• Visualization tools: various methods for gene expression data visualization. The visualization process is intended to transform the gene expression data set from high-dimensional space into a more easily understood two- or three-dimensional space.

About the Author

Dr. Ujjwal Maulik is Professor of Computer Science and Engineering at Jadavpur University (India). He is the editor or author of five books and coauthor of more than 150 articles. Dr. Maulik is a Senior Member of IEEE and also a Humboldt Fellow.

Dr. Sanghamitra Bandyopadhyay is Professor at the Indian Statistical Institute. She is the editor or author of six books and coauthor of more than 180 articles. Dr. Bandyopadhyay is a Senior Member of IEEE and also a Humboldt Fellow.

Dr. Jason T. L. Wang is a Professor and Director of the Data and Knowledge Engineering Lab at the New Jersey Institute of Technology. He is the editor or author of six books and Executive Editor of the World Scientific Book Series on Science, Engineering, and Biology Informatics.

Types of Microarrays

Depending on the kind of immobilized sample used to construct the arrays and the information obtained, microarray experiments can be categorized in three ways:

1. Microarray Expression Analysis: In this experimental setup, the cDNA derived from the mRNA of known genes is immobilized. The sample contains genes from both normal and diseased tissues. Spots of higher intensity are obtained for a gene from the diseased tissue if that gene is overexpressed in the diseased condition. This expression pattern is then compared to the expression pattern of a gene responsible for a disease.

2. Microarray for Mutation Analysis: For this analysis, the researchers use gDNA. The genes might differ from each other by as little as a single nucleotide base.

A single base difference between two sequences is known as a Single Nucleotide Polymorphism (SNP), and detecting such differences is known as SNP detection.

3. Comparative Genomic Hybridization: It is used to identify increases or decreases in the copy number of important chromosomal fragments harboring genes involved in a disease.
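The notion of a single-base difference mentioned under mutation analysis can be made concrete with a short sketch. It only compares two already aligned sequences of equal length and does not model how an array detects SNPs by hybridization; the function name and the example sequences are hypothetical.

```python
def single_base_differences(reference, sample):
    """List (position, reference_base, sample_base) for every mismatching position.

    Positions are 0-based; both sequences must be aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    pairs = zip(reference.upper(), sample.upper())
    return [(i, r, s) for i, (r, s) in enumerate(pairs) if r != s]

# Two made-up sequences that differ at exactly one base.
differences = single_base_differences("ACGTTAGC", "ACGTCAGC")
print(differences)  # [(4, 'T', 'C')]
```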

Table of contents (17 chapters)

Normalization of Affymetrix miRNA Microarrays for the Analysis of Cancer Samples

Methods and Techniques for miRNA Data Analysis

Cristiano, Francesca (et al.)

Bioinformatics and Microarray Data Analysis on the Cloud

Classification and Clustering on Microarray Data for Gene Functional Prediction Using R

Kleine, Liliana López (et al.)

Querying Co-regulated Genes on Diverse Gene Expression Datasets Via Biclustering

MetaMirClust: Discovery and Exploration of Evolutionarily Conserved miRNA Clusters

Analysis of Gene Expression Patterns Using Biclustering

Using Semantic Similarities and csbl.go for Analyzing Microarray Data

Ontology-Based Analysis of Microarray Data

Integrated Analysis of Transcriptomic and Proteomic Datasets Reveals Information on Protein Expressivity and Factors Affecting Translational Efficiency

Integrating Microarray Data and GRNs

Biological Network Inference from Microarray Data, Current Solutions, and Assessments

A Protocol to Collect Specific Mouse Skeletal Muscles for Metabolomics Studies

Functional Analysis of microRNA in Multiple Myeloma

Martino, Maria Teresa, Ph.D. (et al.)

Microarray Analysis in Glioblastomas

Analysis of microRNA Microarrays in Cardiogenesis

Erratum to: Classification and Clustering on Microarray Data for Gene Functional Prediction Using R


Book Title :Microarray Data Analysis: Methods and Applications (Methods in Molecular Biology)

In this new volume, renowned authors contribute fascinating, cutting-edge insights into microarray data analysis. This innovative book includes in-depth presentations of genomic signal processing, artificial neural network use for microarray data analysis, signal processing and design of microarray time series experiments, application of regression methods, gene expression profiles and prognostic markers for primary breast cancer, and factors affecting the cross-correlation of gene expression profiles. Also detailed are use of tiling arrays for large genome analysis, comparative genomic hybridization data on cDNA microarrays, integrated high-resolution genome-wide analysis of gene dosage and gene expression in human brain tumors, gene and MeSH ontology, and survival prediction in follicular lymphoma using tissue microarrays.

Author(s) :Michael J. Korenberg (2007)

ProtoArray Prospector Software v5.2.3

ProtoArray Prospector v5.2.3 generates a list of positive interactions between the probe of interest and the immobilized proteins on the array. Beginning with data from either GenePix image quantification software or in 4-column, tab-delimited format, ProtoArray Prospector v5.2.3 provides rapid analysis of single and multiple microarray results generated in Protein-Protein Interaction, Kinase Substrate Identification, Ubiquitin Ligase Substrate Identification, Small Molecule Profiling, or Immune Response Biomarker Profiling assays.

ProtoArray Prospector is available free of charge with the purchase of ProtoArray Protein Microarray Products.


Praise for the First Edition
The book by Draghici is an excellent choice to be used as a textbook for a graduate-level bioinformatics course. This well-written book with two accompanying CD-ROMs will create much-needed enthusiasm among statisticians.
— Journal of Statistical Computation and Simulation , Vol. 74

I really like Draghici's book. As the author explains in the Preface, the book is intended to serve both the statistician who knows very little about DNA microarrays and the biologist who has no expertise in data analysis. The author lays out a study plan for the statistician that excludes 5 of the 17 chapters (4-8). These chapters present the basics of statistical distributions, estimation, hypothesis testing, ANOVA, and experimental design. What that leaves for the statistician is the three-chapter primer on microarrays and image processing, plus all of the data analysis tools specific to the microarray situation. … it includes two CDs with trial versions of several specialised software packages. Anyone who uses microarray data should certainly own a copy.
— Technometrics , Vol. 47, No. 1, February 2005
