MiRNA Gene Discovery
MiRNAs are post-transcriptional regulators that bind to mRNAs to silence a gene. The computational problem here is how to find the genes which correspond to these miRNAs.
The first problem is finding hairpins. Simply folding the genome produces approximately 760,000 hairpins, but there are only 60 to 200 true miRNAs, so we need methods that improve specificity. Structural features, including folding energy, loops (number, symmetry), hairpin length and symmetry, substructures and pairings, can be considered; however, these increase specificity only by a factor of 40, so structure alone cannot predict miRNAs. Evolutionary signatures can also be considered, because miRNAs show characteristic conservation properties. Hairpins consist of a loop, two arms and flanking regions. In most RNAs, the loop is the best-conserved part because it is used in binding. In miRNAs, however, the arms are more conserved because they determine where the RISC complex will bind. Conservation increases specificity by a factor of 300. The structural features and conservation properties can be combined to better predict potential miRNAs.
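To see why neither factor is enough on its own, it helps to do the arithmetic. The sketch below assumes, purely for illustration, that a k-fold gain in specificity shrinks the candidate pool by a factor of k; the function and variable names are hypothetical.

```python
# Figures quoted in the notes above.
GENOME_HAIRPINS = 760_000
TRUE_MIRNAS_MAX = 200

def remaining_candidates(pool, fold_gain):
    """Candidates left if a filter shrinks the pool by `fold_gain` (assumed model)."""
    return pool // fold_gain

after_structure = remaining_candidates(GENOME_HAIRPINS, 40)      # structural features
after_conservation = remaining_candidates(GENOME_HAIRPINS, 300)  # conservation signature
print(after_structure, after_conservation)
```

Under this reading, either filter alone still leaves far more candidates than the at most 200 true miRNAs.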
These features are combined using machine learning, specifically random forests. A random forest trains many weak classifiers (decision trees) on subsets of the positives and negatives; each tree then votes on the final classification of a given candidate. This technique reaches the desired specificity (a 4,500-fold increase).
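The voting idea can be sketched with the standard library alone. This is a simplified stand-in for a random forest, not the classifier used in the work described here: each "tree" is a one-level decision stump trained on a bootstrap sample and one randomly chosen feature, and the two features (a structure score and an arm-conservation score) and all training values are made up.

```python
import random

def train_stump(samples, feature):
    """Find the threshold and direction on one feature with the fewest training errors."""
    best = None
    for x, _ in samples:
        thr = x[feature]
        for sign in (1, -1):
            errors = sum((sign * (xi[feature] - thr) >= 0) != yi for xi, yi in samples)
            if best is None or errors < best[0]:
                best = (errors, thr, sign)
    _, thr, sign = best
    return lambda x: sign * (x[feature] - thr) >= 0

def train_forest(samples, n_trees=25, seed=0):
    """Train weak classifiers on bootstrap samples and random single features."""
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    forest = []
    for _ in range(n_trees):
        boot = [rng.choice(samples) for _ in samples]  # sample with replacement
        forest.append(train_stump(boot, rng.randrange(n_features)))
    return forest

def predict(forest, x):
    """Majority vote over all stumps."""
    votes = sum(tree(x) for tree in forest)
    return 2 * votes > len(forest)

# Hypothetical (structure_score, arm_conservation) pairs; True marks a real miRNA.
TRAIN = [((0.9, 0.8), True), ((0.8, 0.9), True), ((0.7, 0.95), True), ((0.85, 0.7), True),
         ((0.2, 0.1), False), ((0.3, 0.2), False), ((0.1, 0.3), False), ((0.4, 0.15), False)]
forest = train_forest(TRAIN)
print(predict(forest, (0.95, 0.97)), predict(forest, (0.05, 0.05)))
```

A real random forest grows deeper trees and samples features at every split; the stump version only illustrates how many weak votes combine into one call.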
Validating Discovered MiRNAs
Discovered miRNAs can be validated by comparing them to known miRNAs. An example given in class shows that 81% of discovered miRNAs were already known to exist, which shows that these methods perform well. The remaining putative miRNAs have yet to be tested; however, this can be difficult because testing is done by cloning.
Region specificity is another method for validating miRNAs. In the background, hairpins are fairly evenly distributed between introns, exons, intergenic regions, and repeats and transposons. Increasing confidence in predictions causes almost all miRNAs to fall in introns and intergenic regions, as expected. These predictions also match sequencing reads.
This analysis also revealed some genomic properties typical of miRNAs. They have a preference for the transcribed strand, which allows them to piggyback in an intron of a real gene and thus not require separate transcription. They also cluster with known and predicted miRNAs, which indicates that they are in the same family and have a common origin.
MiRNA’s 5’ End Identification
The first seven bases determine where a miRNA binds, so it is important to know exactly where cleavage occurs. If the cleavage point is wrong by even two bases, the miRNA will be predicted to bind to a completely different gene. These cleavage points can be discovered computationally by searching for highly conserved 7-mers that could be targets. These 7-mers also correlate with a lack of anti-targets in ubiquitously expressed genes. Using these features together with structural and conservation features, it is possible to take a machine learning approach (SVMs) to predict the cleavage site. Some miRNAs have no single high-scoring position, and these also show imprecise processing in the cell. If the star sequence scores highly, it also tends to be more highly expressed in the cell.
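A small sketch shows how sensitive the predicted target is to the cleavage point. It follows the notes in taking the first seven bases as the binding determinant (many tools use positions 2-8 instead), uses the let-7a sequence only as an illustrative input, and the helper name is hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(rna, start, length=7):
    """Reverse complement of the `length` bases beginning at `start`:
    the 7-mer a perfectly matching target would carry."""
    seed = rna[start:start + length]
    return "".join(COMPLEMENT[base] for base in reversed(seed))

LET7A = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative mature miRNA sequence

correct = seed_site(LET7A, 0)  # cleavage at the true 5' end
shifted = seed_site(LET7A, 2)  # cleavage mispredicted by two bases
print(correct, shifted)
```

The two 7-mers share no register, so a two-base error in the 5′ end sends the target search after an unrelated set of sites.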
Functional Motifs in Coding Regions
Each motif type has a distinct signature. DNA motifs are strand-symmetric, RNA motifs are strand-specific and frame-invariant, and protein motifs are strand-specific and frame-biased. This frame invariance can be used as a signature: each frame can be evaluated separately. Motifs due to di-codon usage biases are conserved in only one frame offset, while motifs due to RNA-level regulation are conserved in all three frame offsets. This makes it possible to distinguish overlapping pressures.
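The per-frame evaluation can be sketched as follows. The input format (motif start position within the coding sequence plus a conserved/not-conserved flag), the 0.5 threshold and the labels are all assumptions made for illustration.

```python
from collections import defaultdict

def conservation_by_frame(instances):
    """Fraction of conserved motif instances at each frame offset (position mod 3)."""
    total = defaultdict(int)
    conserved = defaultdict(int)
    for position, is_conserved in instances:
        frame = position % 3
        total[frame] += 1
        conserved[frame] += is_conserved
    return {frame: conserved[frame] / total[frame] for frame in sorted(total)}

def classify(rates, threshold=0.5):
    """Label a motif by how many frame offsets show conservation (hypothetical cutoff)."""
    high = [f for f in range(3) if rates.get(f, 0.0) >= threshold]
    if len(high) == 3:
        return "frame-invariant"
    if len(high) == 1:
        return "frame-biased"
    return "ambiguous"

# Made-up instances: (start position in the CDS, conserved?)
rna_like = [(0, True), (1, True), (2, True), (3, True), (4, False), (5, True)]
codon_like = [(0, True), (3, True), (1, False), (4, False), (2, False), (5, False)]
print(classify(conservation_by_frame(rna_like)), classify(conservation_by_frame(codon_like)))
```

A frame-invariant result points to RNA-level pressure, while conservation confined to one offset is what a codon-level bias would produce.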
MicroRNA Targets in Drosophila
The recent discoveries of microRNAs (miRNAs) and characterization of the first few targets of their gene products in Caenorhabditis elegans and Drosophila melanogaster have set the stage for elucidation of a novel network of regulatory control. Here, we present a novel three-step method for whole-genome prediction of miRNA target genes, validated using known examples. We apply the method to discover hundreds of potential target genes in D. melanogaster. For each miRNA, target genes are selected based on (a) pattern of sequence complementarity using a position-weighted local alignment algorithm, (b) energy calculation of RNA-RNA duplex formation, and (c) conservation of target sites in related genomes. Application to the D. melanogaster, D. pseudoobscura and Anopheles gambiae genomes in this manner identifies several hundred target genes potentially regulated by one or more known miRNAs.
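The three selection criteria can be sketched as a scoring step followed by a filter. This is not the published algorithm: the scoring below is ungapped rather than a local alignment, the weights, the treatment of G-U wobble pairs as matches and all cutoffs are hypothetical, and the duplex energy is taken as a precomputed input rather than calculated.

```python
# Watson-Crick pairs plus G-U wobble (counting wobble as a match is an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def weighted_match_score(mirna, site, five_prime_weight=2.0, n_weighted=8):
    """Score pairing position by position, weighting the miRNA 5' end more heavily.
    `site` is written 3'->5' so that site[i] faces mirna[i]."""
    score = 0.0
    for i, (m, s) in enumerate(zip(mirna, site)):
        weight = five_prime_weight if i < n_weighted else 1.0
        score += weight if (m, s) in PAIRS else -weight
    return score

def passes_filters(candidate, min_score, max_energy, min_species):
    """Apply the three criteria: complementarity, duplex energy, conservation."""
    return (candidate["score"] >= min_score
            and candidate["energy"] <= max_energy
            and candidate["conserved_in"] >= min_species)

perfect = weighted_match_score("UGAGGUAG", "ACUCCAUC")
one_mismatch = weighted_match_score("UGAGGUAG", "CCUCCAUC")
candidate = {"score": perfect, "energy": -20.0, "conserved_in": 2}  # made-up values
print(perfect, one_mismatch, passes_filters(candidate, 12.0, -15.0, 2))
```

Ranking by score and then requiring the site in a related genome mirrors the order of steps (a) to (c) in the abstract.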
These potential targets are enriched for genes that are expressed at specific developmental stages and are involved in cell fate specification, morphogenesis and the coordination of developmental processes, as well as the function of the nervous system in the mature organism. High-ranking targets are two-fold enriched in transcription factors and include genes already known to be under translational regulation. Our results reaffirm the thesis that miRNAs play an important role in establishing the complex spatial and temporal patterns of gene activity necessary for the orderly progression of development and point to additional roles in the function of the mature organism.
The emerging combinatorics of miRNA target sites in the 3' UTRs of messenger RNAs are reminiscent of transcriptional regulation in promoter regions of DNA, with both one-to-many and many-to-one relationships between regulator and regulated target. Typically, more than one miRNA regulates one message, indicative of cooperative control of translation. Conversely, one miRNA may have several targets, reflecting target multiplicity.
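The one-to-many and many-to-one relationships are two views of the same table. A minimal sketch, with placeholder miRNA and gene names, builds the "miRNAs per gene" view from the "targets per miRNA" view:

```python
from collections import defaultdict

def regulators_by_gene(targets_by_mirna):
    """Invert a miRNA -> target genes mapping into gene -> regulating miRNAs."""
    regulators = defaultdict(set)
    for mirna, genes in targets_by_mirna.items():
        for gene in genes:
            regulators[gene].add(mirna)
    return dict(regulators)

# Placeholder names, not real predictions.
targets_by_mirna = {
    "miR-x": {"geneA", "geneB"},
    "miR-y": {"geneB", "geneC"},
}
by_gene = regulators_by_gene(targets_by_mirna)
print(sorted(by_gene["geneB"]))
```

Here geneB is regulated by two miRNAs (many-to-one) while each miRNA has two targets (one-to-many).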
As a guide to targeted experiments, we provide detailed online information about target genes and binding sites for each miRNA and about miRNAs for each gene, ranked by likelihood of match. The target prediction tool can be applied to any similar pair of genomes with identified miRNA sequences.
MDMA Abuse in Relation to MicroRNA Variation in Human Brain Ventral Tegmental Area and Nucleus Accumbens
3,4-methylenedioxymethamphetamine (MDMA) is one of the most widespread illegal drugs, used particularly by young people in the 15-34 age group. MicroRNAs (miRNAs) are endogenously synthesized, small, non-coding RNAs that post-transcriptionally regulate the expression of their target genes by inhibiting protein translation or promoting mRNA degradation. miRNAs are increasingly implicated in drug-related gene expression and function. Notably, there are no reports of miRNA variation in the human brain in MDMA abuse. Here we present a miRNA profiling study, the first such study to the best of our knowledge, of the post-mortem human brains of a sample of people with MDMA abuse, along with non-drug-dependent controls. The miRNA profiling of the nucleus accumbens (NAc) and ventral tegmental area (VTA) was performed by microarray analysis. Subsequently, two candidate miRNA biomarkers (miR-1202 and miR-7975) were selected according to significant regional differential expression and examined using quantitative reverse-transcription PCR (qRT-PCR). The expression level of miR-7975 was significantly lower in the VTA regions of the 30 MDMA users than in the 30 control samples. miR-1202 was also significantly deregulated: it was down-regulated in the NAc regions of the 30 MDMA samples in comparison to the control samples. Alterations of these miRNAs can potentially serve as novel biomarkers for MDMA abuse and warrant further research in independent and larger samples of patients.
Keywords: 3,4-methylenedioxymethamphetamine; MicroRNA; Microarray research; Nucleus Accumbens; Ventral Tegmental Area.
Heat-map illustration of deregulated miRNAs in VTA with MDMA samples vs VTA control…
Relative expression levels of miR-1202 in 30 NAc with MDMA samples compared to…
Part 2: Tailing in the Regulation of microRNA and Beyond
00:00:15.02 Hi. I'm Narry Kim from the RNA Research Center
00:00:18.11 at IBS.
00:00:20.16 I also work at Seoul National University.
00:00:24.18 In the last part of my presentation,
00:00:26.23 I talked to you about
00:00:29.27 how microRNAs are generated and regulated.
00:00:32.16 In this talk, I will talk to you about
00:00:36.26 a type of RNA modification called tailing,
00:00:40.15 which controls microRNA biogenesis
00:00:43.24 and other types of RNAs, including messenger RNAs.
00:00:49.26 So, more than 100 types of RNA modifications
00:00:54.20 have been described so far.
00:00:56.11 RNA modification can be on the base or the ribose residues.
00:01:03.10 Additionally, the nucleotides
00:01:06.19 can be added to or removed from RNA.
00:01:09.14 My lab has been particularly interested
00:01:13.04 in the 3' part of the RNA, which we call tails.
00:01:17.29 In this presentation,
00:01:20.18 I'm going to show you some of the data from the lab
00:01:27.07 that demonstrates that, sometimes,
00:01:29.23 a tail can actually change the fate of the RNA.
00:01:33.12 So, this line of the study
00:01:38.20 began when we were studying the microRNA pathway.
00:01:42.00 As I explained to you in the first part of the talk,
00:01:46.06 miRNA biogenesis involves
00:01:49.02 Pol II, Drosha, Exportin 5, Dicer, and Argonaute.
00:01:53.28 And mature miRNAs are thought...
00:01:58.02 were thought to be a single species.
00:02:03.07 But, in fact, if you look at the deep sequencing data...
00:02:07.25 here, on the left,
00:02:11.01 the numbers indicate the relative abundance of the miRNA species...
00:02:16.01 apart from the most abundant reference sequence,
00:02:19.19 you can also find additional isoforms.
00:02:23.07 Most notably, some isoforms have
00:02:28.29 non-templated A sequences or U sequences
00:02:33.07 at the end of the miRNA.
00:02:35.29 In animals, As and Us
00:02:39.27 are the most common untemplated nucleotides
00:02:44.04 at the 3' end of the RNA.
00:02:46.14 In plants, U is the most prominent one,
00:02:50.10 as originally described by Xuemei Chen's lab.
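A minimal sketch of how such isoforms could be tallied from sequencing reads. It assumes, for simplicity, that any bases beyond the reference mature sequence are untemplated (a real analysis would check them against the genome); the function name and the reads are made up.

```python
from collections import Counter

def classify_tail(read, reference):
    """Label the extra 3' bases of a read that begins with the reference sequence."""
    if not read.startswith(reference):
        return None  # not an isoform of this miRNA under this simple rule
    tail = read[len(reference):]
    if not tail:
        return "untailed"
    if set(tail) == {"A"}:
        return "A-tail"
    if set(tail) == {"U"}:
        return "U-tail"
    return "other"

REFERENCE = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative mature sequence
reads = [REFERENCE, REFERENCE + "A", REFERENCE + "UU", REFERENCE + "AU", REFERENCE]
counts = Counter(classify_tail(read, REFERENCE) for read in reads)
print(counts)
```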
00:02:55.06 So, where are these untemplated nucleotides coming from?
00:03:00.23 It turned out that uridylation takes place,
00:03:04.26 mostly, prior to Dicer processing on pre-miRNA,
00:03:09.15 whereas adenylation occurs
00:03:14.20 after Dicer processing, on 22 nucleotide RNA.
00:03:20.14 So, these reactions are carried out
00:03:25.07 by a group of noncanonical poly(A) polymerases,
00:03:28.23 also known as terminal uridylyl transferases.
00:03:33.02 Out of the 7 noncanonical poly(A) polymerases, or TUTases,
00:03:39.09 TUT4 and TUT7,
00:03:43.25 which have similar domain organizations,
00:03:47.21 have redundant functions in the uridylation of pre-let-7.
00:03:55.05 What about adenylation?
00:03:57.04 It was shown that TUTase2, or GLD2,
00:04:01.22 or Wispy in flies,
00:04:04.19 are responsible for adenylation.
00:04:07.21 But before I begin to explain the regulation
00:04:11.05 by tailing,
00:04:13.13 I want to make clear that the tailing frequencies
00:04:16.08 are generally quite low in most cell types
00:04:18.27 for most miRNAs.
00:04:21.24 So, I don't want you to have an impression
00:04:25.08 that tailing is always important for all miRNAs.
00:04:30.22 However, the tailing frequency
00:04:34.27 varies depending on miRNA species and cell types.
00:04:38.13 And tailing can provide a molecular basis
00:04:43.03 for the regulation of some miRNAs
00:04:46.02 in specific developmental stages.
00:04:49.18 So, to give you a couple examples
00:04:53.24 of tailing-mediated regulation...
00:04:56.15 the first one is uridylation of pre-let-7.
00:05:00.28 There are actually at least two modes of uridylation of pre-let-7.
00:05:05.12 The first one is monouridylation
00:05:09.21 -- this is mediated by TUT7 and TUT4,
00:05:13.25 and monouridylation actually promotes Dicer processing,
00:05:18.28 so, in this context TUTases
00:05:24.09 promote let-7 biogenesis,
00:05:27.12 serving as a biogenesis factor.
00:05:29.14 But then, this is the case of differentiated somatic cells,
00:05:34.03 but in embryonic cells and cancer,
00:05:36.28 an RNA binding protein called Lin28 is expressed,
00:05:44.21 and it binds to pre-let-7
00:05:47.08 and interacts with TUTases
00:05:50.18 to induce oligouridylation.
00:05:53.09 So, now, the long U tail
00:05:56.15 is inhibitory to Dicer processing,
00:05:59.09 and further promotes decay of the RNA.
00:06:03.19 So, under this situation,
00:06:09.07 TUTases, the same enzymes,
00:06:11.11 can serve as negative regulators.
00:06:13.21 So, there is a functional duality in uridylation,
00:06:17.22 depending on the length of the uridylated tails
00:06:22.05 and the context of RNA and cell types.
00:06:28.29 By doing this, Lin28 and TUTases
00:06:35.07 can provide a molecular switch in the developmental
00:06:37.22 and pathological transitions,
00:06:40.05 such as in cancer.
00:06:41.28 The second example is
00:06:46.21 adenylation of mature miRNAs.
00:06:48.27 In this Northern blotting from Drosophila eggs and embryos,
00:06:55.03 you can find some heterogeneous miRNA populations,
00:06:59.26 which indicates adenylation of mature miRNAs.
00:07:04.00 These adenylated isoforms
00:07:07.15 disappear when the embryo moves to
00:07:11.07 its later developmental stages.
00:07:14.00 Later, it turned out that the same pattern is observed in other animal species,
00:07:23.03 such as in sea urchin and mouse,
00:07:27.14 suggesting that this is a conserved mechanism.
00:07:31.05 We later found that an enzyme called Wispy,
00:07:35.11 a noncanonical poly(A) polymerase,
00:07:38.22 is specifically expressed in eggs
00:07:41.06 and induces miRNA adenylation.
00:07:44.20 This is a mutant of the Wispy gene
00:07:47.06 and, here, miRNAs accumulate as an unmodified form.
00:07:55.18 So, we found that miRNAs
00:07:58.14 are dynamically controlled during late oogenesis
00:08:02.01 and early embryogenesis.
00:08:04.23 Maternal miRNAs are deposited
00:08:07.18 and then they are degraded rapidly during development,
00:08:12.04 while the zygotic miRNAs
00:08:16.03 are induced from the zygotic genome
00:08:18.11 to replace the population.
00:08:21.11 The mechanism underlying this regulation
00:08:24.12 is regulated by Wispy,
00:08:27.20 which adenylates miRNA to induce rapid decay.
00:08:34.08 So, Wispy-mediated adenylation
00:08:37.07 may contribute to the clearance of maternal miRNAs
00:08:41.06 during this interesting, dynamic developmental transition period.
00:08:48.05 So, I have explained to you so far,
00:08:50.24 in animal systems,
00:08:53.07 there is uridylation as well as adenylation
00:08:56.11 that control the fate of a certain group of miRNAs
00:09:00.15 in specific cell types
00:09:02.25 and developmental conditions.
00:09:05.08 So, moving on to other pathways...
00:09:15.13 mRNAs are also known to have tails, canonical poly(A) tails,
00:09:19.01 but we were curious if there is
00:09:27.09 any other types of noncanonical tails on mRNAs.
00:09:29.27 We were also curious if we could investigate
00:09:34.11 the function of these tails
00:09:37.24 by measuring poly(A) tail length at the genomic scale
00:09:40.28 and at high resolution.
00:09:43.16 The other methods, such as Northern blotting and microarray,
00:09:47.29 have been developed,
00:09:50.10 but the resolution and the scale was not good enough
00:09:53.29 to look at the transcriptome effectively.
00:10:00.01 So, we developed a technique called TAIL-seq,
00:10:05.14 which is to sequence the 3' terminome.
00:10:08.03 Just to briefly explain to you the protocol,
00:10:11.27 total RNAs were enriched with mRNA
00:10:17.00 by size fractionation and ribosomal RNA depletion.
00:10:22.16 This was ligated to the 3' adapter
00:10:25.09 that contains biotin residues
00:10:28.26 so that, after partial digestion,
00:10:31.06 we can pull down the 3'-most fragment
00:10:34.21 by using streptavidin beads.
00:10:38.23 The fragment was then ligated
00:10:42.12 to a 5' adapter and further amplified by RT-PCR,
00:10:49.03 and we carried out paired-end sequencing
00:10:52.05 to get 51 nucleotides from Read 1
00:10:58.23 and 251 nucleotides from Read 2.
00:11:02.00 Read 1 is used to map the RNA,
00:11:06.18 to get the identity of the transcript,
00:11:09.29 and Read 2 is used to determine
00:11:13.25 the tail sequences.
00:11:16.13 Using this method, we could precisely determine
00:11:22.21 the very, very end sequences of the RNAs,
00:11:25.19 but one challenge that we had was,
00:11:29.01 because of the homopolymeric nature of poly(A) tails,
00:11:32.23 if you go deep into the tail,
00:11:36.07 the sequencing quality becomes really bad,
00:11:40.25 so it was very difficult to get the sequences right.
00:11:44.23 To overcome the problem,
00:11:47.00 we collected the fluorescent signals
00:11:50.01 directly from the sequencing reaction
00:11:52.27 and looked at the images from the sequencer,
00:11:56.17 noticed that there's a transition
00:12:00.07 between poly(A) sequences
00:12:03.20 and non-poly(A) sequences.
00:12:05.25 And we could convert the information
00:12:08.22 to calculate relative T signal,
00:12:11.18 and it gave us the ability
00:12:17.06 to determine the state of the sequence
00:12:21.03 to code for the poly(A) tail length.
00:12:25.04 So, we were able to measure, accurately,
00:12:29.17 the poly(A) tail length of mRNAs.
00:12:33.24 So, this was quite useful,
00:12:35.23 and let me show you some examples of TAIL-seq reads.
00:12:39.17 These are the random reads that match to the p53 mRNA.
00:12:45.07 As I said, this is a paired-end sequencing,
00:12:49.08 so from Read 1, which is shown in blue,
00:12:52.25 are used to get the identity of the transcript,
00:12:58.19 and then Read 2 is used to
00:13:02.29 learn about the sequences of the tail.
00:13:06.12 From this dataset,
00:13:09.01 we could determine precisely
00:13:13.08 the poly(A) tail length of the mRNA.
00:13:15.19 In addition, we could of course look at
00:13:20.14 the 3' end of the RNA
00:13:23.16 to find some interesting terminal modifications,
00:13:26.00 such as uridylation and even guanylation.
00:13:31.07 So, TAIL-seq is very useful
00:13:34.23 in studying the regulation of poly(A) tails.
00:13:38.09 Shown here is an example
00:13:42.15 -- it's the TAIL-seq data from total cell lysate
00:13:45.19 and PABP immunoprecipitate.
00:13:48.15 PABPC1 is a poly(A) binding protein.
00:13:52.03 On the x axis you have poly(A) tail length
00:13:56.06 and on the y axis you can see the fraction of reads.
00:14:00.20 This is a duplicated experiment
00:14:03.18 and you can see that input -- or total cell lysate --
00:14:08.26 gives a very reproducible result.
00:14:12.19 And from this distribution
00:14:18.05 you can learn that the median poly(A) length
00:14:22.26 is typically between 60-100 nucleotides
00:14:26.10 in mammalian messenger RNAs,
00:14:28.27 which is shorter than previously anticipated.
00:14:33.23 You can also learn that PABP, as you would expect,
00:14:42.26 binds preferentially to long poly(A) tails.
00:14:46.05 PABP is known to span 25 nucleotides,
00:14:50.23 so, on average, a poly(A) tail
00:14:54.11 can accommodate only 2-4 PABP molecules.
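The 2-4 figure follows directly from the numbers just quoted. A short check, assuming PABP molecules sit side by side with no gaps (the helper name is hypothetical):

```python
PABP_FOOTPRINT_NT = 25  # span of one PABP, as stated in the talk

def max_pabp(tail_length_nt, footprint_nt=PABP_FOOTPRINT_NT):
    """Whole PABP molecules that fit on a poly(A) tail of the given length."""
    return tail_length_nt // footprint_nt

# Median tail lengths of 60 and 100 nucleotides, the range given above.
print(max_pabp(60), max_pabp(100))
```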
00:14:58.06 You can also use this technique
00:15:05.04 to study the 3' end modification of mRNA.
00:15:08.16 For instance, you can study uridylation of mRNA.
00:15:12.19 We were actually quite surprised to see
00:15:16.04 such widespread distribution of uridylation
00:15:20.02 on mammalian mRNAs.
00:15:21.27 This suggests that,
00:15:25.14 perhaps, uridylation is an integral part of the mRNA life cycle,
00:15:30.21 and even though mRNA uridylation
00:15:33.28 was observed in some individual mRNAs
00:15:36.16 from fungi and plants before,
00:15:41.04 this study collectively indicates that
00:15:47.10 this process may be preserved in eukaryotes.
00:15:51.17 To find out the function of uridylation,
00:15:56.05 we searched for the enzymes that are responsible for uridylation
00:16:01.01 and found that TUTases 4 and 7 mediate mRNA uridylation.
00:16:07.25 If you knock down TUTase 4 and 7 in HeLa cells
00:16:14.27 and carry out a TAIL-seq experiment,
00:16:19.14 uridylation frequency is reduced substantially,
00:16:22.29 particularly the oligo-U tails.
00:16:27.12 So, what's the functional consequence of these enzymes?
00:16:34.06 To find out, we knocked down TUTase 4 and 7
00:16:38.26 and treated the cells with actinomycin D
00:16:44.01 and carried out an RNA-seq experiment
00:16:48.03 to measure the stability of mRNAs.
00:16:49.27 And we found that,
00:16:53.13 compared to control,
00:16:56.04 in TUT4 and 7 depleted cells,
00:16:59.09 the half-lives of mRNAs were globally increased,
00:17:02.17 indicating that TUTase 4 and 7 facilitate mRNA decay.
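The talk does not say how the half-lives were computed. A common approach, assumed here, is to treat decay after the actinomycin D transcription block as first-order, so that abundance falls as N(t) = N0 * exp(-k * t) and the half-life is ln(2) / k. The abundances below are made up.

```python
import math

def half_life(abundance_t0, abundance_t, elapsed_hours):
    """Half-life from two abundance measurements, assuming first-order decay."""
    decay_rate = math.log(abundance_t0 / abundance_t) / elapsed_hours
    return math.log(2) / decay_rate

# Hypothetical mRNA: 100 units at the transcription block, measured again 4 h later.
control = half_life(100.0, 25.0, 4.0)    # falls to a quarter in control cells
knockdown = half_life(100.0, 50.0, 4.0)  # falls only to half when TUT4/7 are depleted
print(control, knockdown)
```

A longer half-life in the depleted cells is the pattern described in the talk: removing the enzymes slows decay.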
00:17:13.04 So, our study told us that
00:17:16.21 oligo-U tails serve as a decay mark for mRNAs,
00:17:21.13 as well as for pre-miRNAs.
00:17:24.08 To explain to you how messenger RNAs
00:17:27.28 are degraded in mammalian cells,
00:17:31.19 the active mRNA has a long poly(A) tail
00:17:36.04 that is bound to poly(A) binding protein,
00:17:39.12 but after deadenylation
00:17:43.20 the poly(A) tail gets shortened.
00:17:46.18 When it is below about 25 nucleotides,
00:17:49.20 PABP cannot bind to this mRNA any longer,
00:17:54.14 and this leads to the recruitment of decay factors.
00:18:03.00 It is known that this can happen in 5'-to-3' orientation
00:18:07.07 and 3'-to-5' orientation.
00:18:11.03 Our study shows that this can be facilitated
00:18:16.07 when TUTase 4 and 7
00:18:20.00 recognizes that deadenylated mRNA
00:18:23.06 and uridylates the tail,
00:18:25.15 which recruits the decay factors more rapidly.
00:18:33.12 So, that was the role of uridylation
00:18:38.14 in the context of the mRNA pathway.
00:18:42.10 So, to wrap up this talk,
00:18:44.27 we have found that
00:18:49.20 tailing can provide important layers of gene regulation.
00:18:54.01 And tails can actually come in more than one flavor
00:18:59.14 -- not only canonical poly(A) tails,
00:19:02.13 but also noncanonical A tails, or U tails, or G tails.
00:19:09.18 And we have shown that noncanonical A tails
00:19:15.07 can destabilize maternal microRNAs.
00:19:18.10 And U tails can be used in various ways.
00:19:23.08 Oligo-U tail, in particular,
00:19:25.25 serves as a general decay mark
00:19:28.29 for both microRNA pathways and mRNA pathways.
00:19:33.23 With that, I would like to thank
00:19:36.29 all previous and present members of my lab,
00:19:40.22 but I would like to especially point out
00:19:45.25 Inha Heo, Chirlmin Joo, and Minju Ha
00:19:48.00 for the microRNA uridylation study
00:19:50.25 and Mihye Lee and Yeon Choi
00:19:53.24 for the microRNA adenylation study.
00:19:56.17 And mRNA tailing was studied
00:20:00.15 mainly by Hyeshik Chang, Jaechul Lim, and Minju Ha.
00:20:03.21 I also appreciate the help from our collaborators,
00:20:07.02 especially Dinshaw Patel's lab
00:20:11.04 and the funding from Institute for Basic Science.
00:20:16.01 Thank you very much for your attention.
Biogenesis of miRNAs
miRNA biogenesis starts with the processing of RNA polymerase II/III transcripts post- or co-transcriptionally (14). About half of all currently identified miRNAs are intragenic and processed mostly from introns and relatively few exons of protein-coding genes, while the remainder are intergenic, transcribed independently of a host gene and regulated by their own promoters (13, 24). Sometimes miRNAs are transcribed as one long transcript, called a cluster, whose members may have similar seed regions, in which case they are considered a family (25). The biogenesis of miRNA is classified into canonical and non-canonical pathways (Figure 1).
Figure 1. MicroRNA biogenesis and mechanism of action. Canonical miRNA biogenesis begins with the generation of the pri-miRNA transcript. The microprocessor complex, comprised of Drosha and DiGeorge Syndrome Critical Region 8 (DGCR8), cleaves the pri-miRNA to produce the precursor-miRNA (pre-miRNA). The pre-miRNA is exported to the cytoplasm in an Exportin5/RanGTP-dependent manner and processed to produce the mature miRNA duplex. Finally, either the 5p or the 3p strand of the mature miRNA duplex is loaded into the Argonaute (AGO) family of proteins to form a miRNA-induced silencing complex (miRISC). In the non-canonical pathways, small hairpin RNAs (shRNAs) are initially cleaved by the microprocessor complex and exported to the cytoplasm via Exportin5/RanGTP. They are further processed via AGO2-dependent, but Dicer-independent, cleavage. Mirtrons and 7-methylguanine capped (m7G)-pre-miRNA are dependent on Dicer to complete their cytoplasmic maturation, but they differ in their nucleocytoplasmic shuttling. Mirtrons are exported via Exportin5/RanGTP while m7G-pre-miRNA are exported via Exportin1. All pathways ultimately lead to a functional miRISC complex. In most cases, miRISC binds to target mRNAs to induce translational inhibition, most likely by interfering with the eIF4F complex. Next, GW182 family proteins bound to Argonaute recruit the poly(A)-deadenylases PAN2/3 and CCR4-NOT. PAN2/3 initiates deadenylation while the CCR4-NOT complex completes the process, leading to removal of the m7G cap on target mRNA by the decapping complex. Decapped mRNA may then undergo 5′-3′ degradation via the exoribonuclease XRN1. Modified from Hayder et al. (26).
The Canonical Pathway of miRNA Biogenesis
The canonical biogenesis pathway is the dominant pathway by which miRNAs are processed. In this pathway, pri-miRNAs are transcribed from their genes and then processed into pre-miRNAs by the microprocessor complex, consisting of an RNA binding protein DiGeorge Syndrome Critical Region 8 (DGCR8) and a ribonuclease III enzyme, Drosha (27). DGCR8 recognizes an N6-methyladenylated GGAC and other motifs within the pri-miRNA (28), while Drosha cleaves the pri-miRNA duplex at the base of the characteristic hairpin structure of pri-miRNA. This results in the formation of a 2 nt 3′ overhang on pre-miRNA (29). Once pre-miRNAs are generated, they are exported to the cytoplasm by an exportin 5 (XPO5)/RanGTP complex and then processed by the RNase III endonuclease Dicer (27, 30). This processing involves the removal of the terminal loop, resulting in a mature miRNA duplex (31). The directionality of the miRNA strand determines the name of the mature miRNA form. The 5p strand arises from the 5′ end of the pre-miRNA hairpin while the 3p strand originates from the 3′ end. Both strands derived from the mature miRNA duplex can be loaded into the Argonaute (AGO) family of proteins (AGO1-4 in humans) in an ATP-dependent manner (32). For any given miRNA, the proportion of AGO-loaded 5p or 3p strand varies greatly depending on the cell type or cellular environment, ranging from near equal proportions to predominantly one or the other (33). The selection of the 5p or 3p strand is based in part on the thermodynamic stability at the 5′ ends of the miRNA duplex or a 5′ U at nucleotide position 1 (34). Generally, the strand with lower 5′ stability or 5′ uracil is preferentially loaded into AGO, and is deemed the guide strand. The unloaded strand is called the passenger strand, which will be unwound from the guide strand through various mechanisms based on the degree of complementarity. 
The passenger strands of miRNA that contain no mismatches are cleaved by AGO2 and degraded by cellular machinery, which can produce a strong strand bias. Otherwise, miRNA duplexes with central mismatches or non-AGO2 loaded miRNA are passively unwound and degraded (14).
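The strand-selection rule of thumb described above (lower 5′ stability or a 5′ uracil favors loading) can be written as a toy heuristic. The text says selection is based only in part on these features, so this is an illustration rather than a predictor; the equal weighting of the two cues, the stability units and the function name are assumptions.

```python
def pick_guide(strand_5p, strand_3p, stability_5p_end, stability_3p_end):
    """Return "5p", "3p", or None (no clear bias).
    Stability values are hypothetical scores of each strand's 5' end; lower means less stable."""
    def cues(sequence, own_stability, other_stability):
        has_5p_uracil = sequence[0] == "U"
        less_stable_end = own_stability < other_stability
        return has_5p_uracil + less_stable_end

    score_5p = cues(strand_5p, stability_5p_end, stability_3p_end)
    score_3p = cues(strand_3p, stability_3p_end, stability_5p_end)
    if score_5p == score_3p:
        return None
    return "5p" if score_5p > score_3p else "3p"

# Made-up duplex: the 5p strand starts with U and has the less stable 5' end.
print(pick_guide("UGAGGUAGUAGG", "CUAUACAAUCUA", 1.0, 3.0))
```

Returning None for ties reflects the observation that some duplexes load both strands in near-equal proportions.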
Non-canonical miRNA Biogenesis Pathways
To date, multiple non-canonical miRNA biogenesis pathways have been elucidated (Figure 1). These pathways make use of different combinations of the proteins involved in the canonical pathway, mainly Drosha, Dicer, exportin 5, and AGO2. In general, the non-canonical miRNA biogenesis can be grouped into Drosha/DGCR8-independent and Dicer-independent pathways. Pre-miRNAs produced by the Drosha/DGCR8-independent pathway resemble Dicer substrates. An example of such pre-miRNAs is mirtrons, which are produced from the introns of mRNA during splicing (35, 36). Another example is the 7-methylguanosine (m7G)-capped pre-miRNA. These nascent RNAs are directly exported to the cytoplasm through exportin 1 without the need for Drosha cleavage. There is a strong 3p strand bias most likely due to the m7G cap preventing 5p strand loading into Argonaute (37). On the other hand, Dicer-independent miRNAs are processed by Drosha from endogenous short hairpin RNA (shRNA) transcripts (38). These pre-miRNAs require AGO2 to complete their maturation within the cytoplasm because they are of insufficient length to be Dicer-substrates (38). This in turn promotes loading of the entire pre-miRNA into AGO2 and AGO2-dependent slicing of the 3p strand. The 3′-5′ trimming of the 5p strand completes their maturation (39).
18.4: MicroRNA Genes and Targets - Biology
miRDB is an online database for miRNA target prediction and functional annotations. All the targets in miRDB were predicted by a bioinformatics tool, MirTarget, which was developed by analyzing thousands of miRNA-target interactions from high-throughput sequencing experiments. Common features associated with miRNA binding and target downregulation have been identified and used to predict miRNA targets with machine learning methods. miRDB hosts predicted miRNA targets in five species: human, mouse, rat, dog and chicken. Users may also provide their own sequences for custom target prediction using the updated prediction algorithm. In addition, through combined computational analyses and literature mining, functionally active miRNAs in humans and mice were identified. These miRNAs, as well as associated functional annotations, are presented in the FuncMir Collection in miRDB. As a recent update, miRDB presents the expression profiles of hundreds of cell lines and the user may limit their search for miRNA targets that are expressed in a cell line of interest. To facilitate the prediction of miRNA functions, miRDB presents a new web interface for integrative analysis of target prediction and Gene Ontology data.
The study of a class of small non-coding RNA molecules, named microRNAs (miRNAs), has advanced our understanding of many of the fundamental processes of cancer biology and the molecular mechanisms underlying tumor initiation and progression. MiRNA research has become more and more attractive as evidence is emerging that miRNAs likely play important regulatory roles virtually in all essential bioprocesses. Looking at this field over the past decade it becomes evident that our understanding of miRNAs remains rather incomplete. As research continues to reveal the mechanisms underlying cancer therapy efficacy, it is clear that miRNAs contribute to responses to drug therapy and are themselves modified by drug therapy. One important area for miRNA research is to understand the functions of miRNAs and the relevant signaling pathways in the initiation, progression and drug-resistance of tumors to be able to design novel, effective targeted therapeutics that directly target pathologically essential miRNAs and/or their target genes. Another area of increasing importance is the use of miRNA signatures in the diagnosis and prognosis of various types of cancers. As the study of non-coding RNAs is increasingly more popular and important, it is without doubt that the next several years of miRNA research will provide more fascinating results.
Laboratory Methods in Epigenetics
Yu Liu, … Qianjin Lu, in Epigenetics and Dermatology, 2015
Gain-of-Function and Loss-of-Function Experiments
Specific miRNA function can be explored by up- and downregulating specific miRNA levels. Gain-of-function experiments are performed by transfecting a plasmid containing a constitutive promoter (e.g., cytomegalovirus (CMV)) to overexpress a pri-miRNA or a pre-miRNA sequence. Viral vectors can also be used, or the pre-miRNA itself can be transfected. Commercial suppliers usually offer the pre-miRNA precursor molecule, a miRNA mimic that is chemically synthesized as a modified double-stranded oligonucleotide. Conversely, loss-of-function experiments can be performed using synthetic miRNA inhibitors.
RNA is involved in the regulation of multiple cellular processes, often by forming sequence-specific base pairs with cellular RNA or DNA targets that must be identified among the large number of nucleic acids in a cell. Several RNA-based regulatory systems in eukaryotes, bacteria and archaea, including microRNAs (miRNAs), small interfering RNAs (siRNAs), CRISPR RNAs (crRNAs) and small RNAs (sRNAs) that are dependent on the RNA chaperone protein Hfq, achieve specificity using similar strategies. Central to their function is the presentation of short 'seed sequences' within a ribonucleoprotein complex to facilitate the search for and recognition of targets.
Background on miRNA biogenesis and function
miRNAs are non-coding RNAs of ∼20 to 24 nt that regulate post-transcriptional gene expression of targets (reviewed in Yu, Jia, & Chen, 2017). As shown in Figure 9, miRNA biogenesis begins with transcription of a MIR gene by RNA polymerase II to produce a primary miRNA (pri-miRNA), consisting of a stem-loop region flanked by unstructured arms. In sequential steps, dicer-like 1 RNase (DCL1) excises the stem-loop to form the miRNA precursor (pre-miRNA), then generates a duplex comprising the miRNA and its opposing strand, classically termed microRNA star (miRNA*). Once in the cytoplasm, the RNA-induced silencing complex (RISC) is established upon association of the miRNA with Argonaute 1 (AGO1). Plant miRNAs tend to have high sequence complementarity with targets and act primarily through target mRNA cleavage rather than translational inhibition, the latter being more common in animals. Although the high sequence complementarity of miRNA/target pairs observed in plants is conducive to RNA cleavage, cleavage is not always the outcome, as miRNA-mediated translational inhibition has been shown in plants for highly complementary miRNA/target pairs.
Intent of the article
The outcome of this article is a candidate set of biologically relevant miRNA/target pairs, along with validations of differential expression and target cleavage. This information may be sought for the purpose of basic research, i.e., to uncover a novel layer of gene expression regulation for the pathway under study. Indeed, plant miRNAs are implicated in numerous abiotic and biotic stress responses, as well as maintenance of normal growth and development (Noman et al., 2017 ). Alternatively, one may wish to leverage these results in a more applied manner. For example, the use of miRNAs to manipulate transcript abundance is a popular strategy for crop genetic engineering, with several reports demonstrating its successful application in agronomic trait improvement (Djami-Tchatchou, Sanan-Mishra, Ntushelo, & Dubery, 2017 ).
This article is intended for researchers studying non-model plants. These species tend to lack public genomic resources, most notably a reference genome or transcriptome. Ideally, miRNA prediction is performed at the genomic level, with several prediction algorithms operating solely on the genome sequence (Rajendiran, Chatterjee, & Pan, 2018). We provide a strategy that considers the unique challenges associated with large-scale bioinformatic analysis of a non-model plant. Specifically, assembly and annotation of a reference transcriptome is easier, faster, and less expensive than that of a genome. The former also requires less specialized knowledge, due in large part to the availability of integrated, user-friendly tools aimed at biologists with limited bioinformatic experience. It is important to note that the use of genome or transcriptome references from related plant species is not recommended. This is because plant miRNAs tend to be highly species-specific, and unlike in animals, miRNA precursors are generally not conserved across plants (Bartel, 2004). Where plant miRNAs are conserved, the conservation usually occurs at the level of the mature miRNA (Bartel, 2004). Even so, we advise against miRNA prediction based solely on sequence comparison against mature miRNAs of related species, as the identification of a suitable miRNA precursor is an important feature used to reduce false positive predictions. Therefore, the approach of this article is to create and use resources that are specific to the plant under study.
Strengths and limitations of the procedure
Due to the tendency of plant miRNAs to be species-specific, an advantage of our procedure is that miRNAs are predicted using a transcriptomic reference generated for the species under study. Additionally, the integration of both miRNA and target differential expression in the procedure described in this article provides a filter to identify the most biologically relevant interactions for downstream analysis. This approach assumes that miRNAs and targets whose expression levels change in response to the treatment variable are more likely to be important mediators of the response than those pairs whose expression levels do not change significantly. This is a reasonable assumption given that the expression of an miRNA and that of its mRNA target tend to be correlated, an attribute that has been exploited successfully by us and others to discover relevant miRNA/target pairs in plants (Ji et al., 2018; Ma et al., 2018; Neller et al., 2018; Ye, Wang, & Wang, 2016). Therefore, an investigation of differentially expressed miRNAs without considering target expression only reveals half of the story. Our use of paired small RNA and mRNA samples in this procedure enables downstream application of correlation methods to further refine miRNA/target relationships.
Restriction of the analysis to targets that are differentially expressed has a limitation in that it may filter out miRNAs acting through translational inhibition rather than transcript cleavage. However, since this mechanism of action is not observed frequently in plants, the limitation is not critical unless the reader is specifically interested in miRNAs that operate in this manner. Furthermore, the two modes of miRNA action may be difficult to distinguish in plants, as miRNA-induced cleavage occurs on targets undergoing active translation (reviewed in Yu et al., 2017). Another limitation of our procedure is that validation by the modified RLM-RACE used in this article only provides information on the presence of cleaved targets, not their relative abundance, and it may be unsuccessful in the case of low-abundance miRNAs or targets, where less cleavage product is available for detection. A high-throughput equivalent to this procedure is degradome sequencing (Lin, Chen, & Lu, 2019). By incorporating RNA-seq, this method enables detection of all cleaved targets in a sample and their relative abundance. It is less economical and requires extensive data analysis, but the combination of small RNA-seq and degradome-seq is highly informative; see Ji et al. (2018) for a recent implementation of this strategy.
Comparison of the bioinformatic workflow with current methods
The workflow presented in this article is based on our experience with available software options. We have prioritized qualities of open access, user friendliness, in-depth documentation, and smooth integration. Some components of our workflow are more advanced than others. For the tasks of de novo transcriptome assembly and annotation, we highly recommend Trinity and its companion software Trinotate and TransDecoder. Additionally, we suggest the use of scripts packaged with Trinity to facilitate differential expression and GO enrichment analyses. Each of these tasks requires an advanced level of expertise that can overwhelm a novice user, resulting in incorrect application of methods. For this reason, we view the integration and guidance provided by Trinity developers as highly advantageous. However, other options do exist for de novo transcriptome assembly. For a plant-focused summary of these tools and other resources, see Geniza and Jaiswal ( 2017 ). Additionally, refer to Honaas et al. ( 2016 ) for a comparative analysis of transcriptomes generated from different assemblers for the model plants rice and Arabidopsis. It is also possible to bypass de novo transcriptome assembly completely by performing Iso-seq, a relatively new implementation of long-read technology that has been used in plants (An, Cao, Li, Humbeck, & Wang, 2018 ). Regardless of the chosen strategy, note that Trinotate and TransDecoder can be used for annotating any transcriptome as long as the required inputs are provided.
Other aspects of the workflow are more amenable to user customization. For example, there are numerous options available for miRNA prediction and target identification (reviewed in Rajendiran et al., 2018 ). Recently, supervised machine learning was used to predict miRNAs in a reference-free manner (Vitsios et al., 2017 ). Although promising, this approach was more successful in predicting miRNAs for animals than plants. Furthermore, it is unlikely to outperform reference-based prediction for a non-model plant, as the user must provide training data consisting of known miRNAs or instead use the ‘universal plant’ model. Other options for user customization of our workflow are at the level of transcript quantification and differential expression analysis. We used the alignment-based method RSEM for transcript quantification, but alignment-free methods such as Kallisto and Salmon are also popular due to their improved speed. Although alignment-free and alignment-based methods are comparable in accuracy for standard investigations like protein-coding mRNA quantification, note that alignment-based methods perform better when quantifying small or low-abundance RNAs (Wu, Yao, Ho, Lambowitz, & Wilke, 2018 ). The reader may also wish to investigate alternatives for differential expression analysis, with popular options including DESeq2 and Voom/Limma. The Trinity accessory scripts used in our workflow support these various programs/packages for transcript quantification and differential expression analysis, thereby accommodating differences in experimental design and user preference. For a comparison of RNA-seq mapping methods (both alignment-free and alignment-based) and differential expression tools, see Costa-Silva, Domingues, and Lopes ( 2017 ).
Extended bioinformatic analysis
With the paired small RNA and mRNA samples as used in our workflow, the investigator is equipped to perform advanced correlation analysis for obtaining greater insight into miRNA/target interactions. In our study, we computed the Pearson correlation coefficient (PCC) for the expression of each miRNA and its target(s) and imposed a PCC cut-off to filter the set of candidate pairs. The PCC ranges from −1 to +1, indicating perfect negative and positive linear association, respectively. There is a tendency in the literature to retain only negatively correlated miRNA/target pairs. This derives from the rationale that a cleavage-inducing miRNA acting on its target in the absence of other influences leads to reduced target expression due to RNA degradation. However, we and others have observed and validated positively correlated miRNA/target pairs. This dynamic can arise from miRNA-mediated spatial restriction of the target (Kawashima et al., 2009; Kidner & Martienssen, 2004; Levine, McHale, & Levine, 2007; Nikovics et al., 2006). It may also indicate that the miRNA functions in a ‘buffering’ capacity, minimizing changes in target expression caused by other interacting factors (Wu, Shen, & Tang, 2009). For these reasons, we do not recommend restricting analysis to negatively correlated miRNA/target pairs. If readers are interested in generating an advanced miRNA/target interaction network, we direct them to reviews on the various mathematical models and integrated approaches used (Carroll, Goodall, & Liu, 2014; Muniategui, Pey, Planes, & Rubio, 2013). For simple visualization of predicted miRNA/target interactions, we recommend import of results from this workflow into Cytoscape (Shannon et al., 2003).
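The correlation filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the example expression values, and the cut-off of 0.8 are all assumptions made here, and the filter is applied to the absolute value of the PCC so that both negatively and positively correlated pairs are retained, in line with the recommendation above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined when expression is constant")
    return cov / sqrt(var_x * var_y)


def filter_pairs(mirna_expr, target_expr, pairs, cutoff=0.8):
    """Keep pairs whose |PCC| meets the cut-off (the 0.8 default is an assumption)."""
    kept = []
    for mirna, target in pairs:
        r = pearson(mirna_expr[mirna], target_expr[target])
        if abs(r) >= cutoff:  # sign is ignored on purpose
            kept.append((mirna, target, r))
    return kept


# Hypothetical normalized expression across four paired samples.
mirna_expr = {"miR-A": [10.0, 20.0, 30.0, 40.0], "miR-B": [5.0, 6.0, 5.0, 6.0]}
target_expr = {
    "gene1": [80.0, 60.0, 40.0, 20.0],  # falls as miR-A rises
    "gene2": [1.0, 2.0, 3.0, 4.0],      # rises with miR-A
    "gene3": [3.0, 3.0, 9.0, 9.0],      # unrelated to miR-B
}
pairs = [("miR-A", "gene1"), ("miR-A", "gene2"), ("miR-B", "gene3")]

kept = filter_pairs(mirna_expr, target_expr, pairs)
for mirna, target, r in kept:
    print(f"{mirna}\t{target}\tPCC={r:+.2f}")
```

In this toy data the negatively correlated pair (miR-A/gene1) and the positively correlated pair (miR-A/gene2) both pass, while the uncorrelated pair (miR-B/gene3) is dropped. The kept pairs could then be written to a tab-delimited file for import into a network viewer.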
High-quality RNA is essential for both RNA-seq and the validations used in this article. It consists of RNA that is primarily free of degradation by cellular nucleases and lacks contamination by genomic DNA. Extraction of RNA begins by grinding the tissue sample and solubilizing its contents. Solubilization buffers containing guanidinium compounds protect against nucleases and aid in breakdown of the cell membrane (Chomczynski, 1993). Following solubilization, the user can continue with a reagent-based extraction, such as with RNAzol, or switch to purification with silica columns. RNAzol is advantageous as it enables convenient isolation of the small RNA fraction and contains additives to reduce genomic DNA contamination. Alternatively, silica-based purification allows much faster RNA isolation and on-column DNase treatment to remove genomic DNA; however, a major drawback of this technology is its often-poor yield.
5′ RACE was originally developed to map the +1 transcription start site of mRNAs (Sambrook & Russell, 2006). In this application, the mRNA was reverse transcribed with an oligo d(T) primer, then terminal deoxynucleotidyl transferase (TdT) was used to add multiple nucleotides to the 3′ end of the cDNA (known as ‘tailing’) to produce an adapter sequence. RLM-RACE forgoes tailing and instead ligates a 5′ RNA adapter directly to an mRNA pool that has been phosphatase-treated and decapped to select for full-length mRNAs. This is ideal for the original application of the method but not for validating miRNA-induced cleavage events, since the targeted mRNA is sliced. To modify this procedure for detecting cleaved products, we omitted the selection of full-length mRNAs, so that any exposed 5′ phosphate in the mRNA pool becomes ligated to the RNA adapter. Subsequent amplification with PCR and cloning with Gibson assembly allow the identification of 5′ cut sites of specific mRNA targets.
RLM-RACE is superior to 5′ RACE for detecting miRNA-induced cleavage. T4 RNA ligase is efficient at adding an RNA adapter to the 5′ end of mRNA, while TdT used to tail the cDNA in 5′ RACE can add nucleotides to ssDNA, dsDNA, and, at a lower efficiency, RNA, meaning there is less specificity for the cDNA of cleaved mRNA. Additionally, RLM-RACE avoids potential artifacts caused by the reverse transcriptase stalling during cDNA synthesis. As described above, degradome-seq is a high-throughput equivalent to the modified RLM-RACE used in this article, enabling relative quantification of all cleaved targets. To quantify miRNA-directed repression regardless of whether it derives from transcript cleavage or translational inhibition, a transient dual luciferase assay has been optimized for use in Nicotiana benthamiana (Moyle et al., 2017 ).
qRT-PCR and stem-loop qRT-PCR
qRT-PCR is a common and rapid method for relative RNA quantification. The transcript of interest is quantified in both the treated and untreated sample, with the change in expression normalized to that of an internal control transcript in each sample to account for differences in amount of starting RNA. Internal controls are often referred to as ‘housekeeping’ genes for their stable expression across various treatments. In this article, we use qRT-PCR and stem-loop qRT-PCR to validate expression of target mRNAs and miRNAs, respectively. Design of primers for standard qRT-PCR is relatively straightforward: the two primers should produce a product of 150 to 200 bp. To identify and reduce the impact of genomic DNA contamination in the RNA sample, primers should span an intronic region if one is known to exist. Stem-loop qRT-PCR, developed by Chen et al. ( 2005 ), uses a stem-loop primer complementary to the last six bases on the 3′ end of the miRNA. This increases the length and melting point of the PCR product to make it compatible with standard cycling. Therefore, specificity of the PCR reaction is conferred mainly by the forward primer, which spans most of the miRNA sequence.
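The normalization described above, in which the transcript of interest is compared between treated and untreated samples relative to an internal control, is commonly computed with the 2^-ΔΔCt method. The text does not name the calculation used, so the sketch below is an assumption: it presumes roughly 100% amplification efficiency for both primer pairs, and the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change (treated vs. untreated) by 2^-ΔΔCt.

    Assumes both primer pairs amplify with ~100% efficiency, so that
    one Ct unit corresponds to a two-fold difference in template.
    """
    # ΔCt: target normalized to the internal control within each sample
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # ΔΔCt: treated relative to untreated
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier
# in the treated sample, while the internal control is unchanged.
fold = relative_expression(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
print(f"fold change = {fold:.1f}")
```

With these example values the ΔΔCt is −2, which corresponds to a four-fold increase in the treated sample. If primer efficiencies differ noticeably from 100%, an efficiency-corrected calculation would be more appropriate than this simple form.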
We use SYBR Green to measure fluorescence during qPCR, but specificity can be increased by substituting sequence-specific hydrolysis probes such as TaqMan (Applied Biosystems) or Universal ProbeLibrary (UPL, Roche Diagnostics). The probes anneal to single-stranded DNA and emit fluorescence only upon DNA polymerase-induced cleavage, which results from 5′ to 3′ exonuclease activity of the polymerase as it extends the primer. Use of these probes reduces background fluorescence due to primer dimers and increases specificity, as the probe binds between primer annealing sites. UPL probes provide increased specificity by incorporating locked nucleic acids, which are modified nucleotides with ribose rings stabilized in an ideal conformation for Watson-Crick base pairing. For a protocol utilizing UPL stem-loop qRT-PCR to quantify low-abundance plant miRNAs, see Varkonyi-Gasic, Wu, Wood, Walton, and Hellens ( 2007 ). Although beneficial, the high cost of hydrolysis probes likely excludes them from use in initial screening validations, but their incorporation may be worthwhile for characterization of a few key miRNAs.
High-quality RNA is an essential input for both RNA-seq and the validations performed in this article. General plant health is an important contributor to RNA quality, with lack of light or nutrients resulting in lesser-quality RNA. If the treatment under study is intended to elicit a stress response, the strength and duration of treatment must be optimized to avoid impacting overall RNA quality. When harvesting plants, tissue should be flash-frozen in liquid nitrogen and processed immediately. Use of pre-chilled tools and tubes prevents frozen tissue from thawing, thereby limiting nuclease activity. If liquid nitrogen is unavailable, tissue can be preserved in saturated ammonium sulfate solution, such as RNAlater (Ambion). Note that the lysate is stable upon solubilization in RNAzol and can be stored long-term at −20°C. Once extracted, RNA is susceptible to degradation both by nucleases and by hydrolysis under basic conditions. Both factors can be controlled by resuspending RNA pellets in RNA storage buffer, which contains DTT to inactivate ribonucleases and citrate buffer to reduce pH and chelate metal ions.
Sufficient computational power and memory are required to perform RNA-seq analysis. Raw data and output files must be stored, and the workflow must be able to complete in a reasonable timeframe. We performed all analysis for the full-scale job using a personal server with 64 GB RAM and a 4 TB hard drive. If suitable infrastructure is not available, public options for researchers include Galaxy and CyVerse (originally iPlant Collaborative; Merchant et al., 2016) for both high-performance computing and data storage. We introduce the user to Galaxy in this article.
Certain aspects of the RNA-seq experimental design and bioinformatic workflow are essential to include. If differential expression analysis is performed, both biological replicates and strand-specific reads must be incorporated. The former allow for an assessment of sample variability, while the latter ensures accurate transcript quantification. Note that many peer-reviewed journals now require a minimum of three biological replicates for inclusion of RNA-seq differential expression analysis. It is also essential to quality control the raw reads prior to analysis. Contaminating adapter sequences create artifacts in both the assembled transcriptome and small RNA sequences, and low-quality bases introduce sequence errors. Both issues interfere with transcriptome assembly, differential expression analysis, and miRNA prediction.
Factors affecting success of wet-lab validations are also important to consider. Low-abundance miRNAs (<10 CPM) and targets (<100 TPM) are difficult to detect. Cleaved mRNA is quickly degraded in the cell, so a low initial abundance further inhibits detection by RLM-RACE. Similarly, low abundance results in undetectable or variable Ct values during qRT-PCR. Another critical factor for a successful qRT-PCR experiment is the selection of appropriate internal controls. A preliminary test should be performed to ensure that expression is stable for the treatment under study. Two popular programs that aid in the selection of internal controls are NormFinder (Aanes et al., 2014 ) and geNorm (Vandesompele et al., 2002 ). These programs use pairwise comparisons of expression data to rank the stability of reference genes. Both are available as Microsoft Excel plugins, which allows analysis without extensive bioinformatics knowledge. However, these programs are unable to process large transcriptome datasets with ease. We reduced the list of input reference genes by calculating the mean, standard deviation, and relative standard deviation of abundance for each transcript across all samples using Excel. Relative standard deviation is calculated by multiplying standard deviation by 100 and dividing by the mean. Transcripts were sorted first by lowest relative standard deviation and then by highest mean expression level. Such transcripts would vary little in level across different treatments and would be of sufficient abundance to detect reliably. The reduced gene list was used as input for NormFinder. The final list of candidate reference genes was verified with qRT-PCR across the different treatments tested.
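The pre-filtering of candidate reference genes described above was done in Excel; the same arithmetic can be sketched in Python as follows. This is an illustration only: the function name and abundance values are hypothetical, and the use of the sample standard deviation (rather than the population form) is an assumption, since the text does not say which was used.

```python
from statistics import mean, stdev


def rank_reference_candidates(abundance):
    """Rank transcripts by lowest relative standard deviation, then highest mean.

    `abundance` maps a transcript ID to its abundance in each sample.
    Returns a list of (transcript, mean, rsd_percent) tuples, best first.
    """
    rows = []
    for transcript, values in abundance.items():
        m = mean(values)
        if m == 0:
            continue  # RSD is undefined for unexpressed transcripts
        # RSD = standard deviation * 100 / mean (sample SD is assumed here)
        rsd = stdev(values) * 100 / m
        rows.append((transcript, m, rsd))
    # Sort ascending by RSD; ties are broken by descending mean abundance.
    rows.sort(key=lambda row: (row[2], -row[1]))
    return rows


# Hypothetical abundances (e.g., TPM) across four samples.
abundance = {
    "transcript_A": [100.0, 100.0, 100.0, 100.0],  # stable, moderate level
    "transcript_B": [200.0, 200.0, 200.0, 200.0],  # stable, higher level
    "transcript_C": [50.0, 150.0, 50.0, 150.0],    # variable
}

ranked = rank_reference_candidates(abundance)
for transcript, m, rsd in ranked:
    print(f"{transcript}\tmean={m:.1f}\tRSD={rsd:.1f}%")
```

The top of the resulting list would then serve as the reduced input for a stability-ranking tool, with the final candidates still verified by qRT-PCR as described above.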