Bioinformatics Prediction

Bioinformatics prediction of miRNAs, which mainly relies on comparative genome-based EST analysis using known miRNAs from certain species, is a well-established approach for discovering conserved miRNAs in target species that lack genomic resources (Zhang et al. 2005). This method has been widely used for miRNA discovery in many plant species, such as Arabidopsis (Wang et al. 2004; Adai et al. 2005), rice (Bonnet et al. 2004; Jones-Rhoades and Bartel 2004), cotton (Zhang et al. 2007), soybean (Zhang et al. 2008), tomato (Luan et al. 2010), Brachypodium (Unver and Budak 2009), apple (Gleave et al. 2008), and other species.

Known miRNA sequences are used as queries to search against NCBI’s switchgrass EST database. Matts et al. (2010) used Arabidopsis miRNA sequences obtained from miRBase as queries for general identification, and rice miRNA sequences for identification of monocot-specific miRNAs. Xie et al. (2010) used 1699 known miRNAs from 29 plant species for switchgrass miRNA identification. Matts et al. (2010) used NCBI BLASTN as the search tool to find homologous miRNAs in switchgrass, with the criteria of at least 18 nt and left 3 nt match. Xie et al. (2010), in contrast, argued that BLASTN is not an ideal tool for miRNA discovery and might miss many potential miRNA predictions. Instead, they adopted WATER to search against the EST database, with the criterion of no more than 2 nt substitutions.
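To make the substitution criterion concrete, the following is a minimal sketch of a search that reports every place where a known miRNA matches an EST with at most two substitutions. It is a simplified, ungapped sliding-window comparison, not WATER (which performs Smith-Waterman local alignment) and not the pipeline used in either study. The function names, the `max_subs` parameter, and the example sequences are assumptions introduced here for illustration only.

```python
def to_dna(seq):
    """Upper-case a sequence and convert RNA letters to DNA (U -> T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence; unknown letters become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq))


def find_hits(mirna, est, max_subs=2):
    """Return (strand, start, substitutions) for every ungapped window of
    the EST that differs from the miRNA by at most max_subs substitutions.

    Start positions on the '-' strand refer to the reverse-complemented EST.
    max_subs=2 mirrors the "no more than 2 nt substitutions" criterion;
    gaps are not considered in this sketch.
    """
    query = to_dna(mirna)
    forward = to_dna(est)
    hits = []
    for strand, target in (("+", forward), ("-", reverse_complement(forward))):
        for start in range(len(target) - len(query) + 1):
            window = target[start:start + len(query)]
            subs = sum(1 for a, b in zip(query, window) if a != b)
            if subs <= max_subs:
                hits.append((strand, start, subs))
    return hits


if __name__ == "__main__":
    # Hypothetical query and EST, used only to exercise the function.
    query = "UGACAGAAGAGAGUGAGCAC"
    est = "CCCC" + "ACACAGAAGAGAGTGAGCAC" + "GGGG"
    for hit in find_hits(query, est):
        print(hit)
```

A window-by-window comparison like this scales with the product of query and database size, so for a full EST database a real alignment tool would normally be used instead; the sketch only illustrates what the substitution threshold accepts and rejects.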

The searches by either BLASTN or WATER returned numerous hits among the ESTs, which were then subjected to stricter screening using different criteria. Matts et al. (2010) extracted the flanking regions of the mature miRNA sequences and used the fold-back structure prediction software mFOLD to predict their secondary structures. These predicted secondary structures were then compared with those deposited in miRBase for verification (Matts et al. 2010; Xie et al. 2010). Xie et al. (2010) first removed hits to repeated and protein-coding sequences, and then screened the remaining hits using six criteria based on sequence complementarity between EST hits and query miRNA sequences, minimum length of the pre-miRNA, secondary structure of the predicted pre-miRNA, and sequence complementarity and structure of the miRNA:miRNA* duplex. Applying these criteria removed some false positives and generated potential candidates for conserved microRNAs in switchgrass.
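The screening steps can be illustrated with a short sketch: cutting out the region flanking a hit so that it can be passed to a folding program, and then checking a predicted structure, given in dot-bracket notation, to see whether the mature miRNA lies on a single arm of the hairpin with few unpaired bases. The sketch does not call mFOLD or any other folding tool; the dot-bracket string is assumed to come from one. The flank length, the `max_unpaired` threshold, and all function names are hypothetical and are not the values or criteria reported by either study.

```python
def extract_precursor_window(est, hit_start, hit_len, flank=100):
    """Return (window, offset): the hit plus up to `flank` nt on each side,
    and the position of the hit inside that window. flank=100 is an
    illustrative default, not a value taken from the studies."""
    begin = max(0, hit_start - flank)
    end = min(len(est), hit_start + hit_len + flank)
    return est[begin:end], hit_start - begin


def pair_table(dotbracket):
    """Map each paired position to its partner; unpaired positions are absent."""
    stack, pairs = [], {}
    for i, symbol in enumerate(dotbracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs


def mature_on_one_arm(dotbracket, start, length, max_unpaired=4):
    """True if the mature region pairs only with bases on the opposite arm
    and has at most max_unpaired unpaired bases (hypothetical threshold)."""
    pairs = pair_table(dotbracket)
    partners = [pairs[i] for i in range(start, start + length) if i in pairs]
    if not partners:
        return False
    unpaired = length - len(partners)
    all_downstream = all(p >= start + length for p in partners)
    all_upstream = all(p < start for p in partners)
    return unpaired <= max_unpaired and (all_downstream or all_upstream)


if __name__ == "__main__":
    # Toy hairpin: 10 paired bases, a 4-nt loop, 10 paired bases.
    structure = "((((((((((....))))))))))"
    print(mature_on_one_arm(structure, 0, 10))  # region on the 5' arm
    print(mature_on_one_arm(structure, 8, 10))  # region spanning the loop
```

Keeping the thresholds as parameters makes it straightforward to substitute whatever cut-offs a given annotation standard prescribes.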