
Predicting RNA Editing Events Across Species Using Machine Learning Algorithms


Core Concepts
Machine learning algorithms, including random forest and bidirectional long short-term memory neural networks, can predict RNA editing events by leveraging primary sequence and secondary structure information. Cross-training these models across species provides insights into the functional conservation of the RNA editing mechanism.
Abstract
The authors present a new approach to assess the functional conservation of the RNA editing targeting mechanism using machine learning algorithms. They trained a random forest (RF) model and a bidirectional long short-term memory (biLSTM) neural network with an attention layer to predict RNA editing events in humans, mice, and mackerel (T. trachurus). The RF model achieved over 75% accuracy in predicting editing events, with the global maximum double-strand size and the distance to the closest inner loops being the most important features. The biLSTM model, using both sequence and secondary structure information, achieved almost 95% accuracy in predicting human editing events on balanced datasets. Interestingly, the model performed well even when using only the sequence channel, suggesting that secondary structure plays a key role in the RNA editing target selection mechanism.
The authors then tested the models on more unbalanced datasets, mimicking real-world scenarios. Accuracy remained high, but the number of false positives greatly exceeded the number of true positives, which highlights how difficult it is to use these models for de novo prediction of editing events.
To investigate the conservation of the RNA editing mechanism, the authors used a cross-training approach: training the models on one species and testing them on another. Models trained on mammalian data (human and mouse) predicted each other's datasets with reasonable accuracy but performed poorly on the mackerel dataset. This suggests that the RNA editing mechanism is largely conserved between mammals, while the targeting mechanism likely differs between mammals and the teleost fish mackerel, potentially because factors such as temperature affect RNA secondary structure.
Overall, this work demonstrates the power of machine learning approaches to study the complex and elusive process of RNA editing, and provides a novel in silico method to infer the conservation of the editing mechanism across species.
Stats
The most prominent sentences containing key metrics or figures are:
"All performed similarly well, reaching an accuracy above 75% (Supp. Fig 1)."
"Using a sliding window of 50+1+50 nucleotides, we obtained an accuracy of almost 95% using balanced datasets (Fig. 3 A)."
"Interestingly, although the accuracy when predicting is just below 95%, the highly unbalanced nature of the dataset results in the amount of false positives greatly surpassing the number of true positives (Fig. 4 A)."
"When predicting T. trachurus data, the algorithms trained with human and mouse data achieved only a 50% and 51%, respectively."
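The "50+1+50" window in the second quote can be read as 50 flanking nucleotides on each side of a central adenosine, giving 101 positions per example. The following is a minimal sketch of that windowing, not the authors' code: the function name `extract_windows` and the choice to pad transcript ends with `N` are assumptions made for illustration, and the paper may treat positions near the ends differently.

```python
def extract_windows(seq, flank=50, pad="N"):
    """Return (position, window) pairs, one per adenosine in seq.

    Each window has flank + 1 + flank characters with the adenosine in
    the middle. Padding with `pad` at the ends is an assumption.
    """
    seq = seq.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, base in enumerate(seq):
        if base == "A":
            # In padded coordinates the adenosine sits at i + flank, so
            # the slice [i, i + 2 * flank + 1) is centred on it.
            windows.append((i, padded[i:i + 2 * flank + 1]))
    return windows
```

With the default `flank=50`, every returned window is 101 characters long and its middle character (index 50) is the candidate adenosine.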
Quotes
"The GlobalMaxDSSize descriptor may be relevant discriminating along the decision tree, as it is a value describing the whole RNA molecule. Any RNA molecule will have either a mixture of editable and non-editable adenosines or all non-editable adenosines. Thus, the global parameters may play a role discriminating between these two groups."
"If we explore the similarities in sequence and structure of the positive cases, we cannot see any distinguishable pattern (Fig. 3 C)."
"Interestingly, although the accuracy when predicting is just below 95%, the highly unbalanced nature of the dataset results in the amount of false positives greatly surpassing the number of true positives (Fig. 4 A)."
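The last quote can be made concrete with a small arithmetic sketch showing how a classifier with high accuracy still produces far more false positives than true positives when edited sites are rare. All numbers below are hypothetical: the source does not report the class ratio, sensitivity, or specificity, so the values in the usage note are placeholders chosen only to illustrate the effect.

```python
def confusion_counts(n_pos, n_neg, sensitivity, specificity):
    """Expected confusion-matrix counts for a classifier with the given
    sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = n_pos * sensitivity
    fn = n_pos - tp
    tn = n_neg * specificity
    fp = n_neg - tn
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all sites that are classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Fraction of predicted editing sites that are truly edited."""
    return tp / (tp + fp) if (tp + fp) else 0.0
```

For example, with a hypothetical 1,000 edited and 999,000 non-edited adenosines and 95% sensitivity and specificity, `confusion_counts(1000, 999000, 0.95, 0.95)` gives roughly 950 true positives against roughly 49,950 false positives. Accuracy is still about 0.95, but precision is only about 0.02, which is why accuracy alone is a poor guide for de novo prediction.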

Deeper Inquiries

How could the machine learning models be further improved to reliably predict de novo RNA editing events, given the challenge of the highly unbalanced nature of edited vs non-edited adenosines?

Several strategies could improve the models for de novo prediction of RNA editing events. The first is to address the imbalance between edited and non-edited adenosines directly, through oversampling, undersampling, or algorithms designed for imbalanced datasets. Oversampling creates additional copies of the minority class (edited adenosines) to balance the dataset, while undersampling reduces the number of instances in the majority class (non-edited adenosines). Because accuracy stays high even when false positives dominate, such models should also be evaluated with metrics that are sensitive to the minority class, such as precision and recall. In addition, gradient-boosted trees such as XGBoost or other deep learning architectures such as convolutional neural networks (CNNs) may capture complex patterns in the data and could improve predictive performance.
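The two resampling strategies described above can be sketched in a few lines using only the Python standard library. This is an illustrative sketch, not the method used in the paper: the function names, the label convention (1 for edited, 0 for non-edited), and the choice to balance the classes exactly 1:1 are all assumptions.

```python
import random


def _split_indices(y, minority_label):
    """Return (minority, majority) index lists for the label list y."""
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    return minority, majority


def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate randomly chosen minority examples until classes are equal."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y, minority_label)
    if not minority or len(minority) >= len(majority):
        return list(X), list(y)
    # Sample with replacement to fill the gap between the two classes.
    extra = rng.choices(minority, k=len(majority) - len(minority))
    idx = majority + minority + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def random_undersample(X, y, minority_label=1, seed=0):
    """Keep a random subset of the majority class, equal in size to the minority."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y, minority_label)
    if len(majority) <= len(minority):
        return list(X), list(y)
    # Sample without replacement so no majority example is repeated.
    kept = rng.sample(majority, k=len(minority))
    idx = minority + kept
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]
```

Resampling is normally applied to the training split only. The test split should keep its original class ratio, otherwise the reported metrics no longer reflect the de novo setting.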

What other factors, beyond primary sequence and secondary structure, might influence the ADAR-mediated RNA editing mechanism, and how could these be incorporated into the predictive models?

Beyond primary sequence and secondary structure, several other factors could influence the ADAR-mediated RNA editing mechanism. One is the presence of RNA-binding proteins that interact with ADAR enzymes and guide them to specific editing sites; adding information about these proteins and their binding motifs as model inputs could provide insight into the editing process. The local RNA environment, including RNA modifications and the stability of the secondary structure, could also modulate editing. Integrating data on these factors into the predictive models could improve their accuracy.
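One way such an extra factor could enter a sequence model is as an additional per-position input channel next to the sequence and structure channels. The sketch below is purely illustrative: the one-hot layout, the dot-bracket structure alphabet, and the exact-match motif scan are assumptions rather than the paper's encoding, and the motif used in the usage note is a placeholder, not a known binding motif.

```python
BASES = "ACGU"    # sequence alphabet (assumed RNA, uppercase)
STRUCT = "(.)"    # dot-bracket secondary-structure alphabet (assumed)


def one_hot(symbol, alphabet):
    """One-hot encode symbol; unknown symbols become an all-zero vector."""
    return [1.0 if symbol == a else 0.0 for a in alphabet]


def motif_hits(seq, motif):
    """Mark with 1.0 every position covered by an exact match of motif."""
    hits = [0.0] * len(seq)
    if not motif:
        return hits
    start = seq.find(motif)
    while start != -1:
        for j in range(start, start + len(motif)):
            hits[j] = 1.0
        # Continue from the next position so overlapping matches are found.
        start = seq.find(motif, start + 1)
    return hits


def encode(seq, structure, motif):
    """Per-position features: 4 sequence + 3 structure + 1 motif channel."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    hits = motif_hits(seq, motif)
    return [
        one_hot(b, BASES) + one_hot(s, STRUCT) + [h]
        for b, s, h in zip(seq, structure, hits)
    ]
```

Calling `encode("GAUC", "(..)", "AU")` returns four rows of eight values each; the last value in a row is the hypothetical motif channel. Other per-position annotations, such as modification sites, could be appended the same way.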

Given the apparent differences in the RNA editing mechanism between mammals and the teleost fish mackerel, what other evolutionary divergences in this process might exist across the tree of life, and how could this inform our understanding of the role of RNA editing in shaping organismal complexity and adaptation?

The differences observed in the RNA editing mechanism between mammals and teleost fish like mackerel suggest that further evolutionary divergences in RNA editing may exist across the tree of life. These differences could stem from variations in the expression and activity of ADAR enzymes, the presence of species-specific RNA-binding proteins, and the structural characteristics of RNA molecules in different organisms. Studying RNA editing in a diverse range of species, including invertebrates, plants, and fungi, would give a more complete picture of how this process has diversified over time. Such comparative analysis can provide insights into the role of RNA editing in shaping organismal complexity, adaptation to different environments, and the evolution of biological diversity.