
An Improved Computational Prediction Model for Lysine Succinylation Sites Mapping on Homo sapiens by Fusing Three Sequence Encoding Schemes with the Random Forest Classifier

[ Vol. 22 , Issue. 2 ]

Author(s):

Samme Amena Tasmia, Fee Faysal Ahmed, Parvez Mosharaf, Mehedi Hasan and Nurul Haque Mollah*

Pages 122 - 136 (15)

Abstract:


Background: Lysine succinylation is a reversible protein post-translational modification (PTM) that regulates the structure and function of proteins. It plays a significant role in various cellular processes and is implicated in some diseases of humans as well as many other organisms. Accurate identification of succinylation sites is essential for understanding their biological functions and for drug development.

Methods: In this study, we developed an improved method to predict lysine succinylation sites in Homo sapiens by fusing three encoding schemes, namely binary encoding, the composition of k-spaced amino acid pairs (CKSAAP) and amino acid composition (AAC), with the random forest (RF) classifier. The prediction performance of the proposed RF-based fusion model was compared with that of other candidate predictors using 20-fold cross-validation (CV) and two independent test datasets collected from two different sources.
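
To make the fusion step concrete, the following is a minimal Python sketch of how the three encodings could be computed for a sequence window centred on a candidate lysine and concatenated into one feature vector. It is an illustration under stated assumptions, not the authors' implementation: the function names, the gap symbol "-", the k_max value and the normalization by the number of valid pairs are all hypothetical choices, since the abstract does not give them.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# Assumption: windows running past a terminus are padded with "-".


def binary_encode(window):
    """One-hot encode each position; gaps and unknown residues become all zeros."""
    vec = []
    for res in window:
        vec.extend(1.0 if res == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def aac_encode(window):
    """Fraction of each amino acid among the non-gap residues of the window."""
    residues = [r for r in window if r in AMINO_ACIDS]
    n = len(residues) or 1  # avoid division by zero for an all-gap window
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def cksaap_encode(window, k_max=2):
    """Normalized counts of residue pairs separated by k = 0..k_max residues."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    vec = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # pairs containing a gap are skipped
                counts[pair] += 1
                total += 1
        total = total or 1
        vec.extend(counts[p] / total for p in pairs)
    return vec


def fused_features(window, k_max=2):
    """Concatenate binary, CKSAAP and AAC encodings into one vector."""
    return binary_encode(window) + cksaap_encode(window, k_max) + aac_encode(window)
```

A vector produced by fused_features could then be passed to any random forest implementation, for example scikit-learn's RandomForestClassifier; that pairing is also an assumption made for illustration.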

Results: In cross-validation, the proposed predictor achieved the highest scores among the candidates: sensitivity (SN) of 0.800, specificity (SP) of 0.902, accuracy (ACC) of 0.919, Matthews correlation coefficient (MCC) of 0.766, partial AUC (pAUC) of 0.163 at a false-positive rate (FPR) of 0.10, and area under the ROC curve (AUC) of 0.958. On independent test protein set-1, it achieved the highest scores of SN = 0.811, SP = 0.902, ACC = 0.891, MCC = 0.629, pAUC = 0.139 and AUC = 0.921. On independent test protein set-2, it achieved SN = 0.772, SP = 0.901, ACC = 0.836, MCC = 0.677, pAUC = 0.141 at FPR = 0.10 and AUC = 0.923. It also outperformed all other existing prediction models.
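
For readers unfamiliar with the threshold-based measures reported above, the short Python sketch below shows their standard definitions in terms of confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the study; the sketch assumes both classes are present so that no denominator in SN, SP or ACC is zero.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute SN, SP, ACC and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity: recall on true sites
    sp = tn / (tn + fp)                    # specificity: recall on non-sites
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall fraction correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts, chosen only to illustrate the call.
metrics = classification_metrics(tp=80, fn=20, tn=90, fp=10)
```

MCC uses all four counts, so it remains informative when succinylated and non-succinylated sites are unevenly represented in a test set.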

Conclusion: The prediction performance reported in this article suggests that the proposed method may be a useful computational resource for predicting lysine succinylation sites in human proteins.

Keywords:

Protein sequences, lysine succinylation site, prediction, encoding schemes, feature selection, random forest, fusion model.

Affiliation:

Bioinformatics Lab., Department of Statistics, Rajshahi University, Rajshahi-6205, Department of Mathematics, Jashore University of Science and Technology, Jashore, Bioinformatics Lab., Department of Statistics, Rajshahi University, Rajshahi-6205, Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Fukuoka, Bioinformatics Lab., Department of Statistics, Rajshahi University, Rajshahi-6205
