Performance evolution for sentiment classification using machine learning algorithm


  • Faisal Hassan University of Karachi
  • Naseem Afzal Qureshi University of Karachi
  • Muhammad Zohaib Khan Shaheed Mohtarma Benazir Bhutto Institute of Trauma
  • Muhammad Ali Khan Mehran UET
  • Abdul Salam Soomro Mehran UET
  • Aisha Imroz Avanza Solutions (Pvt.) Ltd
  • Hussain Bux Marri BBSUTSD



Machine Learning, K-Means, Logistic Regression, Random Forest, Decision Tree Algorithms


Machine Learning (ML) is an Artificial Intelligence (AI) approach that allows systems to adapt to their environment based on past experience. ML and Natural Language Processing (NLP) techniques are commonly used in sentiment analysis and Information Retrieval Techniques (IRT). This study supports the use of ML approaches, such as K-Means, for producing accurate outcomes in clustering and classification tasks. The main objective of this research is to explore methods for sentiment classification and IRT. To this end, K-Means is combined with different machine learning algorithms and applied to datasets of Amazon unlocked mobile phone reviews and telecom tweets to achieve better accuracy, since previous predictions related to sentiment classification and IRT must be taken into account. The datasets consist of user reviews and ratings, and the algorithms used are the K-Means clustering algorithm, Logistic Regression (LR), Random Forest (RF), and Decision Tree (DT). Combining each classifier with K-Means resulted in high accuracy: K-Means combined with LR yielded an accuracy of 99.98%, K-Means integrated with RF achieved 99.906%, and K-Means merged with DT achieved 99.83%. These results show that the combined approach can produce efficient, effective, and accurate predictions.
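The abstract does not state exactly how K-Means is combined with each classifier. The sketch below shows one hypothetical reading, in dependency-free Python: K-Means (Lloyd's algorithm) groups review feature vectors into two clusters, the cluster IDs are used as pseudo sentiment labels, and a logistic regression classifier is trained on part of the data and scored by accuracy (correct predictions divided by total predictions) on the rest. The two features, the toy data values, the farthest-first initialisation, the train/test split and the function names `kmeans`, `train_logreg` and `predict` are all assumptions for illustration, not the authors' pipeline; in the RF and DT combinations a different classifier would take the place of `train_logreg`.

```python
import math

# Hypothetical toy features per review: (normalised star rating, share of positive words).
# Illustrative values only; they are not taken from the study's datasets.
reviews = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.25), (0.05, 0.10), (0.20, 0.30), (0.10, 0.15),
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.75), (0.95, 0.90), (0.80, 0.70), (0.90, 0.95),
]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    # Assignment step: index of the nearest centroid for every point.
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c])) for p in points]


def kmeans(points, k, iters=100):
    # Farthest-first initialisation (a deterministic stand-in for k-means++).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                new.append(centroids[c])  # keep the centroid of an empty cluster
        if new == centroids:
            break
        centroids = new
    return centroids, assign(points, centroids)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=2000):
    # Batch gradient descent on the mean log-loss.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Step 1: K-Means groups the reviews; cluster IDs act as pseudo sentiment labels.
centroids, labels = kmeans(reviews, k=2)

# Step 2: train LR on two thirds of the reviews, hold out the remaining third.
train = [i for i in range(len(reviews)) if i % 3 != 0]
test = [i for i in range(len(reviews)) if i % 3 == 0]
w, b = train_logreg([reviews[i] for i in train], [labels[i] for i in train])

# Step 3: accuracy = correct predictions / total predictions on the held-out reviews.
correct = sum(predict(w, b, reviews[i]) == labels[i] for i in test)
accuracy = correct / len(test)
print(f"clusters: {labels}")
print(f"test accuracy: {accuracy:.2f}")
```

Because the labels in this sketch come from the clustering itself, the reported accuracy measures how well the classifier reproduces the K-Means partition on toy data, not agreement with human-annotated sentiment.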



Author Biographies

Faisal Hassan, University of Karachi

Department of Mathematics, Faculty of Science

Naseem Afzal Qureshi, University of Karachi

Department of Computer Science, Faculty of Science

Muhammad Zohaib Khan, Shaheed Mohtarma Benazir Bhutto Institute of Trauma

Software and Data Engineer

Muhammad Ali Khan, Mehran UET

Assistant Professor, Industrial Engineering and Management

Abdul Salam Soomro, Mehran UET

Professor & Chairman, Industrial Engineering and Management

Aisha Imroz, Avanza Solutions (Pvt.) Ltd

Software Engineer

Hussain Bux Marri, BBSUTSD

Professor (Meritorious) & Dean, Faculty of Engineering Technology

