Sentiment Analysis and Stance Detection on German YouTube Comments on Gender Diversity


  • Lidiia Melnyk Friedrich Schiller University Jena
  • Linda Feld Friedrich Schiller University Jena



stance detection, sentiment analysis, BERT, neural networks, annotation, YouTube comments, gender diversity


This paper explores different approaches to detecting the stance of German YouTube comments on gender diversity and compares the results with those of sentiment analysis, showing that the two are very different NLP tasks that capture distinct characteristics of the discourse. The comments' sentiment was analyzed with an existing BERT-based model. Their stance was first annotated, and the annotations were then used to train several models (SVM with TF-IDF features, DistilBERT, LSTM, and CNN) to predict the stance of unseen comments. The CNN achieved the best results, reaching 78.3% accuracy on the test set (92% after dataset normalization). Whereas the most common stance in the comments is neutral (neither fully in favor of nor fully against gender diversity), the overall sentiment of the discourse is negative. This shows that the discourse on gender diversity in YouTube comments is filled with strong opinions on the one hand, but on the other also opens up a space for anonymously asking about and learning about the topic and its implications. Our research thereby (1) contributes to the understanding and application of different NLP tasks for predicting the sentiment and stance of unstructured textual data, and (2) provides insights into society's attitudes towards a changing system of values and beliefs.
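One of the stance baselines named above is an SVM over TF-IDF features. The abstract does not specify the preprocessing, SVM variant, hyperparameters, or exact label set, so the following is only a minimal, dependency-free sketch under stated assumptions: a lowercase whitespace tokenizer, smoothed TF-IDF weighting with L2 normalization, and one-vs-rest linear SVMs (without an intercept) trained by Pegasos-style subgradient descent on the hinge loss. The optimizer is a swapped-in choice for illustration; in practice a library implementation would normally be used. The labels `favor`, `against`, and `neutral` and all toy comments are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()

class Tfidf:
    """Minimal TF-IDF with smoothed idf and L2 normalization; vectors are sparse dicts."""

    def fit(self, docs):
        token_lists = [tokenize(d) for d in docs]
        n = len(token_lists)
        df = Counter(term for toks in token_lists for term in set(toks))
        self.vocab = {term: i for i, term in enumerate(sorted(df))}
        # Smoothed idf: ln((1 + n) / (1 + df)) + 1
        self.idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in df}
        return self

    def transform(self, doc):
        # Terms not seen during fit are dropped
        counts = Counter(t for t in tokenize(doc) if t in self.vocab)
        vec = {self.vocab[t]: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {i: v / norm for i, v in vec.items()}

def dot(w, x):
    """Dot product of a dense weight list and a sparse feature dict."""
    return sum(w[i] * v for i, v in x.items())

def train_linear_svm(X, y, dim, lam=0.01, epochs=20):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.
    y holds +1/-1 labels; no intercept term, for brevity."""
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                   # step size 1 / (lambda * t)
            violated = yi * dot(w, x) < 1           # hinge loss is active
            w = [wj * (1.0 - 1.0 / t) for wj in w]  # shrink by 1 - eta * lambda
            if violated:
                for i, v in x.items():
                    w[i] += eta * yi * v
    return w

class StanceSVM:
    """One-vs-rest linear SVMs over TF-IDF features."""

    def fit(self, texts, labels):
        self.vectorizer = Tfidf().fit(texts)
        X = [self.vectorizer.transform(t) for t in texts]
        dim = len(self.vectorizer.vocab)
        self.labels = sorted(set(labels))
        self.weights = {
            c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], dim)
            for c in self.labels
        }
        return self

    def predict(self, text):
        x = self.vectorizer.transform(text)
        # Pick the class whose binary SVM gives the highest score
        return max(self.labels, key=lambda c: dot(self.weights[c], x))

# Hypothetical toy comments; the three labels mirror the stance classes
# described in the abstract but are not the authors' annotation scheme.
train = [
    ("vielfalt ist wichtig", "favor"),
    ("vielfalt bereichert uns", "favor"),
    ("gendern nervt total", "against"),
    ("gendern zerstört sprache", "against"),
    ("was bedeutet nichtbinär", "neutral"),
    ("was heißt divers", "neutral"),
]
model = StanceSVM().fit([t for t, _ in train], [lab for _, lab in train])
print(model.predict("vielfalt ist wichtig"))  # → favor
```

In this toy set the class vocabularies are disjoint, which makes separation trivial. Real comments share most of their vocabulary across stances, so the example says nothing about performance on the actual corpus.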


