Character Extraction and Character Type Identification from Summarised Story Plots


  • Vardhini Srinivasan Technological University Dublin
  • Aurelia Power Technological University Dublin



Keywords: character extraction, character type identification, coreference resolution, classification, clustering


Identifying characters in free-form text and understanding the roles and relationships between them is an evolving area of research. Such methods have a wide range of applications, from summarising narratives to mapping social networks from tweets, and can support automation and improve the experience of AI systems such as chatbots. The aim of this research is twofold. Firstly, we aim to develop an effective method for extracting characters from a story summary and a set of relevant features, and then to identify the character types using supervised learning algorithms. Secondly, we aim to examine the efficacy of unsupervised learning algorithms in character type identification, since it is challenging to find a dataset with the predetermined list of characters, roles, and relationships that supervised learning requires. We evaluated our approach on summarised plots of fictional stories. Our character extraction approach improved on the performance reported by existing work, with an average F1-score of 0.86. Supervised learning algorithms successfully identified the character types, achieving an overall average F1-score of 0.94. However, the clustering algorithms identified more than three clusters, indicating that more research is needed to improve their efficacy.



