Logistic calibrated items (LCI) method: does it solve subjectivity in translation evaluation and assessment?


Alireza Akbari, University of Isfahan




translation evaluation product, Logistic Calibrated Items method, item difficulty, item discrimination, recalculation of scores


This paper introduces a translation evaluation model called the Logistic Calibrated Items (LCI) method. The LCI method aims to maximize translators’ performance and to identify the most competent translators by detecting all parsing items within a source text. Parsing items are extracted with the Brat software, and the method then identifies those items that have optimal item difficulty and item discrimination values. The LCI method involves six stages: (1) holistic scoring; (2) the application of Brat software to extract all parsing items; (3) the calculation of item difficulty; (4) the calculation of item discrimination; (5) the identification of items with optimal item difficulty and item discrimination values; and (6) the recalculation of scores. A total of 125 translation students and 4 professional translation evaluators took part in this research. The results showed that the LCI method was more consistent than the holistic method. Limitations and implications are also discussed.
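Stages (3) to (6) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption rather than the author's procedure: item difficulty is taken as the classical proportion of correct renditions, item discrimination as the difference between top and bottom groups ranked by holistic score (27% tails), and the cut-offs `p_range` and `d_min` are hypothetical placeholders. Stages (1) and (2) are assumed to have already produced the holistic scores and a 0/1 matrix of item renditions.

```python
from typing import List, Sequence, Tuple


def item_statistics(
    responses: Sequence[Sequence[int]],
    holistic: Sequence[float],
    tail: float = 0.27,  # assumed size of the top and bottom groups
) -> Tuple[List[float], List[float]]:
    """Return (difficulty, discrimination) per item.

    responses[s][i] is 1 if student s rendered parsing item i acceptably, else 0.
    holistic[s] is the stage (1) holistic score of student s.
    """
    n_students = len(responses)
    n_items = len(responses[0])

    # Stage 3: difficulty as the proportion of students who got the item right.
    difficulty = [
        sum(row[i] for row in responses) / n_students for i in range(n_items)
    ]

    # Stage 4: rank students by holistic score, then compare the two tails.
    order = sorted(range(n_students), key=lambda s: holistic[s], reverse=True)
    k = max(1, round(tail * n_students))
    top, bottom = order[:k], order[-k:]
    discrimination = [
        (sum(responses[s][i] for s in top) - sum(responses[s][i] for s in bottom)) / k
        for i in range(n_items)
    ]
    return difficulty, discrimination


def select_items(
    difficulty: Sequence[float],
    discrimination: Sequence[float],
    p_range: Tuple[float, float] = (0.2, 0.8),  # hypothetical cut-offs
    d_min: float = 0.3,                         # hypothetical cut-off
) -> List[int]:
    """Stage 5: keep the indices of items inside the assumed 'optimal' window."""
    low, high = p_range
    return [
        i
        for i, (p, d) in enumerate(zip(difficulty, discrimination))
        if low <= p <= high and d >= d_min
    ]


def recalculate_scores(
    responses: Sequence[Sequence[int]], kept: Sequence[int]
) -> List[float]:
    """Stage 6: rescore each student as a percentage over the retained items."""
    if not kept:
        raise ValueError("no items satisfy the selection criteria")
    return [100 * sum(row[i] for i in kept) / len(kept) for row in responses]
```

In this sketch an item that every student renders correctly has a difficulty of 1.0 and a discrimination of 0, so it is dropped before rescoring; the final scores then depend only on items that separate stronger from weaker translators.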



Author Biography

Alireza Akbari, University of Isfahan

KU Leuven

