Argument Quality model

We reproduce the argument quality prediction model from Gretz et al. (2020), which has been accessible for some years in the context of IBM's Debater project (Bar-Haim et al. 2021).

Two models are provided, each trained for 2 epochs. As described by Gretz et al., the two versions differ in the scoring function used to derive a quality score from the manual annotations: weighted average (WA) and MACE-P (Hovy et al. 2013).
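To give an intuition for the weighted-average idea, here is a minimal sketch. It assumes that each annotator gives a binary label for an argument and that each annotator has a reliability weight; the function name, the example labels, and the example weights are hypothetical, and the exact weighting used by Gretz et al. may differ.

```python
def weighted_average_score(labels, weights):
    """Aggregate binary annotator labels into one score in [0, 1].

    labels  -- one 0/1 judgment per annotator (assumed binary here)
    weights -- one non-negative reliability weight per annotator (assumed)
    """
    if len(labels) != len(weights) or not labels:
        raise ValueError("labels and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    # Each label counts in proportion to its annotator's weight.
    return sum(w * l for l, w in zip(labels, weights)) / total_weight


# Hypothetical annotations for one argument, for illustration only.
example_score = weighted_average_score([1, 1, 0], [0.9, 0.6, 0.5])
print(round(example_score, 2))
```

With equal weights this reduces to the plain fraction of positive labels; unequal weights let more reliable annotators influence the score more.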

The repository for the retraining of the models can be found here.

The predictions of the WA model correlate more strongly with the original model's predictions than those of the MACE-P model, so we recommend using the WA model.
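A comparison of this kind can be sketched as follows. This is only an illustration: it assumes Pearson correlation (the text above does not name the correlation measure), and the variable names and score lists are made up for the example, not real model output.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up scores for five arguments (illustration only).
reference = [0.91, 0.42, 0.77, 0.30, 0.65]    # stands in for the original model
candidate_a = [0.88, 0.45, 0.80, 0.28, 0.60]  # stands in for one retrained model
candidate_b = [0.95, 0.20, 0.60, 0.35, 0.90]  # stands in for the other

r_a = pearson(reference, candidate_a)
r_b = pearson(reference, candidate_b)
print(f"candidate_a: {r_a:.3f}  candidate_b: {r_b:.3f}")
```

The candidate with the higher coefficient tracks the reference scores more closely, which is the criterion behind the recommendation above.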

If you use the models or the code in your research, please cite the following paper describing the retraining and evaluation process:

Ines Zelch, Matthias Hagen, Benno Stein, and Johannes Kiesel. Reproducing the Argument Quality Prediction of Project Debater. In Proceedings of the 12th Workshop on Argument Mining, July 2025.