AEOLLM / baseline_example /README.md

陈俊杰

	The baseline_example folder provides a simple baseline implementation along with the evaluation logic for reference.

	Methodology: chatglm3_6B rates each question-answer pair independently (pointwise) on a 5-level scale.
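
As a rough illustration of pointwise 5-level scoring, the sketch below builds one prompt per question-answer pair and extracts a 1-5 rating from the model's reply. The prompt wording, the helper names (`build_prompt`, `parse_score`, `rate_pair`), the `generate` callable, and the fallback score are all assumptions made for this example; they are not taken from the baseline script.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt wording; the actual baseline prompt may differ.
PROMPT_TEMPLATE = (
    "Rate the quality of the answer to the question on a scale of 1 to 5, "
    "where 1 is the worst and 5 is the best. Reply with a single digit.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Score:"
)


def build_prompt(question: str, answer: str) -> str:
    """Fill the template for a single question-answer pair."""
    return PROMPT_TEMPLATE.format(question=question, answer=answer)


def parse_score(response: str) -> Optional[int]:
    """Return the first standalone digit in 1..5, or None if there is none."""
    match = re.search(r"(?<!\d)[1-5](?!\d)", response)
    return int(match.group()) if match else None


def rate_pair(
    question: str,
    answer: str,
    generate: Callable[[str], str],  # assumed wrapper around the model call
    default: int = 3,                # assumed fallback when parsing fails
) -> int:
    """Score one pair on its own, without comparing it to other answers."""
    score = parse_score(generate(build_prompt(question, answer)))
    return default if score is None else score
```

Here `generate` stands in for whatever function sends a prompt to chatglm3_6B and returns its text reply, so the parsing logic can be exercised with a stub such as `lambda prompt: "4"`.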

	baseline3.py stores the model's evaluation results in output/baseline1_chatglm3_6B.txt.

	eval.py calculates the evaluation metrics based on the model's evaluation results and the human annotation results.
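
The README does not say which metrics eval.py reports. As a sketch only, the code below assumes two common choices for comparing model ratings with human ratings: exact-match accuracy and Spearman rank correlation. The function names are hypothetical, and both inputs are assumed to be equal-length lists of integer scores in the same order.

```python
from typing import List, Sequence


def accuracy(pred: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of items where the model score equals the human score."""
    if len(pred) != len(gold) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == g for p, g in zip(pred, gold)) / len(pred)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Spearman correlation computed as Pearson correlation of the ranks."""
    if len(pred) != len(gold) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return pearson(average_ranks(pred), average_ranks(gold))
```

Because 5-level scores produce many ties, the ranks are averaged within each tied group rather than assigned arbitrarily.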

	The human annotation results are temporarily hidden due to testing requirements.