Teach BS
Fluency Scoring
For fluency scoring we use a combination of "hard" structural scoring and "soft" probabilistic scoring from a distilled BERT model.
For probabilistic scoring we begin by tokenizing the sentence and grouping sub-word tokens back into words. Given the sentence "That was unbelievable", we tokenize it and produce word groupings from the token offsets.
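As a minimal sketch, assuming a Hugging Face fast tokenizer (the distilbert-base-uncased checkpoint here is illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

sentence = "That was unbelievable"
enc = tokenizer(sentence, return_offsets_mapping=True)

# Group token indices into words: a token whose character span starts exactly
# where the previous one ended (no whitespace gap) continues the current word.
groups, prev_end = [], None
for i, (start, end) in enumerate(enc["offset_mapping"]):
    if start == end:  # special tokens like [CLS]/[SEP] have empty spans
        continue
    if prev_end is not None and start == prev_end:
        groups[-1].append(i)  # sub-word continuation, e.g. "##believable"
    else:
        groups.append([i])    # first token of a new word
    prev_end = end
```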
We can then mask entire groups and take the average of the log probabilities of each token in the group, negating so that lower probabilities increase the loss:

$$\mathcal{L}_w = -\frac{1}{|T_w|} \sum_{t \in T_w} \log P\left(t \mid s_{\setminus w}\right)$$

where $w$ is the index of the relevant word, $T_w$ is the set of token positions belonging to that word, and $s_{\setminus w}$ is the sentence with group $w$ masked. This penalizes words that are not preferred by the model in context, but along the way it will also penalize words that may be correctly used but are simply uncommon.
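A sketch of this masking loop under the same assumptions, reusing the tokenizer, enc, and groups from the snippet above:

```python
import torch
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")
model.eval()

input_ids = torch.tensor([enc["input_ids"]])

def word_loss(group):
    """Negative mean log-probability of a word group's tokens, scored with
    the entire group masked out of the sentence."""
    masked = input_ids.clone()
    masked[0, group] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(input_ids=masked).logits
    log_probs = torch.log_softmax(logits[0], dim=-1)
    token_lps = [log_probs[i, input_ids[0, i]] for i in group]
    return -float(torch.stack(token_lps).mean())

losses = [word_loss(g) for g in groups]
```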
We seek to penalize based on the "contextual awkwardness" of a given word rather than its rarity, so we compute a rarity score

$$\mathcal{F}_w = -\ln\left(f_w + 10^{-12}\right)$$

where $f_w$ comes from the word_frequency function of the Python wordfreq package, which returns a value $f_w \in [0, 1]$ where $0$ is extremely rare and $1$ is extremely common. Therefore $\mathcal{F}_w \in [0, -\ln 10^{-12} \approx 27.6]$, where a higher score means the word is rarer.
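A small sketch of this score; the English wordlist argument is an assumption:

```python
from math import log
from wordfreq import word_frequency

def rarity(word):
    """Rarity score: near 0 for very common words, ~27.6 for unseen ones."""
    f = word_frequency(word, "en")  # relative frequency in [0, 1]
    return -log(f + 1e-12)
```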
We use this to compute an adjusted pseudo log loss $(\text{PLL})$ for each word,

$$\mathcal{L}'_w = \mathcal{L}_w - \alpha\,\mathcal{F}_w,$$

which applies a downward adjustment to the loss for rare words, where $\alpha$ is a weight parameter. Our final step is to average over all word groups to produce an adjusted pseudolikelihood estimate

$$\text{PLL} = \frac{1}{|W|} \sum_{w \in W} \mathcal{L}'_w$$

where $W$ is the set of word groups. We then generate a fluency score $\textbf{FS}$ from $0$ to $100$ using the logistic function

$$\textbf{FS} = \frac{100}{1 + e^{\,s\,(\text{PLL} - m)}}$$

where $s$ is a steepness factor and $m$ is the midpoint (the adjusted loss at which the score is $50$).
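Putting the pieces together, a sketch of the final score. ALPHA, S, and M are illustrative values rather than tuned constants, and aligning losses with words via sentence.split() assumes purely whitespace-separated words:

```python
from math import exp

ALPHA, S, M = 0.05, 1.0, 3.0  # illustrative weight, steepness, and midpoint

words = sentence.split()  # one word per group in this whitespace-only example

# Downward-adjust each word's loss by its rarity, then average over groups.
adjusted = [loss - ALPHA * rarity(word) for loss, word in zip(losses, words)]
pll = sum(adjusted) / len(adjusted)

# Logistic squash: the score is 50 when pll == M and falls as the loss grows.
fluency_score = 100 / (1 + exp(S * (pll - M)))
```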