Spaces: JQL-AI/JQL

mbrack committed on May 29

Commit 202b027 (verified)

1 Parent(s): 0e67577

Update index.html

Files changed (1)

index.html +7 -7

@@ -65,7 +65,7 @@
           <div class="column has-text-centered">
               <span class="link-block">
-                <a href="https://arxiv.org/abs/2011.12948" target="_blank"
+                <a href="https://arxiv.org/abs/2505.22232" target="_blank"
                    class="external-link button is-normal is-rounded is-dark">
                   <span class="icon">
                       <i class="ai ai-arxiv"></i>
@@ -151,9 +151,9 @@
       <li><strong>✔️ Accuracy:</strong> Spearman’s ρ > 0.87 with human ground truth</li>
       <li><strong>📈 Downstream LLM Training:</strong>
         <ul>
-          <li>+7.2% benchmark performance improvement</li>
-          <li>+4.8% token retention vs. FineWeb2 heuristic filter</li>
-          <li>Effective threshold strategies: 0.6 and 0.7 quantile</li>
+          <li>Benchmark performance improvement over FineWeb2</li>
+          <li>Higher document retention vs. FineWeb2 heuristic filter</li>
+          <li>Effective dynamic threshold strategies: Trade-off document quality for quantity</li>
         </ul>
       </li>
       <li><strong>⚡ Annotation Speed:</strong> ~11,000 docs/min (A100 GPU, avg. 690 tokens)</li>
@@ -184,10 +184,10 @@
     <h2 class="title is-3">📜 Citation</h2>
     <p>If you use JQL, the annotations, or the pretrained annotators, please cite the paper:</p>
     <pre><code>@article{ali2024jql,
-  title={JQL: Judging Quality across Languages},
+  title={Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models},
   author={Ali, Mehdi and Brack, Manuel and Lübbering, Max and Wendt, Elias and Khan, Abbas Goher and Rutmann, Richard and Jude, Alex and Kraus, Maurice and Weber, Alexander Arno and Stollenwerk, Felix and Kaczér, David and Mai, Florian and Flek, Lucie and Sifa, Rafet and Flores-Herr, Nicolas and Köhler, Joachim and Schramowski, Patrick and Fromm, Michael and Kersting, Kristian},
-  journal={Conference or preprint archive},
-  year={2024}
+  journal={arXiv preprint arXiv:2505.22232},
+  year={2025}
 }</code></pre>
   </div>
 </section>
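
The threshold bullets changed in the second hunk refer to keeping only documents whose quality score clears a quantile cut-off (the removed line names the 0.6 and 0.7 quantiles). As a rough illustration of that trade-off between quality and quantity, here is a minimal Python sketch. It is not taken from the JQL code base: the function names, the toy scores, and the "lower" quantile convention are all assumptions made for this example.

```python
def quantile_threshold(scores, q):
    """Return the score at quantile q (0 <= q <= 1).

    Assumption: uses the "lower" convention, i.e. the element at index
    floor(q * (n - 1)) of the sorted scores, with no interpolation.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(scores)
    return ordered[int(q * (len(ordered) - 1))]


def filter_by_quantile(docs, scores, q):
    """Keep documents whose score is at or above the q-quantile threshold."""
    threshold = quantile_threshold(scores, q)
    return [doc for doc, score in zip(docs, scores) if score >= threshold]


# Hypothetical toy data: ten documents with made-up quality scores.
docs = [f"doc{i}" for i in range(10)]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

kept_06 = filter_by_quantile(docs, scores, 0.6)
kept_07 = filter_by_quantile(docs, scores, 0.7)
print(len(kept_06), len(kept_07))
```

Raising the quantile from 0.6 to 0.7 keeps fewer documents with higher scores, which is the quality-for-quantity trade-off the added bullet describes.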