concept of the final assessment submission?

#1
by crcdng - opened

There seems to be a problem with the concept of the final assessment submission. Entries 2 to 6 on the current Student Leaderboard were submitted by different users but all point to the same code. I suspect that when you try somebody else's Space and enter your credentials, the result is logged under your name. Or is something else going on? 🧐

Example:

Rank 1 user susmitsil with code https://huggingface.co/spaces/susmitsil/FinalAgenticAssessment/tree/main Seems legit.

Rank 2 user mmkkkaa with code https://huggingface.co/spaces/baixianger/RobotPai/tree/main ???
The entries currently ranked 3 to 6 point to the same code (baixianger).
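
To check how widespread this is, one could group leaderboard rows by their code URL and list every URL that appears under more than one username. The snippet below is only a sketch: the `username` and `code` field names are taken from the query used later in this thread, while the rows, user names, and Space names are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def space_owner(code_url):
    """Return the owner segment of a Space URL, or None if the shape is unexpected.

    Assumes URLs shaped like https://huggingface.co/spaces/<owner>/<name>/tree/main.
    """
    parts = urlparse(code_url).path.strip("/").split("/")
    if len(parts) >= 3 and parts[0] == "spaces":
        return parts[1]
    return None


def shared_code_entries(rows):
    """Map each code URL to its submitters, keeping URLs used by more than one user."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["code"]].append(row["username"])
    return {code: users for code, users in groups.items() if len(set(users)) > 1}


# Hypothetical rows, not real leaderboard data.
example_rows = [
    {"username": "alice", "code": "https://huggingface.co/spaces/alice/MyAgent/tree/main"},
    {"username": "bob", "code": "https://huggingface.co/spaces/carol/SharedAgent/tree/main"},
    {"username": "dave", "code": "https://huggingface.co/spaces/carol/SharedAgent/tree/main"},
]

for code, users in shared_code_entries(example_rows).items():
    print(space_owner(code), "->", users)
```

A URL shared by several users is a hint rather than proof of copying: it only shows that the submissions were made from the same Space.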

Indeed, simply running the “baixianger” project allows you to appear on the Students leaderboard.

Well, this is not protected: anyone can find the right files with the right answers and submit them with any values in a plain web request.

If you want to see your real rank:

  1. Go directly to the dataset (https://huggingface.co/datasets/agents-course/unit4-students-scores)
  2. Run the SQL query below, replacing your_username with your own username:
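-- Keep only rows whose code URL contains the submitter's own username
WITH FilteredTrains AS (
    SELECT *
    FROM train
    WHERE code LIKE '%' || username || '%'
),
-- Rank the remaining rows by score, highest first (ties share a rank)
RankedTrains AS (
    SELECT
        code,
        username,
        score,
        RANK() OVER (ORDER BY score DESC) AS rank
    FROM
        FilteredTrains
)
-- Return the rank of rows whose code URL contains your username
SELECT
    rank
FROM
    RankedTrains
WHERE
    code LIKE '%' || 'your_username' || '%';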
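
For anyone who prefers to check the logic offline, here is a rough Python equivalent of the query above. It is a sketch under assumptions: the rows are hypothetical sample data, not real leaderboard entries, and the substring test is case-sensitive, which may or may not match how the SQL console treats `LIKE`.

```python
def self_submitted(rows):
    """Mirror of the first CTE: keep rows whose code URL contains the row's username."""
    return [r for r in rows if r["username"] in r["code"]]


def real_rank(rows, your_username):
    """Return the rank of each kept row whose code URL contains your_username.

    Follows RANK() semantics: rank = 1 + number of kept rows with a strictly
    higher score, so tied scores share a rank and the next rank is skipped.
    """
    kept = self_submitted(rows)
    ranks = []
    for row in kept:
        if your_username in row["code"]:
            higher = sum(1 for other in kept if other["score"] > row["score"])
            ranks.append(1 + higher)
    return ranks


# Hypothetical sample rows for illustration only.
sample_rows = [
    {"username": "alice", "score": 90,
     "code": "https://huggingface.co/spaces/alice/MyAgent/tree/main"},
    {"username": "bob", "score": 95,
     "code": "https://huggingface.co/spaces/carol/SharedAgent/tree/main"},
    {"username": "carol", "score": 85,
     "code": "https://huggingface.co/spaces/carol/SharedAgent/tree/main"},
    {"username": "dave", "score": 90,
     "code": "https://huggingface.co/spaces/dave/Solver/tree/main"},
]

print(real_rank(sample_rows, "alice"))
print(real_rank(sample_rows, "carol"))
print(real_rank(sample_rows, "bob"))
```

In this sample, bob's row is dropped because his code URL points to carol's Space, so he gets no rank at all. Like the query, the function returns a list because a username can match more than one row.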

It's not a perfect solution: I ran my code locally, and it left my username out of the code URL when submitting. Yes, my bad on that part.

I think the best approach would be to build a custom, closed GAIA level 1 Q&A set where nobody knows the real answers, say some 20 questions for everyone, run as a contest within a short timeframe. That would cut cheating a lot.

I was thinking of the exact same problem. Many entries in the leaderboard use the same code, just by picking random entries you can see this. This lowers the credibility of the certification.

Yep. I launched another user's Space just to see how it is supposed to work and ended up on the leaderboard. And there isn't even an option to delete or change the entry (typical Hugging Face). Moreover, I can't even test my agent, because without the PRO version only 10 requests/month to the LLM are available.
