concept of the final assessment submission?
There seems to be a problem with the concept of the final assessment submission. Current Student Leaderboard entries 2 - 6 are submitted by different users but all point to the same code. I suspect that when you try somebody else's Space and enter your credentials, the result is logged under your name. Or is something else going on? 🧐
Example:
Rank 1 user susmitsil with code https://huggingface.co/spaces/susmitsil/FinalAgenticAssessment/tree/main Seems legit.
Rank 2 user mmkkkaa with code https://huggingface.co/spaces/baixianger/RobotPai/tree/main ???
Ranks 3 to 6 currently point to the same code (baixianger).
Indeed, simply running the “baixianger” project allows you to appear on the Students leaderboard.
Well, this is not protected: anyone can find the right files with the right answers and submit them with any variables as a web request.
If you want to see your real rank:
- Go directly to the dataset (https://huggingface.co/datasets/agents-course/unit4-students-scores)
- Run the SQL query below, replacing your_username with your own username:
WITH FilteredTrains AS (
SELECT *
FROM train
WHERE code LIKE '%' || username || '%'
),
RankedTrains AS (
SELECT
code,
username,
score,
RANK() OVER (ORDER BY score DESC) AS rank
FROM
FilteredTrains
)
SELECT
rank
FROM
RankedTrains
WHERE
code LIKE '%' || 'your_username' || '%';
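For anyone who prefers to check the logic offline, here is a minimal Python sketch of what the query above does: keep only rows whose code URL contains the submitter's own username, then rank by score. The rows are made-up toy data, the column names (username, code, score) are taken from the query, and the matching is a plain case-sensitive substring test, which is an assumption about how LIKE behaves on the dataset viewer.

```python
from typing import Dict, List, Optional

# Toy rows (hypothetical, not real leaderboard data).
rows: List[Dict] = [
    {"username": "alice", "code": "https://huggingface.co/spaces/alice/agent/tree/main", "score": 60},
    {"username": "bob", "code": "https://huggingface.co/spaces/alice/agent/tree/main", "score": 90},
    {"username": "carol", "code": "https://huggingface.co/spaces/carol/agent/tree/main", "score": 80},
]

def real_rank(rows: List[Dict], your_username: str) -> Optional[int]:
    """Best rank of your_username among submissions that point to the submitter's own code."""
    # Same idea as FilteredTrains: the code URL must contain the submitter's username.
    own = [r for r in rows if r["username"] in r["code"]]
    # Same idea as RANK() OVER (ORDER BY score DESC):
    # rank = 1 + number of rows with a strictly higher score.
    ranks = [
        1 + sum(1 for other in own if other["score"] > r["score"])
        for r in own
        if your_username in r["code"]
    ]
    return min(ranks) if ranks else None

print(real_rank(rows, "alice"))  # bob's entry is filtered out because it points to alice's code
```

If several of your submissions survive the filter, the SQL query returns one rank per row; this sketch only reports the best one.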
It's not a perfect solution: I ran my code locally, and my submission left my username out of the code URL. Yes, my bad on that part.
I think the best approach would be to build a custom, closed GAIA level 1 Q&A set where nobody knows the real answers, say some 20 questions for everyone, run as a contest in a short timeframe. That would cut cheating a lot.
I was thinking of the exact same problem. Many entries in the leaderboard use the same code, just by picking random entries you can see this. This lowers the credibility of the certification.
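Instead of picking random entries, the duplicates can be listed directly. A rough Python sketch, assuming the leaderboard rows have been exported as dicts with username and code fields (the rows below are invented toy data):

```python
from collections import defaultdict
from typing import Dict, List

# Toy rows (hypothetical, not real leaderboard data).
entries: List[Dict] = [
    {"username": "alice", "code": "https://huggingface.co/spaces/alice/agent/tree/main"},
    {"username": "bob", "code": "https://huggingface.co/spaces/alice/agent/tree/main"},
    {"username": "carol", "code": "https://huggingface.co/spaces/carol/agent/tree/main"},
]

def shared_code(entries: List[Dict]) -> Dict[str, List[str]]:
    """Map each code URL submitted under more than one username to those usernames."""
    users_by_code = defaultdict(set)
    for entry in entries:
        users_by_code[entry["code"]].add(entry["username"])
    # Keep only code URLs that appear under at least two different usernames.
    return {
        code: sorted(users)
        for code, users in users_by_code.items()
        if len(users) > 1
    }

for code, users in shared_code(entries).items():
    print(code, users)
```

A shared URL does not prove cheating on its own (someone may have just tried another person's Space, as described in this thread), but it shows which entries are worth a second look.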
Yep. I launched someone else's Space just to see how it is supposed to work and ended up on the leaderboard. And there isn't even an option to delete or change the entry (typical Hugging Face). Moreover, I can't even test my agent because without the PRO version only 10 LLM requests per month are available.