
NMN

PATHANB

AI & ML interests

None yet

Recent Activity

new activity 8 days ago
huggingface/InferenceSupport:rajesh baliarsingh
replied to ZennyKenny's post 10 days ago
When I heard the Reasoning Dataset Competition deadline was extended to 9 May, I knew I had time to get in one more entry. 🔥🔥🔥 With the rise of Vibe Coding, and the potential risks that are introduced by humans letting LLMs build their apps for them, lots of people are (rightfully) concerned about the safety of the code that is hitting prod. In response to that, I'm happy to present my final submission to the Reasoning Dataset Competition and attempt to start benchmarking the ability of LLMs to identify unsafe and / or exploitable code by way of the CoSa (Code Safety) benchmark: https://huggingface.co/datasets/ZennyKenny/cosa-benchmark-dataset Currently a curated set of 200 examples, calibrated on OpenAI's standard issue models (GPT-4.1, o4 mini, and GPT-3.5 Turbo) as "baseline performance" (70% decile). Check it out and drop a ❤️ if you think it could be useful or hit the Community section with suggestions / critiques.
updated a model over 1 year ago
PATHANB/CHENEDUN

Organizations

CHENEDUN

spaces 1

No application file

IAS CHENEDUN

🌖

Jan 7, 2024

models 1

PATHANB/CHENEDUN

Updated Jan 7, 2024

datasets 0

None public yet