Article: Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • by ariG23498 and 3 others • Mar 12 • 447
Post: The new Qwen2-VL models seem to perform quite well at object detection. You can prompt them to respond with bounding boxes in a 1000 x 1000 pixel reference frame and scale those boxes back to the original image size. You can try it out with my Space maxiw/Qwen2-VL-Detection • 6 replies • 👍 14 • 👀 5 • 🤗 1
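For reference, here is a minimal sketch of the rescaling step described in the post, not code taken from the Space itself. It assumes the model returns boxes as [x_min, y_min, x_max, y_max] in the 1000 x 1000 reference frame; the exact output format depends on your prompt, so check maxiw/Qwen2-VL-Detection for a working prompt.

```python
# Minimal sketch (assumption: boxes come back as [x_min, y_min, x_max, y_max]
# in a 1000 x 1000 reference frame, as described in the post above).

def scale_boxes(boxes, image_width, image_height, ref_size=1000):
    """Map boxes from a ref_size x ref_size frame to original image pixel coordinates."""
    scaled = []
    for x_min, y_min, x_max, y_max in boxes:
        scaled.append([
            x_min / ref_size * image_width,
            y_min / ref_size * image_height,
            x_max / ref_size * image_width,
            y_max / ref_size * image_height,
        ])
    return scaled

# Example: one detected box for a 1920 x 1080 image.
print(scale_boxes([[250, 100, 750, 900]], 1920, 1080))
# [[480.0, 108.0, 1440.0, 972.0]]
```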
Article: Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub • by nvidia and 11 others • Jun 27 • 27
Article: Vision Language Models (Better, Faster, Stronger) • by merve and 4 others • May 12 • 492
Article: Gemma 3n fully available in the open-source ecosystem! • by ariG23498 and 7 others • Jun 26 • 113
Article: Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth • by mlabonne • Jul 29, 2024 • 352
Space: MMLU-Pro Leaderboard 🥇 • More advanced and challenging multi-task evaluation • 216
Collection: Describe Anything • Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated 13 days ago • 54
Paper: SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion • 2503.11576 • Published Mar 14 • 114