Jędrzej Grabala

jgitsolutions

AI & ML interests

Local Drive Human Overseered System of Agents, LLMs, Langchains & other useful stuff on mid-to-low-end of commercial hardware.

Recent Activity

liked a Space 3 days ago
enzostvs/deepsite
liked a Space 5 days ago
enzostvs/qwensite
liked a Space 6 days ago
mteb/leaderboard
View all activity

Organizations

LangChain Agents Hub's profile picture LangChainDatasets's profile picture ZeroGPU Explorers's profile picture Dev Mode Explorers's profile picture

jgitsolutions's activity

reacted to fdaudens's post with 🔥 9 days ago
view post
Post
3036
Forget everything you know about transcription models - NVIDIA's parakeet-tdt-0.6b-v2 changed the game for me!

Just tested it with Steve Jobs' Stanford speech and was speechless (pun intended). The video isn’t sped up.

3 things that floored me:
- Transcription took just 10 seconds for a 15-min file
- Got a CSV with perfect timestamps, punctuation & capitalization
- Stunning accuracy (correctly captured "Reed College" and other specifics)

NVIDIA also released a demo where you can click any transcribed segment to play it instantly.

The improvement is significant: number 1 on the ASR Leaderboard, 6% error rate (best in class) with complete commercial freedom (cc-by-4.0 license).

Time to update those Whisper pipelines! H/t @Steveeeeeeen for the finding!

Model: nvidia/parakeet-tdt-0.6b-v2
Demo: nvidia/parakeet-tdt-0.6b-v2
ASR Leaderboard: hf-audio/open_asr_leaderboard
  • 1 reply
·
reacted to RiverZ's post with 👀 9 days ago
view post
Post
2498
🚀 Excited to Share Our Latest Work: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer~

🎨 Daily Paper:
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)


🔓 Code is now open source!
🔥 Huggingface DEMO:
RiverZ/ICEdit

🌐 Project Website: https://river-zhang.github.io/ICEdit-gh-pages/
🏠 GitHub Repository: https://github.com/River-Zhang/ICEdit/blob/main/scripts/gradio_demo.py
🤗 Huggingface:
sanaka87/ICEdit-MoE-LoRA

📄 arxiv Paper:
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)


🔥 Why it’s cool:
- Achieves high-quality, multi-task image editing.
- Uses only 1% of the training parameters and 0.1% of the training data compared to existing methods — extremely efficient
- Beats several commercial models on background preservation, ID control, and consistency
- Open-source, low-cost, faster, and stronger — think of it as the “DeepSeek of image editing” 👀

We also implemented a Gradio demo app, available directly in our GitHub repo! And we made a flashy demo video — happy to send it your way!
reacted to seawolf2357's post with 👀 about 1 month ago
view post
Post
6508
🔥 AgenticAI: The Ultimate Multimodal AI with 16 MBTI Girlfriend Personas! 🔥

Hello AI community! Today, our team is thrilled to introduce AgenticAI, an innovative open-source AI assistant that combines deep technical capabilities with uniquely personalized interaction. 💘

🛠️ MBTI 16 Types SPACES Collections link
seawolf2357/heartsync-mbti-67f793d752ef1fa542e16560

✨ 16 MBTI Girlfriend Personas

Complete MBTI Implementation: All 16 MBTI female personas modeled after iconic characters (Dana Scully, Lara Croft, etc.)
Persona Depth: Customize age groups and thinking patterns for hyper-personalized AI interactions
Personality Consistency: Each MBTI type demonstrates consistent problem-solving approaches, conversation patterns, and emotional expressions

🚀 Cutting-Edge Multimodal Capabilities

Integrated File Analysis: Deep analysis and cross-referencing of images, videos, CSV, PDF, and TXT files
Advanced Image Understanding: Interprets complex diagrams, mathematical equations, charts, and tables
Video Processing: Extracts key frames from videos and understands contextual meaning
Document RAG: Intelligent analysis and summarization of PDF/CSV/TXT files

💡 Deep Research & Knowledge Enhancement

Real-time Web Search: SerpHouse API integration for latest information retrieval and citation
Deep Reasoning Chains: Step-by-step inference process for solving complex problems
Academic Analysis: In-depth approach to mathematical problems, scientific questions, and data analysis
Structured Knowledge Generation: Systematic code, data analysis, and report creation

🖼️ Creative Generation Engine

FLUX Image Generation: Custom image creation reflecting the selected MBTI persona traits
Data Visualization: Automatic generation of code for visualizing complex datasets
Creative Writing: Story and scenario writing matching the selected persona's style

  • 1 reply
·
reacted to Yehor's post with 🔥 about 1 month ago