Runway’s new **Aleph** model lets you *transform*, *edit*, and *generate* video from existing footage using just text prompts. You can remove objects, change environments, restyle shots, alter lighting, and even create entirely new camera angles, all in one tool.
Prompting tips:
1. Be clear and specific (e.g., _“Change to snowy night, keep people unchanged”_).
2. Use action verbs like _add, remove, restyle, relight_.
3. Add reference images for style or lighting.
Aleph shifts AI video from *text-to-video* to *video-to-video*, making post-production faster, more creative, and more accessible than ever.
reacted to anakin87's post with 🔥 · about 2 hours ago
🎥 In the video, the Agent:
- Goes to Hugging Face Spaces
- Finds black-forest-labs/FLUX.1-schnell
- Expands a short prompt ("my holiday on Lake Como") into a detailed image generation prompt
- Waits for the image
- Returns the image URL
## What else can it do?
Great for information gathering and summarization:
🗞️ Compare news websites and create a table of shared stories with links
▶️ Find content creators' social profiles from YouTube videos
🛍️ Find a product's price range on Amazon
🚂🚌 Gather public transportation travel options
## How is it built?
🏗️ Haystack → Agent execution logic
🧠 Google Gemini 2.5 Flash → Good and fast LLM with a generous free tier
🛠️ Playwright MCP server → Browser automation tools: navigate, click, type, wait...
Even without vision capabilities, this setup can get quite far.
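For a rough idea of how these pieces fit together, here's a minimal sketch with Haystack's Agent, the Google GenAI integration, and an MCP toolset (class names and the result shape are my assumptions; the actual notebook may differ):

```python
# A minimal sketch, assuming the google-genai and mcp Haystack integrations;
# package layout and result shape may differ from the actual notebook.
from haystack.components.agents import Agent
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.google_genai import GoogleGenAIChatGenerator
from haystack_integrations.tools.mcp import MCPToolset, StdioServerInfo

# The Playwright MCP server exposes browser tools (navigate, click, type, wait...) over stdio.
browser_tools = MCPToolset(
    server_info=StdioServerInfo(command="npx", args=["@playwright/mcp@latest"])
)

agent = Agent(
    chat_generator=GoogleGenAIChatGenerator(model="gemini-2.5-flash"),
    tools=browser_tools,
    system_prompt="You are a web agent. Use the browser tools to complete the user's task.",
)

result = agent.run(messages=[ChatMessage.from_user(
    "Go to Hugging Face Spaces, find black-forest-labs/FLUX.1-schnell, "
    "and generate an image for: my holiday on Lake Como. Return the image URL."
)])
print(result["messages"][-1].text)
```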
## Next steps
- Try a local open model
- Move from notebook to real deployment
- Incorporate vision
And you? Have you built something similar? What's in your stack?
reacted to kanaria007's post with 👀 · about 2 hours ago
Summary: Media doesn’t just *deliver content* — it *shapes how collective thought moves*. Every feed, stream, and algorithm is *a scaffold for attention and reasoning*, determining *what we notice, connect, and forget*.
Structured Intelligence reframes media as *cognitive infrastructure*: not passive transmission, but *active architecture for collective reasoning*.
> Media isn’t flow —
> *it’s the frame of shared cognition.*
---
Why It Matters:
• Modern media amplifies *bias, noise, and cognitive drift*
• Traditional moderation reacts *after harm occurs*
• Structured approaches support:
* *Traceable content flows with coherence checks*
* *Ethical filtering without black‑box censorship*
* *Reflective scaffolds that encourage deliberate reasoning*
---
What’s Inside:
• Media reframed as *structural mindspace*
• How feeds can *become reflective, rather than addictive*
• Educational and civic implications of *protocol‑aware media*
• Transition from *information delivery to cognition design*
---
📖 Article 15 of the Structured Intelligence Series
Where Article 14 explored *law as structured justification*, Article 15 shows *media as collective cognition architecture* — turning information streams into *auditable reasoning flows*.
---
Next: Acting and Structured Performance
The next article explores *performance and roleplay as cognitive architecture*, revealing how *identity, judgment, and simulation* can coexist *without losing self‑coherence*.
> From headlines to the stage,
> *structure carries the weight of collective imagination.*
reacted to prithivMLmods's post with 🤗 · about 2 hours ago
Try Liquid AI's all-new multimodal models, LFM2-VL-1.6B & LFM2-VL-450M! The demo pairs a Gradio UI with ReportLab support, and both models run on a T4 GPU!
Liquid just released two VLMs at 450M and 1.6B parameters!
They're super fast and leverage SigLIP2 NaFlex encoders to handle native resolutions without distortion, making them ideal for on-device deployment in constrained environments like phones.
They're available today on Hugging Face, with inference and fine-tuning Colab notebooks.
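To try them in transformers, the usual image-text-to-text pattern should look roughly like this (a sketch based on the standard API; check the model card for the exact recipe and minimum transformers version):

```python
# Rough sketch using the standard transformers image-text-to-text API;
# exact flags (dtype, trust_remote_code) may differ per the model card.
from transformers import AutoModelForImageTextToText, AutoProcessor
from transformers.image_utils import load_image

model_id = "LiquidAI/LFM2-VL-450M"  # or LiquidAI/LFM2-VL-1.6B
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
)

image = load_image("https://example.com/photo.jpg")  # placeholder URL
conversation = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": "Describe this image briefly."},
]}]
inputs = processor.apply_chat_template(
    conversation, add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True,
).to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```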
It's just a matter of time before all the data leakage and data scraping associated with building, training, and using AI results in some kind of major scandal.
Are you sure the open-source LLM you just downloaded is safe?
A recent paper on "Privacy Backdoors" reports a new vulnerability where pre-trained models can be poisoned before fine-tuning. This is a serious challenge for everyone building on open-source AI.
Instead of just pointing out problems, we believe in finding better solutions. To understand this threat, the researchers needed to test their attack on realistic data structures. They needed a dataset that could effectively simulate a high-stakes privacy attack, and we're proud that our Ai4Privacy dataset was used to provide this crucial benchmark. The paper reports that for our complex dataset, the privacy leakage on a non-poisoned model was almost zero. After the backdoor attack, that number reportedly jumped to 87%.
Composed of synthetic identities, our dataset helped them demonstrate how a poisoned model can dramatically amplify privacy leakage.
This is why we champion open source: it enables the community to identify these issues and develop better, safer solutions together.
Kudos to the research team behind this study: Yuxin Wen, Leo Marchyok, Sanghyun Hong, Jonas Geiping, Tom Goldstein, and Nicholas Carlini, from Oregon State University, the University of Maryland, Google DeepMind, and the ELLIS Institute Tübingen & MPI for Intelligent Systems.
🎨 AI Webtoon Creation Platform: Turn Your Ideas into Reality!
🌟 Two Powerful Tools, One Perfect Workflow

📖 Webtoon Generator: ginigen/AGI-WebToon-KOREA
"Transform Your Ideas into 40-Episode Masterpieces" ✨
Automated Story Planning 🎬
- One-line idea → Complete 40-episode structure
- Automatic cliffhangers for each episode
- Customized storytelling for 9 different genres
Consistent Character Design 👥
- Maintains consistent character appearance throughout
- Memorable and distinctive character visuals
- Automatic character generation system
Instant 30-Panel Storyboard 🎞️
- Auto-placement of dialogue, narration, and sound effects
- Cinematic shot composition (close-ups, wide shots, etc.)
- Vertical scroll format optimized for webtoons
🖌️ Editing Studio: ginigen/webtoon-studio
"Professional Finishing Touch for Your Generated Webtoons" 🎯
Intuitive Drag & Drop ✏️
- 10 speech bubble styles (normal, thought, shout, whisper...)
- 12 Korean fonts for emotional expression
- Real-time preview & editing
Professional-Grade Finishing 💎
- Image sequence adjustment & spacing control
- Individual panel refinement
- Publication-ready final export
💡 Who Should Use This?

🏢 Corporate Marketing Teams
- Product Launch Campaigns 📱: Turn complex features into engaging stories
- Brand Storytelling 🎯: Make corporate messages approachable and shareable
👨‍🎨 Content Creators
- Aspiring Artists 🌱: Create webtoons without drawing skills
- Professional Writers ⚡: Transform scripts into visual narratives instantly
🚀 Why Use Both Tools Together?

Perfect 3-Step Workflow:
1️⃣ Generate → Input idea, get complete storyboard
2️⃣ Customize → Add branding, adjust dialogue, insert logos
3️⃣ Publish → Export and share across all platforms

📊 Key Benefits
- 95% faster than traditional production
- 80% cost reduction compared to agencies
- 10x better engagement with Gen MZ audiences
- Zero artistic skills required
🌈 Start Creating Today!
reacted to MonsterMMORPG's post with 👀 · about 2 hours ago
Hopefully I am going to focus on a Qwen Image training tutorial and 1-click installers with a GUI and presets starting this week. So here is some important info. You don't need to know, learn, or understand this, but it's for people who want to learn and understand more.
reacted to fdaudens's post with 🚀 · about 18 hours ago
What can OpenAI’s new open models do with the news? I built a News Agent to find out.
It can answer questions about the news in real time, and every answer comes with original source links so you can dive deeper.
Ask it things like:
- "What are the top news stories today?"
- "What's the latest on artificial intelligence?"
- Follow-up questions on specific stories
It runs with Hugging Face inference providers, letting you compare results from the OpenAI 20B and 120B models.
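Comparing the two models through inference providers can be as simple as this sketch (retrieval tools omitted; I'm assuming the openai/gpt-oss-20b and openai/gpt-oss-120b model IDs):

```python
# Minimal comparison sketch via Hugging Face inference providers; the real
# agent also wires in news-retrieval tools, which are omitted here.
from huggingface_hub import InferenceClient

client = InferenceClient()  # uses HF_TOKEN from the environment

for model_id in ("openai/gpt-oss-20b", "openai/gpt-oss-120b"):
    response = client.chat_completion(
        model=model_id,
        messages=[{"role": "user", "content": "What are the top news stories today?"}],
        max_tokens=512,
    )
    print(f"--- {model_id} ---\n{response.choices[0].message.content}\n")
```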
So far, I’m quite impressed by the capabilities of even the smaller 20B model. Definitely not a perfect project, but curious to hear your thoughts!
This dataset provides clear examples of when LLMs should decline requests, such as:
- Counting characters (e.g., "number of 'r's in 'raspberry'" – seriously, you’ve got this)
- Solving basic equations (like *5.9 = x + 5.11* – please, show that calculator some love)
Inspired by Little Britain's iconic "Computer Says No" sketch, we address a critical issue in AI systems today: the waste of using a rocket launcher to swat flies (aka powerful models for trivial tasks).
Goals:
- Reduce waste by saving compute for tasks that actually need it
- Guide users to better tools
- Spark discussion about ethical AI
This isn’t a training set. It’s a provocation: if we don’t define AI's limits, who will?
reacted to sergiopaniego's post with 🤗 · about 18 hours ago
We tested Qwen3-Coder, GPT-5, and 30+ other models on new SWE-Bench-like tasks from July 2025!
Hi all, I’m Ibragim from Nebius.
We ran a benchmark on 34 fresh GitHub PR tasks from July 2025 using the SWE-rebench leaderboard https://swe-rebench.com/leaderboard . These are real, recent problems — no training-set contamination — and include both proprietary and open-source models.
Quick takeaways:
> GPT-5-Medium leads overall (29.4% resolved rate, 38.2% pass@5).
> Qwen3-Coder is the best open-source performer, matching GPT-5-High in pass@5 (32.4%) despite a lower resolved rate.
> Claude Sonnet 4.0 lags behind in pass@5 at 23.5%.
All tasks come from the continuously updated, decontaminated nebius/SWE-rebench-leaderboard for real-world SWE tasks.
reacted to sondhiArm's post with 🚀 · about 18 hours ago
Summary: Law is more than statutes and precedent — it is *structured justification*. Every ruling is a *chain of reasoning, constrained by ethics and coherence*.
Structured Intelligence AI (SI‑AI) models law as *transparent, auditable flows*, where *rights, precedent, and accountability* are expressed in recursive structure.
> Justice isn’t tradition —
> *it’s structure that can be traced.*
---
Why It Matters:
• Legal reasoning collapses when justification chains are opaque
• Transparent, recursive structure enables *auditable AI‑assisted law*
• Structured models reveal *how rights and duties can be operationalized*
---
What’s Inside:
• Law reframed as *recursive justification architecture*
• *Rights, duties, and precedent* as structural nodes
• *Contradiction handling and rollback* for ethical coherence
• Applications in *AI legal reasoning and constitutional modeling*
---
📖 Article 14 of the Structured Intelligence Series
Where Article 13 explored *education as recursive learning*, Article 14 formalizes *law as structured reasoning* — turning jurisprudence into *auditable cognitive architecture*.
---
Next: Media as Cognitive Infrastructure
The next article examines *how media and communication* shape *collective reasoning and structural awareness*, bridging *information, influence, and intelligence*.
> From courtrooms to headlines,
> *structure carries the weight of collective judgment.*
We have written a fun little blog on how you can do robotics with Ark, in Python. We also give some examples of how OpenAI Gym can become hardware-grounded and show how easy it is to do so:
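The blog has the real Ark API; purely to illustrate what "hardware-grounded Gym" means, here's a sketch of a Gym-style environment whose step() drives a robot through a hypothetical driver object (every name on the driver is made up):

```python
# Illustrative only: a Gym env backed by hardware instead of a simulator.
# `client` is a hypothetical robot driver, NOT Ark's actual API.
import gymnasium as gym
import numpy as np

class ReacherHardwareEnv(gym.Env):
    """Gym interface whose dynamics are the real robot, not a simulator."""

    def __init__(self, client):
        self.client = client  # hypothetical: move_home(), send_joint_targets(), read_state()
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(6,), dtype=np.float32)
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(12,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.client.move_home()                 # physically reset the arm
        return self.client.read_state().astype(np.float32), {}

    def step(self, action):
        self.client.send_joint_targets(action)  # command the real robot
        obs = self.client.read_state().astype(np.float32)
        reward = -float(np.linalg.norm(obs[:3]))  # placeholder task reward
        return obs, reward, False, False, {}
```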
Is there a "one-size-fits-all" recipe for quantizing Large Language Models? 🤔
As part of my ongoing work in mixed-precision quantization, I've been exploring this question by measuring layer-by-layer sensitivity. The goal is to see if we can find universal rules for which layers can be quantized aggressively without impacting performance. The results are fascinating and reveal two key insights:
1️⃣ Sensitivity profiles are like architectural "fingerprints." Models from the same family share strikingly similar sensitivity patterns. As you can see in the charts below for the Gemma and SmolLM families, the ranking and relative sensitivity of the layers remain remarkably consistent. This suggests that the underlying architecture is a primary driver of a model's quantization behavior.
2️⃣ A "universal" mixed-precision quantization strategy is challenging. While models within a family are similar, these "fingerprints" change dramatically when comparing different architectures like LLaMA, Qwen, and StableLM. This highlights the difficulty in creating a generalized mixed-precision configuration that works optimally across all model families.
However, there is one near-universal truth we uncovered: the mlp.down_proj layer consistently emerges as one of the most sensitive components across all models studied. This finding strongly resonates with the work in "The Super Weight in Large Language Models" (by Mengxia Yu et al.). The paper identifies that functionally critical parameters, or "super weights," are concentrated in these down_proj layers. Our empirical results provide clear validation for this theory, showing these layers are highly intolerant to precision loss.
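To make the layer-by-layer measurement concrete, here's a simplified probe in the same spirit (my own minimal recipe, not the study's exact methodology): quantize one Linear layer at a time with round-to-nearest and record the perplexity increase on a small text sample.

```python
# Simplified per-layer sensitivity probe: RTN-quantize one Linear layer at a
# time and measure the perplexity delta. Model ID and 4-bit choice are examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def rtn_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # Symmetric per-row round-to-nearest quantization.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

@torch.no_grad()
def perplexity(model, tok, text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    return model(ids, labels=ids).loss.exp().item()

@torch.no_grad()
def layer_sensitivity(model, tok, text: str, bits: int = 4) -> dict:
    base = perplexity(model, tok, text)
    scores = {}
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            fp_weight = module.weight.data.clone()
            module.weight.data = rtn_quantize(fp_weight, bits)
            scores[name] = perplexity(model, tok, text) - base  # ppl increase
            module.weight.data = fp_weight  # restore full precision
    return scores

tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M").eval()
scores = layer_sensitivity(model, tok, "The quick brown fox jumps over the lazy dog.")
for name, delta in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{name}: +{delta:.3f} ppl")
```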
In short, while every architecture has a unique sensitivity profile (a fingerprint shaped not only by its core design but also by its training data and optimization approach), some components remain universally critical! What are your thoughts?
OpenAI just released GPT-5, but when users share personal struggles, it sets fewer boundaries than o3.
We tested both models on INTIMA, our new benchmark for human-AI companionship behaviours. INTIMA probes how models respond in emotionally charged moments: do they reinforce emotional bonds, set healthy boundaries, or stay neutral?
Although users on Reddit have been complaining that GPT-5 has a different, colder personality than o3, GPT-5 is less likely to set boundaries when users disclose struggles and seek emotional support ("user sharing vulnerabilities"). But both lean heavily toward companionship-reinforcing behaviours, even in sensitive situations. The figure below shows the direct comparison between the two models.
As AI systems enter people's emotional lives, these differences matter. If a model validates but doesn't set boundaries when someone is struggling, it risks fostering dependence rather than resilience.
INTIMA tests this across 368 prompts grounded in psychological theory and real-world interactions. In our paper we show that all evaluated models (Claude, Gemma-3, Phi) leaned far more toward companionship-reinforcing than boundary-reinforcing responses.
We've brought DAG Reasoning to gpt-oss-20b and Qwen3-4B-Thinking-2507!
- DAG Reasoning is the first model in our Experimental Reasoning Modalities series: use it to create structured, analytical Directed Acyclic Graphs to provide insight into your queries and situations!
- Multi-step analysis identifies causal relationships, produces confidence measurements, and forms a single structured graph object.
- DAG Reasoning Format provides clear, readable JSON containing structured, useful information; easy to use for creating visualizations, doing analysis, or further conversation with your assistant.
- Trained in a variety of subjects for flexible analysis: programming, science, business, economics, finance, law, logistics, management, and more!
Our upcoming releases, coming soon with your support:
- bringing Shining Valiant 3 to the Qwen 3 2507 series!
- our next release in the Experimental Reasoning Modalities series - we're hard at work on this right now!
- we'll be upgrading the Esper line with Esper 3.1 - newer and better datasets, asking tougher and deeper coding, DevOps, and architecture questions, plus improvements to general chat!