MultiTransformer (Multi🤖Transformers)

hesamation

posted an update 2 days ago

Post

2684

this book actually exists for free, “the little book of deep learning”. best to refresh your mind about DL basics:
> foundations of machine learning
> how models train
> common layers (dropout, pooling…)
> basic intro to LLMs
actually optimized for mobile.

Book: https://fleuret.org/public/lbdl.pdf

Nymbo

posted an update 4 days ago

Post

1095

Haven't seen this posted anywhere - Llama-3.3-8B-Instruct is available on the new Llama API. Is this a new model or did someone mislabel Llama-3.1-8B?

1 reply

·

daavoo

posted an update 4 days ago

Post

1426

Have you heard about the Agent2Agent Protocol (A2A)?

We have just released an option in https://github.com/mozilla-ai/any-agent to serve with A2A any of the supported agent frameworks (Agno, Google ADK, Langchain, LlamaIndex, OpenAI Agents SDK, smolagents and tinyagent)!

Check the docs https://mozilla-ai.github.io/any-agent/serving/

# google_expert.py
from any_agent import AgentConfig, AnyAgent
from any_agent.config import ServingConfig
from any_agent.tools import search_web

agent = AnyAgent.create(
    "google",
    AgentConfig(
        name="google_expert",
        model_id="gpt-4.1-nano",
        instructions="You must use the available tools to find an answer",
        description="An agent that can answer questions about the Google Agents Development Kit (ADK).",
        tools=[search_web]
    )
)

agent.serve(ServingConfig(port=5001))

daavoo

posted an update 11 days ago

Post

1357

We've just released a new version of https://github.com/mozilla-ai/any-agent , including a Python implementation of https://huggingface.co/blog/tiny-agents!

Give it a ⭐!

from any_agent import AnyAgent, AgentConfig
from any_agent.config import MCPStdioParams

agent = AnyAgent.create(
    "tinyagent",
    AgentConfig(
        model_id="gpt-4.1-nano",
        instructions="You must use the available tools to find an answer",
        tools=[
            MCPStdioParams(
                command="uvx",
                args=["duckduckgo-mcp-server"]
            )
        ]
    )
)

result = agent.run(
    "Which Agent Framework is the best??"
)
print(result.final_output)

Nymbo

posted an update 12 days ago

Post

1065

PSA for anyone using Nymbo/Nymbo_Theme or Nymbo/Nymbo_Theme_5 in a Gradio space ~

Both of these themes have been updated to fix some of the long-standing inconsistencies ever since the transition to Gradio v5. Textboxes are no longer bright green and in-line code is readable now! Both themes are now visually identical across versions.

If your space is already using one of these themes, you just need to restart your space to get the latest version. No code changes needed.

mkluczek

posted an update 13 days ago

Post

280

Expansion of Global and Dense Open Embeddings Dataset of Earth 🌍

We updated our previous embeddings release with three models MMEarth and DeCUR-S2, DeCUR-S1 of the Major TOM embeddings dataset, developed in collaboration with CloudFerro S.A. asterisk labs and Φ-lab, European Space Agency - ESA. Together with @mikonvergence , Jędrzej S. Bojanowski, we extend the open-access collection of open dataset of Copernicus embeddings built at global scale, providing dense coverage across the entire acquisition area of Sentinel-1 and Sentinel-2 sensors.

Total embedding resources after the update:
- 51 TB of AI-embeddings generated from processed Sentinel data,
- over 40 billion embedding vectors,
- processing of 147 TB of raw satellite data,
- analysis covering more than 15 million Sentinel-1 and Sentinel-2 scenes and more than 16 trillion pixels.

This project delivers open and free vectorized expansions of Major TOM datasets available on CREODIAS and Hugging Face, setting a new standard for embedding releases and enabling lightweight, scalable ingestion of Earth Observation (EO) data for countless applications.

Datasets:
Major-TOM/Core-S2L2A-MMEarth
Major-TOM/Core-S2L1C-DeCUR
Major-TOM/Core-S1RTC-DeCUR

#EarthObservation #AI #CloudFerro #asterisklabs #ESA

daavoo

posted an update 19 days ago

Post

1933

We have just released a new version of⭐https://github.com/mozilla-ai/any-agent ⭐exposing an API to be used in async contexts:

import asyncio
from any_agent import AgentConfig, AnyAgent, TracingConfig
from any_agent.tools import search_web

async def main():
    agent = await AnyAgent.create_async(
        "openai",
        AgentConfig(
            model_id="gpt-4.1-mini",
            instructions="You are the main agent. Use the other available agents to find an answer",
        ),
        managed_agents=[
            AgentConfig(
                name="search_web_agent",
                description="An agent that can search the web",
                model_id="gpt-4.1-nano",
                tools=[search_web]
            )
        ],
        tracing=TracingConfig()
    )

    await agent.run_async("Which Agent Framework is the best??")

if __name__ == "__main__":
    asyncio.run(main())

daavoo

posted an update 22 days ago

Post

2057

Another day, another release in
⭐https://github.com/mozilla-ai/any-agent ⭐

You can now use MCP (Model Context Protocol) tools via SSE (Server-Sent Events):

from any_agent import AgentConfig, AnyAgent
from any_agent.config import MCPSseParams

agent = AnyAgent.create(
    "smolagents",
    AgentConfig(
        model_id="gpt-4o-mini",
        tools=[
            MCPSseParams(
                url="http://localhost:8000/sse"
            ),
        ]
    )
)
agent.run("What do MCP and SSE mean?")

See SuperGateway for an easy way to turn a Stdio server into an SSE server.

hesamation

posted an update 23 days ago

Post

2953

The best researchers from DeepSeek, OpenAI, Microsoft, and ByteDance explored RL and Reasoning in LLMs,

Here's some of their key findings:

1/ RL can further improve distilled models. These models are essentially SFT fine-tuned with the data generated by larger models, and the SFT+RL combo does not disappoint.

This is verified in the DeepSeek-R1 paper.

2/ both GRPO and PPO algorithms suffer from length bias; they encourage longer responses. This can be tackled by introducing explicit rewards based on the length of the answer.

3/Most reasoning research is focused on code and math. But training models on logic puzzles improves them for mathematical tasks too.

This shows the RL reasoning is generalized beyond the specific domain knowledge.

Previous research also shows RL can be a great generalizer.

4/The reasoning might not be only induced by RL; it might already be hidden in the base models due to the pre-training and CoT data they were trained on.

So while RL does wake up the reasoning beast, maybe it's not the only solution (e.g. other methods such as distillation)

5/ back to the length bias; reasoning models tend to generate longer responses for wrong answers. RL might be the culprit.

RL favours longer answers when the reward is negative, to dilute the penalty per individual token and lower the loss.

This might explain the "aha" moments!

6/ OpenAI's competitive programming paper showed an interesting finding:

o3 can learn its own test-time strategies (like writing an inefficient but correct solution to verify the answer of an optimized solution)

RL helps LLMs develop their own reasoning & verification methods.
The recent article by @rasbt helped me a lot in getting a broad view of the recent research on reasoning models.

He also lists more influential papers on this topic, It's a must-read if you're interested.

check it out 👇
https://magazine.sebastianraschka.com/p/the-state-of-llm-reasoning-model-training

hesamation

posted an update 26 days ago

Post

2158

OpenAI just released a 34-page practical guide to building agents,

Here's 10 things it teaches us:

1➜ agents are different from workflows: they are complete autonomous systems that perform tasks on your behalf. many applications use LLMs for workflows, but this is not an agent.

2➜ use them for tricky stuff: complex decision making, dynamic rules, unstructured data

3➜ core recipe: each agent has three main components: Model (the brain), Tools, Instructions on how to behave

4➜ choose the right brain: set up evals to get a baseline performance, use a smart model to see what's possible, gradually downgrade the model for cost and speed

5➜ tools are key: choose well-defined and tested tools. an agent needs tools to retrieve data and context, and take actions.

6➜ instruction matters A LOT: be super clear telling the agent its goals, steps, and rules. Vague instructions = unpredictable agent. Be explicit.

7➜ start simple, then scale: often a single agent with several tools is ok. don't jump to complex multi-agent systems immediately.

8➜ if you use multi-agents: you can have a "manager" agent directing traffic to specialist agents, or have agents hand off tasks to each other.

9➜ gaurdrails are a MUST: check user input for weird stuff, make sure the agent isn't about to do something risky, filter out private info, block harmful content. Don't let it run wild.

10➜ build and plan for humans: start small, test, improve. always have a plan for when the agent gets stuck or is about to do something high-risk.

Download: https://t.co/fJaCkgf7ph

3 replies

·

daavoo

posted an update 27 days ago

Post

1303

New release in https://github.com/mozilla-ai/any-agent 🤖

You can now use "managed_agents" also in langchain and llama_index, in addition to the other frameworks:

from any_agent import AgentConfig, AgentFramework, AnyAgent
from any_agent.tracing import setup_tracing

framework = AgentFramework("langchain")  # also in AgentFramework("llama_index") and the rest of frameworks
setup_tracing(framework)

agent = AnyAgent.create(
    framework,
    AgentConfig(
        model_id="gpt-4.1-mini",
        instructions="You are the main agent. Use the other available agents to find an answer",
    ),
    managed_agents=[
        AgentConfig(
            name="search_web_agent",
            description="An agent that can search the web",
            model_id="gpt-4.1-nano",
            tools=["any_agent.tools.search_web"]
        ),
        AgentConfig(
            name="visit_webpage_agent",
            description="An agent that can visit webpages",
            model_id="gpt-4.1-nano",
            tools=["any_agent.tools.visit_webpage"]
        )
    ]
)
agent.run("Which Agent Framework is the best??")

2 replies

·

hesamation

posted an update 28 days ago

Post

2908

this paper has been blowing up
they train an open-source multimodal LLM (InternVL3) that can compete with GPT-4o and Claude 3.5 Sonnet by:
> training text and vision on a single stage
> a novel V2PE positional encoding
> SFT & mixed preference optimization
Paper: InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models (2504.10479)
> test-time scaling

daavoo

posted an update about 1 month ago

Post

2824

Wondering how the new Google Agent Development Toolkit (ADK) compares against other frameworks? 🤔You can try it in any-agent 🚀

https://github.com/mozilla-ai/any-agent

agent = AnyAgent.create(
    AgentFramework("google"),
    AgentConfig(
        model_id="gpt-4o-mini"
    )
)
agent.run("Which Agent Framework is the best??")

1 reply

·

hesamation

posted an update about 1 month ago

Post

8934

Google published a 69-page whitepaper on Prompt Engineering and its best practices, a must-read if you are using LLMs in production:
> zero-shot, one-shot, few-shot
> system prompting
> chain-of-thought (CoT)
> ReAct

LINK: https://www.kaggle.com/whitepaper-prompt-engineering
> code prompting
> best practices

daavoo

posted an update about 1 month ago

Post

1836

After working on agent evaluation🔍🤖 the last weeks, we started to accumulate code to make trying different agent frameworks easier. From that code, we have built and just released a small library called any-agent.

Give it a try and a ⭐: https://github.com/mozilla-ai/any-agent

from any_agent import AgentConfig, AgentFramework, AnyAgent

agent = AnyAgent.create(
    AgentFramework("smolagents"),  # or openai, langchain, llama_index
    AgentConfig(
        model_id="gpt-4o-mini"
    )
)
agent.run("Which Agent Framework is the best??")

AtAndDev

posted an update about 1 month ago

Post

2966

Llama 4 is out...

3 replies

·

hesamation

posted an update about 1 month ago

Post

2897

The best researchers from Yale, Stanford, Google DeepMind, and Microsoft laid out all we know about Agents in a 264-page paper [book],

Here are some of their key findings:

They build a mapping of different agent components, such as perception, memory, and world modelling, to different regions of the human brain and compare them:

- brain is much more energy-efficient
- no genuine experience in agents
- brain learns continuously, agent is static

An agent is broken down to:
- Perception: the agent's input mechanism. can be improved with multi-modality, feedback mechanisms (e.g., human corrections), etc.
- Cognition: learning, reasoning, planning, memory. LLMs are key in this part.
- Action: agent's output and tool use.

Agentic memory is represented as:
- Sensory memory or short-term holding of inputs which is not emphasized much in agents.
- Short-term memory which is the LLM context window
- Long-term memory which is the external storage such as RAG or knowledge graphs.

The memory in agents can be improved and researched in terms of:
- increasing the amount of stored information
- how to retrieve the most relevant info
- combining context-window memory with external memory
- deciding what to forget or update in memory

The agent must simulate or predict the future states of the environment for planning and decision-making.

ai world models are much simpler than the humans' with their causal reasoning (cause-and-effect) or physical intuition.

LLM world models are mostly implicit and embedded.

EMOTIONS are a deep aspect of humans, helping them with social interactions, decision-making, or learning.

Agents must understand emotions to better interact with us.

But rather than encoding the feeling of emotions, they have a surface-level modelling of emotions.

Perception is the process by which an agent receives and interprets raw data from its surroundings.

READ PAPER: Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems (2504.01990)

hesamation

posted an update about 1 month ago

Post

2720

What, How, Where, and How Well? This paper reviews test-time scaling methods and all you need to know about them:
> parallel, sequential, hybrid, internal scaling
> how to scale (SFT, RL, search, verification)
> metrics and evals of test-time scaling

🔗paper: What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models (2503.24235)

If you want to learn what inference-time compute scaling is @rasbt has a great blog post on that:
https://magazine.sebastianraschka.com/p/state-of-llm-reasoning-and-inference-scaling

hesamation

posted an update about 1 month ago

Post

1801

this paper lists ways to make reasoning LLMs more efficient:
> enforce token limits per reasoning step
> route tasks to different models (small/large)
> compress reasoning chains during SFT
> reward based on reasoning length
> parallel search at test-time
and more...
@Xiaoye08 @yaful @Warrieryes
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond (2503.21614)

daavoo

posted an update about 2 months ago

Post

2018

🤖 🗺Mapped all(?) the swimming pools ️🏊 around another town with https://github.com/mozilla-ai/osm-ai-helper.

This time, I have mapped and contributed to https://www.openstreetmap.org more than 100 swimming pools around my wife's hometown. Only took about 20min to find them all (+~3 min verification) in a free Colab GPU🚀

Try it yourself around a single point: mozilla-ai/osm-ai-helper

Multi🤖Transformers

AI & ML interests

MultiTransformer's activity

AI & ML interests

Team members 101

MultiTransformer's activity