AI & ML interests

None defined yet.

Recent Activity

lewtun submitted a paper 13 days ago

Single-minus gluon tree amplitudes are nonzero

sergiopaniego new activity 14 days ago

trl-lib/documentation-images:Upload 2 files

lewtun submitted a paper 14 days ago

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

View all activity

sergiopaniego

posted an update about 10 hours ago

Post

234

What happens when you make an LLM drive a car where physics are real and actions can't be undone?

I ported CARLA, the autonomous driving simulator, to OpenEnv and added training support via TRL + Hugging Face Spaces.

The model interacts with the simulator through tool calls (observe, brake, change lane) and learns from a reward signal.

In 50 training steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians in emergency situations.

The project supports text and vision (VLMs can see through a camera sensor), open-world driving with traffic, and multiple driving scenarios.

This builds on the carla-env project by sinatras, which originally placed LLMs inside CARLA for evaluation. We extended it with vision, new scenarios, rubric-based rewards, and made it trainable end-to-end.

Blog: https://huggingface.misakanetworks.com/blog/sergiopaniego/bringing-carla-to-openenv-trl/
CARLA env in OpenEnv: https://github.com/meta-pytorch/OpenEnv/tree/main/envs/carla_env
Training script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/carla.py

albertvillanova

posted an update about 13 hours ago

Post

168

🚀 TRL v0.29.0 introduces trl-training: an agent-native training skill.

This makes the TRL CLI a structured, agent-readable capability, allowing AI agents to reliably execute training workflows such as:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Group Relative Policy Optimization (GRPO)

We’re excited to see what the community builds on top of this.

If you’re working on AI agents, alignment research, or scalable RL training infrastructure: give TRL v0.29.0 a try! 🤗

The future of ML tooling is agent-native.
🔗 https://github.com/huggingface/trl/releases/tag/v0.29.0

qgallouedec

posted an update 8 days ago

Post

2538

@CohereLabs just released 🌿 Tiny Aya: a fully open-source 3B parameter model that speaks 70+ languages 🌍! But there’s a catch:

Tiny Aya is just a language model. It doesn’t support tool calling, the key capability that turns frontier models into powerful *agents*.
So the real question is:

How hard is it to turn Tiny Aya into an agent?

Turns out… it’s simple, thanks to Hugging Face TRL.
We’re sharing a hands-on example showing how to train Tiny Aya to turn it into a tool-calling agent using TRL, unlocking what could become the first *massively multilingual open agent*.

Small model. Global reach. Agent capabilities.

👉 https://github.com/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb

1 reply

sergiopaniego

posted an update 8 days ago

Post

1367

Tiny Aya 🌿 just dropped from @CohereLabs , a really powerful multilingual small model!

To celebrate, we cooked up fresh resources to train it for tool calling 🔧

> Free Google Colab guide: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb
> Standalone training script: https://github.com/huggingface/trl/blob/main/examples/scripts/sft_tiny_aya_tool_calling.py

lewtun

submitted a paper to Daily Papers 13 days ago

Single-minus gluon tree amplitudes are nonzero

Paper • 2602.12176 • Published 14 days ago • 8

sergiopaniego

posted an update 14 days ago

Post

489

The latest piece by @MiniMax-AI is a must-read.

It tries to break the impossible triangle of agent RL: throughput × stability × flexibility.

A lot to learn here, go read it 🫵
https://huggingface.misakanetworks.com/blog/MiniMax-AI/forge-scalable-agent-rl-framework-and-algorithm

sergiopaniego

in trl-lib/documentation-images 14 days ago

Upload 2 files

#4 opened 14 days ago by

cmunley1

lewtun

submitted a paper to Daily Papers 14 days ago

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

Paper • 2602.03773 • Published 23 days ago • 10

albertvillanova

posted an update 15 days ago

Post

1661

5 years already working in democratizing AI 🤗
Grateful to be part of such an awesome team making it happen every day.

sergiopaniego

posted an update 18 days ago

Post

455

if you're looking for a good first issue to get your open-source journey started, you could contribute to this TRL issue by documenting one impactful paper in the docs

we have a broad list to cover!! 🧐

https://github.com/huggingface/trl/issues/4407

julien-c

submitted a paper to Daily Papers 27 days ago

Shaping capabilities with token-level data filtering

Paper • 2601.21571 • Published 29 days ago • 27

sergiopaniego

posted an update 29 days ago

Post

508

Meet the Post-Training Toolkit (PTT), which easily integrates with TRL via a single callback, by Aditya Challapally ( @microsoft ):

🔍 Detects training issues early
🛠 Lets you intervene safely
📊 Keeps long training runs stable, auditable & efficient

Microsoft blog: https://devblogs.microsoft.com/engineering-at-microsoft/diagnosing-instability-in-production-scale-agent-rl/

Integration guide: https://huggingface.misakanetworks.com/docs/trl/main/en/ptt_integration

Code: https://github.com/microsoft/post-training-toolkit

sergiopaniego

posted an update 29 days ago

Post

2580

New TRL + OpenEnv example! 💥

Fine tune an LLM for playing Sudoku using an RL env via OpenEnv

Includes a script that runs on 1 or multiple GPUs with vLLM, plus a Colab-ready notebook.

Enjoy!

Notebook: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/openenv_sudoku_grpo.ipynb

Script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/sudoku.py

1 reply

qgallouedec

updated a dataset 30 days ago

trl-lib/trackio-dataset

Viewer • Updated 1 minute ago • 3.83k • 20.5k

qgallouedec

updated a dataset about 1 month ago

trl-lib/documentation-images

Viewer • Updated 14 days ago • 11 • 57.5k

qgallouedec

updated a Space about 1 month ago

Trackio

🚀

Track and visualize data streams in real-time

qgallouedec

in trl-lib/trackio about 1 month ago

i'm really liking the GPU usage and perf tracking here

#1 opened 5 months ago by

Tonic

sergiopaniego

posted an update about 1 month ago

Post

2186

Date idea: read the entire Transformers v5.0.0 release notes

Officially stable now: https://github.com/huggingface/transformers/releases/tag/v5.0.0

1 reply

sergiopaniego

posted an update about 1 month ago

Post

1641

FunctionGemma Tuning Lab is a new no-code tool by @google that lets you fine-tune a model directly from the browser, with no coding knowledge required, using TRL behind the scenes.

blog: https://developers.googleblog.com/a-guide-to-fine-tuning-functiongemma/

try it out: google/functiongemma-tuning-lab

This example builds on a more advanced one for learning fine-tuning with SFT using TRL: https://ai.google.dev/gemma/docs/functiongemma/finetuning-with-functiongemma

1 reply

sergiopaniego

posted an update about 1 month ago

Post

853

TRL v0.27.0 is out!! 🥳

It includes GDPO, the latest variant of GRPO for multi-reward RL ✨
GDPO decouples reward normalization to avoid reward collapse and improve per-reward convergence — developed by
@sliuau @SimonX et al.

Explore the paper: GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization (2601.05242)

Explore the full set of changes here:
https://github.com/huggingface/trl/releases/tag/v0.27.0

AI & ML interests

Recent Activity

Team members 10

trl-lib's activity

Upload 2 files

Trackio

i'm really liking the GPU usage and perf tracking here