Recent Activity
boroll2347
posted an
update 10 days ago
Post
3034
Muon vs MuonClip vs Muon+AdamW
Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.
Short story: Pure Muon converged fastest at the start, but its gradient‑norm spikes made training unstable. MuonClip (Kimi K2’s clipping) stabilizes long pretraining runs, yet in our small‑scale fine‑tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance, and even beat vanilla AdamW.
Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.
Next Step: scale to larger models/datasets to see if Muon’s spikes become catastrophic or if clipping wins out.
Full Blog Link: https://huggingface.misakanetworks.com/blog/KingNish/optimizer-part1
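For readers wondering what "Muon for 2D layers + AdamW for 1D layers" looks like in practice, here is a minimal sketch of the parameter split only, not the setup used in the blog. It routes parameters by tensor rank. The function name, the example parameter names and shapes, and the `adamw_name_hints` option (which also sends embeddings and the output head to AdamW) are all assumptions for illustration; the post itself only states the 2D/1D rule.

```python
from typing import Dict, List, Tuple


def split_params_for_hybrid(
    shapes: Dict[str, Tuple[int, ...]],
    adamw_name_hints: Tuple[str, ...] = ("embed", "lm_head"),
) -> Tuple[List[str], List[str]]:
    """Partition parameter names into (muon, adamw) groups.

    Rule from the post: 2D tensors go to Muon, everything else to AdamW.
    Assumption (not from the post): names containing any string in
    `adamw_name_hints` are sent to AdamW even when they are 2D.
    Pass an empty tuple to apply the pure rank-based rule.
    """
    muon: List[str] = []
    adamw: List[str] = []
    for name, shape in shapes.items():
        is_matrix = len(shape) == 2
        hinted = any(hint in name for hint in adamw_name_hints)
        if is_matrix and not hinted:
            muon.append(name)
        else:
            adamw.append(name)
    return muon, adamw


# Hypothetical parameter names and shapes, for illustration only.
example_shapes = {
    "model.embed_tokens.weight": (1000, 64),
    "model.layers.0.self_attn.q_proj.weight": (64, 64),
    "model.layers.0.mlp.up_proj.weight": (256, 64),
    "model.layers.0.input_layernorm.weight": (64,),
    "model.layers.0.self_attn.q_proj.bias": (64,),
    "lm_head.weight": (1000, 64),
}

muon_names, adamw_names = split_params_for_hybrid(example_shapes)
print("Muon: ", muon_names)
print("AdamW:", adamw_names)
```

With a PyTorch model, the `shapes` mapping could be built from `{n: tuple(p.shape) for n, p in model.named_parameters()}`, and each resulting group handed to its own optimizer instance.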
Post
2663
I tested Muon vs MuonClip vs Muon+AdamW for fine-tuning LLMs
Just published a blog post on that. Read it here 👉 https://huggingface.misakanetworks.com/blog/KingNish/optimizer-part1
mrfakename
posted an
update 3 months ago
Post
15996
Excited to share that I've joined the Hugging Face Fellows program! 🤗
Looking forward to contributing to & working more closely with the open-source ecosystem - huge thanks to everyone who's supported me on this journey! 🚀
Post
1112
The #1 trending AI/ML dataset today 🏆
Massive scale, diversity and end-to-end potential from NVIDIA!
nvidia/PhysicalAI-Autonomous-Vehicles
Post
742
The new King 👑 has arrived!
Moonshot AI now has the top model on Hugging Face 🔥
moonshotai/Kimi-K2-Thinking
Post
2823
💸🤑You don’t need 100 GPUs to train something amazing!
Our Smol Training Playbook teaches you a better path to world-class LLMs, for free!
Check out the #1 trending space on 🤗 :
HuggingFaceTB/smol-training-playbook
mrfakename
posted an
update 4 months ago
Post
6267
Trained a model for emotion-controllable TTS based on MiMo audio on LAION's dataset.
Still very early, and it does have an issue with hallucinating, but results seem pretty good so far given how early it is in the training run.
Will probably kick off a new run later with some settings tweaked.
Put up a demo here: https://huggingface.misakanetworks.com/spaces/mrfakename/EmoAct-MiMo
(Turn 🔊 on to hear audio samples)
Post
2323
Cool stuff these past weeks on Hugging Face! 🤗 🚀
• 📈Trackio, local-first W&B alternative
https://github.com/gradio-app/trackio/issues
• 🌍EmbeddingGemma, 300M-param, multilingual embeddings, on-device
https://huggingface.misakanetworks.com/blog/embeddinggemma
• 💻Open LLMs in VS Code (Inference Providers)
https://x.com/reach_vb/status/1966185427582497171
• 🤖Smol2Operator GUI agents
https://huggingface.misakanetworks.com/blog/smol2operator
• 🖼️Gradio visible watermarking
https://huggingface.misakanetworks.com/blog/watermarking-with-gradio
jeffboudier
posted an
update 6 months ago
Post
3194
Quick 30s demo of the new Hub > Azure AI integration to deploy HF models in your own Azure account. Now with Py and CLI!
GG @alvarobartt @kramp @pagezyhf
1024m
authored 2 papers 7 months ago
Post
2215
Wan 2.2 Fast: up to 10x faster than the original Wan 2.2
Model: FastVideo/FastWan2.2-TI2V-5B-FullAttn-Diffusers
Space: KingNish/wan2-2-fast
jeffboudier
posted an
update 8 months ago
Post
574
AMD summer hackathons are here!
A chance to get hands-on with MI300X GPUs and accelerate models.
🇫🇷 Paris - Station F - July 5-6
🇮🇳 Mumbai - July 12-13
🇮🇳 Bengaluru - July 19-20
Hugging Face and GPU Mode will be on site, and on July 6 in Paris @ror will share lessons learned while building new kernels to accelerate Llama 3.1 405B on ROCm.
Register for the Paris event: https://lu.ma/fmvdjmur?tk=KeAbiP
All dates: https://lu.ma/calendar/cal-3sxhD5FdxWsMDIz
jeffboudier
posted an
update 9 months ago
Post
1742
Today we launched Training Cluster as a Service, to make the new DGX Cloud Lepton supercloud easily accessible to AI researchers.
Hugging Face will collaborate with NVIDIA to provision and set up GPU training clusters to make them available for the duration of training runs.
Hugging Face organizations can sign up here: https://huggingface.misakanetworks.com/training-cluster
sumuks
authored a paper 9 months ago
jeffboudier
posted an
update 9 months ago
Post
2479
👏 Congrats @jinanz on adding TimesFM time series forecasting to Transformers!
Learn how to use TimesFM in this blog post by the Nutanix team: https://huggingface.misakanetworks.com/blog/Nutanix/introducing-timesfm-for-time-series-forcasting
1024m
authored a paper 9 months ago
jeffboudier
posted an
update 9 months ago
Post
505
Wrapping up a week of shipping and announcements with Dell Enterprise Hub now featuring AI Applications, on-device models for AI PCs, a new CLI and Python SDK... all you need for building AI on premises!
Blog post has all the details: https://huggingface.misakanetworks.com/blog/dell-ai-applications