Recent Activity

boroll2347 
posted an update 10 days ago
test a post
test.
KingNish 
posted an update 3 months ago
Muon vs MuonClip vs Muon+AdamW

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient‑norm spikes made training unstable. MuonClip (Kimi K2’s clipping) stabilizes long pretraining runs, yet in our small‑scale fine‑tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance, and even beat vanilla AdamW.
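
As a rough illustration of the hybrid split (a minimal sketch, not the code used in these experiments): parameters can be routed by tensor rank, with 2D weight matrices going to Muon and everything else going to AdamW. The helper name `split_params_by_rank` and the parameter names and shapes below are hypothetical and do not reflect Qwen3‑4B's actual layout. Some Muon setups also send embeddings and the output head to AdamW even though they are 2D; that refinement is left out here.

```python
from typing import Dict, List, Tuple


def split_params_by_rank(
    shapes: Dict[str, Tuple[int, ...]],
) -> Tuple[List[str], List[str]]:
    """Partition parameter names into a Muon group (2D) and an AdamW group (all others)."""
    muon_names: List[str] = []
    adamw_names: List[str] = []
    for name, shape in shapes.items():
        if len(shape) == 2:
            # Weight matrices: candidates for Muon's matrix-aware update.
            muon_names.append(name)
        else:
            # Biases, norm scales, and other non-2D tensors: fall back to AdamW.
            adamw_names.append(name)
    return muon_names, adamw_names


# Hypothetical parameter shapes, chosen only for illustration.
example_shapes = {
    "layers.0.attn.q_proj.weight": (64, 64),
    "layers.0.mlp.up_proj.weight": (256, 64),
    "layers.0.input_layernorm.weight": (64,),
    "layers.0.attn.q_proj.bias": (64,),
}

muon_group, adamw_group = split_params_by_rank(example_shapes)
print("Muon :", muon_group)
print("AdamW:", adamw_group)
```

In a real training script, the two name lists would be used to build two parameter groups, each handed to its own optimizer instance.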

Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.

Next Step: scale to larger models/datasets to see if Muon’s spikes become catastrophic or if clipping wins out.

Full Blog Link: https://huggingface.misakanetworks.com/blog/KingNish/optimizer-part1
mrfakename 
posted an update 3 months ago
Excited to share that I've joined the Hugging Face Fellows program! 🤗

Looking forward to contributing to & working more closely with the open-source ecosystem - huge thanks to everyone who's supported me on this journey! 🚀
adamm-hf 
posted an update 4 months ago
💸🤑You don’t need 100 GPUs to train something amazing!

Our Smol Training Playbook teaches you a better path to world-class LLMs, for free!

Check out the #1 trending space on 🤗 :
HuggingFaceTB/smol-training-playbook
mrfakename 
posted an update 4 months ago
Trained a model for emotion-controllable TTS based on MiMo audio on LAION's dataset.

Still very early, and it does have an issue with hallucinating, but results seem pretty good so far given how early it is in the training run.

Will probably kick off a new run later with some settings tweaked.

Put up a demo here: https://huggingface.misakanetworks.com/spaces/mrfakename/EmoAct-MiMo

(Turn 🔊 on to hear audio samples)
adamm-hf 
posted an update 5 months ago
Cool stuff these past weeks on Hugging Face! 🤗 🚀
• 📈Trackio, local-first W&B alternative
https://github.com/gradio-app/trackio/issues
• 🌍EmbeddingGemma, 300M-param, multilingual embeddings, on-device
https://huggingface.misakanetworks.com/blog/embeddinggemma
• 💻Open LLMs in VS Code (Inference Providers)
https://x.com/reach_vb/status/1966185427582497171
• 🤖Smol2Operator GUI agents
https://huggingface.misakanetworks.com/blog/smol2operator
• 🖼️Gradio visible watermarking
https://huggingface.misakanetworks.com/blog/watermarking-with-gradio
jeffboudier 
posted an update 6 months ago
Quick 30s demo of the new Hub > Azure AI integration to deploy HF models in your own Azure account. Now with Py and CLI!

GG @alvarobartt @kramp @pagezyhf
jeffboudier 
posted an update 8 months ago
AMD summer hackathons are here!
A chance to get hands-on with MI300X GPUs and accelerate models.
🇫🇷 Paris - Station F - July 5-6
🇮🇳 Mumbai - July 12-13
🇮🇳 Bengaluru - July 19-20

Hugging Face and GPU Mode will be on site, and on July 6 in Paris @ror will share lessons learned while building new kernels to accelerate Llama 3.1 405B on ROCm.

Register for the Paris event: https://lu.ma/fmvdjmur?tk=KeAbiP
All dates: https://lu.ma/calendar/cal-3sxhD5FdxWsMDIz
jeffboudier 
posted an update 9 months ago
Today we launched Training Cluster as a Service, to make the new DGX Cloud Lepton supercloud easily accessible to AI researchers.

Hugging Face will collaborate with NVIDIA to provision and set up GPU training clusters to make them available for the duration of training runs.

Hugging Face organizations can sign up here: https://huggingface.misakanetworks.com/training-cluster
KingNish 
posted an update 9 months ago
What's currently the biggest gap in open-source datasets?