Nerdy Face

Team

company

https://huggingface.misakanetworks.com/


AI & ML interests

None defined yet.

Recent Activity

julien-c submitted a paper 27 days ago

Shaping capabilities with token-level data filtering

stefan-it submitted a paper 28 days ago

FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale

kenobi authored a paper 2 months ago

On Invariance Penalties for Risk Minimization


boroll2347

posted an update 10 days ago

Post

169

test a post
test.

1 reply

KingNish

posted an update 3 months ago

Post

3034

Muon vs MuonClip vs Muon+AdamW

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient‑norm spikes made training unstable. MuonClip (Kimi K2’s clipping) stabilizes long pretraining runs, yet in our small‑scale fine‑tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.

Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.

Next Step: scale to larger models/datasets to see if Muon’s spikes become catastrophic or if clipping wins out.

Full Blog Link: https://huggingface.misakanetworks.com/blog/KingNish/optimizer-part1
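
The hybrid setup described in the post above (Muon for 2D layers, AdamW for 1D layers) comes down to a routing rule on parameter shape. The sketch below only illustrates that rule and is not the author's code: the function name and the parameter names and shapes are hypothetical, and it stops at grouping names rather than constructing real optimizers.

```python
from typing import Dict, List, Tuple


def split_for_hybrid(
    shapes: Dict[str, Tuple[int, ...]]
) -> Tuple[List[str], List[str]]:
    """Route 2D weight matrices to Muon and every other parameter to AdamW."""
    muon: List[str] = []
    adamw: List[str] = []
    for name, shape in shapes.items():
        # Assumption: "2D layer" means a parameter with exactly two dimensions.
        if len(shape) == 2:
            muon.append(name)
        else:
            adamw.append(name)
    return muon, adamw


# Hypothetical parameter names and shapes, for illustration only.
example_shapes = {
    "layers.0.attn.q_proj.weight": (2560, 2560),
    "layers.0.attn.q_proj.bias": (2560,),
    "layers.0.norm.weight": (2560,),
    "layers.0.mlp.up_proj.weight": (9728, 2560),
}

muon_names, adamw_names = split_for_hybrid(example_shapes)
print("Muon: ", muon_names)
print("AdamW:", adamw_names)
```

In a real training script, the two name lists would be turned into two parameter groups, each handed to its own optimizer. Some Muon setups also send embeddings and the output head to AdamW even though they are 2D; this sketch does not do that, since the post only specifies the 2D/1D split.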

KingNish

posted an update 3 months ago

Post

2663

I tested Muon vs MuonClip vs Muon+AdamW for fine-tuning LLMs
Just published a blog post on it. Read it here 👉 https://huggingface.misakanetworks.com/blog/KingNish/optimizer-part1

1 reply

mrfakename

posted an update 3 months ago

Post

15996

Excited to share that I've joined the Hugging Face Fellows program! 🤗

Looking forward to contributing to & working more closely with the open-source ecosystem - huge thanks to everyone who's supported me on this journey! 🚀

adamm-hf

posted an update 4 months ago

Post

1112

The #1 trending AI/ML dataset today 🏆

Massive scale, diversity and end-to-end potential from NVIDIA!
nvidia/PhysicalAI-Autonomous-Vehicles

adamm-hf

posted an update 4 months ago

Post

742

The new King 👑 has arrived!

Moonshot AI now has the top model on Hugging Face 🔥
moonshotai/Kimi-K2-Thinking

adamm-hf

posted an update 4 months ago

Post

2823

💸🤑 You don’t need 100 GPUs to train something amazing!

Our Smol Training Playbook teaches you a better path to world-class LLMs, for free!

Check out the #1 trending space on 🤗 :
HuggingFaceTB/smol-training-playbook

mrfakename

posted an update 4 months ago

Post

6267

Trained a model for emotion-controllable TTS based on MiMo audio on LAION's dataset.

It's still very early and the model does have an issue with hallucinating, but results seem pretty good so far given how early it is in the training run.

Will probably kick off a new run later with some settings tweaked.

Put up a demo here: https://huggingface.misakanetworks.com/spaces/mrfakename/EmoAct-MiMo

(Turn 🔊 on to hear audio samples)

5 replies

adamm-hf

posted an update 5 months ago

Post

2323

Cool stuff these past weeks on Hugging Face! 🤗 🚀
• 📈 Trackio, local-first W&B alternative
https://github.com/gradio-app/trackio/issues
• 🌍 EmbeddingGemma, 300M-param, multilingual embeddings, on-device
https://huggingface.misakanetworks.com/blog/embeddinggemma
• 💻 Open LLMs in VS Code (Inference Providers)
https://x.com/reach_vb/status/1966185427582497171
• 🤖 Smol2Operator GUI agents
https://huggingface.misakanetworks.com/blog/smol2operator
• 🖼️ Gradio visible watermarking
https://huggingface.misakanetworks.com/blog/watermarking-with-gradio

jeffboudier

posted an update 6 months ago

Post

3194

Quick 30s demo of the new Hub > Azure AI integration to deploy HF models in your own Azure account. Now with Py and CLI!

GG @alvarobartt @kramp @pagezyhf

1024m

authored 2 papers 7 months ago

Query Attribute Modeling: Improving search relevance with Semantic Search and Meta Data Filtering

Paper • 2508.04683 • Published Aug 6, 2025

DSBC : Data Science task Benchmarking with Context engineering

Paper • 2507.23336 • Published Jul 31, 2025 • 2

KingNish

posted an update 7 months ago

Post

2215

Wan 2.2 Fast: up to 10x faster than the original Wan 2.2

Model: FastVideo/FastWan2.2-TI2V-5B-FullAttn-Diffusers

Space: KingNish/wan2-2-fast

jeffboudier

posted an update 8 months ago

Post

574

AMD summer hackathons are here!
A chance to get hands-on with MI300X GPUs and accelerate models.
🇫🇷 Paris - Station F - July 5-6
🇮🇳 Mumbai - July 12-13
🇮🇳 Bengaluru - July 19-20

Hugging Face and GPU Mode will be on site, and on July 6 in Paris @ror will share lessons learned while building new kernels to accelerate Llama 3.1 405B on ROCm.

Register for the Paris event: https://lu.ma/fmvdjmur?tk=KeAbiP
All dates: https://lu.ma/calendar/cal-3sxhD5FdxWsMDIz

jeffboudier

posted an update 9 months ago

Post

1742

Today we launched Training Cluster as a Service, to make the new DGX Cloud Lepton supercloud easily accessible to AI researchers.

Hugging Face will collaborate with NVIDIA to provision and set up GPU training clusters to make them available for the duration of training runs.

Hugging Face organizations can sign up here: https://huggingface.misakanetworks.com/training-cluster

KingNish

posted an update 9 months ago

Post

1216

What's currently the biggest gap in open-source datasets?

5 replies

sumuks

authored a paper 9 months ago

PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents

Paper • 2505.01592 • Published May 2, 2025

jeffboudier

posted an update 9 months ago

Post

2479

👏 Congrats @jinanz on adding TimesFM time series forecasting to Transformers!

Learn how to use TimesFM in this blog post by the Nutanix team: https://huggingface.misakanetworks.com/blog/Nutanix/introducing-timesfm-for-time-series-forcasting

1024m

authored a paper 9 months ago

Uncovering Cultural Representation Disparities in Vision-Language Models

Paper • 2505.14729 • Published May 20, 2025 • 1

jeffboudier

posted an update 9 months ago

Post

505

Wrapping up a week of shipping and announcements with Dell Enterprise Hub now featuring AI Applications, on-device models for AI PCs, a new CLI and Python SDK... all you need for building AI on premises!

Blog post has all the details: https://huggingface.misakanetworks.com/blog/dell-ai-applications

Team members 547
