Ramin Hasani
check this out 👌🏻

Sanchit Monga · Mar 5, 11:58
In just 48 hours at @RunAnywhereAI we built MetalRT: beating @Apple at their own game and delivering the FASTEST LLM inference engine on the market for Apple Silicon right now.
- 570 tok/s decode @liquidai LFM2.5-1.2B, 4-bit
- 658 tok/s decode @Alibaba_Qwen Qwen3-0.6B, 4-bit
- 6.6 ms time-to-first-token
- 1.19× faster than Apple's own MLX (identical model files)
- 1.67× faster than llama.cpp on average
We crushed Apple MLX, llama.cpp, uzu (by TryMirai), and Ollama across four different 4-bit models, including the on-device-optimized LFM2.5-1.2B, on a single M4 Max.
Excited for this one!
#ycombinator #runanywhere #ondeviceai #applesilicon #mlx
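As a point of reference for the llama.cpp comparison above, here is a rough sketch of timing a 4-bit GGUF baseline with the llama-cpp-python bindings. The model path is a placeholder, and this end-to-end timing mixes prefill and decode, unlike the per-phase numbers quoted in the post; it is not the harness used there.

import time
from llama_cpp import Llama

MODEL_PATH = "models/qwen3-0.6b-q4_0.gguf"  # placeholder path to a local 4-bit GGUF file
GEN_TOKENS = 128

# Load the quantized model with a context window large enough for the prompt.
llm = Llama(model_path=MODEL_PATH, n_ctx=1024, verbose=False)

t0 = time.perf_counter()
out = llm("Summarize the benefits of on-device inference.", max_tokens=GEN_TOKENS)
elapsed = time.perf_counter() - t0

# The completion dict reports how many tokens were actually generated.
generated = out["usage"]["completion_tokens"]
print(f"llama.cpp baseline: ~{generated / elapsed:.0f} tok/s end to end (prefill + decode)")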

Truly impressive release of hybrid tiny models from the Qwen team, as always!
People are asking how they compare in speed, latency, and memory to @liquidai’s LFMs for on-device deployment.
Here is a quick profiling run on an Apple M3 Ultra:
> LFM2.5-1.2B is 52% faster on decode than Qwen3.5-0.8B
> LFM2-700M is 71% faster on decode than Qwen3.5-0.8B
> LFM2-2.6B matches Qwen3.5-2B on decode speed
> LFM2-700M uses 46% less peak memory than Qwen3.5-0.8B
> LFM2-2.6B uses 21% less peak memory than Qwen3.5-2B
> At the same parameter size, LFM prefill is generally 12% faster than Qwen3.5
We designed the LFM2 series with our hardware-in-the-loop, meta-AI design approach, which lets us find the most efficient architecture for a given processor without sacrificing quality.
This test was run on an Apple M3 Ultra with 512 GB of unified memory.
Config:
> 512 prompt tokens, 128 generation tokens
> 5 trials per configuration
> Framework: MLX (mlx-lm / mlx-vlm)
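For readers who want to run this kind of profiling themselves, here is a minimal sketch under the config above, using mlx-lm's load/generate API. The repo ID is a placeholder, decode throughput is estimated by subtracting a one-token run from a full run, and the peak-memory accessor differs across MLX versions; this is not the exact harness behind the numbers in the post.

import time
import mlx.core as mx
from mlx_lm import load, generate

REPO = "mlx-community/Qwen3-0.6B-4bit"  # placeholder 4-bit MLX checkpoint
PROMPT = " ".join(["benchmark"] * 512)  # roughly 512 prompt tokens
GEN_TOKENS = 128
TRIALS = 5

model, tokenizer = load(REPO)

ttft_s, decode_tps = [], []
for _ in range(TRIALS):
    # Time a 1-token run (prefill + first token), then a full run; the
    # difference approximates pure decode time for the remaining tokens.
    t0 = time.perf_counter()
    generate(model, tokenizer, prompt=PROMPT, max_tokens=1)
    t_first = time.perf_counter() - t0

    t0 = time.perf_counter()
    generate(model, tokenizer, prompt=PROMPT, max_tokens=GEN_TOKENS)
    t_full = time.perf_counter() - t0

    ttft_s.append(t_first)
    decode_tps.append((GEN_TOKENS - 1) / max(t_full - t_first, 1e-9))

peak_gb = mx.get_peak_memory() / 1e9  # older MLX versions expose mx.metal.get_peak_memory() instead
print(f"decode ~{max(decode_tps):.0f} tok/s | TTFT ~{min(ttft_s) * 1e3:.1f} ms | peak ~{peak_gb:.2f} GB")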


Qwen · Mar 2, 21:18
🚀 Introducing the Qwen 3.5 Small Model Series
Qwen3.5-0.8B · Qwen3.5-2B · Qwen3.5-4B · Qwen3.5-9B
✨ More intelligence, less compute.
These small models are built on the same Qwen3.5 foundation — native multimodal, improved architecture, scaled RL:
• 0.8B / 2B → tiny, fast, great for edge devices
• 4B → a surprisingly strong multimodal base for lightweight agents
• 9B → compact, but already closing the gap with much larger models
And yes, we’re also releasing the Base models.
We hope this better supports research, experimentation, and real-world industrial innovation.
Hugging Face:
ModelScope:
