We trained our @LiquidAI_ LFM2-350M model 1400x beyond "compute optimal"

> Chinchilla scaling laws: ~20 tokens per param
> LFM2-350M: ~28,000 tokens per param (1400x more)

Why? Because Chinchilla only concerns training compute, while we care about inference cost.
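
A quick back-of-the-envelope sketch of where the 1400x comes from, using only the figures above (the 350M parameter count and the two tokens-per-parameter ratios; total token counts are derived, not official numbers):

```python
# Back-of-the-envelope check of the 1400x overtraining factor.
# Inputs are the figures quoted in the post; totals below are derived estimates.

params = 350e6                       # LFM2-350M nominal parameter count
chinchilla_tokens_per_param = 20     # Chinchilla-optimal ratio (~20 tokens per parameter)
actual_tokens_per_param = 28_000     # LFM2-350M training ratio (~28,000 tokens per parameter)

chinchilla_tokens = params * chinchilla_tokens_per_param  # ~7e9  (~7B tokens)
actual_tokens = params * actual_tokens_per_param          # ~9.8e12 (~10T tokens)

print(f"Chinchilla-optimal tokens: {chinchilla_tokens:.1e}")
print(f"Actual training tokens:    {actual_tokens:.1e}")
print(f"Overtraining factor:       {actual_tokens / chinchilla_tokens:.0f}x")  # -> 1400x
```

The factor falls straight out of the ratio of ratios: 28,000 / 20 = 1,400, independent of the parameter count.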