Alibaba has expanded its Qwen3.5 model family with 3 new models - the 27B model is a standout, scoring 42 on the Artificial Analysis Intelligence Index and matching open weights models 8-25x its size

@Alibaba_Qwen has expanded the Qwen3.5 family with three new models alongside the 397B flagship released earlier this month: Qwen3.5 27B (Dense, scoring 42 on Intelligence Index), Qwen3.5 122B A10B (MoE, 42), and Qwen3.5 35B A3B (MoE, 37). The two MoE (Mixture-of-Experts) models activate only a fraction of their total parameters per forward pass (10B of 122B and ~3B of 35B respectively). The Intelligence Index is our synthesis metric incorporating 10 evaluations covering general reasoning, agentic tasks, coding, and scientific reasoning. All models are Apache 2.0 licensed, natively support 262K context, and return to the unified thinking/non-thinking hybrid architecture of the original Qwen3, after Alibaba moved to separate Instruct and Reasoning checkpoints with the Qwen3 2507 updates.

Key benchmarking results for the reasoning variants:

➤ Qwen3.5 27B scores 42 on Intelligence Index and is the most intelligent model under 230B total parameters. The nearest model of similar size is GLM-4.7-Flash (31B total, 3B active), which scores 30. Open weights models of equivalent intelligence are 8-25x larger in total parameters: MiniMax-M2.5 (230B, 42), DeepSeek V3.2 (685B, 42), and GLM-4.7 (357B, 42). In FP8 precision the weights take ~27GB to store, while a 4-bit quantization fits on laptop-class hardware with 16GB+ of RAM.

➤ Qwen3.5 27B scores 1205 on GDPval-AA (Agentic Real-World Work Tasks), placing it alongside larger models. For context, MiniMax-M2.5 scores 1206, GLM-4.7 (Reasoning) scores 1200, and DeepSeek V3.2 (Reasoning) scores 1194. This is particularly notable for a 27B parameter model and suggests strong agentic capability for its size.
GDPval-AA tests models on real-world tasks across 44 occupations and 9 major industries.

➤ AA-Omniscience remains a relative weakness across the Qwen3.5 family, driven primarily by lower accuracy rather than hallucination rate. Qwen3.5 27B scores -42 on AA-Omniscience, comparable to MiniMax-M2.5 (-40) but behind DeepSeek V3.2 (-21) and GLM-4.7 (-35). Although Qwen3.5 27B's hallucination rate (80%) is lower than peers' (GLM-4.7 90%, MiniMax 89%, DeepSeek 82%), its accuracy is also lower at 21% vs 34% for DeepSeek V3.2 and 29% for GLM-4.7. This is likely a consequence of model size - we have generally observed that models with more total parameters perform better on accuracy in AA-Omniscience, as broader knowledge recall benefits from larger parameter counts.

➤ Qwen3.5 27B is equivalently intelligent to Qwen3.5 122B A10B. The 122B A10B is a Mixture-of-Experts model that activates only 10B of its 122B total parameters per forward pass. The 27B model leads on GDPval-AA (1205 Elo vs 1145 Elo) and slightly on TerminalBench (+1.5 p.p.), while the 122B model leads on SciCode (+2.5 p.p.) and HLE (+1.2 p.p.) and has a lower hallucination rate (Omniscience -40 vs -42).

➤ Qwen3.5 35B A3B (Reasoning, 37) is the most intelligent model with ~3B active parameters, 7 points ahead of GLM-4.7-Flash (30). Other models in this ~3B active category include Qwen3 Coder Next (80B total, 28), Qwen3 Next 80B A3B (27), and NVIDIA Nemotron 3 Nano 30B A3B (24).

➤ Qwen3.5 27B used 98M output tokens to run the Intelligence Index, costing ~$299 via the Alibaba Cloud API. This is notably high token usage compared to models of similar intelligence: MiniMax-M2.5 (56M), DeepSeek V3.2 (61M), and even the larger Qwen3.5 397B (86M).

Other information:
➤ Context window: 262K tokens (extendable to 1M via YaRN)
➤ License: Apache 2.0
➤ API pricing (Alibaba Cloud, per 1M input/output tokens): 397B: $0.60/$3.60; 122B: $0.40/$3.20; 27B: $0.30/$2.40; 35B A3B: $0.25/$2.00
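The memory and cost figures above follow from simple arithmetic. A minimal sketch, assuming decimal GB (1e9 bytes), weights only (no KV cache or activations), and that the ~$299 total also includes input-token charges, which aren't itemized in the post:

```python
# Back-of-the-envelope checks for the figures quoted above.

def weight_memory_gb(total_params_b: float, bits_per_param: float) -> float:
    """Approximate storage for model weights, in decimal GB."""
    return total_params_b * 1e9 * (bits_per_param / 8) / 1e9

fp8_gb = weight_memory_gb(27, 8)  # FP8: one byte per parameter
q4_gb = weight_memory_gb(27, 4)   # 4-bit quantization: half a byte each

print(f"Qwen3.5 27B in FP8:   ~{fp8_gb:.1f} GB")  # ~27.0 GB
print(f"Qwen3.5 27B in 4-bit: ~{q4_gb:.1f} GB")   # ~13.5 GB, fits in 16GB+ RAM

# Output-token cost of the Intelligence Index run at $2.40 per 1M output
# tokens. This is a lower bound: input tokens (billed at $0.30/1M) make
# up the rest of the ~$299 total.
output_cost = (98e6 / 1e6) * 2.40
print(f"Output-token cost: ~${output_cost:.0f}")  # ~$235
```

The same function also shows why 4-bit is the practical floor for laptop deployment here: halving again to 2-bit saves only ~7GB while typically costing significant quality.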
Qwen3.5 27B stands out for agentic capability at its model size. With an Elo of 1205 on GDPval-AA, it matches models with 8-25x more total parameters and trails the 397B flagship (1208) by only 3 points despite being ~14x smaller.
Among open weights models with 40B total parameters or less, Qwen3.5 27B and 35B A3B stand out as the clear leaders on the Intelligence Index. The next most intelligent model in this size category is GLM-4.7-Flash (30).
Compare the full Qwen3.5 family with other leading models at:

Qwen3.5 27B HuggingFace Repository: