A small team of nine researchers at Sina Weibo — best known for its microblogging platform — published a technical report on arXiv that has the AI community debating efficiency versus scale. Their model, VibeThinker-3B, reportedly achieves reasoning scores that rival or exceed those of vastly larger systems from Google DeepMind, OpenAI, Anthropic, and DeepSeek.

With just 3 billion parameters, VibeThinker-3B reportedly scored 94.3 on the 2026 American Invitational Mathematics Examination (AIME), a notoriously difficult math competition. According to the report, that puts it level with DeepSeek V3.2, a 671-billion-parameter model more than 200 times its size, and ahead of Gemini 3 Pro's 91.7. With a test-time scaling technique called Claim-Level Reliability Assessment, the score rose to 97.1, above most publicly reported results.

The paper quickly drew attention: 62 upvotes on Hugging Face's daily papers feed, 130 likes on the model repository, and activity on GitHub. The claim challenges the prevailing assumption that large parameter counts are necessary for top-tier reasoning, and it has sparked debate over benchmark validity and what counts as true intelligence in AI.

Some researchers question whether standardized math tests are a reliable measure of general reasoning, and whether the results can be replicated independently. The achievement also underscores the growing AI capabilities of Chinese firms, which have been closing the gap despite export restrictions on advanced chips.

Sina Weibo — primarily a social media company — has not previously been a leader in foundational AI research, making this paper a surprising signal of how talent and resources are spreading across the industry.