AI Week in Review 25.11.29


Figure 1. ChatGPT is three years old! Sam Altman, Mira Murati, Greg Brockman, and Ilya Sutskever were responsible for making it happen, but now Ilya Sutskever and Mira Murati head up their own AI startups, so don’t expect a real reunion. Image by Nano Banana Pro.

This week Anthropic released Claude Opus 4.5, aiming to retake the ‘best AI model in the world’ crown from GPT-5.1 and Gemini 3 Pro, especially for coding. According to benchmarks, Claude Opus 4.5 is SOTA on coding and complex agentic tasks: it achieves 80.9% on SWE-bench Verified, surpassing GPT-5.1 (77.9%) and Gemini 3 Pro (76.2%), and it leads Terminal-bench 2.0, OSWorld (computer use), and the tau-bench agentic benchmarks.


Figure 2. Claude Opus 4.5 beats Gemini 3 Pro and even GPT-5.1-Codex-Max on SWE-bench. Claude is still the coding king.

Claude Opus 4.5 is the most advanced AI coding model yet, able to “one-shot a massive bug.” Gene Dai says:

Opus 4.5 has this decisive vibe – it just goes ahead and DOES things first, then tells you what it did.

Claude Opus 4.5 pricing is now $5 per million input tokens and $25 per million output tokens. While that is one-third the cost of the prior Opus model, it is still significantly higher than Gemini 3 Pro. With the release of Claude Opus 4.5, all three of the top AI companies have refreshed their best models in the past two weeks, setting a new, higher bar for frontier AI models.
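
For a rough sense of what those prices mean in practice, here is a minimal back-of-the-envelope cost sketch. Only the $5/$25 per-million-token rates come from the announcement; the token counts are illustrative assumptions for a medium-sized coding session.

```python
# Back-of-the-envelope cost estimate at Claude Opus 4.5's published rates.
INPUT_RATE = 5.00 / 1_000_000    # USD per input token
OUTPUT_RATE = 25.00 / 1_000_000  # USD per output token

input_tokens = 200_000   # assumed: repository context plus prompts
output_tokens = 50_000   # assumed: generated patches and explanations

cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
print(f"Estimated session cost: ${cost:.2f}")  # -> $2.25
```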

If you can’t decide which one to use, McKay Wrigley has a workflow to try all three coding models at once and pick the winner.

Prime Intellect unveiled INTELLECT-3, a 106B parameter Mixture-of-Experts model (with 12B active parameters) based on GLM-4.5-Air and trained with RL post-training for math, coding, science, and reasoning. INTELLECT-3 posts best-in-class benchmark scores for its size: ~90% on AIME 2024/2025, 69% on LiveCodeBench v6 for coding, and 74% on GPQA-Diamond. The open-source model was trained on 512 H200 GPUs over two months using Prime Intellect’s end-to-end stack; the methods are shared in the INTELLECT-3 Technical Report, and model weights are on Hugging Face.
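
If you want to try the weights yourself, a minimal Hugging Face Transformers loading sketch is below. The repo id is my assumption from the naming (check the model card), and a 106B MoE will need several large GPUs or aggressive quantization to run.

```python
# Minimal sketch for loading INTELLECT-3 from Hugging Face.
# Assumptions: repo id "PrimeIntellect/INTELLECT-3" (verify on the model card);
# requires the accelerate package and enough GPU memory for a 106B MoE.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PrimeIntellect/INTELLECT-3"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard across available GPUs
    torch_dtype="auto",
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```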

Black Forest Labs released FLUX.2, a suite of image-generation models that support multi-reference image generation. FLUX.2 allows users to provide up to 10 reference images so that generated outputs maintain consistent characters, styles, or specific objects. The suite includes FLUX.2 [pro] for state-of-the-art generation, FLUX.2 [flex] for more steps and guidance, and FLUX.2 [dev], a 32B open-weight model derived from the FLUX.2 base model.

The FLUX.2 model uses the Mistral-3 24B parameter vision-language model (VLM) to capture spatial relationships and visual reasoning. It also outputs native 4-megapixel images, making it a high-resolution generation tool for art and design tasks. FLUX.2’s image generation quality, as measured by Elo scores, is close to Nano Banana Pro’s, but it costs only about a third as much to generate each image.
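
To show the shape of a multi-reference request, here is a hedged sketch. The endpoint URL and JSON field names are hypothetical placeholders, not BFL’s documented API; consult the official FLUX.2 docs for the real schema.

```python
# Hypothetical sketch of a multi-reference FLUX.2 request.
# The URL and field names are placeholders, not BFL's documented API.
import base64
import requests

def load_ref(path: str) -> str:
    """Read a local reference image and base64-encode it."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

payload = {
    "prompt": "the woman from ref 1 sitting on the black couch with the dog",
    "reference_images": [load_ref("woman.png"), load_ref("dog.png"),
                         load_ref("living_room.png")],  # up to 10 supported
    "width": 2048,
    "height": 2048,
}

resp = requests.post(
    "https://example-flux2-endpoint/v1/generate",   # placeholder URL
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=300,
)
resp.raise_for_status()
print(resp.json())
```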


Figure 3. FLUX.2 performs multi-reference image generation, merging a woman at her computer, a dog, and a living room with a black couch into a single shot.

Microsoft Research released Fara-7B, a 7B parameter computer-use agent AI model fine-tuned from Qwen 2.5. Fara-7B predicts mouse and keyboard actions from screenshots and achieves 73.5% on the WebVoyager benchmark. It is small enough to run on a single user device, enabling local tasks such as browsing the web, clicking buttons, or booking flights without sending data to the cloud.
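
Conceptually, a screenshot-in, action-out agent like Fara-7B runs a simple observe-act loop. The sketch below illustrates that loop with stub helpers (capture_screenshot, predict_action, execute_action) standing in for real screen capture, model inference, and input injection; it is not Microsoft’s published interface.

```python
# Conceptual observe-act loop for a computer-use agent like Fara-7B.
# The helpers are stubs, not Microsoft's actual API.
from typing import Any

def capture_screenshot() -> bytes:
    """Stub: grab the current screen (e.g. with mss or pyautogui)."""
    return b""

def predict_action(task: str, screenshot: bytes, history: list) -> dict[str, Any]:
    """Stub: send task, screenshot, and history to the model and parse the
    predicted action, e.g. {"type": "click", "x": 410, "y": 220}."""
    return {"type": "done"}

def execute_action(action: dict[str, Any]) -> None:
    """Stub: replay the predicted action as real mouse/keyboard input."""
    print("executing", action)

def run_agent(task: str, max_steps: int = 20) -> None:
    history: list[dict[str, Any]] = []   # previous actions, fed back as context
    for _ in range(max_steps):
        screenshot = capture_screenshot()
        action = predict_action(task, screenshot, history)
        if action["type"] == "done":      # model signals task completion
            break
        execute_action(action)
        history.append(action)

run_agent("Find the cheapest nonstop flight SEA -> SFO next Friday")
```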

Tongyi Lab released Z-Image Turbo, a 6B parameter text-to-image generation model optimized for speed. Z-Image Turbo can generate images in sub-second time on H800 GPUs and fits comfortably in 16 GB consumer GPUs. Despite its speed, it produces photorealistic images on par with those from larger prior models and is good at complex text rendering. The model is open-source under the Apache license, with weights available on Hugging Face.
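
A minimal local-inference sketch with diffusers is below. The repo id and the low step count are assumptions; check the Hugging Face model card for the exact id, pipeline class, and recommended settings.

```python
# Minimal sketch for running Z-Image Turbo locally with diffusers.
# Assumptions: repo id below and a turbo-style low step count; the model
# may need trust_remote_code=True if no native pipeline class exists yet.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",        # assumed repo id
    torch_dtype=torch.bfloat16,
).to("cuda")                           # ~6B params fits in a 16 GB GPU

image = pipe(
    prompt="a rainy neon street market, photorealistic, signage reading 'OPEN'",
    num_inference_steps=8,             # turbo models use few steps (assumed)
).images[0]
image.save("z_image_turbo_sample.png")
```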


Figure 4. Z-Image Turbo image generation examples, showing excellent photorealism from an efficient open-weight image generation model.

Tencent released HunyuanVideo 1.5, an open-source video generation model that produces top-tier video with only 8.3B parameters, making it suitable for local use on consumer GPUs. It uses an 8.3B parameter Diffusion Transformer (DiT) backbone with a 3D causal VAE for latent-space compression, enabling generation of up to 10-second video clips at 720p with coherent motion and physics. There is a built-in path to upscale output to 1080p for higher-quality video. Tencent published a HunyuanVideo 1.5 Technical Report and shared model weights on Hugging Face.

Character.AI launched Stories, a feature for users to create and share adventures based on their favorite Characters. Users pick characters, a genre, and a premise, and the story evolves from the choices the user makes about what happens next. It offers a preview of how AI-generated video games may work.

Tencent released HunyuanOCR, a 1B parameter model focused on optical character recognition (OCR). HunyuanOCR is a lightweight alternative to general VLMs that outperforms them on various OCR tasks, such as parsing PDFs, other types of documents, and video subtitles. It scores 860 on OCRBench, outperforming much larger models such as Qwen3-VL-72B.

LTX Studio launched Retake, a new video editing feature providing precise control over AI-generated video content by allowing users to edit specific shots without regenerating the entire clip. Users can now rephrase dialogue, adjust emotional tone, or refine character performance while maintaining the continuity and consistency of the original scene. This reduces the time and cost of iterative generation and streamlines AI video editing.

Google has rolled out agentic calling in the US, where users delegate AI agents to make phone calls to local businesses directly through Google Search. With this “Let Google call” option, the conversational AI agent can verify pricing or the availability of products and services and report back to the user, streamlining shopping tasks.

Perplexity has launched a new Memory feature for its AI assistants, allowing the system to retain key details, preferences, and context across multiple conversations. This update enables the AI to offer more personalized and relevant responses over time, as it gets to know the user better. As with memory features from other services, the feature is designed with privacy controls, giving users the ability to manage or delete stored memories as needed.

Perplexity also launched a Shop with Perplexity feature, which uses AI and personalization to improve the shopping experience.

Alibaba launched its Quark AI Glasses, putting the smart glasses on sale in China this week, with the S1 model priced at $536. The Quark AI Glasses are powered by Qwen AI models and linked to the Qwen app for real-time assistance. These wearables compete with Meta’s Ray-Ban smart glasses and other Chinese rivals.

NVIDIA AI released Nemotron-Elastic-12B, an experimental LLM that allows smaller nested variants to be extracted from the original model at no extra training cost. The model uses a hybrid architecture of mostly Mamba-2 and MLP layers combined with four Attention layers, designed to enable elastic inference through nested model extraction. The elastic architecture lets smaller, nested variants (6B and 9B parameters) be pulled from the same parameter space without separate training runs. Details are described in the paper “Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs.”
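
The “nested” idea is that a smaller variant’s weights are a contiguous slice of the larger model’s weights, so no retraining is needed to extract it. The NumPy toy below is a conceptual illustration of that general width-slicing idea, not NVIDIA’s actual extraction code.

```python
# Conceptual illustration of extracting a nested (narrower) sub-model by
# slicing shared weight matrices. Toy example of the "nested weights" idea,
# not NVIDIA's Nemotron implementation.
import numpy as np

full_hidden = 8    # hidden width of the full model (toy size)
sub_hidden = 4     # hidden width of the nested sub-model

# One toy linear layer of the full model: (out_features, in_features).
W_full = np.random.randn(full_hidden, full_hidden)

# The nested variant reuses the leading rows/columns of the same matrix,
# so both models share parameters and only one training run is needed.
W_sub = W_full[:sub_hidden, :sub_hidden]

x = np.random.randn(full_hidden)
y_full = W_full @ x                # full model's layer output
y_sub = W_sub @ x[:sub_hidden]     # nested model's layer output
print(y_full.shape, y_sub.shape)   # (8,) (4,)
```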

Anthropic published a study estimating productivity gains from AI adoption, projecting that widely adopting current-generation AI could nearly double US labor productivity growth over the next decade. The research, based on data from 100,000 anonymized conversations with its model Claude, estimates that AI can reduce the time required for many tasks (which average about 90 minutes) by up to 80%. The report suggests an annual productivity increase of 1.8%, with particular benefits in areas such as software development and administration.
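
To make the headline figures concrete, the quick arithmetic below uses only the numbers quoted from the report (a 90-minute average task and up to 80% time savings); everything else is illustrative.

```python
# Quick arithmetic on the figures quoted from Anthropic's study.
avg_task_minutes = 90      # average task length cited in the report
max_reduction = 0.80       # up to 80% time savings

time_with_ai = avg_task_minutes * (1 - max_reduction)
print(f"90-minute task with AI: ~{time_with_ai:.0f} minutes")                    # ~18 minutes
print(f"Time saved per task: ~{avg_task_minutes - time_with_ai:.0f} minutes")    # ~72 minutes
```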

OpenAI and Google are throttling Sora and Nano Banana Pro features due to high demand. The latest AI video and image models are proving to be both exceedingly popular and very compute-hungry.

Suno announced a partnership with Warner Music Group (WMG) to collaborate on the development of licensed AI music tools. This agreement resolves previous legal disputes and paves the way for new features that allow users to create music using licensed content from WMG’s vast catalog. The partnership aims to create new revenue streams for artists while establishing a framework for responsible AI training and use in creating music.

A humanoid robot AI bubble could be brewing, says the head of China’s economic planning agency. He noted that investment levels are high despite few proven commercial uses for the robots yet. Over 150 Chinese companies are competing in the humanoid robot space.

This week, OpenAI’s ChatGPT marks the third anniversary of its release. As ChatGPT reaches this milestone, this article reflects on its profound global impact on education, professional workflows, and daily communication since its November 2022 launch.

The ChatGPT moment will be remembered as the most pivotal moment in the history of AI, kicking off the AI revolution. It was just a simple chat-based front end on an improved conversational LLM, but that made ChatGPT the “killer app” for conversational AI. Its release set off a firestorm of interest that has fed accelerated progress in AI ever since.

In the past three years, ChatGPT has evolved from a novel chatbot into a ubiquitous utility, improving as an interface as the underlying AI models improved: from GPT-4, to GPT-4o with multi-modality, to the o1 and o3 reasoning models, and now to GPT-5 and GPT-5.1. The ChatGPT interface has added memory, image generation, access to web sources, coding and tools, and much more.

In the early days of the AI Changes Everything Substack, my AI weeklies included a “Look Back” section that recalled a past event in AI history. I stopped doing that because there is too much news to cover. The ChatGPT moment, however, is one ‘look back’ worthy of remembrance, destined for the history books.


