
Six major AI paradigm shifts in 2025: From RLVR training and Vibe Coding to Nano Banana

2025/12/22 17:24

Author: Andrej Karpathy

Compiled by: Tim, PANews

2025 was a year of rapid development and great uncertainty for large language models, and the results have been fruitful. Below are some "paradigm shifts" that I personally found noteworthy and somewhat surprising: changes that altered the landscape and impressed me, at least conceptually.

1. Reinforcement Learning from Verifiable Rewards (RLVR)

In early 2025, the LLM production stack at every AI lab roughly took the following form:

  • Pre-training (GPT-2/3, 2020);
  • Supervised fine-tuning (InstructGPT, 2022);
  • Reinforcement learning from human feedback (RLHF, 2022).

For a long time, this was the stable, mature stack for training production-grade large language models. In 2025, reinforcement learning from verifiable rewards became a core fourth stage, adopted across the industry. By training LLMs in environments with automatically verifiable rewards (such as math and programming problems), the models spontaneously develop strategies that resemble human "reasoning": they learn to break a solution into intermediate computational steps, to backtrack, and to try multiple strategies for the same problem (see the examples in the DeepSeek-R1 paper). In earlier stacks these behaviors were hard to instill, because the optimal reasoning paths and backtracking patterns are not explicitly available to the model as supervision; they have to be discovered through reward optimization.
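As a concrete, deliberately minimal sketch of what makes a reward "verifiable": the checker below scores a model's output by exact match against a known answer, with no learned reward model in the loop. The `####` answer-delimiter convention and the function name are illustrative assumptions, not any lab's actual harness.

```python
import re

def verifiable_reward(model_output: str, ground_truth: str) -> float:
    """Return 1.0 if the final '#### <answer>' line matches the ground
    truth, else 0.0. Unlike RLHF's learned reward model, this check is
    exact and cannot be gamed by stylistic tricks."""
    match = re.search(r"####\s*(.+?)\s*$", model_output.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

# The RL loop around this (heavily simplified): for each problem,
# sample several reasoning traces from the model, score each trace
# with verifiable_reward, and reinforce the high-reward traces.
```

The key property is that the reward is objective: longer, more careful reasoning traces earn reward only when they actually reach the right answer.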

Unlike supervised fine-tuning and RLHF (both relatively short stages with modest compute cost), RLVR involves long-horizon optimization against an objective, hard-to-game reward function. It has proven capable of delivering significant performance gains at a given cost, and it now absorbs a large share of compute that would previously have gone to pre-training. The capability gains of 2025 have therefore largely come from the major AI labs absorbing the enormous compute demands of this new stage: overall, model sizes stayed roughly flat while RL training time grew substantially. Another unique aspect of the new stage is that it gives us an entirely new axis of control (and a corresponding scaling law): model capability as a function of test-time compute, obtained by generating longer reasoning trajectories and spending more "thinking time." OpenAI's o1 (released at the end of 2024) was the first demonstration of an RLVR-trained model, while the release of o3 (early 2025) marked a clear turning point, with a visibly significant leap in capability.
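To illustrate the test-time-compute knob, here is a minimal sketch of one simple strategy in the same spirit: self-consistency, i.e., sampling several reasoning traces and majority-voting their final answers. This is distinct from, but related to, simply generating longer traces; both trade inference compute for capability. `sample_fn` is a hypothetical stand-in for one model rollout, not a real API.

```python
from collections import Counter

def answer_with_budget(sample_fn, problem: str, k: int) -> str:
    """Spend more test-time compute by drawing k independent reasoning
    traces and majority-voting their final answers (self-consistency).
    Larger k costs more but tends to be more accurate."""
    answers = [sample_fn(problem) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```

The point of the sketch is the shape of the curve: capability becomes a dial you turn at inference time, not a fixed property of the weights.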

2. Ghosts vs. Animals: Jagged Intelligence

2025 marked the first time that I (and, I believe, the industry at large) began to develop an intuition for the "shape" of LLM intelligence. We are not evolving and breeding animals; we are summoning ghosts. The entire LLM stack (neural architecture, training data, training algorithms, and especially the optimization objectives) is fundamentally different, so it should not be surprising that we are getting entities in intelligence space that are vastly different from biological intelligence, and it is a category error to judge them in animal terms. In terms of training signal, human neural networks were optimized for tribal survival in the jungle, while LLM neural networks are optimized to imitate human text, collect rewards on math problems, and win human approval in chat arenas. Wherever a domain is verifiable enough to support RLVR, LLM capability in that domain "suddenly jumps," producing an interestingly jagged capability profile overall: the same model can be both an erudite genius and a confused schoolchild, and might leak your data under a little adversarial pressure.

Human intelligence: blue; AI intelligence: red. I like this version of the meme (sorry I can't find the original post on Twitter) because it points out that human intelligence also presents itself in a unique, jagged wave pattern.

Relatedly, in 2025 I developed a general indifference to, and distrust of, most benchmarks. The core issue is that benchmarks are themselves verifiable environments, which makes them highly susceptible to RLVR (and to the weaker form of it that works through synthetic data). In the usual score-maximization process, LLM teams inevitably build training environments near the benchmark in embedding space and pave over that neighborhood with capability spikes. "Training on the test set," in this soft sense, has become the new normal.

So what if it sweeps all benchmark tests but still fails to achieve general artificial intelligence?

3. Cursor: A new layer for LLM applications

What impressed me most about Cursor (besides its rapid rise this year) is how compellingly it revealed a new "LLM app" layer, to the point that people now talk about "Cursor for X." As I emphasized in my Y Combinator talk this year, the core of an LLM app like Cursor is integrating and orchestrating LLM calls for a specific vertical domain:

  • They handle "context engineering";
  • Under the hood, they orchestrate multiple LLM calls into increasingly complex directed acyclic graphs, carefully balancing performance against cost;
  • They provide application-specific graphical interfaces that keep the "human in the loop";
  • And they expose an "autonomy slider".
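The orchestration idea above can be sketched as a toy DAG of calls: fan out cheap summarization calls, distill the relevant context, then fan in to one final, expensive answer. `call_llm` is a hypothetical placeholder, not any real vendor API.

```python
def call_llm(prompt: str) -> str:
    # Placeholder: a real app would hit a model API here, routing
    # cheap nodes to small models and the final node to a strong one.
    return f"<response to: {prompt[:30]}>"

def answer_codebase_question(question: str, files: list[str]) -> str:
    # Node layer 1 (fan-out): summarize each file independently;
    # these calls are parallelizable and can use a cheap model.
    summaries = [call_llm(f"Summarize: {name}") for name in files]
    # Node 2: distill the relevant context ("context engineering").
    context = call_llm("Pick the relevant parts:\n" + "\n".join(summaries))
    # Node 3 (fan-in): answer using the distilled context.
    return call_llm(f"Context:\n{context}\n\nQuestion: {question}")
```

The performance/cost balance mentioned above lives in decisions like how many fan-out nodes to run and which model serves each node.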

In 2025, there was extensive discussion about how much room this emerging application layer has. Will the LLM platforms come to dominate all applications, or is there durable space for LLM apps? My personal prediction is that the LLM platforms will converge on producing "generalist university graduates," while LLM apps will be responsible for organizing and refining those graduates, supplying the private data, sensors, actuators, and feedback loops that turn them into "professional teams" deployable in specific verticals.

4. Claude Code: AI running locally

Claude Code was the first convincing demonstration of the LLM agent form factor: tool use interleaved with reasoning in a loop, enabling longer-horizon, more complex problem solving. What impressed me most, though, is that it runs on your personal computer, deeply integrated with your private environment, data, and context. I believe OpenAI misjudged this direction somewhat, focusing its coding assistants and agents on cloud deployment (containerized environments orchestrated through ChatGPT) rather than on localhost. Fleets of cloud agents may well be the "ultimate AGI form factor," but we are in a transitional period of jagged capability and slower-than-hoped progress, and in that regime an agent deployed directly on your machine, collaborating closely with you in your specific working environment, is the more sensible path. Claude Code got this priority right and packaged it into a concise, elegant, highly appealing command-line tool, reshaping how AI shows up: no longer a website you visit, like Google, but a little sprite or ghost "residing" on your computer. It is a genuinely new paradigm for interacting with AI.
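The agent loop described above (reasoning interleaved with local tool use) can be sketched as follows; `llm_step` and the tool table are hypothetical stand-ins, not Claude Code's actual interface.

```python
def run_agent(llm_step, tools: dict, task: str, max_steps: int = 8) -> str:
    """Minimal agent loop: the model repeatedly picks an action, the
    harness executes it locally, and the observation is fed back into
    the model's context until it declares itself done."""
    history = [("task", task)]
    for _ in range(max_steps):
        action, arg = llm_step(history)        # model chooses the next action
        if action == "final":                  # model decides it is finished
            return arg
        observation = tools[action](arg)       # run a local tool (read a file, run tests, ...)
        history.append((action, observation))  # feed the result back in
    return "max steps reached"
```

The local-first design matters precisely here: the tools in the table are your files, your shell, your test suite.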

5. Vibe Coding: Programming in English

In 2025, AI crossed a critical capability threshold: it became possible to build remarkable programs from English descriptions alone, without even reading the underlying code. Amusingly, I coined "vibe coding" in an offhand, shower-thought tweet, never imagining it would take off the way it did. In the vibe-coding paradigm, programming is no longer reserved for highly trained professionals; everyone can participate. In that sense it is another example of the phenomenon I described in my post "Power to the People: How LLMs Flip the Script on Technology Diffusion": in stark contrast to every previous technology, ordinary people benefit from LLMs more than professionals, corporations, and governments do. But vibe coding doesn't just open programming to ordinary people; it also lets professional developers write software that "would never otherwise have been built." While developing nanochat, I vibe-coded a custom, efficient BPE tokenizer in Rust without relying on existing libraries or learning Rust in depth. This year I also vibe-coded quick prototypes simply to test whether an idea was feasible, and even whole throwaway apps just to pin down a specific bug, because code has suddenly become free, ephemeral, malleable, and disposable. Vibe coding will reshape the software development ecosystem and profoundly redraw the boundaries of the profession.
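For readers curious what the vibe-coded tokenizer actually computes, here is a minimal sketch of the core BPE training loop (in Python rather than Rust, purely for illustration, and far from the efficient implementation mentioned above): repeatedly replace the most frequent adjacent pair of tokens with a new token id.

```python
from collections import Counter

def bpe_train(text: str, num_merges: int):
    """Byte-pair encoding: start from raw UTF-8 bytes and repeatedly
    merge the most frequent adjacent token pair into a new token."""
    tokens = list(text.encode("utf-8"))
    merges = {}          # (left, right) -> new token id
    next_id = 256        # ids 0-255 are reserved for raw bytes
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break        # nothing left worth merging
        merges[pair] = next_id
        # Replace every occurrence of `pair` with the new token id.
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                merged.append(next_id)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
        next_id += 1
    return tokens, merges
```

A real tokenizer adds regex pre-splitting, an efficient pair-count data structure, and an encode/decode path, but the merge loop above is the heart of it.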

6. Nano Banana: LLM graphical interface

Google's Nano Banana (a Gemini image model) is one of the most disruptive paradigm shifts of 2025. In my view, LLMs are the next major computing paradigm after the computers of the 1970s and 80s, so for similar underlying reasons we should expect similar innovations, echoing the evolution of personal computing, microcontrollers, and even the internet. This is especially true at the level of human-computer interaction: today's "chat" with an LLM is somewhat like typing commands into a terminal in the 1980s. Text is the most primitive data representation for computers (and LLMs), but it is not the format humans prefer to consume. Humans actually dislike reading text; it is slow and laborious. We prefer to take in information visually and spatially, which is why the graphical user interface (GUI) was invented in classical computing. By the same logic, LLMs should communicate with us in the forms humans prefer: images, infographics, slides, whiteboards, animations, videos, web apps, and other media. Early versions of this already exist as "visual text decoration," such as emoji and Markdown (headings, bolding, lists, tables, and other typographic elements). But who will ultimately build the GUI of the LLM? Seen this way, Nano Banana is an early prototype of that future. Notably, its breakthrough lies not only in its image generation capability, but in the integrated capability formed by interleaving text generation, image generation, and world knowledge within the model weights.
