The Rise and Fall of ONNX (feat. PyTorch 2.0)

This article explores the rise and fall of ONNX, from its early success as a unifying standard for AI frameworks to its gradual shift into a niche tool in the era of PyTorch 2.0.
Taesu Kim
Feb 06, 2025

Introduction

Imagine a time when artificial intelligence was like a bustling metropolis of disparate neighborhoods—each district spoke its own language, and developers found themselves constantly grappling with communication barriers between systems. In this complex landscape, a visionary solution emerged: ONNX, the Open Neural Network eXchange format. ONNX was conceived as a universal translator, a tool designed to bridge the gaps between a myriad of AI frameworks and deployment environments. Its promise was compelling: to unify the diverse languages of deep learning into a single, coherent dialect that all systems could understand.
In the early days of deep learning, when computer vision applications were the rock stars of AI, ONNX was heralded as a game-changer. It enabled researchers and engineers to build models in one framework and deploy them on an entirely different platform without having to rewrite code from scratch. This interoperability was nothing short of revolutionary, simplifying what had once been a convoluted process.
However, as the technology landscape shifted—with the meteoric rise of large language models (LLMs) and the consolidation of AI development around PyTorch—ONNX’s once-prominent role began to change. Today, with modern advancements like PyTorch 2.0 and direct conversion pipelines like Torch-TensorRT, ONNX has become “one of many tools” — limited in scope and utility, primarily used for edge deployments or specialized scenarios. It still occupies a niche that underscores the rapid pace of innovation in the AI field, but is no longer at the center of every AI workflow.
In this post, we will journey through ONNX’s lifecycle: from its revolutionary inception during the computer vision boom to its gradual fade as more integrated solutions emerged, and finally, to its current status as a specialized yet indispensable tool in certain scenarios. Along the way, we will uncover the lessons that ONNX’s evolution offers for the future of AI tooling.

The Golden Era of ONNX

Figure 1. ONNX flow diagram demonstrating its interoperability [source]

A Time of Diversity

In the early stages of deep learning, the AI community was a patchwork quilt of frameworks—each with its own set of features, advantages, and peculiarities. TensorFlow, Keras, Caffe, MXNet, and PyTorch, among others, competed for adoption, and each had its unique style and approach to model development. This period of rapid innovation and experimentation was both exciting and chaotic. Developers often found themselves switching between tools, attempting to leverage the strengths of each while wrestling with their limitations.
Deploying models across this diverse ecosystem posed a significant challenge. Imagine training a convolutional neural network (CNN) in one framework and then having to re-implement or adapt it entirely to deploy on another platform or hardware accelerator. The complexity was enormous and time-consuming, often leading to redundant efforts and inefficiencies. The industry was crying out for a unifying solution—a common intermediate representation that could translate models between the various languages of AI.

The Birth of ONNX

It was in 2017 that ONNX answered this call. Developed with the support of industry heavyweights like Microsoft and Facebook, ONNX was designed to be the bridge that connected disparate AI frameworks. Its goal was to create a standardized format for representing deep learning models, allowing developers to build in one environment and deploy in another with minimal friction.
The brilliance of ONNX lies in its design philosophy. It was created not as a competitor to existing frameworks but as a complementary tool that provided interoperability. With ONNX, a researcher could, for example, train a model in PyTorch—appreciated for its dynamic computation graphs and ease of use—and then export it into a format that could be run efficiently on devices optimized with NVIDIA’s TensorRT or using the ONNX Runtime. This flexibility made it an instant hit, particularly in the realm of computer vision, where models often had to be optimized for varying hardware platforms, ranging from high-powered GPUs to resource-constrained mobile devices.
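For a concrete sense of that workflow, here is a minimal sketch in Python: a model is built in PyTorch and exported to the ONNX format with torch.onnx.export. The ResNet-18 model, file name, and input shape below are illustrative placeholders, not a prescription.

```python
import torch
import torchvision

# Load (or train) a model in PyTorch; ResNet-18 here is just a placeholder.
model = torchvision.models.resnet18(weights=None)
model.eval()

# A dummy input defines the shape the exported graph will be traced with.
dummy_input = torch.randn(1, 3, 224, 224)

# Export to the framework-agnostic ONNX format; the resulting file can be
# consumed by TensorRT, ONNX Runtime, or other ONNX-compatible engines.
torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,  # the ONNX operator-set version to target
)
```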

The Ecosystem Around ONNX

The early success of ONNX was not just a result of its innovative design—it also benefited greatly from a vibrant ecosystem of support. Open-source communities quickly rallied around ONNX, integrating export functionalities into popular frameworks such as TensorFlow and PyTorch. Hardware vendors saw the promise in a standardized format and began incorporating ONNX compatibility into their acceleration libraries and inference engines. This synergy between software and hardware fostered an environment where ONNX could thrive, quickly becoming the lingua franca of AI deployment across a diverse set of platforms.
Developers found themselves empowered by this newfound flexibility. ONNX allowed for a smoother transition from development to production, reducing the barriers to deploying high-performance models on multiple devices. It was a period of optimism and rapid adoption—a golden era when ONNX was at the heart of AI innovation, unifying an otherwise fragmented industry.

The Decline

Figure 2. Dominance of PyTorch in Research Area [source]

Shifting Priorities in the AI Landscape

As the field of artificial intelligence advanced, the focus of research began to shift dramatically. The early emphasis on computer vision gave way to the rise of LLMs and transformer-based architectures. These models, which include groundbreaking architectures like BERT and GPT, demanded a new level of computational efficiency and scalability. At the same time, a cultural shift took place within the research community, emphasizing simplicity, rapid prototyping, and iterative experimentation over rigid, framework-specific workflows.
PyTorch excelled here. Initially celebrated in academia for its intuitive, “Pythonic” design and dynamic computation graphs, PyTorch quickly became the framework of choice for researchers. Its ease of use allowed scientists to experiment freely, iterate on models rapidly, and debug in real time. As these researchers moved into industry, they carried their preferences with them, cementing PyTorch’s position as the de facto standard for AI research and development.
Transformers further entrenched PyTorch’s dominance. While libraries like Hugging Face’s Transformers offered support for both PyTorch and TensorFlow, the overwhelming majority of the user base preferred PyTorch. This preference created a feedback loop: the library’s PyTorch implementations consistently received support for newer models and cutting-edge features faster than their TensorFlow counterparts. Researchers and developers working on state-of-the-art transformer-based models naturally gravitated toward the PyTorch ecosystem, which enabled them to stay at the forefront of innovation.

The Role of ONNX in a Changing Ecosystem

Even with PyTorch dominating the LLM landscape, ONNX still had the potential to remain relevant as an intermediate format for deployment workflows. PyTorch, while favored for its simplicity and support for eager execution, was historically less suited for deployment tasks. Eager execution, which evaluates operations immediately, often falls short in deployment scenarios where GPUs and NPUs perform significantly better with pre-compiled computation graphs. Therefore, many practitioners were still reliant on ONNX to convert their models into formats that could be optimized for production environments—particularly for edge and mobile devices where efficiency is paramount.
However, this period of coexistence was relatively brief. As the capabilities of PyTorch grew, the very limitations that ONNX had been designed to overcome began to vanish. PyTorch was evolving, and its development community was working tirelessly to integrate advanced deployment features directly into the framework. The traditional advantages offered by ONNX—primarily, the ability to serve as an intermediary between training frameworks and deployment options—began to erode, making it increasingly redundant in modern AI pipelines.

The Rise of PyTorch 2.0

The release of PyTorch 2.0 marked a turning point in the AI tooling landscape. This wasn’t just an incremental update—it represented a complete rethinking of how the framework could support both research and production workflows. With new features like TorchDynamo and torch.compile, PyTorch 2.0 brought automatic conversion of Python code into highly optimized computation graphs, eliminating many of the inefficiencies traditionally associated with eager execution.
Where ONNX once played a crucial role in converting dynamic models into static, deployment-friendly representations, PyTorch 2.0 now offers a native, streamlined alternative. This integrated approach enables developers to transition directly from research to production without the extra step of exporting and converting models. Native graph compilation has quickly become one of the leading options for model deployment, providing a robust solution that many in the AI community now favor for its simplicity and efficiency.

PyTorch 2.0 and Direct Compilation

Figure 3. Brand-new Compilation Process of PyTorch 2.0 [source]
With features like TorchDynamo and torch.compile, PyTorch 2.0 fundamentally reshaped how developers approach model optimization. These tools dynamically analyze Python bytecode and automatically generate highly optimized static computation graphs, which can then be efficiently customized and compiled for the target hardware. What once required external conversion tools like ONNX is now seamlessly built into PyTorch, simplifying the deployment process.
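As a minimal sketch of how little ceremony this involves, compiling a model is a one-line change; the toy module below is purely illustrative.

```python
import torch

class TinyNet(torch.nn.Module):
    """A toy module standing in for a real model."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(128, 64)

    def forward(self, x):
        return torch.relu(self.linear(x))

model = TinyNet()

# torch.compile wraps the eager-mode module; TorchDynamo captures its Python
# bytecode into a graph, which the default TorchInductor backend optimizes.
compiled_model = torch.compile(model)

x = torch.randn(8, 128)
y = compiled_model(x)  # the first call triggers graph capture and compilation
```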
With the support of such new features and PyTorch’s dominance in both research and production, a strong consensus emerged within the open-source and industry communities: investing significant engineering effort into vertical, PyTorch-specific conversion passes was a worthwhile endeavor. While these optimizations require substantial development resources and are not transferable to other training frameworks, the industry’s confidence in PyTorch’s long-term trajectory drove further innovation. This momentum gave rise to projects like Torch-TensorRT, which integrates directly with NVIDIA’s TensorRT to optimize and deploy PyTorch models on NVIDIA GPUs, streamlining the transition from experimentation to production.
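As a hedged sketch of that path (exact arguments vary across torch_tensorrt versions, and the model and shapes here are placeholders), compiling a PyTorch model straight into a TensorRT-optimized module looks roughly like this:

```python
import torch
import torch_tensorrt  # NVIDIA's direct PyTorch-to-TensorRT integration

# A placeholder model; any traceable PyTorch module works in principle.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3),
    torch.nn.ReLU(),
).eval().cuda()

# Compile directly to a TensorRT-backed module, skipping the ONNX round trip.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.half},  # allow FP16 kernels on the GPU
)

y = trt_model(torch.randn(1, 3, 224, 224, device="cuda"))
```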
Recognizing this shift, several NPU hardware vendors have increasingly aligned their products with PyTorch’s native deployment ecosystem. Many of these vendors now expose their own compilers as torch.compile backends, enabling seamless deployment of PyTorch models directly onto their hardware. This reflects a broader industry movement toward vertical integration in AI tooling, where the entire workflow—from model development to deployment—is managed within a single, unified framework. This approach simplifies development, reduces latency, improves performance, and ultimately enhances the overall experience for both researchers and engineers.
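In practice, swapping in such a vendor compiler amounts to changing one argument. The sketch below sticks to backends that ship with PyTorch itself; an NPU vendor's backend would appear under its own name once the vendor's package is installed.

```python
import torch

# List the compiler backends registered in this environment; vendor packages
# add their own entries to this registry when installed.
print(torch._dynamo.list_backends())  # e.g. ['cudagraphs', 'inductor', 'onnxrt', ...]

model = torch.nn.Linear(32, 32)

# Selecting a backend is a single keyword argument. "inductor" is PyTorch's
# default; an NPU vendor's backend name would go here instead.
compiled = torch.compile(model, backend="inductor")
```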
With PyTorch 2.0 and its expanding ecosystem, the AI development lifecycle has become significantly more vertical. Researchers and engineers can now prototype, optimize, and deploy models—all within a single framework—without needing to switch between multiple tools or formats. This seamless integration fosters a more agile development process, where rapid iteration accelerates the path from research to real-world applications.

ONNX Today

Specialized Applications

With the shift toward vertical integration—where training and deployment are handled within a unified ecosystem—the need for external conversion tools like ONNX has declined in many mainstream scenarios. However, ONNX has not disappeared entirely. Instead of vanishing, it has been repositioned into more specialized use cases, where its unique strengths still provide meaningful value. This is particularly evident in scenarios that require high flexibility and robust cross-platform portability, which remain crucial in today’s diverse computational landscape.
One prominent area where ONNX still demonstrates its relevance is in the deployment of AI models on mobile and edge devices. These environments, which include everything from smartphones and IoT devices to embedded systems, typically require models that are not only lightweight and efficient but also capable of operating across a broad spectrum of hardware architectures. Thanks to its framework-agnostic design, ONNX provides a powerful solution for developers looking to export models from popular platforms such as PyTorch or TensorFlow. This capability allows for the smooth transition of models to devices that often have limited computational resources, ensuring that performance and efficiency are maintained even under constrained conditions.
A concrete example of ONNX’s continued utility is found in the ONNX Runtime, an inference engine that has been designed to execute models formatted in ONNX on various devices. This engine is widely adopted in scenarios where portability and efficiency are of the utmost importance. Developers leverage the ONNX Runtime to run AI models on a diverse array of hardware—from mobile devices to specialized IoT hardware and various embedded systems. In these contexts, the unified format provided by ONNX simplifies the often complex process of deploying models on heterogeneous platforms, delivering a practical advantage that is hard to overlook.
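A minimal sketch of that pattern, reusing the hypothetical "resnet18.onnx" file from the export example earlier: the same model file runs unchanged wherever ONNX Runtime and a suitable execution provider are available.

```python
import numpy as np
import onnxruntime as ort

# Load the exported model; the execution provider determines which hardware
# runs it (CPU here for portability; mobile and edge builds offer others).
session = ort.InferenceSession(
    "resnet18.onnx",
    providers=["CPUExecutionProvider"],
)

# Feed inputs by the names chosen at export time ("input" in our sketch).
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {"input": x})  # None returns all outputs
print(outputs[0].shape)
```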

Finding Its Place in a New Era

Despite these pockets of robust utility, ONNX has largely transitioned from being the central, unifying standard in the AI community to becoming one of many tools in the developer’s expansive toolkit. For the majority of practitioners, the need to utilize ONNX arises primarily in edge cases where its specific interoperability features are indispensable. In contrast, for mainstream workflows—especially those involving the development and deployment of LLMs or other high-performance applications—tools that are native to PyTorch, such as torch.compile or Torch-TensorRT, are typically favored. These alternatives are preferred due to their seamless integration, greater efficiency, and the streamlined processes they offer, which better align with the rapid iteration cycles common in modern AI development.
The challenges that ONNX faces in retaining its prominence are indicative of broader trends permeating the AI industry. In a development landscape increasingly dominated by frameworks like PyTorch, the inherent advantages of a framework-agnostic format naturally become less compelling. Additionally, the extra steps required to export, convert, and optimize models for ONNX can introduce an element of overhead that many developers find unnecessary when compared to the more integrated and efficient workflows provided by native solutions. As the field of AI continues to evolve, the shifting balance of these factors has led to a reconfiguration of ONNX’s role—from a universal standard to a specialized tool optimized for particular deployment scenarios.

Lessons from ONNX’s Journey

The rise and fall of ONNX provide valuable insights into the dynamic nature of the AI industry, where innovation is relentless, and adaptability is key to staying relevant. ONNX, once a critical piece of AI infrastructure, illustrates how even the most promising tools can struggle to maintain relevance as technology and developer needs evolve.

1. The Importance of Timing and Ecosystem Support

ONNX thrived during a period of fragmentation in the AI ecosystem. Its success was tied to its ability to unify this diversity, enabling developers to bridge gaps between frameworks and optimize their models for deployment. However, as PyTorch rose to dominance and the ecosystem consolidated, the need for a unifying format like ONNX naturally declined. Timing is critical in the lifecycle of any AI tool, and ONNX’s relevance was closely tied to the conditions of its era.

2. The Shift Toward Vertical Integration

The decline of ONNX also underscores a broader trend in AI tooling: the shift toward vertical integration. Developers increasingly prefer tools that provide end-to-end solutions, minimizing the need for external dependencies or intermediate formats. PyTorch’s ability to handle research, training, optimization, and deployment within a single ecosystem exemplifies this trend. In a competitive industry, simplicity and efficiency often win out over general-purpose solutions.

3. Specialization vs. Generalization

Finally, ONNX’s transition from a unifying standard to a niche tool reflects the inevitable specialization of AI tools. As the industry matures, tools and frameworks tend to narrow their focus to serve specific needs or markets. While ONNX may no longer be the go-to solution for mainstream workflows, it still serves valuable purposes in edge deployments and legacy systems. This demonstrates that while general-purpose tools can play a pivotal role during periods of fragmentation, specialization is often the key to long-term survival in a rapidly changing field.

Conclusion

Figure 4. ONNX is still good, but PyTorch 2.0 is just too strong.
The rise and fall of ONNX is a testament to the rapid pace of AI innovation. Tools that once felt indispensable can quickly fade as industry needs evolve. ONNX’s ascent during the computer vision era and its decline in the age of LLMs reflect a broader shift in priorities—from interoperability to integration, from general-purpose solutions to specialized, framework-native tools.
Yet, as ONNX fades into the background, it leaves behind an important question: Is the absence of a universal intermediate format truly sustainable? ONNX once provided a standardized way to deploy models across diverse hardware, simplifying the process of bringing new innovations into production. Now, with no widely adopted equivalent for modern AI models like LLMs, the landscape has become increasingly fragmented. Leading inference frameworks like TensorRT-LLM and vLLM rely heavily on manually written scripts and model definitions. TensorRT-LLM, for example, requires a collection of handcrafted model conversion scripts, while vLLM defines models explicitly in PyTorch. Keeping up with new architectures depends largely on continuous open-source contributions, requiring developers to manually implement support for each cutting-edge model.
But is this approach sustainable in the long run? As LLM architectures continue to evolve in complexity, will we eventually need another standardized model representation tailored specifically for LLM inference? Or will open-source efforts and vertically integrated frameworks like PyTorch continue to adapt, mitigating the need for an intermediate format altogether?
The story of ONNX is more than just history—it’s a lesson in adaptability and the ever-present need for better AI tooling. While ONNX may no longer be central to the AI ecosystem, its core idea—ensuring that the tools we build today can support the models of tomorrow—remains relevant.
So why revisit ONNX’s legacy in the era of LLMs? Because in many ways, we are trying to revive its spirit—to rethink and redefine a more intelligent, streamlined deployment path for LLM models, one that reduces the reliance on hand-coded scripts. Stay tuned as we explore this vision further in upcoming blog posts.