Too many AI models

An inundation of AI models, with a new crop arriving every week

 


How many AI models is too many? It depends on how you look at it, but 10 a week is probably a bit much. That’s roughly how many we’ve seen roll out in the last few days, and it’s increasingly hard to say whether and how these models compare to one another, if it was ever possible to begin with. So what’s the point?

 

We’re at a weird time in the evolution of AI, though of course it’s been pretty weird the whole time. We’re seeing a proliferation of models large and small, from niche developers to large, well-funded ones.

 

Too many AI models (image credit: https://images.app.goo.gl/eMAfUFL5H8qp4ktv8)

 

 

Let’s just run down the list from this week, shall we? I’ve tried to condense what sets each model apart.

  • LLaMa-3: Meta’s latest flagship large language model, billed as “open” despite ongoing debate over the term, and already widely used by the community.
  • Mistral 8×22: A large “mixture of experts” model from a French outfit that has lately shied away from the openness it once embraced.
  • Stable Diffusion 3 Turbo: An upgraded SD3 built to go with Stability’s new, somewhat open API. Borrowing “turbo” from OpenAI’s model naming is a little odd, but fine.
  • Adobe Acrobat AI Assistant: A “talk to your documents” feature from the dominant document company; whatever extras it offers, it is likely mostly a wrapper around ChatGPT.
  • Reka Core: A multimodal model built from scratch by a small team formerly at major AI firms, and at least on paper competitive with the big players.
  • Idefics2: A newly unveiled, more open multimodal model, built on top of recent, smaller Mistral and Google models.
  • OLMo-1.7-7B: A larger version of AI2’s LLM, notable for its high degree of openness and positioned as a stepping stone to a future 70B-scale model.
  • Pile-T5: A version of the reliable T5 model fine-tuned on the Pile code database; the same T5 you know, but better suited to coding tasks.
  • Cohere Compass: An “embedding model” focused on incorporating multiple data types so it can cover more use cases; a brief sketch of how embedding models are typically used appears after this list.
  • Imagine Flash: Meta’s latest image generation model, which uses a new distillation method to speed up diffusion without giving up much quality.
  • Limitless: That makes eleven models, one of which was announced while this piece was being drafted.

That still isn’t every model released or previewed this week, just the ones we saw and discussed. If the criteria for inclusion were relaxed there would be dozens more: fine-tuned versions of existing models, combined models like Idefics 2, experimental or niche ones, and so on. This week has also brought new tools for building with (torchtune) and battling against (Glaze 2.0) generative AI.
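
Since “embedding model” can be an opaque term, here is a minimal sketch of the retrieval pattern such a model typically serves. This is not Cohere’s actual API: the embed() function below is a toy hashed bag-of-words stand-in, and a real embedding model like Compass would produce far more meaningful vectors across multiple data types.

# Minimal sketch of embedding-based retrieval. The embed() function is a toy
# stand-in for a real embedding model, not any vendor's API.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hash each token into a fixed-size unit vector."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Index a few "documents"; in practice these might be emails, tables, or tickets.
docs = [
    "Quarterly revenue grew 12% on strong ad sales",
    "The new diffusion model generates images in two steps",
    "Meeting notes: migrate the database before Friday",
]
doc_vecs = np.stack([embed(d) for d in docs])

# Retrieve the document closest to a query by cosine similarity
# (a plain dot product works because every vector is already normalized).
query_vec = embed("which model generates images quickly")
scores = doc_vecs @ query_vec
best = int(np.argmax(scores))
print(docs[best], float(scores[best]))

The point is only that an embedding model turns arbitrary inputs into vectors you can compare; how many kinds of input it handles well is what products like Compass compete on.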

 

The incessant flood of new AI models makes it nearly impossible to keep up with, let alone review, each one, and fortunately you don’t need to. Some models, such as ChatGPT and Gemini, have evolved into entire web platforms, spanning many use cases and access points. Other large language models, like LLaMa or OLMo, share a fundamental architecture but don’t fill the same role: they work more as background services or components than as front-facing brands.

 

There’s intentional ambiguity around these distinctions, as developers aim to capture some of the attention that comes with a major AI platform release, such as GPT-4V or Gemini Ultra. Everyone wants their release to be perceived as significant, but in reality its importance is usually limited to specific users or applications rather than the broader audience.

 

The evolution of cars is an apt analogy for the proliferation of AI models. In the early days of automobiles there were only a few options: big cars, small cars, tractors. Similarly, in the early stages of AI development the landscape was relatively simple, with only a handful of models to consider.

 

As the technology advanced, however, the number of car models exploded, and today hundreds of new cars are released every year. Likewise, AI is now seeing a constant stream of new models. And just as you don’t need to know about every car that ships, you don’t need to keep up with every AI model; many of them won’t be relevant to your needs or to your understanding of AI.

 

Moreover, this era of proliferation didn’t start with big models like ChatGPT. It has been building for years, through constant research, development, and experimentation. Long before these large models emerged, papers, models, and research projects were being published regularly, and conferences like SIGGRAPH and NeurIPS were the hubs where machine learning engineers exchanged ideas and built on one another’s work.

 

So, while the recent surge in attention on big AI models is notable, it’s important to recognize that AI development has been progressing steadily for years, laying the groundwork for this breakthrough moment.

 

That activity continues to reshape the landscape daily. But because AI has become so heavily commercialized, each new release is scrutinized more closely, with people asking whether it represents a leap comparable to the one ChatGPT made over its predecessors.

 

Yet the reality is that none of these models is likely to bring about such a monumental shift. OpenAI’s breakthrough with ChatGPT rested on a fundamental change in machine learning architecture, one that other companies have since widely adopted and that has yet to be surpassed. So the improvements we can expect from new models are likely to be incremental: slightly better scores on synthetic benchmarks, marginally more convincing language or imagery.

 

Each iteration still matters, of course. Getting from version 2.0 to 3.0 takes a series of incremental updates, 2.1, 2.2, 2.2.1, and so on, each contributing to the evolution of the technology. Not every update gets widespread attention, but some advances really are significant, addressing critical shortcomings or revealing unforeseen vulnerabilities.

 

We strive to highlight the most noteworthy developments, which are only a fraction of the models being built. In fact, we are currently working on a piece comparing the models we think anyone interested in machine learning should know about, and that list runs to around a dozen.

 

Rest assured, when a truly groundbreaking model emerges, its importance will be readily apparent to both experts and the general public alike. It will be as obvious to you as it is to us.
