Transformers Can Solve Any Problem



In many cases, LLMs are turning out to be the default solution to business problems, including domains that do not require language understanding. It seems like Transformers are indeed all we need. 

Reiterating this, Denny Zhou, research director at Google DeepMind, recently released a new paper. Sharing the research on X, Zhou said, “We have mathematically proven that Transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed.”

This echoes AI researcher Andrej Karpathy’s recent remarks on next-token prediction frameworks, suggesting that they could become a universal tool for solving a wide range of problems, far beyond language alone. 

LLMs are not really just “language experts” anymore. According to Karpathy, the “language” part is now largely historical: these models were first trained to predict the next word in a sentence, but in reality they can work on any kind of data that is broken down into little pieces, called tokens.
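
To make the idea of tokens concrete, here is a minimal sketch using OpenAI’s tiktoken library (the library and encoding name are illustrative choices, not something from the paper): any data that can be mapped to a sequence of integer IDs can, in principle, feed a next-token predictor.

# Minimal tokenisation sketch using the tiktoken library (illustrative choice).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Text is just one kind of data; anything serialisable into discrete
# token IDs can, in principle, be fed to a next-token predictor.
tokens = enc.encode("Transformers can solve any problem")
print(tokens)               # a list of integer token IDs
print(enc.decode(tokens))   # round-trips back to the original string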

Zhou’s paper makes a similar point for LLMs. The research focuses primarily on chain of thought (CoT), which provides a “road map” for LLMs to follow when solving complex problems. 

A YouTube video explaining the significance of the paper put it this way: with CoT, you are not simply handing the AI the puzzle pieces to put together; you are making it show why the pieces fit the way they do. 
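
To see what this looks like in practice, here is a minimal CoT prompting sketch against a chat-completion API (the client, model name, and prompt wording are assumptions for illustration, not the paper’s setup); the essential ingredient is simply that the model is asked to emit its intermediate reasoning tokens before the final answer.

# Minimal chain-of-thought prompting sketch (illustrative; the model name
# and client usage are assumptions, not the setup used in the paper).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Reason step by step, then give the final answer on its own line."},
        {"role": "user", "content": question},
    ],
)

# The reply contains the intermediate reasoning steps followed by the answer.
print(response.choices[0].message.content)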

Chain of Thought and Reasoning

Without CoT, Transformers can only solve problems that admit fast parallel computation (the AC0/TC0 complexity classes). With CoT, they can carry out serial computation and therefore tackle a broader class of harder problems.

It all comes down to how well a model can reason. The first iteration of System 2 LLMs arrived with OpenAI’s o-series models, which combine CoT with reasoning tokens.

Zhiyuan Li, assistant professor at the Toyota Technological Institute at Chicago and lead contributor to the research paper, said the result proves that CoT enables more iterative compute to solve inherently serial problems. “On the other hand, a const-depth transformer that outputs answers right away can only solve problems that allow fast parallel algorithms,” he added.

Li further shared an image suggesting that models using CoT can solve more complex problems that require many steps in sequence. On the other hand, models without this ability can only handle simpler problems that can be solved quickly in parallel.


With techniques like CoT, we are moving towards explainable AI systems and slowly away from black-box models. A Reddit user noted that CoT also makes the inner workings of an LLM traceable: “The black box of latent space would make it harder for us humans to understand how the model is performing the reasoning. There is a huge benefit to explainable AI.” 

Number of Tokens Matters the Most

CoT goes beyond solving maths problems, and users have started comparing it to Turing Machines, a theoretical computational model that defines an abstract machine capable of simulating any computer algorithm.

For example, the Google researchers demonstrate the result on two classic computational problems: the Circuit Value Problem (CVP) and the Permutation Composition Problem. In both cases, enabling CoT allows Transformers, especially low-depth ones, to solve these inherently serial problems far more effectively than without it. 

This aligns with the paper’s theoretical predictions about CoT’s ability to empower Transformers to handle serial computations.
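
To get a feel for why permutation composition is “inherently serial”, here is a toy sketch (an illustration, not code from the paper): the final permutation is only known after composing the whole sequence step by step, and each step needs the result of the previous one, much like intermediate reasoning tokens.

# Toy sketch of the Permutation Composition Problem (illustration only,
# not code from the paper). Each step depends on the previous result,
# which is what makes the problem serial rather than parallel.
from typing import List

def compose(perms: List[List[int]]) -> List[int]:
    n = len(perms[0])
    result = list(range(n))  # start from the identity permutation
    for p in perms:
        # Apply p on top of the composition built so far.
        result = [p[result[i]] for i in range(n)]
    return result

# Example: composing three permutations of {0, 1, 2}.
print(compose([[1, 2, 0], [2, 0, 1], [0, 2, 1]]))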

In a discussion on Hacker News, one user argued that CoT could push LLMs closer to the theoretical limits of computation as represented by Turing machines. “Since LLMs operate as at least a subset of Turing machines, the chain of thought approach could be equivalent to or even more expressive than that subset. In fact, CoT could perfectly be a Turing machine,” he added.

The CoT method, while extremely useful, consumes many extra tokens, which makes it both costlier and slower to respond. Justin, the founder of an AI LegalTech startup, raised questions from a practical standpoint about time and cost. 

There are a few things to consider when it comes to cost. The first is Mosaic’s law, which states that the cost of training falls by about 75% per year. The other is Koomey’s law, which states that the energy efficiency of computation doubles roughly every 1.5 years. 
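
As a rough back-of-the-envelope sketch using only the two figures quoted above (no new data), compounding those rates shows how quickly cost and energy per unit of compute would be expected to fall:

# Back-of-the-envelope projection using the figures quoted above:
#   Mosaic's law -> training cost falls ~75% per year (x0.25 per year)
#   Koomey's law -> energy efficiency doubles roughly every 1.5 years
for years in (1, 2, 3):
    cost_factor = 0.25 ** years
    energy_factor = 1 / 2 ** (years / 1.5)
    print(f"After {years} year(s): cost x{cost_factor:.4f}, "
          f"energy per computation x{energy_factor:.4f} of today's level")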

These are useful metrics for understanding how computation becomes cheaper and more energy-efficient over time, which is vital for sustainable technology development, and the same logic applies to the growing number of tokens that CoT consumes.

