Thursday, June 20, 2024

When Colorless Green DNNs Sleep Furiously in an Unexplainable Fantasy – Communications of the ACM

Computer scienceWhen Colorless Green DNNs Sleep Furiously in an Unexplainable Fantasy – Communications of the ACM

It seems that the misguided hype surrounding deep learning and in turn artificial intelligence (AI) does not seem to be subsiding. In fact, with the advent of Large Language Models (LLMs) – and products therefrom, such as GPT-4, Sora, Copilot, Llama 3, etc., many have been raising up the ante as we started to hear of declarations such as “AGI (artificial general intelligence) is near,” or that “kids should stop learning to code because AI programming is here.” In this blurred atmosphere foundational research that shows the “in-theory” limitations of LLMs (or, for that matter, limitations of the overall deep neural networks architecture) are completely ignored. Paradigmatic problems in these models such as unexplainability, hallucinations, faulty reasoning, utter failures in complex problem solving and in non-trivial planning tasks, failure in analogical reasoning, etc., are all apologetically brushed aside. This state of affairs is unhealthy not only because we are deceiving a generation of young scientists and engineers, but also because this path hinders making progress outside of this black box and this in turn will only lead to disappointments once the house of cards collapses, a state of affairs that might trigger another very cold and severe AI Winter.

I will discuss here (i) some of the misguided claims being made; and (ii) some of the results regarding “in-theory” limitations of deep neural networks (DNNs)—results that are stubbornly being brushed aside or outright ignored. My hope, like in several other posts and blogs I have written about this subject, is to bring some sanity to the discussion and to start thinking seriously about other alternatives that could help in producing AI that is explainable, reliable, and scalable.

Colorless Green Fantasy #1:
Forget the Theory of Computation, AI will Soon Replace Programmers.

One of the recurring and utterly unscientific claims that have been surfacing in the media for some time now is the notion that AI can soon replace computer programmers. In fact, people in very senior positions have been saying something to the effect that young people should not study computer programming any more since ‘AI is the future programmer.’ These claims are utterly false, and not because of the current state of the art, but such claims can never be true, never! Automatically discovering formal (i.e., unambiguous) solutions to novel problems can never be automated. Anyone who is slightly acquainted with the theory of computation knows (or should know) this.

Let us suppose some AI came up with an algorithm A to solve a specific problem P. To ensure that A solves P, the AI has to execute A to test its functionality. This, presumably, is done by a test program T1. In executing T1 on A, the AI is essentially trying to solve the Halting Problem [1] since the test might never terminate and the AI will never know if the testing program T1 does not terminate or the original program A it is testing does not terminate. The AI can, if it does not care about the theory of computation and mathematical logic, try to construct another program T2, to test if T1 works well. Well, what if T2 does not terminate? The non-termination could be in the original program A, in the testing program T1, or in the new test program T2 that is testing T1.

Figure 1. The infinite regress of a program testing a program testing a program testing ad infinitum.

Should we continue in this infinite regress and design T3 to test T2? Clearly, automatic programming leads to the Halting Problem that is an instance of paradoxes involving ‘self-reference’—problems that always lead to an infinite regress (see [2]).

To conclude, we can never automate the construction (nor the testing) of computer programs. So says logic and so says the theory of computation. OK, Mr. CEO?

What AI can do is improve on our long dream of computer-aided software engineering (CASE) and help programmers automatically generate tedious code that we usually write over and over. This can be done—and modern tools are doing a lot of this by ‘cleverly’ stitching together snippets of code memorized and found to occur in similar contexts. This automation is very helpful and will increase productivity, but any talk of fully automating human intuition (Turing’s Oracle!) in coming up with novel solutions to complex problems is fantasy and the computer science police department (CSPD) should issue a citation for any such talk.

Colorless Green Fantasy #2
We are Making Progress in Explainable AI (XAI).

First of all, I classify such claims as ‘fantasy’ only in regard to AI that is exclusively based on the subsymbolic architecture of neural networks (deep or otherwise). There can be no explainability in such models and I am not sure why this is hard to understand. Explainability of how/why F(x1, x2, xn) produced some output y is doing inference in reverse—it is about backtracking and explaining every step of the computation in terms of the original inputs x1, x2, …, xn. In symbolic systems, this is accomplished by maintaining a ‘semantic map’ of the computation, usually using some symbolic structure like an abstract syntax tree or the computation graph. In other words, explainability requires ‘structured semantics’ [3] that do not exist in neural networks—and we have known for some time these models cannot represent or model symbolic structures [4]. I tried to explain this in a previous post (summarized in Figure 2A), but I recently received comments saying my visual explanation (Figure 2B) was more effective in driving the point home.

If neither argument convinces you that purely distributed subsymbolic architectures are unexplainable, then I cannot perhaps convince you that the Earth is not flat. Now, far from being flat-earthers, there are, of course, those who acknowledge the fact that distributed subsymbolic architectures are unexplainable, such as the AI luminary Yann LeCun, but their solution to the problem is to ignore the problem, or dismiss it, stating things like explainability is “not necessary”, or even by stating that “sometimes, requiring explainability is counterproductive” (requiring explainability as to why a system decided to refuse an applicant, or to why a system decided to fire a nuclear missile, is unproductive?) In any case, the argument to ignore the problem is still more intelligent than denying the problem, because all the handwaving about making progress on explainability in purely distributed subsymbolic systems (e.g., DNNs) is certainly pure fantasy.

Figure 2. Two illustrations of why explainability cannot be attained in distributed subsymbolic systems, such as DNNs.

Fantasy #3
LLMs are the first step towards AGI

One of the most interesting papers I have read in the past year or so is entitled “Can Deep CNNs Avoid Infinite Regress/Circularity in Content Constitution?”[4] While the title mentions CNNs (convolutional neural networks), the arguments in the paper apply to all DNNs. The paper clearly shows (and in many ways reiterates previous results from decades earlier, most notably by Fodor and his colleagues) that concepts in distributed subsymbolic systems are not well-defined because they are determined by “similarity,” and similarity without identity leads to an infinite regress. This result is very much related to the research that also confirms that distributed subsymbolic models are inherently non-symbolic.[3] In both cases the research reaffirms the fact that DNNs cannot perform reasoning that requires storage of, reference to, and quantification over symbolic objects. But if this is the case, then many tasks that are a prerequisite for any AGI are beyond even the most advanced large language models (LLMs). One of these tasks is planning, which is the basic task for any autonomous agent/robot. In [5], it is reported that even the most advanced LLMs performed so poorly that even making random plans would have scored higher. Anyone with access to a large language model can confirm this in a few minutes. We conducted several of these tests, with simple planning tasks like the ones shown in Figure 3.

Figure 3. Two simple planning tasks that all LLMs failed miserably in solving.

Although a child could solve the problems in Figure 3 very easily (although admittedly, they might not formulate “the” optimal plan), the most advanced LLMs tested not only failed to reach the goal state, but the plans suggested did not confirm to any reasonable strategy—the plan was almost a random (rambling) set of steps. This should not be surprising for anyone familiar with the “theoretical” importance of object identity and the need to be able to store and make reference to symbolic variables when solving such problems. As stated above, distributed subsymbolic systems are inherently non-symbolic, and they lack any notion of object identity, and so they cannot be relied upon in solving such tasks that require planning. Given these facts, it is beyond astonishing to hear anyone talk of AGI being near when all this talk is fueled by a mere next token prediction!

AGI, as it has been defined, is intelligence that is at the level of or surpassing human-level intelligence. This requires not only mastery of planning and everyday problem solving, tasks that even the most advanced LLM cannot even attempt to solve, but that level of intelligence requires solving long-standing and still-unsolved problems in AI such as the frame problem, not to mention the paradox of inverted complexity. The bottom line here is this: while the data-driven bottom-up strategy in the reverse-engineering of language at scale (which is what LLMs are) has resulted in impressive performance in text generation, extrapolating this into suggesting AGI is near is also a fantasy, and this, too, requires immediate attention from the computer science police department (CSPD).

Fantasy #4
LLMs understand language.

Having argued that LLMs will not be the next AI programmers, and that they will not (if they are built exclusively in the distributed subsymbolic architecture) be explainable, and they will not be able to perform simple tasks that require planning and problem-solving skills, I will now also dispel some misconceptions that relate to LLMs and their so-called success in natural language understanding. Contrary to popular (‘folk’) belief, LLMs do not truly understand language, for the same reason they fail in general problem-solving and planning, the same reason they are unexplainable.

While most testing of LLMs involve tasks that involve text generation (e.g., summarization, question answering, content retrieval), the real test for the language competency of LLMs should be done in reverse. By their design LLMs are excellent as text generators, but to test their “understanding,” you should provide text (as opposed to asking for text) and query their understanding of the given text. One easy way to do this is to give an LLM some text (a few sentences or a small paragraph) and ask the LLM to show all the entities and relations that are implicit in that text. A simple example is shown below:


Show me the entities and relations implicit in the text below using a formal and unambigious representation; for example, in RDF or Prolog notation.

I was in Barcelona when the city was celebrating their win over Real Madrid as it also getting ready to vote for independence.






in_city(self, barcelona1).

won_over(barcelona1, real_madrid1).

celebrating(barcelona1, celebration_win1).

preparing_for(barcelona1, preparation_vote1).

While the LLM was able to grasp the low-hanging fruits, recognizing ‘Barcelona’ as a city, and that the win and vote are events, deeper understanding of the text, as small as it is, was too far off. For one thing, Barcelona the city—i.e., the geographic location, is apparently what won over Real Madrid (not the FC Barcelona football club), and Barcelona the geographic location is also what is celebrating and preparing to vote. What LLMs cannot cope with (precisely because they do not deal with object identity and symbolic references) is the fact that in language, we often use a single entity to refer to multiple semantic types at once (a phenomenon known by ‘co-predication’ [6]). The same will happen if you ask an LLM to show you the entities and relations implicit in “I threw away the newspaper I was reading after I knew they fired my favorite columnist.” Again, the most advanced LLMs will determine the same physical object newspaper (that someone threw away) was the one that fired the columnist, as opposed to another entity implicitly implied by the same reference. Without dwelling too much on these types of tests that show the (lack of) language understanding, I wanted to also stress the point that LLMs can also be tested for basic language understanding and the results are far from impressive.

Concluding Remarks

While much progress has been made in AI, we are nowhere near speaking of AGI. Even the main forte of LLMs, namely language, is not a done deal, not by a long shot. In almost all areas of AI, all indications are that we need to incorporate new techniques and new technologies, and almost all the reported theoretical limitations point in one direction: there can be no reliable, explainable, predictable, and inferential AI without incorporating symbolic knowledge and symbolic reasoning.

Regardless of where you stand on the symbolic vs. subsymbolic (or the hybrid, or the modular and emergent paradigms), any talk of AGI at this point is fantasy. Existing AI cannot yet perform tasks that for a 4-year-old are second-nature, including many tasks that require language understanding.

If we are not going to police ourselves, and if respectable venues like journals and conferences and media outlets are not policing themselves, should we as a community seriously think about creating some sort of a Computer Science Police Department (CSPD)?


[1] The Halting Problem, Stanford Encyclopedia of Philosophy

[2] Andrea Cantini, 2004, Paradoxes, Self-reference and, Truth in the 20th Century, Handbook of the History of Logic. Volume 6, Dov M. Gabbay and John Woods (Editors), Elsevier.

[3] Kieran Browne and Ben Swift, 2020. Semantics and explanation: why counterfactual explanations produce adversarial examples in deep neural networks, Available online at

[4] Tim van Gelder, 1990. Why Distributed Representation is Inherently non-Symbolic, In Dorffner (ed.), Konnektionismus in Artificial Intelligence und Kognitionsforschung, Springer-Verlag Berlin Heidelberg.

[5] Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, Subbarao Kambhampati. 2023. On the Planning Abilities of Large Language Models – A Critical Investigation, In Advances in Neural Information Processing Systems 36 (NeurIPS 2023)

[6] Nicholas Asher and James Pustejovsky. 2005. Word Meaning and Commonsense Metaphysics. In: Course Materials for Type Selection and the Semantics of Local Context, ESSLLI 2005.

Walid Saba

Walid Saba is Senior Research Scientist at the Institute for Experiential AI at Northeastern University. He has published over 45 articles on AI and NLP, including an award-winning paper at KI-2008.

Check out our other content

Check out other tags:

Most Popular Articles