Tuesday, July 02, 2024

messy AI milestones

For me it is VERY useful to have a list of AI milestones with dates. This defines the ballpark, which is much, much bigger than ChatGPT. It provides a framework which helps inform future focus. The comments I've added are there as a self-guide to future research. So, they often do hint at my favourites.

Keep in mind that there are at least four different types of AI: Symbolic, Neural Networks (aka Connectionist), Traditional Robotics, and Behavioural Robotics, as well as hybrids. For some events in the timeline it is easy to map to the AI type, but for others it is not so easy.

1943: Warren McCulloch, a neurophysiologist, and Walter Pitts, a logician, teamed up to develop a mathematical model of an artificial neuron. In their paper "A Logical Calculus of the Ideas Immanent in Nervous Activity" they declared that:
Because of the “all-or-none” character of nervous activity, neural events and the relations among them can be treated by means of propositional logic. It is found that the behavior of every net can be described in these terms.
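To make that idea concrete, here is a minimal sketch (mine, in Python, not theirs) of such an all-or-none threshold unit computing propositional connectives:

```python
# A sketch (not from the 1943 paper) of a McCulloch-Pitts style unit:
# binary inputs, fixed weights, and an all-or-none threshold output.

def mcp_unit(inputs, weights, threshold):
    """Fire (output 1) iff the weighted sum of inputs reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

# Such units can realise propositional connectives, e.g. AND and OR.
AND = lambda a, b: mcp_unit([a, b], [1, 1], threshold=2)
OR = lambda a, b: mcp_unit([a, b], [1, 1], threshold=1)

assert [AND(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 0, 0, 1]
assert [OR(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 1]
```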
1950: Alan Turing publishes "Computing Machinery and Intelligence" ('The Imitation Game', later known as the Turing Test)
1952: Arthur Samuel implemented a program that could play checkers against a human opponent

1954: Marvin Minsky submitted his Ph.D. thesis at Princeton in 1954, titled Theory of Neural-Analog Reinforcement Systems and its Application to the Brain-Model Problem; two years later Minsky had abandoned this approach and was a leader in the symbolic approach at Dartmouth.

1956: The Dartmouth Workshop organised by John McCarthy coined the term Artificial Intelligence. He said it would explore the hypothesis that "every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it."
The main descriptor for the favoured approach was Symbolist: based on logical reasoning with symbols. Later this approach was often referred to as GOFAI or Good Old Fashioned AI.

Knowledge can be represented by a set of rules, and computer programs can use logic to manipulate that knowledge. Leading symbolists Allen Newell and Herbert Simon argued that if a symbolic system had enough structured facts and premises, the aggregation would eventually produce broad intelligence.

Marvin Minsky, Allen Newell and Herb Simon, together with John McCarthy, set the research agenda for machine intelligence for the next 30 years. All were inspired by earlier work by Alan Turing, Claude Shannon and Norbert Wiener on tree search for playing chess. From this workshop, tree search — for game playing, for proving theorems, for reasoning, for perceptual processes such as vision and speech and for learning — became the dominant mode of thought.

1957: Connectionists: Frank Rosenblatt invents the perceptron, a system which paves the way for modern neural networks
The connectionists, inspired by biology, worked on "artificial neural networks" that would take in information and make sense of it themselves. The pioneering example was the perceptron, an experimental machine built by the Cornell psychologist Frank Rosenblatt with funding from the U.S. Navy. It had 400 light sensors that together acted as a retina, feeding information to about 1,000 "neurons" that did the processing and produced a single output. In 1958, a New York Times article quoted Rosenblatt as saying that "the machine would be the first device to think as the human brain."
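The learning rule itself is tiny. Here is my toy sketch of it (illustrative only; the Mark I was of course a 400-photocell hardware machine, not a little program):

```python
import numpy as np

# A toy sketch of the perceptron learning rule.
def train_perceptron(X, y, epochs=20, lr=1.0):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            err = target - pred                 # -1, 0, or +1
            w += lr * err * xi                  # nudge weights toward the correct answer
            b += lr * err
    return w, b

# A linearly separable toy problem: the OR function of two inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])
w, b = train_perceptron(X, y)
print([1 if xi @ w + b > 0 else 0 for xi in X])  # -> [0, 1, 1, 1]
```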

In their book Perceptrons, the symbolic advocates Minsky and Papert critiqued perceptrons as very limited in what they could achieve. The Symbolists won this funding battle.

1959: John McCarthy noted the value of commonsense knowledge in his pioneering paper "Programs with Common Sense" [McCarthy1959]

1959: Arthur Samuel published a paper titled "Some Studies in Machine Learning Using the Game of Checkers", the first time the phrase "Machine Learning" was used. Earlier there had been models of learning machines, but this was a more general concept.

1960: Frank Rosenblatt published results from his hardware Mark I Perceptron, a simple model of a single neuron, and tried to formalize what it was learning.

1960: Donald Michie built a machine that could learn to play the game of tic-tac-toe (Noughts and Crosses in British English) out of 304 matchboxes: small rectangular boxes which were the containers for matches, each with an outer cover and a sliding inner box to hold the matches. He put a label on one end of each of these sliding boxes, and carefully filled them with precise numbers of colored beads. With the help of a human operator, mindlessly following some simple rules, he had a machine that could not only play tic-tac-toe but could learn to get better at it.

He called his machine MENACE, for Matchbox Educable Noughts And Crosses Engine, and published a report on it in 1961.
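The mechanism is easy to sketch in code. This is my simplified illustration of the bead-and-matchbox idea, not Michie's exact design or bead counts:

```python
import random
from collections import defaultdict

# One "matchbox" per board position seen, holding beads for each legal move.
boxes = defaultdict(dict)          # board state (any hashable key) -> {move: bead count}

def choose_move(state, legal_moves, initial_beads=3):
    box = boxes[state]
    for m in legal_moves:          # stock the box the first time this position is seen
        box.setdefault(m, initial_beads)
    moves, beads = zip(*box.items())
    return random.choices(moves, weights=beads)[0]

def reinforce(history, won):
    """history: list of (state, move) pairs the machine played in one game."""
    for state, move in history:
        if won:
            boxes[state][move] += 3                              # add beads after a win
        else:
            boxes[state][move] = max(1, boxes[state][move] - 1)  # remove a bead after a loss
```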

1960s: Symbolic AI in the 1960s was able to successfully simulate the process of high-level reasoning, including logical deduction, algebra, geometry, spatial reasoning and means-ends analysis, all of it expressed in precise English sentences, just like the ones humans used when they reasoned. Many observers, including philosophers, psychologists and the AI researchers themselves, became convinced that they had captured the essential features of intelligence. This was not just hubris or speculation: it was entailed by rationalism. If it were not true, it would bring into question a large part of the entire Western philosophical tradition.

Continental philosophy, which included Nietzsche, Husserl, Heidegger and others, rejected rationalism and argued that our high-level reasoning is limited, prone to error, and that most of our abilities come from our intuitions, our culture, and our instinctive feel for the situation. Philosophers who were familiar with this tradition, such as Hubert Dreyfus and John Haugeland, were the first to criticize GOFAI (Good Old Fashioned AI) and the assertion that it was sufficient for intelligence.

1963: First PhD thesis on computer vision, by Larry Roberts at MIT

1963: The philosopher John Haugeland, in his 1985 book "Artificial Intelligence: The Very Idea", asked these two questions:
  • Can GOFAI produce human level artificial intelligence in a machine?
  • Is GOFAI the primary method that brains use to display intelligence?
AI founder Herbert A. Simon speculated in 1963 that the answers to both of these questions were "yes". His evidence was the performance of programs he had co-written, such as Logic Theorist and the General Problem Solver, and his psychological research on human problem solving.

1966: Joseph Weizenbaum creates the Eliza Chatbot, an early example of natural language processing.
1967: MIT professor Marvin Minsky wrote: "Within a generation...the problem of creating 'artificial intelligence' will be substantially solved."

1968: Origin of Traditional Robotics, an approach to Artificial Intelligence: Donald Pieper's "The Kinematics of Manipulators Under Computer Control", at the Stanford Artificial Intelligence Laboratory (SAIL) in 1968.

1969-71: The classical AI "blocksworld" system SHRDLU, designed by Terry Winograd (mentor to Google founders Larry Page and Sergey Brin), revolved around an internal, updatable cognitive model of the world that represented the software's understanding of the locations and properties of a set of stacked physical objects (Winograd, 1971). SHRDLU carried on a simple dialog (via teletype) with a user about a small world of objects (the BLOCKS world) shown on an early display screen (a DEC-340 attached to a PDP-6 computer).

1979: Hans Moravec builds the Stanford Cart, one of the first autonomous vehicles (outdoor capable)

1980s: Backpropagation and multi-layer networks used in neural nets (only 2 or 3 layers)

1980s: Rule-based Expert Systems, a more heuristic form of logical reasoning with symbols, encoded the knowledge of a particular discipline, such as law or medicine

1984: Douglas Lenat (1950-2023) began work on a project he named Cyc that aimed to encode common sense in a machine. Lenat and his team added terms (facts and concepts) to Cyc's ontology and explained the relationships between them via rules. By 2017, the team had 1.5 million terms and 24.5 million rules. Yet Cyc is still nowhere near achieving general intelligence. Doug Lenat made the representation of common-sense knowledge in machine-interpretable form his life's work.
Alan Kay's speech at Doug Lenat's memorial

1985: Robotics loop closing (Rodney Brooks, Raja Chatila) – if a robot sees a landmark a second time it can tighten up on uncertainties

1985: Origin of behaviour-based robotics. Rodney Brooks wrote "A Robust Layered Control System for a Mobile Robot" in 1985, which appeared in a journal in 1986, introducing what was then called the Subsumption Architecture. This later became the behavior-based approach to robotics and eventually, through technical innovations by others, morphed into behavior trees.

This has led to more than 20 million robots in people’s homes, numerically more robots by far than any other robots ever built, and behavior trees are now under the hood of two thirds of the world’s video games, and many physical robots from UAVs to robots in factories.
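The core of a behavior tree is very small. A minimal sketch of mine, with hypothetical robot actions and not any particular engine's API:

```python
# A minimal behavior-tree sketch (illustrative; game engines and robot
# frameworks use far richer node types and tick semantics).

class Action:
    def __init__(self, fn):
        self.fn = fn
    def tick(self):
        return self.fn()                       # each tick returns "success" or "failure"

class Sequence:                                # succeeds only if every child succeeds, in order
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for c in self.children:
            if c.tick() != "success":
                return "failure"
        return "success"

class Selector:                                # tries children in order until one succeeds
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for c in self.children:
            if c.tick() == "success":
                return "success"
        return "failure"

# Hypothetical home-robot logic: dock if the battery is low, otherwise keep cleaning.
battery_low = Action(lambda: "failure")        # stand-in for a real sensor check
dock = Action(lambda: print("docking") or "success")
clean = Action(lambda: print("cleaning") or "success")
root = Selector(Sequence(battery_low, dock), clean)
root.tick()                                    # prints "cleaning"
```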

1986: Marvin Minsky publishes "The Society of Mind". A mind grows out of an accumulation of mindless parts.
1986: David Rumelhart, Geoffrey Hinton, and Ronald Williams published a paper Learning Representations by Back-Propagating Errors, which re-established the neural networks field using a small number of layers of neuron models, each much like the Perceptron model. There was a great flurry of activity for the next decade until most researchers once again abandoned neural networks.
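The algorithm itself fits in a few lines. Here is my toy sketch of backpropagation on a two-layer network learning XOR, the kind of problem a single perceptron cannot solve (illustrative only; the 1986 paper treats the general multi-layer case):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)     # 2 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)     # 4 hidden -> 1 output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    h = sigmoid(X @ W1 + b1)                      # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)           # backward pass: propagate errors
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 1.0 * h.T @ d_out; b2 -= 1.0 * d_out.sum(axis=0)
    W1 -= 1.0 * X.T @ d_h;   b1 -= 1.0 * d_h.sum(axis=0)

print(out.round(2).ravel())   # approaches [0, 1, 1, 0] (may vary with the random seed)
```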

1986: Perhaps the most pivotal work in neural networks in the last 50 years was the multi-volume Parallel Distributed Processing (PDP) by David Rumelhart, James McClelland, and the PDP Research Group, released in 1986 by MIT Press. Chapter 1 lays out a similar hope to that shown by Rosenblatt:
People are smarter than today's computers because the brain employs a basic computational architecture that is more suited to deal with a central aspect of the natural information processing tasks that people are so good at. ...We will introduce a computational framework for modeling cognitive processes that seems… closer than other frameworks to the style of computation as it might be done by the brain.

Rumelhart and McClelland dismissed symbol-manipulation as a marginal phenomenon, “not of the essence of human computation”.
1986: The term Deep Learning was introduced to the machine learning community by Rina Dechter in 1986

1987: Chris Langton instigated the notion of artificial life (Alife) at a workshop in Los Alamos, New Mexico, in 1987. The enterprise was to make living systems without the direct aid of biological structures. The work was inspired largely by John von Neumann and his early work on self-reproducing machines in cellular automata.

1988: One of Hinton's postdocs, Yann LeCun, went on to AT&T Bell Laboratories in 1988, where he and a postdoc named Yoshua Bengio used neural nets for optical character recognition; U.S. banks soon adopted the technique for processing checks. Hinton, LeCun, and Bengio eventually shared the Turing Award (the 2018 award, announced in 2019) and are sometimes called the godfathers of deep learning.

Late 1980s: The market for expert systems crashed because they required specialized hardware and couldn't compete with the cheaper desktop computers that were becoming common

1989: “Knowledge discovery in databases” started as an off-shoot of machine learning, with the first Knowledge Discovery and Data Mining workshop taking place at an AI conference in 1989 and helping to coin the term “data mining” in the process

1989: “Fast, Cheap, and Out of Control: A Robot Invasion of the Solar System”, by Rodney Brooks and Anita Flynn, in which they proposed the idea of small rovers to explore planets, and explicitly Mars, rather than the large ones that were under development at that time

1991: Rodney Brooks published “Intelligence without Reason”. This is both a critique of existing AI being determined by the current state of computers and a suggestion for a better way forward based on emulating insects (behavioural robotics)
1991: Simultaneous Localisation and Mapping (SLAM), Hugh Durrant-Whyte and John Leonard: symbolic representations replaced with geometry plus statistical models of uncertainty (used in self-driving cars, in navigation and data collection by quadcopter drones, and with inputs from GPS)
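The flavour of "statistical models of uncertainty" is easy to show in one dimension. This is my toy illustration of the predict/update idea, not SLAM itself:

```python
# Fuse a motion prediction with a landmark measurement, each carrying its own uncertainty.

def predict(mean, var, motion, motion_var):
    return mean + motion, var + motion_var        # moving adds uncertainty

def update(mean, var, measurement, meas_var):
    k = var / (var + meas_var)                    # Kalman gain
    return mean + k * (measurement - mean), (1 - k) * var

pose, pose_var = 0.0, 1.0
pose, pose_var = predict(pose, pose_var, motion=1.0, motion_var=0.5)
pose, pose_var = update(pose, pose_var, measurement=0.9, meas_var=0.2)
print(pose, pose_var)    # variance shrinks after re-observing the landmark
```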

1997: IBM's Deep Blue defeats world chess champion Garry Kasparov
1997: Soft landing of the Pathfinder mission to Mars. Later that afternoon, to hearty cheers, the Sojourner robot rover deployed onto the surface of Mars, the first mobile ambassador from Earth
Early 2000s: New symbolic-reasoning systems appeared, based on algorithms capable of solving a class of problems called 3SAT, along with another advance, simultaneous localization and mapping. SLAM (Simultaneous Localisation and Mapping) is a technique for building maps incrementally as a robot moves around in the world

2001: Rodney Brooks' company iRobot, on the morning of September 11, sent robots to ground zero in New York City. Those robots scoured nearby evacuated buildings for any injured survivors who might still be trapped inside.

2001-11: PackBot robots from iRobot were deployed in the thousands in Afghanistan and Iraq, searching for nuclear materials in radioactive environments and dealing with roadside bombs by the tens of thousands. By 2011 there was almost ten years of operational experience with thousands of robots in harsh wartime conditions, with humans in the loop giving supervisory commands

2002: iRobot (Rodney Brooks' company) introduced the Roomba
2005: The DARPA (Defense Advanced Research Projects Agency) Grand Challenge was won by Stanford's driverless car, which drove 211 km on an unrehearsed route

2006: Geoffrey Hinton and Ruslan Salakhutdinov published "Reducing the Dimensionality of Data with Neural Networks", where an idea called clamping allowed the layers to be trained incrementally. This made neural networks undead once again, and in the last handful of years this deep learning approach has exploded into practical machine learning

2009: Foundational work on neurosymbolic models (d'Avila Garcez, Lamb, & Gabbay, 2009) examined the mappings between symbolic systems and neural networks

2010s: Neural nets learning from massive data sets

2011: A week after the tsunami, on March 18th 2011, when Brooks was still on the board of iRobot, the company got word that perhaps its robots could be helpful at Fukushima. iRobot rushed six robots to Japan, donating them and not worrying about ever getting reimbursed: the robots were on a one-way trip. Once they were sent into the reactor buildings they would be too contaminated to ever come back. iRobot sent people to train TEPCO staff on how to use the robots, and they were soon deployed, even before the reactors had all been shut down.

The four smaller robots that iRobot sent, the PackBot 510, weighing 18kg (40 pounds) each with a long arm, were able to open access doors, enter, and send back images. Sometimes they needed to work in pairs so that the one furthest away from the human operators could send back signals via an intermediate robot acting as a wifi relay. The robots were able to send images of analog dials so that the operators could read pressures in certain systems, they were able to send images of pipes to show which ones were still intact, and they were able to send back radiation levels. Satoshi Tadokoro, who sent in some of his robots later in the year to climb over steep rubble piles and up steep stairs that PackBot could not negotiate, said “[I]f they did not have Packbot, the cool shutdown of the plant would have [been] delayed considerably”. The two bigger brothers, both 710 models, weighing 157kg (346 pounds) with a lifting capacity of 100kg (220 pounds), were used to operate an industrial vacuum cleaner, move debris, and cut through fences so that other specialized robots could access particular work sites.
But the robots sent to Fukushima were not just remote control machines. They had an Artificial Intelligence (AI) based operating system, known as Aware 2.0, that allowed the robots to build maps, plan optimal paths, right themselves should they tumble down a slope, and retrace their path when they lost contact with their human operators. This does not sound much like sexy advanced AI, and indeed it is not so advanced compared to what clever videos from corporate research labs appear to show, or to what painstakingly crafted edge-of-just-possible demonstrations from academic research labs are able to do when things all work as planned. But simple and un-sexy is the nature of the sort of AI we can currently put on robots in real, messy, operational environments.

2011: IBM’s Watson wins Jeopardy

2011-15: Partially in response to the Fukushima disaster the US Defense Advanced Research Projects Agency (DARPA) set up a challenge competition for robots to operate in disaster areas

The competition ran from late 2011 until the final event, held on June 5th and 6th of 2015. The robots were semi-autonomous, with communications from human operators over a deliberately unreliable and degraded communications link. This short video focuses on the second place team but also shows some of the other teams, and gives a good overview of the state of the art in 2015. For a selection of greatest failures at the competition see this link.

2012: Nvidia had noticed the trend and created CUDA, a platform that enabled researchers to use GPUs for general-purpose processing. Among these researchers was a Ph.D. student in Hinton's lab named Alex Krizhevsky, who used CUDA to write the code for a neural network that blew everyone away in the ImageNet competition, which challenged AI researchers to build computer-vision systems that could sort more than 1 million images into 1,000 categories of objects

AlexNet's error rate was 15 percent, compared with the 26 percent error rate of the second-best entry. The neural net owed its runaway victory to GPU power and a "deep" structure of multiple layers containing 650,000 neurons in all.
In the next year's ImageNet competition, almost everyone used neural networks.
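For orientation, here is my minimal sketch of what a deep convolutional classifier looks like in code, written in PyTorch (which AlexNet itself did not use; it was hand-written CUDA) and far smaller than AlexNet:

```python
import torch
import torch.nn as nn

# Stacked convolution + pooling layers feeding a fully connected output.
class TinyConvNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Linear(64 * 4 * 4, num_classes)

    def forward(self, x):                       # x: (batch, 3, 224, 224) RGB images
        return self.classifier(self.features(x).flatten(1))

logits = TinyConvNet()(torch.randn(1, 3, 224, 224))
print(logits.shape)                             # torch.Size([1, 1000])
```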

2013-18: Speech transcription systems improve and proliferate – we can talk to our devices

2014: A Google program automatically generated the caption “A group of young people playing a game of Frisbee” for a photograph (reported in a NYT article)
2015: LeCun, Bengio, and Hinton publish a review of deep learning (LeCun et al., 2015):
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

2015: Diffusion models were introduced in 2015 as a method to learn a model that can sample from a highly complex probability distribution. They used techniques from non-equilibrium thermodynamics, especially diffusion. Diffusion models have been commonly used to generate images from text, but more recent innovations have expanded their use in deep learning and generative AI for applications like developing drugs, using natural language processing to create more complex images, and predicting human choices based on eye tracking.
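The core trick is compact. Here is my toy sketch of the forward noising process and the training objective, under a commonly used noise schedule (the denoising network itself is left out):

```python
import torch

# Forward (noising) half of a diffusion model: data is gradually destroyed with
# Gaussian noise, and a network is trained to predict that noise so the process
# can be run in reverse to generate new samples.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # a commonly used noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

def add_noise(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form at timestep t."""
    noise = torch.randn_like(x0)
    x_t = alpha_bars[t].sqrt() * x0 + (1.0 - alpha_bars[t]).sqrt() * noise
    return x_t, noise

x0 = torch.randn(8, 3, 32, 32)                   # stand-in for a batch of images
x_t, noise = add_noise(x0, t=500)
# Training objective (with some denoising network `model`, not defined here):
#   loss = mse(model(x_t, t), noise)   i.e. predict the noise that was added.
```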
2016: Google's AlphaGo AI defeated world champion Lee Sedol, with the final score being 4:1.
2017: In one of DeepMind's most influential papers, “Mastering the game of Go without human knowledge”, the very goal was to dispense with human knowledge altogether, so as to “learn, tabula rasa, superhuman proficiency in challenging domains” (Silver et al., 2017).
(this claim has been disputed by Gary Marcus)

2017-19: New architectures were developed, such as the Transformer (Vaswani et al., 2017), which underlies GPT-2 (Radford et al., 2019)
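The heart of the Transformer is scaled dot-product attention, which is short enough to sketch here (my toy version: a single head, no learned projections or masking):

```python
import numpy as np

# Every position mixes information from all positions, weighted by how well
# its query matches their keys.
def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V                               # weighted sum of values

seq_len, d = 5, 8                                    # toy sizes
Q = K = V = np.random.randn(seq_len, d)              # self-attention: all from one sequence
print(attention(Q, K, V).shape)                      # (5, 8)
```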

2018: Behavioural AI:
Blind cheetah robot climbs stairs with obstacles: visit the link then scroll down for the video

2019: Hinton, LeCun, and Bengio receive the Turing Award (the 2018 award, announced in 2019) and are sometimes called the godfathers of deep learning.
2019: The Bitter Lesson by Rich Sutton, one of the founders of reinforcement learning.
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin…researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation.…the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation.
(This analysis is disputed by Gary Marcus in his hybrid essay)

2019: Rubik’s cube solved with a robot hand: video

2020: OpenAI introduces the GPT-3 natural language model, which later spouts bigoted remarks

2021: DALL-E generates images from text captions

2022: Text to images
Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. It is considered to be a part of the ongoing artificial intelligence boom. It is primarily used to generate detailed images conditioned on text descriptions.

Stable Diffusion is a latent diffusion model, a kind of deep generative artificial neural network. Its code and model weights have been released publicly, and it can run on most consumer hardware equipped with a modest GPU with at least 4 GB VRAM. This marked a departure from previous proprietary text-to-image models such as DALL-E and Midjourney which were accessible only via cloud services.
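As a usage illustration, here is a minimal sketch with the Hugging Face diffusers library (assuming the library is installed, the model weights can be downloaded, and a suitable GPU is available; the model name shown is one common release):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a released Stable Diffusion checkpoint and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Generate an image conditioned on a text description and save it.
image = pipe("a watercolour painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
```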

2022, November: ChatGPT is a chatbot and virtual assistant developed by OpenAI and launched on November 30, 2022. Based on large language models (LLMs), it enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language. Successive user prompts and replies are considered at each conversation stage as context.

ChatGPT is credited with starting the AI boom, which has led to ongoing rapid investment in and public attention to the field of artificial intelligence (AI). By January 2023, it had become what was then the fastest-growing consumer software application in history, gaining over 100 million users and contributing to the growth of OpenAI's current valuation of $86 billion.