The paper presents a theory of nets without circles, based on the following assumptions:
The activity of the neuron is an “all-or-none” process.
A certain fixed number of synapses must be excited within the period of latent addition in order to excite a neuron at any time, and this number is independent of previous activity and position on the neuron.
The only significant delay within the nervous system is synaptic delay.
The activity of any inhibitory synapse absolutely prevents excitation of the neuron at that time.
The structure of the net does not change with time.
Especially the last point, in combination with nets without circles, rules out the possibility of «memory».
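The assumptions above describe what is now called a McCulloch-Pitts unit. A minimal sketch in Python (the function name and input encoding are my own):

```python
def mp_neuron(excitatory, inhibitory, threshold):
    """McCulloch-Pitts unit: fires (1) iff no inhibitory synapse is
    active and the number of active excitatory synapses reaches the
    fixed threshold. All inputs are 0 or 1."""
    if any(inhibitory):          # absolute inhibition
        return 0
    return 1 if sum(excitatory) >= threshold else 0

# With threshold 2 and two excitatory inputs, the unit computes logical AND
print(mp_neuron([1, 1], [], 2))   # → 1
print(mp_neuron([1, 0], [], 2))   # → 0
print(mp_neuron([1, 1], [1], 2))  # → 0 (inhibition wins)
```

A single active inhibitory input forces the output to 0 regardless of excitation, which is exactly the "absolutely prevents" assumption above.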
The Turing test states that a machine can be considered intelligent if a human communicating with it cannot tell whether they are communicating with a machine or with another human.
The Winograd schema challenge (Levesque, 2012) was designed to be an improvement on the Turing Test.
1956: Dartmouth workshop
… every aspect of learning or any other feature of intelligence can be so precisely described that a machine can be made to simulate it …
McCarthy persuaded the attendees to accept Artificial Intelligence as the name of the field of study; the Dartmouth workshop is widely considered the founding event of the discipline of Artificial Intelligence.
How is information about the physical world sensed, or detected, by the biological system?
In what form is information stored, or remembered?
How does information contained in storage, or in memory, influence recognition and behavior?
This article primarily deals with the second and third questions, «which are still subject to a vast amount of speculation».
On the perceptron, Rosenblatt writes:
[ … ] a hypothetical nervous system, or machine, called a perceptron. The perceptron is designed to illustrate some of the fundamental properties of intelligent systems in general, without becoming too deeply enmeshed in the special, and frequently unknown, conditions which hold for particular biological organisms. The analogy between the perceptron and biological systems should be readily apparent to the reader.
In this article, Rosenblatt connects Donald Hebb's theory (The organization of behavior, 1949) with his own theory:
Hebb, however, has never actually achieved a model by which behavior (or any psychological data) can be predicted from the physiological system. His physiology is more a suggestion as to the sort of organic substrate which might underlie behavior, and an attempt to show the plausibility of a bridge between biophysics and psychology. The present theory represents the first actual completion of such a bridge.
Rosenblatt concludes his article with
In the meantime, the theory reported here clearly demonstrates the feasibility and fruitfulness of a quantitative statistical approach to the organization of cognitive systems. By the study of systems such as the perceptron, it is hoped that those fundamental laws of organization which are common to all information handling systems, machines and men included, may eventually be understood.
A perceptron is a brain model
Rosenblatt and the idea of the perceptron received negative feedback, which Rosenblatt attributed to the following three reasons:
An (admitted) lack of mathematical rigor
Sensational headlines, such as Frankenstein Monster Designed by Navy Robot That Thinks
Failure to comprehend the difference in motivation between the perceptron program and other engineering programs concerned with automatic pattern recognition, artificial intelligence and advanced computers.
In Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms (1962), Rosenblatt writes
The term 'perceptron', originally intended as a generic name for a variety of theoretical nerve nets, has an unfortunate tendency to suggest a specific piece of hardware … […] A perceptron is first and foremost a brain model, not an invention for pattern recognition
Further down:
Perceptrons are not intended to serve as detailed copies of any actual nervous system. They are simplified networks, designed to permit the study of lawful relationships between the organization of a nerve net, the organization of its environment, and the "psychological" performances of which the network is capable.
This program was initiated in 1959, for the purpose of studying "intelligent systems", by means of theoretical models, computer simulation, and biological experiments. While the program has been primarily supported since that time by the Office of Naval Research, some assistance has also been provided by other agencies, particularly the National Science Foundation, and (for a period of one year) the National Institutes of Health.
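The perceptron's error-driven learning rule can be sketched as follows; the exact variant Rosenblatt studied differs in detail, so this is an illustrative simplification:

```python
def train_perceptron(data, epochs=20, lr=1.0):
    """Perceptron learning rule: on each misclassified example,
    nudge the weights toward the correct label.
    `data` is a list of (inputs, label) pairs with label in {-1, +1}."""
    n = len(data[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:                       # update only on mistakes
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

# Learn logical OR (a linearly separable problem, so the rule converges)
data = [([0, 0], -1), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_perceptron(data)
```

For linearly separable data such as OR, the perceptron convergence theorem guarantees that this procedure finds a separating hyperplane in finitely many updates.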
1966: ELIZA, the first chatbot
Joseph Weizenbaum developed ELIZA, possibly the first chatbot (then called «chatterbot»).
Conversations with ELIZA were at times so realistic that users occasionally believed they were communicating with a human being rather than a program (see Turing test).
User: I am unhappy.
ELIZA: DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY
User: I need some help, that much seems certain.
ELIZA: WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP
User: Perhaps I could learn to get along with my mother.
ELIZA: TELL ME MORE ABOUT YOUR FAMILY
User: My mother takes care of me.
ELIZA: WHO ELSE IN YOUR FAMILY TAKES CARE OF YOU
User: My father.
ELIZA: YOUR FATHER
User: You are like my father in some ways.
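ELIZA worked by keyword spotting and template substitution over the user's input. A heavily simplified sketch (the rules below are my own, loosely modeled on the transcript above):

```python
import re

# Each rule is a keyword pattern plus a response template that may
# reuse the matched text. Real ELIZA scripts were far richer.
RULES = [
    (re.compile(r"\bI need (.+)", re.I),
     "WHAT WOULD IT MEAN TO YOU IF YOU GOT {0}"),
    (re.compile(r"\bmy (mother|father|family)\b", re.I),
     "TELL ME MORE ABOUT YOUR FAMILY"),
    (re.compile(r"\bI am (.+)", re.I),
     "DO YOU OFTEN FEEL {0}"),
]

def respond(sentence):
    for pattern, template in RULES:
        m = pattern.search(sentence)
        if m:
            return template.format(*m.groups()).rstrip(".")
    return "PLEASE GO ON"   # default when no rule matches

print(respond("I need some help."))
# → WHAT WOULD IT MEAN TO YOU IF YOU GOT some help
```

The striking realism of ELIZA's conversations came entirely from such surface-level transformations; the program had no model of meaning at all.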
1986: Backpropagation
Rumelhart, Hinton and Williams published the article Learning internal representations by error propagation, which introduced backpropagation as a method for training neural networks.
Prior to the publication of this article, neural networks were considered a somewhat obscure area of research.
Because backpropagation is both practical and effective, interest in neural networks grew rapidly.
TODO: Paul Werbos' dissertation (1974), which first described the process of training neural networks through backpropagation of errors.
An early form of backpropagation was also proposed in 1985 by Yann LeCun (Une procédure d'apprentissage pour réseau à seuil asymétrique / A Learning Scheme for Asymmetric Threshold Networks).
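Backpropagation applies the chain rule layer by layer to compute the gradient of the error with respect to every weight. A minimal sketch for a 2-2-1 sigmoid network trained on XOR (the architecture, learning rate and epoch count are my own choices):

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# 2-2-1 network; each row of w1 is [w_x0, w_x1, bias], w2 is [w_h0, w_h1, bias]
w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w2 = [random.uniform(-1, 1) for _ in range(3)]

def forward(x):
    h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + w1[j][2]) for j in range(2)]
    y = sigmoid(w2[0] * h[0] + w2[1] * h[1] + w2[2])
    return h, y

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # XOR

def mse():
    return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

before = mse()
lr = 0.5
for _ in range(20000):
    for x, t in data:
        h, y = forward(x)
        dy = (y - t) * y * (1 - y)                   # output-layer delta
        for j in range(2):
            dh = dy * w2[j] * h[j] * (1 - h[j])      # hidden delta via chain rule
            w1[j][0] -= lr * dh * x[0]
            w1[j][1] -= lr * dh * x[1]
            w1[j][2] -= lr * dh
            w2[j] -= lr * dy * h[j]
        w2[2] -= lr * dy

print(before, mse())   # the error decreases as training proceeds
```

The key point (and the reason for the method's name) is that the output error is propagated backwards: the hidden-layer deltas are computed from the output delta via the chain rule, so one backward pass yields all gradients.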
1997: Deep Blue beats Garry Kasparov
Deep Blue was the first computer to defeat a reigning world chess champion, Garry Kasparov, in a match played under standard tournament conditions.
2012 - AlexNet
2012 was a pivotal year for computer vision with the groundbreaking release of AlexNet, much as 2018 saw a comparable breakthrough in NLP with the introduction of GPT and BERT.
AlexNet's success in computer vision drove the broader adoption of deep learning across AI.
2017: Attention Is All You Need
The paper Attention Is All You Need introduced the transformer model, which provides a more structured memory for handling long-term dependencies in text than alternatives such as RNNs.
Introduction of the transformer architecture
The paper Attention Is All You Need (Vaswani et al.) introduced the transformer architecture, which in 2018 became the foundation for GPT models:
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
And further down:
In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality …
And also:
To the best of our knowledge, however, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution.
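The core operation of the transformer is scaled dot-product attention, softmax(QK^T / √d_k)V. A dependency-free sketch (the function name and list-of-rows layout are my own):

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for plain Python lists of vectors.
    Q, K, V: lists of rows, one row per position."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]     # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        # weighted average of the value rows
        out.append([sum(w * v[i] for w, v in zip(weights, V))
                    for i in range(len(V[0]))])
    return out

# One query attending over two key/value positions
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = scaled_dot_product_attention(Q, K, V)
```

The query matches the first key more strongly, so the output is a convex combination of the value rows weighted toward the first one.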
Relation of transformers to encoder-decoder models
The transformer model is a specific instance of the encoder-decoder models that had become popular in the years 2014-2015.
Original transformer architecture
Both the encoder and the decoder of the original transformer architecture have 6 layers (see The Illustrated Transformer).
All decoder layers share the same structure (but not the same weights); likewise, all encoder layers share the same structure.
The output dimension of the encoder (d_model) is 512.
Layers of the encoder
Each layer of the encoder has two sublayers:
a multi-head attention layer
a position-wise feed-forward network
Both sublayers have
a residual connection
a layer normalization
Decoder
The decoder has a third sublayer: multi-head attention over the output of the encoder. In addition, the decoder's self-attention sublayer is masked to prevent positions from attending to subsequent positions.
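The sublayer composition described above (post-norm, as in the original paper) can be sketched structurally. Here `attention` and `ffn` are stand-ins for the real sublayers, and the sketch treats the input as a single vector rather than a sequence of positions:

```python
def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / (var + eps) ** 0.5 for xi in x]

def encoder_layer(x, attention, ffn):
    # sublayer 1: multi-head self-attention + residual connection + layer norm
    x = layer_norm([a + b for a, b in zip(x, attention(x))])
    # sublayer 2: position-wise feed-forward + residual connection + layer norm
    x = layer_norm([a + b for a, b in zip(x, ffn(x))])
    return x

# toy stand-ins so the sketch runs
identity = lambda x: x
out = encoder_layer([1.0, 2.0, 3.0], identity, identity)
```

The pattern `LayerNorm(x + Sublayer(x))` is applied around every sublayer; the residual connection is what allows gradients to flow through many stacked layers.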
Impact
The transformer model became the foundation for state-of-the-art NLP models such as
BERT (Bidirectional Encoder Representations from Transformers)
2021 - October: Pathways
Pathways is intended to address three shortcomings of existing AI systems:
Create a model that can do many things (as opposed to a single task for which a model is specifically trained)
Support multiple senses (multimodal inputs)
Use sparse (rather than dense) activation (which makes it faster and requires less energy)
The Pathways architecture was used, for example, to train PaLM.
2022 - April: PaLM (Pathways Language Model): Potential tipping point
PaLM is able to explain why jokes are funny, which marks a potential tipping point in the history of AI. Apparently, this model «understands» language, passing a litmus test of whether a language model «knows» what is going on.
PaLM was announced in April 2022 and became public (for a limited number of developers) in March 2023.
PaLM was trained using the Pathways architecture (which is what the P in PaLM stands for).
PaLM uses the SwiGLU activation for the MLP intermediate activations.
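SwiGLU gates one linear projection with a Swish (SiLU) activation of another: SwiGLU(x) = SiLU(xW) ⊙ (xV). A minimal sketch (function names and the column-wise matrix layout are my own; biases omitted):

```python
import math

def silu(z):
    """Swish / SiLU activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def swiglu(x, W, V):
    """SwiGLU for one input vector: SiLU(x·W) ⊙ (x·V).
    W and V are given as lists of columns."""
    gate = [silu(sum(xi * wij for xi, wij in zip(x, col))) for col in W]
    up = [sum(xi * vij for xi, vij in zip(x, col)) for col in V]
    return [g * u for g, u in zip(gate, up)]

# toy 2x2 identity projections to show the gating
W = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
out = swiglu([2.0, 0.0], W, V)
```

Unlike a plain ReLU MLP, each output unit is the product of a smooth gate and a linear term, which is the "gated linear unit" idea SwiGLU builds on.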
2023 - March: LLaMA Leak
In early March 2023, the parameters (weights) of Meta's LLaMA model were leaked.
The leaked model had no instruction or conversation tuning and no RLHF. Nevertheless, it sparked a surge of innovation in the open-source community.
Among others, it led to the development of
Alpaca (Stanford University)
ColossalChat
etc.
Links
Jürgen Schmidhuber: The road to modern AI - Annotated History of Modern AI and Deep Learning