AI research uses a wide variety of techniques to accomplish the goals above.
Search and optimization
AI
can solve many problems by intelligently searching through many
possible solutions. There are two very different kinds of search used in
AI: state space search and local search.
State space search
State
space search searches through a tree of possible states to try to find a
goal state. For example, planning algorithms search through trees of
goals and subgoals, attempting to find a path to a target goal, a
process called means-ends analysis.
Simple exhaustive searches
are rarely sufficient for most real-world problems: the search space
(the number of places to search) quickly grows to astronomical numbers.
The result is a search that is too slow or never completes.
"Heuristics" or "rules of thumb" can help prioritize choices that are more likely to reach a goal.
Adversarial
search is used for game-playing programs, such as chess or Go. It
searches through a tree of possible moves and countermoves, looking for a
winning position.
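A bare-bones sketch of adversarial search is the minimax procedure below. It assumes the caller supplies game-specific `moves`, `apply_move`, and `evaluate` functions (all hypothetical names); real game-playing programs add refinements such as alpha-beta pruning.

```python
def minimax(state, depth, maximizing, moves, evaluate, apply_move):
    """Score `state` by searching the tree of moves and countermoves.

    `moves(state)` lists the legal moves, `apply_move(state, m)` returns the
    resulting state, and `evaluate(state)` scores a leaf position.
    """
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)
    if maximizing:  # our turn: pick the move with the best guaranteed outcome
        return max(minimax(apply_move(state, m), depth - 1, False,
                           moves, evaluate, apply_move) for m in legal)
    else:           # opponent's turn: assume they pick the worst outcome for us
        return min(minimax(apply_move(state, m), depth - 1, True,
                           moves, evaluate, apply_move) for m in legal)
```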
Local search
Illustration of gradient descent for three different starting points; two parameters (represented by the plane coordinates) are adjusted in order to minimize the loss function (the height).
Local search uses mathematical optimization to find a
solution to a problem. It begins with some form of guess and refines it
incrementally.
Gradient descent is a type of local search that
optimizes a set of numerical parameters by incrementally adjusting them
to minimize a loss function. Variants of gradient descent are commonly
used to train neural networks through the backpropagation algorithm.
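A bare-bones illustration of the idea, assuming the caller can supply the gradient of the loss (here a simple quadratic bowl whose minimum is known):

```python
def gradient_descent(grad, params, lr=0.1, steps=100):
    """Repeatedly nudge `params` a small step against the gradient of the loss."""
    for _ in range(steps):
        params = [p - lr * g for p, g in zip(params, grad(params))]
    return params

# Toy loss: (x - 3)^2 + (y + 1)^2, whose gradient is (2(x - 3), 2(y + 1)).
print(gradient_descent(lambda p: [2 * (p[0] - 3), 2 * (p[1] + 1)], [0.0, 0.0]))
# -> approximately [3.0, -1.0], the minimum of the loss surface
```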
Another
type of local search is evolutionary computation, which aims to
iteratively improve a set of candidate solutions by "mutating" and
"recombining" them, selecting only the fittest to survive each
generation.
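The toy sketch below shows those three ingredients (mutation, recombination, and survival of the fittest) on bit-string genomes; the "one-max" fitness function and all parameter values are illustrative choices, not standard ones.

```python
import random

def evolve(fitness, genome_len=8, pop_size=30, generations=50):
    """Mutate and recombine bit-string candidates, keeping the fittest."""
    pop = [[random.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]           # selection: the fittest half survives
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, genome_len)  # recombination: one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(genome_len)       # mutation: flip one random bit
            child[i] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Toy fitness: the count of 1-bits; evolution converges on the all-ones genome.
print(evolve(sum))
```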
Distributed search processes can coordinate via swarm
intelligence algorithms. Two popular swarm algorithms used in search
are particle swarm optimization (inspired by bird flocking) and ant
colony optimization (inspired by ant trails).
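A rough sketch of particle swarm optimization: each particle carries a velocity that is pulled both toward its own best point so far and toward the swarm's best point. The coefficients below are typical textbook-style values, not canonical ones.

```python
import random

def pso(loss, dim=2, particles=20, steps=100):
    """Particles share the best point found so far and drift toward it."""
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_personal = [p[:] for p in pos]
    best_global = min(pos, key=loss)[:]
    for _ in range(steps):
        for i in range(particles):
            for d in range(dim):
                # inertia + pull toward personal best + pull toward the swarm's best
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * random.random() * (best_personal[i][d] - pos[i][d])
                             + 1.5 * random.random() * (best_global[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if loss(pos[i]) < loss(best_personal[i]):
                best_personal[i] = pos[i][:]
                if loss(pos[i]) < loss(best_global):
                    best_global = pos[i][:]
    return best_global

# Toy usage: minimize the sphere function; the swarm converges near the origin.
print(pso(lambda p: sum(x * x for x in p)))
```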
Logic
Formal
logic is used for reasoning and knowledge representation. Formal logic
comes in two main forms: propositional logic (which operates on
statements that are true or false and uses logical connectives such as
"and", "or", "not" and "implies") and predicate logic (which also
operates on objects, predicates and relations and uses quantifiers such
as "Every X is a Y" and "There are some Xs that are Ys").
Deductive
reasoning in logic is the process of proving a new statement
(conclusion) from other statements that are given and assumed to be true
(the premises). Proofs can be structured as proof trees, in which nodes
are labelled by sentences, and child nodes are connected to parent
nodes by inference rules.
Given a problem and a set of premises,
problem-solving reduces to searching for a proof tree whose root node is
labelled by a solution of the problem and whose leaf nodes are labelled
by premises or axioms. In the case of Horn clauses, problem-solving
search can be performed by reasoning forwards from the premises or
backwards from the problem. In the more general case of the clausal form
of first-order logic, resolution is a single, axiom-free rule of
inference, in which a problem is solved by proving a contradiction from
premises that include the negation of the problem to be solved.
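Forward reasoning with (propositional) Horn clauses can be sketched as a fixed-point computation: keep applying rules whose bodies are already satisfied until no new facts appear. The toy knowledge base below is illustrative.

```python
def forward_chain(rules, facts):
    """Derive new facts from Horn clauses until nothing changes.

    Each rule is (body, head): if every atom in `body` is known, add `head`.
    A rule with an empty body is a premise (an axiom).
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return known

# Toy knowledge base: criminal(west) follows from the premises.
rules = [
    ((), "american(west)"),
    ((), "missile(m1)"),
    (("missile(m1)",), "weapon(m1)"),
    (("american(west)", "weapon(m1)"), "criminal(west)"),
]
print(forward_chain(rules, []))
```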
Inference
in both Horn clause logic and first-order logic is undecidable, and
therefore intractable. However, backward reasoning with Horn clauses,
which underpins computation in the logic programming language Prolog, is
Turing complete. Moreover, its efficiency is competitive with
computation in other symbolic programming languages.
Fuzzy logic assigns a "degree of truth" between 0 and 1. It can therefore handle propositions that are vague and partially true.
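Using one common choice of fuzzy connectives (the Zadeh operators: minimum, maximum, and complement), degrees of truth combine as in this small sketch; the propositions and their degrees are made up for the example.

```python
# One common choice of fuzzy connectives (Zadeh operators); truth values lie in [0, 1].
def f_and(a, b): return min(a, b)
def f_or(a, b):  return max(a, b)
def f_not(a):    return 1.0 - a

# "The water is hot" to degree 0.7; "the water is plentiful" to degree 0.4.
hot, plentiful = 0.7, 0.4
print(f_and(hot, plentiful))        # 0.4: "hot and plentiful" is partially true
print(f_or(hot, f_not(plentiful)))  # 0.7
```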
Non-monotonic
logics, including logic programming with negation as failure, are
designed to handle default reasoning. Other specialized versions of logic
have been developed to describe many complex domains.
Probabilistic methods for uncertain reasoning
A simple Bayesian network, with the associated conditional probability tables
Many
problems in AI (including in reasoning, planning, learning, perception,
and robotics) require the agent to operate with incomplete or uncertain
information. AI researchers have devised a number of tools to solve
these problems using methods from probability theory and economics.[86]
Precise mathematical tools have been developed that analyze how an agent
can make choices and plan, using decision theory, decision analysis, and
information value theory. These tools include models such as Markov
decision processes, dynamic decision networks, game theory, and mechanism
design.
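For instance, a Markov decision process can be solved by value iteration, repeatedly applying the Bellman optimality update until the state values settle. The two-state model and all numbers below are made up for illustration.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, iters=100):
    """Markov decision process solver: iterate the Bellman optimality update.

    `transition[s][a]` is a list of (probability, next_state) pairs and
    `reward[s]` is the immediate reward for being in state s.
    """
    value = {s: 0.0 for s in states}
    for _ in range(iters):
        value = {s: reward[s] + gamma * max(
                     sum(p * value[s2] for p, s2 in transition[s][a])
                     for a in actions)
                 for s in states}
    return value

# Toy two-state MDP: "stay" is safe, "move" may reach the rewarding state.
states, actions = ["low", "high"], ["stay", "move"]
transition = {
    "low":  {"stay": [(1.0, "low")],  "move": [(0.5, "high"), (0.5, "low")]},
    "high": {"stay": [(1.0, "high")], "move": [(1.0, "low")]},
}
reward = {"low": 0.0, "high": 1.0}
print(value_iteration(states, actions, transition, reward))
```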
Bayesian networks are a tool that can be used for
reasoning (using the Bayesian inference algorithm), learning (using the
expectation–maximization algorithm), planning (using decision
networks), and perception (using dynamic Bayesian networks).
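As a minimal illustration of Bayesian inference on a network, the two-node sketch below computes a posterior by enumeration; the probabilities are invented for the example.

```python
# A tiny two-node network, Rain -> WetGrass, with its conditional probability tables.
P_rain = {True: 0.2, False: 0.8}
P_wet_given_rain = {True: 0.9, False: 0.1}   # P(WetGrass=true | Rain)

def posterior_rain_given_wet():
    """Inference by enumeration: P(Rain=true | WetGrass=true)."""
    joint = {r: P_rain[r] * P_wet_given_rain[r] for r in (True, False)}
    return joint[True] / (joint[True] + joint[False])

print(posterior_rain_given_wet())  # 0.18 / (0.18 + 0.08) ≈ 0.692
```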
Probabilistic
algorithms can also be used for filtering, prediction, smoothing, and
finding explanations for streams of data, thus helping perception
systems analyze processes that occur over time (e.g., hidden Markov
models or Kalman filters).
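A small sketch of such filtering is the hidden Markov model forward algorithm below, which alternates a prediction step (the transition model) with an update step (the observation likelihood). The umbrella/weather model is a textbook-style toy example, not a real system.

```python
def hmm_filter(prior, transition, emission, observations):
    """Forward algorithm: belief over the hidden state after each observation.

    `transition[s][t]` = P(next=t | current=s); `emission[s][o]` = P(obs=o | s).
    """
    belief = dict(prior)
    for obs in observations:
        # Predict: push the current belief through the transition model...
        predicted = {t: sum(belief[s] * transition[s][t] for s in belief)
                     for t in belief}
        # ...then update: weight by the observation likelihood and renormalize.
        unnorm = {s: predicted[s] * emission[s][obs] for s in predicted}
        z = sum(unnorm.values())
        belief = {s: p / z for s, p in unnorm.items()}
    return belief

# Toy weather model: hidden Rain/Sun states, observed umbrella or not.
transition = {"Rain": {"Rain": 0.7, "Sun": 0.3}, "Sun": {"Rain": 0.3, "Sun": 0.7}}
emission = {"Rain": {"umbrella": 0.9, "none": 0.1}, "Sun": {"umbrella": 0.2, "none": 0.8}}
print(hmm_filter({"Rain": 0.5, "Sun": 0.5}, transition, emission, ["umbrella", "umbrella"]))
```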
Expectation–maximization clustering of Old
Faithful eruption data starts from a random guess but then successfully
converges on an accurate clustering of the two physically distinct
modes of eruption.
Classifiers and statistical learning methods
The
simplest AI applications can be divided into two types: classifiers
(e.g., "if shiny then diamond"), on one hand, and controllers (e.g., "if
diamond then pick up"), on the other hand. Classifiers[98] are
functions that use
pattern matching to determine the closest
match. They can be fine-tuned based on chosen examples using supervised
learning. Each pattern (also called an "observation") is labeled with a
certain predefined class. All the
observations combined with
their class labels are known as a data set. When a new observation is
received, that observation is classified based on previous experience.
There
are many kinds of classifiers in use. The decision tree is the simplest
and most widely used symbolic machine learning algorithm. The k-nearest
neighbor algorithm was the most widely used analogical AI until the
mid-1990s, and kernel methods such as the support vector machine
(SVM) displaced k-nearest neighbor in the 1990s. The naive Bayes
classifier is reportedly the "most widely used learner" at Google, due
in part to its scalability. Neural networks are also used as
classifiers.
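For example, a k-nearest neighbor classifier in miniature labels a new observation by majority vote among the k closest labeled observations in the data set. The features and class labels below are illustrative.

```python
from collections import Counter

def knn_classify(dataset, new_obs, k=3):
    """Label `new_obs` by majority vote among the k closest labeled observations."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(dataset, key=lambda item: dist(item[0], new_obs))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy data set of (features, class label) pairs, e.g. (shininess, hardness).
data = [((0.9, 0.8), "diamond"), ((0.8, 0.9), "diamond"),
        ((0.2, 0.3), "rock"), ((0.1, 0.2), "rock"), ((0.3, 0.1), "rock")]
print(knn_classify(data, (0.85, 0.75)))  # -> "diamond"
```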
Artificial neural networks
A neural network is an interconnected group of nodes, akin to the vast network of neurons in the human brain.
An
artificial neural network is based on a collection of nodes also known
as artificial neurons, which loosely model the neurons in a biological
brain. It is trained to recognise patterns; once trained, it can
recognise those patterns in fresh data. There is an input layer, at least one
hidden layer of nodes, and an output layer. Each node applies a function to
its inputs, and if the result crosses a specified threshold, the signal is
transmitted to the next layer. A network is typically called a deep neural
network if it has at least two hidden layers.
Learning algorithms for
neural networks use local search to choose the weights that will get the
right output for each input during training. The most common training
technique is the backpropagation algorithm. Neural networks
learn
to model complex relationships between inputs and outputs and find
patterns in data. In theory, a neural network can learn any function.
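The sketch below trains a one-hidden-layer network on the XOR function using plain backpropagation and gradient descent; the layer sizes, learning rate, and epoch count are arbitrary choices for the toy problem, not recommended settings.

```python
import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One hidden layer of 4 sigmoid units; weights start small and random.
n_in, n_hid = 2, 4
w1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
b1 = [0.0] * n_hid
w2 = [random.uniform(-1, 1) for _ in range(n_hid)]
b2 = 0.0

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # XOR
lr = 0.5

for _ in range(5000):
    for x, target in data:
        # Forward pass: input -> hidden layer -> output.
        h = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j]) for j in range(n_hid)]
        out = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
        # Backward pass: propagate the error signal back through the layers.
        d_out = (out - target) * out * (1 - out)
        d_hid = [d_out * w2[j] * h[j] * (1 - h[j]) for j in range(n_hid)]
        # Gradient-descent step on every weight and bias.
        for j in range(n_hid):
            w2[j] -= lr * d_out * h[j]
            for i in range(n_in):
                w1[j][i] -= lr * d_hid[j] * x[i]
            b1[j] -= lr * d_hid[j]
        b2 -= lr * d_out

for x, _ in data:
    h = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j]) for j in range(n_hid)]
    print(x, round(sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2), 2))
```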
In feedforward neural networks the signal passes in only one direction.
Recurrent neural networks feed the output signal back into the input,
which allows short-term memories of previous input events. Long short-term
memory is the most successful network architecture for recurrent
networks. Perceptrons use only a single layer of neurons; deep learning
uses multiple layers. Convolutional neural networks strengthen the
connections between neurons that are "close" to each other; this is
especially important in image processing, where a local set of neurons
must identify an "edge" before the network can identify an object.
Deep learning
Deep
learning uses several layers of neurons between the network's inputs
and outputs. The multiple layers can progressively extract higher-level
features from the raw input. For example, in image processing, lower
layers may identify edges, while higher layers may identify the concepts
relevant to a human such as digits, letters, or faces.
Deep
learning has profoundly improved the performance of programs in many
important subfields of artificial intelligence, including computer
vision, speech recognition, natural language processing, image
classification, and others. The reason that deep learning performs so
well in so many applications is not known as of 2023. The sudden success
of deep learning in 2012–2015 did not occur because of some new
discovery or theoretical breakthrough (deep
neural networks and
backpropagation had been described by many people, as far back as the
1950s) but because of two factors: the incredible increase in computer
power (including the hundred-fold increase in speed by switching to
GPUs)
and the availability of vast amounts of training data, especially the
giant curated datasets used for benchmark testing, such as ImageNet.
GPT
Generative
pre-trained transformers (GPT) are large language models (LLMs) that
generate text based on the semantic relationships between words in
sentences. Text-based GPT models are pretrained on a large corpus of
text, often drawn from the Internet. The pretraining consists of
predicting the next token (a token usually being a word, subword, or
punctuation mark). Throughout this pretraining, GPT models accumulate
knowledge about the world and can then generate human-like text by
repeatedly predicting the next token. Typically, a subsequent training
phase makes the model more truthful, useful, and harmless, usually with a
technique called reinforcement learning from human feedback (RLHF).
Current GPT models are prone to generating falsehoods called
"hallucinations", although this can be reduced with RLHF and quality
data. They are used in chatbots, which allow people to ask a question or
request a task in simple text.
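Next-token prediction can be shown in miniature with a bigram model: count which token tends to follow which in a corpus, then generate text by repeatedly sampling a likely successor. This is a drastic simplification of a transformer, intended only to illustrate the training objective; the corpus is invented.

```python
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Pretraining" in miniature: count which token follows which.
follows = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    follows[cur][nxt] += 1

def generate(token, length=8):
    """Generate text by repeatedly sampling a likely next token."""
    out = [token]
    for _ in range(length):
        choices = follows[out[-1]]
        if not choices:
            break
        token = random.choices(list(choices), weights=choices.values())[0]
        out.append(token)
    return " ".join(out)

print(generate("the"))
```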
Current models and services
include Gemini (formerly Bard), ChatGPT, Grok, Claude, Copilot, and
LLaMA. Multimodal GPT models can process different types of data
(modalities) such as images, videos, sound, and text.
Hardware and software
In
the late 2010s, graphics processing units (GPUs) that were increasingly
designed with AI-specific enhancements and used with specialized
TensorFlow software had replaced previously used central processing
units (CPUs) as the dominant means of training large-scale (commercial
and academic) machine learning models. Specialized programming
languages such as Prolog were used in early AI research, but
general-purpose programming languages like Python have become
predominant.
The transistor density in integrated circuits has
been observed to roughly double every 18 months—a trend known as Moore's
law, named after the Intel co-founder Gordon Moore, who first
identified it. Improvements in GPUs have been even faster, a trend
sometimes called Huang's law, named after Nvidia co-founder and CEO
Jensen Huang.