Artificial Intelligence III
Learning from Observation
A general model of learning agents.

Components of the performance element
- A direct mapping from conditions on the current state to actions.
- A means to infer relevant properties of the world from the percept sequence.
- Information about the way the world evolves.
- Information about the results of possible actions the agent can take.
- Utility information indicating the desirability of world states.
- Action-value information indicating the desirability of particular actions in particular states.
- Goals that describe classes of states whose achievement maximizes the agent's utility.
Representation of the components
Any situation in which both the inputs and outputs of a component can be perceived is called supervised learning.
In learning the condition-action component, the agent receives some evaluation of its action (such as a hefty bill for rear-ending the car in front) but is not told the correct action (to brake more gently and much earlier). This is called reinforcement learning;
Learning when there is no hint at all about the correct outputs is called unsupervised learning.
Inductive Learning
In supervised learning, the learning element is given the correct (or approximately correct) value of the function for particular inputs, and changes its representation of the function to try to match the information provided by the feedback. More formally, we say an example is a pair (x,f(x)), where x is the input and/(jt) is the output of the function applied to x. The task of pure inductive inference (or induction) is this: given a collection of examples of/, return a function h that approximates/. The function h is called a hypothesis.
Learning decison trees
A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no "decision.".
The decision tree learning algorithm.
function DECISION-TREE-LEARNING(examples, attributes,
default) returns a decision tree
inputs: examples, set of examples
attributes, set of attributes
default, default value for the goal predicate
if examples is empty then return default
else if all examples have the same classification
then return the classification
else if attributes is empty then return
MAJORITY- VALVE(examples)
else
best <— CHOOSE- ATTRIBUTE(attributes, examples)
tree = a new decision tree with root test best
for each value v, of best do
example.i — {elements of examples with best = v,}
subtree — DECISION-TREE-LEARNING(examples,
attributes — best, MAJORITY- VALUE(examples)
add a branch to tree with label v, and subtree subtree
end
return tree
The performance of the learning algorithm
- Collect a large set of examples.
- Divide it into two disjoint sets: the training set and the test set.
- Use the learning algorithm with the training set as examples to generate a hypothesis H.
- Measure the percentage of examples in the test set that are correctly classified by H.
- Repeat steps 1 to 4 for different sizes of training sets and different randomly selected training sets of each size.
Learning from Decision tree is used in : Designing oil platform equipment, Learning to fly, etc.
Using information theory
The information gained from the attribute test is saved as the difference between the original information requirement and the new requirement
Noise and overfitting in data
Whenever there is a large set of possible hypotheses, one has to be careful not to use the resulting freedom to find meaningless "regularity" in the data. This problem is called overfitting.
technique that eliminates the dangers of overfitting : pruning, Cross-validation.
Broadening the applicability of decision trees:
issues must that be addressed:
- Missing data
- Multivalued attributes
- Continuous-valued attributes
Learning general logical descriptions
Hypothesis proposes expressions, which we call as a candidate definition of the goal predicate.
An example can be a false negative for the hypothesis, if the hypothesis says it should be negative but in fact it is positive.
An example can be a false positive for the hypothesis, if the hypothesis says it should be positive
but in fact it is negative.
Current-best-hypothesis search
The current-best-hypothesis learning algorithm. It searches for a consistent hypothesis and backtracks when no consistent specialization/generalization can be found.
function CURRENT-EEST-LEARNING(examples) returns a hypothesis
H = any hypothesis consistent with the first example in examples
for each remaining example in examples do
if e is false positive for H then
H = choose a specialization of H consistent with examples
else if e is false negative for H then
H = choose a generalization of H consistent with examples
if no consistent specialization/generalization can be found then fail
end
return H





Least-commitment search
The version space learning algorithm. It finds a subset V ,that is consistent with the examples.
function VERSION-SPACE-LEARNING(example) returns a version space
local variables: V, the version space: the set of all hypotheses
V — the set of all hypotheses
for each example e in examples do
if V is not empty then V = VERSION-SPACE-UPDATE(V, e)
end
return V
function VERSiON-SPACE-UPDATE(V , e) returns an updated version space
V— {h G V : h is consistent with e}
Computational learning theory
Learning means behaving better as a result of experience. Hypothesis that is consistent with a sufficiently large set of training examples is unlikely to be seriously wrong—that is, it must be Probably Approximately Correct. PAC-learning is the subfield of computational learning theory that is devoted to this idea.
Learning decision lists
An algorithm for learning decision lists.
function DECISION-LIST-LEARNING(examples) returns a
decision list, No or failure
if examples is empty then return the value No
t — a test that matches a nonempty subset examples,
of examples
such that the members of examples, are all positive
or all negative
if there is no such t then return failure
if the examples in examples, are positive
then o <— Yes else o — No
return a decision list with initial test / and outcome o
and remaining elements given
by DECISION-LIST-LEARNING(examples - examples)
Learning in Neural and belief Networks
Neural networks how to train complex networks of simple computing elements, thereby perhaps shedding some light on the workings of the brain. The simple arithmetic computing elements correspond to neurons , the network as a whole corresponds to a collection of interconnected neurons. For this reason, the networks are called neural networks.
Comparing brains with digital computers
| Computer | Brain | |
| Computational units | 1 CPU, 10^5 gates | 10^11 neurons |
| Storage units | 10^9 bits RAM, 10^10 bits disk | 10^11 neurons,10^14 synapses |
| Cycle time | 10 ns | 1 ms |
| Band width | 10^9 bits per sec | 10^14 bits per sec |
| Neuron updates/sec | 10^5 | 10^14 |
What is a Neural network?
A neural network is composed of a number of nodes, or units, connected by links. Each link has a numeric weight associated with it. Weights are the primary means of long-term storage in neural networks, and learning usually takes place by updating the weights. Some of the units are connected to the external environment, and can be designated as input or output units. The weights are modified so as to try to bring the network's input/output behavior more into line with that of the environment providing the inputs. Each unit has a set of input links from other units, a set of output links to other units, a current activation level, and a means of counting the activation level at the next step in time, given its inputs and weights. The idea is that each unit does a local computation based on inputs from its neighbors, but without the need for any global control over the set of units as a whole. In practice, most neural network implementations are i software and use synchronous control to update all the units in a fixed sequence.
A unit

Network structures
Feed-forward and recurrent networks.
In a feed-forward network, links are unidirectional, and there are no cycles.
In a recurrent network, the links can form arbitrary topologies.
Hopfield networks are probably the best-understood class of recurrent networks. They use bidirectional connections with symmetric weights , all of the units are both input and output units; the activation function g is the sign function; and the activation levels can only be ± 1. A Hopfield network functions as an associative memory
Boltzmann machines also use symmetric weights, but include units that are neither input nor output units . They also use a stochastic activation function, such that the probability of the output being 1 is some function of the total weighted input.
Optimal network structure
If we choose a network that is too small, then the model will be incapable of representing the desired function. If we choose a network that is too big, it will be able to memorize all the examples by forming a large lookup table, but will not generalize well to inputs that have not been seen before. In other words, like all statistical models, neural networks are subject to overfitting when there are too many parameters (i.e., weights) in the model.
Approach to solve above problem that has been used is to use a genetic algorithm to search the space of network structures.
Perceptrons
Layered feed-forward networks were first studied in the late 1950s under the name perceptrons.
Some complex Boolean functions can be represented with perceptrons . For example, the majority function, which outputs a 1 only if more than half of its n inputs are 1, can be represented by a perceptron with each W, = 1 and threshold = theta/2. This would require a decision tree with O(2^n) nodes.A perceptron can represent a function only if it is linearly separable. Thus, a perceptron can
represent AND and OR, but not XOR
Learning linearly separable functions
We have a perceptron algorithm that will learn any linearly separable function, given enough training examples.
Linear separability in three dimensions—representing the "minority" function. The generic neural network learning method: adjust the weights until predicted output values O and true values T agree.
function NEURAL-NETWORK-LEARNING(examples) returns network
network <— a network with randomly assigned weights
repeat
for each e in examples do ,
O <— NEURAL-NETWORK-OUTPUT(example)
T <— the observed output values from e
update the weights in network based on e, O, and T
end
Multilayer feedforward networks
The back-propagation algorithm for updating weights in a multilayer network.

Application of neural networks
Pronunciation
Pronunciation of written English text by a computer is a fascinating problem in linguistics, as well as a task with high commercial payoff. It is typically carried by first mapping the text stream to phonemes—basic sound elements—and then passing the phonemes to an electronic speech generator.
Handwritten character recognition
In one of the largest applications of neural networks to date, Le Cun et al. (1989) have implemented a network designed to read zip codes on hand-addressed envelopes
Driving
ALVINN (Autonomous Land Vehicle In a Neural Network) (Pomerleau, 1993) is a neural network that has performed quite well in a domain where some other approaches have failed. It learns to steer a vehicle along a single lane on a highway by observing the performance of a human driver.
Bayesian learning
Bayesian learning views the problem of constructing hypotheses from data as a subproblem of the more fundamental problem of making predictions. The idea is to use hypotheses as intermediaries between data and predictions. First, the probability of each hypothesis is estimated, given the data. Predictions are then made from the hypotheses, using the posterior probabilities of the hypotheses to weight the predictions.
- Belief network learning problems :
- Known structure, fully observable
- Unknown structure, fully observable
- Known structure, hidden variables
- Unknown structure, hidden variables
A comparison of belief networks and neural networks
The principal difference is that belief networks are localized representations, whereas neural networks are distributed representations
Another representational difference is that belief network variables have two dimensions of "activation"—the range of values for the proposition, and the probability assigned to each of those values.
As inference mechanisms—once they have been trained—feedforward neural networkscan execute in linear time, whereas general belief network inference is NP-hard.
With respect to learning, a comparison is difficult because adaptive probabilistic networks(APNs) are a very recent development.
Renforcement learning
The task of reinforcement learning is to use rewards to learn a successful agent function.The learning task can vary as :
- The environment can be accessible or inaccessible. In an accessible environment, states can be identified with percepts, whereas in an inaccessible environment, the agent must maintain some internal state to try to keep track of the environment.
- The agent can begin with knowledge of the environment and the effects of its actions; or it will have to learn this model as well as utility information.
- Rewards can be received only in terminal states, or in any state.
- Rewards can be components of the actual utility (points for a ping-pong agent or dollars for a betting agent) that the agent is trying to maximize, or they can be hints as to the actualutility ("nice move" or "bad dog").
- The agent can be a passive learner or an active learner. A passive learner simply watches the world going by, and tries to learn the utility of being in various states; an active learner must also act using the learned information, and can use its problem generator to suggest explorations of unknown portions of the environment.
Q-Learning
The agent learns an action-value function giving the expected utility of taking a given action in a given state. This is called Q-learning.
Passive learning in an unknown enviornment
The prioritized-sweeping heuristic prefers to make adjustments to states whose likely successors have just undergone a large adjustment in their own utility estimates. Using heuristics like this, approximate ADP algorithms usually can learn roughly as fast as full ADP, in terms of the number of training sequences, but can be several orders of magnitude more efficient in terms of computation.
Active learning in an unknown nviornment
An active agent must consider what actions to take, what their outcomes may be, and how they will affect the rewards received.
Design for an active ADP agent.
The agent learns an environment model M by observing the results of its actions, and uses the model to calculate the utility function U using a dynamic programming algorithm (here POLICY-ITERATION could be substituted for VALUE-ITERATION).
function ACTIVE-ADP-AGENT(e) returns an action
static: U, a table of utility estimates
M, a table of transition probabilities from
state to state for each action
R, a table of rewards for states
percepts, a percept sequence (initially empty)
last-action, the action just executed
add e to percepts
if [STATE[e]] — REWARD[e]
M — UPDATE- ACTIVE-MODEL(M, percepts, last-action)
U — VALUE-lTERATION(U, M, R)
if TERMINAL[e] then
percepts — the empty sequence
last-action <— PERFORMANCE-EEEMENT(e)
return last-action
Exploration
an action has two kinds of outcome:
- It gains rewards on the current sequence.
- It affects the percepts received, and hence the ability of the agent to learn—and receive rewards in future sequences.
Generalization in reinforcement learning
Functions learned by the agents (U, M, R, Q) are represented in tabular form —that is, an explicit representation of one output value for each input tuple. Such an approach works reasonably well for small state spaces.
Implicit representation of the function a form that allows one to calculate the output for any input, but that is usually much more compact than the tabular form.
The compression achieved by an implicit representation allows the learning agent to generalize from states it has visited to states it has not visited
Genetic Algorithm
It starts with a set of one or more individuals and applies selection and reproduction operators to "evolve" an individual that is successful, as measured by a fitness function. The genetic algorithm finds a fit individual using simulated evolution.
function GENETIC-ALGORITHM( population, FITNESS-FN) returns an individual
inputs: population, a set of individuals
FITNESS-FN, a function that measures the fitness of an individual
repeat
parents <— SELECTION (population, FITNESS-FN)
population <— REPRODUCTION( parents)
until some individual is fit enough
return the best individual in population, according to FITNESS-FN
Knowledge from learning
If we use Descriptions to denote the conjunction of all the example descriptions, and Classifications to denote the conjunction of all the example classifications, then the Hypothesis must satisfy the following property:
Hypothesis A Descriptions |= Classifications
We call this kind of relationship an entailment constraint.
A cumulative learning process uses, and adds to, its stock of background knowledge over time.

example :
the use of background knowledge allows much faster learning than one might expect from a
pure induction program.
Consider the case of the traveller to Brazil meeting her first Brazilian. On hearing him speak Portuguese, she immediately concludes that Brazilians speak Portuguese, yet on discovering that his name is Fernando, she does not conclude that all Brazilians are called Fernando. Similar examples appear in science. For example, when a freshman physics student measures the density and conductance of a sample of copper at a particular temperature, she is quite confident in generalizing those values to all pieces of copper. Yet when she measures its mass, she does not even consider the hypothesis that all pieces of copper have that mass. On the other hand, it would be quite reasonable to make such a generalization over all pennies.
Inductive logic programming or ILP systems,
Prior knowledge plays two key roles in reducing the complexity of learning:
- Because any hypothesis generated must be consistent with the prior knowledge as well as with the new observations, the effective hypothesis space size is reduced to include only those theories that are consistent with what is already known.
- For any given set of observations, the size of the hypothesis required to construct an explanation for the observations can be much reduced, because the prior knowledge will be available to help out the new rules in explaining the observations. The smaller the hypothesis, the easier it is to find.
Explanation based learning
The technique of memoization has long been used in computer science to speed up programs by saving the results of computation.
.Improving efficiency
A common approach to ensuring that derived rules are efficient is to insist on the operationality of each subgoal in the rule. EBL makes the knowledge base more efficient for the kind of problems that it is reasonable to expect.
An algorithm for finding a minimal consistent determination.
function MINIMAL-CONSISTENT-DET(E, A) returns a determination
inputs: E, a set of examples
A, a set of attributes, of size n
for i = 0, . . . , n do
for each subset A, of A of size /' do
if CONSISTENT-DET(Ai,E) then return A,
end
end
function CONSISTENT-DET(A, E) returns a truth-value
inputs: A, a set of attributes
E, a set of examples
local variables: H, a hash table
for each example e in E do
if some example in H has the same values as e
for the attributes A
but a different classification then return False
store the class of e in H, indexed by the values
for attributes A of the example e
end
return True
Inductive logic programming
Inductive logic programming (ILP) is one of the newest subfields in AI. It combines inductive methods with the power of first-order representations, concentrating in particular on the representation of theories as logic programs.
Inverse resolution is based on the observation that if the example Classifications follow from Background A Hypothesis A Descriptions, then one must be able to prove this fact by resolution
Top-down learning methods
Sketch of the FOIL algorithm for learning sets of first-order Horn clauses from examples.
NEW-EITERAL and CHOOSE-EITERAL are explained in the text.
function FoiL(examples,target) returns a set of Horn clauses
inputs: examples, set of examples
target, a literal for the goal predicate
local variables: clauses, set of clauses, initially empty
while examples contains positive examples do
clause — NEW-CLAUSE(examples)
remove examples covered by clause from examples
clauses = {clause or clauses}
return clauses
function NEW-CLAUSE(examples, target) returns a Horn clause
local variables: clause, a clause with target as
head and an empty body
l ,a literal to be added to the clause
extended-examples, a set of examples with values for new variables
extended-examples — examples
while extended-examples contains negative examples do
l = CHOOSE-LNERAL(NEW-LNERALS(clause),extended-examples)
append l to the body of clause
extended-examples = set of examples created
by applying EXTEND-EXAMPLE
to each example in extended-examples
return clause
function EXTEND-EXAMPLE(example,literal) returns
if example satisfies literal
then return the set of examples created by extending example with
each possible constant value for each new variable in literal
else return the empty set
Agents that can communicate
One of the actions that is available to an agent is to produce language. This is called a speech act.
Language is a complex, structured system of signs used for communication.
Fundamentals of language
The component steps of communication
A typical communication episode, in which speaker S wants to convey proposition P to hearer H using words W, is composed of seven processes. Three take place in the speaker:
Intention: S wants H to believe P (where S typically believes P)
Generation: S chooses the words W (because they express the meaning P)
Synthesis: S utters the words W (usually addressing them to H)
Four take place in the hearer:
Perception:
Analysis:
H perceives W (ideally W' = W, but misperception is possible)
H infers that W has possible meanings P1,..., Pn (words and
phrases can have several meanings)
Disambiguation: H infers that S intended to convey P, (where ideally Pj = P,
but misinterpretation is possible)
Incorporation:
H decides to believe P, (or rejects it if it is out of line with what H already believes)
Two models of communication system
The encoded message model says that the speaker has a definite proposition P in mind, and encodes the proposition into the words (or signs) W. The hearer then tries to decode the message W to retrieve the original proposition P (cf. Morse code).
Situated language model, which says that the meaning of a message depends on both the words and the situation in which the words are uttered. In this model, just as in situation calculus, the encoding and decoding functions take an extra argument representing the current situation.
Types of communicating agents
Communication using Tell and Ask
Two agents with a shared internal language communicating directly with each other's knowledge bases through TELL and ASK.

Diffuculties in the model:
- There has to be a naming policy so that A and B do not simultaneously introduce the same symbol to mean different things. We have adopted the policy that each agent includes its own name as part of the subscript to each symbol it introduces.
- There has to be some way of relating symbols introduced by different agents, so that an agent can tell whether PA \ and, say, PB2 denote the same pit or not.
- The final difficulty is in reconciling the differences between different agents' knowledge bases. If communication is free and instantaneous, then all agents can adopt the policy of broadcasting each new fact to everyone else as soon as they learn it. That way everyone will have all the same knowledge. But in most applications, the bandwidth between agents is limited, and they are often completely out of touch with each other for periods of time. When they come back into contact, they have the problem of deciding what new information is worth communicating, and of discovering what interesting facts the other agent knows.
Communicating using formal language
Two agents communicating with language.

Syntactic Analysis (parsing)
Nondeterministic bottom-up parsing algorithm for context-free grammars. It picks one parse to return. Each node in a parse tree has two fields: CATEGORY and CHILDREN.
function BOTTOM-UP-PARSE( words, grammar) returns a parse tree
forest — words
loop do
if LENGTH(forest) = 1 and CATEGORY(forest[ 1 ]) = START(grammar) then
return forest[1 ]
else
i = choose from {1.. .LENGTH(forest)}
rule = choose from RULES(grammar)
n = LENGTH(RULE-RHS(rule))
subsequence <— SUBSEQUENCE(forest, l, i+n)
if MATCH(subsequence, RULE-RHSOw/e)) then
forest[i...i+n-l] = MakeNode (Rule_LHS(rule),subsequence)
else fail
end
Definite clause grammer (DCG)
There are two problems with BNF. First, we are interested in using language for communication, so we need some way of associating a meaning with each string, and BNF only talks about strings, not meanings. Second, we will want to describe grammars that are contextsensitive, whereas BNF is strictly context-free.
A grammar written with logical sentences is called a logic grammar. Since unrestricted logical inference is computationally expensive, most logic grammars insist on a restricted format. The most common is definite clause grammar or DCG, in which every sentence must be a definite clause.
Semantic Interpretation
Compositional semantics: This means that the semantics of any phrase is a function of the semantics of its subphrases; it does not depend on any other phrase before, after, or encompassing the given phrase. So if we know the meaning of X and Y (and +), then we know the meaning of the whole phrase.
Methodology to write grammer:
- Decide on the logical or quasi-logical form you want to generate. Write down some example sentences and their corresponding logical forms.
- Make one-word-at-a-time modifications to your example sentences, and study the corresponding
logical forms. - Eventually you should be able to write down the basic logical type of each lexical category
(noun, verb, and so on), along with some word/logical form pairs. This is motivated in part by
example sentences and in part by your intuitions. - Now consider phrase-at-a-time modifications to your example sentences (e.g., substituting
"every stinking wumpus" for "I"). You should be able to determine examples and types for
constituent phrases. - Once you know the type of each category, it is not too hard to attach semantic interpretation
augmentations to the grammar rules. Some of the rules have only one right-hand side constituent
and only need to copy up the semantics of that constituent:
NP(sem) —> Pronoun(sem) - Other times, the right-hand side of a rule will contain a semantic interpretation that is a
predicate (or function), and one or more that are objects. To get the semantics of the whole
phrase, just apply the relation (or function) to the object(s):
S(rel(obj)) -+ NP(obj) VP(rel) - Sometimes the semantics is built up by concatenating the semantics of the constituents, possibly with some connectors wrapped around them:
NP([sem1,sem2) —> Digit(sem1) Digit(sem2) - Finally, sometimes you need to take apart one of the constituents before putting the semantics
of the whole phrase back together.
Ambiguity and disambiguation in speech
The simplest type of ambiguity is lexical ambiguity, where a word has more than one meaning.
The syntactic ambiguity leads to a semantic ambiguity
One pervasive form of semantic ambiguity is referential ambiguity.
One type of pragmatic ambiguity occurs when the speaker and hearer disagree on what the current situation is
Sometimes a phrase or sentence has local ambiguity, where a substring can be parsed several ways, but only one of those ways fits into the larger context of the whole string.
Disambiguation
Disambiguation is a question of diagnosis. In general, disambiguation requires the combination of four models:
- The world model: the probability that a fact occurs in the world.
- The mental model: the probability that the speaker forms the intention of communicating this fact to the hearer, given that it occurs.9 (This combines models of what the speaker believes, what the speaker believes the hearer believes, and so on.)
- The language model: the probability that a certain string of words will be chosen, given that the speaker has the intention of communicating a certain fact.
- The acoustic model: the probability that a particular sequence of sounds will be generated, given that the speaker has chosen a given string of words.
Practical Natural Language processing
Effecient parsing
At the broadest level, there are three main things we can do improve efficiency:
- Don't do twice what you can do once.
- Don't do once what you can avoid altogether.
- Don't represent distinctions that you don't need.
Scaling up the lexicon
Tokenization is the process of dividing the input into distinct tokens—words and punctuation marks. .Morphological analysis is the process of describing a word in terms of the prefixes, suffixes, and root forms that comprise it.
There are three ways that words can be composed:
Inflectional morphology reflects the changes to a word that are needed in a particular grammatical context. For example, most nouns take the suffix "s" when they are plural.
Derivational morphology derives a new word from another word that is usually of a different category. For example, the noun "shortness" is derived from the adjective "short" together with the suffix "ness."
Compounding takes two words and puts them together. For example, "bookkeeper" is a compound of "book" and "keeper." (The noun "keeper" is in turn derived from the verb "keep" by derivational morphology.)
Ambiguity
Finding the right interpretation involves reasoning with uncertainty using the evidence provided by lexical, syntactic, semantic, and pragmatic sources.
Syntactic evidence
Modifiers such as adverbs and prepositional phrases cause a lot of ambiguity, because they can attach to several different heads. For example, in
Raj asked Sheldon to tell Penny to bring him a coffee.
Lexical evidence
Many words are ambiguous, but not all senses of a word are equally likely. When asked what the word "pen" means, most people will say first that it is a writing instrument. However, it has two other fairly common senses meaning an enclosure for animals and a penitentiary. The senses meaning a female swan and the internal shell of a squid are more obscure, known mostly to specialists in biology (or specialists in ambiguity). As another example, consider the following:
Lee positioned the dress on the rack.
Kim wanted the dress on the rack.
Semantic evidence
The a priori probability of a word sense is usually less important than the conditional probability in a given context. Consider the following:
chair,table,key-board,computer
Metonymy
A metonymy is a figure of speech in which one object is used to stand for another
Metaphor
A metaphor is a figure of speech in which a phrase with one literal meaning is used to suggest a different meaning by way of an analogy
Discourse
In the technical sense, a discourse or text is any string of language, usually one that is more than one sentence long.
Example: Novels, weather reports, textbooks, conversations, and almost all nontrivial
uses of language are discourses. six types of knowledge come into play in arriving at an understanding:
- General knowledge about the world.
- General knowledge about the structure of coherent discourse.
- General knowledge about syntax and semantics.
- Specific knowledge about the situation being discussed.
- Specific knowledge about the beliefs of the characters.
- Specific knowledge about the beliefs of the speaker.
The structure of coherent discourse
Discourses are composed of segments, where a segment can be a clause, a complete sentence, or a group of several consecutive sentences, each segment of the discourse is related to the previous one by a coherence relation that determines the role that each segment plays in the unfolding discourse.
For example, the coherence relation between the first two sentences of (A) is one of enablement
speaker does four things in putting together a discourse:
• The speaker wants to convey a message.
• The speaker has a motivation or goal in doing this.
• The speaker wants to make it easy for the hearer to understand.
• The speaker must link the new information to what the hearer already knows.
Perception
Perception provides agents with information about the world they inhabit. Perception is initiated by sensors.
Uses for vision:
- Manipulation: grasping, insertion, and so on, need local shape information and feedback ("getting closer, too far to the right,...) for motor control.
- Navigation: finding clear paths, avoiding obstacles, and calculating one's current velocity and orientation.
- Object recognition: a useful skill for distinguishing between tasty mice and dangerous carnivores; edible and inedible objects; close relations and strangers; ordinary vehicles, Volvos. and police cars.
Image Formation
The image so formed is stored digitally for analysis.
Image Processing
Edge detection

One standard form of smoothing is to convolve the image with a Gaussian function
We have a simple algorithm for 1-D edge detection:
1. Convolve the image i with G'a to obtain R.
2. Find the absolute value of R.
3. Mark those peaks in R that are above some prespecified threshold Tn. The threshold is
chosen to eliminate spurious peaks due to noise.
The algorithm for detecting vertical edges then is as follows:
1. Convolve the image l(x,y) withfv(x,y) - G'c,(x)G(r(y) to obtain Rv(x,y).
2. Find the absolute value of RV(X, y).
3. Mark those peaks in | RV(x,y) that are above some prespecified threshold Tn.
To extract 3-D information for performing certain tasks such as manipulation, navigation, and recognition. There are threte aspects of this:
1. Segmentation of the scene into distinct objects.
2. Determining the position and orientation of each object relative to the observer.
3. Determining the shape of each object.
Segmentation of the image is a key step towards organizing the array of image pixels into regions that would correspond to semantically meaningful entities in the scene.
Motion
If the camera moves relative to the three-dimensional scene, the resulting apparent motion in the image is called optical flow.
Speech recognition
Speech recognition is the task of mapping from a digitally encoded acoustic signal to a string of words. Speech understanding is the task of mapping from the acoustic signal all the way to an interpretation of the meaning of the utterance.
A speech understanding system must answer three questions:
1. What speech sounds did the speaker utter?
2. What words did the speaker intend to express with those speech sounds?
3. What meaning did the speaker intend to express with those words?
Process:
Signal processing
sampling rate is the frequency with which we look at the signal
quantization factor determines the precision to which the energy at
each sampling point is recorded
Defining the overall speech recognition model
![]()
There are two kinds of models used :
- The language model: P( words)
- The acoustic model: P(signallwords)
The search algorithm
The Viterbi algorithm takes an HMM model and an output sequence, [C\, €2, • • • , Cn],
and returns the most probable path through the HMM that outputs the sequence. It also returns
the probability for the path
Training the model
The HMM approach is used in speech recognition for two reasons. First, it is a reasonably good performance element—we saw that the Viterbi algorithm is linear in the length of the input. More importantly, HMMs can be learned directly from a training set of [signal,words] pairs. This is important because it is far too difficult to determine all the parameters by hand. There are other approaches that make better performance elements than HMMs, but they require the training data. To be labelled on a phone-by-phone basis rather than a sentence-by-sentence basis, and that too is a difficult task. The standard algorithm for training an HMM is called the Baum-Welch or forward-backward algorithm. Rabiner (1990) gives a tutorial on this and other HMM algorithms.
Artificial intellegence for robotics
The Robot Institute of America defines a robot as a programmable, multifunction manipulator designed to move material, parts, tools, or specific devices through variable programmed motions for the performance of a variety of tasks
Five properties of environments
- The real world is inaccessible. Sensors are imperfect, and in any case can only perceive stimuli that are near the agent.
- The real world is nondeterministic, at least from the robot's point of view. Wheels slip, batteries run down, and parts break, so you never know if an action is going to work. That means a robot needs to deal with uncertainty .
- The real world is nonepisodic—the effects of an action change over time. So a robot has to handle sequential decision problems .
- The real world is dynamic. Therefore, a robot has to know when it is worth deliberating and when it is better to act immediately.
- The real world is continuous, in that states and actions are drawn from a continuum of physical configurations and motions. Because this makes it impossible to enumerate the set of possible actions, many of the search and planning algorithms described in earlier chapters will need to be modified.
Components of a Robot:
Effectors: Tools for action An effector is any device that affects the environment, under the control of the robot. To have an impact on the physical world, an effector must be equipped with an actuator that converts software commands into physical motion.
Locomotion: manupulation -A tool for motion.
Sensors: Tools for perception
Proprioception
Like humans, robots have a proprioceptive4 sense that tells them where their joints are. Encoders fitted to the joints provide very accurate data about joint angle or extension. If the output of the encoder is fed back to the control mechanism during motion, the robot can have much greater positioning accuracy than humans.
Force sensing
Even though robots can sense and control the positions of their joints much more accurately than humans, there are still many tasks that cannot be carried out using only position sensing.
Tactile sensing
Picking up a paper coffee cup or manipulating a tiny screw requires more than proprioception. The force applied to the cup must be just enough to stop it from slipping, but not enough to crush it.
Sonar
Sonar is SOund NAvigation and Ranging. Sonar provides useful information about objects very close to the robot and is often used for fast emergency collision avoidance
Camera data
Human and animal vision systems remain the envy of all machine-vision researchers
Architecture of a robot
The architecture of a robot defines how the job of generating actions from percepts is organized.
Classical architecture
The first version of Shakey, appearing in 1969, demonstrated the importance of experimental research in bringing to light unsuspected difficulties.
The second version of Shakey incorporated several improvements. First, most of the detailed work of finding paths and moving objects was moved from the general problem-solving level down into special-purpose programs called intermediate-level actions (ILAs). These actions in fact consisted of complex routines of low-level actions (LLAs) for controlling the physical robot, and included some error detection and recovery capabilities.
STRIPS also introduced the idea of compiling the results of planning into generalized macrooperators,
Situated automata
The principal drawback of the classical view is that explicit reasoning about the effects of lowlevel actions is too expensive to generate real-time behavior
.A situated automaton is essentially a finite-state machine whose inputs are provided by sensors connected to the environment, and whose outputs are connected to effectors. Situated automata provide a very efficient implementation of reflex agents with state.
The basic structure of Rosenschein's situated automaton design. The shaded region on the left represents a Boolean circuit for updating the current state registers. The shaded region to the right represents a circuit for selecting actions based on the current state.

Navigation and Motion Planning
Five major classes of algorithms
- Cell decomposition methods break continuous space into a finite number of cells, yielding a discrete search problem.
- Skeletonization methods compute a one-dimensional "skeleton" of the configuration space, yielding an equivalent graph search problem.
- Bounded-error planning methods assume bounds on sensor and actuator uncertainty, and in some cases can compute plans that are guaranteed to succeed even in the face of severe actuator error.
- Landmark-based navigation methods assume that there are some regions in which the robot's location can be pinpointed using landmarks, whereas outside those regions it may have only orientation information.
- Online algorithms assume that the environment is completely unknown initially, although most assume some form of accurate position sensor.
Philosophical foundations of reasoning and perception
Almost all parties to the AI debate share a certain amount of common ground regarding the relation of brain and mind. The common ground goes under various names—physicalism, materialism, and biological naturalism
The correspondence theory of belief goes somewhat further toward a realistic account. According to this theory, an internal structure in an agent is a reasonable candidate for a representation of a proposition if the following conditions hold:
- The structure is formed when sensory evidence for the truth of the proposition is obtained.
- The structure ceases to exist when sensory evidence for the proposition's falsehood is obtained.
- The structure plays the appropriate causal role in the selection of actions.
The future of Artificial intelligence
The need for decision making situation Two promising techniques have emerged in recent years.
The first involves the use of anytime algorithms (Dean and Boddy, 1988). An anytime algorithm is an algorithm whose output quality improves gradually over time, so that it has a reasonable decision ready whenever it is interrupted. Such algorithms are controlled by a metalevel decision procedure that decides whether further computation is worthwhile. Iterative deepening search in game playing provides a simple example of an anytime algorithm. More complex systems, composed of many such algorithms working together, can also be constructed (Zilberstein, 1993).
The second technique is decision-theoretic metareasoning (Russell and Wefald, 1991). This method applies the theory of information value (Chapter 16) to select computations. The value of a computation depends on both its cost (in terms of delaying action) and its benefits (in terms of improved decision quality). Metareasoning techniques can be used to design better search algorithms, and automatically guarantee that the algorithms have the anytime property. Metareasoning is expensive, of course, and compilation methods can be applied to generate more efficient implementations. The application of anytime and metareasoning methods to general decision-making architectures has not yet been investigated in any depth.
The desirable property of a rational agent:
Perfect rationality: the classical notion of rationality in decision theory. A perfectly rational agent acts at every instant in such a way as to maximize its expected utility, given the information it has acquired from the environment
Calculative rationality: the notion of rationality that we have used implicitly in designing logical and decision-theoretic agents. A calculatively rational agent eventually returns what would have been the rational choice at the beginning of its deliberation. This is an interestingproperty for a system to exhibit because it constitutes an "in-principle" capacity to do the right thing.
Bounded optimality (BO): a bounded optimal agent behaves as well as possible given its computational resources. That is, the expected utility of the agent program for a bounded optimal agent is at least as high as the expected utility of any other agent program running on the same machine.
