Thursday, April 30, 2009

Using "evaluation function" learning as a segue into Machine Learning isn't as artificial as it might sound..

So when I used "learning of evaluation functions" as a segue to shift from Game trees to learning, it may not have looked
much more than a contrivance of convenience.

Turns out however that this has almost biblical importance. One of the first successful computer programs to use machine learning
(and the program that pretty much gave the name "Machine Learning" to the field) is Samuel's Checkers player.  Samuel was an IBM researcher
who developed one of the first checkers playing programs, that learned to improve its performance by learning evaluation functions.

Here are few links:  (definitely see this--you will be impressed at how many buzzwords you can understand now ;-) (slightly more technical)


Last chance to don the [Thinking Cap] on MDPS & learning..


1. What happens to MDPs if the underlying dynamics are "deterministic"? Can we still talk about value and policy iteration? Do we still have non-stationary policies for finite horizon deterministic MDPs?
2. We talked about how infinite horizon case is easier than finite horizon case, and said, in passing, that here is one case where "infinite" is easier to handle. Consider the following two cases:
2.1. CSPs where the variables are "real valued", and constraints between variables are expressed as linear inequalities. Clearly, the number of potential assignments for the CSP variables is infinite. What do you think will be the complexity of finding a satisfying assignment of the variables? (recall that discrete variable CSP is NP-complete) If the complexity is "low" by AI standards, what can you do to increase it? (hint: consider modifying constraints).

2.2. Planning problem where (some of) the variables are real valued. Actions have effects that can increment and decrement the variables by arbitrary amounts. What do you think will be the complexity of planning in this case? (recall that discrete variable planning is PSPACE-complete, i.e., it is among the hardest problems that can be solved with polynomial space).

3. The MDPs we considered until now are called "Fully observable"--in that during execution, the agent can tell which state it is in (thus the policy needs to only map states to actions).  What happens if the domain is only "partially observable".
Note that this means that the agent may not know which unique state it is in, but knows the  "probability distribution" over the possible states it could be in. When the agent does an action, it effect of the action is to modify the distribution into a new distribution over states (with some states being more likely and others less.

Notice that the "distribution" is fully observable although the underlying states aren't.
So, one idea will be to consider the distributions as the states. What happens to the complexity of MDPs? How is this connected to MDPs over "continuous" state spaces?

[Notice that the above still works for the special case fully observable domain. The initial distribution will be a delta function--with probablity being 1.0 for a single state, and 0.0 for all others. Doing an action converts into another delta function--with the 1.0 for a different state].


Qn 0. [George Costanza qn] Consider two learners that are trying to solve the same classification problem with two classes (+  and -). L1 seems to be averaging about 50% accuracy on the test cases while L2 seems to be averaging 25% accuracy. Which learner is good?  Why is this called the George Costanza question? ;-)
Qn 1. Consider a scenario where the training set examples have been labeled by a slightly drunk teacher--and thus they sometimes have wrong labels (e.g. +ve are wrongly labelled negative etc.).  Of course, for the learning to be doable, the percentage of these mislabelled instances should be quite small.  We have two learners, L1 and L2. L1 seems to be 100% correct on the *training* examples. L2 seems to be 90% correct on the training examples. Which learner is likely to do well on test cases?
Qn 2.  Compression involves using the pattern in the data to reduce the storage requirements of the data.  One way of doing this would be to find the rule underlying the data, and keep the rule and throw the data out. Viewed this way, compression and learning seem one and the same. After all, learning too seems to take the training examples, find a hypothesis ("pattern"/"rule") consistent with the examples, and use that hypothesis instead of the training examples.  What, if any, differences do you see between Compression and Learning?
Qn 3. We said that most human learning happens in the context of prior knowledge. Can we view prior knowledge as a form of bias?
In particular, can you say that our prior knowledge helps us focus on certain hypotheses as against other ones in explaining the data?

that is all for now.

Possibility of an *optional* extra class on "Learning" [Monday 10:30--11:45, BY 210] Free Food Event. [RSVP]


 I felt a little queasy that I may not have the full opportunity to sufficiently brainwash you with my view of learning.

So, I am offering to run a completely optional extra class on Monday 5/4 morning in BY 210 (10:30--11:45AM) *if* there is
sufficient interest.

As a clarification, nothing I discuss in this class would be included in the final exam, so if you skip it, you won't be in anyway
disadvantaged with respect to your course grade.

Also, the Tuesday regular class will not depend on what is discussed on Monday; it will take off from where we left today.

What I want to do is give a bigger picture about learning, that I didn't get to do because of time constraints.

*if* you are interested and will be able to attend, let me know by email.*************

If I get sufficient interest, I will confirm by Sunday if the meeting is going to be held.

Since I cannot use sticks, I will dangle the carrot of free food if you show up ;-)


Clarification on the planning question in Homework 4


We have not done partial order planning and mutual exlusion propagation in the class. SO you don't need to do
parts 1.d and 1.h in the planning question.


Monday, April 27, 2009

If Chess is gone, why should Jeopardy stay in human sphere?

The New York Times
This copy is for your personal, noncommercial use only. You can order presentation-ready copies for distribution to your colleagues, clients or customers here or use the "Reprints" tool that appears next to any article. Visit for samples and additional information. Order a reprint of this article now.
Printer Friendly Format Sponsored By

April 27, 2009

Computer Program to Take On 'Jeopardy!'

YORKTOWN HEIGHTS, N.Y. — This highly successful television quiz show is the latest challenge for artificial intelligence.

What is "Jeopardy"?

That is correct.

I.B.M. plans to announce Monday that it is in the final stages of completing a computer program to compete against human "Jeopardy!" contestants. If the program beats the humans, the field of artificial intelligence will have made a leap forward.

I.B.M. scientists previously devised a chess-playing program to run on a supercomputer called Deep Blue. That program beat the world champion Garry Kasparov in a controversial 1997 match (Mr. Kasparov called the match unfair and secured a draw in a later one against another version of the program).

But chess is a game of limits, with pieces that have clearly defined powers. "Jeopardy!" requires a program with the suppleness to weigh an almost infinite range of relationships and to make subtle comparisons and interpretations. The software must interact with humans on their own terms, and fast.

Indeed, the creators of the system — which the company refers to as Watson, after the I.B.M. founder, Thomas J. Watson Sr. — said they were not yet confident their system would be able to compete successfully on the show, on which human champions typically provide correct responses 85 percent of the time.

"The big goal is to get computers to be able to converse in human terms," said the team leader, David A. Ferrucci, an I.B.M. artificial intelligence researcher. "And we're not there yet."

The team is aiming not at a true thinking machine but at a new class of software that can "understand" human questions and respond to them correctly. Such a program would have enormous economic implications.

Despite more than four decades of experimentation in artificial intelligence, scientists have made only modest progress until now toward building machines that can understand language and interact with humans.

The proposed contest is an effort by I.B.M. to prove that its researchers can make significant technical progress by picking "grand challenges" like its early chess foray. The new bid is based on three years of work by a team that has grown to 20 experts in fields like natural language processing, machine learning and information retrieval.

Under the rules of the match that the company has negotiated with the "Jeopardy!" producers, the computer will not have to emulate all human qualities. It will receive questions as electronic text. The human contestants will both see the text of each question and hear it spoken by the show's host, Alex Trebek.

The computer will respond with a synthesized voice to answer questions and to choose follow-up categories. I.B.M. researchers said they planned to move a Blue Gene supercomputer to Los Angeles for the contest. To approximate the dimensions of the challenge faced by the human contestants, the computer will not be connected to the Internet, but will make its answers based on text that it has "read," or processed and indexed, before the show.

There is some skepticism among researchers in the field about the effort. "To me it seems more like a demonstration than a grand challenge," said Peter Norvig, a computer scientist who is director of research at Google. "This will explore lots of different capabilities, but it won't change the way the field works."

The I.B.M. researchers and "Jeopardy!" producers said they were considering what form their cybercontestant would take and what gender it would assume. One possibility would be to use an animated avatar that would appear on a computer display.

"We've only begun to talk about it," said Harry Friedman, the executive producer of "Jeopardy!" "We all agree that it shouldn't look like Robby the Robot."

Mr. Friedman added that they were also thinking about whom the human contestants should be and were considering inviting Ken Jennings, the "Jeopardy!" contestant who won 74 consecutive times and collected $2.52 million in 2004.

I.B.M. will not reveal precisely how large the system's internal database would be. The actual amount of information could be a significant fraction of the Web now indexed by Google, but artificial intelligence researchers said that having access to more information would not be the most significant key to improving the system's performance.

Eric Nyberg, a computer scientist at Carnegie Mellon University, is collaborating with I.B.M. on research to devise computing systems capable of answering questions that are not limited to specific topics. The real difficulty, Dr. Nyberg said, is not searching a database but getting the computer to understand what it should be searching for.

The system must be able to deal with analogies, puns, double entendres and relationships like size and location, all at lightning speed.

In a demonstration match here at the I.B.M. laboratory against two researchers recently, Watson appeared to be both aggressive and competent, but also made the occasional puzzling blunder.

For example, given the statement, "Bordered by Syria and Israel, this small country is only 135 miles long and 35 miles wide," Watson beat its human competitors by quickly answering, "What is Lebanon?"

Moments later, however, the program stumbled when it decided it had high confidence that a "sheet" was a fruit.

The way to deal with such problems, Dr. Ferrucci said, is to improve the program's ability to understand the way "Jeopardy!" clues are offered. The complexity of the challenge is underscored by the subtlety involved in capturing the exact meaning of a spoken sentence. For example, the sentence "I never said she stole my money" can have seven different meanings depending on which word is stressed.

"We love those sentences," Dr. Nyberg said. "Those are the ones we talk about when we're sitting around having beers after work."


Sunday, April 26, 2009

Added a video for the second alpha-beta example...

I added a short video explaining the second alpha-beta example (right below the audio for the lecture). You might find it useful
if you found the discussion in the (make-up)class too fast to follow.

A second alpha-beta example. Video with narration (150mb) (Powerpoint slide (with animnation)


Friday, April 24, 2009

Re: Lecture notes and audio for today's class are available...

If you want a single avi file that has both the moving slides and audio together, and you have a high-speed internet,  you can try the following link
(Warning: this is a 2gig avi file..)


On Fri, Apr 24, 2009 at 3:40 PM, Subbarao Kambhampati <> wrote:
The slides are at
 (start at slide 44--which says 4/24  and end at slide 63--which says "4/24 class ended here")

here is the link to the audio

Here is a summary of what is done

Audio of [Apr 24, 2009] (Make-up for April 28th class). Policy Iteration for MDPS, Finding policies in dynamic domains. Real-time dynamic programming. RTA* as a special case. Going from dynamic to multi-agent domains--where RTDP becomes min-max (or max-max if you are Ned in Simpsons). Discussion of adversarial search. Types of games (and the exciting nature of deterministic version of Snakes and Ladders ;-). Minmax with depth-first search. Alpha-beta pruning. The effectiveness of alpha-beta pruning.

You now have everything needed to do all parts of homework except the learning ones.

When I come back for the regular class on Thursday, I will tie up a few loose ends on game tree search (~15min)
and start learning  (for which we will cover 18.1-18.5 and then 20.1 and 20.2)


Ps: As I said, the Tuesday class will now be an optional review session lead by Will Cushing.

Tuesday class/review

In addition to letting me know if you have any specific requests, I'd
appreciate a generic "I'll be showing up" from everyone who is, in
fact, planning on attending.


P.S. For that matter, if your *not* planning on attending it would
still be a good idea to let me know. Or if your plans change one way
or another.

Lecture notes and audio for today's class are available...

The slides are at
 (start at slide 44--which says 4/24  and end at slide 63--which says "4/24 class ended here")

here is the link to the audio

Here is a summary of what is done

Audio of [Apr 24, 2009] (Make-up for April 28th class). Policy Iteration for MDPS, Finding policies in dynamic domains. Real-time dynamic programming. RTA* as a special case. Going from dynamic to multi-agent domains--where RTDP becomes min-max (or max-max if you are Ned in Simpsons). Discussion of adversarial search. Types of games (and the exciting nature of deterministic version of Snakes and Ladders ;-). Minmax with depth-first search. Alpha-beta pruning. The effectiveness of alpha-beta pruning.

You now have everything needed to do all parts of homework except the learning ones.

When I come back for the regular class on Thursday, I will tie up a few loose ends on game tree search (~15min)
and start learning  (for which we will cover 18.1-18.5 and then 20.1 and 20.2)


Ps: As I said, the Tuesday class will now be an optional review session lead by Will Cushing.

Interactive Review Qn added to the homework--post your answer as a comment to this thread on the blog

 I added an extra question to the last homework

 [Mandatory] [Answer to this question *must*  also be posted on the
            class blog as a comment to my post]. List upto five non-trivial ideas you were
            able to appreciate during the course of this
            semester. (These cannot be "I thought Bayes Nets were Groovy"
            variety--and have to include a sentence of

Your answer to this should be posted to the blog as a comment on *this* thread. (Due by the same time--last class)

The collection of your answers will serve as a form of interactive review of the semester.


Optional review class with Will Cushing on Tuesday during class time


 As you know, I won't be here on Tuesday and am making up for that class with a meeting today.
Those of you who cannot make it can use the slides and audio (and may be video if my recording works).

Regarding Tuesday, since all of you will be available during the class, it seemed to me that we should use it
for a review session (since the semester is ending anyway, and getting a review session with everyone being
available is going to be hard).

Will Cushing--who did the review before mid-term--is willing to come and sort of pick-up where he left off
(at least go over the stuff from homework 3).   I encourage you to make use of it.

Send an email to Will (cc'd here) if you have any questions about specific things you want him to go over.


Final Homework posted; Solutions to last home work posted


 I posted the final homework--it has 5 questions. You already are ready to do the first two. You will be able to do the
first four by the end of today's class. The last question, on learning, will be "due" only if the material gets covered
on next Thursday.

I posted the solutions to homework 4; will post solutions to this homework on the last class (which is May 5th--Tuesday after next).


Wednesday, April 22, 2009

Project 4

Do we have to include a trace on each of the querys of the second domain?
It seems like they get pretty large

Tuesday, April 21, 2009

(really really) mandatory reading for the next class..

You should make sure to read 17.1--17.3 for the next class.

After that, we will get into 6.1-6.4


Sunday, April 19, 2009

Proj4 task 2 & task 3 questions

Rao or Yunsong,
A) Can you say if the prolog function in task 2 is supposed to include the depth check from task 1, or is that giving too much away?

B) Also, in task 3, it says to modify theorem prover to produce an answer other than T or nil, but then the example shows the prolog function being called. To me it makes more sense to modify the prolog function, rather than theorem prover, but can you clear this up for me?

Var Substitution

When dealing with the variable substitution function in part 2 what is the format for the parameters. In the project write up it
(varsubst '(parent (? x) (mother-of (? y)))
'( ((? x) Fred) ((? y) Mary)))

But it seems that '(parent (? x) (mother-of (? y))) should be '((parent (? x) (mother-of(?y)))

What is the correct format?

Saturday, April 18, 2009

Thinking Cap Questions: Planning

See if you can crack some of these...

[0.][Don't ask why something is bad. Ask why it is not worse?] We said that regression searches in the space of partial states or "sets" of real world states. We also said that there are
3^n such partial states (where n is the number of state variables). However, given that there are actually 2^n complete states,
there should be 2^{2^n} distinct sets of states. How come regression is searching in only 3^n of them? (Notice 3^n is hugely better
than 2^2^n).

[1] One thing that I left unsaid w.r.t. planning graph based heuristic computation is how many layers the planning graph should be expanded. Can you see a natural place to stop expanding?
      Also, if you are too lazy to go all that far, can you think of how the heuristic will change if you expanded it a little less farther than is needed?

[2]Suppose the actions have differing costs (i.e., they are not all equally costly). Can you think of how planning graph based heuristics will behave in that case? (how does your answer to 1 above change?)

[3] In all the actions we looked at, we assumed that the effects are unconditional. That is, the action will give all its effects once it executed. Consider modeling the action of driving a bus from tempe to LA
The requirements for the driving are that the bus is originally in Tempe (and that there is a driver). The "unconditional" effects of the action are that the bus will be in LA, and the driver also will be in LA.
Now how about anyone else who is in the bus? Suppose Tom is in the bus when it is in Tempe--Tom will be in LA after the action. And yet, Tom being in the bus is not a requirement of the bus taking off. In
these cases, the effect of tom being in LA is a "conditional effect" ---if Tom is in the bus before the action, then Tom will be in LA after the action.  How do you see progression and regression changing
in the presence of conditional effects?


ps: if you are curious about planning graph heuristics, check out

Thursday, April 16, 2009

Note for Project 4

I was having trouble getting lisp's trace function to show any useful information so I talked to the professor. If you're compiling and then executing your lisp files like I was, the compiler can optimize out tail-recursion. So when you trace a recursive function, it looks like it only gets called once.

The solution is to load your lisp file using lisp's load function, like this:
(load "Z:/Documents/CSE471/Project4/Project4.lisp")

It won't get compiled. Now you can trace your recursive function:
(trace my-function)

And then when you call it, you'll see the traced output. Hope that helps someone.

Query regarding Task1 Project 4

Hi All,

I am missing something in my understanding of Task 1. It is mentioned in the project description that we need to implement a depth criteria that will cut off the recursion when it crosses a depth-limit. But it is not clear to me what the value of depth-limit should be. Also further up ahead the description hints at "guessing the limit". So does that mean we need to experiment with different depth-limit values?

Please help me out.


Wednesday, April 15, 2009

The expressiveness roller-coaster picture revisited...


 I went ahead and added a slide to yesterday's lecture for the expressiveness roller coaster that we discussed at the beginning of the class yesterday.
I am enclosing a jpeg here..

Comments and/or questions welcome (on the blog)


Thursday, April 9, 2009

Undergraduate research opportunities with my group..

This is meant especially for the undergraduate students in the class who
think they are doing well in the class and enjoying it.

I will likely have funding--starting summer--for supporting  undergraduate students
in AI research with my group (which does work in automated planning--
a topic we will hear a bit about next week). If this sort of thing interests you, get in touch with me.

Recent publications from my group can be found at
(the titles of the papers might give at least some inkling on the sorts of things we do).


Offer hours for project 2 and project 3 grading

Dear All, 
    I will be holding office hours through out the day(from 10am to 5pm) tomorrow for project 3 and project 4 grading. My cubicle is 464AC in brickyard building. If your project2 is deducted 5 points for only considering one diagonal, you can just go to the TA yunsong's office to have that corrected.I apologize for this mis-deduction as I wasn't aware that professor Rao told you guy one diagonal was ok.  For any other concerns, you are welcome to see me at my office. Best wishes. 

Xin Sun
Phd student at Arizona State University
Tempe, AZ, 85281, USA
Tel: 4809659038

In case the (ir)rationality of sqrt(2)^sqrt(2) is bugging you... + Constructive vs. Existential math.

..In case you are dying to know whether sqrt(2)^sqrt(2) is rational or irrational, you can be rest assured
that it is irrational (actually transcendental  (*)). So a constructive proof for
our theorem is with p=sqrt(2)^sqrt(2) and q=sqrt(2)


(which also points out a more general and easy to understand constructive proof. Consider
  e^{log_e q} for any transcendental number e and rational number q--which will be q. All you need to show is log_e(q) is irrational and you can show this easily (If log_e(q) = m/n with integers m and n without common factors, then
q = e^{m/n}. This would mean that e is the root of an algebraic equation x^m - q^n = 0. But the definition of transcendental number is that it cannot be the root of any algebraic equation!).


(*) By the way, transcendental => irrational but not vice versa. In particular, transcendentals are those irrational numbers that cannot be roots of any algebraic equation. Two famous examples of course are e and pi.  Notice that proving that a number e *is* transcendental involves showing that e^r for any rational number r cannot be rational (since if it is, then e will be the root of an algebraic equation). Thus, proving transcendentality is not all that easy.

(ps 2:

Check out

for a nice discussion on the Constructive vs. Classical mathematics--and how during Hilbert's time there was a pretty big controversy in mathematics--with mathematicians such as Brouer insisted that all math that depended on existential proofs be thrown out.Papa Hilbert had to come to rescue--pretty heady stuff.

You might also look at

which also talks about the slick "irrational power irrational can be rational proof..."

*Important*--a fully worked out example for variable elimination..

Several of you have been asking questions on the details of the variable elimination.

The URL here contains a fully worked out version of the example that the text book has. See if this reduces your confusion:

I am also linking it from the lecture notes.


Wednesday, April 8, 2009

CS101 Robot Program

On my way to the Computre Lab on the second floor. A bunch of students were making mini LEGO robots for maze solving. All equipped with a fixed Ultrasonic sensor(can compute distance from obstacles like walls) and a pair of wheels and a pivot to rotate. The maze was card board and on uneven surface. meaning the robots couldn't always keep going without often slipping or sliding, leading to changes in orientaion, often messing up everything. This was happening a lot.
The memory is sufficient but not enough to build a map of the maze. The good news is that the uneven surface can only cause the robot to change orientation by a max of 30-40 degrees....Which is potentially recoverable.... the maze is not too big and time is unlimited.

I know that it was simple DFS using the sensors, distance from the furthest wall would do as the depth measure...When you get to a wall pivot Right and left and spot the next furthest wall...

But there was a vey implicit "A starish" behaviour used in the best performing design, even when the search was DFS...Can you spot it!!??

Tuesday, April 7, 2009

project 4 released


 Project 4 is now released. It will require you to implement a prolog-style theorem prover starting from a given code base.
It basically involves doing the apartment-pet example discussed in today's class.

Please look at that example as well as the project assignment by next class so I can answer any questions you may have.


Another problem in FOL
In FOL there seems to be a separation between the constants and atomic terms of the logic and the actual statements made over variables. So natural question comes to mind is that while constructing a statements like FORALL(x) Happy(x)=> Playing(x) and then start putting in the constants and terms in our Knowledge base or model how do we validate the model that we build? When we did it in propositional logic we essentially made sure by using certain refutation and resolution methods that the knowledge base was consistent. Since we saw in the lecture how there is exponential blow up if we do inference the same would go for model checking too.... Another big problem...?

Monday, April 6, 2009

Required reading for tomorrow's class (class will cover project 4...!)

Please read 9.1, 9.2, 9.4(*) and 9.5

9.2 (unification) and 9.4--backward chaining--are connected to project 4 that will be released this week. So
it helps for you to be sure of this material.


Friday, April 3, 2009

project 2 grades and current cumulatives


 The project 2 just got graded. I will return them next Tuesday. However, since some of you wanted to
know your project 2 grade before this weekend, I am sending the information about who got how much
(as well as the current cumulatives--out of 51 points until now).

 Please note that I haven't yet had a chance to look at the graded projects and there may be some changes
in the points by the time you get them back on Tuesday.



Thursday, April 2, 2009

Homework 3 released--included bayes networks and First-order Logic topics

The fopc topics will get covered by next thursday--the homework is due the Tuesday after that.


Wednesday, April 1, 2009

Part 4 which network to use

In part 4 it says " Modify the bayes network to show this improved understanding of the
domain. Show the topology as well as the .bn representation"

Do we use the network we created in part 1 or part 3? Also, the first entry in the edit menu gives .bif files, not .bn files, is that okay?

Godel and First Order Logic?

With help from wikipedia, here's my summary of Godel's Theorems:

Godel's incompleteness theorems talk about the limitations of formal systems. First, two definitions: A set of axioms is consistent if you can't find a statement such that you can prove the statement and its negation from those axioms. A set of axioms is complete if for any statement in the axioms' language, you can prove it or its negative using those axioms.

So there are actually two theorems. The first one is something like "any system capable of expressing basic arithmetic cannot be both consistent and complete." So, if a system can prove basic arithmetic, then there is an arithmetic statement that is true, but not provable in that system.

The second theorem is something like "for any system that can express arithmetic and provability, that system can describe its own consistency if (and only if) it's inconsistent.

This was a big deal when Godel proved this, because mathematicians at the time were trying to rebuild math from the ground up, 100% provable and complete, and he basically showed that it was an impossible task.

So that's my non-mathematician's interpretation.

As far as our class and first order logic, the question becomes, is first order logic sufficiently capable of expressing basic arithmetic such that it must be either consistent or complete, but not both? This matters because we need to know if an agent can reach faulty conclusions because its logic is inconsistent, or if true statements can't be reached because its logic is incomplete, right?

Based on what Rao said in class, Godel's theorem doesn't apply to first order logic. Is it because you can create an infinite number of axioms in first order logic, and thus it's impossible to try and prove a given statement starting from an infinity of axioms? Or is there another reason? I'm not good enough at math to see the answer on my own.

(Also, there's a great book called "Godel, Escher Bach" that I think everyone in this class would really like).