Sunday, February 15, 2009

Thinking Cap 2: Informed Search and CSP

Here is the second edition of the thinking cap questions:

1. In the standard CSP problms, all variables must be given assignments--and all solutions are of equal cost. We also
considered an "optimizing" variation of CSP, where each constraint has a cost, and a solution violating that constraint
incurs that cost (with costs being additive); the goal is to find the solution with the least cost.

In both cases above, all variables must be assigned values.

In some problems, we would like to model not assigning any value to a variable--just to keep the solution cost low.
For example, in finding a time for a party for all your friends, you might decide to go ahead without one of your friends,
if he/she is too hard to accommodate. Can you think of how to model such problems within an optimizing CSP?

2. The "informedness" of a heuristic is not as great an idea as it is touted to be. You will likely find, during your project 1,
that the search with manhattan distance heuristic sometimes does *more* expansions than the search with misplaced tiles
heuristic! In this thinking-cap question you will try to understand why this happens.

In particular, given a heuristics h2 which is
more informed than h2 (in that forall nodes n, h2(n) >= h1(n)), we can only show that the "interior f-contour" nodes expanded by search
with h1 are also expanded by h2.

Specifically, consider a node n' such that f2(n') = g(n') + h2(n') <  f*   (where f*, the optimal cost of the goal node, is the same for both
heuristics). So, n' will clearly be expanded by the search with h2. Since h1 <= h2, clearly  f1(n') is also less than f*, and so search with h1 also expands
node n'. This shows that the number of nodes on the interior f-contours for h1 can not be fewer than the nodes on the interior f-contours for h1. (An interior f-contour
is the set of equi-f-value nodes with  f-values less than f*)

The tricky part is that this subsumption doesn't necessrily hold for nodes that are on the "f* contour"--i.e. nodes that have
f1 or f2 equal to f*. Based on the tie-breaking strategy, a subset of these nodes will likely be expanded before the search terminates.
If the subset expanded by h2 is more than that expanded by h1, search with h2 can do more expansions even though it is more informed.

See if you can think of possible reasons why this could happen. [hint: there is a slide in the informed search lecture--with annotation "advanced"--that
gives the insights needed, if you are having trouble..]



  1. 1.] The first implication such a problem has is that the goal nodes will not necessarily be at distance 'd', where 'd' is the number of variables. This means that there is a possibility that the algorithm may be stuck at a state which has maximum partial assignment but there is nowhere else to go. This can then be considered a goal node, after making sure that there is no other node in the search space with more assignments (this would be the terminating condition for the algo).

    In such situations, there should be a standard penalty (say 1) for each unassigned variable and the goal should be to minimize this penalty, i.e. to find a goal node with most variables assigned.

    To make things interesting each variable can also be given individual penalty values so that it becomes more important to assign some variables than others (something akin to the Bill Gates problem discussed in class). In this case the goal might have even lesser number of assignments.

  2. 1:
    This problem can be addressed by adding a dummy value, say null, to the domains of all variables to represent that this variable is not assigned any value. As the above comments stated, we can assign a cost to each assignment of a variable to null to reflect the importance of this variable. In this case, the assignment of Bill Gates to null incurs a higher cost to others.

    Another point is that if we are solving an optimizing variation of CSP, the heuristics we discussed, such as the most constrained variable and least constrained value, may NOT be effective any more, since our goal is not to find a goal, but an optimal goal.

    In a uniform search tree, the number of leaf nodes is larger than the number of internal nodes. Although this does not hold strictly for non-uniform tree, it is intuitively the case. Hence, the number of nodes that are on the f-contour is large compared to those inside the contour. Since more informed heuristic is only guaranteed to expand smaller number of nodes inside the contour, it can expand more nodes than less informed heuristic if it expands a large number of nodes on the f-contour.

    Note that there are a couple of typos in the question statement regarding h1 and h2, but you can figure them out from the context.


  3. 1. After reading the comments above about this problem, I would like to say that I like the ideas presented (adding the penalties/unique costs to the situation). I think that the most constrained variable/least constrained value idea can still be applied to the situation, but in a different way. Perhaps these two concepts can still be used, but with modifications based on the penalties/costs. The optimization would then use these modifications as a basis for minimizing the total penalties/costs. Not sure if this makes sense (or if its just repeating what's already here), but they way I am picturing it is what I have tried to describe here.

  4. 1) I'm sure we've all tried to schedule a meeting or a party where people have varying availability -- this was a good example. Penalties for unassigned variables are certainly needed, but they would hopefully be proportionate to their importance. For example, the penalty for not scheduling the party around Steve Nash's schedule should be more severe than not scheduling around Steve Kerr's. How to do this, however, I'm not sure.

    2) Not touching this one.

  5. 1) I agree completely with Shuiwang on this one. By having a cost assigned to each assignment we can control which variables are assigned and which are not. For example if Bill gates is not yet assigned then we would assigned a variable to it. However, if we came to a point where our search needed only two more variables to be assigned there may be a benefit to only assign one. Say that Bill gates and a friend needed to be assigned. We really want Bill gates so the cost is only 1, while the cost of the other person is 10. Then it would make sense to assign bill gates and leave the friend out. This way we optimize our solution cost. It would of cost more to backtrack and find a new solution which satisfied both variables.


Note: Only a member of this blog may post a comment.