12. Heuristics for SSP

This tutorial will consider the factored representation of SSPs, i.e., probabilistic FDR tasks. It is the same representation as FDR, besides the fact that we can have probabilistic effects. In other words, each action can have several effects and a probability distribution over these effects. As mentioned during the lectures, to get an admissible heuristic for such an SSP, one can first turn it into a usual FDR task by all-outcome determinization and then compute any admissible heuristic on this FDR task. Such heuristics completely disregard the probabilistic information we have. On the other hand, we will focus on the pattern databases for SSPs, which consider probabilistic information.

Consider a probabilistic FDR $\Pi=\langle V,A,s_0,g\rangle$ modeling a course exam that we want to pass with minimalistic effort where

  • $V=\{tries, level, grade, done\}$
    • $\mathsf{dom}(tries)=\{0,1,2,3\}$, expressing the number of tries we have to pass the exam,
    • $\mathsf{dom}(level)=\{0,1,2,3\}$, expressing the knowledge level; the higher the number is, the better our knowledge is,
    • $\mathsf{dom}(grade)=\{A,B,C,D,E,F,\bot\}$, $\bot$ means that we have not a grade yet,
    • $\mathsf{dom}(done)=\{T,F\}$, $T$ means that we pass and $F$ not passed,
  • $s_0=\{tries=3,level=0,grade=\bot,done=F\}$,
  • $g=\{done=T\}$,
  • $A=\{study_i\mid i\in\{0,1,2\}\}\cup \{exam_{ij}\mid i\in\{0,1,2,3\}, j\in\{1,2,3\}\}\cup\{repeat\}$.
action pre eff cost
$study_i$ $level=i$ $level=i+1$ $10$
$repeat$ $tries=0$ $tries=2,grade=\bot$ $100$
$exam_{0j}$ $level=0, tries=j$ $0.1:\ tries=j-1, grade=E, done=T\\ 0.9:\ tries=j-1, grade=F, done=F$ $4$
$exam_{1j}$ $level=1, tries=j$ $0.1:\ tries=j-1, grade=C, done=T\\ 0.2:\ tries=j-1, grade=D, done=T\\ 0.3:\ tries=j-1, grade=E, done=T\\ 0.4:\ tries=j-1,grade=F, done=F$ $3$
$exam_{2j}$ $level=2, tries=j$ $0.1:\ tries=j-1, grade=B, done=T\\ 0.2:\ tries=j-1, grade=C, done=T\\ 0.3:\ tries=j-1, grade=D, done=T\\ 0.2:\ tries=j-1,grade=E, done=T\\ 0.2:\ tries=j-1, grade=F, done=F$ $2$
$exam_{3j}$ $level=3, tries=j$ $0.2:\ tries=j-1, grade=A, done=T\\ 0.3:\ tries=j-1, grade=B, done=T\\ 0.2:\ tries=j-1, grade=C, done=T\\ 0.2:\ tries=j-1,grade=D, done=T\\ 0.1:\ tries=j-1, grade=E, done=T$ $1$

The action $study_i$ is deterministic and increases our knowledge level $i$ by one. We must take the action $repeat$ if we have no remaining tries to pass the exam. In that case, we must repeat the whole course, which is quite expensive. Finally, the actions $exam_{ij}$ are the only probabilistic actions. They decrease the number of tries by one and set a grade depending on the knowledge level. If the grade exceeds $F$, it also sets $done=T$.

Exercise 1: Consider the pattern $P=\{level, done\}$ and draw the transition graph for the syntactic projection $\Pi_P$.

Solution

Exercise 2: Compute the pattern database for the pattern $P=\{level,done\}$ from the previous exercise. Evaluate the heuristic $h_P(s_0)$ for the initial state.

Solution

Exercise 3: Consider the pattern $P=\{tries,level,done\}$. Is it true that the optimal policy in $\Pi_P$ also recommends studying once and then taking the exam as in the previous exercise?

Solution

Exercise 4: Consider the pattern $P=\{tries,done\}$ and compute the heuristic value for the initial state $s_0$.

Solution

courses/pui/tutorials/tutorial12.txt · Last modified: 2024/05/06 08:53 by deckejin