Udacity – Unit 4.19 Left Turn Policy

This was a great assignment. By far the most challenging so far…in a good way.

My solution for this problem was very similar to the one for 4.17: as in that problem, I propagated backward from the end state. Since we also needed to keep track of the final path taken, I borrowed some techniques from Unit 4.9 as well. Still, a couple of complexities required additional work.

One of the added difficulties was the variable step cost: going straight, turning right, and turning left each cost a different amount. That necessitated both tracking orientation as an extra state dimension and restricting the possible movements.
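
For context, the grader supplies the motion model as globals. The values below are how I remember them from the assignment, so treat the specifics as my best recollection rather than part of my solution: orientation indexes into forward, and each of the three actions has its own cost, with left turns by far the most expensive.

# Globals supplied by the Udacity grader, as I remember them:
forward = [[-1,  0],  # go up
           [ 0, -1],  # go left
           [ 1,  0],  # go down
           [ 0,  1]]  # go right

# action: -1 = turn right, 0 = no turn, 1 = turn left
action = [-1, 0, 1]
action_name = ['R', '#', 'L']
cost = [2, 1, 20]  # right turn, no turn, left turn

grid = [[1, 1, 1, 0, 0, 0],  # 0 = navigable, 1 = blocked
        [1, 1, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 1, 1],
        [1, 1, 1, 0, 1, 1]]
init = [4, 3, 0]  # row, col, orientation (0 = facing up)
goal = [2, 0]     # row, col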

In addition, I needed to enhance how the open nodes are built and looped through. Because the costs of the actions varied, I could no longer store just the cost of the current node; I also needed information about the pending action. Therefore, for every location I add a new open node for each of the three actions. Then, when I pick items off of the open node list, I prioritize by minimizing (cost of node) + (cost of action) instead of just the cost of the node. When the initial node is reached, we are assured of having the lowest possible cost, because every other open node has a higher cost.
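
To make the node layout concrete: each open node is a list of [cost so far + cost of the pending action, action index, x, y, orientation]. Python compares lists element by element, so ordering the open list sorts nodes by total cost first. My code below sorts and pops; the same pop-the-cheapest behavior could also come from a heap, as in this toy sketch (the node values here are made up purely for illustration):

import heapq

# Toy open list: [total cost, action index, x, y, orientation].
open_nodes = []
heapq.heappush(open_nodes, [21, 2, 3, 0, 1])  # costly left turn
heapq.heappush(open_nodes, [2, 1, 2, 1, 0])   # cheap straight move
heapq.heappush(open_nodes, [4, 0, 3, 1, 3])   # right turn

g, a_index, x, y, orient = heapq.heappop(open_nodes)
print(g, a_index)  # 2 1 -- the cheapest candidate comes off first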

Here’s my commented code:

def optimum_policy2D():
    xN, yN = goal
    x0, y0, orient0 = init
    m = len(grid)
    n = len(grid[0])
 
    # Populate the closed list from the grid: obstacle cells start at 1,
    # so they are treated as closed from the beginning.
    closed = [[[cell for k in range(4)] for cell in row] for row in grid]
    paths = [[[ [] for k in range(4)] for j in range(n)] for i in range(m)]
 
    # Create the open list by building an array of goal states,
    # plus the valid actions that could have led there.
    open = []
    for i in range(len(action)):        
        for j in range(len(forward)):
            fx, fy = forward[j]            
            # Verify that the predecessor cell is in bounds and open
            if 0 <= xN-fx < m and 0 <= yN-fy < n and grid[xN-fx][yN-fy] == 0:
                closed[xN][yN][j] = 1
                open.append([cost[i], i, xN, yN, j])
 
    # While there are more cells to explore, we go backwards from 
    # the goal state.
    while len(open) > 0:
        #Take the cheapest point off of the list.
        open.sort()
        loc = open.pop(0)
 
        g, a_index, x, y, orient = loc
 
        # Check if we have reached the start state (the goal of this backward search)
        if x == x0 and y == y0 and orient == orient0:
            break
 
        # Copy the path information for later use
        path = paths[x][y][orient]
        path = path[:]
 
        #grab the parameters for the specified action...
        fx, fy = forward[orient]
        a = action[a_index]
 
        # ... and undo the action by stepping backward, skipping it if we leave the grid
        x -= fx
        y -= fy    
 
        if x < 0 or y < 0 or x >= m or y >= n:
            continue
 
        orient -= a
        orient %= 4
 
        # Check whether we have explored this cell before.
        if closed[x][y][orient] == 0:
 
            # Save the new path to this location and mark it as closed            
            path.append([action_name[a_index], x, y])
            paths[x][y][orient] = path            
            closed[x][y][orient] = 1
 
            # Add in all of the possible actions to the open array
            for i in range(len(action)):
                open.append([g + cost[i], i, x, y, orient])
 
 
    # The best path is the one stored at the initial state
    best_path = paths[x0][y0][orient0]
 
    # Fill in the policy grid with the data from the best path
    policy2D = [[' ' for j in range(n)] for i in range(m)]
    policy2D[xN][yN] = '*'
    for step in best_path:
        name, x, y = step
        policy2D[x][y] = name
 
    return policy2D
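
To inspect the result, print the returned policy row by row; given the sample values above, '*' should mark the goal, '#' a straight move, and 'R'/'L' the turns:

for row in optimum_policy2D():
    print(row)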