This was a great assignment. By far the most challenging for far…in a good way.
My solution for this problem was very similar to the one for 4.17. In problem, I back propagated the path from the end state. Since we also needed to keep track of the final path taken, I also borrowed some techniques from Unit 4.9. There were a couple of complexities that required additional work.
One of the added difficulties was the addition of a variable step cost. That necessitated both keeping track of the additional dimension of orientation and restricting possible movements.
In addition, I needed an enhancement to building and looping through open nodes. Because the costs of actions varied, I was no longer able to just store the cost of the current node; I also needed to add information about the previous action. Therefore, for every location I need to add a new open node corresponding to each of the three actions. Then, when I pick items off of the open node list, I prioritize by minimizing (cost of node) + (cost of action) instead of just by cost of node. When the initial node is reached, we are assured of having the lowest possible cost because every other open node has a higher cost.
Here’s my commented code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
def optimum_policy2D(): xN, yN = goal x0, y0, orient0 = init m = len(grid) n = len(grid) #Populate the closed with information from the grid. closed = [[[ cell for k in range(4)] for cell in row] for row in grid] paths = [[[  for k in range(4)] for j in range(n)] for i in range(m)] # Create the open list by building an array goal states, # plus valid actions to have gotten there. open =  for i in range(len(action)): for j in range(len(forward)): fx, fy = forward[j] #verify that the cell is open if grid[xN-fx][yN-fy] == 0: closed[xN][yN][j] = 1 open.append([cost[i], i, xN, yN, j]) # While there are more cells to explore, we go backwards from # the goal state. while len(open) > 0: #Take the cheapest point off of the list. open.sort() loc = open.pop(0) g, a_index, x, y, orient = loc #check if we have reached the goal if x == x0 and y == y0 and orient == orient0: break #Copy the path informtion for later use path = paths[x][y][orient] path = path[:] #grab the parameters for the specified action... fx, fy = forward[orient] a = action[a_index] # ... and take the action if it is in bounds x -= fx y -= fy if x < 0 or y < 0 or x >= m or y >= n: continue orient -= a orient %= 4 # Check if we had explored this cell before. if closed[x][y][orient] == 0: # Save the new path to this location and mark it as closed path.append([action_name[a_index], x, y]) paths[x][y][orient] = path closed[x][y][orient] = 1 # Add in all of the possible actions to the open array for i in range(len(action)): open.append([g + cost[i], i, x, y, orient]) #the best path is the path to the finish best_path = paths[x0][y0][orient0] #fill in the policy with the data from the policy2D = [[' ' for j in range(n)] for i in range(m)] policy2D[xN][yN] = '*' for step in best_path: name, x, y = step policy2D[x][y] = name return policy2D