Discretizing the Continuous Features in Reinforcement Learning | by Eligijus Bujokas | Mar, 2023


Learn how to convert continuous variables to a discrete space using tile coding and Python

Photo by Ehud Neuhaus on Unsplash

This article is a continuation of the Reinforcement Learning series. To recap, visit these articles:

The last article about Q-Learning explored the concept of assigning a number to a state-action pair:

Q value function

The states used there were states that can be indexed and written into a table. For example, we listed all the available positions in a maze that our agent can be in. Even in a huge maze (imagine a million by million grid) we can still assign a unique index to each state and straightforwardly use the states when filling out the Q-table.

Often in practice, the states that our agent is in cannot be uniquely indexed and fit into a table. For example, imagine that the state is the angle of a wheel which can be turned exactly one full time in either direction, so it can take ANY value in the range [-360, 360] degrees. The wheel can be turned to exactly 12.155…, 152.1568… and so on degrees. We cannot index all the unique angles and create a table: the number of possibilities is infinite.

However, we would still like to use all the algorithms that RL has to offer. Thus, the first step is to create a discrete feature space from the features that have infinitely many possible values.

One of the popular ways to discretize a continuous feature space is the so-called tile coding algorithm.

The definition of tile coding is as follows¹:

Tile coding is a method for representing a continuous state space by dividing the state space into a number of overlapping regions, called tiles, and then representing the state by the set of tiles that it falls into.

We can represent a simple one-feature discretization with the following code and graph:

import numpy as np
import matplotlib.pyplot as plt

# Creating an example 1D feature that goes from 0 to 1
x = np.linspace(0, 1, 100)

# Defining the number of tilings
n_tilings = 4

# Defining the offset
offset = 0.05

# Defining the number of tiles in a tiling
n_tiles = 10

# Creating a list of tilings
tilings = []
cur_tiling = 0
for i in range(n_tilings):
    # Creating a tiling by adding the offset to the feature
    tiling = x + cur_tiling * offset

    # Appending the tiling to the list
    tilings.append(tiling)

    # Incrementing the tiling
    cur_tiling += 1

# Plotting the x feature and the tilings
# The x feature is plotted as a horizontal line
# The tilings are plotted as horizontal lines, each moved up by 0.1
vertical_offset = 0.1

plt.figure(figsize=(10, 5))
plt.plot(x, np.zeros_like(x), color='black')
for i, tiling in enumerate(tilings):
    plt.plot(tiling, np.zeros_like(x) + vertical_offset + vertical_offset * i, color='red')

    # Adding vertical ticks on the tiling lines
    for j in range(n_tiles):
        plt.plot(
            [j / n_tiles + offset * i, j / n_tiles + offset * i],
            [vertical_offset + vertical_offset * i - 0.01, vertical_offset + vertical_offset * i + 0.01],
            color='black'
        )

plt.xlabel('Feature values')
plt.ylabel('Tilings')

# Drawing a vertical line at x = 0.46
plt.plot([0.46, 0.46], [0, vertical_offset * n_tilings + 0.1], color='blue', linestyle='dashed')
plt.show()

Tile coding in action for x=0.46; Graph by author

To understand tile coding, we need to fully understand what is going on in the above graph.

The bottom-most horizontal line is the feature x, which can take any value in the range [0, 1].

Each red line is a tiling which is used in discretizing the feature x.

Each tiling is divided into tiles, which are evenly spaced.

The blue dashed line is a random value taken from the x range. The question is, how do we use the 4 tilings and 8 tiles to create a discrete state out of the x feature value?

The algorithm is as follows:

Given a value s from a continuous x feature:

For each tiling:

  • Initialize a vector of size equal to the number of tiles. Fill it with 0.
  • Calculate which tile the value s falls into. Save that index i.
  • Set the vector coordinate i to 1.

Finally, stack all the vectors into one vector.
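The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the author's implementation: the function name tile_code_1d and its parameters are hypothetical, and it assumes 8 tiles of width 0.1 per tiling, with each tiling shifted to the right by 0.05 relative to the previous one (the offset and tick spacing used in the plotting code).

```python
import numpy as np

def tile_code_1d(s, n_tilings=4, n_tiles=8, tile_width=0.1, offset=0.05):
    """One-hot encode a scalar s against several shifted 1D tilings.

    Assumption: tiling t starts at t * offset and consists of n_tiles
    half-open tiles of equal width tile_width.
    """
    vectors = []
    for t in range(n_tilings):
        # Step 1: a zero vector with one slot per tile
        vec = np.zeros(n_tiles, dtype=int)

        # Step 2: index of the tile that s falls into within tiling t
        idx = int(np.floor((s - t * offset) / tile_width))

        # Step 3: mark that tile (skipped if s lies outside the tiling)
        if 0 <= idx < n_tiles:
            vec[idx] = 1
        vectors.append(vec)

    # Final step: stack the per-tiling vectors into one state vector
    return np.concatenate(vectors)

state = tile_code_1d(0.46)
# One row per tiling, starting from the tiling closest to the feature line
print(state.reshape(4, 8))
```

Under these assumptions, the value 0.46 activates tile indices 4, 4, 3 and 3 (counting from zero) in the four tilings.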

Let us work out the example shown in the graph. For the first tiling, directly above the feature space x, the blue value falls into the fifth tile. Thus, the feature vector of the first tiling is:

[0, 0, 0, 0, 1, 0, 0, 0]

For the second tiling, we repeat the same process and end up with the vector:

[0, 0, 0, 0, 1, 0, 0, 0]

The vectors for the third and fourth tilings:

[0, 0, 0, 1, 0, 0, 0, 0]

[0, 0, 0, 1, 0, 0, 0, 0]

The final discrete vector, representing the blue dashed “state”, is:

[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

Let’s do one more example with an x value of 0.44 to understand the process fully.

x=0.44; Graph by author

Each tiling vector (starting from the bottom):

[0, 0, 0, 0, 1, 0, 0, 0]

[0, 0, 0, 1, 0, 0, 0, 0]

[0, 0, 0, 1, 0, 0, 0, 0]

[0, 0, 1, 0, 0, 0, 0, 0]

Final state vector:

[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

The final vector representing the state will have a length of N tilings * N tiles.
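To make the length concrete, here is a small sketch for the x = 0.44 example. It assumes 4 tilings of 8 tiles each, matching the vectors written out above; the variable names are illustrative only. Each tiling contributes exactly one active entry, located at position tiling * N tiles + tile index in the stacked vector.

```python
import numpy as np

# Assumed sizes, matching the worked vectors in the text
n_tilings, n_tiles = 4, 8

# Zero-based tile index hit in each tiling for x = 0.44 (from the vectors above)
tile_indices = [4, 3, 3, 2]

# Position of each active entry inside the stacked state vector
flat_positions = [t * n_tiles + i for t, i in enumerate(tile_indices)]

state = np.zeros(n_tilings * n_tiles, dtype=int)
state[flat_positions] = 1

print(len(state))       # 32
print(flat_positions)   # [4, 11, 19, 26]
```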

The process of assigning a vector to a state that is described by 2 features follows a very similar algorithm. The tilings are now not horizontal lines but rectangles.

Let us imagine that our state is comprised of continuous x and y variables, each ranging from 0 to 1.

We will divide the whole feature space with 2 tilings, each comprised of 4 tiles:

2D continuous space; Graph by author

The grey zone in the above graph represents the original feature space. Each red tiling is divided into 4 tiles. We want to create a vector representing the state for the blue point (0.44, 0.44).

The algorithm is the same as in the 1D case, but now we index the tiles from left to right, row by row, starting from the top left:

Indexing the tiling; Graph by author

Thus, for both the first and the second tiling, the blue point falls into the third tile and the resulting state vectors are:

[0, 0, 1, 0]

[0, 0, 1, 0]

And the final vector is:

[0, 0, 1, 0, 0, 0, 1, 0]
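The 2D case can be sketched the same way. The code below is an illustration under stated assumptions rather than the author's code: the function tile_code_2d is hypothetical, each tiling is assumed to be a 2 x 2 grid of square tiles with side 0.5, and the second tiling is assumed to be shifted by 0.2 along both axes, since the exact offset used in the figure is not stated in the text.

```python
import numpy as np

def tile_code_2d(point, offsets, tiles_per_dim=2, tile_size=0.5):
    """One-hot encode a 2D point against several shifted square tilings.

    Tiles are indexed left to right, row by row, starting from the top left.
    offsets holds the assumed (dx, dy) shift of each tiling.
    """
    x, y = point
    vectors = []
    for dx, dy in offsets:
        vec = np.zeros(tiles_per_dim ** 2, dtype=int)

        # Column counted from the left edge of the tiling
        col = int(np.floor((x - dx) / tile_size))
        # Row counted from the top, so the y axis is flipped
        row = tiles_per_dim - 1 - int(np.floor((y - dy) / tile_size))

        # Mark the tile only if the point lies inside this tiling
        if 0 <= col < tiles_per_dim and 0 <= row < tiles_per_dim:
            vec[row * tiles_per_dim + col] = 1
        vectors.append(vec)
    return np.concatenate(vectors)

# Hypothetical offsets: first tiling unshifted, second shifted by 0.2
offsets = [(0.0, 0.0), (0.2, 0.2)]

print(tile_code_2d((0.44, 0.44), offsets))  # [0 0 1 0 0 0 1 0]
```

With the same assumed offsets, a hypothetical point such as (0.3, 0.6) lands in the first tile of the first tiling and the third tile of the second tiling, giving [1, 0, 0, 0, 0, 0, 1, 0].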

Taking another point:

Another 2D point; Graph by author

The vectors are:

[1, 0, 0, 0]

[0, 0, 1, 0]

With the final vector being:

[1, 0, 0, 0, 0, 0, 1, 0]
