
Q-learners are typically 2-dimensional matrices of expected reward: given a state, you have a vector of actions, each with a known reward value. The course of action for that state is chosen using a non-linear evaluation such as Max or Min, depending on what your reward function represents. The traditional limitation of the Q-learner is constrained state management. Because the table is a 2-D array of rewards, the state index is a single 1-D list of indices, one per possible state, and it can quickly run out of memory and become impractical to compute.
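To make the limitation concrete, here is a minimal sketch of a textbook tabular Q-learner of the kind described above. The array sizes, names, and hyperparameters are illustrative assumptions, not part of the HyperQ implementation; note how every observable state must be squeezed into a single integer row index.

```python
# Illustrative sketch of a textbook tabular Q-learner: a 2-D array of
# Q-values indexed by (state, action). Sizes and hyperparameters here
# are hypothetical; this is not the HyperQ engine.
import numpy as np

n_states, n_actions = 10_000, 4     # every possible state needs one flat index
alpha, gamma = 0.1, 0.99            # learning rate and discount factor

Q = np.zeros((n_states, n_actions)) # the 2-D matrix of expected rewards

def choose_action(state: int) -> int:
    # Greedy selection: take the action with the maximal Q-value
    # (use argmin instead if the reward function is a cost to minimize).
    return int(np.argmax(Q[state]))

def update(state: int, action: int, reward: float, next_state: int) -> None:
    # Standard Q-learning update toward reward plus discounted best next value.
    best_next = np.max(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])
```

The `n_states` dimension is the bottleneck the paragraph above describes: the table grows with the product of the ranges of every state variable, so rich telemetry quickly exhausts both memory and lookup practicality.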
With the HyperQ you no longer have that single dimension of index lookup. The Q engine does the mapping for you, so you can give it any number of elements for your state. In applied machine learning, agents typically have telemetry vectors that describe their state. These vectors can be used directly as the state for the HyperQ learner, without reducing their fidelity through a hash or compression algorithm to perform the state mapping.
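As a rough illustration of the idea of keying Q-values by a multi-element state rather than a flat index, the sketch below stores values in a dictionary keyed by the telemetry tuple. This is only a conceptual sketch under our own assumptions; the rounding step is a stand-in for whatever state mapping the actual HyperQ engine performs internally, and none of these names come from the HyperQ API.

```python
# Conceptual sketch: Q-values keyed directly by a telemetry tuple, so the
# caller never flattens the state to a single integer index. This is an
# illustration of the concept, not the HyperQ engine or its API.
from collections import defaultdict
import numpy as np

n_actions = 4
alpha, gamma = 0.1, 0.99

def state_key(telemetry: list[float], precision: int = 2) -> tuple:
    # Hypothetical mapping step: round each telemetry element so that
    # nearby continuous readings share a hashable key.
    return tuple(round(v, precision) for v in telemetry)

# The table grows only for states that are actually visited.
Q = defaultdict(lambda: np.zeros(n_actions))

def choose_action(telemetry: list[float]) -> int:
    return int(np.argmax(Q[state_key(telemetry)]))

def update(telemetry, action, reward, next_telemetry) -> None:
    s, s_next = state_key(telemetry), state_key(next_telemetry)
    best_next = np.max(Q[s_next])
    Q[s][action] += alpha * (reward + gamma * best_next - Q[s][action])
```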
This algorithm for Q-learning does not require a neural network or expensive GPU hardware. Your CPU, plus a healthy amount of memory, is the only tool you need. That means less power during training and less power during evaluation: a CPU-based solution and trainer is significantly less power-hungry than a GPU-based one.
We've applied this learner to a variety of problems, including the original Atari LEM game and the GYM.Net Lunar Lander game. Raw telemetry (not the 2-D raster) was used to solve these problems, repeatedly and successfully. Read our white paper on the solution and its algorithm to learn more about the tool. Other games we've applied it to include Hunt The Wumpus, Artillery, Rocket2, and Mine Sweeper.