It’s called Reinforcement Learning. The AI learns through a trial-and-error process, where it receives a numerical reward every time it makes a “nice” move. The reward can also be negative, which is what the AI receives every time it makes a “bad” move. The goal of the AI is to maximize the total reward it collects. You can dig deeper by reading this link:

In this game, the AI will play completely randomly at first. When the first run ends, we have collected the data of the first version of the AI. Then the AI will play the game again, versus the previous data. If it fails to beat the previous data, it is punished. If it manages to win against the previous data, it is rewarded. This process is repeated as many times as needed, with the goal of maximizing the reward.
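
To make the loop above concrete, here is a very simplified sketch in Python. It is not the actual training code of any particular game: the toy game `play_game`, the `target` value, and the step size are all made up for illustration, and the update rule is closer to plain hill climbing than to a full reinforcement learning algorithm. It only shows the idea of starting randomly, playing against the previous data, and getting +1 or -1 as a reward.

```python
import random


def play_game(policy, target=0.7):
    """Hypothetical toy game: the closer `policy` is to `target`, the higher the score."""
    return -abs(policy - target)


def train(rounds=200, seed=0):
    rng = random.Random(seed)

    # First run: the AI behaves completely randomly.
    policy = rng.random()
    previous_score = play_game(policy)  # data collected from the first version

    total_reward = 0
    for _ in range(rounds):
        # Trial and error: try a small random change to the current behaviour.
        candidate = policy + rng.uniform(-0.1, 0.1)
        score = play_game(candidate)

        if score > previous_score:
            reward = 1  # beat the previous data: rewarded
            policy, previous_score = candidate, score
        else:
            reward = -1  # failed to beat the previous data: punished
        total_reward += reward

    return policy, previous_score, total_reward


policy, score, total_reward = train()
print(f"policy={policy:.3f} score={score:.3f} total reward={total_reward}")
```

Because a change is only kept when it beats the previous data, the stored score can never get worse from one round to the next. A real system would use the reward to update a learned model instead of a single number, but the reward and punishment loop has the same shape.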
