It’s called Reinforcement Learning. The AI is trained through a trial-and-error process: it receives a numerical reward every time it makes a “nice” move. The reward can also be negative, which is what the AI receives every time it makes a “bad” move. The goal of the AI is to maximize the total reward. You can dig deeper by reading this link:

At first, the AI plays completely randomly. When the first game ends, we have the first version of the AI. The AI then plays the game again, this time against the previous version. If it fails to beat the previous version, it is punished. If it manages to win against the previous version, it is rewarded. This process is repeated as many times as needed, with the goal of maximizing the reward.
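
To make the loop above concrete, here is a minimal sketch in Python. It is not the method used by any particular AI. The toy game (take 1 to 3 stones from a pile of 10, whoever takes the last stone wins), the preference table, the step size of 0.1, and every function name are assumptions made up for illustration.

```python
import random

PILE = 10          # assumed toy game: take 1-3 stones, taking the last stone wins
MOVES = (1, 2, 3)

def new_policy():
    # Equal preferences everywhere, so the first version plays completely randomly.
    return {s: {m: 1.0 for m in MOVES if m <= s} for s in range(1, PILE + 1)}

def choose(policy, stones, rng):
    # Sample a legal move with probability proportional to its preference.
    moves = list(policy[stones])
    weights = [policy[stones][m] for m in moves]
    return rng.choices(moves, weights=weights)[0]

def play(learner, opponent, rng, learner_first):
    """Play one game; return (learner_won, [(stones, move), ...] made by the learner)."""
    stones, history = PILE, []
    learner_turn = learner_first
    while True:
        policy = learner if learner_turn else opponent
        move = choose(policy, stones, rng)
        if learner_turn:
            history.append((stones, move))
        stones -= move
        if stones == 0:
            # Whoever just moved took the last stone and wins.
            return learner_turn, history
        learner_turn = not learner_turn

def train(generations=5, games=2000, seed=0):
    rng = random.Random(seed)
    previous = new_policy()                      # first version: random play
    for _ in range(generations):
        learner = {s: dict(p) for s, p in previous.items()}
        for g in range(games):
            won, history = play(learner, previous, rng, learner_first=(g % 2 == 0))
            reward = 1.0 if won else -1.0        # rewarded for a win, punished for a loss
            for stones, move in history:
                # Nudge the preference of every move the learner made in this game;
                # the floor keeps all sampling weights positive.
                learner[stones][move] = max(0.05, learner[stones][move] + 0.1 * reward)
        previous = learner                       # the learner becomes the next opponent
    return previous
```

Calling `train()` returns the final preference table. With 3 stones left, taking all 3 always wins and taking 2 always loses, so the preference for the first can only grow and the preference for the second can only shrink. That is the reward signal doing its job on a case small enough to check by hand.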
