It’s called Reinforcement Learning. The AI is trained through trial and error: it receives a numerical reward every time it makes a “nice” move. The reward can also be negative, which is what the AI receives every time it makes a “bad” move. The goal of the AI is to maximize the total reward it collects. You can dig deeper by reading this link: http://www.scholarpedia.org/article/Reinforcement_learning.
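To make the trial-and-error idea concrete, here is a minimal sketch. It assumes a toy setting that is not from the game in question: three hypothetical moves with fixed rewards (+1 for a “nice” move, -1 for a “bad” one). The agent uses a simple epsilon-greedy rule, picked here only for illustration, to try moves and keep a running average of the reward each one gave.

```python
import random

def train_by_trial_and_error(rewards, episodes=500, epsilon=0.1, seed=0):
    """Estimate the reward of each move by repeatedly trying moves.

    `rewards` is a hypothetical list: rewards[m] is what move m pays out.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # current guess of each move's reward
    counts = [0] * len(rewards)       # how many times each move was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: occasionally try a random move.
            move = rng.randrange(len(rewards))
        else:
            # Exploit: pick the move that currently looks best.
            move = max(range(len(rewards)), key=lambda m: estimates[m])
        reward = rewards[move]
        counts[move] += 1
        # Update the running average reward for the chosen move.
        estimates[move] += (reward - estimates[move]) / counts[move]
    return estimates

# Move 1 is the only "nice" move in this made-up example.
estimates = train_by_trial_and_error(rewards=[-1.0, 1.0, -1.0])
best_move = max(range(len(estimates)), key=lambda m: estimates[m])
```

After training, `best_move` is the move the agent has learned to prefer, because it is the one with the highest estimated reward.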
In practice, the AI plays completely at random at first. When the first game ends, we have collected the data of the first version of the AI. Then the AI plays the game again, against that previous data. If it fails to beat the previous data, it is punished. If it manages to win against the previous data, it is rewarded. This process is repeated as many times as needed, with the goal of maximizing the reward.
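The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual training code: the game is a made-up one where both players pick a number from 0 to 4 and the higher number wins, the AI is just a list of move weights, and names such as `self_play` and `pick` are hypothetical.

```python
import random

MOVES = 5  # toy game (assumption): both players pick 0..4, higher number wins

def pick(weights, rng):
    """Choose a move at random, in proportion to its weight."""
    return rng.choices(range(MOVES), weights=weights)[0]

def self_play(generations=30, games_per_generation=200, step=0.1, seed=0):
    rng = random.Random(seed)
    current = [1.0 / MOVES] * MOVES  # equal weights: plays randomly at first
    for _ in range(generations):
        previous = list(current)     # frozen copy of the previous version
        for _ in range(games_per_generation):
            mine = pick(current, rng)
            theirs = pick(previous, rng)
            if mine > theirs:
                current[mine] *= 1 + step  # win: reward the move played
            elif mine < theirs:
                current[mine] *= 1 - step  # loss: punish the move played
            # A draw leaves the weights unchanged.
            total = sum(current)
            current = [w / total for w in current]  # keep weights summing to 1
    return current

final_weights = self_play()
preferred_move = max(range(MOVES), key=lambda m: final_weights[m])
```

Each generation plays against a frozen copy of the one before it, so the opponent gets stronger as the AI does. In this toy game the weights drift toward the highest number, since that is the only move that never loses.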