The following is a digest of a 2018 AI chip technology white paper.
AI systems typically involve both training and inference, and the computing resources each requires are quite different. For artificial neural networks, training aims to minimize the error as a function of the parameters (i.e., weights) of a given network structure; it can be performed either offline or online, and with or without supervision. Inference, on the other hand, is usually done online and amounts to a straightforward evaluation of the network.
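The contrast can be sketched in code. The following is a minimal illustrative example (not taken from the white paper): a one-layer linear network where training iteratively minimizes a squared error over the weights via gradient descent, while inference is a single forward evaluation. The data, learning rate, and step count are all assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))           # 64 samples, 3 input features
true_w = np.array([1.5, -2.0, 0.5])    # assumed "ground truth" weights
y = X @ true_w                         # targets from a known linear map

w = np.zeros(3)                        # parameters (weights) to learn
lr = 0.1                               # learning rate (assumption)

# Training: repeatedly minimize E(w) = ||Xw - y||^2 / n. Each step
# requires a forward pass, a gradient computation, and a weight update,
# so it needs substantially more compute and memory traffic than
# inference, and it keeps mutable state (the weights) across steps.
for _ in range(200):
    err = X @ w - y
    grad = 2 * X.T @ err / len(X)
    w -= lr * grad

# Inference: a single forward evaluation of the trained network.
pred = X @ w
```

Even in this toy case, each training step does roughly three times the arithmetic of one inference pass and must retain intermediate values for the gradient, which is one reason training and inference place such different demands on hardware.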
While a large number of parallel functional units, high memory bandwidth, and low-latency operations are generally desirable for any AI chip, training and inference nevertheless differ notably in their computing-resource needs, owing to their distinct objectives.