Neural Network Mapping on Dragonfly
Because the neuron mapping method directly affects network timing, this section first presents a method for mapping neurons onto the nodes of the dragonfly topology and then schedules the communication between neurons. In a hardware accelerator, the number of neurons is very likely to be much greater than the number of cores, so the neurons must be grouped into categories, with each category assigned to one processor core.
In many previous works, the neurons of a layer are placed on a single node or on a cluster of nodes. In that case, nodes that hold only input-layer neurons act only as senders, while nodes that hold intermediate-layer neurons act as senders in one phase (to the next layer) and as receivers in another phase (from the previous layer). As a result, the network traffic load is not balanced across the nodes. To address this problem, the proposed architecture places neurons from different layers in the same category, so that each category contains neurons from every layer of the neural network. When each category is mapped onto a processor core, that core must at any moment send the data generated by its own neurons while receiving data from the other categories (that is, from the other processor cores), so it acts as sender and receiver at the same time. In this mapping model, all nodes are identical and perform the same work.
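The categorization idea can be illustrated with a minimal Python sketch. It assumes a simple round-robin rule (neuron `index` of every layer goes to category `index % num_categories`); this rule, the function name `categorize`, and the layer sizes are illustrative assumptions, not the exact assignment used in the proposed architecture.

```python
from collections import Counter


def categorize(layer_sizes, num_categories):
    """Group neurons so that every category draws from every layer.

    Hypothetical round-robin rule: neuron `index` of each layer is
    placed in category `index % num_categories`.
    """
    categories = [[] for _ in range(num_categories)]
    for layer, size in enumerate(layer_sizes):
        for index in range(size):
            categories[index % num_categories].append((layer, index))
    return categories


# Illustrative sizes only: smaller input/output layers, larger hidden layers.
layer_sizes = [8, 16, 16, 4]
num_categories = 4  # assumed: one category per processor core

categories = categorize(layer_sizes, num_categories)
for cat_id, members in enumerate(categories):
    per_layer = Counter(layer for layer, _ in members)
    print(f"category {cat_id}: neurons per layer = {dict(sorted(per_layer.items()))}")
```

With these sizes, every category holds neurons from all four layers, so each core both produces and consumes data in every phase.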
Figure 5 shows the categorization technique applied to a multi-layer network in which each layer contains 16 neurons; the categories are mapped in regular order onto the processing elements of an on-chip network with a mesh topology. The mapping model does not require all layers to have the same number of neurons: if the input and output layers have fewer neurons than the intermediate layers (which is usually the case), each category can hold a different number of neurons from each layer. If necessary, several categories can also be placed on a single network node. With this mapping, all processing cores carry the same load and the generated traffic is distributed evenly across the network. Communication between neurons in this model follows a special traffic pattern in which each node sends its data to all other nodes and receives data from all other nodes.
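The difference in traffic balance between the two mappings can be made concrete with a small counting sketch in Python. It assumes a fully connected feed-forward network with four layers of 16 neurons on four nodes, and counts one message per source neuron per distinct destination node; the layer count, node count, round-robin rule, and all function names are assumptions made for illustration and are not taken from the proposed architecture.

```python
def node_traffic(layer_sizes, node_of, num_nodes):
    """Count inter-node messages sent and received by each node.

    Assumes every neuron feeds all neurons of the next layer and that
    one message per source neuron reaches each distinct remote node.
    """
    sent = [0] * num_nodes
    received = [0] * num_nodes
    for layer in range(len(layer_sizes) - 1):
        # Nodes that host at least one neuron of the next layer.
        dst_nodes = {node_of(layer + 1, j) for j in range(layer_sizes[layer + 1])}
        for i in range(layer_sizes[layer]):
            src = node_of(layer, i)
            for dst in dst_nodes:
                if dst != src:  # on-node delivery creates no network traffic
                    sent[src] += 1
                    received[dst] += 1
    return sent, received


LAYER_SIZES = [16, 16, 16, 16]  # 16 neurons per layer; four layers is an assumption
NUM_NODES = 4                   # assumed number of processor cores


def layer_wise(layer, index):
    """Layer-per-node mapping: all neurons of a layer share one node."""
    return layer


def categorized(layer, index):
    """Category mapping (hypothetical round-robin rule across nodes)."""
    return index % NUM_NODES


layerwise_sent, layerwise_recv = node_traffic(LAYER_SIZES, layer_wise, NUM_NODES)
cat_sent, cat_recv = node_traffic(LAYER_SIZES, categorized, NUM_NODES)

print("layer-wise  sent:", layerwise_sent, "received:", layerwise_recv)
print("categorized sent:", cat_sent, "received:", cat_recv)
```

Under these assumptions, the layer-wise mapping leaves the first node as a pure sender and the last node as a pure receiver, whereas the category mapping gives every node identical send and receive counts. The sketch counts messages only and ignores timing, so it illustrates the balance of the traffic rather than its total volume or latency.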