When studying the performance of any high-performance system, the processor core forms the basic "unit" of analysis. There are several reasons why:
1. It is more or less an atomic hardware unit. In other words, with the exception of specialized designs such as hyperthreading and GPGPUs, when one of an application's tasks is running on a processor core, another task cannot run on that core at the same time.
2. It is usually a convenient place in the memory hierarchy where performance logs or data may be stored as the application executes.
3. It is convenient to label and identify from the perspective of the operating system's capabilities.
However, with more complex runtime middleware systems (such as Charm++ and even some MPI implementations) or applications, it is often useful to understand the performance behavior of logical software units: objects in the case of Charm++, or threads in the case of MPI. (The latter applies when running multiple threads per processor, of course; in most MPI code, a thread tends to be synonymous with a processor.)
Thus, the challenge is to provide a way to capture, display and reason about performance information that is, at the same time, processor-oriented (given the above reasons) and suitable for understanding complex interactions/interplay between entities at a higher level of abstraction.
Take, for example, a timeline. If the performance tool displays a timeline of events based on what is observed by a processor core, one can indirectly infer how the various software entities share the resource. Similarly, if one displays a timeline of events based on a Charm++ object's perspective, then one needs to piece together a processor-based context in order to understand that the time spent by an object doing nothing may simply be because another object was using processor resources. Whether the latter is a good or a bad thing will depend on deeper analysis, some of which I intend to cover in another post.
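To make the timeline example concrete, here is a minimal sketch of the two views. The trace format is entirely hypothetical (the `Event` record and the functions `by_core` and `explain_gaps` are names made up for illustration, not the format or API of any real performance tool), and it assumes each object stays on a single core for the whole run. The first function groups events the way a processor-oriented timeline would; the second takes an object's perspective and, for each gap in which the object did nothing, looks up which other objects were occupying the same core.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """One execution interval in a hypothetical trace (illustrative only)."""
    core: int     # processor core the work ran on
    obj: str      # logical software unit, e.g. a Charm++ object
    start: float  # start time, arbitrary units
    end: float    # end time, arbitrary units


def by_core(events):
    """Processor-oriented view: events grouped per core, ordered by start time."""
    timeline = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.start):
        timeline[ev.core].append(ev)
    return dict(timeline)


def explain_gaps(events, obj):
    """Object-oriented view: for each gap between consecutive executions of
    `obj`, list the other objects that ran on the same core during the gap.

    Assumes `obj` does not migrate between cores.
    """
    ordered = sorted(events, key=lambda e: e.start)
    mine = [e for e in ordered if e.obj == obj]
    report = []
    for prev, nxt in zip(mine, mine[1:]):
        gap_start, gap_end = prev.end, nxt.start
        if gap_end <= gap_start:
            continue  # back-to-back executions, no idle time to explain
        others = [
            e.obj
            for e in ordered
            if e.core == prev.core
            and e.obj != obj
            and e.start < gap_end   # interval overlaps the gap
            and e.end > gap_start
        ]
        report.append((gap_start, gap_end, others))
    return report


# Made-up sample trace: objects A and B share core 0, C runs alone on core 1.
events = [
    Event(0, "A", 0.0, 2.0),
    Event(0, "B", 2.0, 5.0),
    Event(0, "A", 5.0, 6.0),
    Event(1, "C", 0.0, 3.0),
]

for gap_start, gap_end, others in explain_gaps(events, "A"):
    print(f"A idle from {gap_start} to {gap_end}; core was used by {others}")
```

On this sample trace, object A's gap between its two executions is accounted for by object B running on the same core. A gap with an empty list would instead mean the core itself was idle, which is a different situation and calls for a different kind of analysis.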