Performance at the Lower Levels
I've written about performance before, so it's no surprise that when I heard about the analysis by UC Berkeley's Jasjeet S. Sekhon, I thought I'd better take a closer look. His basic conclusion is that for "statistical computing", Mac OS X is too slow to be considered. Sure enough, the ever-anonymous ridiculous fish (blogger and presumed Apple employee) took some time to get to the bottom of the "Mac OS X is slow" meme. His summary response is classic:
To sum up the particulars of this test:

- Linux, Windows, and Mac OS X service small allocations from the application heap and large ones from the kernel’s VM system in recognition of the speed/fragmentation tradeoff.
- Mac OS X’s default malloc switches from the first to the second at an earlier point (smaller allocation size) than do the allocators used on Windows and Linux.
- Sekhon’s test boils down to a microbenchmark of malloc()ing and then immediately free()ing 35 KB chunks.
- 35 KB is after Mac OS X switches, but before Linux and Windows switch. Thus, Mac OS X will ask the kernel for the memory, while Linux and Windows will not; it is reasonable that OS X could be slower in this circumstance.
- If you use the same allocator on Mac OS X that R uses on Windows, the performance differences all but disappear.
- Most applications are careful to avoid unnecessary large allocations, and will enjoy decreased memory usage and better locality with an allocator that relies more heavily on the VM system (such as on Mac OS X).

In that sense, this is a poor benchmark. Sekhon’s code could be improved on every platform by allocating only what it needs.

Writing this entry felt like arguing on IRC; please don’t make me do it again. In that spirit, the following are ideas that I want potential authors of “shootoffs” to keep in mind:

- Apple provides some truly excellent tools for analyzing the performance of your application. Since they’re free, there’s no excuse for not using them. You should be able to point very clearly at which operations are slower, and give a convincing explanation of why.
- Apple has made decisions that adversely impact OS X’s performance, but there are reasons for those decisions. Sometimes the tradeoff is to improve performance elsewhere, sometimes it’s to enable a feature, sometimes it’s for reliability, sometimes it’s a tragic nod to compatibility. And yes, sometimes it’s bugs, and sometimes Apple just hasn’t gotten around to optimizing that area yet. Any exhibition of benchmark results should give a discussion of the tradeoffs made to achieve (or cause) that performance.
- If you do provide benchmark results, try to do so without using the phrase "reality distortion field."

I build some of the tools we use to measure performance of Mac Office. When MacBU was formed, my first job was building Excel performance automation, so this is a subject near and dear to my heart. It's easy to take for granted how difficult it is to build a responsive piece of software, and yet a responsive application experience is key to any great Mac application. If there's anything I've learned about performance it's this: perception is reality. What you choose to measure had better be exactly what your users use when they perceive the performance of your application. When you do dig down and try to solve the performance problem, you will often find lots of performance tradeoffs, but the key is to fix the ones that affect what the user perceives. Trying to use low-level performance measurements to indicate what the user will experience simply does not tell the whole story; in fact, it can often hide the most important story.
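
For readers who want a concrete picture of the microbenchmark ridiculous fish describes, here is a minimal sketch in C of a loop that malloc()s a chunk and immediately free()s it. This is not Sekhon's code or anything taken from R; the function names time_alloc_free and report_alloc_timings, the iteration count, and the 64-byte comparison size are all made up for illustration. Only the 35 KB figure comes from the quoted summary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: run `iterations` rounds of malloc(chunk) followed
 * immediately by free(). Returns elapsed CPU seconds as reported by
 * clock(), or -1.0 if an allocation fails. */
double time_alloc_free(size_t chunk, long iterations)
{
    assert(chunk > 0 && iterations > 0);

    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        volatile unsigned char *p = malloc(chunk);
        if (p == NULL)
            return -1.0;
        p[0] = 1;            /* touch the block so the pair is not optimized away */
        free((void *)p);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Hypothetical driver: compare an arbitrary small chunk with the 35 KB
 * chunk size mentioned in the quoted summary. */
void report_alloc_timings(void)
{
    const size_t sizes[] = { 64, 35 * 1024 };
    const long iterations = 1000000;   /* arbitrary count chosen for illustration */

    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        double secs = time_alloc_free(sizes[i], iterations);
        printf("%zu bytes x %ld: %.3f s\n", sizes[i], iterations, secs);
    }
}
```

Calling report_alloc_timings() from main on each platform would give a rough feel for how differently the default allocators treat the two sizes. The numbers depend on the allocator, the OS version, and the compiler settings, and clock() measures CPU time rather than wall-clock time, so treat any result as a starting point for profiling rather than a verdict.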