
I've been running some tests lately on various stacking methods, and lauriek's recent posting prompted me to post a few results.
What's shown here is the same stack processed through CombineZM and Helicon Focus, with options set to isolate the effects of the basic stacking algorithms. For HF, that means turning off auto-registration and brightness adjustment; for CZM, it means editing the standard macros to leave in the essence of the stacking algorithms, while deleting registration, brightness adjustment, and post-sharpening.
This subject is my spider pedipalp. It's a moderately difficult subject because it has an assortment of overlapping bristles and hairs combined with smooth structures in an interesting geometry.
This stack is also complicated by a systematic and gradual change in background brightness between the back and front of the stack. I'm not sure exactly what caused that shift, since I didn't spot it in time to debug it, but best guess is that it's due to a specular reflection off the shiny blue card, interacting with slight movement of the lens during focusing. At any rate, the shift is there, it's not unusual, and I'm interested to see how the stacking software deals with it.
What's shown here is consistent with my previous experiences, but it's far from the whole story -- that's why the post is titled "A single comparison". These algorithms give significantly different results on different kinds of subjects and different lighting conditions. I don't know how to tell a story that is simultaneously simple, accurate, and comprehensive. You'll have to run your own tests with your own stacks.
At any rate, I see two key points in these images.
The first key point is that two of the methods (CombineZM "Do Stack" and Helicon Focus "Method B") give obviously inferior results where strong bristles overlap. I know how "Do Stack" works, and I'm not surprised by the result shown here. CombineZM's "Do Stack" attempts to fit a continuous depth surface to the observed in-focus points. Where bristles overlap, the depth surface really is not continuous, and forcing it to be continuous can easily result in missing some lower-contrast information. I do not know how HF's Method B works, but from watching its intermediate displays and looking at the final results, I'd guess that it must be similar. I would have no hesitation in recommending that you avoid these methods for this kind of subject. (However, other people report other results even in apparently similar cases.)
The second key point has to do with halo. It's obvious even at this scale (and painfully obvious in an actual-pixels view) that halo is a problem with this stack. I am intrigued to notice that CZM's output is a lot cleaner than HF's in this case. But I'm not sure how general this result is (different stacks might give different results), nor have I tested to see the effects of HF's radius and smoothing parameters.
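To make the radius point concrete, here's a toy 1-D illustration. This is not HF's actual detail measure (which I don't know); it just measures "contrast" as the spread of values in a sliding window around each sample. A single sharp edge registers as contrast over a band whose width grows with the window radius, so samples well away from the edge can get claimed by the slice containing the edge -- which is one generic way that halo gets made.

```python
import numpy as np

def local_contrast(signal, radius):
    # Contrast at each sample = max minus min over a window of the given radius.
    return np.array([np.ptp(signal[max(0, i - radius):i + radius + 1])
                     for i in range(len(signal))])

edge = np.concatenate([np.zeros(20), np.ones(20)])  # one sharp in-focus edge
narrow = local_contrast(edge, 1)
wide = local_contrast(edge, 5)

# The band of samples registering "detail" widens with the radius:
print((narrow > 0).sum(), (wide > 0).sum())
```

With radius 1, only the two samples straddling the edge register contrast; with radius 5, a band five times wider does. Whether this is what HF's radius parameter actually controls, I can't say -- but it's the kind of effect I'd test for.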
If anybody's interested, I can post more info or closer views from these tests.
But probably the most important point is just that there are differences in these tools and methods, and you really have to run the tests to find out about them. It is very easy -- altogether too easy -- to look at the output from one tool/method and not even notice a defect that would be painfully apparent once you compared it to another. If you're used to being bothered by halo, then an image without halo may be so refreshing that you don't even notice that some of the bristles are missing. (Been there, done that, got embarrassed by it.)

By the way, there's another interesting tidbit of information in these images. See the dust spot near the left margin, about 20% of the way down? It's there in the top two images, but not the bottom two. Why? I had to go back and scan through the original stack to answer that question. It's simple. The dust spot isn't there for the first half of the stack, then it suddenly appears, and stays unchanged for the second half. Apparently the top two algorithms pick up on the spot as local detail that should get preserved, while the bottom two algorithms happen to overlook it and run their depth surfaces through images that don't have the spot. When I first saw these results, I figured I must have messed up the tests and somehow run different stacks through the software. Nope -- they're all the same!
Hope this is helpful.

--Rik