But I'm not aware that anybody has ever carefully evaluated how sensitive the measurements are. Or as ray_parkhurst recently put it, "what minimum % scale change causes Zerene to adjust scaling between frames?"
The answer to that specific question is that there's no hard threshold, but there must be some amount of noise in the measurement that limits its ability to accurately determine small changes in scale.
The question is, how much noise is that?
The answer, jumping ahead to the bottom line, is that with a well-chosen target and a 15 megapixel image, Zerene Stacker can accurately measure scale changes smaller than 10 parts per million.
In fact the measurement is so sensitive that if you're working in that range, you also need to worry about issues like changes in image scale due to thermal expansion of the sensor as it warms up while shooting.
Here is the test setup that I used to explore most of these issues. It consists of a Canon EF 100mm f/2.8L Macro IS USM lens, set on f/8 and 1:1, focused on a target that consists of a fine pattern of laser copier toner dots, with a Canon T1i camera (APS-C sensor, 4752 x 3168 pixels = 15.1 MP).
Hardware:

Test pattern, whole frame:

Test pattern, center, at 800%:

Note that these optics are not even close to telecentric. At f/8 and 1:1, the nominal DOF of this lens is 0.560 mm, so the scale change per DOF will be about 1:1.0019 -- almost 9 pixels in the width of a frame.
However, by moving the camera+lens only a tiny distance per frame, we can capture a large number of frames that look almost identical and vary by only a tiny amount in scale.
The step that I chose was 0.002 mm per frame, leading to a scale change per frame that is only about 6.7 parts per million (=0.002/300). In terms of pixels, this is about 1/30 of a pixel in total image width -- about 1/60 of a pixel on either side.
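As a rough check on the arithmetic above, here is a minimal Python sketch that reproduces the per-DOF and per-step numbers. The constant and function names are made up for illustration, and the 300 mm figure is taken directly from the "0.002/300" in the text; what that distance physically corresponds to is an assumption here.

```python
# Numbers quoted in the post.
SENSOR_WIDTH_PX = 4752   # Canon T1i frame width in pixels
DOF_MM = 0.560           # nominal DOF at f/8 and 1:1
STEP_MM = 0.002          # focus step per frame

# Assumption: 300 mm is the effective distance implied by "0.002/300".
EFFECTIVE_DISTANCE_MM = 300.0


def scale_change(delta_mm, distance_mm=EFFECTIVE_DISTANCE_MM):
    """Fractional change in image scale for a small move along the axis."""
    return delta_mm / distance_mm


def width_change_px(fraction, width_px=SENSOR_WIDTH_PX):
    """Change in total image width, in pixels, for a fractional scale change."""
    return fraction * width_px


dof = scale_change(DOF_MM)
step = scale_change(STEP_MM)

print(f"per DOF:  1:{1 + dof:.4f}, {width_change_px(dof):.2f} px across the frame")
print(f"per step: {step * 1e6:.2f} ppm, {width_change_px(step):.4f} px across the frame, "
      f"{width_change_px(step) / 2:.4f} px per side")
```

This gives a little under 9 pixels per DOF and roughly 0.03 pixel per step, consistent with the figures quoted above.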
It's not immediately obvious (to put it mildly) that the alignment process should be able to determine such small changes in scale with much accuracy at all.
But here are the results:

I should clarify that these scale numbers were obtained by making a copy of the center frame, putting it first in the Zerene Stacker project, then loading all the frames and doing an Align All Frames with the option set to "Align each frame against first frame separately". (Shift X and Shift Y were enabled, Rotation was not.)
So, the numbers result from comparing a single reference image against images that are progressively larger and smaller, up to a maximum of about ±1.4 pixels in total image width.
I chose that range because I was concerned that the interpolation process might be accurate for very small differences but behave strangely around 1/2 pixel, or vice versa.
Fortunately, none of those problems appeared in the data. Instead, all of the actual measurements line up nicely along the expected straight line, with an RMS error of less than 2.7 parts per million.
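For anyone who wants to run the same kind of check on their own numbers, here is a minimal sketch of fitting a straight line to scale versus frame number and computing the RMS error about that line. The data below is SYNTHETIC, not the measurements from this post: the noise level, the random seed, and the range of ±44 frames (chosen only so the total range roughly matches ±1.4 pixels) are all assumptions, and the function names are made up for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rms_residual(xs, ys, slope, intercept):
    """RMS vertical distance of the points from the fitted line."""
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


# SYNTHETIC data standing in for measured scale values (in ppm).
random.seed(0)
TRUE_SLOPE_PPM = 0.002 / 300 * 1e6   # expected ppm per frame, from the text
NOISE_PPM = 2.5                      # assumed measurement noise
frames = list(range(-44, 45))        # frame offsets from the reference frame
scale_ppm = [TRUE_SLOPE_PPM * k + random.gauss(0.0, NOISE_PPM) for k in frames]

slope, intercept = fit_line(frames, scale_ppm)
rms = rms_residual(frames, scale_ppm, slope, intercept)
print(f"fitted slope: {slope:.3f} ppm/frame, RMS error: {rms:.2f} ppm")
```

With real measurements, a scale value s expressed as a ratio near 1.0 can be converted to ppm as (s - 1) * 1e6 before fitting.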
OK, so we now have good evidence that with this sort of test pattern and quite similar images (because they're all close to perfect focus), we can measure scale changes in the range of 10 ppm or less.
What happens when we measure a telecentric lens, using images that are not so well focused?
To evaluate, I added an Olympus 135 mm bellows lens in front of the Canon macro, and carefully adjusted the ring on the Canon lens so that the combo was accurately telecentric.
Here's the setup:

Then using this setup, I stepped focus by a much larger amount, so as to capture a stack that started significantly out of focus, ended significantly out of focus, and passed through focus in the middle. Again I copied a central frame, put that first, and aligned all the other frames against that one to avoid accumulating error.
Here are the results:

I confess, it's not at all clear to me why this curve has the U-shape that it does. But I've seen the pattern before: when doing this sort of measurement with telecentric optics, the frames that are focused in front of and behind perfect focus will evaluate as slightly larger or smaller than perfect focus, while being the same size as each other. I can only guess this is another case of the alignment method being misled by how objects change appearance as they go in and out of focus.
In any case, this curve illustrates why I recommend comparing frames at equal distances in front of and behind perfect focus.
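To illustrate why the symmetric comparison helps, here is a small sketch using a SYNTHETIC curve: a U-shaped bias that is symmetric about best focus, plus a small true slope. The curvature, slope, frame count, and names are hypothetical values chosen for illustration, and the sketch assumes the bias really is symmetric, as described above.

```python
def symmetric_scale_change(scales_ppm, center, offset):
    """Scale difference between the frames `offset` steps on either side of `center`."""
    return scales_ppm[center + offset] - scales_ppm[center - offset]


# SYNTHETIC curve: symmetric U-shaped bias plus a small true scale slope.
CENTER = 20          # index of the best-focus frame (hypothetical)
U_CURVATURE = 0.5    # ppm per frame^2, hypothetical defocus bias
TRUE_SLOPE = 0.2     # ppm per frame, the real (non-telecentric) scale change
scales_ppm = [U_CURVATURE * (i - CENTER) ** 2 + TRUE_SLOPE * (i - CENTER)
              for i in range(41)]

offset = 10
# Comparing best focus against one defocused frame picks up the U-shaped bias.
one_sided = scales_ppm[CENTER + offset] - scales_ppm[CENTER]
# Comparing equal distances on both sides cancels the symmetric bias.
two_sided = symmetric_scale_change(scales_ppm, CENTER, offset)

print(f"one-sided estimate: {one_sided / offset:.2f} ppm/frame")
print(f"two-sided estimate: {two_sided / (2 * offset):.2f} ppm/frame")
```

In this made-up example the one-sided comparison is dominated by the bias, while the two-sided comparison recovers the underlying slope.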
It turns out that these optics are in fact not telecentric across the entire sensor. Due to vignetting, the edges and corners of the field are distinctly not telecentric. In fact the extreme corners are off by something like 12 pixels between the front and rear frames. But the center 2000x2000 pixels of the image are almost perfectly telecentric, and cropping to that region is what produces the results shown here -- less than 8 parts per million change of scale within the DOF slab.
Now, I mentioned in passing that when working in this range, you need to worry about other issues.
For example, at one point I noticed that I got increasing scale when I stepped upward, and I also got increasing scale when I stepped downward. That was very strange, and I puzzled for a long time over what might be causing it. I considered that the lens elements might be slowly changing position, or the lens barrel might be slowly changing length, only by tiny amounts of course. So I started running sequences in which there was nominally no movement along the optical axis -- essentially a focus step of zero.
That produced a couple of very interesting results:


What's going on? Well, I think Live View makes the sensor get hot, which makes the sensor get larger, which makes the image get smaller (measured in pixels), which means that Zerene Stacker reports it has to be enlarged to match. I read that the thermal expansion coefficient of silicon is about 2.6 parts per million per degree C, so the observed total change of 60 parts per million could be nicely explained by a sensor gradually heating up by roughly 23 degrees C (60/2.6). My best guess is that the sensor was heating up when I shot the Live View stack (at least until the "sensor hot" indicator came on), then gradually cooling down while I shot the non-LiveView stack. Or something like that.
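The thermal estimate is simple enough to check in a few lines. This sketch assumes uniform, linear expansion of the whole sensor and uses only the numbers quoted above; the variable names are made up for illustration.

```python
SILICON_CTE_PPM_PER_C = 2.6   # thermal expansion of silicon, as quoted above
OBSERVED_DRIFT_PPM = 60.0     # total scale drift seen in the zero-step stacks
SENSOR_WIDTH_PX = 4752        # Canon T1i frame width in pixels

# Assumption: the sensor expands uniformly and linearly with temperature.
temp_rise_c = OBSERVED_DRIFT_PPM / SILICON_CTE_PPM_PER_C

# The same drift expressed as a change in total image width.
drift_px = OBSERVED_DRIFT_PPM * 1e-6 * SENSOR_WIDTH_PX

print(f"implied temperature rise: {temp_rise_c:.1f} C")
print(f"drift across the frame:   {drift_px:.3f} px")
```

So the whole effect amounts to well under a third of a pixel across the full frame width, which is why it only shows up when measuring in the parts-per-million range.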
More investigation would be required to completely nail down what's going on, and I probably won't bother. But anyway, you've been warned -- measurements in the parts-per-million range can be witchy.
By the way, 10 parts per million corresponds to focusing on a target 10 meters away, then moving the camera by 0.1 mm. It's a small amount!
I hope this is helpful.
--Rik