With just one sample, what is the average gap to plug into the calculation?
--Rik
Quote: With just one sample, what is the average gap to plug into the calculation?
Bernoulli processes are time-symmetric. The expected gap between the beginning of observation and the lowest serial number is the same as the expected gap between the highest sampled serial number and the true highest serial number. If there is only one sample, this is still true.
Lou Jost wrote: Bernoulli processes are time-symmetric.
I agree, but there are some devils in the details.
Quote: in my world the process that you're describing would not be called "Bernoulli". The process described by https://en.wikipedia.org/wiki/Bernoulli_process denotes a sequence of observations that are sampled with replacement from a finite distribution. In that process the sequence of values of the observations are of interest, but not the times at which they occur.
In my application, the times ARE the values; these are the ages of the strata in which fossils have been recovered. The gaps are the time differences between successive recovered-fossil ages. The serial numbers in the PN105 example correspond to the stratum ages in my biological example.
Quote: So, it seems to me that you're either assuming a min, or not, depending on how many samples you have.
Yes, good observation. You are right that when the method is applied to serial numbers and there is only a single sample, you do need to know the first serial number, and you can do without it if you have two or more numbers. I had not dealt with that issue in my comment. Sorry. In my biological version of the problem, the values are known to start at t=0, the present time. There, things run backwards in time compared to the serial number problem, because the ages are in units of "millions of years ago". So in the biological application, there is no difference in method between the single-sample case and the multiple-sample case.
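As a rough sketch of the distinction discussed here (known start versus unknown minimum), the gap calculation might look like the following. The function name `estimate_upper_bound` and the `start` parameter are illustrative assumptions, not anything from the thread, and the code treats the values as continuous, so it ignores the off-by-one adjustment that discrete serial numbers would call for.

```python
def estimate_upper_bound(samples, start=None):
    """Estimate the true maximum as (largest sample) + (average gap).

    samples: observed values (serial numbers, or stratum ages).
    start:   known origin of the sequence, if any (hypothetical parameter;
             e.g. 0 for ages measured back from the present).
    """
    values = sorted(samples)
    k = len(values)
    if k == 0:
        raise ValueError("need at least one sample")
    highest = values[-1]

    if start is not None:
        # Known origin: the k samples split [start, highest] into k gaps.
        average_gap = (highest - start) / k
    elif k >= 2:
        # Unknown origin: only the k - 1 gaps between samples are available.
        average_gap = (highest - values[0]) / (k - 1)
    else:
        # One sample and no known origin: there is no gap to average.
        raise ValueError("a single sample needs a known start")

    return highest + average_gap


if __name__ == "__main__":
    print(estimate_upper_bound([10], start=0))
    print(estimate_upper_bound([10, 20, 30]))
```

With a known `start`, the single-sample and multiple-sample cases go through the same branch, which matches the fossil-age situation described above; without it, a lone sample gives no gap to work with.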
Ah, I see how you're thinking about this. Yes, I agree.
Quote: The serial number problem contains a Bernoulli process because every serial number from the earliest to the last one has the same probability of detection.
This is a quibble, but I disagree on this snippet. It's a curse from doing too many proofs in college. The serial number problem involves sampling without replacement, while Bernoulli processes are equivalent to sampling with replacement. Every serial number has the same probability of detection on the first draw, but not after that, and the probability of getting any particular available number gets larger as the process goes along. So, as far as I can see, the serial number problem does not contain a Bernoulli process, although the two are quite similar if there are lots of serial numbers compared to the number of draws. Your problem, on the other hand, does involve a Bernoulli process as you make the connection.
Quote: The serial number problem involves sampling without replacement, while Bernoulli processes are equivalent to sampling with replacement.
For me the serial number problem is exactly analogous to the fossil-finding problem. I think of a series of slots (strata, or blank cells on a page), each with an equal probability of returning a "hit", with p = the probability of finding a fossil in a given stratum, or the probability of finding the lens whose serial number matches the line number of a given cell. I think these are the same processes.
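One way to compare the two pictures is a small simulation. The sketch below, with made-up parameters (50 slots, p = 0.1, 5 hits), runs the "independent slots" model and keeps only the runs that produced exactly k hits, then compares the average highest hit with that from drawing k numbers without replacement. All function names are hypothetical and the setup is only an illustration, not a proof.

```python
import random


def bernoulli_hits(n_slots, p, rng):
    """Slots 1..n_slots each return a hit independently with probability p."""
    return [slot for slot in range(1, n_slots + 1) if rng.random() < p]


def mean_max_bernoulli_given_k(n_slots, p, k, trials, rng):
    """Average highest hit over Bernoulli runs that gave exactly k hits."""
    maxima = []
    while len(maxima) < trials:
        hits = bernoulli_hits(n_slots, p, rng)
        if len(hits) == k:  # discard runs with any other number of hits
            maxima.append(hits[-1])
    return sum(maxima) / len(maxima)


def mean_max_without_replacement(n_slots, k, trials, rng):
    """Average highest value when k distinct numbers are drawn from 1..n_slots."""
    total = 0
    for _ in range(trials):
        total += max(rng.sample(range(1, n_slots + 1), k))
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    print(mean_max_bernoulli_given_k(50, 0.1, 5, 5000, rng))
    print(mean_max_without_replacement(50, 5, 5000, rng))
```

Both averages should land close to k(N+1)/(k+1), which is 42.5 for these parameters: once the number of hits is fixed, the positions of the hits in the slot model are a uniformly random set of k slots, the same as a draw without replacement. That may be why the two descriptions lead to the same gap calculation.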
Lou Jost wrote: ↑Wed Jun 30, 2021 6:25 pm
Anyway I never thought I'd be writing about this problem on this forum....but when Ray asked his interesting question, I couldn't resist mentioning that there was a way to answer it IF we had a list of serial numbers from randomly sampled lenses. Ray correctly observed that in this case, we don't have that. But maybe in other cases we will have a random sample, and this method could be used. I have to confess that I like trying to solve these kinds of questions.
The discussion has been very interesting, and the analysis is potentially useful for a wide range of problems beyond the current one, so I'm grateful this came out in the discussion. Ultimately the result is not unexpected, even with the non-random dataset.
ray_parkhurst wrote: ↑Mon Jun 28, 2021 3:22 pm
153
168
177
221
241
247
263
284
285
319
328
355
358
365
372
379
402
411
414
419
420
425
429
438
444
463
467
668
1281 (eBay)
1520
1521 (David)
Is it possible that the ‘1’ in the final three serial numbers refers to a variant of some kind, and the lenses are actually within the XXX range?
The thread morphed from "highest number" to "all known numbers and resulting distribution", so having intermediate numbers is good. Yours also fills a "mini-gap" between 467 and 668.
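For anyone who wants to look at the distribution of the known numbers, here is a minimal sketch that lists the gaps between successive serial numbers from the list above and picks out the ones that are large compared with the typical gap. The cutoff of five times the median gap is an arbitrary assumption chosen for illustration, and since the list is not a random sample, the result is descriptive only.

```python
import statistics

# Serial numbers as listed in the thread.
SERIALS = [
    153, 168, 177, 221, 241, 247, 263, 284, 285, 319, 328, 355, 358, 365,
    372, 379, 402, 411, 414, 419, 420, 425, 429, 438, 444, 463, 467, 668,
    1281, 1520, 1521,
]


def successive_gaps(values):
    """Return (lower, upper, gap) for each pair of neighbouring values."""
    ordered = sorted(values)
    return [(lo, hi, hi - lo) for lo, hi in zip(ordered, ordered[1:])]


def unusually_large_gaps(values, factor=5):
    """Pairs whose gap exceeds `factor` times the median gap.

    `factor` is a hypothetical threshold, not something from the thread.
    """
    gaps = successive_gaps(values)
    typical = statistics.median(gap for _, _, gap in gaps)
    return [(lo, hi) for lo, hi, gap in gaps if gap > factor * typical]


if __name__ == "__main__":
    for lo, hi, gap in successive_gaps(SERIALS):
        print(lo, hi, gap)
    print(unusually_large_gaps(SERIALS))
```

With this cutoff the flagged pairs are 467 to 668, 668 to 1281, and 1281 to 1520, which are the stretches where additional intermediate numbers would be most informative.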