Highest Serial Number 105PN?

This area is for the discussion of what's new, what's on your mind, and general photographic topics. A place to meet, make comments on this site, and get the latest community news.

Moderators: rjlittlefield, ChrisR, Chris S., Pau

rjlittlefield
Site Admin
Posts: 22519
Joined: Tue Aug 01, 2006 8:34 am
Location: Richland, Washington State, USA

Re: Highest Serial Number 105PN?

Post by rjlittlefield »

Lou Jost wrote:
Wed Jun 30, 2021 6:50 am
Maybe that would be true. But the estimator still works (i.e., is unbiased) with just one sample, under the stated assumptions.
With just one sample, what is the average gap to plug into the calculation?

--Rik

Lou Jost
Posts: 5401
Joined: Fri Sep 04, 2015 7:03 am
Location: Ecuador

Re: Highest Serial Number 105PN?

Post by Lou Jost »

With just one sample, what is the average gap to plug into the calculation?
Bernoulli processes are time-symmetric. The expected gap between the beginning of observation and the lowest sampled serial number is the same as the expected gap between the highest sampled serial number and the true highest serial number. If there is only one sample, this is still true.
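A minimal Python sketch of the one-sample case, assuming the first serial number is known and every serial from that first value up to the (unknown) true maximum is equally likely to be the one observed. With a single observation x and known first serial a, "max + gap" becomes x + (x - a). The function names and the example values below are hypothetical, chosen only for illustration; the check averages the estimate over every possible observation using exact fractions.

```python
from fractions import Fraction


def single_sample_estimate(x: int, first_serial: int) -> int:
    """Max + gap with one sample: the gap is measured from the known first serial."""
    return x + (x - first_serial)


def mean_estimate(first_serial: int, true_max: int) -> Fraction:
    """Average the estimate over every equally likely single observation."""
    serials = range(first_serial, true_max + 1)
    total = sum(single_sample_estimate(x, first_serial) for x in serials)
    return Fraction(total, len(serials))


# Hypothetical values: first serial 500001, true maximum 501600.
print(mean_estimate(500001, 501600))
```

If the average comes out equal to the assumed true maximum, the one-sample estimate is unbiased under these assumptions; it says nothing about how far off a single estimate can be.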

rjlittlefield
Site Admin
Posts: 22519
Joined: Tue Aug 01, 2006 8:34 am
Location: Richland, Washington State, USA

Re: Highest Serial Number 105PN?

Post by rjlittlefield »

Lou Jost wrote:
Bernoulli processes are time-symmetric.
I agree, but there are some devils in the details.

First, in my world the process that you're describing would not be called "Bernoulli". The process described by https://en.wikipedia.org/wiki/Bernoulli_process denotes a sequence of observations that are sampled with replacement from a finite distribution. In that process the sequence of values of the observations is of interest, but not the times at which they occur.

Instead, your process looks to me like random sampling from a distribution that is continuous and uniform on [a,b], and you want to estimate the value of b given the samples. This problem is not quite the same as the serial number problem, but it's very close when the maximum serial number is large.

In any case, as explained at https://math.stackexchange.com/questions/60497/unbiased-estimator-for-a-uniform-variable-support , the use of max observation + average gap is not only an unbiased estimator of the distribution maximum, but it's also the best estimator in the sense that any other unbiased estimator will have a higher variance.
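Here is a small Python simulation of the "max observation + average gap" idea, assuming both ends of a continuous uniform distribution on [a, b] are unknown. With k sorted observations the average gap between neighbours is (max - min) / (k - 1), and one more such gap is added to the maximum. This is a sketch to check unbiasedness numerically, not the derivation from the linked page; the function names, seed and trial counts are my own choices.

```python
import random


def max_plus_gap(samples: list[float]) -> float:
    """Estimate the upper limit b of Uniform[a, b] when a is also unknown.

    The average gap between sorted observations is (max - min) / (k - 1);
    adding one more such gap to the maximum gives the estimate.
    """
    k = len(samples)
    if k < 2:
        raise ValueError("need at least two samples when the lower limit is unknown")
    hi, lo = max(samples), min(samples)
    return hi + (hi - lo) / (k - 1)


def average_estimate(a: float, b: float, k: int, trials: int, seed: int = 1) -> float:
    """Average max_plus_gap over many simulated samples of size k."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        samples = [rng.uniform(a, b) for _ in range(k)]
        total += max_plus_gap(samples)
    return total / trials


# Should land close to the true upper limit 1.0 if the estimator is unbiased.
print(average_estimate(0.0, 1.0, 5, 100_000))
```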

That last point is worth a few more words. With most common statistics, such as estimating the mean of a distribution as the average of the observations, the standard deviation of the estimate drops only in proportion to the square root of the number of samples. If you want 10 times the accuracy, you have to collect 100 times as much data. This would be the case, for example, if you estimated max in [0,max] as twice the average of the observations, which is another unbiased estimator. But using the max observation + average gap approach, the standard deviation of the estimate drops in full proportion to the number of samples. So, using max+gap, if you want 10 times the accuracy, you only have to collect 10 times as much data. It's hard to imagine any estimator that would be more powerful than that!
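The difference in how fast the two estimators tighten up can be checked with a quick simulation. This Python sketch assumes draws from Uniform[0, 1] with the lower limit known to be 0, so the average gap is max / k; it compares "twice the average" against "max + average gap". The function name, seed and trial count are arbitrary choices for illustration.

```python
import random
import statistics


def estimator_spreads(k: int, trials: int = 2000, seed: int = 2) -> tuple[float, float]:
    """Return (sd of 2*mean, sd of max + max/k) for k draws from Uniform[0, 1]."""
    rng = random.Random(seed)
    twice_mean, max_gap = [], []
    for _ in range(trials):
        xs = [rng.random() for _ in range(k)]
        twice_mean.append(2 * statistics.fmean(xs))
        m = max(xs)
        max_gap.append(m + m / k)  # max plus average gap, lower limit known to be 0
    return statistics.stdev(twice_mean), statistics.stdev(max_gap)


for k in (10, 100, 1000):
    sd_mean, sd_max = estimator_spreads(k)
    print(f"k={k:5d}  2*mean sd={sd_mean:.4f}  max+gap sd={sd_max:.5f}")
```

Each tenfold increase in k should shrink the first column by roughly a factor of 3 (the square root of 10) and the second by roughly a factor of 10.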

By the way, the reason I asked about "gap" in the one-sample case is that your calculation method strikes me as being a little inconsistent. In the first case, with two samples, you worked with 501281 and 501521 to get 501761. In this case, I see that 501761 = 501521 + (501521-501281), that is, max + gap, with no assumption of what the minimum serial number would be. But in the one-sample case, it seems like you're assuming some minimum value, calculating a gap from that, and adding that to the maximum (only) data value that you have. Applying that method to the two-sample case, and assuming that the smallest serial number would be 500001, the corresponding calculation would be 501521 + (501521-500001)/2 = 502281. So, it seems to me that you're either assuming a min, or not, depending on how many samples you have.
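The two calculations above can be written side by side. In this Python sketch the helper names are hypothetical; the serial numbers 501281 and 501521 and the assumed first serial 500001 are the ones used in the text.

```python
def max_plus_gap_unknown_min(serials):
    """Gap taken between the observed serials only (no assumed first serial)."""
    s = sorted(serials)
    return s[-1] + (s[-1] - s[0]) / (len(s) - 1)


def max_plus_gap_known_min(serials, first_serial):
    """Gap taken from an assumed first serial number, averaged over the samples."""
    m = max(serials)
    return m + (m - first_serial) / len(serials)


data = [501281, 501521]
print(max_plus_gap_unknown_min(data))        # 501761.0
print(max_plus_gap_known_min(data, 500001))  # 502281.0
```

Only the known-minimum version works with a single sample; the unknown-minimum version needs at least two.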

I hope this helps, somehow or other.

--Rik

Lou Jost
Posts: 5401
Joined: Fri Sep 04, 2015 7:03 am
Location: Ecuador

Re: Highest Serial Number 105PN?

Post by Lou Jost »

Thanks for your interesting comment!
in my world the process that you're describing would not be called "Bernoulli". The process described by https://en.wikipedia.org/wiki/Bernoulli_process denotes a sequence of observations that are sampled with replacement from a finite distribution. In that process the sequence of values of the observations is of interest, but not the times at which they occur.
In my application, the times ARE the values; these are the ages of the strata in which fossils have been recovered. The gaps are the time differences between successive recovered-fossil ages. The serial numbers in the PN105 example correspond to the stratum ages in my biological example.

This is a Bernoulli process, with a fixed probability p of finding a fossil in any given age stratum (we can assign a stratum thickness of, say, 1 million years; we can also take the limit as the thickness goes to zero, getting a Poisson process, which should give the same result). The serial number problem contains a Bernoulli process because every serial number from the earliest to the last one has the same probability of detection.
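A rough Python simulation of the slot picture, assuming a fixed number of slots (strata, or serial numbers) that each independently yield a hit with probability p. It checks the time symmetry numerically: the average run of empty slots before the first hit should match the average run after the last hit. The slot count, p, seed and trial count are arbitrary illustrative values, and runs with no hits at all are simply skipped.

```python
import random


def edge_gaps(n_slots: int, p: float, trials: int, seed: int = 3) -> tuple[float, float]:
    """Average empty slots before the first hit and after the last hit.

    Each slot independently yields a hit with probability p (a Bernoulli
    process). Runs with no hits are skipped.
    """
    rng = random.Random(seed)
    before_total = after_total = 0
    used = 0
    for _ in range(trials):
        hits = [i for i in range(n_slots) if rng.random() < p]
        if not hits:
            continue
        before_total += hits[0]
        after_total += n_slots - 1 - hits[-1]
        used += 1
    return before_total / used, after_total / used


before, after = edge_gaps(n_slots=1000, p=0.01, trials=4000)
print(f"mean gap before first hit: {before:.1f}, after last hit: {after:.1f}")
```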
So, it seems to me that you're either assuming a min, or not, depending on how many samples you have.
Yes, good observation. You are right that when the method is applied to serial numbers and there is only a single sample, you do need to know the first serial number, while with two or more numbers you can do without it. I had not dealt with that issue in my comment. Sorry. In my biological version of the problem, the values are known to start at t=0, the present time; in this problem, things run backwards in time compared to the serial number problem, because the ages are in units of "millions of years ago". So in the biological application, there is no difference in method between the single-sample case and the multiple-sample case.

rjlittlefield
Site Admin
Posts: 22519
Joined: Tue Aug 01, 2006 8:34 am
Location: Richland, Washington State, USA

Re: Highest Serial Number 105PN?

Post by rjlittlefield »

Lou Jost wrote:
Wed Jun 30, 2021 1:41 pm
This is a Bernoulli process, with a fixed probability p of finding a fossil in any given age stratum
Ah, I see how you're thinking about this. Yes, I agree.
The serial number problem contains a Bernoulli process because every serial number from the earliest to the last one has the same probability of detection.
This is a quibble, but I disagree on this snippet. It's a curse from doing too many proofs in college. The serial number problem involves sampling without replacement, while Bernoulli processes are equivalent to sampling with replacement. Every serial number has the same probability of detection on the first draw, but not after that, and the probability of getting any particular available number gets larger as the process goes along. So, as far as I can see, the serial number problem does not contain a Bernoulli process, although they are quite similar if there are lots of serial numbers compared to the number of draws. Your problem, on the other hand, does involve a Bernoulli process as you make the connection.
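To make the with/without replacement distinction concrete, here is a tiny Python sketch. It assumes a hypothetical population of 1500 serial numbers and computes the chance that one particular serial, not yet drawn, is picked on a given draw when sampling without replacement. With replacement (the Bernoulli-style model) that chance would stay at 1/1500 on every draw.

```python
from fractions import Fraction


def chance_of_specific_serial(n_serials: int, draw: int) -> Fraction:
    """Sampling without replacement: chance that a given still-unseen serial
    is picked on draw number `draw` (1-based), given it was not picked earlier."""
    remaining = n_serials - (draw - 1)
    return Fraction(1, remaining)


for draw in (1, 2, 10):
    print(draw, chance_of_specific_serial(1500, draw))
```

With only a few dozen draws out of 1500 serials the values barely move, which is why the two models behave so similarly here.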

I'm not sure why I'm spending so much time thinking about this. Best guess is that what's really on my mind is the process of communication.

When I look at articles on Bernoulli processes, I find an assumption that there are only a few classes, they are chosen up front, and I see lots of analysis and calculation methods relating to sequences, but little or nothing that relates to identifying classes for which the probability of selection is zero versus non-zero (above max versus below max, for your problem).

On the other hand, when I do a simple Google search on "estimating limits uniform distribution", at this moment the #2 response points to https://en.wikipedia.org/wiki/Continuou ... stribution , where section 4.1.1 "Estimation of Maximum" contains the formula that we've been discussing, and a link to the German tank problem.
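For reference, the usual textbook forms of this estimator are sketched below, written from general knowledge rather than copied from the linked page. Here m is the largest observation and k the number of observations; the first line assumes a continuous uniform distribution on [0, b], the second assumes serial numbers 1, ..., N as in the German tank problem.

```latex
\hat{b} = m + \frac{m}{k} = \frac{k+1}{k}\, m
\qquad \text{(continuous uniform on } [0, b]\text{)}

\hat{N} = m + \frac{m}{k} - 1
\qquad \text{(serial numbers } 1, \dots, N\text{)}
```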

So, it seems to me -- and this is obviously a subjective reaction -- that for this problem the phrase "Bernoulli process" triggers a bunch of connections that are not helpful, while failing to trigger connections that are. But perhaps the term has other connotations and connections for you and your target community, that make it the best choice there. Yet another example that communication is tough.

--Rik

Lou Jost
Posts: 5401
Joined: Fri Sep 04, 2015 7:03 am
Location: Ecuador

Re: Highest Serial Number 105PN?

Post by Lou Jost »

Thanks for continuing to think about this.
The serial number problem involves sampling without replacement, while Bernoulli processes are equivalent to sampling with replacement.
For me the serial number problem is exactly analogous to the fossil-finding problem. I think of a series of slots (strata, or blank cells on a page), each with an equal probability of returning a "hit", with p = the probability of finding a fossil in a given stratum, or the probability of finding the lens whose serial number matches the line number of a given cell. I think these are the same processes.

For me, the recognition that these are Bernoulli processes immediately leads to intuitive results because of the symmetries of Bernoulli processes, while the estimation approaches in some of the online references obscure the intuition. But I realize that most readers don't know what a Bernoulli process is. I defined it, but I guess that was not enough.

Anyway, I never thought I'd be writing about this problem on this forum... but when Ray asked his interesting question, I couldn't resist mentioning that there was a way to answer it IF we had a list of serial numbers from randomly sampled lenses. Ray correctly observed that in this case, we don't have that. But maybe in other cases we will have a random sample, and this method could be used. I have to confess that I like trying to solve these kinds of questions.

ray_parkhurst
Posts: 3125
Joined: Sat Nov 20, 2010 10:40 am
Location: Santa Clara, CA, USA

Re: Highest Serial Number 105PN?

Post by ray_parkhurst »

Lou Jost wrote:
Wed Jun 30, 2021 6:25 pm
Anyway, I never thought I'd be writing about this problem on this forum... but when Ray asked his interesting question, I couldn't resist mentioning that there was a way to answer it IF we had a list of serial numbers from randomly sampled lenses. Ray correctly observed that in this case, we don't have that. But maybe in other cases we will have a random sample, and this method could be used. I have to confess that I like trying to solve these kinds of questions.
The discussion has been very interesting, and the analysis is potentially useful for a wide range of problems beyond the current one, so I'm grateful this came out in the discussion. Ultimately the result is not unexpected, even with the non-random dataset.

I'd still keep the question open. If anyone has, or knows of, a 105PN with an SN above 1521, please let us know. An SN from within the "gap" would also be useful for tracking down the most accurate estimate of the highest SN produced, and thus how many are extant.

Chris S.
Site Admin
Posts: 3865
Joined: Sun Apr 05, 2009 9:55 pm
Location: Ohio, USA

Re: Highest Serial Number 105PN?

Post by Chris S. »

Neither higher nor within the gap, but in the interest of adding data points, mine is 500404.

--Chris S.

ray_parkhurst
Posts: 3125
Joined: Sat Nov 20, 2010 10:40 am
Location: Santa Clara, CA, USA
Contact:

Re: Highest Serial Number 105PN?

Post by ray_parkhurst »

I added another as well, 500575.

Doppler9000
Posts: 78
Joined: Thu Oct 13, 2011 3:56 pm

Re: Highest Serial Number 105PN?

Post by Doppler9000 »

ray_parkhurst wrote:
Mon Jun 28, 2021 3:22 pm

153
168
177
221
241
247
263
284
285
319
328
355
358
365
372
379
402
411
414
419
420
425
429
438
444
463
467
668
1281 (eBay)
1520
1521 (David)
Is it possible that the ‘1’ in the final three serial numbers refers to a variant of some kind, and that the lenses are actually within the XXX range?

Lou Jost
Posts: 5401
Joined: Fri Sep 04, 2015 7:03 am
Location: Ecuador

Re: Highest Serial Number 105PN?

Post by Lou Jost »

From a purely statistical standpoint, that (or something similar) seems like a very reasonable hypothesis.

dmillard
Posts: 631
Joined: Tue Oct 24, 2006 7:37 pm
Location: Austin, Texas

Re: Highest Serial Number 105PN?

Post by dmillard »

One more intermediate data point: my copy has the serial number 500886.

David

ray_parkhurst
Posts: 3125
Joined: Sat Nov 20, 2010 10:40 am
Location: Santa Clara, CA, USA

Re: Highest Serial Number 105PN?

Post by ray_parkhurst »

dmillard wrote:
Sun Oct 10, 2021 11:07 am
One more intermediate data point: my copy has the serial number 500886.

David
Well, that pretty much blows away the "gap" hypothesis.

enricosavazzi
Posts: 1365
Joined: Sat Nov 21, 2009 2:41 pm
Location: Västerås, Sweden

Re: Highest Serial Number 105PN?

Post by enricosavazzi »

If it can be of any use, mine is 500527.
--ES

ray_parkhurst
Posts: 3125
Joined: Sat Nov 20, 2010 10:40 am
Location: Santa Clara, CA, USA

Re: Highest Serial Number 105PN?

Post by ray_parkhurst »

enricosavazzi wrote:
Tue Feb 15, 2022 11:59 am
If it can be of any use, mine is 500527.
The thread morphed from "highest number" to "all known numbers and resulting distribution", so having intermediate numbers is good. Yours also fills a "mini-gap" between 467 and 668.

The hypothesis of @Doppler9000 is also still waiting to be tested, i.e., that the "1" refers to a variant rather than an expanded range. So if anyone has SNs 281, 520, or 521, or any of the 3-digit SNs listed here with a leading "1", please speak up, as that would disprove the hypothesis.
