Interpreting YDNA Results — Estimating Time to Most Recent Common Ancestor
James M. Gossett
All materials contained on this site are protected by United States copyright law and may not be reproduced, distributed, transmitted, displayed, published or broadcast without the prior written permission of the authors. However, you may download material for your personal, noncommercial use only.
Questions have arisen from several "Gossett Project" participants regarding interpretation of YDNA marker results – particularly when results imply a close match between two individuals. From your "Y-DNA Matches" page, FTDNA invites you to select the lineage-tree icon of various individuals who are considered to be close matches to you. This takes you to a page with a Table (the FTDNATiP™ Report) that estimates the probabilities that the two of you share a "most recent common ancestor" (MRCA) within specified numbers of generations. You can refine the estimates by including knowledge from your paper-trail – e.g., you might know for a fact that your MRCA is not within the preceding four generations. This knowledge can – and should – be included.
Here's an example of results from two individuals – James M. and Jeffrey L. – who have a four-marker difference in a 67-marker test:
How do we interpret these results?
First, keep in mind that there is a lot of uncertainty in such estimates of "time to most recent common ancestor" (TMRCA). In fact, it's rather ridiculous that estimates in the Table are presented to the nearest 0.01%! You should round off the digits considerably – "57.34%" is best considered to be "about 57%."
The Table estimates that there is a 50% chance that James' and Jeffrey's MRCA is within the past 8 to 9 generations. [Note that in these matters, the current generation is "generation zero." Thus, one's father would be "one generation ago"; one's grandfather would be "two generations ago"; etc.] Of course, the result also means that there is an equal likelihood that the MRCA was more than 8 to 9 generations ago. Scientists and statisticians generally prefer a surer bet than "50% confidence." Often, they use 90% or 95% confidence as the benchmark for drawing statistical inference.
From the Table, one would have to ascend the lineage chart to about 17 generations ago, in order to have 95% confidence of finding the man who is the MRCA of these two subject individuals. Note that this does not mean that the MRCA was 17 generations ago, but rather that he was within the most recent 17 generations. He could have been 8 generations ago; or 12; or whatever – but there is a 95% chance he is to be found within the previous 17 generations. More specific, we cannot be.
Using the 95% probability will provide an estimate of how far distant the MRCA might have been (i.e., "the TMRCA is no more than 17 generations ago."). At the near end, the 5% probability can be used to provide an estimate of how recent the MRCA might have been (i.e., "the TMRCA is at least 3 generations ago"). There is only a 5% chance that the actual TMRCA is more distant than the 95% figure; and there is only a 5% chance that the actual TMRCA is more recent than the 5% figure. We can be 90% sure that the TMRCA is between the two estimates -- in this example, between 3 and 17 generations ago. Obviously, that's a huge window, and illustrates that TMRCA calculations are generally not very useful for pinpointing MRCAs.
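For readers who like to see the lookup spelled out, here is a minimal sketch in Python of how the 5% and 95% figures can be read off a table of cumulative probabilities. The function name and the numbers in the table are hypothetical: they were chosen only to mirror the figures quoted above (3, 8 to 9, and 17 generations) and are not the actual values from any FTDNATiP Report.

```python
def generations_for_confidence(cumulative, level):
    """Return the smallest number of generations whose cumulative
    probability of containing the MRCA is at least `level`.
    Returns None if the table never reaches that level."""
    for generations in sorted(cumulative):
        if cumulative[generations] >= level:
            return generations
    return None


# HYPOTHETICAL table: generations ago -> probability that the MRCA lived
# within that many generations. Illustrative values only.
example_table = {
    2: 0.02,
    3: 0.06,
    8: 0.47,
    9: 0.54,
    16: 0.94,
    17: 0.955,
    20: 0.98,
}

nearest = generations_for_confidence(example_table, 0.05)
middle = generations_for_confidence(example_table, 0.50)
farthest = generations_for_confidence(example_table, 0.95)

print(f"5% figure:  {nearest} generations")
print(f"50% figure: {middle} generations")
print(f"95% figure: {farthest} generations")
print(f"90% window: between {nearest} and {farthest} generations ago")
```

With your own report, you would replace the hypothetical table with the rounded-off percentages actually shown for you and your match.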
Why the great uncertainty? Because: (1) geneticists don't know for sure the average rates of change in specific markers; and (2) there is considerable spread in the actual rate around that true, unknown average. Also, we are dealing with relatively small numbers of generations (i.e., a relatively small number of occasions for possible change) of a phenomenon with a low rate of change. The best analogy I can give comes from rolling dice. Consider a single, standard, six-sided die. Suppose every generation is like another roll of the die, and that every time a "two" comes up, it means a change in a YDNA-marker. Well, on average, the die would come up "two" on one-sixth of all rolls. If you rolled the die a million times, you'd surely see "two" come up in very nearly one-sixth of the rolls. But what if you only rolled the die six times? Would you really see "two" come up exactly once? Not likely. In fact, there is only about a 40% chance that "two" will come up exactly once in six tosses of the die. Indeed, there's about a 33% chance that "two" won't come up at all, and about a 26% chance that "two" will come up more than once in six rolls.
Bottom line: It's very difficult to pin-point how many generations ago the MRCA of two individuals lived. Over the (typically) 8 to 10 previous generations in which we expect to find our MRCA, it is not improbable that some descendant lines will have experienced a single marker-change, while others will have experienced two, three, or even four marker-changes. The statistics of marker-change accommodate such diverse results over a comparatively small number of generations -- just as rolling a die only a few times can give "two" considerably more or less often than the expected one-sixth of all tosses.
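The dice figures can be checked with the standard binomial formula. The short Python sketch below does so; the function and variable names are my own choices for illustration, and the only assumption is a fair die with independent rolls.

```python
from math import comb


def prob_exactly(k, n, p):
    """Binomial probability of exactly k successes in n independent
    trials, each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_rolls = 6      # six rolls, standing in for six generations
p_two = 1 / 6    # chance that a fair die shows "two" on any one roll

p_none = prob_exactly(0, n_rolls, p_two)
p_once = prob_exactly(1, n_rolls, p_two)
p_more = 1 - p_none - p_once  # two or more "twos"

print(f'"two" never comes up:       {p_none:.1%}')
print(f'"two" comes up exactly once: {p_once:.1%}')
print(f'"two" comes up more often:   {p_more:.1%}')
```

Changing `n_rolls` to a much larger number shows how the outcome becomes steadily more predictable as the number of occasions for change grows.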
Another level of uncertainty enters the picture when we try to turn "numbers of generations" into "numbers of years ago." Mutation of Y-markers occurs at conception. Therefore, it is more fundamentally correct to consider the numbers of generations between individuals than to consider the spread of years between them. However, it's understandable that we often would like to convert "generations" into "years."
But how many years is a generation? Many charts assume about 25 years per generation, but this can in some cases be a considerable underestimate. When I consider my year of birth (generation zero) and the year of my great-great-grandfather's birth (generation 4), I find 132 years between us – or an average of about 33 years for these most recent four generations of my line. Looking at data from all of our participants, I have concluded that we are better off using an assumption of 30 years per generation.
If you know the number of years that have elapsed in the recent generations of your line, use that information – and only apply some assumption (25 or 30 years per generation) for generations further back, where you lack the knowledge of a paper trail. For example, in my case, if I were seeking the approximate birth-date of my ancestor of 6 generations ago, I'd subtract the known 132-year difference from my d.o.b. (to the birth-date of my generation-4 ancestor), and then subtract another 60 years from that date to account for the preceding two generations of unknown ancestors. This would be better than having assumed 30 years per generation for all 6 generations. Use what you know for sure, before applying "rules of thumb." Obvious, perhaps; but all-too-often neglected.
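The arithmetic just described can be sketched in a few lines of Python. The function name and its parameters are hypothetical, and the birth year of 1950 is a made-up stand-in used only to make the example concrete; the 132-year span over four generations and the 30-years-per-generation rule of thumb come from the discussion above.

```python
def estimate_birth_year(own_birth_year, generations_back,
                        known_years=0, known_generations=0,
                        years_per_generation=30):
    """Estimate an ancestor's birth year.

    Uses the documented span (known_years over known_generations) first,
    and applies the rule of thumb only to the remaining generations."""
    if known_generations > generations_back:
        raise ValueError("known_generations cannot exceed generations_back")
    unknown_generations = generations_back - known_generations
    return (own_birth_year
            - known_years
            - unknown_generations * years_per_generation)


# Hypothetical birth year for generation zero (illustration only).
my_birth_year = 1950

# Paper trail covers 4 generations spanning 132 years; the rule of thumb
# covers only the 2 generations beyond that.
with_paper_trail = estimate_birth_year(
    my_birth_year, 6, known_years=132, known_generations=4)

# Rule of thumb applied to all 6 generations, ignoring the paper trail.
rule_of_thumb_only = estimate_birth_year(my_birth_year, 6)

print(with_paper_trail)    # 1950 - 132 - 60
print(rule_of_thumb_only)  # 1950 - 180
```

In this made-up case the two approaches differ by 12 years, which is simply the gap between the known 132 years and the 120 years that the rule of thumb would have assumed for the same four generations.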