L540 - A Small Y-DNA Haplogroup

12 Jul 2016

Peter Gwozdz

pete2g2@comcast.net

News

 

            18 May 2016:  One more Big Y result, Y7026*, in the L540 tree.  We now have 6 samples in the paragroup Y7026*.

 

            21 Mar 2016:  700 year old skeletal remains might be L540.

 

Abstract

            Rewrite 27 Dec 2015.

            This web document is a summary of my information about a small haplogroup of Y-DNA based on an SNP mutation named L540.  The subject is genetic genealogy.

            There is a Neighborhood table below with a list of samples (men) predicted to belong to the L540 haplogroup based on STRs, and also samples predicted to be in the STR Neighborhood just outside L540.  The samples near the cutoff (borderline STR fit) are the ones that should be tested to see if they belong to the L540 haplogroup.

            The L540 Tree shows the samples that have been tested for the branches of L540.  Prediction to the branches cannot be done with confidence by STRs, so L540 samples need to do further SNP testing to determine their branch.

            The L540 haplogroup seems to be roughly 2,100 years old, with an origin perhaps in what is now Germany.

            This web document is written for people reasonably familiar with the jargon of genetic genealogy.  If you are new to genetic genealogy you might prefer to first read an Introduction that I wrote for another of my web documents.

            My References and Sources are listed at the bottom.

 

L540 in Y-DNA Tree

            New Topic 27 Dec 2015.  Edit 3 Jan 2016.

            Rough outline of the human Y-DNA tree with ISOGG and SNP code names, showing location of L540:

       E (M96)

               E1b1b1 (M35.1)

                      E1b1b1a1 (M78)

                             E1b1b1a1b1a (V13)

                                    E1b1b1a1b1a6 (L540)

            V13 is the largest haplogroup division of haplogroup E, but L540 is relatively small.

            Link to an up-to-date tree with more V13 & L540 detail:

http://www.yfull.com/tree/E-V13/

            Steve Fix is now heading a project to discover new branches of V13, using Big Y data.  New branches are showing up almost monthly.  His V13 tree is at:

https://docs.google.com/spreadsheets/d/1D9WaPOZn_0l5GKtqXR0PbbxEir2o1t0e2HO-SJ2GboI/edit?pli=1#gid=2080530255 and discussion can be found at:

http://community.haplozone.net/index.php?topic=3657.msg36427#new.

            The FTDNA tree does not use L540, placing the samples in haplogroup S3003, a branch of V13.  L540 is the main branch of S3003;  L540 has all but one of the known S3003 samples.

 

L540

            Rewrite 27 Dec 2015.

            L540 is the code name for an SNP that was discovered in my WTY.  L540 was announced 29 March 2011.  On 27 Apr 2011 I demonstrated that L540 defines a new haplogroup branch of V13.

            I use the code name L540 for the SNP, for the associated haplogroup, and for the samples (men) in that haplogroup.

            This haplogroup was predicted as cluster C based on STR correlations in 2008.  When I originated this web page in early 2010, I coined the name V13C, renaming it L540 on 30 Apr 2011.  Cluster C, also called C type, is the STR equivalent of L540.

            The Neighborhood table below has my predictions for L540 (C type). 

            Update of statistics on 27 Dec 2015:  There are 25 samples that have tested L540+.  In addition, there are 18 C type samples that have not taken the SNP test;  I predict about 15 of them would test L540+.  Finally, there are several marginal samples (STRs close to the C type cutoff);  I suppose another 5 or so of those would test L540+.  That’s roughly 45 known members of L540.

 

L540 Tree

            Update 12 Jul 2016, Y12393 and A779 added.

            Edit 18 Jun 2016;  Country added for each sample.

            Update 5 Jun 2016, two YF codes added.

            Update 18 May 2016, one more result, Marschner.

            L540 Tree in conventional outline format.  Click on a link in this tree for more discussion about that SNP or sample (male ancestor family name).

 

V13

                                                            V13 has about 80 phyloequivalent SNPs

            Other V13 branches

                                                            Z5018 and Z5016 are by far the largest branches of V13

            S3003

                                                            PGP89

                        L540                                        IDs:  FTDNA;  Yfull;  Yseq

                                    A6295-, Y7026-

                                                            Hoff:  Norway.  374375;  --;  3203

                                                            Kovalev:  Russia.  268215 Big Y; YF04818

                                    A6295

                                                A9035-

                                                            Nowak:  Poland.  225596 Big Y; YF03833

                                                A9035

                                                            Gwozdz:  Poland.  N16800 Big Y; YF02909;  1433

                                                            Kargul:  Poland.  199446; --; 4230

                                    Y7026

                                                Y7026*:  Z29042-, A783-

                                                            Marschner:  Czech.  480087 Big Y;  YF05931

                                                            Stavbom:  Sweden.  B3807 Big Y;  YF05186;  2100

                                                            Glasser:  Germany.  171456 Big Y;  YF04393

                                                            Appell:  Germany.  350864;  --;  3187

                                                            Blind:  Germany.  B2670;  --; 2891

                                                            Kline:  Germany.  158091;  --;  2360

                                                Z29042

                                                            Z39377-

                                                                        Gebert:  Germany.  166692 Big Y;  YF01811

                                                            Z39377

                                                                        Roider:  Germany.  275510 Big Y;  YF04216

                                                                        Hartsfield:  Prussia.  140927 Big Y;  YF04834;  2397

                                                A783

                                                            Y12393-

                                                                        Stelz:  Germany.  175213 Big Y;  YF05757;  2028

                                                            Y12393

                                                                        A779-

                                                                                    Svercl:  Czech.  155155 Big Y;  YF02913

                                                                        A779

                                                                                    Hochreutter:  Germany.  N45041 Big Y;  YF02161
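
            The outline above can also be written as a nested data structure, which makes it easy to count samples under a branch or to list the chain of SNPs leading to a sample.  The following is only a minimal sketch:  the "samples" / "children" layout and the function names are assumptions made for illustration, and only the family names from the tree are used.

```python
# Nested representation of the L540 Tree as listed above (family names only).
# The "samples" / "children" layout is an assumption made for this sketch.
L540_TREE = {
    "L540": {
        "samples": ["Hoff", "Kovalev"],  # A6295-, Y7026-
        "children": {
            "A6295": {
                "samples": ["Nowak"],  # A9035-
                "children": {
                    "A9035": {"samples": ["Gwozdz", "Kargul"], "children": {}},
                },
            },
            "Y7026": {
                # Y7026*: Z29042-, A783-
                "samples": ["Marschner", "Stavbom", "Glasser",
                            "Appell", "Blind", "Kline"],
                "children": {
                    "Z29042": {
                        "samples": ["Gebert"],  # Z39377-
                        "children": {
                            "Z39377": {"samples": ["Roider", "Hartsfield"],
                                       "children": {}},
                        },
                    },
                    "A783": {
                        "samples": ["Stelz"],  # Y12393-
                        "children": {
                            "Y12393": {
                                "samples": ["Svercl"],  # A779-
                                "children": {
                                    "A779": {"samples": ["Hochreutter"],
                                             "children": {}},
                                },
                            },
                        },
                    },
                },
            },
        },
    },
}


def count_samples(node):
    """Count the samples at this node plus those in every branch below it."""
    below = sum(count_samples(child) for child in node["children"].values())
    return len(node["samples"]) + below


def path_to(sample, tree=L540_TREE, prefix=()):
    """Return the chain of SNPs from the root down to the sample, or None."""
    for snp, node in tree.items():
        chain = prefix + (snp,)
        if sample in node["samples"]:
            return list(chain)
        found = path_to(sample, node["children"], chain)
        if found:
            return found
    return None


print(path_to("Hartsfield"))
print(count_samples(L540_TREE["L540"]["children"]["Y7026"]))
```

            The counts cover every sample listed in the tree as of the latest update, so they can differ from counts quoted in topics written at earlier dates.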

 

V13

            Rewrite 31 Oct 2015.

            For a detailed V13 tree, see:

https://docs.google.com/spreadsheets/d/1D9WaPOZn_0l5GKtqXR0PbbxEir2o1t0e2HO-SJ2GboI/edit?pli=1#gid=2080530255

http://www.yfull.com/tree/E-V13/

http://isogg.org/tree/ISOGG_HapgrpE.html

            V13, in the E haplogroup, is a major branch of the human Y-DNA tree.  The L540 branch is a relatively small branch of V13.

            There are about 80 known SNP equivalents to V13.  V13 was the first to be discovered and the one used in most discussions about this haplogroup.  All but a very few V13 samples belong to L142 and CTS5856, so technically L540 is the main branch of the S3003 haplogroup, which is one of many branches of the CTS5856 haplogroup, which is the main branch of the L142 haplogroup, which is the main branch of V13.  For simplicity the L540 Tree above minimizes these details.  I usually just say in this web page that L540 is a branch of V13.  I say that V13 is the father of L540, when technically S3003 is the father and V13 is the great-great grandfather, and even that may change if additional side branches are discovered with very few samples.  I’m ignoring the known branches that have few samples, for simplicity.

            L542 is one of those 80 equivalents.  V13 is sometimes called L542.  L542 was found in my WTY.

 

PGP89

            New Topic 10 Feb 2015.

            PGP89 is a sample from the Personal Genome Project (search Google for details).  PGP89 is S3003+ but L540-, so this sample represents an older node in the branch leading to L540.  So far there are no such S3003+ L540- results in the E-M35 Project.

            This is from Steve Fix, who includes PGP data in his tree.

 

Z29042

            New Topic 14 Jan 2015.  Edit 27 Dec 2015.

            This SNP was discovered by Steve Fix on 10 Jan 2015, from the Big Y data of Roider, compared to Gebert.  These two samples have this SNP, but Hochreiter does not, so Z29042 defined a new haplogroup, the first branch to be found for L540.  Steve assigned the Z series code number.  Actually, there are 6 new SNP locations common to Roider and Gebert, but only Z29042 was assigned a code;  some of those others may be needed in the future.

            I’m a bit surprised.  I expected Roider to fall into a branch with Hochreiter, because they are closest in STRs.  Also, I had been predicting an older node for Gebert, based on his DYS389 value and on his other STR values, which differ from other L540 samples more than L540 samples differ from each other.  STR predictions are statistical, because STRs mutate relatively rapidly.  So this is a surprise, but such surprises are expected from time to time when making predictions based on STRs.

 

A783

            Update 10 Feb 2015.

            This SNP was noticed by Steve Fix and me in Hochreiter’s Big Y data, our first L540 Big Y.  Actually, there were 10 new SNPs;  I tested myself for them but came out negative.  Yseq assigned A series code numbers to them.  None of the 10 showed up in the Big Y data for Roider or Gebert.  In Feb 2015 I noticed this one in the Big Y data for Svercl, so it defined a new haplogroup branch for L540, with Hochreiter and Svercl, not me, not Roider, not Gebert.

 

Y12393

            New topic 12 Jul 2016.

            This SNP was newly posted by Yfull in July 2016, as a new branch of A783.

 

A779

            New topic 12 Jul 2016.

            Steve Fix suggested this SNP to me on 12 Jul 2016, to distinguish Svercl from two Hochreiter Big Ys.  The two Hochreiters are 7th cousins, with a common ancestor about 300 years ago, so the A779 mutation is at least 300 years old.  That’s still too young to be listed in public Y-DNA trees like Yfull.

 

Y7026

            New Topic 8 Feb 2015.  Update 17 Mar 2016.

            This SNP represents the major division of L540, with 11 of the 16 samples in the L540 tree so far.  The Yfull tree estimates Y7026 to be about 2,000 years old, although this is a very rough estimate due to the caveats associated with DNA age estimates.  See the next topic discussion about the “bushy” nature of Y7026.

 

Y7026*

            New Topic 17 Mar 2016.

            We now have 5 samples in the paragroup Y7026*.  Click here for a jump to Y7026* in the tree.

            All 5 have been confirmed with SNP results Y7026+, and Z29042-, A783-, so they do not belong to those two known haplogroup branches of Y7026.  Two of the 5 have Big Y results, and they do not have a common novel SNP, which means they will end up in two different new branches of Y7026, as soon as a future sample in their branch gets a Big Y result with a common novel SNP to define that future haplogroup branch.

            In other words, we know Y7026 has at least 4 branches.  The node associated with Y7026 is the major “bushy” node of the L540 tree.
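
            The branch count can be reasoned out as follows:  each known branch counts once, and Big Y paragroup samples that share no novel SNP with each other must each sit in a separate, not yet named, branch.  The sketch below is only an illustration of that counting rule;  the sample labels and novel SNP labels are placeholders, not real data.

```python
def min_branches(known_branches, novel_snps_by_sample):
    """Lower bound on the number of branches below a node.

    Known branches count once each.  Big Y paragroup samples are grouped
    whenever they share at least one novel SNP; each remaining group must
    belong to a separate, still unnamed, branch.
    """
    groups = []  # each group is the set of novel SNPs seen in that group
    for snps in novel_snps_by_sample.values():
        snps = set(snps)
        overlapping = [g for g in groups if g & snps]
        for g in overlapping:
            groups.remove(g)
            snps |= g
        groups.append(snps)
    return len(known_branches) + len(groups)


known = ["Z29042", "A783"]

# Placeholder labels: two Big Y samples in Y7026* with no novel SNP in common.
novel = {
    "big_y_sample_1": {"snp_a", "snp_b"},
    "big_y_sample_2": {"snp_c"},
}

print(min_branches(known, novel))
```

            If a later Big Y result shared a novel SNP with one of these samples, the two would merge into one group and the lower bound would not increase.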

            The other three samples have not purchased Big Y;  their results are from SNP testing only.  These three may belong to those two future branches.  Or, perhaps one or more of those three may end up in yet another branch of Y7026.

            A bushy node is evidence that the immediate descendants of the corresponding MRCA participated in a significant population expansion.

            On the other hand, bushy nodes may be just random, not evidence of population expansion, due to the luck of SNP discovery statistics, particularly for the case of Y7026, with only 11 total samples so far.  Big Y does not cover the entire Y chromosome, and Big Y does occasionally randomly miss SNPs, so future testing may show Y7026 to be not so bushy after all, if future novel SNPs combine the Y7026 branches into fewer larger branches.

 

A6295

            New Topic 22 Jul 2015.  Update 17 Mar 2016.

            This SNP and haplogroup was defined 22 Jul 2015, being present in Nowak’s Big Y, and also present in my earlier Gwozdz Big Y.  Kargul tested positive for A6295, making 3 samples so far.  My Gwozdz cousin would no doubt also test positive, but I leave him out of the L540 tree since together we represent one ancestral line.  Actually, I recruited both Kargul and Nowak, based on close STR matches to me, so statistically, the A6295 branch should be considered to be much smaller than the Y7026 branch with 11 independent samples.  Note that the three A6295 samples are the only Poland origin samples in the L540 tree;  I did not recruit on the basis of Poland origin, so we can speculate that A6295 might represent a small Polish branch of L540, although three samples is far too few for any confidence in this regard.

 

A9035

            New Topic 21 Jan 2016.

            This SNP has just been defined 21 Jan 2016.  It is negative in one A6295 sample (Nowak) and positive in the other two (Gwozdz and Kargul), so it represents a haplogroup - a small twig in the Y-DNA tree.  Kargul does not have Big Y data;  Kargul’s FTDNA sample is A6295+;  Kargul’s Yseq sample is A9035+.  I (Gwozdz) ordered SNP tests at Yseq for 4 of my “private” SNPs, A9032, A9033, A9035, and A9036;  Kargul is negative for those other 3, implying that our MRCA node for A9035 is roughly 3/4 as old as our node with Nowak for A6295, although this is a very rough estimate with only 4 SNPs tested.  At the Yfull SNP browser, using the locations for those 4 SNPs from my (Gwozdz) Big Y data, I verified my positive standing for all 4 of these SNPs;  Nowak and all other V13 samples in the V13 Project at Yfull are negative for all 4.
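
            The 3/4 figure can be worked out as follows.  Of the 4 SNPs tested on my line below the A6295 node, Kargul shares 1 (A9035), so that one arose before our split, and the other 3 arose after it.  The sketch below assumes, purely for illustration, that SNPs accumulate at a steady rate and that these 4 are representative of the line;  the function name is hypothetical.

```python
def relative_node_age(shared_snps, tested_snps):
    """Age of the younger node as a fraction of the age of the older node.

    Assumes the tested SNPs are spread evenly in time along the line, so the
    SNPs NOT shared (those that arose after the split) measure the age of the
    younger node.
    """
    if tested_snps <= 0 or not 0 <= shared_snps <= tested_snps:
        raise ValueError("need 0 <= shared_snps <= tested_snps and tested_snps > 0")
    return (tested_snps - shared_snps) / tested_snps


# A9032, A9033, A9035, A9036 tested; only A9035 is shared with Kargul.
print(relative_node_age(shared_snps=1, tested_snps=4))
```

            With only 4 SNPs the result is very coarse;  one more or one fewer shared SNP would move the ratio to 1/2 or to 1.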

 

Z39377

            New Topic 22 Nov 2015.

            This SNP has just been defined 22 Nov 2015.  It is present in Hartsfield’s recent Big Y, and is also present in Roider’s Big Y from earlier this year.  So Z39377 defines a new haplogroup, with only those two samples so far.

 

S3003

            Update Feb 2015:

            This SNP is in the L540 branch, but older.  PGP89 is a sample from the Personal Genome Project (search Google for details).  PGP89 is S3003+ but L540-, so this sample represents an older node in the branch leading to L540.  So far there are no such S3003+ L540- results in the E-M35 Project.  Technically, S3003 defines a haplogroup with branches PGP89 and also L540, but for simplicity I just say in this web page that L540 is a branch of V13.

 

Determining Your L540 Twig;  Dividing L540;  Discovering New SNPs

 

            Rewrite 31 Oct 2015.  Edit 3 Nov 2015.

            I recommend Big Y, next paragraph, if cost is not an issue for you, and if you are enthusiastic about discovering new haplogroups.  Otherwise, consider the less expensive tests per the following paragraphs, to determine your current haplogroup.

            Big Y:  Discovering new SNP haplogroups is part of my genetic genealogy hobby.  I have been recently recruiting L540 members to purchase Big Y in order to discover new SNPs, which provide new haplogroups - terminal twigs on the Y tree, to further subdivide L540.  It’s not cheap.  $575 for Big Y.  Anyone interested in joining this L540 project can order Big Y;  please contact me so I can keep track of the status.  With Big Y, there is no need for individual SNP testing.  In fact, with Big Y, many men immediately discover their own new twig.  If you don’t discover a new twig with Big Y immediately, that is because two samples with the same SNP are required to officially define a new haplogroup.  It is almost certain that a future Big Y test will match one of the new SNPs in your Big Y data, thereby defining a new twig for just the two of you.

            If cost of Big Y is an issue, you can wait.  Save your money.  The price will come down with time.  That means others get the thrill of discovering the series of twigs in your branch of the Y tree, but you get to purchase the corresponding SNPs at the low price for individual SNP tests.

            I encourage testing at FTDNA, and joining the E-M35 Project, because I like the convenience of finding all the data in one place.  There are other companies.  Yseq offers individual SNPs at lower price with faster results.  Currently, individual SNPs cost $39 at FTDNA vs $17.50 at Yseq.  Click on SNP ordering for detailed instructions.

            If you have already come out L540+ with an SNP test, you can work your way through the L540 tree one SNP test at a time.  For example, starting with the Y7026 test, if you come out Y7026-, then test A6295;  if you come out Y7026+, continue with Z29042 and if negative then A783.  Then you wait for another new twig to show up in the tree.
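
            The testing sequence in the previous paragraph can be sketched as a small decision function.  This is only an illustration of that sequence for a sample already known to be L540+;  the function name and the True / False encoding of results are assumptions, and the deeper twigs in the L540 Tree are not covered.

```python
def next_snp_test(results):
    """Suggest the next single SNP test for a sample already tested L540+.

    `results` maps an SNP name to True (positive) or False (negative);
    SNPs not yet tested are simply absent.  Returns None when the sequence
    described above is exhausted and the next step is to wait for new twigs.
    """
    if "Y7026" not in results:
        return "Y7026"
    if results["Y7026"]:
        # Y7026+: try the known Y7026 branches in turn.
        for snp in ("Z29042", "A783"):
            if snp not in results:
                return snp
            if results[snp]:
                return None  # placed in a known branch of Y7026
        return None  # Y7026* so far
    # Y7026-: the other known branch of L540 is A6295.
    if "A6295" not in results:
        return "A6295"
    return None


print(next_snp_test({}))
print(next_snp_test({"Y7026": True, "Z29042": False}))
```

            A result of None does not mean testing is finished forever;  it only means that none of the SNPs named in the paragraph above remain to be ordered.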

            If you are predicted L540 based on STRs - red in the Neighborhood Table below, you might test for L540 first, then continue with the branches when confirmed L540+.  If your prediction is very high confidence, boldface in the table, you can skip the L540 test, but come back and test L540 and/or S3003 if you test negative for the two branches.

            If you are in that table with a low confidence blue prediction number, and you have not been tested for L540, consider one of the SNP package deals.  SNP packages are offered because your DNA testing companies (like FTDNA) provide you with high confidence haplogroup predictions for your main Y-DNA branch.  Even if you are not in my table, if your sample is predicted by FTDNA to be in one of the main branches upstream from L540, consider an SNP package deal:

            FTDNA has an “E-V68 SNP Pack”, with 114 SNPs downstream of V68, at $119 to determine which is yours.  V68 is the “father” of V13, which is the father of L540.  At your FTDNA home page, click on “Haplotree & SNPs”, which jumps to your predicted location in the tree.  A banner ad for this pack should be in the tree above your position.  This pack has SNPs for all the known branches of L540, although L540 is not recognized yet in FTDNA’s tree, and although Y7026 and A6295 are still not available individually at FTDNA.

            If you are sure you are V13, Yseq has a “V13 Panel” that has a better selection of V13 branches including all the known branches of L540.  It costs $88, plus $5 for a cheek swab kit if you have not already tested at Yseq.  Link with a description:  http://www.yseq.net/product_info.php?products_id=2486.  That description has a nice V13 tree.  Please let me know if you are L540 and order this panel, so I can keep track of results.

            For more specific discussion, click on L540, A783, Z29042, Y7026, A6295, SNP ordering, and Big Y.

            How about STRs?  In the past, I encouraged upgrading to 111 Markers, the largest set available at FTDNA.  Now that there are plenty of SNPs available with low cost tests, SNPs are better than STRs for finding your closest Y matches.  However, there are plenty of samples without the latest SNP tests, so if you are anxious to find out which of these best match your Y, 111 STR markers are much better than the smaller standard sets.

 

700 Year Old Skeleton - Might be L540

            New Topic 21 Mar 2016.

            Recent Article:  Vanek D. et al. (2015) Complex Analysis of 700-Year-Old Skeletal Remains found in an Unusual Grave - Case Report. Anthropol 2: 138. doi: 10.4172/2332-0915.10000138.

            This Vanek article describes male skeletal remains found in Bohemia.  DNA testing included 14 STR markers from the Y chromosome, placing this man in the V13 haplogroup with high confidence.  (V13 is almost always predicted with confidence using only the basic 12 STR markers.)

            Vanek was brought to my attention by Svercl, who in turn found it mentioned at the Haplozone forum by “Passa”, who points out that the Y-DNA data matches L540 samples.  Passa was directed by “Ignis90” at the Anthrogenica forum.

            I demonstrate in this topic that the Vanek skeletal Y-DNA might belong to the L540 branch of V13, although I figure this prediction to be somewhat less than 50% probable.

            Vanek et al. used 13 of the 14 markers, because DYS635 data was not available to them.

            Passa used the Haplozone database, which does not use 635.  Passa’s list of closest matches really uses only 12 markers, because Haplozone ignores DYS385a when 385b is not present.

            Vanek used Ysearch, apparently before 635 data was widely available.  The Vanek skeletal sample data is available at Ysearch as DE8WN.  Vanek, Table 1, has “X” for all but one of the values at 635 for matching samples.  (That one sample has an X at another missing marker DYS481.)  My sample KFKGM is in Table 1 with X.  Today (18 Mar 2016), many samples, including mine, have 635 values.  The other samples in Table 1 still do not have data at all 14 markers.  Today, a Ysearch using the 13 values without 635 provides many more matches, with Kargul as the closest match, although Kargul is not in Vanek Table 1, probably because that data was taken before I recruited Kargul.

            The Vanek sample DE8WN does not match the L540 samples as well with all 14 markers, because DE8WN has 635=23, whereas the L540 Modal has 635=21.  Today, a Ysearch using all 14 markers provides one match at step 5 (step is also called GD, or genetic distance), and 6 matches at step 6, including mine.  One at step 6, Bohman, has tested L540-;  the others do not have L540 test data available to me.
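
            For readers who want to see how a step count arises, here is a minimal sketch that sums the absolute differences in repeat counts over the markers two haplotypes have in common.  This simple stepwise count is an assumption for illustration;  online databases may count some markers differently, which is not modeled here.  Only the DYS635 values come from the discussion above;  the DYS393 value is a placeholder.

```python
def genetic_distance(a, b):
    """Sum of absolute repeat-count differences over markers present in both.

    `a` and `b` map marker names to repeat counts.  Markers missing from
    either haplotype are skipped, as when DYS635 was not available.
    """
    shared = a.keys() & b.keys()
    return sum(abs(a[marker] - b[marker]) for marker in shared)


# DYS635 values are from the text; DYS393 is a placeholder value.
skeleton = {"DYS393": 13, "DYS635": 23}
l540_modal = {"DYS393": 13, "DYS635": 21}

with_635 = genetic_distance(skeleton, l540_modal)

# Dropping DYS635 from one haplotype excludes it from the comparison.
skeleton_no_635 = {m: v for m, v in skeleton.items() if m != "DYS635"}
without_635 = genetic_distance(skeleton_no_635, l540_modal)

print(with_635, without_635)
```

            In this sketch DYS635 alone contributes 2 steps, which is why comparisons that include it look worse for L540 than comparisons on the other 13 markers.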

            Twelve L540 samples are available to me with the 111 STR set, which includes 635;  9 have 635=21, one has 20, one has 22;  only Fredeen has 23.  Fredeen, the 12th row in the Neighborhood Table, is an STR outlier, marginally matching L540 using the definition C75 with 75 of the 111 markers;  Fredeen does not match L540 using the 67 set or fewer STR markers.

            It is not unusual for samples to match each other poorly using fewer STR markers but quite closely using more markers.  (The converse is also not unusual - close matches with fewer markers that are distant using more markers.)

            Passa’s list of closest matches to the Vanek sample are all at step 3, which is not a good match for L540 using 12 markers.  A glance at the far right column of the Neighborhood Table shows that step 3 is not a good indicator for L540 using the standard 12 set;  one sample at step 2 has tested L540-;  one sample at step 4 has tested L540+.  I know of 2 samples with 67 or 111 markers that are not L540 but differ from L540 by only step 1 at 12 markers;  I know of 35 such samples with step 2 at 12.

            The samples from Vanek Table 1 using 13 markers are all at step 3 or greater from DE8WN.

            There are 542 samples available from E-M35 with 111 markers, including DYS635 (my data download 1 Nov 2015).  I analyzed these using all 14 Vanek markers.  There are no matches at genetic distance step < 5.  There are 5 samples at step 5 from the Vanek skeleton STRs, including Kargul and Hochreutter, who are tested L540+.  The other 3 matches at step 5 are not predicted L540, because the full 111 set with C75 gives step 19, 20, and 23 (step 8 is the cutoff;  see the Neighborhood Table column for C75).  One of those other 3 not predicted L540 is confirmed L540- because it has a Big Y placement in S2927, another branch of V13.  Another is V13+ without an L540 test.  The last one of those 3 has no SNP data and is conservatively predicted M35 (upstream from V13) by FTDNA, although that last one is probably V13.

            There are 12 samples at step 6;  4 are tested L540+ and listed in my Neighborhood Table;  8 are clearly not L540, with C75 step 21 to 49.  Of those 8, 3 have Big Y results placing them in other branches of V13 (one is V13* belonging to a branch not yet identified).  One is tested V13+ L540-.  Two are tested V13+ without L540 test.  One is tested only M35+.  The last of those 8 is conservatively predicted M35 by FTDNA, again probably V13.

            This analysis at steps 5 and 6 using all 14 markers is my basis for the rough estimate for less than 50% probability that the skeleton is L540+.  2 of 5 at step 5;  4 of 12 at step 6.
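
            The fractions behind that estimate are simple to tabulate.  The sketch below uses only the counts quoted above for steps 5 and 6;  pooling the two steps into one fraction is my illustration here, not a formal probability calculation.

```python
# (L540+ samples, total samples) at each genetic distance step from the
# skeleton, using all 14 markers and the 111 marker database.
counts_by_step = {
    5: (2, 5),
    6: (4, 12),
}

for step, (positive, total) in counts_by_step.items():
    print(f"step {step}: {positive} of {total} = {positive / total:.0%}")

pooled_positive = sum(positive for positive, _ in counts_by_step.values())
pooled_total = sum(total for _, total in counts_by_step.values())
pooled_fraction = pooled_positive / pooled_total

print(f"pooled: {pooled_positive} of {pooled_total} = {pooled_fraction:.0%}")
```

            Each fraction is below one half, whether the steps are taken separately or pooled, which is all that the rough estimate claims.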

            At step 7, 4 of 16 samples are L540+.  At step 8, 1 of 20 samples is L540+.  The last L540+, at step 10, is the STR outlier Fredeen.

            Summary:  14 markers are not enough for confident prediction of L540.

            End of analysis using the 111 marker database.

            There are 1728 samples available from E-M35 with 67 markers.  The L540 definition at 67 markers is C54.  I analyzed these using the 13 Vanek markers excluding 635.  There are no matches at genetic distance step < 3.  There are 2 samples at step 3 from the Vanek skeleton STRs, Kargul and Sabieka, who are tested L540+.  At step 4, 5 of 14 samples are L540+, about 1/3.  At step 5, 11 of 36 samples are L540+.  At step 6, 3 of 68 samples are L540+.  L540 outliers are 3 at step 7 and 1 at step 8.  At step less than 5 there are no marginal predictions;  at step 5 or more a few of the results are marginal - close to the L540 cutoff at step 8.  The match of Vanek to L540 is better using the 67 vs 111 marker sets, and the statistics are better with many more samples in the database, but of course confidence is lower with 13 instead of 14 markers available for comparison.

            Although there is no close match, it is possible the skeletal remains came from a man who was an STR outlier in L540.  It is also possible he was an outlier from another known branch of V13.  A third possibility:  this man’s Y-DNA data might be the only data available to me from a small branch of V13 not yet discovered.

            There is a variation on that 3rd possibility:  the skeletal remains might be in a yet to be discovered very old node on the segment leading to the L540 samples.  L540 has 23 phyloequivalent SNPs (I discuss this at haplozone).  That means the tree segment that goes from the V13 MRCA to the L540 MRCA is quite long, with 23 SNPs distributed along the length (in time) of that segment.  The Yfull tree estimates that segment as 4300 to 2000 years ago.  We don’t know where L540 lies in that segment;  if L540 is among the younger of those SNPs, then there may well be L540- branch twigs to be discovered along that long segment.  (S3003 seems to be one such twig, with only one PGP89 sample, as mentioned in my haplozone discussion.)

            It would be nice to get an L540 test from these skeletal remains.  If positive the known branches of L540 could also be tested.  Today, individual SNP tests are the same cost and effort as individual STR tests, with primers published by Yseq.

 

Cluster C

            Rewrite 31 Oct 2015.

            Friedman proposed cluster C in 2008, based on STR correlations, when the data was less than what is available today.  Cluster C now seems equivalent to L540.  The cluster C data is still available at the haplozone site but may not be up to date.

 

C Type

            Rewrite 31 Oct 2015.

            I defined C type in Jan 2010 as my version of Cluster C.

            I use C type to predict L540 samples based on STRs, for samples that do not have the L540 SNP test.

            I use the word type for an STR cluster with statistical validity as established by my Mountain Method.  “Type” is my own term.  I chose the word “type” because it is not generally used in genetic genealogy and I wish to distinguish my types from haplogroups and from other clusters.  By “type” I mean the cluster data, the hypothetical clade, the modal haplotype, and the set of all possible haplotypes, at any number of markers.  Accordingly, by “C type” I mean any or all of these 4 things.  I sometimes use just “C” as short for “C type”.  I also have a previous C type identified in R1a;  unrelated;  please don’t get confused.  I published my methods in the Fall 2009 issue of JoGG.

            My analysis files define C type.  Sorry, it can be a bit confusing because I have multiple STR definitions for C type, for various marker sets.  The number of markers in my definitions change slightly when new samples show up with unusual STR values.  I hope the meanings are clear from the context of my discussions in this web document.  See the discussion below the Neighborhood Table for links to my definitions, with links to my Excel analysis files.

            Click on seems equivalent for an explanation that STR types (such as C type) cannot be exactly equal to equivalent SNP haplogroups (such as L540), due to STR outliers.

 

V13C

            Rewrite 31 Oct 2015.

            I coined the name V13C in 2010 to represent C type, cluster C, the hypothetical haplogroup, and the samples (men) in the hypothetical haplogroup.

            I also used V13C to mean samples that match C type from the database of samples at E-M35 or at Haplozone, or at other databases.

            This web document used to be named V13C.html.

            Now that C type seems equivalent to L540 I am editing away most of my mentions of the name “V13C”, but I’ll continue to use “C type” for the hypothetical clade based on STRs.

 

L Type

            Edit 27 Dec 2015.

            I proposed L type on this web page in mid 2011, based on only 2 samples, which means not very high statistical confidence.  L type (also called L540 type) was a type that included C type plus those 2 samples, which did not fit C type at that time.

            I no longer consider the distinction between C type and L type useful.  One of those two samples (Gebert) tested positive for the Z29042 branch, which means it is just a statistical STR outlier. The other (Fredeen) has not been SNP tested for L540 branches, so I don’t know if that one is also an outlier, or if it truly belongs to a much older node in the L540 tree.

            I now use the 2013 L type definition for C type;  see C45 for more discussion.

            I edited this web page to remove most mentions of L type.

 

111 Markers

            Rewrite 5 Dec 2015.

            FTDNA provides STR markers in various sets.  The largest, a set of 111, was introduced in 2011.  Upgrades can be purchased for samples with fewer markers.  Obviously, matches and predictions are more accurate using more markers.  Until 2014, I had been recommending the 111 set to L540 members, hoping to discover STR correlations good enough to divide the L540 haplogroup into clusters with high confidence.  Today, SNPs are more important than STRs.  This is because the cost of discovering new SNPs has come down a lot.  SNPs define haplogroup divisions;  STRs only provide statistical predictions for haplogroups.

            Still, the set of 111 markers is the most accurate way to find out which samples in the large on-line STR databases are your best matches, and statistically most likely to form a recent branch (I would call it a twig) in the tree of your male line ancestry, and a prediction of their order (older vs younger nodes).  As an example of the value of 111 STRs, I discovered DYS445=11 as an unusual mutation in my own Y, shared by my 3rd cousin, and also shared by Kargul, adding evidence that we form a twig in the L540 tree, perhaps restricted to Poland, perhaps only a few centuries old.  DYS445 is not available at less than 111 markers in FTDNA standard sets.  The rest of L540 samples have the value DYS445=10.  The value 11 does show up rarely elsewhere in V13, as an independent mutation, so although DYS445 is very slowly mutating it is not as slow as a typical SNP, so not as statistically reliable as an SNP.

            New clusters can still be discovered with STRs, as predictions for new haplogroups, which still need confirmation by discovery of a corresponding SNP.  However, STR analysis is yielding diminishing returns for the effort.  SNP discovery is now accelerating instead.

            Summary:  111 STR markers are valuable if you are very interested in genetic genealogy, and if cost is not a big issue for you.  If cost is an issue, and if you are merely curious about your Y-DNA, as a first test I recommend the 37 marker STR set (topic after next).

            For my 111 marker analysis of L540, see my discussion of C75(111) below the Neighborhood table.

 

67 Markers

            Rewrite 5 Dec 2015.

            FTDNA provides a 67 marker standard set of STR markers.  I have been using this 67 set for analysis for more than 8 years.  Although the 111 set is more accurate, this 67 set is valuable for analysis because there are a lot more samples on-line at 67, and all samples with 111 are included.

            For my 67 marker analysis of L540, see my discussion of C54(67) below the Neighborhood table.

 

37 Markers

            Rewrite 5 Dec 2015.

            FTDNA no longer offers the 25 and 12 STR marker standard sets.  The 37 marker set is sufficient as a first test if you are curious to see in which Y-DNA main branch haplogroup you belong.  With 37 markers, FTDNA will automatically place you in one of the main large haplogroup branches of the Y-DNA tree.  For the smaller branches of the tree, there are SNP tests.  For L540 candidates, I have a separate discussion topic about this:  Dividing L540.

            Most of the more rapidly mutating STRs are in the 37 marker set, so the 37 marker set is good to search for your best matches to other men with a male line common ancestor in the last millennium or so.  FTDNA provides you with matches to other men with similar STR haplotypes.  All samples with 67 or 111 are included because they have these 37 plus more.

            For my 37 marker analysis of L540, see my discussion of C30(37) below the Neighborhood table.

 

25 Markers

            Rewrite 5 Dec 2015.

            FTDNA provides the older STR sets, using 12 and 25, as special orders by project administrators, but for the price difference the 37 set makes more sense.

            For my 25 marker analysis of L540, see my discussion of C12(25) below the Neighborhood table.

 

12 Markers

            Rewrite 5 Dec 2015.

            The standard 12 STR markers are among the slower mutating STRs, so this set can be used for prediction of the oldest  haplogroups, including V13.  There are still lots of data on-line with only 12 markers.  This 12 set is not reliable for L540.  With the modal haplotype, C12, one confirmed L540+ sample (Sabieka) is at step 5 (last column in the Neighborhood Table) and three samples are at step 4, so in the future a sample may show up even at step 6.  I have found no confirmed L540- samples yet at steps 0 or 1, but there are some that I predict L540-, so I used blue, for low confidence, at the bottom of the Neighborhood table for samples at steps 0 and 1.  At step 2 my confidence for each sample is only about 10%, based on more than 50 samples, and confidence decreases above step 2.  See also http://www.gwozdz.org/C12.xls for more details.

            Actually, the known L540+ samples are just as valuable as C12, because any samples that match an L540+ sample using all 12 markers are candidates for L540 SNP testing.  As an extreme example, Sabieka (kit 226416) has no matches in the E-M35 database at the first 12 markers step 0, no matches at step 1, and only 6 samples at step 2.  His haplotype is rare.  So any future sample with a 12 marker haplotype that differs from Sabieka by less than step 2 is an L540 candidate, albeit with low confidence.

            Exceptions:  The V13 modal haplotype differs from L540 by step 4 using the 12 markers.  Many branches of V13 have the same modal haplotype at 12 markers, for example L241 and L143.  It is no surprise that the V13 modal haplotype has 125 samples in the E-M35 database at 12 markers.  Obviously, the confidence for predicting any one of these to be L540+ is extremely low, and in fact there are none yet.   The V13 vs L540 modals differ at 3 markers (one is step 2).  So there are 3 haplotypes that differ by step 1 from V13, toward L540, meaning they are step 3 from L540, also with many samples in the database, and also extremely low confidence.  There are 3 other haplotypes that differ by step 1 from L540, toward V13;  these have only 8 samples at 12 markers (Nov 2015), with one confirmed L540+ and two others predicted L540 based on more markers, but confidence for these 3 haplotypes is of course lower than the other step 1 haplotypes (using 12 markers) that differ in haplospace directions away from V13.
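            The counting argument above can be checked with a short Python sketch.  The marker names mA, mB, mC and their repeat values are hypothetical;  only the pattern (three differing markers, one of them by two repeats) comes from the text.  The step is assumed here to be the sum of absolute repeat differences, which may not match every detail of the analysis files.

```python
def step(h1, h2):
    """Mutation count between two haplotypes: sum of absolute repeat differences."""
    return sum(abs(h1[m] - h2[m]) for m in h1)


def one_step_toward(start, target):
    """All haplotypes exactly one repeat away from `start`, moving toward `target`."""
    result = []
    for marker in start:
        diff = target[marker] - start[marker]
        if diff != 0:
            moved = dict(start)
            moved[marker] += 1 if diff > 0 else -1
            result.append(moved)
    return result


# Hypothetical values for the three markers that differ; the other nine
# markers of the 12 marker set are taken to be identical and are omitted.
v13_modal = {"mA": 24, "mB": 30, "mC": 10}
l540_modal = {"mA": 25, "mB": 32, "mC": 11}

print(step(v13_modal, l540_modal))                  # 4
toward_l540 = one_step_toward(v13_modal, l540_modal)
print(len(toward_l540))                             # 3
print([step(h, l540_modal) for h in toward_l540])   # [3, 3, 3]
```

            Calling one_step_toward(l540_modal, v13_modal) gives the 3 haplotypes at step 1 from L540 in the direction of V13.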

 

Best STR Markers

            Update 17 Dec 2014:

            STR markers that mutate relatively slowly are statistical indicators for clades in which they are recently mutated, but they are not perfect because of subsequent independent mutations.  When a clade has a few such good STR markers those provide a signature set of STR markers.  A signature is statistically expected to be a more probable indicator of a clade than just one marker.  Indeed cluster C is characterized by the Friedman Signature.  My definitions of C type and L540 use other helpful markers, not just the signature.

            My analysis files automatically rank markers, as useful for a definition, using a method that I published.  You can view my ranking in those xls files linked below the Neighborhood Table.  See row 11 of the Calculator sheet, and row 17 of the TypeRank sheet.  The exact ranking of markers varies slightly from month to month due to the random nature of mutation values in new samples, and due to the somewhat arbitrary cutoff that I use to restrict the database to the neighborhood (using too many samples provides a ranking of the father clade instead of the clade of interest).  For example a marker that ranks 6th one month might come out 4th or 5th or 7th or 8th the next month.

            An SNP that defines a haplogroup is very unlikely to have happened exactly at the time of the most recent common ancestor (TMRCA) of a haplogroup.  Most likely the SNP is somewhat older, because usually there are many generations between nodes.  By definition an SNP cannot be younger than the TMRCA.  Similarly, we can consider a hypothetical clade defined by a particular STR mutation, which is likely somewhat older than the TMRCA of that clade.  However, for clusters defined by signatures, and for types defined by definitions, one rare STR mutation that contributes to the signature might have happened shortly before or after the TMRCA of that cluster or type.

            Very slow mutators should make the best markers.  However the slowest are rarely mutated, so those with intermediate mutation rate show up more often as signature markers.  My Type.xls master file has the Chandler STR mutation rates, in the ASD sheet, row 5.  The ASD sheet is not usually included in my analysis files.

            Usually it is silly to speculate about clusters defined by a single STR value.  In this case, however, we have a hypothetical haplogroup, C type, which seems quite young, with relatively little STR variation, so some speculation is in order:

 

DYS389II = 32  (389II minus 389I = 19);  Best Marker for Cluster C

            Rewrite 27 Dec 2015.

            DYS389II=32 is best of the original Friedman markers for cluster C.  It remains a good marker for C type and L540.

            [Technical detail:  DYS389 is a compound marker, where 389I is the first STR chain and (389II minus 389I) is the second STR chain.  For cluster C the first chain is 389-1 = 389I = 13.  The second chain is 389-2 = 19.  389II = 13 + 19 = 32.  The marker of interest here is really 389-2 = 19 (389II minus 389I = 19).  However, 389I mutates more slowly and has the value 13 for all but one C sample so far and for almost all samples in the L540 neighborhood.  At Ysearch or Haplozone, both 389 markers need to be used together;  if one is omitted both are ignored.  My analysis files allow the 389-2 chain to be used alone in analysis, using 389-I only to calculate the difference.  However, I use both 389 values (or neither in some cases) in my published definitions to be compatible with other web sites.  In this discussion topic, by “32” I really mean 19 for the delta value.]
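            A minimal Python sketch of the compound-marker arithmetic described above.  The function name and the dictionary layout are assumptions for illustration, not taken from the analysis files.

```python
def dys389_chains(haplotype):
    """Split the reported DYS389 pair into its two STR chains.

    `haplotype` maps marker names to repeat counts as reported by the lab:
    "DYS389I" is the first chain, "DYS389II" is the sum of both chains.
    """
    first = haplotype["DYS389I"]
    second = haplotype["DYS389II"] - first  # the delta value, 389-2
    return first, second


# Cluster C values quoted in the text: 389I = 13, 389II = 32.
cluster_c = {"DYS389I": 13, "DYS389II": 32}
print(dys389_chains(cluster_c))  # (13, 19)
```

            Working with the second chain alone avoids counting a 389I mutation twice, since any change in 389I also changes the reported 389II.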

            All STR marker sets by all DNA companies include the 389 pair (I have not noticed any exceptions).

            Only two L540+ samples, Fredeen and Gebert, have the ancestral value 30.  Butman, the closest STR match with L540-, also has 30.  Only a few samples in the branches of L540 have the value 31, which is not common in the neighborhood.  On this basis, it seems likely that the mutations from 30 to 31 to 32 happened before the TMRCA for L540, and later mutations back from 32 to 31 and 30 happened in some but not most L540 male lines.  (We cannot rule out a rare double size mutation incident, from 30 to 32, or a double mutation back to 30.)

            The 32 value is rare throughout V13 but shows up in E-M35 branches outside V13.

            DYS389II (actually the delta value 389-2) ranks 43rd in Chandler mutation rates.  Near the middle.  So exceptions are expected, due to recent mutations.

 

DYS594 = 12;  Best Marker for L540 at 67 Markers

            Rewrite 27 Dec 2015.

            In my analysis, DYS594=12 is the best marker for L540 (and C type) using the 67 marker set.  594 is not in the 37 marker set.

            All L540+ samples with 67 or more markers, including 2 that are not C type, have the 594=12 value.  Butman, the closest STR match not predicted L540, indeed tested L540-, and has the ancestral 11.

            All C type samples (predicted L540), even those not tested yet for L540, have the 12 value.

            A few samples in the STR neighborhood have 594=12 but are L540-.  These are not a random sample;  I recruited two of them for the L540 test to find out if all 594=12 in the neighborhood are L540;  no, not all.

            The 594=12 value is more common in the L540 neighborhood than in the rest of the V13 data.  So I was wondering if 594=12 is an old mutation in the S3003 branch.  So I tested one of those two L540- samples with 594=12;  it came out S3003-, so it is clearly an independent mutation.  Also, considering the L241 haplogroup, some of those samples are in the neighborhood, but they have 594=11 except one sample that has the value 12, so that is also independent.

            DYS594 ranks 12th from the slowest in the 67 Chandler mutation rates.  Quite slow, so independent recent mutations should be rare.

 

DYS636 = 12;  DYS504 = 14;  DYS561 = 17

Excellent Signature Markers for L540;  Available Only in the 111 Set

            Rewrite 27 Dec 2015.

            These three are not in the FTDNA 67 STR marker set, but are available in the extended 111 STR marker set.  They are each about as good as DYS594=12, previous topic.  There are other markers almost as good in the 111 set.  That’s why C75(111), my 111 marker definition for C type, works very well.

            Using C75(111) to analyze these 4 best markers (including 594, previous topic):

            12 samples C type at 111 markers are all confirmed L540+.  (My cousin and I - both Gwozdz - do not show L540+ at the E-M35 SNP web page because my WTY discovery of L540 does not show at that page.)

            10 samples are statistically independent because I recruited my cousin and Kargul.

            That 111 marker analysis file includes the 101 nearest STR neighbors at 111.  113 samples total.

            DYS594 = 12:  All 10 L540 samples have 594=12.  Only one other sample has the 12 value, not a near neighbor.  One has 10;  all the rest have 11.  See the previous topic for discussion of 67 marker DYS594 neighbors.

            DYS636 = 12:  All 12 have 636=12.  Only one other sample has the 12 value, not a near neighbor.  All the rest have 11.

            DYS504 = 14:  9 of the 12 have 504=14.  Of the other three, two are not really exceptions, because they have 504=15, representing an additional mutation.  Glasser is the only exception, with 13.  14 other samples have the 14 value, not near neighbors.  All the rest have 13.

            DYS561 = 17:  11 of the 12 have 561=17;  one has 16.  4 other samples have the 17 value, not near neighbors.  The rest are 16 except for several 15.

            Kargul is that sole exception, with 561=16.  As discussed in the Kargul topic below, Kargul is obviously a male line relative of mine from the past few centuries, so this exception seems to be an independent mutation back to the ancestral value.

            Summary:  10 of the 12 L540+ samples at 111 markers match on all 4 of these signature markers.

            Butman is the closest STR neighbor at 111 (C type at 67 and 37).  Butman is confirmed L540-.  Butman has the ancestral values for all 4 of these.

 

Signature C4

            Rewrite 29 Dec 2015.

            An excellent signature for C type, using the 67 standard marker set, is (389I, 389II, 594, 444) = (13, 32, 12, 13).  But it’s not perfect.  12 of the 19 L540+ samples with 67 markers have this signature at step 0, 5 samples are at step 1, and Gebert and Fredeen are outliers at steps 2 and 3.  Butman is the only L540- sample at step 0.  This is using all on-line data that I rounded up in Nov 2015.

            There are better markers than 389I.  I included that one because it enables C4 in on-line searches, which disregard 389II if used alone.
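            A minimal Python sketch of how a step from the C4 signature could be counted.  It assumes the step is the sum of absolute repeat differences over the four signature markers, compared as reported (raw 389II);  the function name and the sample haplotype are hypothetical.

```python
# Signature C4 as given in the text.
C4 = {"DYS389I": 13, "DYS389II": 32, "DYS594": 12, "DYS444": 13}


def signature_step(sample, signature):
    """Sum of absolute repeat differences over the signature markers only."""
    return sum(abs(sample[marker] - value) for marker, value in signature.items())


# Hypothetical sample: one repeat short at DYS389II, otherwise a match.
# Markers outside the signature (here DYS390) are ignored.
sample = {"DYS389I": 13, "DYS389II": 31, "DYS594": 12, "DYS444": 13, "DYS390": 25}
print(signature_step(sample, C4))  # 1
```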

 

Friedman Signature

            Rewrite 29 Dec 2015.

            The signature is (390, 389-2, 447) = (25, 32, 25).

            Friedman had been calling this the “characteristic marker values” for cluster C at the Haplozone site before I started working on this, back in 2008, when there were only 9 samples available in cluster C, including mine.

 

            This original Friedman signature works surprisingly well by itself for samples with only 25 of the standard markers, but not with high confidence.  For more details, see the discussion about C3(25) below the Neighborhood Table.

            In early 2011 Friedman added 594=12 to the “characteristic marker values”, for 67 marker samples.  See also the discussion below the Neighborhood Table.

            DYS389 is a compound marker, discussed above.

            Friedman used a more complicated analysis than just this simple signature in her C type assignments.  I do not know her method exactly, but most definitions (not all) that I tried, selecting well ranked markers, extracted the same samples that she did.

 

L540 Neighborhood

            Update 13 Dec 2015.  Edit 27 Dec 2015.  One more sample added 25 Apr 2016.

            L540 is small enough that I can insert a complete table here, including neighbors just beyond in STR values.  These are the samples known to me that might be L540 members, and near neighbors, based on STR prediction.

            Those numbers are STR step, which is mutation count from that column’s Modal Haplotype, as explained in the notes below the table.

            + vs --- means confirmed positive L540+ vs negative L540-, violet vs green. Confirmed by an SNP test, or a relative or very close STR match to a confirmed L540+ sample.

            L241 means positive for another haplogroup, implying negative for L540.

            There are many more negative L540 results from outside this neighborhood (higher step).

            Red step numbers are C type, which seems equivalent to L540, so these are predicted L540 with more than 70% confidence.

            Red boldface step numbers are predicted L540 with more than 90% confidence.

            Blue step numbers are borderline, might be L540, less than 70% confidence.

            Pink step numbers fit C type, but a better modal haplotype is available;  these pink numbers provide calibration of these lesser modals, for use with other samples.

            For my recommended DNA tests for samples in this table see the topic Determining Your L540 Twig.

            Data sources:  e = E-M35 project, h = Haplozone,  y = Ysearch

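            Edit 17 Dec 2015:  A few ancestor names corrected;  some had been showing the name of the administrator for the sample.

|  |  |  |  |  |  |  |  | Modal> | C75 (111) | C111 (111) | C54 (67) | C67 (67) | C4 (67) | C30 (37) | C37 (37) | C12 (25) | C25 (25) | C3 (25) | C12 (12) |
|  |  |  |  |  | L540 | Terminal |  | Cutoff > | 8 | 17 | 8 | 13 | 2 | 5 | 9 | 2 | 3 | 2 | 1 |
| Kit | Ysearch | L540 | Ancestor | Origin | Tree | Test | Data | Markers |  |  |  |  |  |  |  |  |  |  |  |
| N45041 | UQR4B | + | Hochreutter | Germany | A779 | Big Y | ehy | 111 | 1 | 8 | 1 | 5 | 0 | 0 | 6 | 0 | 3 | 0 | 1 |
| 51282 | A9FVE | + | Weiand | Germany |  | FTDNA | eh | 111 | 2 | 14 | 5 | 11 | 0 | 5 | 8 | 1 | 4 | 0 | 1 |
| N16800 | KFKGM | + | Gwozdz | Poland | A9035 | Big Y | ehy | 111 | 2 | 10 | 4 | 8 | 0 | 3 | 6 | 1 | 4 | 1 | 2 |
| 171456 | 79QF7 | + | Glasser | Germany | Y7026* | Big Y | ehy | 111 | 2 | 8 | 0 | 2 | 0 | 0 | 2 | 0 | 1 | 0 | 0 |
| 175213 | 5XP46 | + | Stelz | Germany | A783* | Yseq | ey | 111 | 2 | 11 | 2 | 6 | 0 | 3 | 4 | 1 | 2 | 0 | 0 |
| 155155 |  | + | Svercl | Czech | Y12393* | Big Y | eh | 111 | 2 | 14 | 1 | 7 | 0 | 1 | 4 | 0 | 2 | 0 | 2 |
| 140927 | 9JM9U | + | Hartsfield | Prussia | Z39377 | Big Y | ehy | 111 | 2 | 10 | 4 | 5 | 1 | 2 | 2 | 1 | 1 | 1 | 1 |
| N81304 |  | + | Gwozdz | Poland | A9035 | Relative | eh | 111 | 3 | 12 | 5 | 10 | 0 | 4 | 8 | 1 | 5 | 1 | 3 |
| 225596 | 6S4J6 | + | Nowak | Poland | A6295* | Big Y | ehy | 111 | 5 | 11 | 3 | 8 | 1 | 2 | 4 | 0 | 0 | 0 | 0 |
| 480087 |  | + | Marschner | Czech | Y7026* | BY | e | 111 | 5 | 17 | 2 | 11 |  |  |  |  |  |  |  |
| 199446 | TK98K | + | Kargul | Poland | A9035 | Yseq | ehy | 111 | 6 | 11 | 4 | 7 | 1 | 3 | 5 | 1 | 4 | 1 | 2 |
| 166692 | 8FTXT | + | Gebert | Germany | Z29042* | Big Y | ehy | 111 | 7 | 16 | 6 | 9 | 2 | 3 | 5 | 2 | 4 | 2 | 3 |
| 162917 |  | + | Fredeen | Sweden |  | FTDNA | eh | 111 | 7 | 22 | 7 | 17 | 3 | 6 | 12 | 3 | 6 | 2 | 4 |
| N91348 |  | --- | Butman | England |  | FTDNA | e | 111 | 15 | 21 | 6 | 12 | 4 | 2 | 7 | 2 | 2 | 2 | 2 |
| 417237 |  | Z17264 | Simutkin | Russia |  | Big Y | e | 111 | 17 | 30 | 12 | 20 | 3 | 9 | 17 | 4 | 11 | 3 | 5 |
| 61348 |  |  | Ramsey | England |  |  | e | 111 | 17 | 37 | 13 | 24 | 3 | 11 | 20 | 4 | 10 | 2 | 4 |
| 5960 | V93B3 | Z17264 | Bartlett | England |  | Big Y | ehy | 111 | 18 | 28 | 9 | 17 | 3 | 5 | 12 | 3 | 6 | 2 | 4 |
| 98212 |  | L241 | Baber | England |  | FTDNA | e | 111 | 18 | 34 | 13 | 23 | 4 | 11 | 16 | 5 | 9 | 4 | 5 |
| 295031 |  |  | Takhir | Russia |  |  | e | 111 | 18 | 36 | 13 | 26 | 4 | 9 | 15 | 5 | 8 | 5 | 5 |
| N39989 | 5N5MF | --- | Hohnloser | Germany |  | FTDNA | ehy | 111 | 18 | 29 | 13 | 19 | 4 | 7 | 10 | 2 | 3 | 2 | 3 |
|  |  |  | 5 samples |  |  |  | e | 111 | 19 |  |  |  |  |  |  |  |  |  |  |
|  |  |  | 11 more |  |  |  | e | 111 | 20 |  |  |  |  |  |  |  |  |  |  |
|  |  |  | Z17264 Modal |  |  |  | e | 111 | 13 | 24 | 9 | 15 |  |  |  |  |  |  |  |
|  |  |  | V13 Modal |  |  |  | e | 111 | 17 | 22 | 12 | 17 | 6 | 8 | 12 | 4 | 7 | 4 | 4 |
|  |  |  | L241 Modal |  |  |  | e | 111 | 23 | 34 | 16 | 25 | 5 | 13 | 17 | 4 | 8 | 4 | 5 |
|  |  |  | L143 Modal |  |  |  | e | 111 | 25 | 35 | 14 | 21 | 5 | 11 | 15 | 4 | 7 | 4 | 4 |
| 320415 |  | + | Micek | Slovakia |  | FTDNA | e | 67 |  |  | 1 | 3 | 0 | 2 | 3 | 0 | 2 | 0 | 1 |
| 200924 |  | + | Ratuszni | Hungary |  | FTDNA | e | 67 |  |  | 1 | 3 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |
| 229581 |  |  | Zinin | Unknown |  |  | eh | 67 |  |  | 1 | 5 | 1 | 2 | 4 | 1 | 2 | 1 | 2 |
| 262750 |  | + | Svercel | Slovakia | Y12393* | Relative | eh | 67 |  |  | 2 | 7 | 0 | 1 | 3 | 0 | 1 | 0 | 1 |
| 243901 | FSQXZ |  | Stubblefield | Unknown |  |  | ehy | 67 |  |  | 2 | 11 | 0 | 5 | 10 | 1 | 6 | 0 | 2 |
| E10751 |  |  | Schulz | Germany |  |  | 1 | 67 |  |  | 3 | 7 | 2 | 5 | 7 | 2 | 5 | 2 | 4 |
|  | PFKX4 |  | Georgi | Germany |  |  | y | 67 |  |  | 3 | 13 |  |  | 11 | 1 | 5 |  |  |
| 6104 | 4HJ3D |  | Boyd | Unknown |  |  | ehy | 67 |  |  | 4 | 9 | 0 | 3 | 8 | 0 | 1 | 0 | 0 |
| 207878 |  |  | Frind | Germany |  |  | eh | 67 |  |  | 4 | 9 | 0 | 3 | 6 | 1 | 4 | 0 | 2 |
| B3807 |  | + | Stavbom | Sweden | Y7026* | Yseq | eh | 67 |  |  | 5 | 12 | 0 | 4 | 9 | 1 | 5 | 0 | 4 |
| B2670 | X2JH9 | + | Blind | Germany | Y7026* | Yseq | ehy | 67 |  |  | 5 | 10 | 1 | 4 | 8 | 2 | 5 | 2 | 2 |
| 174240 |  |  |  | Unknown |  |  | 1 | 67 |  |  | 6 | 3 | 1 |  | 2 | 1 | 1 | 1 | 1 |
|  | WHFQB |  | Froetscher | Germany |  |  | y | 67 |  |  | 6 | 14 | 1 | 4 | 12 | 1 | 4 | 1 | 2 |
| 70482 | 6HMRD | + | Simonsson | Sweden |  | FTDNA | ehy | 67 |  |  | 7 | 11 | 1 | 4 | 7 | 1 | 2 | 1 | 1 |
| 226416 |  | + | Sabieka | Belarus |  | FTDNA | eh | 67 |  |  | 7 | 12 | 1 | 6 | 11 | 3 | 7 | 2 | 5 |
| 290459 |  |  | Peck | Unknown |  |  | ehy | 67 |  |  | 7 | 16 | 2 | 6 | 11 | 3 | 6 | 1 | 1 |
|  | 59BSP |  | Unknown | Unknown |  |  | y | 67 |  |  | 7 | 16 |  |  | 11 | 3 | 6 |  |  |
| 54711 |  |  | Eilhauer | Germany |  | FTDNA | G | 67 |  |  | 7 | 9 |  |  |  |  |  |  |  |
| 44601 |  |  | Harcus | Scotland |  | FTDNA | eh | 67 |  |  | 9 | 19 |  | 7 | 14 | 5 | 9 | 2 | 4 |
| 70079 |  |  | Skapyak | Austria |  |  | ey | 67 |  |  | 10 | 16 |  | 6 | 11 | 4 | 8 | 3 | 4 |
| 75569 |  |  | McDonald | Unknown |  |  | ey | 67 |  |  | 10 | 22 |  | 9 | 17 | 6 | 9 | 3 | 3 |
| 152742 |  |  | Acevedo | Unknown |  |  | e | 67 |  |  | 10 | 23 |  | 7 | 18 | 4 | 9 | 3 | 5 |
| E7459 |  |  | Vilanueva | Philippines |  |  | ey | 67 |  |  | 10 | 18 |  | 9 | 15 | 5 | 9 | 4 | 6 |
|  |  |  | 5 more |  |  | FTDNA | eh | 67 |  |  | 10 |  |  |  |  |  |  |  |  |
|  |  |  | 3 more |  |  |  | y | 67 |  |  | 10 |  |  |  |  |  |  |  |  |
| 275510 | 3K5CF | + | Roider | Germany | Z39377 | Big Y | ey | 37 |  |  |  |  |  | 0 | 8 | 0 | 4 | 0 | 1 |
| N109412 | BYHHR |  | Howe | Unknown |  |  | ehy | 37 |  |  |  |  |  | 1 | 4 | 0 | 2 | 0 | 0 |
| 350864 |  | + | Appell | Germany | Y7026* | FTDNA | e | 37 |  |  |  |  |  | 1 | 3 | 1 | 2 | 1 | 2 |
| 317302 | 9P4Z5 |  | Sager | Germany |  |  | ey | 37 |  |  |  |  |  | 1 | 4 | 1 | 1 | 1 | 1 |
| 158091 | QHU8Y | + | Kline | Germany | Y7026* | FTDNA | ehy | 37 |  |  |  |  |  | 2 | 4 | 1 | 2 | 1 | 2 |
| 268215 |  | + | Kovalev | Russia | L540* | Big Y | e | 37 |  |  |  |  |  | 2 | 8 | 0 | 3 | 0 | 2 |
| 284871 |  |  | Knotz | Austria |  |  | e | 37 |  |  |  |  |  | 2 | 4 | 1 | 2 | 1 | 0 |
| 434037 |  |  | Giegold | Germany |  |  | e | 37 |  |  |  |  |  | 3 | 5 | 1 | 1 | 1 | 1 |
| 426965 | EJB8R |  | Symns | Germany |  |  | ey | 37 |  |  |  |  |  | 4 | 7 | 2 | 3 | 1 | 2 |
| 141863 | W5JHS |  | Pohl | Germany |  |  | ehy | 37 |  |  |  |  |  | 5 | 7 | 1 | 3 | 1 | 3 |
| 374375 |  | + | Hoff | Norway | L540* | Yseq | e | 37 |  |  |  |  |  | 5 | 9 | 2 | 5 | 2 | 4 |
| 42790 |  |  | Brenneman | Switzerland |  |  | e | 37 |  |  |  |  |  | 5 | 12 | 2 | 6 | 2 | 2 |
| 294225 | XVN9H |  | Belinskiy | Russia |  |  | e | 37 |  |  |  |  |  | 5 | 9 | 0 | 2 | 0 | 1 |
| 122332 |  |  | Preece | England |  |  | e | 37 |  |  |  |  |  | 6 | 12 | 2 | 7 | 2 | 4 |
| 65296 |  |  | Garig | Germany |  |  | e | 37 |  |  |  |  |  | 7 | 11 | 6 | 11 | 4 | 9 |
| 338942 |  |  | Altmeier | Germany |  |  | e | 37 |  |  |  |  |  | 7 | 11 | 3 | 6 | 3 | 3 |
|  |  |  | ~ 6 more |  |  |  | e | 37 |  |  |  |  |  | 8 |  |  |  |  |  |
|  | Q8JRJ |  | Spooner | USA |  |  | y | 37 |  |  |  |  |  | 0 | 3 | 0 | 1 | 0 | 0 |
|  | 2N3UM |  | Oppitz | Germany |  |  | y | 37 |  |  |  |  |  | 4 | 6 | 1 | 4 | 1 | 2 |
|  | EDS4E |  | Haenicke | Germany |  |  | y | 37 |  |  |  |  |  | 3 | 4 | 1 | 2 | 1 | 1 |
|  | V6X4V |  | Fitze | Germany |  |  | y | 37 |  |  |  |  |  | 2 | 6 | 0 | 1 | 0 | 0 |
|  | 3K4Y2 |  | Lintner | Germany |  |  | y | 37 |  |  |  |  |  | 2 | 7 | 0 | 4 | 0 | 1 |
|  | 4Q933 |  | Kephart | USA |  |  | y | 37 |  |  |  |  |  | 3 | 6 | 2 | 3 | 2 | 2 |
|  | YN5M6 |  | Bend | Unknown |  |  | y | 37 |  |  |  |  |  | 3 | 5 | 1 | 3 |  |  |
|  | 9RCZR |  | Stenborg | Sweden |  |  | y | 37 |  |  |  |  |  | 3 | 8 | 1 |  |  |  |
|  | WME5S |  | Cervenka | Hungary |  |  | y | 37 |  |  |  |  |  | 4 | 10 | 1 | 6 | 0 | 2 |
|  | K48RR |  | Mowers | Canada |  |  | y | 37 |  |  |  |  |  | 5 | 9 | 2 |  |  |  |
|  | J266G |  | Wysocki | Poland |  |  | y | 37 |  |  |  |  |  | 5 | 11 | 2 |  |  |  |
|  |  |  | More |  |  |  | y | 37 |  |  |  |  |  | 6 |  |  |  |  |  |
| S10193 |  |  | Engel | Germany |  |  | h | 34 |  |  |  |  |  |  |  | 0 | 1 | 0 | 1 |
| S10194 |  |  | Kochtitizky | Hungary |  |  | h | 34 |  |  |  |  |  |  |  | 0 | 3 | 0 | 1 |
| A2983 |  |  | Undisclosed | Austria |  |  | h | 34 |  |  |  |  |  |  |  | 1 | 4 | 1 | 1 |
| S10231 |  |  | Karozewski | Austria |  |  | h | 34 |  |  |  |  |  |  |  | 1 |  |  |  |
|  |  |  | More |  |  |  | h | 34 |  |  |  |  |  |  |  | 2 |  |  |  |
|  | PNP4W |  | East | USA |  |  | y | 25 |  |  |  |  |  |  |  | 1 | 4 |  |  |
|  |  |  | Several more |  |  |  | ehy | 25 |  |  |  |  |  |  |  | 2 |  |  |  |
|  |  |  | Several more |  |  |  | ehy | 25 |  |  |  |  |  |  |  | 3 |  |  |  |
| 285764 |  | + | Stavbom | Sweden | Y7026* | Relative | eh | 12 |  |  |  |  |  |  |  |  |  |  | 4 |
| N26163 | R38X2 |  | Fritsch | Czech |  |  | ehy | 12 |  |  |  |  |  |  |  |  |  |  | 0 |
| N39377 |  |  | Obendorf | Germany |  |  | eh | 12 |  |  |  |  |  |  |  |  |  |  | 0 |
| N57225 | XKCE3 |  | Livingston | Germany |  |  | ehy | 12 |  |  |  |  |  |  |  |  |  |  | 0 |
|  | Ysearch |  | 6 more |  |  |  | y | 12 |  |  |  |  |  |  |  |  |  |  | 0 |
|  |  |  | Many more |  |  |  | ehy | 12 |  |  |  |  |  |  |  |  |  |  | 1 |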

 

Explanation of the modal haplotype columns in the table:

 

            C111 is the modal haplotype for L540 (and for C type) using the full 111 standard STR marker set.  With the cutoff at step 17, it fails to capture one L540+ sample, Fredeen.  C67 is the modal haplotype using the 67 standard STR set;  similarly for C37, C25, and C12.

 

            111 STR Marker data updated 5 Nov 2015.  Edit 4 Dec 2015.

            C75(111) is my modal haplotype definition for prediction of C type, using 75 of the 111 standard STR markers.  The cutoff is 8;  notice that there are no samples in the gap at steps 9 through 14.  All L540+ samples are captured by this definition, and no L540- samples are captured.  Because of that large step 7 gap, it seems improbable (although slightly possible) that future L540 outliers might be missed by this definition, or that future L540- samples might be captured.  My analysis file http://www.gwozdz.org/C111Type.xls is available if you are interested in the details.  For example, that file shows that any number of markers from 23 to 75 (columns DX to EA of the “Calculator” sheet) could be used for the definition and the gap with no samples would still be step 7;  I use 75 STR markers, the largest choice.
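            The gap can be located mechanically.  The Python sketch below is only an illustration, not the method used in the xls analysis files.  The function name is hypothetical;  the step values are the C75 column of the Neighborhood table for the individually listed samples with 111 markers.

```python
def widest_gap(steps):
    """Return (first_empty, last_empty) for the widest run of unused step values,
    or None if the observed step values have no holes."""
    observed = sorted(set(steps))
    best = None
    for low, high in zip(observed, observed[1:]):
        if high - low > 1:
            gap = (low + 1, high - 1)
            if best is None or gap[1] - gap[0] > best[1] - best[0]:
                best = gap
    return best


# C75(111) step values from the Neighborhood table (111 marker samples).
c75_steps = [1, 2, 2, 2, 2, 2, 2, 3, 5, 5, 6, 7, 7, 15, 17, 17, 18, 18, 18, 18]
print(widest_gap(c75_steps))  # (8, 14)
```

            With these table values the empty run covers 7 step values, from the cutoff at step 8 through step 14.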

            Near Neighbors.  The table includes a few samples with 111 markers beyond the C75 cutoff, for comparison.  These help to calibrate the other modals with fewer markers.

            For more discussion, see the topic 111 Markers.

            My Type.xls master file has instruction sheets explaining how my xls analysis files work.

 

            67 STR Marker data updated 17 Nov 2015.

            C54(67) is my modal haplotype definition for prediction of C type, using 54 of the 67 set of standard STR markers.  The cutoff is 8.  Notice the minimum at the gap, no sample at step 8, and only two at step 9 (one of the 9’s has 111 markers and is not L540).  C54 captures one sample that is not L540, Butman at step 6, but that sample has 111 markers, does not fit the C75(111) definition, and has tested L540-, so Butman is an STR outlier from another haplogroup.  C54 captures all the known L540+ samples at less than step 8, but this is a bit misleading.  In the past, my C type definitions at 67 markers have occasionally failed to capture new L540+ outliers;  my definition method generally captures outliers, so my new definitions are slightly different when making use of new outlier data.  So it is likely outliers will show up in the future from C54 step 8 or 9 or 10, at which time I’ll tweak my definition again.  It is even slightly possible that a sample at C54 step 11 might someday come out L540+, but the probability for each individual sample at step 11 is surely less than 10%.  In other words, prediction of L540 using C type is uncertain near the cutoff value of step 8.  Accordingly, in the table above for predictions at 67 markers, I used blue color for steps 7 through 10, where step 7 is relatively more confident and step 10 is relatively less confident.  I used red below step 7 indicating higher confidence of prediction, and boldface below step 4 for very high confidence.
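            The color bands just described can be written as a small lookup.  This Python sketch only restates the step ranges from the paragraph above;  the function name and the label for steps above 10 are assumptions.

```python
def c54_band(step):
    """Map a C54(67) step value to the color band used in the Neighborhood table."""
    if step < 4:
        return "red boldface"  # very high confidence
    if step < 7:
        return "red"           # higher confidence
    if step <= 10:
        return "blue"          # borderline, confidence falls from step 7 to 10
    return "not predicted"     # assumed label; the text gives no color above 10


for s in (1, 5, 8, 11):
    print(s, c54_band(s))
```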

            67 STR marker summary:  C54 can be used to predict L540 quite well, with uncertainty near the cutoff.  111 markers work much better, because there are 3 additional excellent signature markers and a number of other helpful markers in the 111 set that are not available at 67.

            My analysis file http://www.gwozdz.org/C67Type.xls is available if you are interested in the details.  In that analysis file I show how several other C type definitions work almost as well as C54;  various definitions using from 4 to 67 markers differ only by a few samples near the cutoff.  That analysis file has a sheet “Haplotypes and Masks” with C54(67) and also with my previous definitions.  It also has sheets with C type data from Ysearch and from Haplozone.  C54 is also available at Ysearch with the ID QAZ7P.

            For more discussion, see the topic 67 Markers.

            C4(67) is the signature used by Haplozone cluster C since before L540 was discovered:  (390, Δ389, 447, 594) = (25, 19, 25, 12).

 

            37 STR Marker data updated 28 Nov 2015.

            C30(37) is my modal haplotype best fit for predicting L540 samples using 30 of the 37 set of standard STR markers.  The cutoff is step 5.  There is no gap, so the cluster does not form a type;  prediction is not very specific using only 37 markers.  From the calibration (C30 step values for samples with 111 and 67 markers), it seems C30 predicts C type with relatively high confidence for steps less than 3, so I colored those red in the table.  I used blue for steps 3 through 8 to indicate progressively lower confidence for higher step values.  The probability is low for each sample at steps higher than 5, but there surely will be a few outliers showing up with C30 step greater than 5 in the future;  indeed Fredeen and Sabieka are L540+ and they have step 6.

            My analysis file http://www.gwozdz.org/C37.xls is available if you are interested in the details.

            I have another file, http://www.gwozdz.org/C37Matrix.xls, with a matrix of step values, showing step between samples at 37 markers.  The samples with nearest neighbors tested L540+ are more likely to also be L540.  This file also shows that most samples at steps 5 through 8 are probably not L540.
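            A matrix of this kind can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not the layout of C37Matrix.xls:  the step is taken as the sum of absolute repeat differences, and the sample names, marker names, and values are hypothetical.

```python
from itertools import combinations


def step(h1, h2, markers):
    """Sum of absolute repeat differences over the chosen markers."""
    return sum(abs(h1[m] - h2[m]) for m in markers)


def step_matrix(samples, markers):
    """Symmetric matrix (dict of dicts) of pairwise steps between samples."""
    names = list(samples)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        matrix[a][b] = matrix[b][a] = step(samples[a], samples[b], markers)
    return matrix


def nearest_neighbor(matrix, name):
    """Name of the sample with the smallest step to `name`."""
    others = {b: d for b, d in matrix[name].items() if b != name}
    return min(others, key=others.get)


# Hypothetical three-marker haplotypes, for illustration only.
samples = {
    "sample_1": {"mA": 13, "mB": 30, "mC": 10},
    "sample_2": {"mA": 13, "mB": 31, "mC": 10},
    "sample_3": {"mA": 15, "mB": 30, "mC": 12},
}
markers = ["mA", "mB", "mC"]
matrix = step_matrix(samples, markers)
print(matrix["sample_1"]["sample_3"])        # 4
print(nearest_neighbor(matrix, "sample_3"))  # sample_1
```

            If the nearest neighbor of an untested sample has tested L540+, that sample is a better candidate for SNP testing than its step from the modal alone would suggest.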

            For more discussion, see the topic 37 Markers.

 

            25 STR Marker data updated 1 Dec 2015.

            C12(25) is my modal haplotype best fit for predicting L540 samples using 12 of the 25 set of standard STR markers.  The cutoff is step 2.  There is no gap, so the cluster does not form a type;  prediction is not very specific using only 25 markers.  From the calibration (C12 step values for samples with 111, 67, and 37 markers), it seems C12 predicts C type with some confidence for steps 0 through 3, so I colored those blue in the table.  I did not use red for step zero, because there a