L540 - A Small Y-DNA Haplogroup

9 Oct 2018

Peter Gwozdz

pete2g2@comcast.net

Notice

            3 Aug 2018:  FTDNA changed the rules for DNA data, requiring that DNA data must be removed from all files whenever a person changes their privacy settings to restrict web posting.  This web page has links to my 6 STR analysis “xls” files.  It would be too much trouble for me to change all 6 files every time a person changes settings, so I removed these files from the web.  The links to these files are still here, but the links do not work.  I’ll be rewriting this page to remove all mention of my STR analysis;  it may take me a few months to finish a full rewrite.  This does not matter very much because these days SNPs are much more important than STRs, and SNPs do not require the statistical analysis.

            The few individuals named in this web page requested to be mentioned;  they may request to be removed at any time.

 

News

            9 Oct 2018:  Eliasson added;  new branch Y153993 Eliasson & Rohss

            3 Aug 2018:  Rohss added to the L540 tree

            15 Jun 2018:  Gush added to the L540 tree

            16 Apr 2018:  Hartsfield 80059 added to the L540 tree

            11 Apr 2018:  Kusyi added to the L540 tree

 

Abstract

            Edit 16 Aug 2018.

            This web document is a summary of my information about a small haplogroup of Y-DNA based on an SNP mutation named L540.  The subject is genetic genealogy.

            The L540 Tree shows the samples that have been tested for the branches of L540.  Branch membership cannot be predicted with confidence from STRs, so L540 samples need further SNP testing to determine their branch.

            The L540 haplogroup seems to be roughly 1,950 years old, with an origin perhaps in what is now Germany.

            This web document is written for people reasonably familiar with the jargon of genetic genealogy.  If you are new to genetic genealogy you might prefer to first read an Introduction that I wrote for another of my web documents.

            My References and Sources are listed at the bottom.

 

L540 in Y-DNA Tree

            New Topic 27 Dec 2015.  Edit 3 Jan 2016.

            Rough outline of the human Y-DNA tree with ISOGG and SNP code names, showing location of L540:

       E (M96)

              E1b1b1 (M35.1)

                      E1b1b1a1 (M78)

                             E1b1b1a1b1a (V13)

                               E1b1b1a1b1a6 (L540)

            V13 is the largest haplogroup division of haplogroup E, but L540 is relatively small.

            Link to an up-to-date tree with more V13 & L540 detail:

http://www.yfull.com/tree/E-V13/

            Steve Fix is now heading a project to discover new branches of V13, using Big Y data.  New branches are showing up almost monthly.  His V13 tree is at:

https://docs.google.com/spreadsheets/d/1D9WaPOZn_0l5GKtqXR0PbbxEir2o1t0e2HO-SJ2GboI/edit?pli=1#gid=2080530255

            Discussion can be found at:

http://community.haplozone.net/index.php?topic=3657.msg36427#new

            The FTDNA tree does not use L540, placing the samples in haplogroup S3003, a branch of V13.  L540 is the main branch of S3003;  L540 has all but one of the known S3003 samples.

 

L540

            Rewrite 27 Dec 2015.  Edit 16 Aug 2018.

            L540 is the code name for an SNP that was discovered in my WTY.  L540 was announced 29 March 2011.  On 27 Apr 2011 I demonstrated that L540 defines a new haplogroup branch of V13.

            I use the code name L540 for the SNP, for the associated haplogroup, and for the samples (men) in that haplogroup.

            This haplogroup was predicted as cluster C based on STR correlations in 2008.  When I originated this web page in early 2010, I coined the name V13C, renaming it L540 on 30 Apr 2011.  Cluster C, also called C type, is the STR equivalent of L540.

            Update of statistics on 27 Dec 2015:  There are 25 samples that have tested L540+.  In addition, there are 18 C type samples that have not taken the SNP test;  I predict about 15 of them would test L540+.  Finally, there are several marginal samples (STRs close to the C type cutoff);  I suppose another 5 or so of those would test L540+.  That’s roughly 45 known members of L540.

 

L540 Tree

            9 Oct 2018 branch Y153993 added

            Edit 16 Aug 2018.

            2 Aug 2018 Rohss added;  A9034 branch added

            15 Jun 2018 Gush added

            29 Apr 2018 edit of the format codes

            16 Apr 2018  Hartsfield 80059 added;  a few edits to the tree for clarification

            11 Apr 2018 Kusyi added

            10 Jan 2018 Wieand added

            2 Dec 2017 Simonsson added

            19 Nov 2017 Belov 321021 added

            6 Oct 2017 Helgesson 682512 added

            22 Sep 2017 Eliasson 458953 added

 

            L540 Tree is below, in conventional outline format.  Click on a link in this tree for more discussion about that SNP or sample (male ancestor family name).

 

Format for each individual line data:

Ancestor:  Country.  FTDNA code number, BY = Big Y;  Yfull code;  Yseq code

 

            References:  FTDNA, Big Y, Yfull, Yseq

            The BY SNP number is the SNP used for that sample in the tree at FTDNA.  In many (not all) cases, that is a “private” SNP, unique to that individual in the Big Y database, meaning that mutation happened relatively recently in that male family line.

            -- means no Big Y data;  placed in the tree by testing individual SNPs, or very close STR match to another sample

 

V13

                        V13 has about 80 phyloequivalent SNPs

     There are other V13 branches

                        Z5018 and Z5016 are by far the largest branches of V13

     S3003              about 6 phyloequivalent SNPs

                        PGP89

          L540           about 23 phyloequivalent SNPs

               L540* = A6295-, Y7026-

                        Hoff:  Norway.  374375, BY-S3003;  YF09864;  3203

                        Kovalev:  Russia.  268215, BY5858; YF04818

                        Ponto:  Poland.  557832, BY-S3003;  YF08450

                        Helgesson:  Sweden.  682512, --

                        Belov:  Russia.  321021, --

                        Simonsson:  Sweden.  70482, --

                        Kusyi:  Ukraine.  370518, --

               A6295         1 phyloequivalent SNP

                    A6295* = A9034-

                             Nowak:  Poland.  225596, BY5854; YF03833

                    A9034    1 phyloequivalent SNP

                        Y153993

                             Eliasson:  Sweden.  458953, BY-A9034;  YF15457

                             Rohss:  Germany.  555020, BY-A9034;  YF14486

                        A9035         about 2 phyloequivalent SNPs

                             A9036-

                                  Kargul:  Poland.  199446, --; 4230

                             A9036     about 10 phyloequivalent SNPs

                                  Gwozdz:  Poland.  N16800, BY5850; YF02909;  1433

                                  Gush (Gwozdz):  Poland.  B182584, BY5850; YF13748

               Y7026             about 4 phyloequivalent SNPs

                    Y7026*:  Z29042-, A783-

                        Marschner:  Czech.  480087, BY5890;  YF05931

                        Stavbom:  Sweden.  B3807, BY5862;  YF05186;  2100

                        Glasser:  Germany.  171456, BY5930;  YF04393

                        Appell:  Germany.  350864, --;  3187

                        Blind:  Germany.  B2670, --; 2891

                        Kline:  Germany.  158091, --;  2360

                    Z29042            about 6 phyloequivalent SNPs

                        Z39377-

                             Gebert:  Germany.  166692, BY5193;  YF01811

                        Z39377

                             BY5200

                                  Roider:  Germany.  275510, BY5200;  YF04216

                             BY5185

                                  Hartsfield:  Prussia.  140927, BY5185;  YF04834;  2397

                                  Hartsfield2:  Prussia.  80059, BY5185;  YF13261

                    A783

                        A783+ A1157?

                             Ratuszni:  Hungary.  200924, --

                        A1157

                             Weiand:  Germany.  51282, BY-A783

                             Y12393-

                                  Stelz:  Germany.  175213, BY5900;  YF05757;  2028

                             Y12393

                                  BY5853

                                       Svercl:  Czech.  155155, BY5860;  YF02913

                                  A779      about 10 phyloequivalent SNPs including BY5841

                                       Y33576, A782/BY5856

                                            Hochreutter:  Germany.  N45041, BY5856;  YF02161

                                            Hochreutter:  Germany.  --;  YF09477

                                       BY5909

                                            Hochreutter:  Germany.  131761, BY5853;  YF06285

                                            Hochreutter:  Germany.  --;  YF09478
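
            The entry format described above the tree can be read mechanically.  The following is only a rough sketch, not a tool used for this page:  the names TreeEntry and parse_entry are made up for illustration, and the rule for telling the codes apart (a "BY" prefix for Big Y, a "YF" prefix for Yfull, the first remaining code for FTDNA, any later remaining code for Yseq) is an assumption based on how the lines in the tree look.  An entry with a Yseq code but no FTDNA number would be misread by this rule.

```python
import re
from typing import NamedTuple, Optional


class TreeEntry(NamedTuple):
    """One sample line from the tree (hypothetical structure)."""
    ancestor: str
    country: str
    ftdna: Optional[str]
    big_y: Optional[str]   # None when the line shows "--" (no Big Y data)
    yfull: Optional[str]
    yseq: Optional[str]


def parse_entry(line: str) -> TreeEntry:
    """Parse 'Ancestor:  Country.  FTDNA, BY...;  YF...;  Yseq'."""
    ancestor, rest = line.strip().split(":", 1)
    country, codes = rest.strip().split(".", 1)
    ftdna = big_y = yfull = yseq = None
    # Accept commas, semicolons or spaces as separators, since the
    # punctuation between codes varies from line to line.
    for token in re.split(r"[,;\s]+", codes.strip()):
        if not token or token == "--":
            continue                      # "--" marks missing Big Y data
        if token.startswith("BY"):
            big_y = token
        elif token.startswith("YF"):
            yfull = token
        elif ftdna is None:
            ftdna = token                 # assumption: FTDNA code comes first
        else:
            yseq = token                  # assumption: a later bare code is Yseq
    return TreeEntry(ancestor.strip(), country.strip(),
                     ftdna, big_y, yfull, yseq)


print(parse_entry("Hoff:  Norway.  374375, BY-S3003;  YF09864;  3203"))
print(parse_entry("Kargul:  Poland.  199446, --; 4230"))
```

            Splitting on any mix of commas, semicolons and spaces is a deliberate choice here, so that small punctuation differences between lines do not matter.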

 

V13

            Rewrite 31 Oct 2015.

            For detailed V13 trees, see:

https://docs.google.com/spreadsheets/d/1D9WaPOZn_0l5GKtqXR0PbbxEir2o1t0e2HO-SJ2GboI/edit?pli=1#gid=2080530255

 

http://www.yfull.com/tree/E-V13/

 

http://isogg.org/tree/ISOGG_HapgrpE.html

 

            V13, in the E haplogroup, is a major branch of the human Y-DNA tree.  The L540 branch is a relatively small branch of V13.

            There are about 80 known SNP equivalents to V13.  V13 was the first to be discovered and the one used in most discussions about this haplogroup.  All but a very few V13 samples belong to L142 and CTS5856, so technically L540 is the main branch of the S3003 haplogroup, which is one of many branches of the CTS5856 haplogroup, which is the main branch of the L142 haplogroup, which is the main branch of V13.  For simplicity the L540 Tree above minimizes these details.  I usually just say in this web page that L540 is a branch of V13.  I say that V13 is the father of L540, when technically S3003 is the father and V13 is the great-great grandfather, and even that may change if additional side branches are discovered with very few samples.  I’m ignoring the known branches that have few samples, for simplicity.

            L542 is one of those 80 equivalents.  V13 is sometimes called L542.  L542 was found in my WTY.

 

PGP89

            New Topic 10 Feb 2015.

            PGP89 is a sample from the Personal Genome Project (search Google for details).  PGP89 is S3003+ but L540-, so this sample represents an older node in the branch leading to L540.  So far there are no such S3003+ L540- results in the E-M35 Project.

            This is from Steve Fix, who includes PGP data in his tree.

 

Z29042

            New Topic 14 Jan 2015.  Edit 27 Dec 2015.

            This SNP was discovered by Steve Fix on 10 Jan 2015, from the Big Y data of Roider, compared to Gebert.  These two samples have this SNP, but Hochreiter does not, so Z29042 defined a new haplogroup, the first branch to be found for L540.  Steve assigned the Z series code number.  Actually, there are 6 new SNP locations common to Roider and Gebert, but only Z29042 was assigned a code;  some of the others may be needed in the future.

            I’m a bit surprised.  I expected Roider to fall into a branch with Hochreiter, because they are closest in STRs.  Also, I have been predicting an older node for Gebert, based on his DYS389 value, and his STR values that differ from other L540 samples, more than L540 samples differ from each other.  STR predictions are statistical, because STRs mutate relatively rapidly.  So this is a surprise, but such surprises are expected from time to time when making predictions based on STRs.

 

A783

            Update 10 Feb 2015.

            This SNP was noticed by Steve Fix and me in Hochreiter’s Big Y data, our first L540 Big Y.  Actually, there were 10 new SNPs;  I tested myself for them but came out negative.  Yseq assigned A series code numbers to them.  None of the 10 showed up in the Big Y data for Roider or Gebert.  In Feb 2015 I noticed this one in the Big Y data for Svercl, so it defined a new haplogroup branch for L540, with Hochreiter and Svercl, not me, not Roider, not Gebert.

 

Y12393

            New topic 12 Jul 2016.

            This SNP was newly posted by Yfull in July 2016, as a new branch of A783.

 

A779

            New topic 12 Jul 2016. Edit 31 Jul 2016.  Edit 17 Aug 2016.  Edit 29 Aug 2016.

            Steve Fix suggested the SNP A779 to me on 12 Jul 2016, to distinguish Svercl from two Hochreiter Big Ys.  The two Hochreiters are 7th cousins, with a common ancestor about 300 years ago, so the A779 mutation is at least 300 years old.  The 2nd Hochreiter Big Y provided a split into two branches, defined by A782 and BY5909, although those two will not be listed in Yfull’s tree until a 2nd sample shows up in the same branches.

 

Y7026

            New Topic 8 Feb 2015.  Update 17 Mar 2016.

            This SNP represents the major division of L540, with 11 of the 16 samples in the L540 tree so far.  The Yfull tree estimates Y7026 to be about 2,000 years old, although this is a very rough estimate due to the caveats associated with DNA age estimates.  See the next topic discussion about the “bushy” nature of Y7026.

 

Y7026*

            New Topic 17 Mar 2016.

            We now have 5 samples in the paragroup Y7026*.  Click here for a jump to Y7026* in the tree.

            All 5 have been confirmed with SNP results Y7026+, and Z29042-, A783-, so they do not belong to those two known haplogroup branches of Y7026.  Two of the 5 have Big Y results, and they do not have a common novel SNP, which means they will end up in two different new branches of Y7026, as soon as a future sample in their branch gets a Big Y result with a common novel SNP to define that future haplogroup branch.

            In other words, we know Y7026 has at least 4 branches.  The node associated with Y7026 is the major “bushy” node of the L540 tree.

            The other three samples have not purchased Big Y;  their results are from SNP testing only.  These three may belong to those two future branches.  Or, perhaps one or more of those three may end up in yet another branch of Y7026.
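
            The branch count above can be sketched as a small calculation.  This is only an illustration under stated assumptions:  the function name minimum_branch_count and the SNP labels snp_a, snp_b, snp_c are placeholders (the actual novel SNPs of the two Big Y samples are not listed on this page), and the sketch assumes that two Big Y samples belong to the same future branch only if they share at least one novel SNP.

```python
def minimum_branch_count(known_branches, novel_snps_by_sample):
    """Lower bound on the number of branches under a node.

    known_branches: names of branches already defined by an SNP.
    novel_snps_by_sample: Big Y sample -> set of its novel SNPs
    (samples that are negative for all known branches).
    """
    groups = []  # each group: samples linked by at least one shared novel SNP
    for sample, snps in novel_snps_by_sample.items():
        merged = {"samples": {sample}, "snps": set(snps)}
        remaining = []
        for group in groups:
            if group["snps"] & merged["snps"]:
                # A shared novel SNP puts these samples in one future branch.
                merged["samples"] |= group["samples"]
                merged["snps"] |= group["snps"]
            else:
                remaining.append(group)
        groups = remaining + [merged]
    return len(known_branches) + len(groups)


# Two known branches of Y7026, plus two Big Y samples in Y7026*
# with no novel SNP in common (placeholder SNP labels).
count = minimum_branch_count(
    ["Z29042", "A783"],
    {"sample_1": {"snp_a", "snp_b"}, "sample_2": {"snp_c"}},
)
print(count)  # 4
```

            The result is a lower bound only.  Samples without Big Y data are left out, and an SNP missed by Big Y could later merge two of the groups.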

            A bushy node is evidence that the immediate descendants of the corresponding MRCA participated in a significant population expansion.

            On the other hand, bushy nodes may be just random, not evidence of population expansion, due to the luck of SNP discovery statistics, particularly for the case of Y7026, with only 11 total samples so far.  Big Y does not cover the entire Y chromosome, and Big Y does occasionally randomly miss SNPs, so future testing may show Y7026 to be not so bushy after all, if future novel SNPs combine the Y7026 branches into fewer larger branches.

 

A6295

            New Topic 22 Jul 2015.  Update 17 Mar 2016.

            This SNP and haplogroup was defined 22 Jul 2015, being present in Nowak’s Big Y, and also present in my earlier Gwozdz Big Y.  Kargul tested positive for A6295, making 3 samples so far.  My Gwozdz cousin would no doubt also test positive, but I leave him out of the L540 tree since together we represent one ancestral line.  Actually, I recruited both Kargul and Nowak, based on close STR matches to me, so statistically, the A6295 branch should be considered to be much smaller than the Y7026 branch with 11 independent samples.  Note that the three A6295 samples are the only Poland origin samples in the L540 tree;  I did not recruit on the basis of Poland origin, so we can speculate that A6295 might represent a small Polish branch of L540, although three samples is far too few for any confidence in this regard.

 

A9035

            New Topic 21 Jan 2016.

            This SNP has just been defined 21 Jan 2016.  It is negative in one A6295 sample (Nowak) and positive in the other two (Gwozdz and Kargul), so it represents a haplogroup - a small twig in the Y-DNA tree.  Kargul does not have Big Y data;  Kargul’s FTDNA sample is A6295+;  Kargul’s Yseq sample is A9035+.  I (Gwozdz) ordered SNP tests at Yseq for 4 of my “private” SNPs, A9032, A9033, A9035, and A9036;  Kargul is negative for those other 3, implying that our MRCA node for A9035 is roughly 3/4 as old as our node with Nowak for A6295, although this is a very rough estimate with only 4 SNPs tested.  At the Yfull SNP browser, using the locations for those 4 SNPs from my (Gwozdz) Big Y data, I verified my positive standing for all 4 of these SNPs;  Nowak and all other V13 samples in the V13 Project at Yfull are negative for all 4.
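
            The "roughly 3/4" figure follows from simple counting, sketched below.  The function name relative_node_age is made up for illustration, and the sketch assumes that SNPs accumulate at a roughly constant rate and that the 4 tested SNPs are a fair sample of the private SNPs in my line since the A6295 node.  Of the 4 SNPs tested, Kargul shares 1 (A9035), so that one arose before our common node;  the other 3 arose after it.

```python
def relative_node_age(shared_snps: int, tested_snps: int) -> float:
    """Age of the younger node as a fraction of the older node's age.

    tested_snps: SNPs from one line that all arose after the older node.
    shared_snps: how many of those are also positive in the second sample,
                 so they arose before the younger (common) node.
    Assumes a constant SNP accumulation rate.
    """
    if tested_snps <= 0 or not 0 <= shared_snps <= tested_snps:
        raise ValueError("need tested_snps > 0 and 0 <= shared_snps <= tested_snps")
    return (tested_snps - shared_snps) / tested_snps


# Gwozdz tested A9032, A9033, A9035, A9036; Kargul is positive only for A9035.
print(relative_node_age(shared_snps=1, tested_snps=4))  # 0.75
```

            With only 4 SNPs the statistical uncertainty of this ratio is large, which is why the estimate is described as very rough.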

 

Z39377

            New Topic 22 Nov 2015.

            This SNP has just been defined 22 Nov 2015.  It is present in Hartsfield’s recent Big Y, and is also present in Roider’s Big Y from earlier this year.  So Z39377 defines a new haplogroup, with only those two samples so far.

 

S3003

            Update Feb 2015.

            This SNP is in the L540 branch, but older.  PGP89 is a sample from the Personal Genome Project (search Google for details).  PGP89 is S3003+ but L540-, so this sample represents an older node in the branch leading to L540.  So far there are no such S3003+ L540- results in the E-M35 Project.  Technically, S3003 defines a haplogroup with branches PGP89 and also L540, but for simplicity I just say in this web page that L540 is a branch of V13.

 

Determining Your L540 Twig;  Dividing L540;  Discovering New SNPs

 

            Rewrite 31 Oct 2015.  Edit 3 Nov 2015.  Edit 22 Sep 2017.  Edit 16 Aug 2018.

            I recommend Big Y, next paragraph, if cost is not an issue for you, and if you are enthusiastic about discovering new haplogroups.  Otherwise, consider the less expensive tests per the following paragraphs, to determine your current haplogroup.

            Big Y:  Discovering new SNP haplogroups is part of my genetic genealogy hobby.  I have recently been recruiting L540 members to purchase Big Y in order to discover new SNPs, which provide new haplogroups - terminal twigs on the Y tree, to further subdivide L540.  It’s not cheap:  $575 for Big Y.  Anyone interested in joining this L540 project can order Big Y;  please contact me so I can keep track of the status.  With Big Y, there is no need for individual SNP testing.  In fact, with Big Y, most men immediately discover a “private” new twig of the Y tree unique to their sample (unique so far in the Big Y database).  Many men discover a new twig combining their sample with one or more other private twigs.  It is almost certain that a future Big Y test will match one of the new SNPs in your Big Y data, thereby defining a new small branch, perhaps combining quite a few samples.

            If cost of Big Y is an issue, you can wait.  Save your money.  The price will come down with time.  That means others get the thrill of discovering the series of twigs in your branch of the Y tree, but you get to purchase the corresponding SNPs at the low price for individual SNP tests.  Occasionally FTDNA has special sale pricing for Big Y.

            I encourage testing at FTDNA, and joining the E-M35 Project, because I like the convenience of finding all the data in one place.  There are other companies.  Yseq offers individual SNPs at lower price with faster results.  Currently, individual SNPs cost $39 at FTDNA vs $18 at Yseq.  Click on SNP ordering for detailed instructions.

            If you have already come out L540+ with an SNP test, you can work your way through the L540 tree one SNP test at a time.  For example, starting with the Y7026 test, if you come out Y7026-, then test A6295;  if you come out Y7026+, continue with Z29042 and if negative then A783, etc.  When finished, wait for another new twig to show up in the tree.
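
            The one-SNP-at-a-time procedure can be sketched as a short routine.  This is a sketch only:  the names CHILDREN and next_snp_to_test are made up, the table is a simplified transcription of the L540 Tree above (the deepest twigs and private BY SNPs are left out), and the order of the branches follows the example in the paragraph above (Y7026 first, then A6295).  Whether each SNP can be ordered individually at a given company is not checked here.

```python
# Simplified L540 tree: each SNP maps to its known sub-branches,
# listed in the order in which to test them.
CHILDREN = {
    "L540": ["Y7026", "A6295"],
    "Y7026": ["Z29042", "A783"],
    "Z29042": ["Z39377"],
    "A783": ["A1157"],
    "A1157": ["Y12393"],
    "Y12393": ["A779"],
    "A6295": ["A9034"],
    "A9034": ["Y153993", "A9035"],
    "A9035": ["A9036"],
}


def next_snp_to_test(results, root="L540"):
    """Return (current branch, next SNP to order).

    results maps SNP name -> True (positive) or False (negative).
    The second item is None when every known sub-branch of the current
    branch has been tested negative, or when there are no sub-branches.
    """
    node = root
    while True:
        for child in CHILDREN.get(node, []):
            if child not in results:
                return node, child        # untested sub-branch: test it next
            if results[child]:
                node = child              # positive: move down the tree
                break
        else:
            return node, None             # nothing left to test for now


print(next_snp_to_test({}))
print(next_snp_to_test({"Y7026": False}))
print(next_snp_to_test({"Y7026": True, "Z29042": False}))
```

            A result of the form (branch, None) corresponds to the "wait for another new twig" step:  the sample sits in that branch (for example Y7026*) until a new sub-branch is added to the table.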

            If you are predicted L540 with high confidence based on STRs, you can skip the L540 test, but come back and test L540 and/or S3003 if you test negative for the two branches.

            If you are in that table with a low confidence blue prediction number, and you have not been tested for L540, consider one of the SNP package deals.  SNP packages are offered because DNA testing companies (like FTDNA) provide you with haplogroup predictions for your main Y-DNA branch.  Even if you are not in my table, if your sample is predicted by FTDNA to be in one of the main branches upstream from L540, consider an SNP package deal.  At your FTDNA home page, click on “Haplotree & SNPs”, which jumps to your predicted location in the tree.  A banner ad should be in the tree above your position for the SNP pack recommendation for you.

            FTDNA has an “E-V68 SNP Pack”, with more than 100 SNPs downstream of V68, at $119, to determine which SNPs are yours.  V68 is the “father” of V13, which is the father of L540.  This pack has SNPs for most of the branches of L540.

            FTDNA update 22 Sep 2017:  The following L540 SNPs are available in the V68 pack and also individually:  S3003, L540, Y7026, A783, Z29042, A6295.  If you are sure you are V13, there is a V13 pack with:  S3003, Y7026, A783, Y12393, Z29042, A6295.

            Yseq update 22 Sep 2017:  If you are sure you are V13, Yseq has a “V13 Panel” with:  S3003, L540, Y7026, A783, Y12393, A779, Z29042, Z39377, A6295.  It costs $88, plus $5 for a cheek swab kit if you have not already tested at Yseq.  Link with a description:  http://www.yseq.net/product_info.php?products_id=2486.  That description has a nice V13 tree.  Please let me know if you are L540 and order this panel, so I can keep track of results.

            For more specific discussion, click on L540, A783, Z29042, Y7026, A6295, SNP ordering, and Big Y.

            How about STRs?  In the past, I encouraged upgrading to 111 Markers, the largest set available at FTDNA.  Now that there are plenty of SNPs available with low cost tests, SNPs are better than STRs for finding your closest Y matches.  However, there are plenty of samples without the latest SNP tests, so if you are anxious to find out which of these best match your Y, 111 STR markers are much better than the smaller standard sets.

 

700 Year Old Skeleton - Might be L540

            New Topic 21 Mar 2016.  Minor edit 22 Sep 2017.  Edit 16 Aug 2018.

            Recent Article:  Vanek D. et al. (2015) Complex Analysis of 700-Year-Old Skeletal Remains found in an Unusual Grave - Case Report. Anthropol 2: 138. doi: 10.4172/2332-0915.10000138.

            This Vanek article describes male skeletal remains found in Bohemia.  DNA testing included 14 STR markers from the Y chromosome, placing this man in the V13 haplogroup with high confidence.  (V13 is almost always predicted with confidence using only the basic 12 STR markers.)

            Vanek was brought to my attention by Svercl, who in turn found it mentioned at the Haplozone forum by “Passa”, who points out that the Y-DNA data matches L540 samples.  Passa was directed by “Ignis90” at the Anthrogenica forum.

            I demonstrate in this topic that the Vanek skeletal Y-DNA might belong to the L540 branch of V13, although I figure this prediction to be somewhat less than 50% probable.

            Vanek et al. used 13 of the 14 markers, because DYS635 data was not available to them.

            Passa used the Haplozone database, which does not use 635.  Passa’s list of closest matches really uses only 12 markers, because Haplozone ignores DYS385a when 385b is not present.

            Vanek used Ysearch, apparently before 635 data was widely available.  The Vanek skeletal sample data is available at Ysearch as DE8WN.  Vanek, Table 1, has “X” for all but one of the values at 635 for matching samples.  (That one sample has an X at another missing marker DYS481.)  My sample KFKGM is in Table 1 with X.  Today (18 Mar 2016), many samples, including mine, have 635 values.  The other samples in Table 1 still do not have data at all 14 markers.  Today, a Ysearch using the 13 values without 635 provides many more matches, with Kargul as the closest match, although Kargul is not in Vanek Table 1, probably because that data was taken before I recruited Kargul.

            The Vanek sample DE8WN does not match the L540 samples as well with all 14 markers, because DE8WN has 635 = 23, whereas the L540 Modal has 635 = 21.  Today, a Ysearch using all 14 markers, provides one match at step (also called GD, or genetic distance) 5, and 6 matches at step 6 including mine.  One at step 6, Bohman, has tested L540-;  the others do not have L540 test data available to me.
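
            A step count of this kind can be sketched as follows.  This is a simplified illustration, not the calculation used by Ysearch or in my analysis files:  the function name genetic_distance is made up, each marker is assumed to be a single-copy marker with a whole-number repeat count, and every repeat of difference counts as one step (multi-copy markers such as DYS385a/b need special handling that is not shown).  The two DYS635 values are from the paragraph above;  DYS_X and its values are placeholders.

```python
def genetic_distance(sample, reference):
    """Total step count between two STR haplotypes.

    Each haplotype maps marker name -> repeat count.  Only markers
    present in both are compared; each repeat of difference is one step.
    """
    shared = sample.keys() & reference.keys()
    if not shared:
        raise ValueError("no markers in common")
    return sum(abs(sample[m] - reference[m]) for m in shared)


# DYS635: 23 for DE8WN vs 21 for the L540 modal (values from the text).
# DYS_X is a made-up marker with made-up values, shown matching.
skeleton = {"DYS635": 23, "DYS_X": 13}
l540_modal = {"DYS635": 21, "DYS_X": 13}
print(genetic_distance(skeleton, l540_modal))  # 2
```

            In this simple model the DYS635 difference alone adds 2 steps, which is why the match to L540 looks worse with 14 markers than with the 13 that exclude 635.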

            Seventeen L540 samples are available to me (this paragraph updated 22 Sep 2017) with the 111 STR set, which includes 635;  9 have 635 = 21, one has 20, two have 22;  one has 23.

            Passa’s list of closest matches to the Vanek sample are all at step 3, which is not a good match for L540 using 12 markers.  I know of 2 samples with 67 or 111 markers that are not L540 but differ from L540 by only step 1 at 12 markers;  I know of 35 such samples with step 2 at 12.

            The samples from Vanek Table 1 using 13 markers are all at step 3 or greater from DE8WN.

            There are 542 samples available from E-M35 with 111 markers, including DYS635 (from data download 1 Nov 2015).  I analyzed these using all 14 Vanek markers.  There are no matches at genetic distance step < 5.  There are 5 samples at step 5 from the Vanek skeleton STRs, including Kargul and Hochreutter, who are tested L540+.  The other 3 matches at step 5 are not predicted L540, because the full 111 set with C75 gives step 19, 20, and 23.  Step 8 is the cutoff.  (2017 comment:  I currently use C74, which is only slightly different than C75.)  One of those other 3 not predicted L540 is confirmed L540- because it has a Big Y placement in S2927, another branch of V13.  Another is V13+ without an L540 test.  The last one of those 3 has no SNP data and is conservatively predicted M35 (upstream from V13) by FTDNA, although that last one is probably V13.

            There are 12 samples at step 6;  4 are tested L540+ and listed in the L540 neighborhood;  8 are clearly not L540, with C75 step 21 to 49.  Of those 8, 3 have Big Y results placing them in other branches of V13 (one is V13* belonging to a branch not yet identified).  One is tested V13+ L540-.  Two are tested V13+ without L540 test.  One is tested only M35+.  The last of those 8 is conservatively predicted M35 by FTDNA, again probably V13.

            This analysis at steps 5 and 6 using all 14 markers is my basis for the rough estimate for less than 50% probability that the skeleton is L540+.  2 of 5 at step 5;  4 of 12 at step 6.
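
            The arithmetic behind "less than 50%" is shown below, as a sketch.  The counts are the ones quoted above;  the names counts_by_step and l540_fraction are made up, and pooling the two steps into one fraction is a naive simplification added here for illustration, not a formal probability estimate.

```python
from fractions import Fraction

# (L540+ samples, all samples) at each genetic distance step, from the text.
counts_by_step = {5: (2, 5), 6: (4, 12)}


def l540_fraction(positive, total):
    """Share of the samples at one step that are L540+."""
    return Fraction(positive, total)


fractions_by_step = {
    step: l540_fraction(p, t) for step, (p, t) in counts_by_step.items()
}
# Naive pooling of both steps (an assumption, for illustration only).
pooled = l540_fraction(
    sum(p for p, _ in counts_by_step.values()),
    sum(t for _, t in counts_by_step.values()),
)

for step, frac in fractions_by_step.items():
    print(step, frac, round(float(frac), 2))
print("pooled", pooled, round(float(pooled), 2))
```

            The fractions are 2/5 at step 5 and 1/3 at step 6, or 6/17 (about 0.35) when pooled.  All are below 1/2.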

            At step 7, 4 of 16 samples are L540+.  At step 7, 1 of 20 samples is L540+.  The last L540+, at step 10, is the STR outlier Fredeen.

            Summary:  14 markers are not enough for confident prediction of L540.

            End of analysis using the 111 marker database.

            There are 1728 samples available from E-M35 with 67 markers.  The L540 definition at 67 markers is C54.  I analyzed these using the 13 Vanek markers excluding 635.  There are no matches at genetic distance step < 3.  There are 2 samples at step 3 from the Vanek skeleton STRs, Kargul and Sabieka, who are tested L540+.  At step 4, 5 of 14 samples are L540+, about 1/3.  At step 5, 11 of 36 samples are L540+.  At step 6, 3 of 68 samples are L540+.  L540 outliers are 3 at step 7 and 1 at step 8.  At step less than 5 there are no marginal predictions;  at step 5 or more a few of the results are marginal - close to the L540 cutoff at step 8.  The match of Vanek to L540 is better using the 67 vs 111 marker sets, and the statistics are better with a lot more samples in the database, but of course confidence is lower with 13 instead of 14 markers available for comparison.

            Although there is no close match, it is possible the skeletal remains came from a man who was an STR outlier in L540.  It is also possible he was an outlier from another known branch of V13.  A third possibility:  this man’s Y-DNA data might be the only data available to me from a small branch of V13 not yet discovered.

            There is a variation on that 3rd possibility:  the skeletal remains might be in a yet to be discovered very old node on the segment leading to the L540 samples.  L540 has 23 phyloequivalent SNPs.  I discuss this at haplozone.  That means the tree segment that goes from the V13 MRCA to the L540 MRCA is quite long, with 23 SNPs distributed along the length (in time) of that segment.  The Yfull tree estimates that segment as 4300 to 2000 years ago.  We don’t know where L540 lies in that segment;  if L540 is among the younger of those SNPs, then there may well be L540- branch twigs to be discovered along that long segment.  (S3003 seems to be one such twig, with only one PGP89 sample, as mentioned in my haplozone discussion.)

            It would be nice to get an L540 test from these skeletal remains.  If positive the known branches of L540 could also be tested.  Today, individual SNP tests are the same cost and effort as individual STR tests, with primers published by Yseq.

 

Cluster C

            Rewrite 31 Oct 2015.

            Friedman proposed cluster C in 2008, based on STR correlations, when the data was less than what is available today.  Cluster C now seems equivalent to L540.  The cluster C data is still available at the haplozone site but may not be up to date.

 

C Type

            Rewrite 31 Oct 2015.  Edit 16 Aug 2018.

            I defined C type in Jan 2010 as my version of Cluster C.

            I use C type to predict L540 samples based on STRs, for samples that do not have the L540 SNP test.

            I use the word type for an STR cluster with statistical validity as established by my Mountain Method.  “Type” is my own term.  I chose the word “type” because it is not generally used in genetic genealogy and I wish to distinguish my types from haplogroups and from other clusters.  By “type” I mean the cluster data, the hypothetical clade, the modal haplotype, and the set of all possible haplotypes, at any number of markers.  Accordingly, by “C type” I mean any or all of these 4 things.  I sometimes use just “C” as short for “C type”.  I also have a previous C type identified in R1a;  unrelated;  please don’t get confused.  I published my methods in the Fall 2009 issue of JoGG.

            My analysis files define C type.  Sorry, it can be a bit confusing because I have multiple STR definitions for C type, for various marker sets.  The number of markers in my definitions changes slightly when new samples show up with unusual STR values.  I hope the meanings are clear from the context of my discussions in this web document.

            Click on seems equivalent for an explanation that STR types (such as C type) cannot be exactly equal to equivalent SNP haplogroups (such as L540), due to STR outliers.

 

V13C

            Rewrite 31 Oct 2015.

            I coined the name V13C in 2010 to represent C type, cluster C, the hypothetical haplogroup, and the samples (men) in the hypothetical haplogroup.

            I also used V13C to mean samples that match C type from the database of samples at E-M35 or at Haplozone, or at other databases.

            This web document used to be named V13C.html.

            Now that C type seems equivalent to L540 I edited away most of my mentions of the name “V13C”, but I’ll continue to use “C type” for the predicted clade based on STRs.

 

L Type

            Rewrite 27 Sep 2017.

            I proposed L type on this web page in mid 2011, based on only 2 samples, which means not very high statistical confidence.  L type (also called L540 type) was a type that included C type plus those 2 samples that did not fit C type at that time.

            I no longer consider the distinction between C type and L type useful.  Those two samples, Gebert and Fredeen, both tested positive for L540.  So they are just statistical STR outliers.  Since then, more outliers have shown up;  recently, with lots of 111 marker (next topic) data, I was able to come up with an STR definition of C type to capture all L540+ STR outliers, and not capture any L540- samples.  My C type definitions using fewer than 111 markers are not quite perfect at predicting L540 based on STRs, but they are satisfactory.

            I edited this web page to remove mentions of L type (except this topic).

 

111 Markers

STRs are still Valuable

            Rewrite 6 Oct 2017.  Edit 16 Aug 2018.

            FTDNA provides STR markers in various sets.  The largest, a set of 111, was introduced in 2011.  Upgrades can be purchased for samples with fewer markers.  Obviously, matches and predictions are more accurate using more markers.  Until 2014, I had been recommending the 111 set to L540 members, hoping to discover STR correlations good enough to divide the L540 haplogroup into clusters with high confidence.  Today, SNPs are more important than STRs.  This is because the cost of discovering new SNPs has come down a lot.  SNPs define haplogroup divisions;  STRs only provide statistical predictions for haplogroups.

            Some clusters are still defined by STRs, as predictions for new haplogroups, which need confirmation by discovery of a corresponding SNP.  However, STR analysis is yielding diminishing returns for this effort.  SNP discovery is now accelerating instead.

            Still, the majority of on-line samples have STR data without adequate SNP data.  So Y-STRs still provide you with your best list of on-line close male line matches.

            At your FTDNA home page click on the Y-DNA “Matches” button to see your closest matches using the various STR marker sets.  Many men at FTDNA do not join the various projects;  if someone in L540 does not join the E-M35 Project, I do not get to see his data.  If you are L540 and have a very close STR match, please send him an email message about this L540 web page and about the E-M35 Project.  I still occasionally find new L540 members this way.

            Ysearch has lots of STR data from companies other than FTDNA, most without SNP data.  You can send emails to your closest Ysearch matches.  Ysearch does not support the full 111 set, but standard sets with fewer markers are supported.

            Haplozone is another on-line STR database.

            As a specific example of the value of STRs, I discovered DYS445 = 11 as an unusual mutation in my own Y, shared by my 3rd cousin, and also shared by Kargul, adding evidence that we form a twig in the L540 tree, perhaps restricted to south Poland, perhaps only a few centuries old.  DYS445 is not available at less than 111 markers in FTDNA standard sets.  The rest of L540 samples have the value DYS445 = 10.  The value 11 does show up very rarely elsewhere in V13, as an independent mutation, so although DYS445 is very slowly mutating it is not as slow as a typical SNP, so not as statistically reliable as an SNP.  Later, I discovered A9035 (tested at Yseq), an SNP for only the 3 of us.  A9035 is a twig in the A6295 branch (see the Tree).  In other words, DYS445 = 11 seems equivalent to A9035 today, although exceptions may well show up in the future.

            Summary:  111 STR markers are valuable if you are very interested in genetic genealogy, and if cost is not a big issue for you.  If cost is an issue, and if you are merely curious about your Y-DNA, as a first test I recommend the 37 marker STR set (topic after next).

 

67 Markers

            Rewrite 5 Dec 2015.  Edit 16 Aug 2018.

            FTDNA provides a 67 marker standard set of STR markers.  I have been using this 67 set for analysis for more than 8 years.  Although the 111 set is more accurate, this 67 set is valuable for analysis because there are a lot more samples on-line at 67, and all samples with 111 are included.

 

37 Markers

            Rewrite 5 Dec 2015.  Edit 16 Aug 2018.

            FTDNA no longer offers the 25 and 12 STR marker standard sets.  The 37 marker set is sufficient as a first test if you are curious to see which main branch Y-DNA haplogroup you belong to.  With 37 markers, FTDNA will automatically place you in one of the main large haplogroup branches of the Y-DNA tree.  For the smaller branches of the tree, there are SNP tests.  For L540 candidates, I have a separate discussion topic about this:  Dividing L540.

            Most of the more rapidly mutating STRs are in the 37 marker set, so the 37 marker set is good to search for your best matches to other men with a male line common ancestor in the last millennium or so.  FTDNA provides you with matches to other men with similar STR haplotypes.  All samples with 67 or 111 are included because they have these 37 plus more.  For more discussion see Value of STRs.

 

25 Markers

12 Markers

            Rewrite 11 Oct 2017.  Edit 16 Aug 2018.

            FTDNA provides the older STR sets, using 12 and 25, as special orders by project administrators, but for the price difference the 37 set makes more sense.

            There are still lots of data on-line with only 12 markers - not so many with only 25.  Those samples can still be checked for candidates for L540, but not with very high confidence.

 

Best STR Markers

            Edited 27 Sep 2017.  Edit 16 Aug 2018.

            STR markers that mutate relatively slowly are statistical indicators for clades in which they are recently mutated, but they are not perfect because of subsequent independent mutations.  When a clade has a few such good STR markers those provide a signature set of STR markers.  A signature is statistically expected to be a more probable indicator of a clade than just one marker.  Indeed cluster C is characterized by the Friedman Signature.  My definitions of C type (and thereby L540) use other helpful markers, not just the signature.

            My analysis files automatically rank markers, as useful for a particular definition, using a method that I published.  The exact ranking of markers varies slightly from month to month due to the random nature of mutation values in new samples, and due to the somewhat arbitrary cutoff that I use to restrict the database to the L540 neighborhood.  (Using too many samples provides a ranking of the father clade instead of the clade of interest.)  For example, a marker that ranks 6th one month might come out 4th or 5th or 7th or 8th the next month.

            An SNP that defines a haplogroup is very unlikely to have happened exactly at the time of the most recent common ancestor (TMRCA) of a haplogroup.  Most likely the SNP is somewhat older, because usually there are many generations between nodes.  By definition an SNP cannot be younger than the TMRCA.  Similarly, we can consider a hypothetical clade defined by a particular STR mutation, which is likely somewhat older than the TMRCA of that clade.  However, for clusters defined by signatures, and for types defined by definitions, one rare STR mutation that contributes to the signature might have happened before or after the TMRCA of that cluster or type.

            Very slow mutators should make the best markers.  However, the slowest are rarely mutated, so those with intermediate mutation rate show up more often as signature markers.  My Type.xls master file has the Chandler STR mutation rates, in the ASD sheet, row 5.  The ASD sheet is not usually included in my analysis files.

            Best Dozen STR Markers:  Using my latest (Sep 2017) analysis at 111 markers, here are my rankings of the best STRs for C type and thereby for L540 (DYS numbers):  1&2 (two way tie) - 594=12 & 636=12;  3 - 390=25;  4 - Δ389=19;  5 - 561=17;  6 - 444=13;  7 - 406=11;  8 - 504=14;  9 - 714=24;  10 - CDYa=29;  11 - 447=25;  12 - CDYb=33.

 

ΔDYS389 = 19

Original Marker for Cluster C

            Rewrite 27 Dec 2015.  Edited 27 Sep 2017.  Edit 16 Aug 2018.

            ΔDYS389II = 19 is one of the original Friedman Signature markers for cluster C.  It remains a good marker for C type and L540.

            [Technical detail:  DYS389 is a compound marker, where 389I is the first STR chain and (389II minus 389I) is the second STR chain.  For cluster C the first chain is 389-1 = 389I = 13.  The second chain is 389-2 = 19.  389II = 13 + 19 = 32.  The marker of interest here is really 389-2 = 19 (389II minus 389I = 19).  However, 389I mutates more slowly and has the value 13 for all but one L540 sample so far and for almost all samples in the L540 neighborhood.  At Ysearch or Haplozone, both 389 markers need to be used together;  if one is omitted both are ignored.  I use both 389 values, or neither, in my definitions to be compatible with other web sites.]  My xls files can be easily modified to use Δ389 without 389-1.
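
            The arithmetic of the compound marker can be written out as a minimal Python sketch.  The function name is illustrative only and is not part of my xls files;  the example values are the ones quoted in this topic.

```python
def split_dys389(dys389_i, dys389_ii):
    """Return (first chain, second chain) from the reported 389I and 389II values.

    389II as reported includes 389I, so the second chain is the difference.
    """
    if dys389_ii < dys389_i:
        raise ValueError("389II includes 389I, so it cannot be smaller than 389I")
    return dys389_i, dys389_ii - dys389_i


# Cluster C values quoted in the text: 389I = 13, 389II = 32.
first_chain, second_chain = split_dys389(13, 32)
print(first_chain, second_chain)  # 13 19
```

            The same function applied to 389 = 13, 30 gives a second chain of 17, two steps away from the cluster C value of 19.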

            All STR standard marker sets by all DNA companies include the 389 pair.  (I have not noticed any exceptions.)

            389 = 13, 30 is the modal value for V13, so it seems to be the ancestral value for L540.  389 = 13, 32 is rare in V13 (other than L540), but shows up in E-M35 branches outside V13.

            Only two L540+ samples, Fredeen and Gebert, have the ancestral value 13, 30.  Butman, the closest STR match with L540-, also has 13, 30.  Only a few samples in the branches of L540 have the value 13, 31, which is not common in the neighborhood.  On this basis, it seems likely that the mutations from 13, 30 to 31 to 32 happened before the TMRCA for L540, and later mutations back from 13, 32 to 31 to 30 happened in very few L540 male lines.  (We cannot rule out a rare double size mutation incident, from 30 to 32, or a double mutation back to 30.)

            DYS389II (actually the difference value 389-2) ranks 43rd in Chandler mutation rates, near the middle, so exceptions are expected, due to recent mutations.  DYS389-2 is ranked as the 4th best marker in my analysis of 111 markers.

 

DYS594 = 12;  Best Marker for L540 at 67 Markers

            Rewrite 27 Dec 2015.  Edited 27 Sep 2017.  Edit 16 Aug 2018.

            In my analysis, DYS594 = 12 is the best marker for L540 (and C type) using the 67 marker set.  594 is not in the 37 marker set.

            All L540+ samples with 67 or more markers have the 594 = 12 value.  Butman, the closest STR match not predicted L540, indeed tested L540-, and has the ancestral 11.

            All C type samples (predicted L540), except one marginal sample not yet tested for L540, have the 12 value.

            A few samples in the L540 neighborhood have 594 = 12 but are L540-.  These are not a random sample;  I recruited two of them for the L540 test to find out if all 594 = 12 in the neighborhood are L540;  no, not all.

            The 594 = 12 value is more common in the L540 neighborhood than in the rest of the V13 data.  So I was wondering if 594 = 12 is an old mutation in the S3003 branch.  So I tested one of those two L540- samples with 594 = 12;  it came out S3003-, so it seems to be an independent mutation.  Also, considering the L241 haplogroup, some of those samples are in the neighborhood, but they have 594 = 11 except one sample that has the value 12, so that is also independent.

            DYS594 ranks 12th from the slowest in the 67 Chandler mutation rates.  Quite slow, so independent recent mutations should be rare.

 

DYS636 = 12;  DYS561 = 17;  DYS504 = 14; DYS714 = 24

Excellent Signature Markers for L540

Available in the 111 Set

            Rewrite 27 Sep 2017.

            These 4 are not in the FTDNA 67 STR marker set, but are available in the 111 STR marker set.  636 is just as good as 594 [previous topic];  they are tied as the best two STR markers.  Those other 3 are among the dozen best.  That’s why C74(111), my 111 marker definition for C type, works very well.

 

Friedman Signature

            Rewrite 29 Dec 2015.  Edit 27 Sep 2017.  Edit 16 Aug 2018.

            The signature is (390, 389II, 447) = (25, 32, 25), where 389II = 32 corresponds to Δ389 = 19 with 389I = 13.

            Friedman had been calling this the “characteristic marker values” for cluster C at the Haplozone site before I started working on this, back in 2008, when there were only 9 samples available in cluster C, including mine.

            This original Friedman signature works surprisingly well by itself for samples with only 25 of the standard markers, but not with high confidence.

            In early 2011 Friedman added 594 = 12 to the “characteristic marker values”, for 67 marker samples.

            DYS389 is a compound marker, discussed above.
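
            As a minimal sketch, checking a haplotype against the simple signature could look like the following Python.  This is only the simple signature check, not Friedman’s full method and not my C type definitions.  The dictionary keys, function names and the sample haplotype are hypothetical, chosen for illustration;  the signature is written with 389II = 32.

```python
# Signature values quoted in this topic (389II = 32 is the compound value).
FRIEDMAN_SIGNATURE = {"DYS390": 25, "DYS389II": 32, "DYS447": 25}
# Marker added to the characteristic values in early 2011, for 67 marker samples.
EXTRA_AT_67 = {"DYS594": 12}


def signature_mismatches(haplotype, signature):
    """List the signature markers whose value differs or is missing in the haplotype."""
    return [marker for marker, value in signature.items()
            if haplotype.get(marker) != value]


def matches_signature(haplotype, use_67=False):
    """True when every signature marker has the signature value."""
    signature = dict(FRIEDMAN_SIGNATURE)
    if use_67:
        signature.update(EXTRA_AT_67)
    return not signature_mismatches(haplotype, signature)


# Hypothetical haplotype, not a real sample.
sample = {"DYS390": 25, "DYS389I": 13, "DYS389II": 32, "DYS447": 25, "DYS594": 11}
print(matches_signature(sample))               # True
print(matches_signature(sample, use_67=True))  # False, because 594 = 11
```

            A sample with a missing marker is treated as a mismatch here, which is a design choice of this sketch;  a real analysis would more likely skip markers that were not tested.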

            Friedman used a more complicated analysis than just this simple signature in her C type assignments.  I do not know her method exactly, but most definitions (not all) that I tried, selecting well ranked markers, extracted the same samples that she did.

 

L540 Neighborhood

            16 Aug 2018:  Neighborhood Table removed, due to the new FTDNA rules for on-line data.

            I still use the word Neighborhood to mean samples that seem close to L540 based on STRs but are not predicted L540 with high confidence based on STRs.  Neighborhood samples may have results for the L540 SNP test;  those are used to calibrate my predictions based on STRs.  I also use the word Neighbor to refer to samples that are close STR matches.

Gwozdz

            My sample is kit N16800.  N81304 is my 3rd cousin Gwozdz.

 

Kargul

            Edit 17 Dec 2015.

            Kit 199446, Aloysius Kargol is my closest STR match available on the web (other than my 3rd cousin).  In May 2010, his daughter noticed, on ancestry.com, that he and I are perfect matches at 12 STR markers.  I studied the LDS microfilms and located his 1820’s Kargul ancestor living in a village in Poland only 20 miles away from the village of my Gwozdz ancestor.  I paid for his FTDNA sample.  Kargul is in the table above.  His L540 test came out positive, placing him in that new haplogroup.  We are 5 steps apart at 67 STR markers;  9 at 111.

            For estimating the size of L540 or C type, my cousin and Kargul should not be included, because I recruited them, paying for their tests.  Family sets such as these distort size estimates, when comparing the number of samples per haplogroup or per STR type or cluster.

 

Butman

            New topic 13 May 2011.  Rewrite 22 Dec 2015.  Edit 16 Aug 2018.

            Butman’s L540 SNP test came out negative in 2011.  That means he is not a member of the L540 haplogroup.  Kit N91348.

            This sample is interesting because it is an STR outlier from another haplogroup, coming out closest to C type.  (C type is the STR equivalent of L540.)

            At 67 markers, this sample actually falls within C type;  check the numbers in that table, at the columns for the 67 and 37 marker modal haplotypes.  At 111 markers it falls outside C type, because the 111 marker set has quite a few good signature markers for C type.  Before 2011, at this web page, I listed this sample as at the edge of C type, or predicted L540 with low confidence.  Using only the 37 marker set, Butman’s 5 closest neighbors are C type (Dec 2015).

            This sample recently came out negative for S3003, which is the “father” of L540.  The MRCA node for S3003 is older than the MRCA for L540.  This sample tested V13+ but has not yet been tested for all the recently discovered SNP branches of V13.  Using all 111 STR markers, Butman has no close neighbors;  his closest are Bartlett at step 21, Hohnloser at step 22, and Hochreiter (L540+) at step 23 along with another Bartlett sample and two other samples that are not in the Table above (Dec 2015).

            In the Y-DNA tree, Butman’s node where he branches apart from L540 is surely older than 1,000 years and might even be older than 4,000 years, according to the estimated age of L540.

            What does this mean?  The simplest explanation is that Butman is alone in the E-M35 database, in a very small haplogroup that branches off the branch leading to S3003 and L540 perhaps 2 or 3 millennia ago.  Another possibility:  he may belong to the recently discovered Z17264 haplogroup, since Bartlett belongs to that one (Table above).  Z17264 is a twig in the main branch Z5018 so Butman might have an MRCA older than Z17264, perhaps.  (The test results might come out Z5018+ Z17264-.)  This paragraph is statistical speculation;  Butman might end up in a new branch of V13, negative for all known branches, for all we know.  This paragraph is a good example of the uncertainty of STR based predictions for outliers.  Big Y or SNP tests are needed here.

 

Fredeen

            Rewrite 27 Dec 2015.

            Kit 162917, Fredeen, has been listed at this web page since Mar 2010.  L540+ result May 2011.

            This sample is an STR outlier.  Even with all 111 markers, this Fredeen sample differs a lot from all the other L540 (C type) samples.  The closest neighbor is at step 24;  most L540 samples have closest neighbor at step 14 to 18.  (Samples with the same family name are even closer, of course.)
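
            For readers unfamiliar with “step”, here is a minimal Python sketch of a step count between two haplotypes and of finding the closest neighbor.  It assumes the simplest rule, the sum of absolute repeat differences over the markers both samples have.  FTDNA, Ysearch and my analysis files may treat multi-copy markers such as CDY or the 389 pair differently, so this is an illustration and not the exact method behind the numbers in this topic.  All names and haplotypes below are hypothetical.

```python
def step_distance(hap_a, hap_b):
    """Sum of absolute repeat differences over the markers present in both haplotypes."""
    shared = hap_a.keys() & hap_b.keys()
    if not shared:
        raise ValueError("the two haplotypes have no markers in common")
    return sum(abs(hap_a[marker] - hap_b[marker]) for marker in shared)


def closest_neighbors(target, others):
    """Return (smallest step, sorted names of the samples at that step)."""
    distances = {name: step_distance(target, hap) for name, hap in others.items()}
    best = min(distances.values())
    return best, sorted(name for name, d in distances.items() if d == best)


# Hypothetical four marker haplotypes, far shorter than a real 111 marker set.
target = {"DYS390": 25, "DYS594": 12, "DYS636": 12, "DYS447": 25}
others = {
    "A": {"DYS390": 24, "DYS594": 12, "DYS636": 12, "DYS447": 25},
    "B": {"DYS390": 25, "DYS594": 11, "DYS636": 11, "DYS447": 27},
}
print(closest_neighbors(target, others))  # (1, ['A'])
```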

            The original best L540 signature marker is DYS389 = 13,32;  Fredeen has 13,30, which is the ancestral value (for most Neighborhood samples outside L540).  Fredeen also differs at two other L540 signature markers.

            The simplest explanation is that Fredeen belongs to a branch with a node in the L540 tree that is older than the other nodes.  Perhaps those 3 signature markers mutated to the L540 values after the node leading to Fredeen.

            However, there is an alternate possibility:  Fredeen may belong to one of the currently known branches;  perhaps those 3 signature markers experienced back mutations;  perhaps the Fredeen line has more mutations than normal, due to the luck of mutations.  Read the following topic, Gebert, also an outlier.

            SNP testing is required to determine the branch for this sample.

 

Gebert

            Rewrite 27 Dec 2015.

            I noticed Gebert’s sample on Ysearch and encouraged him to join the E-M35 project, which he did in 2011, kit 166692 in the table.  I helped pay for the orders for the L540 test and for the 111 extension.  He purchased Big Y in 2014.

            Gebert is also an outlier;  read the previous topic, Fredeen, for a brief explanation.  Gebert is not quite as extreme an outlier as Fredeen, with closest neighbor at step 20.  Gebert also has the ancestral DYS389 = 13,30, and also differs at two other signature markers (not the same two as Fredeen).

            In this case, because Gebert purchased Big Y, we know that this sample falls in the Z29042 branch of the L540 tree.  So it is clear that the Gebert line has more than the expected number of STR mutations;  it is just luck that those 3 signature markers mutated back to the ancestral values, because L540 samples both in Z29042 and outside Z29042 have the signature values.  This sample is an example of the limitation of predicting haplogroup based on STR values.

 

Hohnloser

            Rewrite 22 Dec 2015.

            Hohnloser (kit N39989) is another outlier outside L540.  To understand this, please see the topic above for Butman.  Hohnloser is not quite as close to C type as Butman, but otherwise the Butman discussion mostly applies also to Hohnloser.

            Hohnloser has been mentioned here at this web page since 2010.

            Hohnloser also does not belong to the L540 haplogroup because his SNP test came out negative.  He has not been tested for S3003.

            Hohnloser’s nearest neighbors at 111 markers, step 22, are Butman and two other samples not in the Table above.  Hohnloser’s nearest neighbors with haplogroup identification are at the next step, 23, 3 samples, 2 of which are L241+.  However, Hohnloser tested L241-.  L241 is a branch of Z5018, so Hohnloser might fall in one of the other Z5018 branches.

            Jorg Hohnloser has extensive family tree research results.  He administers a Hohnloser project at FTDNA.  He exchanged helpful email discussions with me.

 

Hochreutter

            New topic 12 Dec 2014.  Edit 17 Dec 2015.

            Kit N45041, administered by Andrew Hochreiter, who runs the Hochreiter Project.

 

Ysearch

            Update 4 Dec 2015.

            QAZ7P is a direct link to my definition for C type.

            If you are not listed in the table above you can compare your data on Ysearch.  You can compare your step genetic distance to this definition if you have the standard 67 STR markers.  The comparison may not work if you have a non standard marker set.  For more discussion see the notes below the table above.

            To join Ysearch, click on the Create A New User tab, where you can upload your Y-DNA STR data from a number of testing services.  Or, you can type in your data.  You end up with a “User ID”.

            Brief description of Ysearch.  Link to the site home:  http://www.ysearch.org.

 

            Instructions for comparison to C type at Ysearch:

            Click here:  Research Tools (or click on the tab with that name)

            Copy the following line into the “UserIDs” bar at the Research Tools page:

                                    USEID, QAZ7P

            Change USEID to your User ID.

            You need to type the Captcha puzzle for access.

            Click on “Show genetic distance report” to see your step genetic distance from C type (from L540).

 

Ancestry.com

            Update 27 Dec 2015.  Edit 16 Aug 2018.

            Ancestry.com no longer provides a comprehensive Y-DNA database.  They now concentrate on autosomal DNA (all chromosomes, not just Y).

            Kargul originally matched with me at this site, back in 2010, so I encouraged Kargul to join the E-M35 Project.

            I last checked for matches 16 May 2011, when the Y-DNA database was still active.  There were 9 matches of Y-DNA to Kargul & me, but these were not very close matches.

 

Age of L540

            Rewrite 22 Dec 2015.  Edit 2 Jan 2016.  Update 16 Apr 2018.

            About 1,950 years old.

            The Yfull Tree provides estimates of age for haplogroups, based on the number of accumulated SNPs.  Here is a link to the Yfull L540 tree: 

https://www.yfull.com/tree/E-L540/.  Click on the “info” box for links to details of the Yfull age estimation methodology.

            Compare this to the Fix V13 tree and to my L540 Tree.

            L540 has 27 phyloequivalent SNPs listed by Yfull.  On that basis, Yfull estimates the L540 branch segment to be 1,950 to 4,200 years before present (ybp).  In other words, L540 and 27 other SNPs are distributed along a (4,200 - 1,950) = 2,250 year segment of the Y-DNA tree between two nodes.  The 4,200 ybp is the node where L540 along with several other branches split out from the CTS1273 branch V13.  The 1,950 ybp is the node where the known branches of L540 split out - the time to the most recent common ancestor (TMRCA) for all the L540 samples.

            The L540 haplogroup seems to be roughly 1,950 years old - the TMRCA for the L540 samples at Yfull.

            That 1,950 year age for the L540 haplogroup will likely increase in the future if samples show up with branch nodes older than the currently known branching node of L540.  1,950 years will remain a good estimate for the TMRCA for the two currently known branches, A6295 & Y7026.

            The Fix tree also has additional data not included in the Yfull analysis.  These produce an adjustment in the V13 segment, but there is no effect on the L540 segment.

            The Fix tree also includes S3003, listed by Yfull as one of the L540 phyloequivalents, because the Yfull database does not include that one sample PGP89 that is S3003+ L540-.  This may be confusing, but phyloequivalents are different at Yfull vs Fix because the data is different.  In the Fix tree, S3003 is phyloequivalent to S2999 and S3015;  all three are included in that Yfull list of 27.

            The actual L540 mutation is probably older than the 1,950 year old TMRCA, because we do not know where the L540 mutation sits along that 2,250 year segment (between 4,200 years ago and 1,950 years ago).
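
            The segment arithmetic can be written as a short Python sketch.  The node ages and the count of 27 phyloequivalents are the Yfull numbers quoted in this topic.  The average spacing is only what these numbers imply if the SNPs were spread evenly along the segment, which is an assumption made for illustration;  it is not a mutation rate and not part of the Yfull method.

```python
UPPER_NODE_YBP = 4200   # node where L540 and other branches split out
TMRCA_YBP = 1950        # node where the known L540 branches split out
PHYLOEQUIVALENTS = 27   # SNPs listed by Yfull as phyloequivalent to L540

segment_years = UPPER_NODE_YBP - TMRCA_YBP
snps_on_segment = PHYLOEQUIVALENTS + 1          # L540 itself plus the 27 others
average_spacing = segment_years / snps_on_segment

print(segment_years)           # 2250
print(round(average_spacing))  # 80 years per SNP, if evenly spread (assumption)

# The L540 mutation itself can sit anywhere on the segment.
youngest_possible, oldest_possible = TMRCA_YBP, UPPER_NODE_YBP
print(youngest_possible, oldest_possible)  # 1950 4200
```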

            In previous versions of this L540 web page I used STR mutations to estimate age, but SNPs are preferred now that we have lots of SNP data.

            In my 11 Jul 2011 version, I excluded the two STR outliers Gebert and Fredeen, getting 1,000 years for my STR based estimate for C type without the outliers.  I guessed double that for L540, with high uncertainty because of only two known outliers.  That was roughly 2,000 years for the TMRCA previously with STRs, about the same result as today (Apr 2018) using SNPs.

            Such age estimates are statistically uncertain because of the small sample size.

            The net estimate is even more uncertain because of the caveats associated with DNA age calculations.  There is no way to calculate the effects of such caveats.  I personally think that the 1,500 to 2,500 ybp range is a 75% confidence range.

 

Origin of L540

            Update 27 Dec 2015.  Edit 16 Aug 2018.

            I know of 25 samples with “+” indicating confirmed L540 with an SNP test (Dec 2015).  9 of these indicate “Germany” as the origin of their most distant known ancestor.  That’s 36% Germany.  Actually, that needs adjustment for recruitment:  3 of the 4 “Poland” were recruited by me, where I paid for the L540 test;  1 of the 4 “Sweden” is a repeat family name;  both “Russia” were recruited.  All 9 “Germany” samples are independent as far as I know.  So adjusting the statistics for recruitment, that means 9 out of 19 independent samples come from Germany, which is 47% Germany.
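
            The recruitment adjustment is simple enough to recheck with a few lines of Python.  The counts are the ones in the paragraph above (Dec 2015);  the variable names are illustrative.

```python
total_confirmed = 25   # samples confirmed L540+ by SNP test
germany = 9            # of these, most distant known ancestor in Germany

# Samples that are not independent, per the paragraph above.
not_independent = {
    "Poland, recruited": 3,
    "Sweden, repeat family name": 1,
    "Russia, recruited": 2,
}

independent = total_confirmed - sum(not_independent.values())
raw_share = germany / total_confirmed
adjusted_share = germany / independent

print(independent)              # 19
print(f"{raw_share:.0%}")       # 36%
print(f"{adjusted_share:.0%}")  # 47%
```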

            In addition, there are 12 samples predicted C type at 67 markers (< step 8 in the C54(67) column);  most of these would probably be L540+ if tested.  6 of them indicate “Germany” and 6 are “Unknown” so more than half are probably German.

            That is very good (although not 100% certain) evidence that the MRCA of L540 lived in what is now Germany. 

            It is also possible that most of his descendants migrated to Germany from somewhere else.

            If not Germany, it seems very likely the origin is somewhere in central or eastern or northern Europe.  Even this is not 100% certain, because there is some bias in on-line DNA data toward Europe.  Many parts of the world are not represented well in the database.  I suppose there is a slight chance that someday L540 samples will show up as common elsewhere - for example a group of villages somewhere in the mountains of Russia, or somewhere in the Balkans, or somewhere on the Eurasian Steppe.  Discussing an origin on the basis of so few samples is a bit speculative.  We’ll see how it comes out as more data accumulates.

            How about those 25 SNPs (previous topic) that are phyloequivalent to L540 (22 in the Fix tree)?  Those represent a smooth branch of the tree, with no branches (or none yet discovered).  The smooth branch length in time seems to be about 1,900 years.  The default explanation is statistical;  most Y-DNA branches become extinct;  our L540 MRCA is the lucky individual who won the ancestry lottery.  If it seems to you counter intuitive that most branches die out, check my discussion on extinction.

            It is possible there was a population bottleneck to accelerate the pruning of branches.  There may have been a severe reduction of population in the region of the L540 origin, followed by a population expansion at the time of the ancestor - TMRCA - or soon after.

 

Size of L540

            New topic 29 Dec 2015.

            I estimate there are 1,000,000 L540 males living in the world.  This is my very rough educated guess, explained here in this topic.  This estimate is surely not wrong by a factor of 10;  my 90% confidence range is a factor of 3;  in other words the actual number is very likely between 330,000 and 3,000,000.  My basis:

            There are 25 samples (men) that fit my C54(67) definition.  I have high confidence at 67 markers, so that number 25 is not far off.  My definitions work well at capturing only L540 samples.  From my experience I expect a few outliers to show up out of the samples just missed by the definition.  However, a few of those 25 were recruited, making L540 seem bigger.  On balance, I see no reason to statistically adjust that 25 number up or down.

            That data is from the E-M35 Project, downloaded 1 Nov 2015.  Only 30 samples are clearly not E-M35 in that project, and at 67 or more markers prediction of E-M35 can be done with very high confidence.  There are 1711 samples with 67 (or 111) markers that are clearly E-M35.  That means the percent of L540 in the E-M35 Project is 25 / 1711 = 1.46%.

            Ysearch gives 4.90% as the proportion of their database that belongs to E-M35.  Multiplying 1.46% times 4.90% equals 0.0716%, which is about 1/1,400.

            So it seems about 1 out of 1,400 of the samples in the full database at FTDNA, or at Ysearch, should be L540 samples.  I’m assuming here that M35 men are just as likely to sign up for a project as non-M35 men.  I’m assuming that the full FTDNA database can be represented by Ysearch.  These assumptions may not be exactly correct, but it’s not obvious if I should compensate with a slight increase or slight decrease of that 1,400 number.  I won’t adjust this;  the uncertainty in my last step, next paragraphs, is much more significant:

            I cannot use the full world population of about 7.3 billion, because L540 ancestry is concentrated in Europe.  Also, on-line databases are biased toward developed countries, where individuals can afford the DNA tests.

            I need to make an educated guess for the fraction of world population that contributes to the FTDNA and Ysearch databases, and is representative of the populations where L540 can be found.  Here, I might make a big mistake:  Maybe there is a significant population of L540 men somewhere obscure, where men are unlikely to buy DNA tests, for example in the Balkans, or in North Africa, due to mass migrations during the last 2 millennia;  maybe L540 is larger than it seems.  On the other hand, if L540 ancestry is concentrated in Germany, and if men of German ancestry are over-represented in FTDNA, then maybe L540 is smaller than it seems.  We don’t know.

            My estimate is roughly 1/5 for the “fraction of the world”.  Actually, my estimate is 1.4 billion people as the world population representative of L540 and representative of the men who join FTDNA and Ysearch.  I picked exactly 1.4 billion because dividing by 1,400 gives the nice round 1,000,000 result.  That’s my rough guess.  Obviously, the uncertainty is dominated by this 1.4 billion estimate.
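
            The chain of multiplications in this topic can be rechecked with a short Python sketch.  The inputs are the numbers quoted above (25, 1711, 4.90%, 1.4 billion);  the variable names are illustrative, and the factor of 3 is the guessed range from the first paragraph, not a calculated interval.  The script prints each intermediate value so it can be compared with the figures in the text.

```python
l540_in_project = 25           # samples fitting the C54(67) definition
e_m35_in_project = 1711        # E-M35 samples with 67 or more markers
e_m35_share_of_ysearch = 0.0490
reference_population = 1.4e9   # guessed population represented by the databases
range_factor = 3               # guessed 90% confidence factor

share_within_e_m35 = l540_in_project / e_m35_in_project
share_of_database = share_within_e_m35 * e_m35_share_of_ysearch
one_in = 1 / share_of_database
estimate = reference_population * share_of_database

print(f"L540 share of E-M35:      {share_within_e_m35:.2%}")
print(f"L540 share of database:   {share_of_database:.4%}")
print(f"About 1 sample out of     {one_in:,.0f}")
print(f"Estimated L540 population {estimate:,.0f}")
print(f"Guessed range             {estimate / range_factor:,.0f} to {estimate * range_factor:,.0f}")
```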

            Uncertainty discussion:

            That number of samples, 25, is not a very large number.  The 90% Poisson confidence range is 17 to 35, which contributes about a factor of 1.5 uncertainty.  This is small compared to that “fraction of the world” uncertainty.
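
            The Poisson range can be reproduced with a minimal Python sketch that uses only the standard library.  It assumes the usual exact two-sided interval for a Poisson count, found here by bisection;  the function names are illustrative, and I have not confirmed that this is the same procedure used for the 17 to 35 figure, although it gives the same rounded result.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for a Poisson variable with mean mu."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)          # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i            # P(X = i) from P(X = i - 1)
        total += term
    return total


def solve_increasing(func, lo, hi, iterations=200):
    """Bisection for func(x) = 0 where func rises from negative to positive."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poisson_interval(count, confidence=0.90):
    """Exact two-sided interval for the Poisson mean, given an observed count."""
    tail = (1 - confidence) / 2
    search_limit = 4 * count + 20   # generous upper bound for the search
    # Lower end: mean at which seeing `count` or more has probability `tail`.
    lower = solve_increasing(
        lambda mu: (1 - poisson_cdf(count - 1, mu)) - tail, 0.0, search_limit)
    # Upper end: mean at which seeing `count` or fewer has probability `tail`.
    upper = solve_increasing(
        lambda mu: tail - poisson_cdf(count, mu), 0.0, search_limit)
    return lower, upper


lower, upper = poisson_interval(25, 0.90)
print(f"{lower:.1f} to {upper:.1f}")   # rounds to about 17 to 35
print(f"factor up {upper / 25:.2f}, factor down {25 / lower:.2f}")
```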

            That 4.90% comes from the Ysearch site, at the “Statistics” tab, where the data was figured in 2007, when E-M35 was called E3b.  2007 data may not be perfect, but again any uncertainty in the 4.90% is surely much smaller than my uncertainty in that “fraction of the world”.

            Regarding that “fraction of the world”, it seems to me 1/2 of the world would be far too large a guess, and 1/10th of the world would be far too small a guess.  Roughly 1/5th seems like a good compromise.  There is no way to calculate the uncertainty of that guess.

            My guess for 90% confidence is a factor of 3, net for all uncertainty reasons, as stated in the first paragraph of this topic.

 

Validity of C Type

            Edit 27 Dec 2015.

            Quite frankly, I was originally surprised by cluster C.  Friedman did a good job finding this one.  I admit I dismissed it when I first saw cluster C in 2007 because it was so small that statistical significance did not seem possible to me.  I postponed analysis until Jan 2010, independently verifying cluster C as C type.

            By “valid” I mean a cluster whereby most of the samples belong to a single clade, and whereby very few other samples in the database belong to that clade.  In other words, a valid cluster should eventually have a corresponding SNP discovered.  Throughout 2010 I confidently predicted such an SNP here in this topic, although I doubted it would be discovered soon.  L540 turned up in my WTY (next topic) in 2011.  C type is the STR equivalent of L540.

 

My WTY Analysis

            Edit 27 Dec 2015.

            Fifteen new SNPs were discovered in my “Walk Through the Y” (WTY):  L535 through L547, L614, and L618.  All 15 are available as commercial SNP tests from FTDNA.

            My WTY test read about 200,000 base pairs of the Y chromosome in Feb 2011.  WTY is no longer available, having been replaced by Big Y.

            I announced 8 new SNPs here on 29 Mar 2011.  The count on 30 Mar was 13 new SNPs in my WTY.  L614 was added in June.  L618 was added in August.  That was a lot more than I expected.  I now realize that’s because FTDNA expanded the number of DNA bases included in WTY just before my test.  Also, I seem to have been the first WTY from E-M78 in quite some time.

 

SNP Test Orders

            Edit 22 Sep 2017:

            SNP tests cost $39 each from FTDNA if your sample is already there from previous testing.  From your FTDNA home page, in the Y-DNA section, click on “Haplotree & SNPs”.  Next, below the haplotree, just above your SNP results (bottom of the page), click on “advanced SNP order form”.  (Do not click on “Order Selected SNPs” unless the SNP you wish is available for selection in the tree.)  Next, the box “Test Type” should say “SNP”.  Type the SNP code (for example L540) into the “Find” box to search for it.  Click on “Find” and when the SNP comes up click on “Add” to order it.

            FTDNA has been slow adding new SNP tests.  If FTDNA does not have the SNP for your new haplogroup, try Yseq, where SNPs cost $18.  If you are new to Yseq, they only charge $5 to mail the cheek swab kit, required for your first SNP order.

 

Index of Bookmarks

            If you open this html document with Word, all the link targets (bookmarks) can be viewed alphabetically or by location.

 

References & Sources

            Update 27 Dec 2015.

            Big Y:  https://www.familytreedna.com/learn/y-dna-testing/big-y/.  A commercial product at FTDNA for reading about 12 million base pairs of the DNA of the Y chromosome, which has about 60 million base pairs total.  New SNPs are being discovered in the Big Y data provided by customers.  This Big Y test replaces the smaller Walk Through Y product, no longer offered.  Other companies offer similar tests;  I recommend FTDNA because I like the convenience of most L540 data being available at the E-M35 project, next:

            E-M35, a project at FTDNA, is my main source of data.  Previously called E3b.  Link:  https://www.familytreedna.com/groups/e-3b/about/background.  The official name today would be E1b1b1.  ISOGG changes the name when new defining SNPs are discovered, so the name may change again in the future.  M35.1 is the name of the SNP that defines E1b1b1 within haplogroup E.  I am not planning a separate L540 project, because it is more convenient to run this web page using the E-M35 project.

            Haplozone is a web site for analysis of data from the E-M35 project.  This site has not been fully updated since September 2013, but it is still useful.  Link:  http://www.haplozone.net/e3b/project.  It holds the data from E-M35, plus some data added from sources other than FTDNA, so this database is larger than the E-M35 project.  Page with a listing of proposed clusters:  http://www.haplozone.net/e3b/project/cluster/.  Page with L540 / C cluster samples:  http://www.haplozone.net/e3b/project/cluster/42.  Discussion forum:  http://community.haplozone.net/

            Yseq:  www.Yseq.net.  A company that provides Y-SNP tests at a competitive price and with fast turnaround.

            Yfull:  www.Yfull.com.  A company that provides analysis of raw DNA data, very useful for Big Y data.

            SNP Tracker is a web page added to the E-M35 project in late 2011, to keep track of all the new SNP branches in M35.  http://tinyurl.com/e-m35-snps.  Not up to date.

            The V13 data:  http://www.haplozone.net/e3b/project/cluster/10.  V13 is the defining SNP for E1b1b1a1b1a, a major branch haplogroup in E, and “father” of L540.  That page of data does not have the data for samples that have been assigned to clusters as subdivisions of V13, just the data that does not fit any downstream proposed STR cluster.  The number code for other clusters can be typed over that “10” to quickly get to other cluster data.

            Cluster C Data:  http://www.haplozone.net/e3b/project/cluster/42.

            ISOGG link:  http://isogg.org/tree/  Y-DNA tree SNPs and corresponding alphanumeric codes for the haplogroups.  ISOGG names change as new SNP divisions are discovered.  ISOGG names are getting quite long due to the flood of new SNPs in the past few years.  The V13 branch at ISOGG has not been updated in more than a year.  For these reasons, ISOGG codes are being used less often lately;  for example, V13 is often just called E-V13.

            Steve Fix uses Big Y data to maintain an up to date Y tree for V13.

            Andrew Lancaster was an administrator for the E-M35 (E3b) Project.  Andrew had been particularly patient with me with long helpful email discussions.  Villarreal and Friedman had also been very helpful.

            Victor Villarreal was an administrator for the E-M35 Project.

            Elise Friedman was a co-administrator for the E-M35 Project and is administrator for the Jewish E3b project.

            Denis Savard is a current administrator for the E-M35 Project.

 

            Peter Gwozdz.  That’s me.  pete2g2@comcast.net.

 

Revision History

2010 Jan 14 original draft version

2010  13 updates

2011  28 updates

2012 - 2013  13 updates

2014  15 updates

2015 Jan - Jun  15 updates

2015 Jul - Nov  24 updates

2016 12 updates

2017 Jan 22 another sample added

2017 May 28 minor update

2017 Jun 11 Ratuszni added

2017 Jul 19 minor edit of L540 Tree

2017 Aug 17 - 3 more samples added to Tree

2017 Aug 28 - 1 more sample added to Neighborhood

2017 Sep 14 Big Y SNP numbers added to Tree

2017 Sep 22 - 1 more sample added to Tree;  4 to Neighborhood

            also modified the C75(111) Definition to C74(111)

            update the “C111Type.xls” analysis file

            also update the “111 Markers” topic

2017 Sep 27 update the “C67Type.xls” analysis file

            also update the “67 Markers” topic

            minor update of Neighborhood Table 67 markers columns

            edit of 6 topics mentioning 67 markers

            update of Neighborhood Table 37 markers columns and “C37.xls”

2017 Oct 1 corrections in the Neighborhood C4(67) column

2017 Oct 2 rewrite C30(37) and update “C37.xls”

2017 Oct 6 rewrite the 111 Marker topic with comments about the value of STRs

            add one sample to the tree;  rewrite C12(25) and update “C25.xls”

2017 Oct 11 rewrite 12 marker analysis and update “C12.xls”

            this finishes the Neighborhood STR analysis & column updates

2017 Nov 4 minor edit to Tree

2017 Nov 19 Belov 321021 added to Tree

2017 Dec 2 one sample added to Neighborhood;  one added to Tree

2018 Jan 10 another sample added to Tree

2018 Feb 6 another sample to Neighborhood

2018 Apr 11 Kusyi tested L540*

2018 Apr 16 another sample added to Tree

2018 Jun 15 another sample added to Tree

2018 Aug 3 another sample added to Tree

2018 Aug 3 Notice of removal of xls files from the web

2018 Aug 4 minor correction

2018 Aug 16 Neighborhood Table removed

2018 Oct 9 Tree branch added