A Small Y-DNA Haplogroup:  E1b1b1a1b1a6 (L540)

19 Dec 2014

Peter Gwozdz

pete2g2@comcast.net

News

 

           19 Dec 2014 update:  DNA testing is in progress, in a search for new SNPs to divide L540.  Two Big Y tests on L540 samples have been finished and analyzed;  two more are on order.  We have identified 27 SNPs in the S3003 branch, which includes L540.  L540 seems to be equivalent to 18 of these, so far.  S3003, older than the L540 set, seems to be equivalent to 6 others.  One is of undetermined location.  See the discussion below in the Gebert topic about some SNP testing that did not pan out.

 

Abstract

           Rewrite 14 Dec 2014:

           This web document is a summary of my information on a small haplogroup of Y-DNA based on an SNP mutation named L540.  The subject is genetic genealogy.

           There is a Neighborhood table below with a list of samples (men) predicted to belong to the L540 haplogroup, and also samples predicted to be in the Neighborhood just outside L540.  The samples near the cutoff (borderline fit) are the ones that should be tested for the L540 SNP to see if they belong to the L540 haplogroup;  probability of belonging decreases with the STR step number, as explained in the discussion below that table.

           This Abstract is for people reasonably familiar with the jargon of genetic genealogy.  If you are new to genetic genealogy you might prefer to first read an Introduction that I wrote for another of my web documents.

           L540 was discovered in my Walk Through the Y (My WTY).  I purchased WTY, a commercial product (no longer available - replaced by Big Y).

           The clade that we now call L540 was originally called cluster C, a hypothetical cluster proposed as a division of haplogroup E1b1b1a1b1a, which is defined by the SNP named V13.  Hence I coined the name V13C.html for this document about cluster C in early 2010.  I rewrote this document and renamed it L540.html on 30 Apr 2011.  E1b1b1a1b1a (V13) is the largest haplogroup division of haplogroup E, but L540 is relatively small.

           I am not planning a separate L540 project, because it is more convenient to run this informally through the E-M35 project.  Villarreal and Lancaster have been very helpful.

           My equivalent name for V13C was “C type”, or just “C”.  I independently verified C type on 9 Jan 2010 as a good candidate type.  I use the word “type” to mean an STR cluster with statistical validity as established by my Mountain Method.  I published my methods in the Fall 2009 issue of JoGG.

           Two of the L540 positive samples in the Neighborhood table do not belong to C type, so C type is a predicted branch of L540.  I use “L type” to mean samples predicted to be L540 using STR values.

           C type is quite young, perhaps less than 1,000 years old.  The L540 mutation is older, maybe more than twice the age of C.  Such age estimates are uncertain, particularly with so little data in this case.

           L540 seems to be roughly 90% C type plus 10% other, older branches, as explained below in the C Type topic.

           Watch this document.  I’ll add more information as data accumulates.

 

L540

           Update 14 Dec 2014:

           L540 is the code name for an SNP that was discovered in my WTY.  L540 was announced 29 March 2011.  On 27 Apr 2011 I demonstrated that L540 defines a new haplogroup branch of E1b1b1a1b1a (V13).

           I use the code name L540 for the SNP, for the associated haplogroup, and for the samples (men) in that haplogroup.

           The L540 haplogroup includes C type, and C type is most of L540.  The C type samples that have been tested are all L540+.

           The Neighborhood table below has my predictions for L540 and for C type.

           My sample tested negative in 2011 for all 7 ISOGG confirmed branches of V13, so L540 was an 8th branch of V13, not a branch of one of the 7 prior branches.

           Three of the 7 branches are too small, with few or no samples available on-line for testing:  M35.2, V27, and P65.  In April 2014, V27 and P65 were dropped from the ISOGG tree (both considered “private”), so L540 is now considered the 6th known branch of V13 in the ISOGG tree.

           Samples from the other 4 main branches of V13 were available;  I recruited and paid for testing as needed:  L143, L250, L17, and L241.  All are L540-.  That means none of these are branches of L540.

           ISOGG names change as new SNP divisions are discovered.  L540 was officially added 10 Jul 2011;  the code name is now E1b1b1a1b1a6 (L540).  For example, in early 2011 V13, the father of L540, was called E1b1b1a2, so the original code for L540 was E1b1b1a2h, but the codes for V13 and L540 changed as new, older SNP branch nodes were discovered.

           Steve Fix is now heading a project to discover new branches of V13, using Big Y data.  New branches are showing up more than one per month.  His V13 tree and discussion can be found at:

http://community.haplozone.net/index.php?topic=3657.msg36427#new

           The Fix tree, concentrating on newer branches, is very different from ISOGG’s.  V13 at ISOGG has not been updated in more than a year.  In Fix’s tree, L540 is a branch of a new haplogroup SNP named S3003, which is a branch of V13.

           The FTDNA tree was updated in mid 2014, with a large number of new branches for V13, but these are independent of the ISOGG tree, and these seem to be mostly very small branches (I call them twigs), not main branches of V13.  S3003 (and L540) seems to be a branch independent of all the FTDNA branches.

           I am working with others in an effort to find new SNP branches (twigs) for L540.  Forum topic:

http://community.haplozone.net/index.php?topic=3815.0

           We have identified 27 SNPs in the S3003 branch.  L540 seems to be equivalent to 18 of these.  S3003, older than the L540 set, seems to be equivalent to 6 others.  One is of undetermined location.

           We have Big Y tests on order for two more L540 samples.  Watch this web page and that haplozone topic for updates.

           Neighborhood table:  There are 17 samples confirmed positive with the L540 test;  15 of these are predicted C type and 2 are not.  There are 7 additional samples confidently predicted C type based on STR values;  these need the L540 test for confirmation.  In addition there are 26 more samples predicted C type with lower probability due to few STR markers;  these also need the L540 test.  The table also lists 8 STR neighbors that have tested L540- or L241+ (L241 is a neighbor haplogroup).

           In addition, the table lists many more neighbors for which the probability of L540 or C type is low for each individual sample;  these would benefit from the L540 test.  Some of these would probably come out positive, meaning L540 but not C type, and most would come out L540 negative, meaning STR neighbors.  L540 negative samples in that table should try the S3003 test, which includes L540 and surely other branches not yet defined.

 

Cluster C

           Rewrite 15 Dec 2014:

           Clusters are based on STR correlation.  There are 22 samples predicted C type in the Neighborhood table, labeled “C”, with greater than 80% probability.  An additional 26 samples, most with fewer than 67 markers, are labeled “C” with less than 80% probability;  some of those have only about 50% probability of belonging to C.  Most of the C samples are listed at the E-M35 site and at the haplozone site, in the V13 + L540 branch.

           Friedman proposed cluster C in 2008, based on STR correlations, when far less data was available than today.

           Cluster C includes me and my 3rd cousin (Gwozdz).

 

C Type

           Rewrite 15 Dec 2014:

           I use the word type for an STR cluster with statistical validity as established by my Mountain Method.  “Type” is my own term.  I chose the word “type” because it is not generally used in genetic genealogy and I wish to distinguish my types from haplogroups and from other clusters.  By “type” I mean the cluster data, the hypothetical clade, the modal haplotype, and the set of all possible haplotypes, at any number of markers.  Accordingly, by “C type” I mean any or all of these 4 things.  I sometimes use just “C” as short for “C type”.  I also identified an unrelated C type earlier, in R1a;  please don’t confuse the two.

           My analysis files define C type.  Sorry, it can be a bit confusing because I have multiple STR definitions for C type, for various marker sets.  The number of markers in my definitions changes slightly when new samples show up with unusual STR values.  I hope the meanings are clear from the context of my discussions in this web document.

           I also provide STR definitions for L540, discussed below, treating L540 as L type.

           C type is roughly 90% of L540.  My evidence:  Considering only the independent samples with 67 or more STR markers in the Neighborhood, there are 17 type L samples, and only 2 of them do not fit C type;  those 2 have tested positive (L540+) in the L540 SNP test.  That is 15 of 17, about 88%.  I say “independent” because some C type samples have been recruited based on a known genetic relationship to C type men (my Gwozdz cousin, Kargul, Svercel);  these should not be counted in this estimate.  I do not include samples with fewer than 67 markers in this estimate, because type prediction has lower confidence with fewer markers.  Notice in the table that most borderline samples (near the L type cutoff) have L540 test results.  I say “roughly” for several reasons.  First, I have been recruiting samples for this table at 67 markers with equal emphasis on borderline samples, in order to properly sample the STR borderline, but it is difficult to prove there is no bias toward samples that fit well.  Second, outliers may show up in the future:  either L540- among samples that fit C type very well, or L540+ among samples beyond the borderline.  Third, with only 2 L540+ samples outside C type, the sampling confidence is not good.
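           To see how loose the 90% figure is, the estimate can be treated as a binomial proportion and given a confidence interval.  The sketch below uses the Wilson score interval, a standard statistical tool chosen here for illustration (it is not part of the Mountain Method), and it assumes the 17 samples are independent and unbiased, which the caveats above say cannot be proven.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Counts from the estimate above: 17 independent L type samples at 67+ markers,
# 2 of which do not fit C type.
l_type = 17
c_type = l_type - 2
lo, hi = wilson_interval(c_type, l_type)
print(f"C type share: {c_type / l_type:.0%}, 95% interval about {lo:.0%} to {hi:.0%}")
```

           With these counts the interval runs from roughly 66% to 97%, which is one way of saying that 2 non-C samples are too few to pin down the share of C type within L540.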

           The evidence today indicates that L540 is much older than C type - details below.

 

V13C

           Rewrite 15 Dec 2014:

           I coined the name V13C in 2010 to represent C type, cluster C, the hypothetical haplogroup, and the samples (men) in the hypothetical haplogroup.  This web document used to be named V13C.html.

           Now that C type is a subdivision of L540 I am editing away the name “V13C”, but I’ll continue to use “C type” for the hypothetical clade that is part of (most of) L540.

           V13 is the defining SNP for E1b1b1a1b1a, so I similarly use “V13” to mean the “father” haplogroup - the large branch in the Y-DNA tree from which L540 is a small branch.  I also use “V13” to mean the associated database of V13 samples at E-M35 or at Haplozone, or at other databases.

 

Dividing L540;  Discovering New SNPs

           Update 17 Dec 2014:

           Discovering new SNPs is part of my genetic genealogy hobby.  I have recently been recruiting L540 members to purchase Big Y in order to discover new SNPs.  We hope to discover SNPs to subdivide L540.  It is not cheap:  Big Y costs $525.  Anyone interested in joining this L540 project can contact me.  V13 men outside the L540 Neighborhood should contact Steve Fix.

           I have been using Yseq for new SNPs in this project.  I prefer FTDNA for established SNPs in order to have all the data at one web page, although FTDNA is much slower than Yseq.

           SNPs are only $39 each at FTDNA.  How to order SNP tests.  I recommend the L540 test for anyone in the Neighborhood Table;  for those L540- I recommend the S3003 test.  Of course, if you are planning Big Y, which includes almost all Y SNPs, individual SNP tests are not needed.

           If you are V13 and not in that table, and anxious to learn your branch, Yseq has a new $88 “V13 Downstream” package of 21 SNPs, including S3003 and L540.  I estimate about half of the V13 haplogroup will come out positive for one of those branches.  If you are not very anxious, no doubt that package will increase in size as new V13 branches are discovered, so you can just wait.

 

111 Markers

           Rewrite 17 Dec 2014:

           FTDNA provides STR markers in various sets.  The largest, a set of 111, was introduced in 2011.  Upgrades can be purchased for samples with fewer markers.  Obviously, matches and predictions are more accurate using more markers.  For 3 years, I had been recommending the 111 set to L540 members, hoping to discover STR correlations good enough to divide the L540 haplogroup into clusters with high confidence.

           I do not have a confident STR based division of L540.  L540 is dominated by C type, which seems to be young, and more like a bush than a tree.

           Today, SNPs are more important than STRs.  This is because the cost of discovering new SNPs has come down a lot.  SNPs define haplogroup divisions;  STRs only provide statistical predictions for haplogroups.

           Still, the set of 111 markers is the most accurate way to find out which samples in the database for your haplogroup are your best matches, which of them are statistically most likely to form a recent branch (node) in your male line ancestry, and the likely order of those branches (older vs younger nodes).  As an example of the value of 111 STRs, I discovered DYS445=11 as an unusual mutation in my own Y, shared by my cousin and also by Kargul, adding evidence that we form a twig in the L540 tree, perhaps restricted to Poland, perhaps only a few centuries old.  DYS445 is not available in FTDNA sets smaller than 111 markers.  The rest of L540 has the value DYS445=10.  The value 11 does show up rarely elsewhere in V13 as an independent mutation;  although DYS445 mutates very slowly, it is not as slow as a typical SNP, so it is not as statistically reliable as an SNP.

           New clusters can still be discovered with STRs, as predictions for new haplogroups, which still need confirmation by discovery of a corresponding SNP.  However, STR analysis is yielding diminishing returns for the effort.  SNP discovery is now accelerating instead.

           C111Type.xls is my C type analysis using 111 Markers.  L111Type.xls is my L type analysis using 111 Markers.  I consider L type statistically equivalent to L540, although there may be rare exceptions in the future.  Those files use data from E-M35, downloaded March 2014;  since then two more L540 samples have been upgraded to 111 markers, and a few new samples have shown up in the Neighborhood, but the result of recent analysis is the same.  The Neighborhood table has been more recently updated with all data available.  The columns in that table use the results from these analysis files.  There are notes below that table that provide explanations.

           My Type.xls master file has instruction sheets explaining how my xls analysis files work.

 

67 Markers

           Update 17 Dec 2014:

           FTDNA provides a 67 marker standard set of STR markers.  I have been using this 67 set for analysis for more than 5 years.  Although the 111 set is more accurate, this 67 set is valuable because there are more samples at 67, and all samples with 111 are included.

           The 67 marker data includes a few L540+ samples whose upgrades I paid for.  Another upgrade I paid for removed one sample from the Neighborhood table, because the extra data showed it is not an L540 neighbor.  So my help causes a slight bias in the data if you compare the size (number of samples) to other haplogroups.  (Other haplogroups may or may not have similar bias.)

           L67Type.xls is my analysis of L type STRs, which are predicted equivalent to L540.  C67Type.xls is my analysis of C type STRs, which is a predicted branch of L540.  Those files use data from E-M35, downloaded March 2014;  since then a few more 67 marker samples have shown up but the result of recent analysis is the same.

           L45(67) means a modal haplotype for the L540 haplogroup using 45 of the 67 standard markers.  My definition for L540 is L45(67) with cutoff 6:  all samples at a step (genetic distance) less than 6 from that modal.  L45(67) was new on 21 Mar 2013.  I also entered L45 into Ysearch as 479H7.  My Mar 2014 analysis L67Type.xls indicates that L45(67) is still my best STR definition for the L540 haplogroup.  L67(67) refers to the modal haplotype using all 67 markers.
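           The idea of a definition as a modal haplotype, a mask (the subset of markers used), and a cutoff can be sketched as follows.  This is a simplified illustration under stated assumptions:  step is taken as the sum of absolute repeat differences over the masked markers, and markers a sample lacks are skipped.  The step calculation in my xls files, and FTDNA’s genetic distance (which treats multi-copy markers and DYS389 specially), may differ.  The modal and mask below are hypothetical;  the real L45(67) values are in L67Type.xls.

```python
# Hypothetical modal and mask for illustration only; NOT the real L45(67).
# Values borrow some signature markers discussed in this document;
# DYS393 is included only to show a marker outside the mask.
MODAL = {"DYS390": 25, "DYS594": 12, "DYS444": 13, "DYS406": 11, "DYS393": 13}
MASK = {"DYS390", "DYS594", "DYS444", "DYS406"}  # markers used by the definition


def step(sample: dict[str, int], modal: dict[str, int], mask: set[str]) -> int:
    """Simplified step: summed absolute differences over masked markers the sample has."""
    return sum(abs(sample[m] - modal[m]) for m in mask if m in sample)


def fits(sample: dict[str, int], modal: dict[str, int], mask: set[str], cutoff: int) -> bool:
    """A sample fits the definition when its step is strictly less than the cutoff."""
    return step(sample, modal, mask) < cutoff


sample = {"DYS390": 24, "DYS594": 12, "DYS444": 13, "DYS406": 10, "DYS393": 15}
print(step(sample, MODAL, MASK))            # DYS393 is outside the mask, so it is ignored
print(fits(sample, MODAL, MASK, cutoff=6))
```

           The mask is what lets one definition ignore markers that do not help separate the clade from its neighbors;  the cutoff then draws the borderline.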

           Similarly, C42(67) with cutoff 3 is my definition of C type, introduced 11 Mar 2014, Ysearch QAZ7P.

           Please refer to the Neighborhood table, where these definitions and others provide columns.

           My previous definitions are available in those xls files, in the sheet “Haplotypes & Masks”.

 

37 Markers

           Update 17 Dec 2014:

           This and the following 2 topics are based on my Feb 2011 analysis.  These 3 topics are C type only;  data is not good enough yet for L540 analysis below 67 markers.

           C37.xls is my C type analysis using 37 Markers.  I no longer update my on-line xls files at less than 67 markers, but I do the analysis sufficiently to update the Neighborhood table, which has columns for the results.  The table shows how 37 marker predictions compare to 67 and 111 markers.  The 37 marker results predict all the C cluster samples, but the L samples are not distinguished, and there is no gap in the step separating the L540- samples.  The C cluster does not qualify to be called a “type” using my Mountain Method at 37 markers.

           FTDNA no longer sells the 25 and 12 STR marker standard sets.  The 37 marker set is sufficient as a first test for men curious to see to which main branch haplogroup they belong.  In the past, I highly recommended 67 markers, and I encouraged 111 markers, but SNPs are now more important.  37 STR markers are enough to identify the main branch of the Y-DNA tree;  any number of STR markers usually needs to be followed by SNP tests to identify the smaller branches and twigs to which the sample belongs.  If cost is not an issue, the 67 and 111 sets can more accurately distinguish closest matches.

           Actually, you can do better by comparing your 37 markers to the database;  if your best 3 or 4 fits (samples at lowest step) are all C type, you are very likely C type.  Even if only some of your best fits are C type, that is enough evidence to try the L540 and S3003 SNP tests for confirmation.
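           The “best 3 or 4 fits” check above can be sketched as a small nearest-match vote.  The database, labels, and helper names below are hypothetical, and step is simplified to summed absolute differences over shared markers (real comparisons handle DYS389 as a pair).  The first flag answers “are all best fits C type?”;  the second answers “is any best fit C type?”, which is the threshold for trying the L540 and S3003 tests.

```python
# Toy database: labels and values are hypothetical, loosely echoing the
# Friedman signature (390, 389II, 447) = (25, 32, 25) for cluster C.
DB = [
    {"label": "C", "markers": {"DYS390": 25, "DYS389II": 32, "DYS447": 25}},
    {"label": "C", "markers": {"DYS390": 25, "DYS389II": 31, "DYS447": 25}},
    {"label": "C", "markers": {"DYS390": 25, "DYS389II": 32, "DYS447": 24}},
    {"label": "other", "markers": {"DYS390": 24, "DYS389II": 30, "DYS447": 24}},
    {"label": "other", "markers": {"DYS390": 24, "DYS389II": 30, "DYS447": 25}},
]


def step(a: dict[str, int], b: dict[str, int]) -> int:
    """Simplified step: summed absolute differences over markers both samples share."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)


def best_fit_labels(mine: dict[str, int], database: list[dict], k: int = 4) -> list[str]:
    """Labels of the k database samples at the lowest step (ties keep database order)."""
    ranked = sorted(database, key=lambda rec: step(mine, rec["markers"]))
    return [rec["label"] for rec in ranked[:k]]


def c_type_evidence(mine: dict[str, int], database: list[dict], k: int = 4) -> tuple[bool, bool]:
    """Return (all best fits are C, at least one best fit is C)."""
    labels = best_fit_labels(mine, database, k)
    n_c = labels.count("C")
    return n_c > 0 and n_c == len(labels), n_c > 0


mine = {"DYS390": 25, "DYS389II": 32, "DYS447": 25}
print(c_type_evidence(mine, DB, k=3))
```

           Ties at the same step are common with few markers, so in practice it is worth looking at every sample tied at the lowest step rather than trusting an arbitrary top 3 or 4.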

 

25 Markers

           Update 17 Dec 2014:

           C25.xls is my C type analysis using 25 Markers.  Again, the Neighborhood table has more recent results, where you can see that 25 markers provide a reasonable but not accurate prediction for C type. 

           The same 25 marker STR set has been used by other Y-DNA testing companies.  The Haplozone data include Sorenson data that is not in the E-M35 data, and 3 of these land in cluster C.  Those samples have kit numbers starting with “S” in the Neighborhood table below.

 

12 Markers

           Update 17 Dec 2014:

           I also have a file using only 12 markers for analysis:  C12.xls.  I used the full database but truncated that analysis to the closest 33 samples to keep the file small.  The 3 Sorenson samples in the table below are not in this database.

 

Best STR Markers

           Update 17 Dec 2014:

           STR markers that mutate relatively slowly are statistical indicators for clades in which they mutated recently, but they are not perfect because of subsequent independent mutations.  When a clade has a few such good STR markers, those provide a signature set of STR markers.  A signature is statistically expected to be a more reliable indicator of a clade than just one marker.  Indeed, cluster C is characterized by the Friedman Signature.  My definitions of C type and L540 use other helpful markers, not just the signature.

           My analysis files automatically rank markers by their usefulness for a definition, using a method that I published.  You can view my ranking in those xls files linked in the previous topics.  See row 11 of the Calculator sheet, and row 17 of the TypeRank sheet.  The exact ranking of markers varies slightly from month to month due to the random nature of mutation values in new samples, and due to the somewhat arbitrary cutoff that I use to restrict the database to the neighborhood (using too many samples provides a ranking of the father clade instead of the clade of interest).  For example, a marker that ranks 6th one month might come out 5th or 7th the next month.

           An SNP that defines a haplogroup is very unlikely to have happened exactly at the time of the most recent common ancestor (TMRCA) of a haplogroup.  Most likely the SNP is somewhat older, because usually there are many generations between nodes.  By definition an SNP cannot be younger than the TMRCA.  Similarly, we can consider a hypothetical clade defined by a particular STR mutation, which is likely somewhat older than the TMRCA of that clade.  However, for clusters defined by signatures, and for types defined by definitions, one rare STR mutation that contributes to the signature might have happened shortly before or after the TMRCA of that cluster or type.

           Very slow mutators should make the best markers.  However, the slowest rarely mutate, so those with an intermediate mutation rate show up more often as signature markers.  My Type.xls master file has published STR mutation rates, in the ASD sheet, row 5.  My publication has the Chandler reference for these rates.  The ASD sheet is not usually included in my analysis files.

           Usually it is silly to speculate about clusters defined by a single STR value.  In this case, however, we have a hypothetical haplogroup, C type, which seems quite young, with relatively little STR variation, so some speculation is in order:

 

DYS389II = 32  (389II minus 389I = 19);  Best Marker for C type

           Update 18 Dec 2014:

           DYS389II=32 is one of the Friedman markers for the C cluster.  It is the best marker for C type.

           [Technical detail:  DYS389 is a compound marker, where 389I is the first STR chain and (389II minus 389I) is the second STR chain.  For C type the first chain is 389-1 = 389I = 13.  The second chain is 389-2 = 19.  389II = 13 + 19 = 32.  The marker of interest here is really 389-2 = 19 (389II minus 389I = 19).  However, 389I mutates more slowly and has the value 13 for all but one C sample so far and for almost all samples in the L540 neighborhood.  At Ysearch or Haplozone, both 389 markers need to be used together;  if one is omitted both are ignored.  My analysis files allow the 389-2 chain to be used alone in analysis, using 389I only to calculate the difference.  However, I use both 389 values (or neither in some cases) in my published definitions to be compatible with other web sites.  In this discussion topic, by “32” I really mean 19 for the delta value.]
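           The arithmetic of the compound marker is simple, but it explains why 389II alone can mislead:  a mutation in 389I shifts the reported 389II value even though the second chain (the delta that matters for C type) is unchanged.  A minimal sketch, assuming the usual convention that the reported 389II value includes the 389I repeats;  the function name is hypothetical.

```python
def dys389_chains(dys389i: int, dys389ii: int) -> tuple[int, int]:
    """Split a DYS389 reading into its two STR chains.

    Assumes the reported DYS389II value includes the DYS389I repeats,
    so the second chain (389-2 in the delta sense) is the difference.
    """
    if dys389ii <= dys389i:
        raise ValueError("DYS389II should exceed DYS389I")
    return dys389i, dys389ii - dys389i


print(dys389_chains(13, 32))  # typical C type values: second chain 19
print(dys389_chains(14, 33))  # a 389I mutation alone: 389II moves, second chain still 19
print(dys389_chains(13, 30))  # ancestral 30: second chain 17
```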

           All STR marker sets by all DNA companies include the 389 pair (I have not noticed any exceptions).

           The two L540+ samples that do not fit cluster C, Fredeen and Gebert, have the ancestral value 30.  Butman, the closest STR match with L540-, also has 30.  So far, there are no C type samples with the ancestral 30 value, although a few have the value 31, which is not common in the neighborhood.  On this basis, it seems likely that the mutations from 30 to 31 to 32 happened close to the TMRCA for C type, and some time after the L540 mutation.  However, we cannot confidently conclude that those samples with 31 are older, because they may have independently mutated back from 32 to 31.  (We also cannot rule out a unique double-step mutation in C type, from 32 to 30, followed by back mutations to 31.)

           DYS389II is the only signature marker that clearly distinguishes C type from the 2 known L540+ samples that are not C type.  None of the other 110 markers in the standard set at FTDNA does this.  The other signature markers work well for both C type and L540.  It is possible with more data that another marker might statistically distinguish (weak correlation due to relatively rapid mutations).  On this basis, it seems likely (not certain) that the clade of descendants of the initial 389II mutation is the same as the C type clade.

           Those two non C L540 samples differ from C type by other markers that are not signature markers.  C type has only this one very good marker to distinguish it from the other L540 samples.  Actually, a type does not need any very good markers if it is well isolated in haplospace.  A correlation using several markers can sometimes be very good, as it is here for C vs L at 111 markers.

           The 32 value is rare throughout V13 but shows up in E-M35 branches outside V13.

           DYS389II (actually the delta value 389-2) ranks 43rd in Chandler mutation rates, near the middle, so exceptions due to recent mutations are expected.

           Speculation:

           Model A:  A mutation from 30 to 31 happened not long before the TMRCA for C type.  A short time later in the history of C type another mutation happened from 31 to 32.  Most C type samples with 31 represent the oldest nodes, and only a minority are back mutations from 32 to 31.  The samples in the STR neighborhood with 31 that do not match C are independent mutations.  This seems to me the simplest model, so I favor it, but only tentatively.

           Model B:  There was a double mutation from 30 to 32 in one man not long before the TMRCA.  Or two single mutations too close in time to be distinguished.  Almost all 31 in C are back mutations, most of them from a single subclade.

           Model C:  The 32s that do not match C belong to the same 389II=32 clade, but there was a population bottleneck.  C is only one of two or more nodes, from MRCAs who survived to produce descendants with corresponding STR clusters today.  Only C is large enough to be noticed so far, due to a population expansion for C.  This model predicts that at least one other small 389II=32 clade will be discovered among L540+ branches, with STR values different from C.

           Model D:  I can think of more complicated models.

           The data is not good enough to distinguish these models.  Maybe more data in the future will show correlation with other markers to distinguish a model like A through D.

           More likely, now that we are experiencing a deluge of new SNPs, we will soon find SNPs to divide L540 into official haplogroup twigs, and one of these might be equivalent to C type.

           Summary (for now) comments:  The initial mutation to 31 (or double mutation to 32) is very close in age to C type, so the mutation(s) define C type.  Eventually, due to mutation luck, samples with the 30 value will show up in C type, isolated in haplospace together with the C samples.  Eventually, due to mutation luck, L540+ samples with 31 or 32 will show up that are too old to fit C type.

 

DYS594 = 12;  Best Marker for L540 at 67 Markers

           Update 19 Dec 2014:

           Reminder:  “L540” means here samples that have tested positive for the L540 SNP test (or will test positive in the future).  L540 is an official haplogroup.  “L type” is a statistically confident type based on STRs;  L type seems equivalent to L540, but there may be exceptions in the future.  C type is my hypothetical branch of L540, not yet confirmed by an SNP.

           In my analysis, DYS594=12 is the best marker for L540, and is also a good marker for C type.  594 is not in the 37 marker set.  594 helps a lot in defining C type and L540 using the 67 marker set, but does not distinguish C from the rest of L540.

           The 11 L540 samples with 67 markers, including 2 that are not C type, all have the 594=12 value.  Butman, the closest STR match with L540-, has the ancestral 11.

           Two samples in the neighborhood have 594=12 but are L540-.  These are not a random sample;  I recruited them for the L540 test because they were the closest STR matches to C type at 67 markers beyond the 10 closest.  Other samples with the 12 value have not been tested for L540.

           The 594=12 value is more common in the L540 neighborhood than in the rest of the V13 data.

           DYS594 ranks 12th in Chandler mutation rate.  Quite slow, so independent recent mutations should be rare.

           Speculation:

           Model A:  The 11 to 12 mutation in DYS594 is significantly older than L540.

           Model A1:  Quite a few branches, both younger and older than the 11 to 12 mutation, survived the population bottlenecks.  Only C had a significant subsequent population expansion, so only C stands out today.  The other 11 vs 12 branches will not be distinguished by STR values because they are too small and too old to be isolated in STR haplospace.

           Model A2:  The 11 vs 12 branches in the neighborhood will be distinguished by STR values when they are all evaluated for the L540 SNP, and when enough STR data is available to identify the signatures.

           Model B:  The 11 to 12 mutation is not much older than L540.  By luck, there is only one other significant independent 12 mutation in the neighborhood outside L540, and that clade is the reason there are more 12s in the neighborhood.

           Model C:  The 11 to 12 mutation is younger than L540.  An old 11 branch with L540+ will be found as data accumulates.  If this happens, STR data will likely not have enough correlation to determine whether such a branch is really due to a back mutation;  a new SNP would probably need to be discovered.

           Future data will probably eliminate one or more of these models, and perhaps suggest other models.

 

DYS636 = 12;  DYS504 = 14;  DYS561 = 17

Excellent Signature Markers for L540;  Available Only in the 111 Set

           New topic 19 Dec 2014:

           These three are not in the FTDNA 67 STR marker set, but are available in the extended 111 STR marker set.  They are each about as good as DYS594=12 (previous topic).  They do not distinguish C from L.  There are other markers almost as good.  That is why L77(111), my 111 marker definition for L type, works very well.

           Also, it is not a coincidence that the critical samples (all the samples from the 67 marker data near the L540 cutoff) have all 111 markers - I recruited the 111 data and paid for it as needed at the borderline.

           In the 67 marker analysis file, all 18 L540+ samples have 594=12, and only 6 of the 59 nearest STR neighbors have this 12 value;  2 of the 3 closest 12’s are confirmed L540-.

           In the 111 marker analysis file, all 10 L540+ samples have 636=12, and all of the 39 nearest neighbors have 636=11.  8 of the 10 L540+ have 504=14, and all but 1 of the 39 nearest neighbors have 504=13;  the two L540+ exceptions are not really exceptions, because they have 504=15, representing an additional mutation.  9 of the 10 L540+ have 561=17, and only 1 of the 39 nearest neighbors has 17;  the rest are mostly 16 with a few 15.
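           One crude way to compare these markers is the difference between how often the value appears among L540+ samples and how often it appears among the nearest neighbors.  This frequency contrast is a swapped-in illustration, not the published ranking method used in my xls files.  It uses only the counts quoted above;  504=14 is left out because the exact neighbor count for 14 is not given, and the 594 counts come from the 67 marker file while the others come from the 111 marker file.

```python
# Counts quoted above: (L540+ carriers, L540+ total, neighbor carriers, neighbors total).
COUNTS = {
    "DYS594=12": (18, 18, 6, 59),  # 67 marker analysis file
    "DYS636=12": (10, 10, 0, 39),  # 111 marker analysis file
    "DYS561=17": (9, 10, 1, 39),   # 111 marker analysis file
}


def contrast(in_hits: int, in_n: int, out_hits: int, out_n: int) -> float:
    """In-group carrier frequency minus out-group carrier frequency (1.0 = perfect split)."""
    return in_hits / in_n - out_hits / out_n


ranked = sorted(COUNTS, key=lambda k: contrast(*COUNTS[k]), reverse=True)
for marker in ranked:
    print(f"{marker}: {contrast(*COUNTS[marker]):.2f}")
```

           A simple contrast like this ignores mutation rates and sample size, so with only 10 L540+ samples at 111 markers the small differences between these three should not be taken too seriously.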

 

Other Good Markers:  DYS390=25, DYS444=13, DYS406=11, DYS456=15, CDYb=33, DYS447=25

           Update 15 July 2011:

           These typically rank among the best in my analysis files, usually in about the order listed in the title here.  Good for both C type and L540 prediction.  The exact ranking is very sensitive to the choice of database.  In the close neighborhood of L540, 390=25 and 406=11 do very well.  Using the entire E-M35 database, 444=13 does better;  by luck, no other major clade in E-M35 shares the 13 value at 444.  There is a cluster in E1b1b1c1a (M84) that has samples with the L540 signature values (389II minus 389I)=19 and 594=12, but that interfering cluster has 444=11, two steps away from L540 at that marker.  This is an example of why the database should be restricted to reasonably close STR samples for analysis.

           Chandler rank for these, in the same order:  47th, 49th, 35th, 60th, 67th, 45th.

           390 is tied with 594 for 1st place in my current L540 analysis file, but I suppose it will end up in 2nd place because it has a higher published mutation rate.

           Models for 390 are similar to the models for 594.

           None of the other markers in the title here are as good as 389II for C type or as good as 594 or 390 for L540.  Each clearly has confounding mutations in the data.  For example, 444 has one L540+ sample with the ancestral 12 indicating that L540 is older, but there is also one L540- sample with the signature 13 indicating L540 is younger;  one of those must be an independent mutation (or an error).  More data will help sort this out.

           Many complicated models can be constructed combining 2 or more good markers.  Complication comes from figuring out the age order of the markers.  More data might point to a compelling model.

 

Signature C4

           Update 15 July 2011:

           An excellent signature for C type is (389I, 389II, 594, 444) = (13, 32, 12, 13).  Seven of 9 C type samples with 67 markers have this signature, and the two that miss are at step 1;  no other samples in the neighborhood have step 1;  Gebert is the only one at step 2.  In the vast E-M35 Haplozone database there is only one confounding sample at step 1, but that one is from E1b1b1c1a;  all others differ from this signature by 2 or more steps.  In other words, this 3 marker signature, cutoff 2, extracts all the C type samples and none others from V13 data.  Eventually, of course, exceptions will turn up.

           There are better markers than 389I.  I included that one because it enables C4 in the search function at the Haplozone site.
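           The C4 test with cutoff 2 can be sketched in a few lines of Python.  This assumes step is simply the sum of absolute repeat differences at the signature markers, which may differ slightly from the Ysearch or Haplozone step rules, and that a sample fits when its step is below the cutoff;  the helper names and the three samples are hypothetical.

```python
from typing import Dict

Haplotype = Dict[str, int]  # marker name -> repeat count

# Signature C4 from this topic: (389I, 389II, 594, 444) = (13, 32, 12, 13)
C4: Haplotype = {"DYS389I": 13, "DYS389II": 32, "DYS594": 12, "DYS444": 13}


def signature_step(sample: Haplotype, signature: Haplotype) -> int:
    """Assumed step rule: sum of absolute repeat differences at the signature markers."""
    return sum(abs(sample[m] - v) for m, v in signature.items())


def fits_signature(sample: Haplotype, signature: Haplotype, cutoff: int) -> bool:
    """Predicted member when the step is below the cutoff."""
    return signature_step(sample, signature) < cutoff


# Hypothetical samples at step 0, 1 and 2
samples = [
    {"DYS389I": 13, "DYS389II": 32, "DYS594": 12, "DYS444": 13},
    {"DYS389I": 13, "DYS389II": 32, "DYS594": 12, "DYS444": 12},
    {"DYS389I": 13, "DYS389II": 30, "DYS594": 12, "DYS444": 13},
]
print([signature_step(s, C4) for s in samples])            # [0, 1, 2]
print([fits_signature(s, C4, cutoff=2) for s in samples])  # [True, True, False]
```

With raw 389I and 389II values, a single mutation in the 389I segment moves both numbers and is counted as two steps;  the Δ389 form used for the signatures in the Neighborhood Table avoids that double count.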

 

Friedman Signature

           Update 29 Mar 2014:

           The signature is (390, 389-2, 447) = (25, 32, 25).

           Friedman had been calling this the “characteristic marker values” for cluster C at the Haplozone site before I started working on this, back in 2008, when there were only 9 samples available in cluster C, including mine.

           This original Friedman signature works surprisingly well by itself for samples with only 25 of the standard markers, but not with high confidence.  For more details, see the discussion about C3(25) below the Neighborhood Table.

           In early 2011 Friedman added 594=12 to the “characteristic marker values”, for 67 marker samples.  For more details, see the discussion about C4(67) below the Neighborhood Table.

           DYS389 is a compound marker, discussed above.

           Friedman used a more complicated analysis than just this simple signature in her C type assignments.  I do not know what her method was exactly, but most definitions (not all) that I tried, selecting well ranked markers, extracted the same samples that she did.
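           Since DYS389II as reported includes the DYS389I segment, the Friedman signature can be written with 389-2 = 32 or, as in C3(25) below the Neighborhood Table, with the difference Δ389 = 19.  A minimal Python sketch of that conversion;  the sample values and helper names are hypothetical.

```python
from typing import Dict, Tuple


def delta_389(dys389i: int, dys389ii: int) -> int:
    """DYS389II as reported includes the DYS389I segment; the difference isolates the rest."""
    return dys389ii - dys389i


def friedman_c3_values(sample: Dict[str, int]) -> Tuple[int, int, int]:
    """(390, delta 389, 447), the form used for C3(25) = (25, 19, 25)."""
    return (sample["DYS390"], delta_389(sample["DYS389I"], sample["DYS389II"]), sample["DYS447"])


sample = {"DYS390": 25, "DYS389I": 13, "DYS389II": 32, "DYS447": 25}  # hypothetical
print(friedman_c3_values(sample))   # (25, 19, 25)

# A single mutation in the 389I segment raises both reported values by one;
# the difference, and so the C3 values, stay the same.
mutated = dict(sample, DYS389I=14, DYS389II=33)
print(friedman_c3_values(mutated))  # (25, 19, 25)
```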

 

CDYb = 33;  Another Good Marker

           Update 16 July 2011:

           The marker CDYb is very unusual in L540;  10 of the 11 samples have the value 33.  See my 67 marker analysis file.  It ranks tied for 6th.  Just beyond L540 in STR step from the definition, less than half the samples have the 33 value, and some of them might turn out to be L540+ because most are not tested.

           The CDY pair mutates most rapidly of the 67 markers.  I have never seen a cluster or type that is so uniform for one of the CDY markers.

           It is possible but very unlikely this is a coincidence.  If it is just a lucky coincidence, then as data accumulates over the months lots of C type samples should show up with values other than CDYb = 33.  I originated this CDYb topic in April 2010, and the accumulated data since then has strengthened the evidence that CDYb=33 dominates what is now L540.

           Of the 5 tested among the 31 samples at steps 7 to 10 beyond L540:  2 are 33 and 3 are 34.

           My hypothesis:  There is a mutation within the CDYb chain.  Either a point mutation, or a foreign insertion, or a deletion (a deletion that is not a simple STR chain deletion of a motif, but a removal of only part of an STR motif).  It is known that a mutation within an STR that spoils the motif effectively splits that STR into two small STRs.  Smaller STRs have lower mutation rates.  A spoiler mutation near the middle of the main CDYb chain would turn that marker into a slower STR mutator.

           (By the way, 447 is known to have two such defects, so 447, which looks like a long STR, is really 3 short STRs;  that is why 447 is not a very rapid mutator, and in fact 447 is a good marker, discussed above.)

           CDY, also called DYS724, is a compound marker, which means one of the pair often copies onto the other, providing equal values, CDYa = CDYb.  This is called recLOH;  my publication has references if you want to read more about recLOH.  Or check Wiki, where there is an explanation that 459, 464, and CDY are all on the same “palindrome” P1, where P1 has two arms that are mirror copies.  Sometimes all three of these markers get the values copied from one arm of P1 onto the other arm.  A mutation at CDYb, making it unlike CDYa, would make copy mutations less common.

           Such seems to be the case, providing more evidence for a CDYb spoiler mutation.  There are no recLOH mutations in CDY or in 464 in the L540 data.  Of course there is not enough data yet to be compelling.  The 459 marker has both values = 9, so a recLOH would not be noticed there.

           It is not clear if the ancestral CDYb value is 33, or 34.  Both are common in the neighborhood.

           Model A:  More than a millennium ago, for a man in the L540 male line, not very long before or after the L540 mutation, a mutation destroyed the middle of the CDYb STR chain, turning CDYb into a unique marker, distinct from CDYa, and a much slower STR mutator than CDYa, and less likely to combine with CDYa in a recLOH event.  This rare mutation happened in a man who ended up with the equivalent net value of 33 at CDYb after the mutation.  A descendant of this man would be the most recent common ancestor (MRCA) for the clade corresponding to this mutation.  There were some normal STR mutations at CDYb in the descendants of that MRCA, and some of those mutated CDYb men were lucky enough to have male descendants living today, and one of them shows up in our C type data today with the 34 value.

           Although more data will add evidence to this model, I doubt the data will be good enough to determine if 33 or 34 was ancestral.  If 33 was ancestral, samples from old nodes might show up, but I doubt STR analysis will be able to tell whether they are older or younger than that spoiler mutation.  If 34 is ancestral, I doubt the data will distinguish older branches from more recent back mutations.

           More models can be constructed along the lines of the models discussed in previous topics above.

           In a previous version of this web document I speculated that this CDYb mutation might define a new haplogroup.  I asked Thomas Krahn at FTDNA about sequencing my CDYb to prove there is an SNP in there.  Krahn explained that the P1 palindrome is very difficult to sequence with standard methods because the data is a mixture from the two arms of P1.  Krahn also pointed out that even if the SNP were proven it would not be accepted as a haplogroup division because a recLOH can still happen, and if CDYa gets copied onto CDYb that wipes out the SNP in the clade descending from that recLOH.

           My 2010 versions of this topic pointed out that the CDYb mutation is definitely older than C type, because the 33 value predominated in the very near neighborhood of C type.  That prediction has been validated.  The two L540+ from outside C type both have the 33 value.  With my new L540 definition, the 33 is not dominant beyond the L540 data, but of course a few of those that are there might end up in L540 when they are tested.

 

L540 Neighborhood

           Update 14 Nov 2014:  see the News topic at the top of this web page

           Update 4 Oct 2014:  a few minor updates

           Update 10 Aug 2014:  a few minor updates

           Update 21 Jun 2014:  a few minor updates

           Update 24 May 2014:  3 more L540+ results added to the table

           Update 21 Apr 2014:

           L540 is small enough that I can insert a complete table here, including neighbors just beyond in STR values.

           Those numbers are STR step, which is mutation count from the Modal Haplotype;  the columns are explained more in the notes below the table.

           Violet numbers are L “paratype” (L540 samples not in C type), where step less than the cutoff means predicted members of the L540 haplogroup.

           Boldface means confirmed:

                       + vs --- means confirmed positive vs negative by the L540 test.

                       L241 means positive for another haplogroup, implying negative for L540.

                       There are many more negative L540 results from outside this neighborhood (higher step).

           Red step numbers are predicted C type, a predicted branch of the L540 haplogroup.  Boldface means 80% or higher confidence that a future SNP will be discovered, confirming these samples in a future haplogroup branch.  C? means predicted C type at less than 80% confidence, based on the STR step number in boldface.

           Black step numbers are greater than the cutoff.  Without SNP testing, even with high step number, there is a low probability that a sample might be an outlier member of L540. 

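           Data sources:  e = E-M35 project, h = Haplozone, y = Ysearch

| Kit | Ysearch | L540 | Ancestor | Origin | Data | Type | Markers | C90 (111) | C111 (111) | L77 (111) | L3 (111) | C42 (67) | C67 (67) | L45 (67) | C4 (67) | LnotC3 (67) | C15 (37) | C37 (37) | C12 (25) | C25 (25) | C3 (25) | C12 (12) | Note |
| | | | | | | | Cutoff > | 7 | 16 | 8 | 2 | 3 | 12 | 6 | 2 | 1 | 6 | 8 | 2 | 3 | 2 | 1 | |
| N16800 | KFKGM | +(WTY) | Gwozdz | Poland | ehy | C | 111 | 3 | 9 | 3 | 0 | 0 | 7 | 1 | 1 | 5 | 3 | 5 | 1 | 4 | 1 | 2 | |
| N45041 | UQR4B | + | Hochreiter | Germany | ehy | C | 111 | 4 | 7 | 2 | 0 | 0 | 4 | 2 | 0 | 4 | 4 | 4 | 0 | 3 | 0 | 1 | |
| 155155 | | + | Svercl | Czech | eh | C | 111 | 4 | 15 | 5 | 0 | 1 | 8 | 1 | 0 | 5 | 1 | 5 | 0 | 2 | 0 | 2 | |
| N81304 | | (+) | Gwozdz | Poland | eh | C | 111 | 4 | 11 | 4 | 0 | 1 | 9 | 1 | 1 | 6 | 4 | 7 | 1 | 5 | 1 | 3 | 1 |
| 140927 | 9JM9U | + | Donovan | Prussia | ehy | C | 111 | 4 | 11 | 3 | 0 | 1 | 6 | 2 | 1 | 3 | 2 | 3 | 1 | 1 | 1 | 1 | |
| 175213 | 5XP46 | + | Burlik Stelz | Germany | ey | C | 111 | 5 | 12 | 5 | 0 | 2 | 7 | 2 | 0 | 4 | 2 | 5 | 1 | 2 | 0 | 0 | |
| 51282 | | + | Wion | Germany | eh | C | 111 | 6 | 13 | 1 | 0 | 2 | 10 | 3 | 0 | 4 | 5 | 7 | 1 | 4 | 0 | 1 | |
| 199446 | TK98K | + | Kargol | Poland | ehy | C | 111 | 6 | 10 | 6 | 1 | 1 | 6 | 2 | 1 | 5 | 2 | 4 | 1 | 4 | 1 | 2 | |
| 225596 | 6S4J6 | + | Nowak | Poland | ehy | C | 111 | 6 | 12 | 5 | 0 | 1 | 9 | 4 | 0 | 4 | 2 | 5 | 0 | 0 | 0 | 0 | |
| 171456 | 79QF7 | + | Glasser | Germany | ehy | C | 111 | 6 | 9 | 3 | 0 | 0 | 3 | 1 | 0 | 4 | 1 | 3 | 0 | 1 | 0 | 0 | |
| 166692 | 8FTXT | + | Gebert | Germany | ehy | L | 111 | 13 | 17 | 7 | 0 | 6 | 10 | 5 | 2 | 0 | 2 | 6 | 2 | 4 | 2 | 3 | |
| 162917 | | + | Fredeen | Sweden | eh | L | 111 | 17 | 23 | 6 | 0 | 8 | 18 | 5 | 2 | 0 | 10 | 13 | 3 | 6 | 2 | 4 | |
| N91348 | | --- | Butman | England | e | X | 111 | 17 | 20 | 15 | 3 | 6 | 11 | 8 | 3 | 2 | 4 | 6 | 2 | 2 | 2 | 2 | |
| N39989 | 5N5MF | --- | Hohnloser | Germany | ehy | X | 111 | 20 | 28 | 20 | 4 | 11 | 18 | 12 | 3 | 2 | 7 | 9 | 2 | 3 | 2 | 3 | |
| 5960 | V93B3 | -(BigY) | Bartlett | England | ehy | X | 111 | 20 | 27 | 18 | 3 | 9 | 16 | 11 | 3 | 2 | 7 | 11 | 3 | 6 | 2 | 4 | |
| N58717 | CV7WB | (--) | Bartlett | Unknown | ehy | X | 111 | 21 | 29 | 19 | 3 | 9 | 18 | 12 | 3 | 2 | 8 | 13 | 3 | 8 | 2 | 5 | |
| 98212 | | L241 | Baber | England | ehy | X | 111 | 22 | 33 | 21 | 3 | 10 | 22 | 11 | 5 | 3 | 11 | 15 | 5 | 9 | 4 | 5 | |
| 105741 | 3FVPX | | Malay | Slovakia | ehy | X | 111 | 22 | 32 | 20 | 4 | 9 | 20 | 15 | 5 | 1 | 11 | 16 | 4 | 8 | 3 | 4 | |
| | | | 5 samples | | e | X | 111 | 23 | | | | | | | | | | | | | | | |
| | | --- | 5 samples | | e | X | 111 | 24 | | | | | | | | | | | | | | | |
| | | | 3 more | | e | X | 111 | 24 | | | | | | | | | | | | | | | |
| | | | 2 more | | e | X | 111 | | | 19 | | | | | | | | | | | | | |
| | | | V13 Modal | | e | X | 111 | 23 | 25 | 21 | 3 | 9 | 17 | 14 | 6 | 3 | 8 | 12 | 4 | 7 | 4 | 4 | |
| | | | L241 Modal | | e | X | 111 | 25 | 33 | 25 | 3 | 10 | 24 | 14 | 5 | 3 | 13 | 17 | 4 | 8 | 4 | 5 | |
| | | | L143 Modal | | e | X | 111 | 29 | 34 | 25 | 3 | 10 | 20 | 15 | 5 | 2 | 11 | 15 | 4 | 7 | 4 | 4 | |
| 320415 | | + | Micek | Russia | e | C | 67 | | | | | 0 | 4 | 1 | 0 | 4 | 1 | 4 | 0 | 2 | 0 | 1 | |
| 229581 | | | Zinin | Unknown | eh | C | 67 | | | | | 1 | 4 | 2 | 1 | 3 | 1 | 3 | 1 | 2 | 1 | 2 | |
| 262750 | | + | Svercel | Slovakia | eh | C | 67 | | | | | 2 | 8 | 2 | 0 | 4 | 1 | 4 | 0 | 1 | 0 | 1 | |
| 243901 | FSQXZ | | Stubblefield | Unknown | ehy | C | 67 | | | | | 2 | 10 | 2 | 0 | 3 | 5 | 9 | 1 | 6 | 0 | 2 | |
| E10751 | | | Schulz | Germany | | C | 67 | | | | | 2 | 7 | 3 | 2 | 5 | 5 | 7 | 2 | 5 | 2 | 4 | 3 |
| 6104 | 4HJ3D | | Boyd | Unknown | ehy | C | 67 | | | | | 2 | 10 | 3 | 0 | 4 | 4 | 9 | 0 | 1 | 0 | 0 | |
| 207878 | | | Frind | Germany | eh | C | 67 | | | | | 2 | 10 | 3 | 0 | 4 | 4 | 7 | 1 | 4 | 0 | 2 | |
| 70482 | 6HMRD | + | Ostholm | Sweden | ehy | C | 67 | | | | | 2 | 10 | 4 | 1 | 4 | 4 | 6 | 1 | 2 | 1 | 1 | |
| 226416 | | + | Sabieka | Belarus | eh | C | 67 | | | | | 2 | 11 | 5 | 2 | 5 | 7 | 10 | 3 | 7 | 2 | 5 | |
| 174240 | | | | Unknown | | C | 67 | | | | | 2 | 3 | 2 | 1 | 3 | 1 | 2 | 1 | 1 | 1 | 1 | 3 |
| | WHFQB | | Froetscher | Germany | y | C? | 67 | | | | | 3 | 15 | 6 | 1 | 3 | 8 | 12 | 1 | 4 | 1 | 2 | |
| 97005 | CBF87 | | Strejc | Austria | ehy | X? | 67 | | | | | 6 | 16 | 9 | 4 | 3 | 8 | 11 | 3 | 8 | 3 | 5 | |
| 310951 | | | Petrov | Russia | e | X? | 67 | | | | | 6 | 18 | 9 | 3 | 2 | 9 | 13 | 2 | 7 | 2 | 5 | |
| E7459 | 8K6VZ | | Casado | Croatia | ehy | X? | 67 | | | | | 6 | 17 | 11 | 5 | 2 | 9 | 14 | 5 | 9 | 4 | 6 | |
| E8272 | | L241 | Abdurrah | Kosovo | eh | X | 67 | | | | | 7 | 15 | 9 | 4 | 2 | 8 | 10 | 4 | 7 | 4 | 4 | |
| 25780 | 5DQ2B | | Wilson | England | ehy | X | 67 | | | | | 7 | 20 | 12 | 3 | 2 | 9 | 14 | 3 | 7 | 2 | 2 | |
| 143479 | | | Mastel | United K | eh | X | 67 | | | | | 7 | 17 | 10 | 4 | 3 | 8 | 12 | 4 | 8 | 4 | 3 | |
| 24437 | | | Harvison | Scotland | h | X | 67 | | | | | 7 | 23 | 16 | 3 | 3 | 14 | 17 | 2 | 6 | 2 | 3 | |
| 199300 | EJ4B6 | --- | McKrell | Unknown | ehy | X | 67 | | | | | 7 | 18 | 10 | 3 | 2 | 11 | 15 | 3 | 8 | 3 | 5 | |
| | | | 4 more | | y | X | 67 | | | | | 7 | | | | | | | | | | | |
| | | | 27 samples | | ey | X | 67 | | | | | 8 | | | | | | | | | | | |
| N109412 | BYHHR | | Howe | Unknown | eh | C? | 37 | | | | | | | | | | 1 | 3 | 0 | 2 | 0 | 0 | |
| 158091 | QHU8Y | + | Kline | Germany | eh | C | 37 | | | | | | | | | | 2 | 5 | 1 | 2 | 1 | 2 | |
| B3807 | | | Stavbom | Sweden | eh | C? | 37 | | | | | | | | | | 4 | 9 | 1 | 5 | 0 | 4 | |
| 141863 | W5JHS | | Pohl | Germany | eh | C? | 37 | | | | | | | | | | 5 | 8 | 1 | 3 | 1 | 3 | |
| B2670 | X2JH9 | | Bogdanski | Germany | e | C? | 37 | | | | | | | | | | 5 | 7 | 2 | 5 | 2 | 2 | |
| N106293 | GJNU6 | | Beasley | USA | e | X? | 37 | | | | | | | | | | 6 | 14 | 7 | 12 | 1 | 8 | |
| 275510 | 3K5CF | | Roider | Germany | e | C? | 37 | | | | | | | | | | 7 | 9 | 0 | 4 | 0 | 1 | |
| | | | 8 more | | e | X? | 37 | | | | | | | | | | 7 | | | | | | |
| 177898 | B6CUR | | Miller | Germany | e | L? | 37 | | | | | | | | | | 8 | 13 | 5 | 8 | 4 | 5 | |
| | | | 7 more | | e | | 37 | | | | | | | | | | 7 | | | | | | |
| | Q8JRJ | | Spooner | USA | y | C? | 37 | | | | | | | | | | 1 | 3 | 0 | 1 | 0 | 0 | |
| | 9P4Z5 | | Sager | Germany | y | C? | 37 | | | | | | | | | | 2 | 5 | 1 | 1 | 1 | 1 | |
| | 2N3UM | | Oppitz | Germany | y | C? | 37 | | | | | | | | | | 3 | 7 | 1 | 4 | 1 | 2 | |
| | EDS4E | | Haenicke | Germany | y | C? | 37 | | | | | | | | | | 3 | 5 | 1 | 2 | 1 | 1 | |
| | V6X4V | | Fitze | Germany | y | C? | 37 | | | | | | | | | | 3 | 6 | 0 | 1 | 0 | 0 | |
| | 3K4Y2 | | Lintner | Germany | y | C? | 37 | | | | | | | | | | 4 | 8 | 0 | 4 | 0 | 1 | |
| | 4Q933 | | Kephart | USA | y | C? | 37 | | | | | | | | | | 4 | 6 | 2 | 3 | 2 | 2 | |
| | A9FVE | | Weiand | Germany | y | C? | 37 | | | | | | | | | | 4 | 6 | 1 | 4 | 0 | 1 | |
| | WME5S | | Cervenka | Hungary | y | C? | 37 | | | | | | | | | | 5 | 10 | 1 | 6 | 0 | 2 | |
| | UF6K3 | | Spatz | Poland | y | X? | 37 | | | | | | | | | | 6 | 9 | 3 | 6 | 3 | 4 | |
| | | | More | | y | X? | 37 | | | | | | | | | | 7 | | 0 | | | | |
| S10193 | | | Engel | Germany | h | C? | 34 | | | | | | | | | | | | 0 | 1 | 0 | 1 | |
| S10194 | | | Kochtitizky | Hungary | h | C? | 34 | | | | | | | | | | | | 0 | 3 | 0 | 1 | |
| 294225 | | | Belinskiy | Russia | eh | C? | | | | | | | | | | | | | 0 | | | 1 | |
| A10196451 | | | Stavbom | Sweden | h | C? | 34 | | | | | | | | | | | | 1 | 6 | 0 | 4 | |
| 285764 | | | Stavbom | Sweden | e | C? | 12 | | | | | | | | | | | | | | | | |
| A2983 | | | Undisclosed | Austria | h | C? | 34 | | | | | | | | | | | | 1 | 4 | 1 | 1 | |
| S10195 | | | Jordan | Germany | h | C? | 34 | | | | | | | | | | | | 2 | | | | |
| | | | 22 more | | eh | X? | 25 | | | | | | | | | | | | 2 | | | | |
| N26163 | R38X2 | | Fritsch | Czech | ehy | C? | 12 | | | | | | | | | | | | | | | 0 | 2 |
| N39377 | | | Obendorf | Germany | eh | C? | 12 | | | | | | | | | | | | | | | 0 | 2 |
| N57225 | XKCE3 | | Livingston | Germany | ehy | C? | 12 | | | | | | | | | | | | | | | 0 | 2 |
| 284871 | | | Knotz | Austria | eh | C? | 12 | | | | | | | | | | | | | | | 0 | 2 |
| | Ysearch | | 6 more | | y | C? | 12 | | | | | | | | | | | | | | | 0 | 2 |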

Summary:  See the News topic at the top of this web page for a summary of this table.

 

Explanation of the modal haplotype columns in the table; update 29 Mar 2014:

           C90(111) is my new best modal haplotype definition for prediction of C type, a hypothetical branch of L540, using 90 of the 111 standard STR markers.  The cutoff is 7;  notice that there are no samples in the gap at steps 7 through 12.  No doubt future samples will show up in the gap, at which time the cutoff and the definition may need slight adjustment.  My analysis file http://www.gwozdz.org/C111Type.xls is available if you are interested in the details.

           C111 is the modal haplotype using the full 111 STR set.  It just barely works, with a gap of step one at the cutoff value of 16.
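           The step numbers in the table are mutation counts from a modal haplotype over a chosen marker set (90 markers for C90, 77 for L77, all 111 for C111).  A minimal Python sketch, assuming the modal is the most common value at each marker and that each repeat of difference counts as one step;  the Ysearch step method I actually use differs slightly, and the three-marker set and samples here are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

Haplotype = Dict[str, int]  # marker name -> repeat count


def modal_haplotype(samples: List[Haplotype], markers: Iterable[str]) -> Haplotype:
    """Most common value at each marker; on a tie, Counter keeps the value seen first."""
    return {m: Counter(s[m] for s in samples).most_common(1)[0][0] for m in markers}


def step(sample: Haplotype, modal: Haplotype) -> int:
    """Assumed step rule: one step per repeat of difference, summed over the modal's markers."""
    return sum(abs(sample[m] - value) for m, value in modal.items())


# Hypothetical three-marker example standing in for a 90- or 77-marker definition
markers = ["DYS390", "DYS594", "DYS636"]
group = [
    {"DYS390": 25, "DYS594": 12, "DYS636": 12},
    {"DYS390": 25, "DYS594": 12, "DYS636": 11},
    {"DYS390": 24, "DYS594": 12, "DYS636": 12},
]
modal = modal_haplotype(group, markers)
print(modal)                            # {'DYS390': 25, 'DYS594': 12, 'DYS636': 12}
print([step(s, modal) for s in group])  # [0, 1, 1]

outsider = {"DYS390": 24, "DYS594": 11, "DYS636": 11}
print(step(outsider, modal))            # 3
```

Restricting `markers` to a subset is how a definition such as C90 or L77 can drop markers that blur the distinction.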

           L77(111) is my new best modal haplotype definition for prediction of L540 type, using 77 of the 111 standard STR markers.  L540 type is predicted equivalent to the haplogroup L540.  The cutoff for L540 type is 8;  notice that there are no samples in the gap at steps 8 through 14.  No doubt future samples will show up in the gap, at which time the cutoff and the definition may need slight adjustment.  My analysis file http://www.gwozdz.org/L111Type.xls is available if you are interested in the details.  Notice that two samples in the table meet L77 but not C90, so those two are predicted to belong to older branches of the Y-DNA tree - older than the node for C type.  C type samples do not differ much in STR values, which means the C type clade is quite young.  Those two L type samples that are not C type differ a lot more in STR values, meaning their nodes are probably older.  Future data may provide more such samples, at which time the L type definition will likely need adjustment.  In this respect, L type predictions are not as reliable as C type predictions;  there may be future samples with L77 step greater than 15 that turn out L540+.  Such a probability is low for each particular sample, so I used boldface “X” for samples in the table with step greater than 15, even without the L540 test, because each particular sample has greater than 80% confidence, in my estimation, of not belonging to L540.  The probability might be 1% or 20% that any one sample at L77 steps 15 to 20 might be an L540 outlier;  I do not know how to estimate such probability, except that I’m sure such probability is not very high and decreases with increasing step.
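           The gaps described for C90, C111 and L77 can be checked with a short Python computation.  The lists are copied from the C90, C111 and L77 columns of the individual 111-marker rows of the table (grouped rows such as “5 samples” and the modal rows are left out), using the rule from the table notes that step less than the cutoff means a predicted member;  the function name `gap_at_cutoff` is only a label for illustration.

```python
from typing import List


def gap_at_cutoff(steps: List[int], cutoff: int) -> List[int]:
    """Empty step values between the highest step below the cutoff and the lowest at or above it."""
    below = [s for s in steps if s < cutoff]
    above = [s for s in steps if s >= cutoff]
    if not below or not above:
        return []
    return list(range(max(below) + 1, min(above)))


# 111-marker rows; the second list in each sum holds the samples at or beyond that cutoff
c90 = [3, 4, 4, 4, 4, 5, 6, 6, 6, 6] + [13, 17, 17, 20, 20, 21, 22, 22]
c111 = [9, 7, 15, 11, 11, 12, 13, 10, 12, 9] + [17, 23, 20, 28, 27, 29, 33, 32]
l77 = [3, 2, 5, 4, 3, 5, 1, 6, 5, 3, 7, 6] + [15, 20, 18, 19, 21, 20]

print(gap_at_cutoff(c90, cutoff=7))    # [7, 8, 9, 10, 11, 12]
print(gap_at_cutoff(c111, cutoff=16))  # [16]
print(gap_at_cutoff(l77, cutoff=8))    # [8, 9, 10, 11, 12, 13, 14]
```

The outputs match the gaps quoted in the text:  steps 7 through 12 for C90, the single step 16 for C111, and steps 8 through 14 for L77.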

           L3(111) is the L540 STR signature for L540 type, using only 3 markers, and it nicely selects only the confirmed L540+ samples.  L3 is (594,636,561) = (12,12,17).  That file L111Type.xls shows that any number of markers from 3 to 105 can be used to distinguish the L540 samples, but this will surely change in the future as L540 samples with unusual STR mutations show up in the database.  Actually, any one of those three L3 markers is sufficient to identify the L540 samples in this L540 Neighborhood table, but there are a few samples outside the V13 father haplogroup that match at any one of the three due to the large number of samples in the full E-M35 database.  Notice that the L3 signature markers are not among the standard 37 set.  The signature markers that strongly distinguish C type from the rest of L540 are all in the standard 67 set (see C4(67) below);  the data so far do not suggest more signature markers in the 111 set for C type, although a few are useful in the C90 definition discussed above.
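           A quick check of the statement that L3 selects only the confirmed L540+ samples, as a Python tally.  The pairs are copied from the L540 and L3(111) columns of the 111-marker rows that have an L540 result, using the L3 cutoff of 2 from the Cutoff row;  I count the parenthesized (+) and (--) entries as results, although note 1 explains that the (+) is assumed from a 3rd cousin.  The function name `confusion` is hypothetical.

```python
from typing import List, Tuple

L3_CUTOFF = 2  # from the Cutoff row of the table

# (L540 positive?, L3 step) for 111-marker rows with an L540 result, in table order
rows: List[Tuple[bool, int]] = [
    (True, 0), (True, 0), (True, 0), (True, 0), (True, 0), (True, 0),
    (True, 0), (True, 1), (True, 0), (True, 0), (True, 0), (True, 0),
    (False, 3), (False, 4), (False, 3), (False, 3), (False, 3),
]


def confusion(rows: List[Tuple[bool, int]], cutoff: int) -> Tuple[int, int, int, int]:
    """(true pos, false neg, false pos, true neg), predicting L540+ when step < cutoff."""
    tp = sum(1 for pos, s in rows if pos and s < cutoff)
    fn = sum(1 for pos, s in rows if pos and s >= cutoff)
    fp = sum(1 for pos, s in rows if not pos and s < cutoff)
    tn = sum(1 for pos, s in rows if not pos and s >= cutoff)
    return tp, fn, fp, tn


print(confusion(rows, L3_CUTOFF))  # (12, 0, 0, 5)
```

No misses in either direction, but with only 5 tested negatives in this range the tally says little about how L3 will hold up as the database grows.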

           L111, the modal haplotype using the full 111 STR set, is not in the table, because L111 differs from C111 at only two STR markers, which are very variable, so for most samples L111 differs from C111 by step < 2. L77 step numbers differ from C90, because the same markers are not used;  C90 markers are selected to best distinguish C type, while L77 markers are selected to best predict L540+.
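           The reason the L111 and C111 steps stay close can be written as a one-line bound.  Assuming step is the sum of absolute repeat differences $d(x,y)=\sum_m |x_m-y_m|$ over the 111 markers (a simplification of the Ysearch method), the triangle inequality gives, for any sample $s$:

```latex
\[
\bigl|\, d(s, L_{111}) - d(s, C_{111}) \,\bigr|
\;\le\; d(L_{111}, C_{111})
\;=\; \sum_{m=1}^{111} \bigl| L_{111,m} - C_{111,m} \bigr|
\]
```

Only the two markers where the modals differ contribute to the right-hand sum.  If, for illustration, the two modals differ by one repeat at each of them, the bound is 2, so no sample's L111 step can differ from its C111 step by more than 2.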

           Nearest Neighbors at 111 markers:  The table includes a few samples just beyond the cutoffs (C90 step 7, L77 step 8).  These are all the samples I could find with C90 step less than 23, or with L77 step less than 19.  These “near neighbors” act as a calibration of the modal haplotypes using 67 or fewer markers - data on the right side of the table at 111 markers.  I hope the meanings of those code names are obvious in light of the 5 examples explained above.  For more discussion, see the topics C67Type.xls, L67Type.xls, C37Type.xls, C25, and C12.  Notice that the definitions C42(67) and L45(67) work, but without a wide gap.  As expected, 67 STR markers provide less confidence in assignments than 111 markers.  At 37 or fewer markers there is yet less confidence, and the definitions fail for a few samples.

           LnotC3(67) is a signature that seems to predict L “paratype” samples (L540 not C type):  (439, Δ389, 413b) = (11, 17, 25).  My confidence is not high in this signature, because it is based on only two samples, which might be just the luck of random mutations.

           C4(67) is the signature used by Haplozone cluster C since before L540 was discovered:  (390, Δ389, 447, 594) = (25, 19, 25, 12).

           C3(25) is the original Friedman signature, proposed years ago.  For this table I used the difference for DYS389: (390, Δ389, 447) = (25, 19, 25).  The table shows that it still works remarkably well.  However, there is selection bias, because some samples at C3 step 1 were not included in the table because so far all these have 37 or more markers and do not fit C type using the corresponding definition in the table;  some of these belong to V13, the father haplogroup of L540.  At C3 step 0 there are only a few more samples in the database but these are from outside V13 so these are not listed in the table.  Note that Sabieka is L540+ at C3 step 2, and a few others at step 2 are predicted C type based on more markers.  On the other hand, the table has at C3 step 2 two samples L540- and a few more predicted well outside C type.  The table indicates 22 more samples with only 25 markers at C3 step 2;  no doubt a few of these might eventually test L540+, but the probability for each one individually seems to be low.

 

Explanation:  The simplest explanation for what this table means: Butman (N91348) has a male line with node in the Y-DNA tree slightly older than the L540 mutation, so Butman is the closest “neighbor” to L540 but not in the L540 haplogroup.  Gebert (166692) and Fredeen (162917) each have male lines with nodes in the Y-DNA tree younger than the L540 mutation.  The node for C type is much younger.  More complex explanations are possible;  for more discussion see the Structure topic.

 

Notes;  column in the table at the far right:

           1:  Sample from a 3rd cousin of an L540+ sample, so assumed L540+ without testing.

           2:  The 12 marker set at step zero (perfect match to the 12 marker modal haplotype) provides a low confidence prediction of which samples might benefit from the L540 test.

           3:  Two samples, E10751 and 174240, are not in the E-M35 Project;  these two were brought to my attention by Paul Svercl (in the table), who noticed them in an E haplogroup tree by Marko Heinila, but that tree is no longer on-line.

           I use the Ysearch method for calculating step, which gives a result slightly different than the Haplozone method.  There is also a one marker discrepancy mentioned in the Ysearch topic.

           If you are a neighbor and wish to be added to this table, please let me know.

 

Gwozdz

           My sample is kit N16800.  N81304 is my 3rd cousin Gwozdz.

 

Kargul

 

           Aloysius Kargul (Kargol) is my closest STR match available on the web.  Kit 199446.  In May 2010, his daughter noticed, on ancestry.com, that he and I are perfect matches at 12 markers.  I studied the LDS microfilms and located his 1820’s Kargul ancestor living in a village in Poland only 20 miles away from the village of my Gwozdz ancestor.  I paid for his FTDNA sample.  Kargul is in the table above.  His L540 test came out positive, placing him in that new haplogroup.  We are 5 steps apart at 67 markers;  9 at 111 (4 Jul 2011 update).

           For estimating the size of C type or L540, my cousin and Kargul should not be included, because I recruited them, paying for their tests.  Family sets such as these distort size estimates, compared to random data.  In other words, C type really has only 16 samples, not 18 (on 4 July 2011) if compared to other clusters (which should also be adjusted for family sets).

 

Butman

 

           New topic 13 May 2011.  Update 17 Jul 2011:  Butman’s L540 SNP test just came out negative;  that means he is not a member of the new L540 haplogroup.

           Raymond Butman, kit N91348, is right on the edge of the predicted C type using the old 61 marker definition.  This sample is a recent addition to the M35 database.  His step using the 61 marker definition is 8.  My cutoff for the definition before this sample showed up was 7 because of the gap - no samples from step 7 to 12 at that time.  When I wrote this topic after Butman’s sample showed up, I changed my cutoff to 9 and I pointed out here that this sample might land in V13C (L540), but it was a close call, not a confident prediction.

           Although this sample matches most of the markers of the definition, it misses at the two best signature markers, DYS389-2 and DYS594.

           I subsequently developed a new L540 definition that excludes Butman.

           What does this mean?  The simplest explanation:  Butman’s node in the Y-DNA tree (his male line common ancestor - branch joint) seems slightly older than the nodes of L540 members (in the database so far today).  His node is older than the L540 mutation, and also seems older than the DYS389-2 mutation, and also older than the DYS594 mutation.  Most of his other STR values match the L540 definition because his node is not much older, so there has not been much time for more mutations.

           This simplest explanation is a good statistical prediction, not a proof.  Other less likely explanations are possible.  For example, Butman might be an outlier from a clade very distant from L540, where most men have different STR values, and he might have many STR values matching L540 due to the luck of random mutations in his male line.  Yet another possible explanation:  Butman might belong to a very small clade whose node with L540 is much older, even older than the node for the men at steps 7 and 8, but the ancestor at that node might just have happened to have STR values very close to the values for the L540 ancestor, due to the luck of random mutations, and now Butman is the only sample available from that very small clade.

 

Gebert

           Update 19 Dec 2014:

           I noticed Gebert’s sample on Ysearch and encouraged him to join the E-M35 project, which he did in 2011, kit 166692 in the table.  I helped pay for the L540 test and for the 111 extension.  He purchased Big Y in 2014.  Although he is L540+, his STR values do not quite fit C type.  Also, his STR values produce greater estimated genetic distance (step) from all C type samples than C type samples are from each other.  On this basis, I predicted that Gebert’s node in the tree is younger than L540 but older than C type.  Hochreiter’s Big Y produced 10 SNPs where Gebert is negative, so I hoped one of these would be a marker for C type, or at least younger than L540.  I tested all 10 of these for my sample, but I came up negative for all.  One explanation is that Gebert’s node is not much older than Hochreiter’s node, with all 10 of these in Hochreiter’s branch but not in my branch;  this would be a surprise because Gebert’s STRs imply an older node, so one or more of those 10 are expected on the main branch.  Another explanation is that Gebert has a statistically very unusual number of STR mutations, with back mutations in C type signature markers, and he is really a member of C type, with my node older than both of theirs;  this explanation also seems surprising.  So I don’t have a convincing analysis.  In retrospect, I should have purchased Big Y instead of just those 10 SNPs.  My Big Y is on order.  So is Svercl’s.

 

Fredeen

           Update 30 Sep 2011:

           Kit 162917, Fredeen, came up L540+ in May 2011.  This was significant because this was the first L540+ sample outside C type, later joined by Gebert.

           Logically, Fredeen and Gebert might be outliers from cluster C, with back mutations in 389II and multiple other mutations just due to the luck of random mutations.  This is very unlikely.

           These two samples most likely represent two old branches of L540, with older nodes than the C type node.  They do not match each other well at 67 markers.  Their closest matches at 67 markers are each other and C type samples.  There are no close STR matches to either in the 67 marker data.  Any future close STR matches would of course be predicted L540+ in the corresponding branch.

           Three markers are of interest here:  (385a,406,444) = (17,11,13) are the values for L540 and C type.  Most samples in the STR neighborhood just outside L540 have the ancestral values (16,12,12).  Fredeen is ancestral for two of these, (16,11,13) while Gebert is ancestral for one of them (17,12,13).  This is evidence that Fredeen’s node is older, but the evidence is statistically unconvincing.  We realize that these 3 markers may have experienced mutations after the two nodes of interest.  Both Fredeen and Gebert have 111 marker data, which does not help out for this question.

 

Hohnloser

 

           Hohnloser (kit N39989) fell into C type at 37 markers in 2010 (marginally), but not at 67 markers (not particularly close).  He is not a member of cluster C because his sample does not match the Friedman signature (originally at 25 markers).  He provides an interesting example of how statistics works - in this case, due to the luck of random STR mutations, the sample is close to C type only at 37 markers.

           Hohnloser does not belong to the L540 haplogroup because his SNP test came out negative.

           Hohnloser has extensive family tree research results.  He administers a Hohnloser project at FTDNA.  He exchanged helpful email discussions with me.

 

Hochreiter

           New topic 12 Dec 2014:

           His STRs are a good fit for C type.  Kit N45041 in the table.  He purchased Big Y in 2014.  Hochreiter is my closest STR match (excluding a cousin and an apparent distant cousin), so I expected to be on the same twig of the tree as Hochreiter.  Hochreiter’s Big Y produced 10 unique SNPs (not found in Gebert);  I tested all 10 and came up negative for all.  See the Gebert discussion above for details.

 

Structure of the L540 Haplogroup

           Complete rewrite 16 Jul 2011.

           C type is a hypothetical haplogroup within L540.  The evidence is presented throughout this web document, particularly in the topics about my 67 marker analysis files and about DYS389II.  I estimate about 99% confidence that my C type definition corresponds to a clade that will someday be proven a haplogroup by a newly discovered SNP, once tests for new SNPs become more comprehensive and lower in cost, continuing the current trend.  I also estimate that 98% of the samples predicted C type by my definition will end up in such a haplogroup (80% confidence range:  more than 90%).

           This confidence is based on a combination of statistical calculation, plus judgment where calculation is not possible;  for more discussion see my confidence topic.

           C type seems to be about 80% of L540, based on only 2 L540+ not C type, compared to 7 independent C type samples, for 77.8%, in the 67 marker data.  Confidence in this 80% is not high because it is based on only two samples.  There may be more samples in the STR neighborhood that will test out L540+ in the future, with STR values quite different than my current L540 definition.  Also, there may be L540- samples in the future that match my current L540 definition.
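           To put a number on how weak a 7 out of 9 estimate is, here is a minimal Python sketch.  It swaps in the standard Wilson score interval for a binomial proportion, which is an illustrative choice and not a calculation from the analysis files;  it assumes the 9 samples are independent and randomly selected, and it covers statistical uncertainty only.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion; z=1.96 gives roughly 95%."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

c_type_samples = 7   # independent C type samples in the 67 marker data
other_l540 = 2       # L540+ samples outside C type (Fredeen, Gebert)
n = c_type_samples + other_l540

fraction = c_type_samples / n
low, high = wilson_interval(c_type_samples, n)
print(f"C type share of L540: {fraction:.1%}")
print(f"approximate 95% range: {low:.0%} to {high:.0%}")
```

           With these counts the point estimate is 77.8% and the interval runs from roughly 45% to 94%, wide enough to support the caution above.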

           In other words, I have high confidence that C type predictions are L540+, but not high confidence in the L540 predictions outside C type.

           I expect to update my L540 definition as data accumulates, thereby improving the confidence for L540 STR predictions outside C.

           For now, I recommend the L540 test to everyone in the neighborhood table, with particular emphasis on those with closest step to my current L540 definition, and with the caveat that there may be more L540 beyond that table.

           C type is close to the same as a clade defined by a particular mutation (or mutations) at the DYS389II marker, from value 30 to 32.  We can even speculate that this clade is identical to C type.

           A particular mutation at the DYS594 marker, from 12 to 11, seems to define a “father” clade that is slightly older than L540.  The same might be true for a particular mutation at the DYS636 marker, although more 111 marker data is needed to estimate if 636 is younger or older than 594 and / or L540.  With more data, these two markers will provide a foundation from which the age of other mutations can be estimated, because independent mutations in other markers are unlikely to also have mutations in these two foundation markers.

           The L540 data is bimodal in a number of markers.  Each of these bimodal markers is evidence of a significant subclade division that might be determined for L540 structure.  However, these various suggestions point to different divisions;  so far no two of them are strongly correlated.  With more data, it may be possible to split L540 based on statistical correlation of STR markers, using my mountain method.
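           A minimal sketch of one way to check whether two bimodal markers point to the same division:  split each marker into its two modes and compute the phi coefficient (the correlation of two yes/no variables).  This is a simple pairwise check swapped in for illustration, not the mountain method;  the marker values and thresholds below are hypothetical.

```python
import math

def split(values, threshold):
    """Turn a bimodal marker into a 0/1 mode indicator (1 = value above threshold)."""
    return [1 if v > threshold else 0 for v in values]

def phi(a, b):
    """Phi coefficient: Pearson correlation of two 0/1 lists."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Hypothetical values for two bimodal markers across the same eight samples.
marker_a = [13, 13, 14, 14, 13, 14, 13, 14]
marker_b = [10, 10, 11, 11, 10, 11, 11, 10]

print(round(phi(split(marker_a, 13), split(marker_b, 10)), 2))
```

           A phi near 1 (or -1) would mean the two markers split the samples the same way, supporting a real subclade;  values near 0 correspond to the present situation, where no two bimodal markers are strongly correlated.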

 

Ysearch

 

           Update 17 Jul 2011:

           479H7 is a direct link to my modal haplotype for L540.

           QAZ7P is a direct link to my modal haplotype for C type.

           If you are not listed in the table above you can compare your data on Ysearch.  You can compare your step genetic distance to these modal haplotypes if you have the standard 12, 25, 37, or 67 markers.  The comparison may not work if you have a non standard marker set.  The cutoff for each marker set is given in the legend in the table above.
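           The step genetic distance itself is simple to compute:  for each marker the two haplotypes share, add the absolute difference in repeat counts.  A minimal Python sketch follows;  the marker values and the cutoff are placeholders, and it does not reproduce any special handling Ysearch may apply to multi-copy markers, DYS389II, or null values.

```python
def step_distance(sample, modal):
    """Simple stepwise distance: sum of |sample - modal| over shared markers."""
    shared = sorted(sample.keys() & modal.keys())
    return sum(abs(sample[m] - modal[m]) for m in shared)

# Placeholder values for illustration only; these are NOT the 479H7 or QAZ7P haplotypes.
modal = {"DYS393": 13, "DYS390": 24, "DYS19": 13, "DYS391": 10}
mine = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10}

CUTOFF = 2  # hypothetical cutoff (inclusive); the real ones are in the table legend
distance = step_distance(mine, modal)
print(distance, "within cutoff" if distance <= CUTOFF else "outside cutoff")
```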

           Brief description of Ysearch.  Link to the site home:  http://www.ysearch.org.

           To join Ysearch, click on the Create A New User tab, where you can upload your Y-DNA STR data from a number of testing services.  Or, you can type in your data.  You end up with a “User ID”.

 

           Instructions for comparison to V13C at Ysearch:

           Click here:  Research Tools (or click on the tab with that name)

           Copy the following line into the “UserIDs” bar at the Research Tools page:

                                  USEID, 479H7, QAZ7P

           Change USEID to your User ID.

           You need to type the Captcha puzzle for access.

           Click on “Show genetic distance report” to see your step genetic distance from C type and from L540.

 

Ancestry.com

 

           www.Ancestry.com  is the web page for a commercial DNA testing company.  Men with Y-DNA test results can choose to make results available for matching to others.  Kargul originally matched with me at this site.

           I last checked for matches 16 May 2011.  There are 9 close matches of Y-DNA to Kargul & me, but these are not close enough to include in my Neighborhood Table.

 

Age of C Type

Age of L540

 

           Comment 25 Mar 2013:  this topic needs to be updated using the latest data.  The new on-line Excel file versions do not yet have the ASD sheets.  Coming soon.

           Topic update 11 Jul 2011:

           The discussion in this topic is based on the sheet “ASD” in the two 67 marker analysis files.

           Average Squared Distance (ASD) is equivalent to variance of STR values.  Most people use ASD to calculate age in genetic genealogy, as I explain in my publications.  The ASD method has large known systematic uncertainties, discussed in my publications, which make age calculation uncertain.  It is not possible to calculate a confidence range because the systematic errors might be larger than the statistical errors, even for small samples of data.
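           As a rough illustration of the ASD idea, the Python sketch below treats each marker’s ASD as the variance of its values and converts it to an age by dividing by a mutation rate and multiplying by a generation length.  It assumes a symmetric single-step mutation model and a star-like genealogy;  the samples, mutation rates, and 30 year generation length are hypothetical placeholders, not the values in C67.xls, and none of the systematic corrections discussed in this topic are applied.

```python
from statistics import fmean

def marker_asd(values, divide_by_n_minus_1=False):
    """Average squared distance of one marker's values from their mean (its variance)."""
    mean = fmean(values)
    squares = sum((v - mean) ** 2 for v in values)
    return squares / (len(values) - 1 if divide_by_n_minus_1 else len(values))

def asd_ages(samples, rates, years_per_generation, divide_by_n_minus_1=False):
    """Return (overall age, per-marker ages) in years.

    Assumes ASD grows by roughly one mutation rate per generation, so
    overall age = total ASD / total rate * years per generation.
    """
    asd = {m: marker_asd([s[m] for s in samples], divide_by_n_minus_1) for m in rates}
    overall = sum(asd.values()) / sum(rates.values()) * years_per_generation
    per_marker = {m: asd[m] / rates[m] * years_per_generation for m in rates}
    return overall, per_marker

# Hypothetical samples, mutation rates (per generation) and generation length.
samples = [
    {"A": 13, "B": 24, "C": 10},
    {"A": 13, "B": 25, "C": 10},
    {"A": 14, "B": 24, "C": 10},
    {"A": 13, "B": 24, "C": 10},
]
rates = {"A": 0.002, "B": 0.004, "C": 0.001}
overall, per_marker = asd_ages(samples, rates, years_per_generation=30)

print(f"overall age: {overall:.0f} years")
for marker, age in sorted(per_marker.items(), key=lambda kv: kv[1], reverse=True):
    print(marker, round(age))  # markers with no mutations come out at zero age
```

           The per-marker list sorted by age mirrors the idea of sorting markers by age on the ASD sheet;  markers with no mutations contribute zeros that pull the overall estimate down.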

           C type is quite young.  The age using all 67 markers comes out 805 years, cell N12 on the “ASD” sheet in C67.xls.

           Although I do not have high confidence in estimating the exact age of C type, there is additional evidence that C type and L540 are young.  The fact that C37, using the first 37 markers, provides a reasonable definition of C type is evidence of youth.  Old haplogroups do not provide reasonable modal definitions using all 37 markers, because of the wide variation in the rapidly mutating markers.  Another way of saying this:  using all 37 markers, there is a lot of overlap of old haplogroups.  Another way of saying it:  Isolation in STR values is evidence of youth.  My publication elaborates on this.

           The fact that C type samples can be extracted from V13 using only a 4 marker signature is also evidence of isolation and youth.

           Low SBP is evidence that C type and L540 are well isolated;  see the analysis for SBP.

           I expect the age to creep up somewhat as new data is discovered.  The well known statistical correction for this expectation is to divide by N-1 instead of N when figuring ASD.  My files do not use N-1 because that is not the common practice in genetic genealogy.  Using N-1 the age is 939 years instead of 805.  This estimate includes a best guess for future samples;  eventually samples will show up that fit C type but have more mutations than the samples so far, either because of true older nodes within the Y-DNA tree or just due to bad luck in random STR mutations.
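           The N to N-1 change is a simple rescaling:  ASD is a sum of squares divided by N, so dividing by N-1 instead multiplies the ASD, and any age proportional to it, by N/(N-1).  This short Python check, assuming the age is exactly proportional to ASD, reproduces the 939 years quoted here from 805 with N = 7, and the 907 years quoted later in this topic from 806 with N = 9.

```python
def divide_by_n_minus_1(age_years, n_samples):
    """Rescale an ASD age computed with divisor N to the divisor N-1.

    ASD is a sum of squares divided by N, so switching to N-1 multiplies it
    (and any age proportional to it) by N / (N-1).
    """
    return age_years * n_samples / (n_samples - 1)

print(round(divide_by_n_minus_1(805, 7)))  # 7 random C type samples -> 939
print(round(divide_by_n_minus_1(806, 9)))  # adding Gwozdz and Kargul -> 907
```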

           Conclusion:  My best guess for the age of C type is about 1,000 years. 

           Of course, this result has very low confidence, because it is based on only 7 samples.  The true age might be up to a factor of 4 older.  Or it might be a lot younger.  Although it is not possible to calculate a confidence range that includes systematic errors in age estimation, I suppose a reasonable high confidence range estimate would be 500 to 2000 years old.

           Discussion of C age estimation.  My calculation excludes two samples, Gwozdz and Kargul, because I recruited them and the calculation requires a random selection.  Including them yields 806 years, corrected to 907, not significantly different.

           All 67 markers are valid for use in a young clade like this, because there are no recLOH issues in the C data;  recLOH issues usually cause problems when all 67 markers are used.  Those analysis files include a comparison to the classic “Thomas” method, which I explain in my publication;  it yields 292 (vs 805) years, but I consider that just a demonstration that the old method uses too few markers, which by luck do not vary much in the C data.

           My files allow the user to easily vary the markers and the samples.  Boyd seems to be the oldest sample;  removing him lowers 805 to 712, corresponding to the age without his hypothetically older clade.  The two “oldest” markers (the far right of the ASD sheet has a sort by age) are DYS460 and DYS385a, with 3975 and 3058 years respectively;  this is evidence that those markers might correspond to clades within C type, but the evidence is not yet compelling because some markers are bound to come out old just by luck.  My definition of C type of course excludes such old markers, which is a proper procedure for a good definition.  But it is not fair to exclude old markers in age estimates, because the zeros balance the old ones statistically;  C type has 45 markers with zero age, the lucky ones with no mutations.  Many zeros are evidence of youth but also evidence of few samples (only 7).  With more samples there will be fewer zeros.

           Most published ASD age calculations include a correction making the result older, but the reason for that correction is population bottlenecks, which reduce ASD.  Since C type is large for a very young clade, I’m guessing the age corresponds to a rapid population expansion after which there were no significant bottlenecks.  If I am right, no such correction would be appropriate.  Also, most published ASD ages use N instead of N-1, so a published correction method should be applied to my first number, 805.
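           Varying the samples, as with removing Boyd, amounts to a leave-one-out check.  A minimal Python sketch, with hypothetical samples and mutation rates:  recompute the ASD age with each sample left out and list which removal lowers the age most.  A sample whose removal drops the age sharply is a candidate for an older sub-branch, although one sample can also stand out just by luck.

```python
from statistics import pvariance

def asd_age(samples, rates, years_per_generation):
    """ASD age in years: total per-marker variance (divisor N) / total mutation rate."""
    total_asd = sum(pvariance([s[m] for s in samples]) for m in rates)
    return total_asd / sum(rates.values()) * years_per_generation

def leave_one_out(named_samples, rates, years_per_generation):
    """Recompute the age with each sample removed; lowest resulting age first."""
    ages = {}
    for name in named_samples:
        rest = [s for other, s in named_samples.items() if other != name]
        ages[name] = asd_age(rest, rates, years_per_generation)
    return sorted(ages.items(), key=lambda item: item[1])

# Hypothetical data: S4 carries extra mutations, as an older sub-branch might.
named = {
    "S1": {"A": 13, "B": 24},
    "S2": {"A": 13, "B": 24},
    "S3": {"A": 13, "B": 24},
    "S4": {"A": 15, "B": 26},
}
rates = {"A": 0.002, "B": 0.004}  # hypothetical per-generation mutation rates
full = asd_age(list(named.values()), rates, 30)
ranked = leave_one_out(named, rates, 30)

print(f"all samples: {full:.0f} years")
for name, age in ranked:
    print(f"without {name}: {age:.0f} years")
```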

           An age calculation for L540 is not appropriate, although L67.xls does it, getting 971 years in cell N12 of the ASD sheet.  That’s not fair.  C type dominates the data, but C type is really only one man, the MRCA.  Our data for L540 is really only 3 men:  the hypothetical C type MRCA, Fredeen, and Gebert.  It is well known that estimating the TMRCA of two men is highly uncertain, just due to the luck of random mutations.  For 3 men it is not much better.  The L540 age (TMRCA) is surely older than C type, just a bit older based on the meager data here of 2 men with STR values different than C type - but not very different at 67 markers than the C type samples with highest step.

           Age of a mutation is of course older than the TMRCA because there should almost always be multiple generations between nodes.  We know the L540 mutation happened after the node for Butman and before the nodes for Fredeen and Gebert.  That assessment will get better with more data.

           I said in the Abstract here that L540 might be twice as old as C type.  That is just a guess.

 

Origin of L540 and C type

           Update 17 Jul 2011:

           The neighborhood table shows that 9 of the 18 cluster C men indicated “Germany” in the “Origin” field of their data.  One of the two L540 men outside C indicates “Germany”.  That is very good (although not convincing) evidence that the ancestors (MRCA) of C type and L540 lived in what is now Germany.  A caveat:  men of German origin are more likely to purchase a DNA test and submit data to web databases.  It is obvious from data searches, in Ysearch for example, that men of east European ancestry are under represented.  This sample bias is difficult to measure, but I doubt the bias is sufficient to rule out a German origin as our best guess.

           The parent V13 haplogroup is concentrated in the Balkans, according to density maps on the web.  I’m guessing that our L540 MRCA lived in Central Europe, but I do not know that.  This cannot be checked with data available today, because the published Balkan Y-DNA data has too few markers to distinguish L540 or C type.  I look forward to the near future when data with more STR markers become available from the Balkans to verify my guess.  L540 SNP data would help, where I expect almost all to be negative from the Balkans.  On the other hand, if L540 is common in the Balkans, that would imply a probable MRCA origin in the Balkans, with subsequent expansion into Central Europe.

           Bird published evidence for a hypothesis that E1b1b1a1b1a (V13) appeared in England, concentrated at the two locations of ancient Roman garrisons, because of men from Moesia Superior who joined the Roman Legions when the Romans conquered the Balkans.

           Speculation:

           Model L540A:  V13 in Europe springs largely from Roman Legionnaires from the Balkans.  Due to the statistics of Y-DNA, most men do not form lasting clades, but many Balkan Legionnaires were lucky enough in their male line descendants so that many small V13 clades in Europe today correspond to individual Roman Legionnaires.  Because enlistment in Balkan armies, and subsequent enlistment in the Roman army, is largely random from the point of view of Y-DNA, these clades are a random selection from a much larger population, so the Legionnaire founders had very variable STR values.  The clades today have STRs quite different from each other.  Most clades are small enough that no samples, or only 1 or 2 samples, are present from each in the databases today.  The two L540 samples from outside C type, and the one sample just outside L540, and others not yet tested for the L540 SNP, represent such small clades.  C type is an exception, with 18 samples available today.  The MRCA of C type was a descendant of one of these Legionnaires, but that MRCA lived about 1,000 years later, in what is now Germany.

           Model L540A1:  C type is larger just by luck.  Statistically, some clades are necessarily larger than others.  It is not very surprising that one clade is unusually large.  We do not notice small clades, so of course we are now studying C type because it stands out.

           Model L540A2:  C type is larger because a descendant who lived about 1,000 years ago was a king or otherwise very prominent man, so his family grew much faster than others.  His ancestors were not prominent, so C type is isolated, but not more isolated than those other small clades from Roman times.

           Model L540A3:  C type is larger because of a local population expansion during Medieval times.  The C type MRCA was one of many who participated in these good times, but the others were from other haplogroups.  For example, the population expansion might have been associated with a Germanic R1b tribe, where the C type MRCA was an outsider who joined the tribe before the expansion, along with outsiders from other haplogroups.

           Model L540A4:  C type is larger for another population expansion reason.

           Model L540Aa:  The relatives of the Roman Legionnaires stayed in the Balkans.  In the near future L540+ samples will show up from the Balkans.  C type will show up, and it will not be particularly isolated in STR values, because the ancestors will not be the same as the C type MRCA.

           Model L540Ab:  There was a population bottleneck (or 2 or more bottlenecks) in the Balkans during the past 2,000 years.  War, famine, whatever.  There was a later population expansion in other haplogroups, so very few if any L540+ samples will show up in the Balkans.

           Model L540Ac:  That parent population did not get entirely wiped out.  It survives, in a remote area.  Maybe a group of villages in the Balkan mountains.  The population has not grown much over the centuries.  If we go there and test for Y-DNA we’ll find lots of L540 men, some of them C type.

           Model L540B:  Not Roman.  A tribe of barbarians showed up in Germany about 1,000 years ago.  Prior to that, they passed through another region, not necessarily the Balkans, where a lone L540 individual joined them.  This is similar to Model L540A3.  There are Ba, Bb, Bc variations similar to Aa, Ab, Ac.

           Model L540C:  He didn’t join a Roman army.  He was a medieval trader.  A very charming traveling salesman.  He fathered children all over central Europe, mostly in what is now Germany.  This model has similar variations to models A and B.

           Model L540D:  I can think of other speculative scenarios.  I’m sure you can, too.

           The point of these examples:  we don’t know the history, but C type is unusual in that it is young, small, and well isolated in STR values (a small mountain in haplospace).  It is not closely related to the rest of L540 or V13.  The migration history of the very large V13 parent haplogroup may or may not be relevant to the history of the relatively small C type hypothetical haplogroup.

 

Validity of C Type

 

           Update 10 Jul 2010.  Quite frankly, I was surprised by cluster C.  Friedman did a good job finding this one.  I admit I dismissed it when I first saw cluster C in 2007 because it was so small that statistical significance did not seem possible to me.  I postponed analysis until Jan 2010, independently verifying cluster C as C type.

           By “valid” I mean a cluster whereby most of the samples belong to a single clade, and whereby very few other samples in the database belong to that clade.  In other words, a valid cluster should eventually have a corresponding SNP discovered.  Throughout 2010 I confidently predicted such an SNP here in this topic, although I doubted it would be discovered soon.  L540 turned out to be almost the same as C type, although slightly larger and quite a bit older, as discussed elsewhere in this web page.  As samples predicted C type test L540+, this adds evidence that C type corresponds to a clade.

 

My WTY Analysis

 

           Update 23 Feb 2012:  Fifteen new SNPs were discovered in my “Walk Through the Y” (WTY).  L535 through L547, L614, and L618.  All 15 are available as commercial SNP tests from FTDNA.

           My WTY test read about 200,000 base pairs in Feb 2011.  In Feb 2012 the test has expanded to twice that many.  For details, here is a link for this "WTY" commercial product from FTDNA.

           I announced 8 new SNPs here on 29 Mar 2011.  The count on 30 Mar was 13 new SNPs in my WTY.  L614 was added in June.  L618 was added in August.  That was a lot more than I expected.  I now realize that’s because FTDNA expanded the number of DNA bases included in WTY just before my test.  Also, I seem to have been the first WTY from E-M78 in quite some time.  Since then, a few others from M78 and V13 have tested, so there are quite a few more new SNPs of interest recently discovered.

           I tracked the status on these 15 SNPs right here on this web page for a year, in detail.  Recently the positions in the Y-DNA tree have been determined for most of these SNPs of interest to me.  I recently removed most of the detail from this page.  I’m leaving the Summary, below, for a while because other people have links to that Summary.  I’ll remove most of this, including the Summary, later in 2012.

           In late 2011 the SNP Tracker was set up, as part of the E-M35 Project, to track all new SNPs of interest.  That’s another reason for me to drop my details here.  That SNP Tracker merges data from WTY, from the 1000 Genomes, and from SNP tests by members of the E-M35 Project.

 

SNP Summary

 

           Update 18 Jan 2013.  For a detailed SNP tree of the E-M35 haplogroup, see the SNP Tracker.  This topic used to have a summary of the SNPs found in my WTY, but the SNP Tracker is now a better place to find an update.  Only L540 defined a new haplogroup.  L542 is equivalent to V13.  The others are all equivalent to known haplogroup SNPs older than V13.

 

SNP Test Orders

           Update 16 Dec 2014:

           SNP tests cost about $39 each from FTDNA if your sample is already there from previous testing.  From your FTDNA home page, in the Y-DNA section, click on “Haplotree & SNPs”.  Next, below the haplotree, just above your SNP results (bottom of the page), click on “advanced SNP order form”.  (Do not click on “Order Selected SNPs” unless the SNP you wish is available for selection.)  Next, the box “Test Type” should say “SNP”.  Type the SNP code (for example L540) into the “Find” box to search for it.  Click on “Find” and when the SNP comes up click on “Add” to order it.

           FTDNA has been slow adding new SNP tests.  If FTDNA does not have the SNP for your new haplogroup, try Yseq.

 

References & Sources

 

           Big Y:  https://www.familytreedna.com/learn/y-dna-testing/big-y/.  A commercial product at FTDNA for reading most of the DNA of the Y chromosome.  New SNPs are being discovered in the Big Y data provided by customers.  This Big Y test replaces the smaller Walk Through Y product, no longer offered.  Walk Through Y read more than 200,000 base pairs of the Y chromosome;  Big Y reads 10 million.  Other companies offer similar tests;  I guess I recommend FTDNA because I like the convenience of most L540 data being available at the E-M35 project, next:

           E-M35, a project at FTDNA, is my main source of data.  Previously called E3b.  Link:  http://www.familytreedna.com/public/E3b.  The official name today would be E1b1b1.  ISOGG changes the name when new defining SNPs are discovered, so the name may change again in the future.  M35.1 is the name of the SNP that defines E1b1b1 within haplogroup E.

           Haplozone is a web site for analysis of data from the E-M35 project.  This site has not been updated since September 2013.  Link:  http://www.haplozone.net/e3b/project.  It has the data from E-M35 plus some data added from sources other than FTDNA, so this database is larger than the E-M35 project’s.  Page with a listing of proposed clusters:  http://www.haplozone.net/e3b/project/cluster/.  Page with L540 / C cluster samples:  http://www.haplozone.net/e3b/project/cluster/42.  Discussion forum:  http://community.haplozone.net/

           SNP Tracker is a web page added to the E-M35 project in late 2011, to keep track of all the new SNP branches in M35.  http://tinyurl.com/e-m35-snps

           The V13 data:  http://www.haplozone.net/e3b/project/cluster/10.  V13 is the defining SNP for E1b1b1a1b1a, a major branch haplogroup in E, and “father” of L540.  That page of data does not have the data for samples that have been assigned to clusters as subdivisions of V13, just the data that does not fit any downstream proposed cluster.  The number code for other clusters can be typed over that “10” to quickly get to other cluster data.

           Cluster C Data:  http://www.haplozone.net/e3b/project/cluster/42.

           Victor Villarreal is an administrator for the E-M35 (E3b) Project.

           Andrew Lancaster is an administrator for the E-M35 (E3b) Project.  Andrew has been particularly patient with me with long helpful email discussions.

           Elise Friedman is a co-administrator for the E-M35 (E3b) Project and the administrator for the Jewish E3b project.

           Peter Gwozdz.  That’s me.  pete2g2@comcast.net.

           Yseq:  www.Yseq.com.  A company that provides Y-SNP tests at competitive prices and fast turnaround.

 

Revision History

2010 Jan 14 original draft version

2010 13 updates

2011 28 updates

2012 6 updates

2013 7 updates

2014 Mar 10 updates; major rewrite including update of the Neighborhood table

2014 Apr 21 minor update of table;  update ISOGG code for L540;  change “new” to “small” in the title

2014 May 3 Big Y news

2014 May 7 S3003 is father of L540

2014 May 24 three more L540+;  Big Y data from Hochreiter & Gebert

2014 June 21 a few minor updates

2014 Aug 10 a few minor updates

2014 Oct 11 update SNP ordering, minor update of table, news about ongoing SNP tests

2014 Nov 14 update of the Neighborhood Table

2014 Dec 14 update of a few topics including Abstract and L540

2014 Dec 15 a few more topics updated

2014 Dec 16 a few more topics updated

2014 Dec 17 a few more topics updated;  P cluster topic deleted

2014 Dec 19 a few more topics updated