



This paper presents a solution for analyzing diallel data using a uniform approach based on the theory of restricted linear models. It derives formulae for estimating diallel parameters and their standard errors, and obtains uniform statistics for hypothesis testing of parameters, factors, and differences between general and specific combining abilities. The authors have developed a Windows software program, GSCA, for analyzing a flexible diallel linear model that can handle various diallel crosses and data structures. GSCA can estimate fixed effects of general and specific combining abilities and provide hypothesis tests for these parameters and various factors.
126 Silvae Genetica 61, 3 (2012)
Abstract
Diallel mating designs have been extensively employed by crop and tree breeders to gain genetic information, but the analysis of diallel data is challenging because the same parent acts in both male and female roles. Theoretically, little attention has been paid to statistical inference and hypothesis testing for a fixed diallel linear model. In this paper we provide a uniform solution to any fixed diallel linear model with matrix expression based on the theory of restricted linear models. We derive formulae for estimating diallel parameters and their standard errors, and obtain uniform statistics for hypothesis testing of parameters, factors and differences between general combining abilities (GCA) or specific combining abilities (SCA). To put the result into practice, we have developed a Windows® software program “GSCA” for analyzing a flexible diallel linear model that could contain the GCA, SCA, reciprocal, block and environment effects as well as interaction effects such as GCA by environment. GSCA can perform analyses not only for Griffing’s four types of diallel crosses but also for more complicated diallel crosses, whether the data structure is balanced or unbalanced. A real example is given to illustrate the convenience, flexibility and power of our software for diallel analysis.
Key words: Diallel mating design, restricted linear models, general combining ability, specific combining ability, least squares.
Introduction
Diallel mating designs have been widely used in crop and tree breeding programs to obtain genetic information on the parents involved for determining breeding strategies (SPRAGUE and TATUM, 1942; JINKS, 1954; BURLEY et al., 1966; SNYDER and NAMKOONG, 1978; YEH and HEAMAN, 1987; JONSSON et al., 1992; WU and MATHESON, 2005; GARDNER et al., 2007). A diallel cross consists of all possible crosses between a number of varieties. With $p$ varieties there are $p^2$ combinations, consisting of $p$ selfings and $p(p-1)$ crosses, since reciprocal crosses, which alternate the pollen and ovum parents, may differ through maternal or paternal effects (MAYO, 1987). As $p$ increases, $p^2$ becomes impossibly large. For this reason, many methods have been developed for the examination of partial diallel crosses (MAYO, 1987).
By CHUNFA TONG1), GUANGXIN LIU, LIWEI YANG and JISEN SHI2)
Key Laboratory of Forest Genetics and Biotechnology of Ministry of Education, Nanjing Forestry University, Nanjing 210037, China
(Received 3rd December 2001)
1) Corresponding author: CHUNFA TONG. Key Laboratory of Forest Genetics & Biotechnology of Ministry of Education, Nanjing Forestry University, Longpan Road No. 159, Zip code 210037, Nanjing, Jiangsu Province, China. Phone and Fax: 86-25-
DOI:10.1515/sg-2012-
SPRAGUE and TATUM (1942) originally defined the concepts of general combining ability (GCA), the average performance of a line in hybrid combination, and specific combining ability (SCA), which refers to specific crosses that exhibit superiority or inferiority relative to the average performance of the lines involved. MAYO (1987) defines GCA more clearly as the average performance of a strain in a series of crosses and SCA as the deviation of a particular cross from the performance predicted on the basis of GCA.
GRIFFING (1956a) summarized diallel crosses into 4 categories (i.e. full diallel, full diallel without selfings, half diallel with selfings, and half diallel without selfings) and provided formulae for calculating the fixed effects of GCAs and SCAs as well as the variance components of GCAs and SCAs as random effects for balanced data. Griffing’s analysis ignored the possible selfings, since these can introduce bias, but as noted by GILBERT (1958), if the particular parents are of interest in themselves, it may be more important to include the selfings (MAYO, 1987). Because each observation carries two levels of the same main effect, and because missing plots or missing crosses are common in a diallel mating design (especially in forest trees), it is difficult to estimate the related genetic parameters in a diallel statistical model (WU and MATHESON, 2001; XIANG and LI, 2001).
Two kinds of statistical linear models are commonly employed to analyze diallel crosses with balanced or unbalanced data. One is the fixed-effects linear model, in which GCA and SCA are treated as fixed effects to be estimated in order to rank the parents for selection (HUBER et al., 1992; WU and MATHESON, 2000). The other is the random-effects linear model, in which GCA and SCA are considered random effects for variance component estimation and, further, for estimating heritabilities and genetic correlations (WU and MATHESON, 2001; XIANG and LI, 2001).
NELDER (1977) discussed various views as to the procedures which involve ‘fixed’, ‘mixed’ and ‘random’ models. In this study, we focus on the fixed-effects linear model to estimate GCA and SCA and to provide hypothesis testing for these parameters and various factors.
Early solutions for fixed GCA and SCA effects (GRIFFING, 1956a; BECKER, 1975; FALCONER, 1981; HALLAUER and MIRANDA, 1981) were limited to balanced data and based on ordinary least squares (OLS) with the restriction that the sum of all effect estimates for a factor equals zero (HUBER et al., 1992). Later, HUBER et al. (1992) and WU and MATHESON (2000) described the fixed linear models in matrix notation and gave the OLS estimates by reducing parameters using the sum-to-zero restrictions. Although their method can deal with unbalanced data, it requires the analyst to reconstruct the linear model almost by hand, and some problems remain, such as hypothesis testing for the reduced parameters.
A series of analytical tools have been developed for the cumbersome computations of the GCA and SCA in diallel crosses. These tools fall primarily into two classes: 1) packages written in high-level computer programming languages, and 2) programs based on standard commercial packages such as the Statistical Analysis System (SAS). The Fortran program DIALL was developed by SCHAFFER and USANIS (1969) only to estimate GCA and SCA variance components, whereas another Fortran program written by SNYDER (1975) could calculate the fixed GCA and SCA effects. The drawbacks of these Fortran programs include an unfriendly user interface, inability to handle large data sets and inflexibility in choosing a fixed- or random-effects model (JOHNSON and KING, 1998; XIANG and LI, 2001). Since SAS is a powerful tool for statistical analysis, a number of SAS programs have been developed for diallel analysis (ZHANG and KANG, 1997; JOHNSON and KING, 1998; WU and MATHESON, 2000; WU and MATHESON, 2001; XIANG and LI, 2001; MURRAY et al., 2003; ZHANG et al., 2005). Some of these SAS programs can estimate the fixed GCA and SCA effects and their standard errors and provide hypothesis testing for genetic parameters and factors. Others can estimate variance components of GCA and SCA. The SAS code of these programs is relatively complicated because the same parent generally plays both the male and female roles in a diallel mating design, so that the SAS procedures cannot be directly applied to analyze diallel data. Users must be familiar with SAS programming in order to modify the SAS code when they adopt these programs to analyze their data. Now that Microsoft Windows® is the overwhelmingly popular operating system on personal computers, Windows-based software designed specifically for analyzing diallel crosses is desirable.
In this paper we describe how to construct the fixed linear model and its linear restrictions in matrix notation for a diallel mating design. With the matrix expression of the diallel linear model, the estimates of the parameters, such as the fixed GCA and SCA effects, and their standard errors are each given by a single formula based on the theory of linear models with linear restrictions (WANG and CHOW, 1994; RAO et al., 2008).
Uniform statistics are obtained for hypothesis testing of each parameter and various factors such as GCA, SCA and the interaction between GCA and environment. A formula is presented for hypothesis testing of the difference between GCAs or SCAs. Windows® software has been developed for analyzing a flexible diallel linear fixed effect model that could contain the GCA, SCA, reciprocal, block and environment effects as well as interaction effects such as GCA by environment. The software can perform analyses not only for Griffing’s 4 diallel mating designs but also for more complicated diallel designs, whether the data structure is balanced or unbalanced. The published radiata pine data (WU and MATHESON, 2000) was analyzed to illustrate the convenience, flexibility and power of our software for diallel analysis.
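To make the four Griffing categories concrete, the following minimal Python sketch enumerates the crosses in each design and counts them. The function name `diallel_crosses` and its two flags are hypothetical names chosen for illustration, and the choice of 6 parents is only an example.

```python
from itertools import product


def diallel_crosses(p, reciprocals=True, selfings=True):
    """Return (female, male) pairs for p parents labelled 1..p.

    Hypothetical helper for illustration: `reciprocals` keeps both (i, j)
    and (j, i); `selfings` keeps the (i, i) combinations.
    """
    crosses = []
    for i, j in product(range(1, p + 1), repeat=2):
        if i == j and not selfings:
            continue  # drop selfings
        if i > j and not reciprocals:
            continue  # keep only one direction of each cross
        crosses.append((i, j))
    return crosses


# Griffing's four categories, in the order listed in the text
designs = {
    "full diallel": dict(reciprocals=True, selfings=True),
    "full diallel without selfings": dict(reciprocals=True, selfings=False),
    "half diallel with selfings": dict(reciprocals=False, selfings=True),
    "half diallel without selfings": dict(reciprocals=False, selfings=False),
}

counts = {name: len(diallel_crosses(6, **opts)) for name, opts in designs.items()}
for name, n in counts.items():
    print(f"{name}: {n} crosses")
```

For p parents the counts are p^2, p(p-1), p(p+1)/2 and p(p-1)/2 respectively, which shows how quickly a full diallel grows with the number of parents.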
Statistical Methods
Restricted Linear Model and Least Square Estimates
Consider a simple linear model for a diallel mating design, which is usually specified as
$$y_{ijk} = \mu + G_i + G_j + S_{ij} + \varepsilon_{ijk} \quad (1)$$
after GRIFFING (1956a), which follows SPRAGUE and TATUM (1942), where $y_{ijk}$ is the $k$th observation of the $ij$th cross; $\mu$ is the overall mean; $G_i$ and $G_j$ are the GCA
There are two kinds of hypothesis tests in the analysis of a diallel cross. One is the test for a factor or an effect, in which the null hypothesis is of the form
(13)
and the coefficient matrix in (8) can be written as $H = (0, I_S, 0)$. For example, in linear model (4),

if the null hypothesis is $H_0: G_2 = 0$, and

if the null hypothesis is $H_0: G_1 = G_2 = G_3 = G_4 = 0$. In this case, let $M^* = M^{-1}H'$; then we have by (12)
(14)
The other test is for the difference between GCAs or SCAs. The null hypothesis is that the $i$th and $j$th parameters are equal, $H_0: \beta_i - \beta_j = 0$, and the matrix $H$ in (8) is a single-row matrix with the $i$th element 1 and the $j$th element –1. Hence, we obtain by (12)
(15)
where $M_i$ and $M_j$ are the $i$th and $j$th columns of matrix $M^{-1}$, respectively, and $M_{ij}$ is its $ij$th element.
Another problem is the computation of the pseudoinverse in (10), because the pseudoinverse of a matrix is commonly not unique and the algorithm is relatively complicated. It can be solved by linearly transforming the row vectors of matrix $C$ so that the calculation of the pseudoinverse of matrix $Q$ can be avoided. Suppose that there exists an
row rank, then it can be deduced that
(16)
where only matrix inverses are involved instead of pseudoinverse.
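The test for a difference between two GCAs can be sketched numerically. The Python example below is an illustration under stated assumptions, not the paper's equations (14) to (16): it assumes the restricted least squares estimate takes the familiar form $M^{-1}X'y$ with $M = X'X + L'L$, where $L$ holds the sum-to-zero restrictions, and it uses the general covariance $\sigma^2 M^{-1}X'XM^{-1}$ of that estimator. The design (4 parents, half diallel without selfings, 3 replicates), the effect values and all variable names are made up for the example.

```python
import numpy as np
from itertools import combinations

p, reps = 4, 3                                # illustrative sizes, not from the paper
crosses = list(combinations(range(p), 2))     # half diallel without selfings
n_par = 1 + p + len(crosses)                  # mu, G_1..G_p, S_ij (i < j)


def x_row(i, j):
    """Design row for one observation of cross (i, j)."""
    row = np.zeros(n_par)
    row[0] = 1.0                              # overall mean
    row[1 + i] = row[1 + j] = 1.0             # the GCA factor enters twice
    row[1 + p + crosses.index((i, j))] = 1.0  # SCA of this cross
    return row


X = np.array([x_row(i, j) for (i, j) in crosses for _ in range(reps)])

# Restrictions L @ beta = 0: sum of GCAs is zero, and for each parent the
# SCAs of all crosses involving that parent sum to zero.
L = np.zeros((1 + p, n_par))
L[0, 1:1 + p] = 1.0
for k in range(p):
    for c, (i, j) in enumerate(crosses):
        if k in (i, j):
            L[1 + k, 1 + p + c] = 1.0

# Simulated data with known effects (SCAs set to zero for simplicity)
rng = np.random.default_rng(1)
beta_true = np.zeros(n_par)
beta_true[0] = 20.0
beta_true[1:1 + p] = [1.5, -0.5, 0.0, -1.0]
y = X @ beta_true + rng.normal(scale=0.5, size=X.shape[0])

# Assumed estimator: beta_hat = M^{-1} X'y with M = X'X + L'L
M_inv = np.linalg.inv(X.T @ X + L.T @ L)
beta_hat = M_inv @ X.T @ y

# Residual variance and covariance matrix of the estimates
resid = y - X @ beta_hat
df = X.shape[0] - np.linalg.matrix_rank(X)
sigma2 = resid @ resid / df
cov = sigma2 * M_inv @ X.T @ X @ M_inv

# Single-row H with +1 and -1 selects the difference G_1 - G_2
h = np.zeros(n_par)
h[1], h[2] = 1.0, -1.0
diff = h @ beta_hat
t_stat = diff / np.sqrt(h @ cov @ h)
print(f"G1 - G2 = {diff:.3f}, t = {t_stat:.2f} on {df} df")
```

The resulting statistic would be compared with a t distribution on `df` degrees of freedom; only ordinary matrix inverses are needed, which is the practical point of the restricted formulation.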
Software Development
We have developed Windows® software, GSCA, for computing GCA and SCA in a wide range of diallel mating designs. The algorithm is based on the above theoretical results for restricted linear models and can deal with missing data. In addition to providing GCA and SCA estimates, GSCA gives hypothesis tests for the model, various factors and differences between GCAs or SCAs. The software can be freely downloaded from the webpage: http://fgbio.njfu.edu.cn/tong/GSCA/gsca.htm.
GSCA focuses on treating the following linear model of a diallel mating design with fixed effects:
$$y_{ijlkm} = \mu + E_k + B_{l(k)} + G_i + G_j + S_{ij} + R_{ij} + GE_{ik} + GE_{jk} + SE_{ijk} + RE_{ijk} + \varepsilon_{ijlkm} \quad (17)$$
where $y_{ijlkm}$ is the $m$th observation of the $l$th block within the $k$th environment for the $ij$th cross; $\mu$ is the overall mean; $E_k$ is the $k$th environment effect; $B_{l(k)}$ is the $l$th block effect in the $k$th environment; $G_i$ and $G_j$ are the GCA effects of the $i$th female and $j$th male, respectively; $S_{ij}$ is the SCA effect of the $i$th and $j$th parents; $R_{ij}$ is the reciprocal effect due to the cross between the $i$th female and the $j$th male; $GE_{ik}$ and $GE_{jk}$ are the interactions of the $k$th environment with the $i$th and $j$th GCA effects, respectively; $SE_{ijk}$ is the interaction of the $k$th environment with the $ij$th SCA effect; $RE_{ijk}$ is the interaction of the $k$th environment with the reciprocal effect $R_{ij}$; and $\varepsilon_{ijlkm}$ is the within-plot error term. The linear restrictions for model (17) are
$$\sum_k E_k = 0, \quad \sum_l B_{l(k)} = 0 \text{ for each } k, \quad \sum_i G_i = 0, \quad \sum_i S_{ij} = \sum_j S_{ij} = 0 \text{ with } S_{ij} = S_{ji}, \quad R_{ij} + R_{ji} = 0,$$
$$\sum_i GE_{ik} = 0 \text{ for each } k, \quad \sum_i SE_{ijk} = \sum_j SE_{ijk} = 0 \text{ with } SE_{ijk} = SE_{jik} \text{ for each } k, \text{ and } RE_{ijk} + RE_{jik} = 0.$$
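The model can be reduced to simpler forms that can analyze Griffing’s four kinds of diallel crosses. For example, for a half-diallel mating design at a single site, the model becomes
$$y_{ijlm} = \mu + B_l + G_i + G_j + S_{ij} + \varepsilon_{ijlm} \quad (18)$$
with the linear restrictions
$$\sum_l B_l = 0, \quad \sum_i G_i = 0, \quad \sum_i S_{ij} = 0 \text{ and } S_{ij} = S_{ji}.$$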
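As an illustration of how the single-site half-diallel model with blocks might be set up in matrix form, the Python sketch below builds a design matrix X and a restriction matrix L for a mean, block, GCA and SCA parameterisation and checks that $M = X'X + L'L$ is invertible even after one plot is removed. The sizes, the effect values, the parameter ordering and the helper `design_row` are assumptions made for the example; the estimator $M^{-1}X'y$ is likewise an assumed form rather than a quotation of the paper's equations.

```python
import numpy as np
from itertools import combinations

p, n_blocks = 4, 3                          # illustrative sizes
crosses = list(combinations(range(p), 2))   # half diallel without selfings

# Parameter order (an assumption): mu | B_1..B_b | G_1..G_p | S_ij (i < j)
off_b, off_g, off_s = 1, 1 + n_blocks, 1 + n_blocks + p
n_par = off_s + len(crosses)


def design_row(i, j, l):
    """Row of X for one plot of cross (i, j) in block l."""
    row = np.zeros(n_par)
    row[0] = 1.0
    row[off_b + l] = 1.0
    row[off_g + i] += 1.0                   # parent i contributes its GCA
    row[off_g + j] += 1.0                   # parent j contributes its GCA
    row[off_s + crosses.index((i, j))] = 1.0
    return row


plots = [(i, j, l) for (i, j) in crosses for l in range(n_blocks)]
plots.remove((0, 1, 2))                     # one missing plot: unbalanced data
X = np.array([design_row(*plot) for plot in plots])

# Sum-to-zero restrictions, one row of L per restriction
L = np.zeros((2 + p, n_par))
L[0, off_b:off_g] = 1.0                     # block effects sum to zero
L[1, off_g:off_s] = 1.0                     # GCA effects sum to zero
for k in range(p):
    for c, (i, j) in enumerate(crosses):
        if k in (i, j):
            L[2 + k, off_s + c] = 1.0       # SCAs involving parent k sum to zero

# Made-up effects that satisfy every restriction
beta = np.zeros(n_par)
beta[0] = 25.0
beta[off_b:off_g] = [0.4, -0.1, -0.3]
beta[off_g:off_s] = [1.0, 0.5, -0.5, -1.0]
beta[off_s:] = [0.6, -0.2, -0.4, -0.4, -0.2, 0.6]

y = X @ beta                                # noise-free response
M = X.T @ X + L.T @ L
beta_hat = np.linalg.solve(M, X.T @ y)      # recovers beta when M is invertible
print(np.round(beta_hat[off_g:off_s], 3))   # estimated GCA effects
```

With noise-free data the restricted solution reproduces the effects that generated the data, which is a convenient sanity check before the same matrices are applied to real observations.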
The raw data to be analyzed by GSCA should be formatted in a text file as shown in Table 1. The first line lists the factor and trait names “Pi”, “Pj”, “Blk”, “Env”, “Trt1”, “Trt2”, etc., which must appear in this order, where “Pi” stands for female parent, “Pj” for male parent, “Blk” for block, “Env” for environment, “Trt1” for trait 1, “Trt2” for trait 2, and so on. From the second line on, each line holds the data for one individual, corresponding to the factors and traits in the first line. Since GSCA can deal with a flexible model, the block factor, the environment factor, or both can be omitted from the raw data file.
An appropriate linear model can be chosen by GSCA itself or by hand when GSCA is used to analyze a diallel cross. Once the data file has been opened successfully, clicking the menu “Analysis” and then the option “Run” opens a dialog window (Fig. 1) for choosing parameters. The default linear model given by GSCA contains the main effects of GCA, SCA, reciprocal, environment and block if they exist. Interaction effects such as GCA by environment and SCA by environment can be selected with the mouse in the parameter selection window (Fig. 1).
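The file layout described above can be read with a few lines of Python. This is a sketch only: it assumes the columns are separated by whitespace, which the text does not state, and the function name `read_gsca_table` and the sample values are invented for illustration.

```python
import io

FACTOR_NAMES = ("Pi", "Pj", "Blk", "Env")


def read_gsca_table(handle):
    """Parse a header line plus one record per individual.

    Assumes whitespace-separated columns (an assumption, not a documented
    property of the GSCA format). Returns (factors, traits, records).
    """
    header = handle.readline().split()
    if header[:2] != ["Pi", "Pj"]:
        raise ValueError("the first two columns must be Pi and Pj")
    factors = [name for name in header if name in FACTOR_NAMES]
    traits = [name for name in header if name not in FACTOR_NAMES]
    records = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue                        # skip blank lines
        record = dict(zip(header, fields))
        for trait in traits:
            record[trait] = float(record[trait])
        records.append(record)
    return factors, traits, records


# Made-up sample with block and environment columns present
sample = io.StringIO(
    "Pi Pj Blk Env Trt1\n"
    "1 2 1 1 23.5\n"
    "2 3 1 1 19.0\n"
    "1 2 2 1 21.7\n"
)
factors, traits, records = read_gsca_table(sample)
print(factors, traits, len(records))
```

Because the factor columns are detected from the header, the same reader also accepts files in which the block or environment column is omitted.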
Table 1. – Data format of GSCA. The block factor, the environment factor, or both can be omitted.
We used the radiata pine data published in WU and MATHESON (2000) as an example to illustrate GSCA’s functions and ease of use. The data are the diameter at breast height (DBH) of radiata pine measured at age 11 in two environments for a 6 × 6 half-diallel mating design with 3 blocks and 4-tree plots at each site. WU and MATHESON (2000) performed the analysis using the following linear model,

which is the reduced form of model (17). We used this model again to carry out the calculations on the radiata pine data with GSCA. The analysis here was carried out for scenario 1 of missing crosses (1,3), (2,5) and (5,6) in WU and MATHESON (2000). Thus, the data are unbalanced at both the cross and plot levels. The output from GSCA is listed in Appendix A. GSCA gives not only the results that the SAS program DIAFIXED.SAS does but also the results of hypothesis testing for differences between GCAs or between SCAs.
Discussion
We have applied the theory of linear models with linear restrictions to describe the inherent nature of general diallel mating designs. Formulae for estimating fixed genetic parameters and the F-ratio statistics for hypothesis testing of parameters were derived to obtain the genetic information needed for determining breeding strategies. The results apply to any diallel mating, including Griffing’s four types of diallel crosses and even more complex mating designs that may have environmental factors and their interactions with genetic effects. The methods can handle balanced and unbalanced data. Unbalanced designs are a common phenomenon, especially in tree breeding programs.
Compared with previous statistical methods for diallel analysis, our statistical methods based on restricted linear models are superior to Griffing’s diallel methods (GRIFFING, 1956a) and the OLS analysis (HUBER et al., 1992; WU and MATHESON, 2000). GRIFFING (1956a) provided different formulae to calculate the fixed GCA and SCA effects using genotype means for the four types of diallel crosses, but this method is limited to balanced data structures and lacks the flexibility to be extended to more complicated mating designs. The OLS methods make the parameters estimable by removing redundant parameters using the sum-to-zero restrictions. However, the procedure for reducing parameters is so tedious that it is carried out almost by hand for each specific mating design and is difficult to implement in a computer program. Furthermore, the standard errors of the reduced parameters and the statistics for hypothesis testing cannot be obtained directly from the reduced linear model. In contrast, we propose a universal approach to estimating the fixed parameters and giving statistics for hypothesis testing of single or multiple parameters or the difference between them.
Our statistical formulae (eqs. 6–7, 10–11) for parameter estimation and hypothesis testing are readily calculated because they are expressed in matrix form and only a lower dimensional matrix inverse is involved. First, since the number of parameters $p$ is generally far smaller than the sample size $n$, calculating the inverse of the $p \times p$ matrix $M = X'X + L'L$ in these formulae does not take much time. Second, although the vector of parameter estimates under the null hypothesis (eq. 10) contains the pseudoinverse of matrix $Q$, which is algorithmically more complex than an ordinary inverse, linear transformations are applied so that eq. (10) is replaced by eq. (16), in which the pseudoinverse is replaced by ordinary matrix inverses. Third, the key to calculating the F statistics (eq. 11) is to obtain the estimate of the parameter vector under the null hypothesis, which depends on the inverse of matrix $T$. This inverse can be simplified to eqs. (14) and (15) by using eq. (12) and considering the coefficient matrix $C$ of each of the two types of hypothesis testing. With these technical treatments, it is feasible to calculate the parameter estimates and the statistics for hypothesis testing in a short time.
GSCA is typical Windows®-based software developed for analyzing model (17) based on the statistical results we have obtained in this paper. It has a user-friendly interface and can give a comprehensive output with one click. Compared with the SAS programs prepared for analysis of the fixed diallel linear model (WU and MATHESON, 2000; ZHANG et al., 2005), GSCA has several major advantages:
i) The SAS packages cannot be directly applied to analyze diallel data because the same parent plays both the male and female roles in a diallel cross. Hence, the code of these SAS programs is usually complicated and most breeders find it difficult to understand. Users must spend much time understanding and modifying the SAS code before they can use the modified program to analyze their own data. Users of GSCA, however, do not need to modify any code;
ii) GSCA provides the t value and its p-value for hypothesis testing of the difference between GCAs or SCAs, which is equivalent to Griffing’s LSD methods, whereas the DIAFIXED.SAS program (WU and MATHESON, 2000) did
Figure 1. – The main window of GSCA with the pop-out window for model selection.
Publisher: Johann Heinrich von Thünen-Institut, Federal Research Institute for Rural Areas, Forestry and Fisheries. Editorial office: Institut für Forstgenetik, Sieker Landstrasse 2, D-22927 Grosshansdorf. Publishing house: J. D. Sauerländer’s Verlag, Berliner Strasse 46, D-63619 Bad Orb. Advertising: J. D. Sauerländer’s Verlag, Bad Orb. Production: PPPP Norbert Wege e.K., Gladenbach. Printed in Germany.
© J. D. Sauerländer’s Verlag, Bad Orb, 2012 ISSN 0037-
Appendix GSCA output from the radiata pine data with missing crosses of (1,3), (2,5) and (5,6).