CONTENTS
1. GENOMICS
2. GENOME SEQUENCING
3. GENOMIC SELECTION
4. ALLELE MINING
5. QTL MAPPING
6. ASSOCIATION MAPPING
7. TILLING
8. MAGIC POPULATION
9. NAM POPULATION
- SNPs GENOTYPING
- LINKAGE DISEQUILIBRIUM
- TRANSCRIPTOMICS
- PROTEOMICS
- METABOLOMICS
- METAGENOMICS
- PHENOMICS
1. GENOMICS
INTRODUCTION
- The term genomics was first used by Thomas Roderick in 1986.
- It refers to the study of the structure and function of the entire genome of an organism.
- Genome refers to the basic set of chromosomes. In a genome, each type of chromosome is represented only once. Genomics is now being developed as a sub-discipline of genetics devoted to the mapping, sequencing and functional analysis of genomes.
MAIN FEATURES OF GENOMICS
- It is a computer-aided study of the structure and function of the entire genome of an organism.
- It deals with mapping of genes on the chromosome.
- It deals with the sequencing of genes of an organism.
- It is a rapid and accurate method of gene mapping. It is more accurate than recombination mapping and deletion mapping techniques.
- The genomic techniques are highly powerful, efficient and effective in solving complex genetic problems.
- The use of genomic techniques has now become indispensable in plant breeding and genetics.
TYPES OF GENOMICS
- Structural genomics
- It deals with the study of the structure of the entire genome of an organism. In other words, it deals with the study of the genetic structure of each chromosome of the genome.
- The Maxam-Gilbert sequencing and the chain-termination methods are the basic methods of sequencing, first reported in 1977 (Maxam and Gilbert 1977; Sanger et al. 1977). Subsequently, the shotgun sequencing method was developed for analysis of >1000 base pairs (Staden 1979).
- In this method, the target DNA is broken into random fragments, the individual fragments are sequenced, and their sequences are reassembled on the basis of their overlapping regions. Later, several different DNA sequencing techniques were developed; these sequencing methods are faster, cheaper and have very high throughput.
- The genome sequences of several plant species, viz. thale cress, rice, grapevine, poplar, papaya, sorghum and many more, are either completed or in the pipeline. The sequences of most crop genomes will probably be in hand by the next decade.
- This information will seed the in silico birth of plant biology for functional plant genomics, and is likely to reveal relationships between DNA sequence variation and genetic diversity (Hamilton and Buell 2012; Pennisi 2007).
- The omics-based approaches, e.g. genomics, transcriptomics, metabolomics etc., are very important for disease diagnostics and treatment, and equally important for improvement of crops for food and fuel production.
- Genomics and transcriptomics have radically altered the scope of genetics by providing a landscape of genes and their epigenetic states, analysis of enormous range of genetic variation, and the potential to measure gene expression with high quality and accuracy.
- Systems breeding approaches can be used to study the diverse genomic information, based on which phenotypic information can be predicted from genotypic information, thereby accelerating crop-improvement programs to address food security issues (Bevan and Uauy 2013; Edwards and Batley 2010).
- Genomics resources are used to derive molecular markers that enable indirect selection for traits that are not very amenable to phenotypic selection. Marker-assisted selection may offer one or more of the following advantages over phenotypic selection: ease of implementation, lower costs, faster assays that are independent of environmental and developmental factors, and increased efficacy and reliability.
- Markers allow selection for traits like yield in the greenhouse and in off-season nurseries, early in the growing season, so that the selected plants can be used in crosses in the same growing season (Babu et al. 2004; Collard et al. 2005). These factors accelerate the crop development process, and the private seed industry uses markers extensively in its breeding programs (Eathington et al. 2007).
- One of the major limitations of markers has been the cost of genotyping, which is being effectively addressed by NGS technologies that have allowed the development of strategies for detection and genotyping of SNPs in a single step. In addition, reduced representation genotyping approaches are being developed to genotype the individuals of a population at a fraction of the cost that would be incurred if whole genome resequencing were done.
- Linkage mapping has facilitated positional cloning of several genes and quantitative trait loci (QTLs), which provide the base material for use of these genes in the creation of the desired transgenic plants.
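The shotgun idea described in this section (random fragments reassembled from their overlapping regions) can be illustrated with a small sketch. This is only a toy greedy merge under simplifying assumptions: error-free fragments, all from the same strand, and no repeats. The function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical, chosen for illustration; real assemblers use far more elaborate methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    # Drop duplicates and fragments fully contained in another fragment.
    contigs = sorted(
        f for f in set(fragments)
        if not any(f != g and f in g for g in fragments)
    )
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping reads from a made-up 14-base sequence.
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(reads))
```

With these made-up reads the overlaps are unambiguous, so the fragments collapse into a single contig. With repeats or sequencing errors a greedy merge like this can misassemble, which is one reason it should be read as an illustration only.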
METHODS OF GENE SEQUENCING
- Sanger Sequencing
- This method was developed by Frederick Sanger and his colleagues in
- It is also known as Sanger’s method of DNA sequencing.
- In this method a small concentration of dideoxynucleotide is used for chain termination. Hence this method is also referred to as the dideoxy DNA sequencing procedure.
- This is the most widely used method of DNA sequencing, being accurate and simple.
- In this method a chain termination process is used to stop the DNA synthesis selectively at any one of the four DNA nucleotides.
- The primer or the chain terminators (dideoxynucleotides) are labeled either with a radioactive chemical or with fluorescent dyes. So this method is also called the chain termination method.
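The chain termination principle described above can be sketched in a few lines. The sketch assumes an idealized reaction in which, for each of the four dideoxynucleotides, a terminated fragment is obtained for every position where that base is incorporated; the sequence is then read by ordering the fragments from shortest to longest, as on a gel. The function names and the example sequence are hypothetical.

```python
def chain_termination_fragments(new_strand: str) -> dict[str, list[int]]:
    """For each dideoxynucleotide (ddA, ddC, ddG, ddT), list the lengths of
    the terminated fragments: one per position where that base occurs."""
    lengths: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lengths[base].append(position)
    return lengths


def read_gel(lengths: dict[str, list[int]]) -> str:
    """Read the sequence from the shortest to the longest fragment."""
    bands = [(n, base) for base, ns in lengths.items() for n in ns]
    return "".join(base for _, base in sorted(bands))


fragments = chain_termination_fragments("GATTACA")
print(fragments)            # fragment lengths per terminator
print(read_gel(fragments))  # sequence recovered from fragment sizes
```

Each fragment length is unique because synthesis stops at exactly one position per fragment, which is why sorting by size recovers the order of the bases.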
Material required:
NEXT-GENERATION SEQUENCING
- Low-cost sequencing technologies, commonly referred to as Next-Generation Sequencing (NGS) technologies, produce millions of sequencing reads concurrently (Church 2006).
- NGS rapidly generates huge amounts of sequence data in a very cost-effective way, and allows profiling of nucleotide variation and large-scale discovery of genetic markers.
- These markers aid in the indirect selection for economically important traits based on gene/quantitative trait locus (QTL) mapping and/or genome-wide association studies (GWAS). Furthermore, comparative genomics using NGS data provides better chances of identifying loci under selection.
- High-throughput sequencing technologies cut down the cost of sequencing, while ultra-high-throughput sequencing (UHTS) technologies are very fast and help to reduce the time required for sequencing (Schuster 2008; Tucker et al. 2009).
3. GENOMIC SELECTION
- Genomic selection (GS) or genome-wide selection (GWS) is a form of marker-based selection, referring to the simultaneous selection for many (tens or hundreds of thousands of) markers, which cover the entire genome in a dense manner so that all genes are expected to be in linkage disequilibrium with at least some of the markers (Meuwissen 2007).
- In GS, genotypic data (genetic markers) across the whole genome are used to predict complex traits with accuracy sufficient to allow selection on that prediction alone. Selection of desirable individuals is based on the genomic estimated breeding value (GEBV) (Nakaya and Isobe 2012), which is a predicted breeding value calculated using an innovative method based on genome-wide dense DNA markers (Meuwissen et al. 2001).
- GS does not need significance testing or the identification of a subset of markers associated with the trait (Meuwissen et al. 2001), i.e., GS can remove the need to search for significant QTL-marker loci associations individually (Desta and Ortiz 2014). In other words, QTL mapping with populations derived from specific crosses can be avoided in GS.
- However, GS models, i.e. the formulae for GEBV prediction, must first be developed (Nakaya and Isobe 2012). In this process (the training phase), phenotypes and genome-wide genotypes are investigated in the training population (a subset of a population) to predict significant relationships between phenotypes and genotypes using statistical approaches.
- Subsequently, GEBVs are used for the selection of desirable individuals in the breeding phase, instead of the genotypes of markers used in traditional MAS (Jiang 2013a ). For accuracy of GEBV and GS, genome wide genotype data are necessary and require high marker density in which all quantitative trait loci (QTLs) are in linkage disequilibrium with at least one marker.
- The use of high-density markers is one of the fundamental features of GS (Desta and Ortiz 2014). GS is possible only when high-throughput marker technologies, high-performance computing and appropriate new statistical methods are available.
- This approach has become feasible due to the discovery and development of a large number of Single Nucleotide Polymorphisms (SNPs) by genome sequencing and new methods to efficiently genotype large numbers of SNP markers (Jiang 2013a). The ideal method to estimate the breeding value from genomic data is to calculate the conditional mean of the breeding value given the genotype at each QTL (Goddard and Hayes 2007).
- This conditional mean can only be calculated by using a prior distribution of QTL effects, and thus this should be part of the research to implement GS. In practice, this method of estimating breeding values is approximated by using the marker genotypes instead of the QTL genotypes, but the ideal method is likely to be approached more closely as more sequence and SNP data are obtained (Goddard and Hayes 2007 ).
- In a recent review, Desta and Ortiz (2014) discussed in detail the estimation of GEBV, the accuracy and gain of selection of GS, and other related issues. Since the application of GS to breeding populations was proposed by Meuwissen et al. (2001), theoretical, simulation and empirical studies have been conducted, mostly in animals (Goddard and Hayes 2007; Jannink et al. 2010).
seed weight in soybean. The results suggested that GS exhibited higher prediction accuracy than MAS either for various cross-validations within the association panel or when unrelated panels were used in validation (Zhang et al. 2015 ). However, the number of loci/markers involved in MAS is usually much smaller than that needed in GS.
- MAS might have advantages in lowering genotyping cost for a relatively high prediction accuracy. GS has been highlighted as a new approach for MAS in recent years and is regarded as a powerful, attractive and valuable tool for plant breeding. Kumpatla et al. (2012) recently presented an overall review of GS for plant breeding.
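The two phases of GS described in this section (a training phase that builds the GEBV formula, and a breeding phase that applies it to genotyped candidates) can be sketched as follows. The sketch uses ridge regression, one common way to estimate all marker effects jointly; it is an assumption made for illustration, not the specific model of the cited studies. Marker scores are coded 0/1/2, the shrinkage parameter `lam` is set arbitrarily, and the data and function names are made up.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def train_gs_model(genotypes, phenotypes, lam=1.0):
    """Training phase: estimate all marker effects jointly (ridge regression)."""
    n, p = len(genotypes), len(genotypes[0])
    mu = sum(phenotypes) / n
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    x = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    y = [v - mu for v in phenotypes]
    # Normal equations with shrinkage: (X'X + lam * I) b = X'y
    xtx = [[sum(x[i][j] * x[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(x[i][j] * y[i] for i in range(n)) for j in range(p)]
    return mu, col_means, solve(xtx, xty)


def gebv(model, genotype):
    """Breeding phase: GEBV = sum of centred marker scores x estimated effects."""
    _, col_means, effects = model
    return sum((g - m) * b for g, m, b in zip(genotype, col_means, effects))


# Hypothetical training population: 6 lines, 3 markers, one trait.
train_genotypes = [[0, 0, 2], [1, 0, 2], [2, 1, 0], [2, 2, 0], [0, 1, 1], [1, 2, 1]]
train_phenotypes = [8.0, 10.0, 14.0, 14.0, 9.0, 11.0]
model = train_gs_model(train_genotypes, train_phenotypes)

# Candidates are ranked on GEBV alone, without phenotyping them.
candidates = {"line_A": [2, 0, 0], "line_B": [0, 0, 2]}
ranking = sorted(candidates, key=lambda name: gebv(model, candidates[name]), reverse=True)
print(ranking)
```

Note that no marker is tested for significance: every marker receives an effect estimate, and selection uses only the summed prediction. In practice the number of markers far exceeds the number of lines, which is why shrinkage or Bayesian priors on marker effects are needed.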
LIMITATIONS
- Desta and Ortiz (2014) suggested that with the advent of cutting-edge Next-Generation Sequencing (NGS) and High-Throughput Phenotyping techniques, GS would revolutionize the applications of plant improvement programs. However, GS has not become a popular methodology in plant breeding, and there may be a long way to go before the extensive use of GS in plant breeding programs (Jiang 2013a).
- The major reason might be the unavailability of sufficient knowledge of GS for practical use (Nakaya and Isobe 2012). Statistics and simulation discussed in terms of formulae in GS studies are most likely too specific and difficult for plant breeders to understand and to use in practical breeding programs.
- In addition, GS relies on the degree of genetic similarity between the training population and the breeding population in the LD between marker and trait loci (Desta and Ortiz 2014), but in practice the breeding populations that breeders work on are considerably different from the training population studied. Therefore, one should not get too excited about GS in plants, in particular about applying it directly to breeding programs.
- The comprehensive nature of population structure, especially in inbreeding or self-pollinated species, is a major barrier to implementing GS in plant breeding (Desta and Ortiz 2014). From a plant breeder’s point of view, GS can be practicable for a few breeding populations with a specific purpose, but may be impractical for an entire breeding program dealing with hundreds or thousands of crosses/populations at the same time (Jiang 2013a).
- Therefore, GS must shift from theory to practice, and its accuracy and cost effectiveness must be evaluated in practical breeding programs to provide convincing empirical evidence and warrant a practicable addition of GS to the plant breeder’s toolbox (Heffner et al. 2009 ).
- Development of easily understandable formulae for GEBVs and user-friendly software packages for GS analysis will be helpful in facilitating and enhancing the application of GS in plant breeding.
4. ALLELE MINING
INTRODUCTION
- There are several options for identifying or capturing diversity that might not exist in the germplasm pool of existing breeding lines: allele mining, transformation, mutation breeding, use of landraces or synthetic polyploids and wide crossing (Able et al., 2008).
- Allele mining, which is important for utilizing novel alleles hidden in genetic diversity, will be discussed here. Molecular and functional diversity of crop genomes can be characterized by allele mining, identification of distinct ‘haplotypes’ for different inbred lines, single feature polymorphism (SFP) analysis, discovery of nearly identical paralogues (NIPs; Emrich et al.) and determination of their evolutionary implications.
- In general, two approaches have been elaborated for allele mining: re-sequencing (e.g. Huang et al., 2009) and EcoTILLING. Whole genome genotyping using gene-based markers can be used as the foundation of the re-sequencing method. Allele mining from germplasm collections is in its infancy and currently faces the fundamental challenge of establishing which of the various alleles present are functionally different from the wild type and, where possible, identifying which new alleles beneficially influence the target trait.
- Methods to ascertain allele function include marker-assisted backcrossing (MABC), transformation, transient expression assays and association analysis using an independent set of germplasm for association mapping from that used to identify the original allele. As more of these studies are carried out, it is hoped that the growing database of comparisons between
carefully controlled phenotypic screens (not generally possible on a very large scale).
APPROACH OF ALLELE MINING
- Direct sequencing
- EcoTILLING (TILLING: Targeting Induced Local Lesions IN Genomes)
- When allele mining includes coding, non-coding and regulatory regions of genes it is called true allele mining.
- When there is a mutation in an intronic region, we may observe an alteration in the phenotype.
For example:
- Role of an intronic mutation in rubi3 (polyubiquitin gene) in rice.
- VRN-1: Affects vernalization response in barley and wheat.
- A mutation in the 5’ splice site of the first intron of the waxy (Wx) gene resulted in a 10-fold increase in gene activity in rice.
STEPS INVOLVED IN ALLELE MINING
- Selection of target trait (trait priority)
- Identification of accessions associated with desired phenotypic trait
- Selection of genes underlying the chosen target trait (gene target)
- Primer designing for whole length of gene
- PCR amplification from the identified accessions
- Ecotilling based mining
- PCR
- Heteroduplex
- Nuclease cleavage
- LI-COR gel and SNP identification
- Confirmatory sequencing
- Comparison of sequence data with phenotypic data and identification of superior alleles.
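The EcoTILLING steps listed above (PCR, heteroduplex formation, nuclease cleavage, fragment sizing) can be mimicked in a small sketch. It assumes that the amplicon of an accession is mixed with a reference amplicon of the same length, that only base substitutions are present, and that the nuclease cuts one strand at the mismatch, so that the two fragment sizes add up to the amplicon length. The exact cut site relative to the mismatch is simplified, and the sequences and function names are hypothetical.

```python
def mismatch_positions(reference: str, accession: str) -> list[int]:
    """1-based positions at which the two amplicons differ, i.e. where the
    re-annealed heteroduplex carries a mismatch."""
    if len(reference) != len(accession):
        raise ValueError("this sketch handles substitutions only")
    return [i for i, (r, a) in enumerate(zip(reference, accession), start=1)
            if r != a]


def cleavage_fragments(amplicon_length: int, position: int) -> tuple[int, int]:
    """Approximate sizes of the two fragments seen on the gel when one strand
    of the heteroduplex is cut at a mismatch."""
    return position, amplicon_length - position


reference = "ATGCCGTAAGCT"   # made-up reference amplicon
accession = "ATGCCGTGAGCT"   # made-up accession carrying one SNP

for pos in mismatch_positions(reference, accession):
    print(pos, cleavage_fragments(len(reference), pos))
```

The pair of complementary fragment sizes is what locates the polymorphism on the gel; the confirmatory sequencing step then establishes which bases actually differ.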
APPLICATION AND USES OF ALLELE MINING
- Identification of new haplotypes
- Discovery of superior alleles
- Similarity analysis (inter and intra species)
- Functional molecular markers for MAS
- Evolution study
- Promoter mining (expression study and gene prediction)
5. QTL MAPPING
INTRODUCTION
- Quantitative trait loci, commonly known as QTLs, have been defined in various ways by different scientists. Important definitions of QTL are given below:
- The QTL may be defined as a region of DNA that is associated with the expression of a quantitative trait.
- A QTL is a chromosomal region (or locus on the chromosome) containing alleles that differently affect the expression of a quantitative trait.
MAIN FEATURES OF QTL
- A QTL is a polymorphic site (locus) on the chromosome which controls expression of quantitative traits.
- The presence of QTL is inferred from genetic mapping. Total variation is partitioned into components linked to a number of discrete, mapped chromosome regions.
- A set of genes collectively control a quantitative trait.
- Ten or more QTLs can influence a single trait.
- The mapping population is evaluated for the target trait in replicated trials conducted, preferably, over locations and years; this is known as phenotyping.
- The two parents of the mapping population are tested with a large number of markers covering the entire genome, and polymorphic markers are identified. It is important that the polymorphic markers should cover the whole genome at a sufficient density.
- All the individuals/lines of the mapping population are now analyzed using these polymorphic markers; this is termed as genotyping.
- The marker genotype data are used to construct a framework linkage map for the population, which depicts the order of the markers and the genetic distances between marker pairs in terms of centimorgans (cM).
- Finally, the marker genotype and the trait phenotype data are analyzed to detect association between marker genotypes and the trait phenotype. In simple terms, the plants are divided into separate groups on the basis of their marker genotype.
- For each of these groups, mean and variance for the trait phenotype are estimated and used for comparison between the groups. In case the genotype groups for a marker differ significantly for the trait of interest, it is concluded that the concerned marker is associated with the trait, i.e., the marker is most likely linked to a QTL controlling the trait phenotype.
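The last step of the procedure above (grouping plants by marker genotype and comparing group means and variances) corresponds to what is often called single-marker analysis. A minimal sketch follows, assuming a population with two homozygous marker classes (for example, recombinant inbred or doubled haploid lines) and using Welch's t statistic for the comparison. The data and function names are made up, and a real analysis would also compute a p-value and correct for testing many markers.

```python
from math import sqrt
from statistics import mean, variance


def group_by_genotype(marker_genotypes, trait_values):
    """Divide the plants into groups on the basis of their marker genotype."""
    groups = {}
    for genotype, value in zip(marker_genotypes, trait_values):
        groups.setdefault(genotype, []).append(value)
    return groups


def welch_t(a, b):
    """Welch's t statistic for the difference between two group means."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical marker scores (parental alleles A and B) and trait values.
marker = ["A", "A", "A", "A", "B", "B", "B", "B"]
trait = [10.1, 9.8, 10.4, 9.9, 12.2, 11.9, 12.5, 12.0]

groups = group_by_genotype(marker, trait)
for genotype, values in groups.items():
    print(genotype, round(mean(values), 2), round(variance(values), 3))
print("t =", round(welch_t(groups["A"], groups["B"]), 2))
```

A large absolute t value for a marker suggests that it is linked to a QTL for the trait. Repeating the test marker by marker along the linkage map gives a first, coarse picture of where the QTLs lie.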
ADVANTAGES OF QTL LINKAGE MAPPING
- Linkage mapping detects and maps each of the QTLs governing the target trait within relatively short confidence intervals.
- QTL mapping identifies markers flanking the QTL regions; these markers can be used for MAS, including recombinant selection, for the concerned QTL.
- It provides an estimate of the QTL effect size on the trait phenotype. Thus, breeders get a rough idea of the usefulness of incorporating a given QTL in their breeding programs.
- Joint QTL analysis of multiple correlated traits can distinguish between close linkage and pleiotropy as the basis of the trait correlations. This would indicate whether negative trait correlations may be broken or not in breeding programs.
- High-resolution QTL mapping can locate a QTL in a very small (<1 cM) confidence interval, which greatly facilitates cloning of the genes located in the QTL region.
- Selective DNA pooling can be combined with transcriptome analysis to identify a limited number of candidate genes located in the genomic region harboring the QTL for the target trait.
- Appropriate experimental designs and QTL analysis methods are available for the detection and estimation of QTL × QTL and QTL × environment interactions.
- QTL analysis based on bi-parental populations presents some unique advantages over association mapping. For example, association mapping cannot identify and map rare functional alleles of genes/QTLs, but this can be easily achieved by linkage mapping. This is because the rare allele will be present in one of the two lines crossed to generate the concerned mapping population. This will ensure that the frequency of the rare allele is 50% in the mapping population, which will facilitate its mapping by increasing QTL detection power.
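The gain in detection power from raising a rare allele to a frequency of 50% can be made concrete with a small calculation. The sketch assumes a biallelic locus with a purely additive effect `a` in a population in Hardy-Weinberg proportions, where the additive variance contributed by the locus is 2p(1-p)a^2. The frequency 0.02 for the diverse panel is a made-up example value.

```python
def additive_variance(p: float, a: float) -> float:
    """Additive genetic variance 2p(1-p)a^2 contributed by a biallelic locus
    with allele frequency p and additive effect a (Hardy-Weinberg assumed)."""
    return 2 * p * (1 - p) * a * a


effect = 1.0
rare_in_panel = additive_variance(0.02, effect)   # rare allele in a diverse panel
in_biparental = additive_variance(0.50, effect)   # same allele after a bi-parental cross

print(rare_in_panel, in_biparental, in_biparental / rare_in_panel)
```

Under these assumptions the same allele accounts for roughly 13 times more trait variance at a frequency of 0.5 than at 0.02, which is why it is much easier to detect in the bi-parental population.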
LIMITATIONS OF QTL MAPPING
- Since the mapping population is initiated by crossing two parents selected for the purpose, genetic variation in the quantitative traits of the population is limited to the differences between the two parents.
- The effects of only two alleles of the genes/QTLs can be studied in most mapping populations. In real situations, many, if not all, genes/QTLs may have more than two alleles each. However, multiple alleles of genes/QTLs can be analyzed in interconnected populations like MAGIC and NAM.
- QTL mapping has low resolution because only a few meiotic divisions occur during the period between the hybridization of the parents and the use of the resulting populations for mapping. As a result, a QTL position may span from a few to tens of centimorgans (typically, 5–20 cM). This region often corresponds to several megabases (on average, 1.2–4.8 Mb), which may typically contain hundreds of genes.
- The mapping approaches are basically of two types, viz., family mapping and population mapping. In family mapping, populations constructed by crossing, generally, two homozygous lines are used for linkage mapping of markers and genes/QTLs. Thus, these populations comprise closely related families derived from common parents using a specific mating scheme (Myles et al. 2009).
- In population mapping, generally referred to as Association Mapping (AM), the mapping population consists of a diverse set of individuals drawn from natural populations, e.g., random mating populations of wild species; wild relatives of crops like wheat, barley, maize, rice, etc.; as well as breeding populations. These populations can also be regarded as groups of many families of rather small (one individual per family in extreme cases) size.
- In addition, AM can use populations designed for family mapping. In such cases, it exploits the linkage disequilibrium (LD) resulting from hybridization between the lines used as parents of these populations as well as the historical LD present between them. AM uses LD between markers and the concerned genes/QTLs for identifying marker-trait associations.
- AM is also known as Association Analysis , LD mapping, and structured association mapping. The AM approach was originally developed by human geneticists for measuring genetic proximity of loci to each other and to map oligogenes. Subsequently, AM approach was extended to mapping of QTLs and still later to mapping in plants, including crop plants and perennial tree species.
- The AM approach is expected to identify markers located much closer to the genes of interest than is feasible with conventional linkage mapping. This is expected because LD analysis utilizes all the recombination events that would have occurred between the gene and the marker in the past in the population being used for AM.
- In contrast, linkage mapping uses only those recombination events that occur between the gene and the marker after the two selected parents are crossed. The AM approach offers some other advantage over linkage mapping, but it suffers from some limitations as well.
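The role of LD and of historical recombination described above can be illustrated numerically. The sketch below computes two standard LD statistics, D and r^2, from phased two-locus haplotypes, and applies the textbook expectation that D decays by a factor (1 - c) per generation of random mating, where c is the recombination fraction between the two loci. The haplotype coding ("AB", "Ab", "aB", "ab"), the example values and the function names are assumptions made for illustration.

```python
def ld_stats(haplotypes: list[str]) -> tuple[float, float]:
    """D and r^2 between two biallelic loci from phased haplotypes coded as
    'AB', 'Ab', 'aB' or 'ab'. Both loci must be polymorphic in the sample."""
    n = len(haplotypes)
    p_ab = sum(h == "AB" for h in haplotypes) / n
    p_a = sum(h[0] == "A" for h in haplotypes) / n
    p_b = sum(h[1] == "B" for h in haplotypes) / n
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


def ld_after_generations(d0: float, c: float, t: int) -> float:
    """Expected D after t generations of random mating: D_t = D_0 * (1 - c)^t."""
    return d0 * (1 - c) ** t


complete_ld = ["AB"] * 5 + ["ab"] * 5        # marker allele always with the same gene allele
no_ld = ["AB", "Ab", "aB", "ab"]             # alleles combine at random

print(ld_stats(complete_ld))
print(ld_stats(no_ld))

# Tightly linked loci (c = 0.001) keep most of their LD over 100 generations;
# loosely linked loci (c = 0.1) lose almost all of it.
print(ld_after_generations(0.25, 0.001, 100))
print(ld_after_generations(0.25, 0.1, 100))
```

Because only very tightly linked markers remain in LD with a gene after many generations, a significant marker in an old, diverse population is expected to lie much closer to the gene than a marker detected in a recent bi-parental cross.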
THE GENERAL PROCEDURE FOR ASSOCIATION MAPPING
The general procedure for genome-wide association mapping in plants is briefly outlined here based on Abdurakhmonov and Abdukarimov (2008). The exact details of the procedure will depend on the chosen study design and on whether or not the population shows structure.
- Association mapping population. A large random sample from a natural population, a germplasm core collection, a collection of breeding lines including cultivars, or a population derived from multiparent crosses of the concerned species is used for AM. The sample should include as much genetic diversity present in the population/germplasm collection as is practically feasible. This sample constitutes the association mapping population, association mapping panel, or, simply, association panel.
- Phenotyping. The selected sample is evaluated for the various traits of interest; this is called phenotyping. Phenotyping should be preferably based on replicated trials conducted over locations and years to minimize environmental effects. The trials should be conducted using a suitable experimental design like randomized block design, augmented design, nested design, etc. A precise and reliable phenotyping is critical to any mapping effort.
- Genotyping for population structure analysis. The sample is then genotyped, i.e., tested with a set of molecular markers (preferably SSR markers) that are evenly distributed over the entire genome of the species. These markers should be unlinked, i.e., should be located more than 40 cM apart in the genome.
- Structure and kinship analysis. The marker data are analyzed to detect and estimate the population structure of the sample using the STRUCTURE program and the extent of kinship among the individuals of the sample using the TASSEL program.
[Figure caption fragment; the figure and the beginning of the caption are missing: "... increased by increasing the sample size, but this will not be practical beyond a point. In LD-based association mapping, this is achieved by including recombinations that have occurred in the past generations in the population from which the sample for the AM study is drawn. Therefore, the older a mutation is, the smaller will be the region of high LD around the mutant allele a (based on Ardlie et al. 2002)."]
- Genotyping for LD analysis. The sample is also genotyped with a sufficiently large number of molecular markers that cover the entire genome as densely as is feasible (Table 8.1) so that LD between markers and the loci of interest can be detected. The pattern of LD in the concerned genomic