Applying Multivariate Analysis to Assess Phenotypic Diversity in Rice Cultivars Grown at Fogera and Pawe, Northwest Ethiopia

A field experiment was conducted to assess genetic variation and trait relationships in rice cultivars from Ethiopia based on quantitative traits. Sixty rice cultivars comprising improved varieties and collected accessions were evaluated at the Fogera and Pawe research stations. The experiment was arranged in an alpha lattice design with three replications. Correlation analysis revealed that panicle length, plant height, filled grains per panicle, spikelet fertility rate, biomass yield, harvest index and thousand seed weight had a significant positive association with grain yield at both Pawe and Fogera. However, days to heading and days to maturity exhibited a significant negative association with grain yield at Fogera but a positive association at Pawe. Principal component analysis showed that the first four components explained a cumulative variance of 81.07%, 74.95% and 80.63% at Fogera, at Pawe and in the combined data, respectively. Days to heading, days to maturity, plant height, panicle length, filled grains per panicle and grain yield were the most discriminating traits for explaining the total variation. Cluster analysis classified the sixty rice cultivars into four groups. The first and second clusters comprised the largest number of cultivars. About 48% of the improved cultivars, which carried the desired traits and were adapted predominantly to the upland production system, belonged to Cluster I, which had the lowest intra-cluster distance. Clusters II and IV included both upland and lowland cultivars, and the highest intra-cluster distances in both confirmed their heterogeneous composition, while Cluster III consisted exclusively of upland cultivars comprising improved and landrace types. The highest inter-cluster distance was observed between clusters II and IV, followed by clusters III and IV, and the lowest between clusters I and II. Thus, future crosses between cultivars of different clusters could result in better heterosis in our rice breeding program.


Introduction
Rice (Oryza sativa L.) is the most important cereal crop cultivated and consumed globally to meet the daily calorie needs of the increasing world population, mainly in Asia and Sub-Saharan Africa (Anyaoha et al. 2018). This important cereal is cultivated and consumed across Africa, where domestic production is unable to meet local demand, resulting in huge annual imports. In Ethiopia, the crop has expanded steadily since its introduction and its consumption has increased sharply. Rainfed upland and rainfed lowland are the predominant rice ecosystems in Ethiopia, covering about 85,288.87 ha of land cultivated by 230,496 rice farmers and contributing about 268,223 tonnes of paddy rice to the annual cereal production (CSA 2020). Considering the potential, which is about 5 million ha of land highly suitable for rice cultivation, the national share of rice in terms of area coverage is still insignificant.
Despite the huge potential in the country, the national average yield of the crop is about 3.15 t/ha. In fact, rice is the second highest-yielding crop, next to maize, among the major cereal crops in the country. This national average is considerably lower than that of major rice-producing countries (>5 t/ha) and the global average of 4.66 t/ha (FAOSTAT 2020). Lower productivity in rice could be attributed to low-yielding varieties and improper agronomic practices, among other things. Thus, improvement of grain yield and yield-attributing traits of rice is necessary for the different production ecosystems in Ethiopia. The development of varieties is a continuous process, and the success of a plant breeding programme aimed at identifying high-yielding, better-quality, fertilizer-responsive, and disease- and insect-resistant varieties depends upon the selection of suitable parents to be used in breeding (Pathak et al. 2020).
The effectiveness of selection depends primarily upon the magnitude of genetic variability in the breeding materials available (Qamar et al. 2012; Maji and Shaibu 2012). Although Ethiopia has some locally known rice cultivars and more than 30 improved rice varieties recommended for various production systems, they are not yet effectively used as parents in rice breeding programs. Thus, precise evaluation, classification and identification of cultivars in Ethiopia are required for further breeding work. Such evidence is particularly useful for assessing the potential of heterotic combinations before attempting crosses, thereby saving time and resources (Hallauer and Miranda 1988). Phenotypic diversity and yield performance of some selected rice accessions of Ethiopia have been reported (Bitew et al. 2016, 2018; Girma et al. 2018). However, there has been no report comprising local and improved cultivars that are adapted to different localities in Ethiopia. To investigate the extent of phenotypic diversity and trait relationships among improved cultivars and collected accessions, it is important to evaluate these materials together under the same experimental procedure. In this study, we focused on selected quantitative traits under two environments and analyzed the pattern of diversity and the relationships among different traits.
Different statistical methods can be applied to assess diversity and trait relationships in plant materials. Multivariate statistical tools are extensively used to summarize and describe the inherent variation among crop genotypes and help plant breeders formulate selection approaches for improving traits of interest (Riaz et al. 2018). Principal component analysis (PCA), a multivariate statistical technique, analyses data consisting of several inter-correlated quantitative dependent variables and is used to explore the variation among genotypes (Riaz et al. 2018; Mahendran et al. 2015). PCA helps to identify the traits with the highest variability, while correlation analysis reveals the strength of the relationships of the identified traits with yield and with other traits, which makes the development of new varieties efficient and effective (Chakravorty et al. 2013). Moreover, cluster analysis using Euclidean distance provides a useful statistical tool for measuring the genetic diversity of rice germplasm with respect to the traits considered. A hybridization programme involving genetically diverse parents belonging to different clusters would therefore provide the opportunity for maximum heterosis for the traits targeted for improvement and for high variability in subsequent generations (Tejaswini et al. 2016). The present study, therefore, aims to determine the relationships among rice cultivars and to estimate the relative contribution of different traits, and thereby to guide the selection of promising parents for use in rice variety improvement programmes.

Plant materials
This investigation was carried out using a total of sixty rice cultivars comprising twenty-nine released varieties, four candidate varieties and twenty-seven collected accessions. The rice accessions were collected from five districts in four regional states of Ethiopia: Fogera in Amhara region, Pawe and Assosa in Benshangul Gumize region, Guraferda in South Nations Nationalities and Peoples Region (SNNPR), and Abobo in Gambella region (Figure 1).

Figure 1: Rice accessions collection sites in Ethiopia
These accessions were collected with the kind cooperation of rice researchers at the Fogera, Pawe, Assosa, Bonga and Gambella research centers. The improved rice varieties included in the study were released by different research centers in Ethiopia. NERICA-10 was not approved for release in Ethiopia but has been released in other African countries. The cultivars come from diverse production systems: rainfed upland, rainfed lowland and intermittently irrigated types. The improved varieties included NERICA varieties that are adapted to rainfed upland and irrigated conditions.

Description of experimental sites
The field experiment was carried out at two rice research stations, Fogera and Pawe (Figure 2). Fogera is located in Amhara Region and Pawe in Benshangul Gumize Region, both in Northwest Ethiopia. Fogera station is positioned at 11°58'N and 37°41'E, at an elevation of 1810 meters above sea level. Lowland rainfed rice is predominantly grown there. The site is characterized by high elevation and low temperature, and the Fogera area frequently faces terminal moisture stress. The soil is a vertisol with a slightly acidic pH of 5.90 (Tilahun et al. 2013). Rainfall in the area is unimodal, falling mainly from June to October and amounting to 1234.5 mm. The important rainfall months at Fogera range from mid-June to early October, which covers more than 90% of the annual rainfall. Pawe station is located at 11°19'N and 34°24'E, at an elevation of 1120 meters above sea level. Upland rainfed rice is the major production domain in the Pawe area. This station has a relatively high temperature and a long rainy season. Rainfall in the area is unimodal, falling mainly from April to October and amounting to 1570.3 mm. The important rainfall months at Pawe range from late April to September, which covers more than 90% of the annual rainfall.

Experimental design
At each experimental site, the trial was laid out in a 15 x 4 alpha lattice design with three replications, each replication containing four blocks. Each cultivar was sown in six rows in a plot of 7.5 m² (5 m x 1.5 m). Spacing between replications, blocks, plots and rows was 1.5 m, 1 m, 50 cm and 25 cm, respectively. Fertilizers (urea and DAP) were applied according to the local recommendation. DAP was applied all at planting and urea in three splits (at planting, tillering and panicle initiation). Weeding and other field management operations were carried out uniformly across the experimental plots as required.

Data collection and statistical analysis
Data were collected from the middle four rows, excluding the two border rows. Ten quantitative traits were assessed and recorded for each cultivar per block per replication: days to 50% heading (DTH, days), days to 85% maturity (DTM, days), plant height (PH, cm), panicle length (PL, cm), filled grains per panicle (FGP, count), fertility rate (FR, %), thousand seed weight (TSW, g), grain yield (Gy, t/ha), biomass yield (By, t/ha) and harvest index (HI, %), following the guidelines developed by IRRI (1996). Except for DTH, DTM, Gy, By, HI and TSW, data were collected from ten randomly selected plants per plot. Grain yield and biomass yield per plot were sampled from the middle four rows by cutting all plants at the base, and the samples were sun-dried in bags. The weight of the dried sample from each plot was measured to obtain the above-ground biomass and converted to tonnes per hectare (t/ha). For grain yield, each sample was threshed, weighed, adjusted to 14% moisture content and expressed in t/ha. Harvest index (HI, %) was calculated as the ratio of grain yield to total above-ground biomass yield.
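The yield calculations described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the moisture adjustment uses the common rescaling of weight by (100 - measured moisture) / (100 - 14), the harvested area of 5 m² is an assumption derived from four rows at 25 cm spacing and 5 m length, and all function names and input values are hypothetical.

```python
# Sketch: convert a plot harvest into grain yield (t/ha) at 14% moisture
# and into harvest index (%). Input numbers are hypothetical, not study data.

STANDARD_MOISTURE = 14.0  # % moisture to which grain weight is adjusted (from the text)

# Assumed net harvested area: four middle rows x 0.25 m row spacing x 5 m row length.
HARVEST_AREA_M2 = 4 * 0.25 * 5.0


def adjust_to_standard_moisture(grain_kg, measured_moisture_pct):
    """Rescale grain weight so that it corresponds to 14% moisture content."""
    return grain_kg * (100.0 - measured_moisture_pct) / (100.0 - STANDARD_MOISTURE)


def kg_per_plot_to_t_per_ha(weight_kg, area_m2=HARVEST_AREA_M2):
    """Convert kg per harvested area to t/ha (1 ha = 10,000 m2, 1 t = 1,000 kg)."""
    return weight_kg / area_m2 * 10.0


def harvest_index(grain_t_ha, biomass_t_ha):
    """Harvest index (%) = grain yield / total above-ground biomass x 100."""
    return grain_t_ha / biomass_t_ha * 100.0


if __name__ == "__main__":
    # Hypothetical plot: 1.8 kg of grain measured at 12% moisture, 4.5 kg dry biomass.
    grain_kg = adjust_to_standard_moisture(grain_kg=1.8, measured_moisture_pct=12.0)
    gy = kg_per_plot_to_t_per_ha(grain_kg)
    by = kg_per_plot_to_t_per_ha(4.5)
    print(round(gy, 2), round(by, 2), round(harvest_index(gy, by), 1))
```

If the harvested area differs from the assumed 5 m², only `HARVEST_AREA_M2` needs to change.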
Statistical analysis was carried out for each site and for the combined data. All collected data were subjected to analysis of variance using SAS software version 9.0 (SAS 2002). Phenotypic correlation coefficients were computed with the PROC CORR procedure in SAS to elucidate the relationships between quantitative traits. Principal component analysis (PCA) was performed using GenStat software version 16 (GenStat 2013). PCA was employed to identify the quantitative traits that contributed the most to the total variance in the measured variables. In PCA, the raw data for each variable were standardized, the analysis was computed on the correlation matrix, and the proportion-of-variance criterion was used to identify the principal components that contributed to the total variance in the dataset. Ward's minimum variance method was applied for cluster analysis of the rice cultivars (Ward 1963).
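As a rough illustration of the standardization and correlation steps, the sketch below computes z-scores and Pearson correlation coefficients in plain Python. It is not the SAS or GenStat code used in the study; the function names and the toy values are hypothetical and serve only to show the arithmetic.

```python
import math


def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation as the sum of products of z-scores divided by n - 1."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def correlation_matrix(traits):
    """traits: dict of trait name -> list of values; returns nested dict of r values."""
    names = list(traits)
    return {a: {b: pearson_r(traits[a], traits[b]) for b in names} for a in names}


if __name__ == "__main__":
    # Hypothetical values for five cultivars (not data from the study).
    toy = {
        "PH": [95.0, 102.0, 110.0, 88.0, 120.0],
        "PL": [19.5, 21.0, 22.4, 18.2, 24.1],
        "Gy": [3.1, 3.6, 4.0, 2.8, 4.4],
    }
    for name, row in correlation_matrix(toy).items():
        print(name, {k: round(v, 2) for k, v in row.items()})
```

Because the traits are standardized first, the resulting matrix is the correlation matrix on which a PCA of this kind operates, so traits measured in different units (days, cm, t/ha) carry equal weight.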

Correlation analysis
The degree of correlation among different traits is very important when dealing with complex traits such as grain yield (Anyaoha et al. 2018). It is often reported that yield in rice is determined by indirect traits like plant height, growth period, tillering ability, panicle length, seed length, seed setting rate and grains per panicle, as well as by direct traits like panicle number per unit area and/or per plant, filled grains per panicle and 1000-grain weight (Sakamoto and Matsuoka 2008; Huang et al. 2013; Anyaoha et al. 2018; Li et al. 2019). Correlation results for 10 quantitative agro-morphological traits are presented in Table 1 and Table 2. The correlation between days to heading and days to maturity was significantly positive at Fogera (r = 0.73), at Pawe (r = 0.71) and in the combined data (r = 0.69), but both traits tended to show a significant negative correlation with all other traits considered at Fogera and in the combined data. At Pawe, on the other hand, days to heading and days to maturity showed a significant positive association with filled grains per panicle (r = 0.37, 0.37) and grain yield (r = 0.27, 0.29), respectively, and a significant negative association with thousand seed weight (r = -0.23, -0.24), but were weakly correlated with panicle length, plant height and biomass yield (Table 1). The significant positive association of days to heading and maturity with grain yield and yield-attributing traits at Pawe suggests that relatively late and high-yielding varieties can be selected for Pawe, owing to its long rainy season with optimum temperature. Earliness is not farmers' top priority at Pawe (personal informal interview). Most farmers preferred high-yielding, medium to late maturing varieties. Farmers explained that early varieties are subject to bird attack and sprout during the long rainy season, and hence are not preferred.
Table note: *, **, ***: significant at the 0.05, 0.01 and 0.001 levels, respectively. DTH: days to heading, DTM: days to maturity, PL: panicle length (cm), PH: plant height (cm), FGP: filled grains per panicle (no.), Gy: grain yield (t/ha), TSW: 1000 seed weight (g), By: biomass yield (t/ha), HI: harvest index (%).
However, at Fogera, early maturing and high-yielding rice varieties are preferred by all farmers because of recurrent terminal moisture stress. In addition, early varieties in the Fogera area create favorable conditions for double cropping using residual moisture, a common practice after rice. Traits such as panicle length (r = 0.41), plant height (r = 0.45), filled grains per panicle (r = 0.25), fertility rate (r = 0.137) and harvest index (r = 0.66) showed significant positive correlations with grain yield at Fogera, with a similar trend at Pawe and in the combined data (Tables 1, 2). Such significant positive correlations between grain yield and other important traits (panicle length, plant height, filled grains per panicle, fertility rate and biomass yield) suggest that selection in favor of these traits may lead to positive indirect selection for high grain yield (Ahmadikhah 2010). Thousand seed weight was poorly correlated with other traits, except with plant height (r = 0.25) and grain yield (r = 0.16) at Fogera and with grain yield (r = 0.22) at Pawe.
Biomass yield had a significant positive correlation with panicle length, plant height, filled grains per panicle, fertility rate and grain yield both at Fogera and in the combined data (Tables 1, 2). At Pawe, biomass yield was weakly but positively correlated with days to heading, days to maturity and fertility rate, while it exhibited a significant positive association with panicle length (r = 0.20), plant height (r = 0.48), grain yield (r = 0.59) and thousand seed weight (r = 0.28) (Table 1). Harvest index exhibited a significant positive correlation with panicle length (r = 0.36), plant height (r = 0.29), filled grains per panicle (r = 0.23), fertility rate (r = 0.34) and grain yield (r = 0.66), but was negatively correlated with biomass yield (r = -0.27) at Fogera. At Pawe and in the combined data, it was poorly correlated with most other quantitative traits except biomass yield, with which it had a significant negative association (r = -0.31, -0.25) (Tables 1, 2). A similar result was reported by Lakshmi et al. (2019), who evaluated 31 rice accessions and found that grain yield per plant had a significant positive correlation with days to flowering (0.479) and spikelets per panicle (0.497), while a negative association was found with 1000 seed weight (-0.294). They further explained that 1000 seed weight had a significant negative association with days to flowering and a negative relationship with the remaining traits considered. Konate et al. (2016) also reported that grain yield per plant showed a significant positive association with days to 50% flowering (r = 0.3997), plant height (r = 0.2794), panicle weight (r = 0.112) and biomass (r = 0.9291). They also found that plant height was positively correlated with all characters except number of fertile spikelets, and that number of fertile spikelets was in turn negatively correlated with almost all characters except 1000-grain weight.
Table note: *, **, ***: significant at the 0.05, 0.01 and 0.001 levels, respectively. DTH: days to heading, DTM: days to maturity, PH: plant height (cm), PL: panicle length (cm), FGP: filled grains per panicle, FR: fertility rate (%), Gy: grain yield (t/ha), By: biomass yield (t/ha), TSW: thousand seed weight (g), and HI: harvest index (%).
Journal of Biology, Agriculture and Healthcare www.iiste.org ISSN 2224-3208 (Paper) ISSN 2225-093X (Online) Vol.11, No.15, 2021

Principal component analysis
Principal component analysis (PCA) was computed based on 10 quantitative agro-morphological traits of 60 rice cultivars at each site and in the combined data (Table 3). Principal components were computed from the correlation matrix, and genotypic scores were obtained for the first component and for succeeding components with eigenvalues greater than unity (Jeger et al. 1983). PCA measures the contribution of each component, or the independent impact of a particular trait, to the total variance observed in a given population in relation to the traits of interest to the breeder (Anyaoha et al. 2018). In the present study, only the first four principal components (PC1 to PC4) had eigenvalues greater than one in all three data sets (Fogera, Pawe and combined). The first four principal components (PCs) explained a cumulative variation of 81.07%, 74.95% and 80.63% at Fogera, at Pawe and in the combined data, respectively (Table 3). In a previous study, Chakravorty et al. (2013) found that the first six principal components with eigenvalues greater than one contributed 75.9% of the variability among 51 rice landraces evaluated for different agro-morphological traits, with traits such as leaf length (0.625), plant height (0.751), culm diameter (0.780), culm number (0.802) and panicle length (0.697) accounting for most of the observed variability in PC1 (23.47%). On the other hand, Lakshmi et al. (2019) reported that only the first two principal components, with eigenvalues of 2.643 and 1.189, respectively, together explained 54.752% of the total variance for all characters in 31 rice accessions. Characters such as yield (0.809), days to 50% flowering (0.764) and number of spikelets per panicle (0.669) had higher contributions in PC1 (37.763%), while PC2 accounted for 16.988% of the total variation, with panicle length (0.541) giving the highest contribution.
At Pawe, PC1 explained 31.30% of the total variance with an eigenvalue of 3.44, and only four quantitative traits, days to heading (0.347), days to maturity (0.349), filled grains per panicle (0.476) and grain yield (0.395), contributed strongly to the total variation, while the remaining traits tended to contribute less. PC2, with an eigenvalue of 1.96, accounted for 17.83% of the variation; three traits, plant height (0.510), panicle length (0.261) and biomass yield (0.570), contributed positively to the overall variation, while fertility rate (-0.273) and harvest index (-0.476) contributed negatively. In PC3, with an eigenvalue of 1.73, days to heading (-0.512), days to maturity (-0.457), panicle length (0.453) and harvest index (0.275) contributed most to the 15.73% of total phenotypic variability that this component explained. PC4 accounted for 10.09% of the total variation, with panicle length (-0.413), grain yield (0.352) and particularly thousand seed weight (0.787) contributing most to the overall variance observed among the 60 rice cultivars (Table 3).
In the combined data, the first principal component (PC1), with an eigenvalue of 3.186, explained 28.97% of the total variance, while PC2, with an eigenvalue of 2.579, accounted for 23.45% of the total variability. The four traits that contributed most to the overall variation in PC1 were days to heading (-0.462), days to maturity (-0.470), fertility rate (0.413) and grain yield (0.301), while plant height (0.282), panicle length (0.460), filled grains per panicle (0.485), grain yield (0.291) and biomass yield (0.299) contributed most in PC2. PC3 had an eigenvalue of 1.848 and contributed 16.80% to the overall variability. Similarly, PC4 contributed 11.41% of the total variability with an eigenvalue of 1.255. Plant height (0.401), filled grains per panicle (-0.335), biomass (0.513) and harvest index (0.451) contributed most to the variation in PC3, while days to heading (0.257), grain yield (0.578), thousand seed weight (-0.374) and harvest index (0.451) contributed largely to the variation reflected in PC4 (Table 3).
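The link between the eigenvalues and the percentages of variance quoted above can be checked with a short calculation. In a PCA on a correlation matrix, the eigenvalues sum to the number of standardized traits, so each component's share is its eigenvalue divided by that number. The sketch below assumes eleven traits, the count given for the cluster analysis; this is an assumption, since this section mentions 10 traits, but the reported percentages are reproduced closely only with eleven. The function names are illustrative.

```python
def variance_explained(eigenvalues, n_traits):
    """Percentage of total variance per component for a correlation-matrix PCA."""
    return [ev / n_traits * 100.0 for ev in eigenvalues]


def retained_components(eigenvalues):
    """Keep components whose eigenvalue exceeds one (the criterion used in the text)."""
    return [ev for ev in eigenvalues if ev > 1.0]


# Eigenvalues of PC1 to PC4 for the combined data, as reported in the text.
COMBINED_EIGENVALUES = [3.186, 2.579, 1.848, 1.255]
N_TRAITS = 11  # assumption: eleven standardized traits entered the analysis

if __name__ == "__main__":
    shares = variance_explained(COMBINED_EIGENVALUES, N_TRAITS)
    for i, (ev, pct) in enumerate(zip(COMBINED_EIGENVALUES, shares), start=1):
        print(f"PC{i}: eigenvalue {ev:.3f}, {pct:.2f}% of variance")
    print(f"cumulative: {sum(shares):.2f}%")
```

Small differences in the second decimal place relative to the reported values are expected from rounding of the published eigenvalues.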

Cluster analysis
Ward's minimum variance method of hierarchical clustering, based on eleven quantitative traits combined across the two sites (Fogera and Pawe), was used to visualize the grouping patterns of the 60 rice cultivars. Accordingly, the sixty cultivars were grouped into four clusters, as illustrated in Figure 3. Cluster I was the largest and included 22 (36.7%) cultivars, followed by Cluster II with 20 (33.3%) cultivars. Clusters III and IV included nine (15%) cultivars each and were dominated by collected accessions (Figure 3 and Table 4). About 77% of the cultivars in Cluster I were improved varieties, including the NERICA varieties. X-Jigna and four other accessions also belonged to this cluster. Days to heading, days to maturity, plant height, filled grains per panicle, grain yield and biomass yield contributed highly to the genetic diversity and should be given importance during hybridization and selection in segregating populations, by choosing plants with early duration (Fogera areas), medium to late duration (Pawe areas) and higher grain yield per plant. A previous study by Lakshmi et al. (2019) classified 31 rice accessions into six clusters based on their Euclidean distances, with the first cluster including the largest number of accessions, and they explained that panicle length [...]
Table fragment (cluster of nine cultivars, 15.0%): Fogera 2, Candidate-2, SGU01, GAM01, GAM03, BGA01, BGP-04, BGP-06, BGP-14
On the other hand, the largest intra-cluster distance, observed in clusters II and IV regardless of the number of cultivars, could be attributed to the heterogeneous nature of the cultivars within each cluster. The inter-cluster distance was greatest between clusters II and IV (45.79), followed by clusters III and IV (35.70) and clusters I and III (35.62). The smallest inter-cluster distance was obtained between clusters I and II (25.33), while the distances between clusters II and III (30.95) and between clusters I and IV (30.92) were intermediate. In all cases, the inter-cluster distances were larger than the intra-cluster distances (Table 6). Therefore, crossing cultivars from clusters II and IV could result in higher heterosis, followed by crosses between clusters III and IV, for particular target traits. A similar result was reported by Pathak et al. (2020), who found the highest inter-cluster distance between clusters III and VI and concluded that hybridization between genotypes of these clusters would yield desirable segregants with an accumulation of favorable genes in the segregating generations.
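The ordering of cluster pairs by inter-cluster distance can be reproduced with a small sketch. The distances are those quoted in the text; the ranking function is illustrative only, and a larger distance is read here simply as a more divergent pair of clusters, not as a guarantee of heterosis.

```python
# Inter-cluster distances as reported in the text.
INTER_CLUSTER_DISTANCE = {
    ("II", "IV"): 45.79,
    ("III", "IV"): 35.70,
    ("I", "III"): 35.62,
    ("II", "III"): 30.95,
    ("I", "IV"): 30.92,
    ("I", "II"): 25.33,
}


def rank_cluster_pairs(distances):
    """Return (pair, distance) tuples sorted from most to least divergent."""
    return sorted(distances.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (a, b), d in rank_cluster_pairs(INTER_CLUSTER_DISTANCE):
        print(f"Cluster {a} x Cluster {b}: {d:.2f}")
```

In practice such a ranking would be combined with cluster means for the target traits before choosing parents.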

Conclusion
The results obtained from sixty rice cultivars grown at two sites showed that grain yield was significantly and positively correlated with panicle length, plant height, filled grains per panicle, spikelet fertility rate, biomass yield, harvest index and thousand seed weight at Pawe and Fogera. Principal component analysis identified days to heading, days to maturity, panicle length, filled grains per panicle, grain yield, biomass yield and harvest index as the traits contributing most to the divergence among the sixty rice cultivars evaluated. Cluster analysis also classified the rice cultivars into four groups attributed to different traits.
Cultivars included in Cluster I were early maturing, with good panicle length, a large number of filled grains per panicle, good fertility rate, high thousand seed weight, high grain yield and the largest harvest index, whereas some of the cultivars in clusters III and IV performed poorly for these important traits. Selection of contrasting parents for a hybridization program should be based on the magnitude of genetic distance, the contribution of different traits to the total variance, the patterns of clustering and the magnitude of cluster means for the different traits. Therefore, crosses between selected cultivars of clusters I and III, I and IV, II and III, II and IV, and III and IV could result in high heterosis as well as a wide range of genetic variation in the subsequent generations. Overall, the results of the principal component and cluster analyses in this study demonstrated the presence of genetic variation among Ethiopian rice cultivars for quantitative traits, which can be exploited separately and/or concurrently for the improvement of rice varieties in the two contrasting environments.