Evaluating the Impact of Seasonal Variability on Groundwater Quality using Multivariate Analysis of Variance

Groundwater, which constitutes a high percentage of global fresh water, is one of the most important sources of drinking water. When polluted, groundwater has deleterious effects on its users. Consequently, the quality and pollution of groundwater are a health concern worldwide. The focus of this study is to evaluate the impact of seasonal variation on the quality of groundwater within the study area. One hundred (100) boreholes spread to cover the study area were sampled. The water samples were analyzed using standard procedures for assessing drinking water quality in order to determine the condition of groundwater within the study area. Statistical analysis of the groundwater quality data was done using the weighted average index method to determine the water quality index and multivariate analysis of variance (MANOVA) to assess the impact of seasonal variation. The MANOVA results revealed that the calculated partial Eta squared of the Pillai's trace statistic was 1.00, which indicates 100% variability among the dependent variables occasioned by seasonal change.


Introduction
Water and its quality are a serious and vital issue for mankind because of their link with human health and welfare. Water is one of the most precious and replenishable natural resources. It is abundant on the earth's surface, but the problem lies in whether its quality and quantity can serve the intended purpose. The demand for water has increased over the years, which has led to water scarcity in many parts of the world, and the situation is aggravated by water pollution or contamination (Sundara et al., 2010).

The application of multivariate statistical techniques such as cluster analysis (CA), principal component analysis (PCA) and factor analysis (FA) helps to identify the important components or factors accounting for most of the variance of a system. They are designed to reduce the number of variables to a small number of indices while attempting to preserve the relationships present in the original data (Simeonov et al., 2003; Simeonov et al., 2004). Iyer et al. (2003) constructed a statistical model based on PCA for coastal water quality data from the Cochin coast in south west India, to explain the relationships between the monitored physicochemical variables and to evaluate the impact of environmental fluctuations on coastal water quality.

MANOVA generalizes Hotelling's T² to two or more groups. The purpose of MANOVA is to test whether the vectors of means for the two or more groups are sampled from the same sampling distribution. Just as Hotelling's T² provides a measure of the likelihood of picking two random vectors of means out of the same hat, MANOVA gives a measure of the overall likelihood of picking two or more random vectors of means out of the same hat. There are two major situations in which MANOVA is used.
i. The first is when there are several correlated dependent variables, and the researcher desires a single, overall statistical test on this set of variables instead of performing multiple individual tests.
ii. The second, and in some cases the more important, purpose is to explore how independent variables influence some patterning of response on the dependent variables. Here, one literally uses an analogue of contrast codes on the dependent variables to test hypotheses about how the independent variables differentially predict the dependent variables.

MANOVA also has the same problems of multiple post hoc comparisons as ANOVA. An ANOVA gives one overall test of the equality of means for several groups for a single variable, but it will not tell you which groups differ from which other groups. (With the judicious use of a priori contrast coding, one can overcome this problem.) Likewise, MANOVA gives one overall test of the equality of mean vectors for several groups, but it cannot tell you which groups differ from which other groups on their mean vectors. (As with ANOVA, it is possible to overcome this problem through the use of a priori contrast coding.) In addition, MANOVA will not tell you which variables are responsible for the differences in mean vectors. Again, it is possible to overcome this with proper contrast coding for the dependent variables (Shrestha and Kazama, 2007).

Description of study area
The study area for this research is the Niger Delta Basin Development Authority. This study covers the original area of operation of the River Basin Authority, which is Rivers and Bayelsa States alone. The geographical coordinates of Rivers and Bayelsa States are 4.8581˚N and 6.9209˚E and 4.25˚S and 5.37˚W and 6.75˚E respectively (Nwankwoala et al., 2011). The Niger Delta Basin is situated in the south-south geopolitical zone of Nigeria. It is located in the rain forest region, with relative humidity above 80%, an annual temperature range of 25°C to 31°C and annual rainfall ranging from 4700 mm on the coast to about 2400 mm. The basin is characterized by two alternating climatic conditions: a long rainy season spanning from March to November, followed by a dry season from November to March (Nwankwoala et al., 2011). Figures 1 and 2 show the Google Earth and study area maps respectively.

Sampling location and sample collection
The boundary of the built-up area (land use) within the study area was digitized and gridded at 2 km intervals to determine the sampling points and ensure uniform coverage. Water samples were collected systematically so as to have a general overview of the water quality condition within the study area. For accurate geo-referencing of the selected boreholes, a Garmin handheld GPS receiver was employed to determine the geographical coordinates of each borehole. A section of the boreholes sampled, including their location and geographical coordinates, is presented in Table 1. One hundred (100) boreholes were systematically sampled with reference to location points in each season, the wet season (July to October 2018) and the dry season (November to December 2018), in order to determine the physico-chemical and biological parameters of the groundwater samples. At every point of collection, the airtight, clean and dried plastic containers were rinsed two to three times with the borehole water to be sampled before collection. The samples were labelled properly and stored in airtight, clean and dried plastic containers before being transported to the Water Resources and Environmental laboratory in the Department of Civil Engineering, University of Benin, where the analyses were conducted in line with standard procedures and guidelines recommended by the World Health Organization (WHO). The water samples were analyzed in triplicate to obtain the mean value and standard deviation of each water quality test parameter. For the analysis of biochemical oxygen demand (BOD), the black bottles containing the water samples remained tightly closed prior to analysis in order to prevent photosynthesis and oxygen generation. In-situ parameters, namely dissolved oxygen (DO), temperature, pH, electrical conductivity (EC) and total dissolved solids (TDS), were determined in the field immediately after sample collection to avoid false measurement values (APHA, 2005).
Table 1: Location and geographical coordinates of a section of the boreholes sampled
235669  286341  237214  236203  235766  262207  280557  279389  274121  286716  283293  286238  289003  280451  278741  283012  279849  287302  307314  278023  279801  279684  325714  334047  289245  323819  319044  289599  339050  316831  286421
561471  525361  537656  528043  556600  556600  540433  524264  534103  530030  532098  533642  536010  527163  534162  526634  529397  536068  530118  531219  519241  530112  527029  525973  496236  508598  538032  495804  517018  538240  510170  519746  533116

Water Quality Analysis
A total of thirty-three (33) physico-chemical parameters and two (2) microbiological parameters were analyzed for each sampled domestic borehole to provide an insight into the overall quality of water within the study area. The physico-chemical parameters include: temperature, odour, colour/clarity, total hydrocarbon content (THC),  The portable multi-meter probe was submerged in the water to a depth of 4 cm and the pH mode was selected. The water sample was stirred gently and the pH value displayed on the meter was allowed to adjust and stabilize before recording. The other measurement buttons were pressed successively and the values recorded. The procedure was repeated three (3) times and the mean value calculated for each parameter. The DO meter was also inserted into the water sample to a depth of about 10 cm using the oxygen probe handle.
The UNICAM 969 Atomic Absorption Spectrometer (AAS) shown in Figure 4 was used to determine the concentration of heavy metals such as Iron (Fe), Manganese (Mn), Zinc (Zn), Copper (Cu), Chromium (Cr), Cadmium (Cd), Nickel (Ni), Lead (Pb) and Vanadium (V), while the UV-visible spectrophotometer (Thermo Scientific Spectronic 20D+) presented in Figure 5 was used to analyze the levels of Phosphorus (P), Nitrate (NO3), Nitrite (NO2) and Sulphate (SO4). Other apparatus utilized included 250 ml glass separating funnels, cuvettes, 10 ml and 50 ml pipettes, 250 ml conical flasks, a 50 ml burette, 25 ml and 50 ml volumetric flasks, glass beads, a refrigerator, an oven and Whatman filter paper. Preparation of reagents and the laboratory procedures employed for the analysis and determination of all water quality parameters followed the standard methods recommended by relevant authorities such as the World Health Organization (WHO).

Analysis of seasonal variability using MANOVA
To study the seasonal variability of the groundwater quality parameters, multivariate analysis of variance (MANOVA) was employed. The following steps were used to justify the presence of seasonal variability in the water quality parameters.

Assessing the suitability of MANOVA based on multivariate outliers
Multivariate outliers are usually assessed through a measure known as the Mahalanobis distance. If the maximum calculated value of the Mahalanobis distance is less than the critical value, then the assumption of no multivariate outliers has not been violated. In that case, seasonal variability can be investigated using multivariate analysis of variance (MANOVA); otherwise, another statistical technique must be used to track the presence of seasonal variability (Alkarkhi, 2008; Shrestha and Kazama, 2007). The critical values of the Mahalanobis distance are presented in Table 2.
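The outlier screen described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function names, the significance level of 0.001 and the synthetic data are assumptions. The critical value is taken from the chi-square distribution with degrees of freedom equal to the number of variables, which is the usual convention.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row of X from the column means."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)        # sample covariance (n - 1 divisor)
    inv_cov = np.linalg.pinv(cov)        # pseudo-inverse guards against singularity
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

def critical_value(n_vars, alpha=0.001):
    """Chi-square critical value; alpha = 0.001 is an assumed convention."""
    return chi2.ppf(1.0 - alpha, df=n_vars)

# Synthetic data standing in for 200 samples of 10 water quality parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

d2 = mahalanobis_sq(X)
crit = critical_value(X.shape[1])        # about 29.59 for 10 variables
n_flagged = int((d2 > crit).sum())       # rows exceeding the critical value
print(f"critical value = {crit:.2f}, flagged rows = {n_flagged}")
```

Rows whose squared distance exceeds the critical value are the candidates for multivariate outliers; the screen passes when the maximum distance stays below it.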

Descriptive Statistics
Descriptive statistics were employed to check the difference in the mean and standard deviation between the sampling groups. The mathematical equations for computing the mean and standard deviation are presented as follows.
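The equations themselves are not reproduced in this text. The standard definitions are given here for reference, assuming the sample form of the standard deviation (divisor n − 1); the symbols are illustrative, with x_i the i-th measurement of a parameter within one season and n the number of measurements.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}
```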

Box Test or Covariance Matrix
In multivariate analysis of variance, we test the null hypothesis that the observed covariance matrices of the dependent variables (water quality parameters) are equal across groups (seasons), that is, that there is no seasonal variation in the water quality parameters. If the calculated p-value is less than 0.05 (p < 0.05), we reject the null hypothesis and conclude that the assumption of equal covariance matrices across groups has not been satisfied, an indication that seasonal variability exists among the groups (Alkarkhi, 2008; Shrestha and Kazama, 2007).
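A minimal sketch of Box's M test with its chi-square approximation is given below. The function name `box_m_test` and the synthetic data are illustrative assumptions, not part of the study, and the sketch assumes at least two dependent variables and full-rank covariance matrices.

```python
import numpy as np
from scipy.stats import chi2

def box_m_test(groups):
    """Box's M test for equality of covariance matrices (chi-square approximation).

    groups: list of (n_i, p) arrays, one per group. Returns (statistic, df, p_value).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    g = len(groups)
    p = groups[0].shape[1]
    ns = np.array([grp.shape[0] for grp in groups])
    N = ns.sum()
    covs = [np.cov(grp, rowvar=False) for grp in groups]
    pooled = sum((n - 1) * S for n, S in zip(ns, covs)) / (N - g)

    def logdet(S):
        return np.linalg.slogdet(S)[1]

    M = (N - g) * logdet(pooled) - sum((n - 1) * logdet(S) for n, S in zip(ns, covs))
    # Small-sample correction factor for the chi-square approximation.
    c = (np.sum(1.0 / (ns - 1)) - 1.0 / (N - g)) * (2 * p**2 + 3 * p - 1) / (
        6.0 * (p + 1) * (g - 1)
    )
    stat = M * (1.0 - c)
    df = p * (p + 1) * (g - 1) / 2
    return float(stat), float(df), float(chi2.sf(stat, df))

# Synthetic example: the second group has a much larger spread than the first.
rng = np.random.default_rng(0)
group_a = rng.normal(size=(100, 3))
group_b = 5.0 * rng.normal(size=(100, 3))
stat, df, p_value = box_m_test([group_a, group_b])
print(f"chi-square = {stat:.1f}, df = {df:.0f}, p = {p_value:.3g}")
```

A p-value below 0.05 leads to rejection of the hypothesis of equal covariance matrices, which is the decision rule stated above.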

The Multivariate Test
Different statistical methods for computing the F-value for multivariate analysis of variance exist in the literature. One of them is Roy's largest root, which is probably the most acceptable and also the most susceptible to deviation in the covariance matrix. The next is Pillai's Trace, followed by Wilks' Lambda. Pillai's Trace is the least sensitive to violation of the covariance matrix assumption. If the p-value of Pillai's Trace is less than 0.05, we reject the null hypothesis that the water quality parameters are the same for the two groups and conclude that seasonal variability actually exists (Alkarkhi, 2008; Shrestha and Kazama, 2007).
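The four statistics named above are all functions of the eigenvalues of E⁻¹H, where H is the between-group and E the within-group sums of squares and cross-products matrix. The sketch below computes them for a one-way design; the function name and the synthetic two-season data are assumptions made for illustration only.

```python
import numpy as np

def manova_statistics(groups):
    """Pillai, Wilks, Hotelling-Lawley and Roy statistics for a one-way MANOVA."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_obs = np.vstack(groups)
    grand_mean = all_obs.mean(axis=0)
    p = all_obs.shape[1]
    H = np.zeros((p, p))   # between-group (hypothesis) SSCP matrix
    E = np.zeros((p, p))   # within-group (error) SSCP matrix
    for grp in groups:
        mean = grp.mean(axis=0)
        d = (mean - grand_mean)[:, None]
        H += grp.shape[0] * (d @ d.T)
        centered = grp - mean
        E += centered.T @ centered
    eig = np.linalg.eigvals(np.linalg.solve(E, H)).real
    eig = np.clip(eig, 0.0, None)   # remove tiny negative round-off values
    return {
        "pillai": float(np.sum(eig / (1.0 + eig))),
        "wilks": float(np.prod(1.0 / (1.0 + eig))),
        "hotelling_lawley": float(np.sum(eig)),
        "roy": float(np.max(eig)),
    }

# Synthetic stand-in for two seasons with clearly different mean vectors.
rng = np.random.default_rng(0)
wet = rng.normal(loc=0.0, size=(50, 3))
dry = rng.normal(loc=10.0, size=(50, 3))
stats = manova_statistics([wet, dry])
print(stats)
```

With only two groups there is a single non-zero eigenvalue, so Pillai's trace and Wilks' Lambda sum to one and all four statistics lead to the same F test.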

Levene's Test of Equality of Error Variance
If seasonal variability exists, then the calculated error variance of the dependent variables for the different sampling locations must not be the same. To test the null hypothesis that the error variance of the dependent variables is equal across groups, Levene's test of equality of error variance was computed. If the calculated p-value for most of the dependent variables (groundwater quality parameters) is less than 0.05, it is concluded that seasonal variability exists among the groups.
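Levene's test can be run variable by variable with `scipy.stats.levene`. The sketch below is illustrative: the helper name, the variable labels and the synthetic data are assumptions, and `center="mean"` is chosen to mirror the classical mean-based form of the test (the SciPy default is the median).

```python
import numpy as np
from scipy.stats import levene

def levene_by_variable(wet, dry, names, alpha=0.05):
    """Run Levene's test per column; returns {name: (statistic, p_value, rejected)}."""
    results = {}
    for j, name in enumerate(names):
        stat, p = levene(wet[:, j], dry[:, j], center="mean")
        results[name] = (float(stat), float(p), bool(p < alpha))
    return results

# Synthetic data: var_a differs in spread between seasons,
# var_b differs only in its mean (same spread).
rng = np.random.default_rng(0)
wet = np.column_stack([rng.normal(5.0, 0.5, 100), rng.normal(200.0, 50.0, 100)])
dry = np.column_stack([rng.normal(5.0, 2.0, 100), wet[:, 1] + 30.0])

results = levene_by_variable(wet, dry, ["var_a", "var_b"])
for name, (stat, p, rejected) in results.items():
    print(f"{name}: W = {stat:.3f}, p = {p:.3g}, unequal variance = {rejected}")
```

The second variable shows that the test responds to a difference in spread between groups and not to a shift in the mean.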

Analysis of seasonal variability using MANOVA
Variation in season affects the quality of groundwater. For shallow wells, which are highly susceptible to infiltration of anthropogenic impurities, seasonal variation is pivotal to the quality of the water. In the Niger Delta region, for example, oil exploration and exploitation activities can affect the quality of groundwater owing to the porous nature of the soil, which allows for speedy infiltration of impurities. To study the effect of seasonal variation, twenty-one (21) water quality parameters, namely pH, Electrical Conductivity (EC), Salinity, Total Dissolved Solids (TDS), Dissolved Oxygen (DO), Bicarbonate (HCO3), Sodium (Na), Potassium (K), Calcium (Ca), Magnesium (Mg), Chloride (Cl-), Phosphate (PO4), Nitrate (NO3), Sulphate (SO4), Iron (Fe), Zinc (Zn), Copper (Cu), Turbidity, Total Suspended Solids (TSS), Temperature and Alkalinity, were monitored using 100 boreholes for the wet and dry seasons. To apply multivariate analysis of variance (MANOVA) in the study of seasonal variability, the following assumptions and conditions were tested.

Testing the normality assumption of the dependent variables
For seasonal variability, it is expected that the dependent variables (water quality parameters) vary with season and do not obey normality. In addition, the results of the water quality parameters should not contain outliers, and the significance value (p-value) computed based on the Kolmogorov-Smirnov and Shapiro-Wilk tests must be less than 0.05 (p < 0.05) for all the dependent variables. Results of the p-values computed based on the Kolmogorov-Smirnov and Shapiro-Wilk tests are presented in Tables 3 and 4, while the outlier detection test using box plots is presented in Figure 6.

Table 3: Testing the assumption of normality for MANOVA

Figure 6: Seasonal box plot for assessing the presence of outliers
From the results of Tables 3 and 4, it was observed that most of the dependent variables had p-values less than 0.05 based on the Kolmogorov-Smirnov and Shapiro-Wilk tests. Since these p-values are less than 0.05, it was concluded that the dependent variables did not obey normality. Non-normally distributed dependent variables indicate the presence of seasonal variation. A further test of normality was done using the detrended normal quantile-quantile (Q-Q) plots presented in Figures 7a and 7b. Since the dependent variables did not follow the detrended normal distribution line, it was concluded that the variables are not normally distributed, an indication that there is variation in the water quality parameters occasioned by season. To check whether the dependent variables contain any form of outliers, the seasonal box plot presented in Figure 6 was employed. The presence of an outlier is normally indicated by a square box or circle containing a number inside it. Since the circles in Figure 6 did not contain any number inside them, it was concluded that the dependent variables are devoid of possible outliers.
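The two normality tests can be reproduced in outline as below. This is a hedged sketch on synthetic data: the helper name is hypothetical, and the Kolmogorov-Smirnov call uses the sample mean and standard deviation without the Lilliefors correction that some statistical packages apply, so its p-values may differ from those packages.

```python
import numpy as np
from scipy.stats import kstest, shapiro

def normality_pvalues(x):
    """p-values of the Shapiro-Wilk and Kolmogorov-Smirnov tests for one variable."""
    x = np.asarray(x, dtype=float)
    _, sw_p = shapiro(x)
    # K-S against a normal with parameters estimated from the sample
    # (no Lilliefors correction; an assumption of this sketch).
    _, ks_p = kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
    return {"shapiro_wilk": float(sw_p), "kolmogorov_smirnov": float(ks_p)}

# Synthetic, strongly right-skewed variable standing in for one parameter.
rng = np.random.default_rng(1)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=100)
result = normality_pvalues(skewed)
print(result)
```

A p-value below 0.05 in either test is read as a departure from normality for that variable.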

Assessing the suitability of MANOVA based on multivariate outliers
Multivariate outliers are usually assessed through a measure known as the Mahalanobis distance. If the maximum calculated value of the Mahalanobis distance is less than the critical value, then the assumption of no multivariate outliers has not been violated. In that case, seasonal variability can be investigated using multivariate analysis of variance (MANOVA); otherwise, another statistical technique must be used to track the presence of temporal variability. Results of the Mahalanobis distance calculated using regression analysis are presented in Figure 8. With df > 10, the critical value of the Mahalanobis distance was taken as 29.59. Comparing the calculated value of 173.1431 with the critical value of 29.590, it was concluded that the assumption of multivariate outliers has not been violated; hence the use of multivariate analysis of variance to study the presence of seasonal variability is justified. To assess the degree of reliability of this claim, regression goodness-of-fit criteria were computed and are presented in Table 5.

Table 5: Computed regression goodness of fit criteria

The regression model is highly significant with a p-value < 0.05. The coefficient of determination of 0.948 and adjusted R-square value of 0.942 were good enough to conclude that the assumption of multivariate outliers has not been violated, which justifies the use of MANOVA in this study. Since the assumption of multivariate outliers was not violated, multivariate analysis of variance was then applied to explain the seasonal variability in the quality of water at different sampling times (seasons). The following step-by-step analysis was employed to study the seasonal variability in water quality as a function of season.

Descriptive Statistics
From the results of Tables 6a and 6b, it was observed that there is a significant difference between the calculated mean and standard deviation of all the dependent variables as a function of sampling time (wet and dry season). For pH, the mean ± standard deviation was 6.679 ± 1.0970 during the dry season and 5.376 ± 0.6318 during the wet season. For nitrate, it was 1.3601 ± 2.810711 during the dry season and 23.677 ± 7.082085 during the wet season. For electrical conductivity (EC), it was 123.07 ± 137.557 during the dry season and 236.58 ± 79.786 during the wet season. For turbidity, it was 16.6645 ± 57.7256 during the dry season and 0.0800 ± 0.37255 during the wet season. For dissolved oxygen (DO), it was 4.593 ± 0.1076 during the dry season and 4.145 ± 0.1329 during the wet season. The differences in the mean and standard deviation suggest the presence of seasonal variation occasioned by the change in sampling time (dry and wet season).

Box Test or Covariance Matrix
In multivariate analysis of variance, we test the null hypothesis that the observed covariance matrices of the dependent variables (water quality parameters) are equal across groups (wet and dry season), that is, that there is no seasonal variation in the water quality parameters. If the calculated p-value is less than 0.05 (p < 0.05), we reject the null hypothesis and conclude that the assumption of equal covariance matrices across groups has not been satisfied, an indication that seasonal variability exists among the groups. The computed covariance matrices for the corrected model and season are presented in Tables 7a and 7b.

Table 7a: Computed covariance matrix for corrected model

From the results of Tables 7a and 7b, it was observed that the computed significance values (p-values) for both the corrected model and season were less than 0.05 (p < 0.05); hence the null hypothesis was rejected and it was concluded that the covariance matrix assumption was not satisfied. This means that the covariance matrices of the dependent variables are not equal across groups, an indication that seasonal variability exists. It was concluded, based on the covariance matrix, that the variation in the dependent variables is due to seasonal variability.

The Multivariate Test
Different statistical methods for computing the F-value for multivariate analysis of variance exist in the literature. One of them is Roy's largest root, which is probably the most acceptable and also the most susceptible to deviation in the covariance matrix. The next is Pillai's Trace, followed by Wilks' Lambda. Pillai's Trace is the least sensitive to violation of the covariance matrix assumption; hence it was selected for this study.
Results of the multivariate test statistics computed to study the effect of seasonal variability are presented in Table 8. From Table 8, it was observed that the computed significance value (p-value) based on Roy's largest root, Wilks' Lambda, Hotelling's Trace and Pillai's Trace was less than 0.05 (p = 0.00); hence, the null hypothesis that the water quality parameters are the same for the two groups (wet and dry season) was rejected and it was concluded that seasonal variability actually exists. To calculate the percentage of variability accounted for by seasonal variation, the partial Eta squared value of Pillai's trace was employed. From Table 8, the calculated partial Eta squared of Pillai's trace was observed to be 1.00, which indicates 100% variability among the dependent variables occasioned by seasonal change.
In addition, when the null hypothesis of equal variance is rejected, the observed power based on Pillai's trace is expected to lie between 0.9 and 1.00. From Table 8, it was observed that the calculated power based on Pillai's trace is 1.00 for both intercept and season. This supports the initial claim that seasonal variability exists between the dependent variables.
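For reference, the partial Eta squared attached to Pillai's trace is commonly computed as shown below, where V is Pillai's trace, p the number of dependent variables and g the number of groups. This convention is an assumption here, since the text does not state the formula that was used.

```latex
\eta_p^{2} = \frac{V}{s}, \qquad s = \min\left(p,\; g - 1\right)
% Two seasons: g = 2, hence s = 1 and \eta_p^{2} = V.
```

Under this convention, with only two seasons the partial Eta squared equals Pillai's trace itself, so a reported value of 1.00 corresponds to a trace that rounds to 1.00.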

Levene's Test of Equality of Error Variance
If seasonal variability exists among the dependent variables, then the calculated error variance of the dependent variables for the wet and dry seasons must not be the same. To test the null hypothesis that the error variance of the dependent variables is equal across groups, Levene's test of equality of error variance was computed and is presented in Table 9. From Table 9, it was observed that the calculated p-values for most of the dependent variables were less than 0.05, an indication that the error variance of the dependent variables is not equal across groups. Since the error variance of the dependent variables varies across groups, it was concluded that seasonal variation exists between the dependent variables. Results of the parameter estimates based on MANOVA are presented in Tables 10a and 10b.

Conclusion
The study was conducted to assess the quality of groundwater around the Niger Delta Basin Development Authority and to evaluate the impact of seasonal variability (wet/dry season) on groundwater quality. The results have shown that a high degree of variability exists in the quality of groundwater collected from different locations within the study area. One of the major factors responsible for this variability is the influence of climate change occasioned by season. The study also demonstrated the potential of multivariate statistics as a tool for climatic variability studies. This study is not exhaustive of the subject matter, but it has added to the existing literature on groundwater variability studies using a statistical approach.