IN BAYESIAN MODEL AVERAGING TO WATER POLLUTION IN IBADAN

A special technique that measures the uncertainty embedded in the model selection process is Bayesian Model Averaging (BMA), which depends on appropriate choices of model and parameter priors. Despite the importance of parameter prior specification in BMA, the existing parameter priors give extremely low Posterior Model Probability (PMP). Therefore, this paper elicits modified g-parameter priors to improve the performance of the PMP and the predictive ability of the model, with an application to the water pollution of Asejire in Ibadan. The modified g-parameter priors, g_j = k_j/n^a (a = 3, 4, 5), were shown to satisfy the consistency conditions and asymptotic properties using the models in the literature. The results show that the PMP with the best prior (g_j = k_j/n^5) had the least standard deviations (0.0411 at n = 100,000 and 0.000 at n = 1,000) for Models 1 and 2 respectively, and the highest posterior means (0.9577 at n = 100,000 and 1.000 at n = 1,000) for Models 1 and 2 respectively. The point and overall predictive performances for the best prior were 2.357 at n = 50 and 2.335 at n = 100,000, compared with the BMA Log Predictive Score threshold of 2.335. Applying this best g-parameter prior in modelling the Asejire river indicates that dissolved solids (mg/l) and total solids (mg/l) are the most important pollutants in the river model, with PIPs of 6.14% and 6.1% respectively.


INTRODUCTION
Over the years, environmental problems have been a major issue in Nigeria, especially in the southern part of the country where oil spilled into water bodies causes water pollution. The people of the area are adversely affected by one environmental issue or the other. Previous research on the environment in Nigeria has relied on the classical approach, so there is prior knowledge about the challenges facing the community. This motivates the application of Bayesian analysis through prior elicitation, so that the prior is combined with the likelihood to give a compromise and an update of knowledge in the form of the posterior, using Bayesian Model Averaging (BMA). BMA is a method that measures the uncertainty embedded in the model selection process and depends on appropriate choices of model and parameter priors. By averaging over many different competing models, BMA incorporates model uncertainty into conclusions about parameters and predictions, and it allows the assessment of the predictive skill of a model. Akanbi (2016) noted that a composite inference that takes account of model uncertainty can be made in a simple and formally justifiable way. BMA has also been proposed for handling applications involving very large numbers of models. In BMA, elicitation of priors takes two forms: model priors and parameter priors. Model priors can be fixed, random, uniform or custom prior inclusion probabilities, while parameter priors, also known as Zellner g-priors, can be fixed, empirical Bayes (local) or hyper-g priors.
The Zellner g-structure in the parameter prior is expected to be as small as possible so that consistency of the true posterior model probability holds (Zellner, 1986). Fernandez, Ley and Steel (2001a) improved on this work with benchmark priors, and Akanbi (2016) gave an extension by eliciting five additional g-parameter priors. This research is therefore undertaken to extend the literature on g-parameter prior elicitation in the BMA approach to the normal linear regression model, based on the increment of prior information with the number of regressors in the model. Hence, the modified parameter prior g_j = k_j/n^a (a = 3, 4 and 5), combined with the uniform model prior, is elicited for this study.
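As an illustration (not the authors' code), the modified prior can be written as a small function; the name `modified_g_prior` is ours:

```python
def modified_g_prior(k_j: int, n: int, a: int = 5) -> float:
    """Modified g-parameter prior g_j = k_j / n**a, for a in {3, 4, 5}.

    k_j: number of regressors in model j; n: sample size.
    """
    if a not in (3, 4, 5):
        raise ValueError("a must be 3, 4 or 5")
    return k_j / n ** a

# g_j shrinks rapidly as n grows, which the consistency argument requires:
print(modified_g_prior(5, 100, a=5))
```

Note that a larger power a makes g_j smaller for the same n, i.e. the prior on the coefficients becomes more diffuse.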

Bayesian Model Averaging Framework
Suppose a linear model structure for n independent random samples from a normal regression,

y = alpha * i_n + X_j * beta_j + epsilon,   epsilon ~ N(0, sigma^2 * I_n),   (1)

where y is the dependent variable, alpha a constant, X_j an n x k_j matrix of regressors, beta_j the vector of coefficients, and epsilon a normal iid error term. If X contains K potential variables, this means estimating 2^K variable combinations, giving the model space M_j, j = 1, 2, ..., M (M = 2^K), where k_j (0 <= k_j <= K) is the number of regressors in model j and K is the total number of regressors. The model weights for this averaging stem from the posterior model probabilities (PMP) that arise from Bayes' theorem,

p(M_j | y) = p(y | M_j) p(M_j) / sum_s p(y | M_s) p(M_s),

where the integrated (marginal) likelihood of model j is

p(y | M_j) = Int p(y | beta_j, sigma, M_j) p(beta_j | sigma, M_j) p(sigma) d beta_j d sigma,

and the parameter prior on the coefficients is the Zellner g-prior

beta_j | sigma, M_j ~ N(0, sigma^2 (g_j X_j' X_j)^-1).
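The mechanics of the averaging can be sketched for a small K, where all 2^K models are enumerable. For simplicity this sketch scores each model with a BIC-style approximation to the log marginal likelihood rather than the exact g-prior form; the data, seed, and function names are illustrative only:

```python
import itertools
import math

import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 4                                  # small K: all 2**K models enumerable
X = rng.standard_normal((n, K))
y = 4 + 2 * X[:, 0] + X[:, 2] + rng.standard_normal(n)   # true model uses x1, x3

def log_marglik(y, X_j):
    """BIC-style approximation to the log marginal likelihood of model j."""
    n, k = X_j.shape
    Z = np.column_stack([np.ones(n), X_j])     # intercept always included
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = float(np.sum((y - Z @ beta) ** 2))
    return -0.5 * n * math.log(rss / n) - 0.5 * (k + 1) * math.log(n)

models = [s for r in range(K + 1) for s in itertools.combinations(range(K), r)]
logml = np.array([log_marglik(y, X[:, list(s)]) for s in models])
pmp = np.exp(logml - logml.max())
pmp /= pmp.sum()                               # uniform model prior: PMP prop. to marg. lik.
best = models[int(np.argmax(pmp))]
print(best)                                    # should contain regressors 0 and 2
```

With K = 15 as in the paper's simulations, 2^15 = 32,768 models make full enumeration costly, which is why an MCMC model sampler is used instead.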

g_j = k_j / (ln n)^3

This prior captures the reduction of information with smaller sample sizes, but with a higher value of the numerator compared with the FLS prior. Its asymptotic convergence mimics the Hannan-Quinn criterion with level C_HQ = 3. Note that the first probability limit is with respect to the true model M_s.
The g-parameter prior takes the functional form

g_j = t1(k_j) / t2(n),

where t1(k_j) is the numerator function, in most cases a constant or the number of regressors in the model; t2(n) is the denominator function, usually the sample size used in the simulation procedure; and t2'(n) is the first-order derivative of t2(n). Given the assumption that M_s generates the data, consistency holds provided, among other conditions, that t2(n) is an increasing function of n. We now examine these conditions with regard to our modified g_j prior; they are satisfied as established below.
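A quick numeric check of these conditions under the modified prior g_j = k_j/n^a (the values of k_j and the sample sizes are purely illustrative):

```python
k_j, a = 5, 5
ns = [50, 100, 1_000, 10_000, 100_000]

t2 = [n ** a for n in ns]          # denominator function t2(n) = n**a
g = [k_j / v for v in t2]          # g_j = t1(k_j) / t2(n) with t1(k_j) = k_j

# t2 is strictly increasing in n, and g_j vanishes as n grows:
assert all(t2[i] < t2[i + 1] for i in range(len(t2) - 1))
assert all(g[i] > g[i + 1] for i in range(len(g) - 1))
print(g[0], g[-1])
```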
With t1(k_j) = k_j and t2(n) = n^a (a = 3, 4, 5), t2(n) is strictly increasing in n and g_j tends to zero as n grows, so the conditions hold. Thus, the seven asymptotic properties of the modified g-parameter prior are derived, including: Case (i), the distribution of the modified parameter prior; Case (iii), the marginal likelihood of model j using the modified g_j; and Case (iv), the Bayes factor for models (j, s) using the modified g_j. From these, the Log Predictive Score (LPS) is obtained.
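The LPS itself is the negative average log of the model-averaged predictive density evaluated at held-out points (lower is better). A minimal sketch with invented densities:

```python
import math

# Hypothetical model-averaged predictive densities p(y_i* | y, X, x_i*)
pred_density = [0.11, 0.09, 0.10, 0.12, 0.08]

# LPS = -(1/m) * sum_i log p(y_i* | y, X, x_i*)
lps = -sum(math.log(p) for p in pred_density) / len(pred_density)
print(round(lps, 3))
```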

Simulation and Analysis
Following the simulation design of Fernandez, Ley and Steel (2001a), the design matrix Z for the regressors is n x K with K = 15 a fixed number of regressors for a sample size n, such that the first ten columns (z(1), ..., z(10)) are drawn from N(0,1) and the subsequent five columns (z(11), ..., z(15)) are constructed from them. Model 1 represents a more or less realistic situation in which one third of the regressors intervene, while Model 2 is an extreme case with no relationship between regressors and response. In this analysis, a uniform prior is used over the model space M, with an MCMC run of 50,000 recorded drawings after a burn-in of 20,000 drawings and sample sizes n = 50; 100; 1,000; 10,000; 100,000, and the model prior p(M_j) = 2^-K.
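A sketch of this design matrix; the construction rule for z(11)-z(15) is not fully legible above, so the linear-combination-plus-noise form (and the mixing weights) used below is an assumption:

```python
import numpy as np

rng = np.random.default_rng(42)
n, K = 1_000, 15
Z = np.empty((n, K))
Z[:, :10] = rng.standard_normal((n, 10))       # z(1)..z(10) iid N(0, 1)

# Assumed construction: last five columns correlated with the first ten
mix = rng.standard_normal((10, 5))             # hypothetical mixing weights
Z[:, 10:] = Z[:, :10] @ mix + rng.standard_normal((n, 5))
print(Z.shape)
```

Building z(11)-z(15) from the first ten columns induces collinearity among candidate regressors, which is exactly what makes the model-selection problem non-trivial.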

Convergence and Implementation
To examine the convergence of the chain, the empirical (MCMC) posterior model probabilities and the exact (Bayes-factor based) ones are compared. Although the results are reported based on Bayes factors, the chain is run long enough for the MCMC-based PMPs to be almost equal to the exact results. An important tool for assessing this convergence is the correlation coefficient between the two components (the exact Bayes factors and the empirical relative frequencies of the models visited).
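This diagnostic is a one-line computation once both quantities are available; the numbers below are invented purely to show the mechanics:

```python
import numpy as np

# Exact (Bayes-factor based) PMPs and hypothetical MCMC visit counts
exact_pmp = np.array([0.50, 0.25, 0.15, 0.07, 0.03])
visits    = np.array([24801, 12650, 7420, 3600, 1529])

empirical = visits / visits.sum()              # empirical relative frequencies
r = np.corrcoef(exact_pmp, empirical)[0, 1]
print(round(float(r), 4))                      # close to 1 indicates convergence
```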

Posterior Model Inference (PMI)
The posterior probability assigned to the model that generated the data is one of the main indicators of performance of the Bayesian methodology. The posterior probability of the true model is expected to be high for the small or moderate values of n that are likely to occur in practice, and it generally converges to 1 for large samples. Ideally, the chain visits only the true model, so the smaller the number of visited models, the better. A further diagnostic is the quartiles of the ratio between the posterior probability of the correct model and the highest posterior probability among the other models; this ratio should be far above unity to confirm the certainty of the true model.
It can be affirmed from the table above that, as the sample size n increases, the posterior probability of the true model converges to 1, and the modified g-parameter prior g_j = k_j/n^5 is best for Model 1, with an estimated value of 0.9577, close to 1. From the recorded means and standard deviations of the number of visited models for Model 1 with sample sizes 50 <= n <= 100,000, the prior g_j = k_j/n^5 gives the best result, with the least value of 46.31 at n = 100,000. The quartiles of the ratio for the true Model 1 posterior probability also establish this prior as best, with a Q3 value of 6.3.
It is likewise indicated from the table above that, as n increases, the posterior probability of the true model converges to 1 for Model 2, and g_j = k_j/n^5 is best, attaining exactly 1 from n = 1,000 up to n = 100,000. The means and standard deviations of the number of visited models for Model 2 with 50 <= n <= 100,000 also establish that g_j = k_j/n^5 gives the best result, with the least value of 10.41 at n = 100,000.
From the quartiles of the ratio between the posterior probability of the correct model and the highest posterior probability of the next-best model for Model 2, all the g-parameter priors (g_j = k_j/n^a, a = 3, 4, 5) ascertain the true Model 2, with high values ranging from Q3 = 8.5 at n = 50 to Q3 = 241 at n = 100,000, far above unity.
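The quartile diagnostic can be sketched as follows; the replication values are invented for illustration:

```python
import numpy as np

# Per replication: posterior probability of the true model, and the highest
# posterior probability among the competing models (hypothetical values).
p_true = np.array([0.91, 0.88, 0.95, 0.97, 0.85, 0.93, 0.90, 0.96])
p_next = np.array([0.05, 0.07, 0.02, 0.01, 0.09, 0.04, 0.06, 0.02])

ratio = p_true / p_next
q1, q2, q3 = np.percentile(ratio, [25, 50, 75])
print(float(q1), float(q2), float(q3))   # all far above unity
```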

Posterior Inclusion Probability (PIP)
This section presents the means and standard deviations of the posterior probabilities of including each of the regressors (1, 5, 7, 11 and 13) that appear in the equation of Model 1 above. As the sample size n increases, the means for these regressors should tend to 1; departures indicate the degree of error when posterior model probability is allocated to the wrong sampling model. For the three priors g_j = k_j/n^a (a = 3, 4, 5) with sample size n = 1,000 for Model 1, all the regressors (1, 5, 7, 11 and 13) have means close to 1, with regressors 5, 7 and 13 attaining exactly 1; at n = 100,000, all five regressors have means equal to 1 for all three priors.
This establishes that the largest sample size yields the best result in this case, and hence the best modified g-parameter prior is g_j = k_j/n^5.
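A PIP is simply the summed posterior probability of the models that contain a given regressor. A minimal sketch with an invented three-model posterior:

```python
import numpy as np

models = [(0, 2), (0,), (0, 1, 2)]      # indices of included regressors per model
pmp    = np.array([0.70, 0.20, 0.10])   # posterior model probabilities (sum to 1)
K = 3

# PIP(k) = sum of PMPs over models that include regressor k
pip = np.array([sum(p for m, p in zip(models, pmp) if k in m) for k in range(K)])
print(pip)   # regressor 0 appears in every model, so its PIP is (numerically) 1
```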

Predictive Inference (PI)
This section deals with predictive inference via the Log Predictive Score (LPS) in terms of point and overall predictions for some samples based on the values of the regressors X*_w; for Model 1, w = 1, ..., 19 different vectors of the K = 15 regressors. The table below depicts the predictions via the LPS for Model 1 over 100 samples (y, X*). It can be established from the table that the minimum values of the mean (over the 100 replications) of the LPS for sampling Model 1 under the modified priors g_j = k_j/n^a (a = 3, 4, 5) are all close to the threshold of 2.335 specified for BMA models, especially when n = 50. In the same vein, the table presents the overall predictive performance via LPS(X*_w, y*, X*) for the 19 different values of X*_w and the 100 replications of (y, X*). All the elicited g-parameter priors show good predictive behaviour for n = 100,000, but the best of all is the modified prior g_j = k_j/n^5, which attains exactly the threshold value of 2.335 specified for BMA models.
Water pollution is the contamination of water bodies, usually as a result of human activities. Water is considered polluted when unwanted materials with the potential to threaten humans and other natural systems find their way into water sources or fresh water reserved in homes or industries. The BMA method is therefore applied to the water pollutants and the pollution level, to account for the uncertainty embedded in both the parameters and the model, using the best modified g-parameter prior g_j = k_j/n^5, with the water pollution level model given below, where the stochastic error term is independently and identically normally distributed. Table 6 presents the means and standard deviations of the posterior inclusion probabilities (PIP) of each of the regressors in the water pollution level model.
It is indicated that dissolved solids (DS), with a PIP of 6.14%, are very important in modelling the water pollution of Asejire River in Ibadan. Figure 6 shows the cumulative model inclusion probabilities based on the best 14 models; it also depicts the inclusion of each regressor, with its sign, in the model selection process. This image plot is based on the Bayes factors of the MC3 (Markov chain Monte Carlo model composition) simulator, with blue indicating a positive sign. It confirms that the selected best model, with a PMP of 97%, includes only the dissolved solids (DS).

CONCLUSION
In this paper, the elicited modified g-priors require only the choice of one scalar hyperparameter from the known g-class. The consistency conditions and asymptotic properties of the modified g-parameter priors were derived. The empirical results on both posterior model and predictive inference indicate that the modified prior g_j = k_j/n^5 was the best of the three modified g-parameter priors considered in the BMA technique. This implies that the higher the power of the sample size n, the more efficient the g-parameter prior. The application of the best g-prior to modelling the Asejire River shows that the effects of dissolved solids (mg/l) and total solids (mg/l) as water pollutants in the Asejire River, Ibadan, Oyo State are very important. Thus, the two water pollutants are recommended in modelling the Asejire River, and the elicited modified parameter prior g_j = k_j/n^5, combined with a uniform model prior, is recommended for model selection or Bayesian model averaging in the Asejire River model whenever an informative prior is not available, for both small and large samples.