Statistical Modelling of Significant Wave Height: Inshore and Offshore Bonny River Estuary, Nigeria

The selection of the best-fit probability distribution is of fundamental importance in design, planning, and operability studies for harbours, coastal and offshore structures, and the exploitation of coastal resources. A wrong choice of probability distribution could lead to under- or over-estimation of design loads, which may have detrimental impacts on safety and project economy. Consequently, inshore and offshore wave data from the Bonny River estuary, Nigeria, were studied by fitting the Rayleigh, 2-parameter Lognormal, Fisher-Tippett Type 1 (FT-1), 2-parameter Weibull and 3-parameter Weibull probability distributions using the probability paper method. The probability distributions were subjected to five model evaluation metrics: coefficient of determination (R²), demeaned RMSE, normalized RMSE, scatter index (SI) and performance score index (d1). A scoring scheme was then adopted, on the basis of which the best-fit probability distribution for each of the inshore and offshore stations was selected. The results show that the 2-parameter Lognormal distribution is the best-fit probability distribution for the inshore station, followed closely by the Fisher-Tippett Type 1 and 3-parameter Weibull distributions. The results also show that Fisher-Tippett Type 1 is the best-fit probability distribution for the offshore station, followed closely by the 3-parameter Weibull distribution. This study will contribute to the development of the coastal zone and provides a recipe for Integrated Coastal Zone Management of the Nigerian Gulf of Guinea.

module. The unit consists of a radio receiver, a phase-lock demodulator, an anti-aliasing filter, an A/D converter and an RS-232 communication port. The various onshore wave logging components are shown in Figure 2 (Nigerian LNG Annual Report, 1990).

Probability Distribution
The probability distributions, cumulative distribution functions, and probability plotting axes of the selected probability models are shown in Table 1.

Wave Evaluation Metrics
Statistical metrics applicable to the evaluation of wave models were used to assess the performance of the probability distributions. The metrics are the coefficient of determination (R²), the demeaned root mean square error (demeaned RMSE), the scatter index (SI), the normalized root mean square error (NRMSE), and the performance score index (d1). For each of the measures given, Hso represents the observed significant wave heights, Hsp represents the predicted significant wave heights, N is the number of observations and H̄so is the mean significant wave height. The computational forms of the above metrics are:

(i) The bias estimate:

$$b = \frac{1}{N}\sum\left(H_{sp} - H_{so}\right) \qquad (1)$$

Vol.11, No.4, 2019

The RMSE is a measure of the residuals between the model predictions and measured observations, where larger values indicate greater variance. The RMSE can range from zero to infinity, and the lower the value, the better the model. The demeaned RMSE is the RMSE estimate corrected for the bias, which makes it equivalent to the standard deviation of the differences. The normalized forms (NRMSE) are RMSE estimates that include components of both variance and bias (Ardhuin et al., 2010).
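As a rough illustration of how these measures might be computed, the sketch below evaluates the bias of Equation 1 together with commonly used forms of the other metrics. Only the bias formula is taken from the text. The normalisations chosen for NRMSE and SI, the squared-correlation form of R², the absolute-error form of d1, and the function name `wave_fit_metrics` are assumptions made for this example and are not necessarily the forms used in this study.

```python
import math

def wave_fit_metrics(h_obs, h_pred):
    """Goodness-of-fit measures for predicted vs observed wave heights.

    Only the bias follows Equation 1 of the text; every other formula is an
    assumed, commonly used form.
    """
    n = len(h_obs)
    diff = [p - o for o, p in zip(h_obs, h_pred)]
    o_bar = sum(h_obs) / n
    p_bar = sum(h_pred) / n

    bias = sum(diff) / n  # Equation 1
    rmse = math.sqrt(sum(d * d for d in diff) / n)
    # Demeaned RMSE: remove the bias first (std. dev. of the differences).
    rmse_dm = math.sqrt(sum((d - bias) ** 2 for d in diff) / n)
    # Assumed normalisations: NRMSE by the observed sum of squares,
    # SI as the demeaned RMSE divided by the mean observation.
    nrmse = math.sqrt(sum(d * d for d in diff) / sum(o * o for o in h_obs))
    si = rmse_dm / o_bar
    # R^2 taken here as the squared linear correlation coefficient.
    sxy = sum((o - o_bar) * (p - p_bar) for o, p in zip(h_obs, h_pred))
    sxx = sum((o - o_bar) ** 2 for o in h_obs)
    syy = sum((p - p_bar) ** 2 for p in h_pred)
    r2 = sxy * sxy / (sxx * syy)
    # d1: index of agreement built on absolute (not squared) errors.
    d1 = 1.0 - sum(abs(d) for d in diff) / sum(
        abs(p - o_bar) + abs(o - o_bar) for o, p in zip(h_obs, h_pred)
    )
    return {"bias": bias, "rmse": rmse, "rmse_dm": rmse_dm,
            "nrmse": nrmse, "si": si, "r2": r2, "d1": d1}

# Hypothetical wave heights in metres, for illustration only.
observed = [0.40, 0.55, 0.70, 0.90, 1.10]
predicted = [0.42, 0.53, 0.74, 0.88, 1.15]
for name, value in wave_fit_metrics(observed, predicted).items():
    print(f"{name}: {value:.4f}")
```

With these definitions the squared RMSE equals the squared bias plus the squared demeaned RMSE, which is one way to see why the demeaned form isolates the scatter from the systematic offset.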
By presenting the RMSE as unbiased, a more complete picture of the error distribution is provided (Chai and Draxler, 2014). The index d1 is based on the absolute values of the errors and is less sensitive to errors concentrated in outliers than its original formulation. R² indicates the proportion of the observed variation that can be explained by the model; the higher the value of R², the more successful the linear regression model is in explaining the variation.

Tables 2 and 3 show the probability distribution calculations for the inshore and offshore significant wave heights for 1989-1990. The class intervals in Table 2 are 50 mm (0.05 m) each. Ni in column 4 is the total number of waves in each class interval per year. In column 5, ΣNi is the number of observations up to and including the present class interval. Column 6 gives the probability P that any wave height H' is equal to or less than a specified wave height H; the corresponding exceedance probability is Q(H' > H) = 1 - P. The values in column 6 form the empirical cumulative distribution function (CDF).

Since the most robust relationship for both interpolation and extrapolation is a straight line, the CDF, that is, the plot of column 6 (probability) against wave height H, is transformed into a linear model:

$$Y = aX + \beta \qquad (7)$$

where Y is the transformed probability axis, also called the reduced variate, and X is the transformed wave height axis. The X and Y axes of the probability distributions used in the study are shown in Table 1. For the normal distribution, each coordinate point is plotted as (H, z = F⁻¹(Pi)), where the inverse function F⁻¹(.) is calculated using the MS Excel built-in function NORM.S.INV(Pi). In this way column 8 (z = F⁻¹(Pi)) is plotted against H (column 3). The coefficients a and β are the slope and intercept of Equation 7; they are related to the mean and standard deviation (for z plotted against H, a = 1/σ and β = -μ/σ) and in turn can be used to calculate the distribution parameters.
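The normal probability paper procedure described above can be sketched in a few lines. The example assumes a plotting position of Pi = ΣNi / (N + 1), because the study's own expression for column 6 is not reproduced here. It uses `statistics.NormalDist().inv_cdf` in the role of Excel's NORM.S.INV, and the helper names and class counts are hypothetical.

```python
from statistics import NormalDist

def least_squares(x, y):
    """Ordinary least-squares slope and intercept of y = a*x + beta."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def empirical_cdf(counts):
    """Turn class counts Ni into non-exceedance probabilities Pi.

    Assumption: Pi = (cumulative count) / (N + 1), which keeps Pi below 1
    so that the inverse normal stays finite. The first class must be
    non-empty, otherwise P1 = 0 and the inverse is undefined.
    """
    total = sum(counts)
    running, probs = 0, []
    for c in counts:
        running += c
        probs.append(running / (total + 1))
    return probs

def fit_normal_paper(heights, probs):
    """Fit z = a*H + beta and recover the mean and standard deviation."""
    z = [NormalDist().inv_cdf(p) for p in probs]  # same role as NORM.S.INV
    a, beta = least_squares(heights, z)
    sigma = 1.0 / a      # slope a = 1/sigma
    mu = -beta / a       # intercept beta = -mu/sigma
    return mu, sigma

# Hypothetical class upper bounds (m) and yearly counts, illustration only.
heights = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40]
counts = [12, 85, 240, 310, 205, 95, 40, 13]
mu, sigma = fit_normal_paper(heights, empirical_cdf(counts))
print(f"mean = {mu:.3f} m, standard deviation = {sigma:.3f} m")
```

For the Lognormal case the same routine could be reused with the natural logarithm of the wave heights on the X axis, in which case the recovered parameters describe ln H rather than H.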
Accordingly, for the Lognormal distribution, z = F⁻¹(Pi) is plotted against ln H. In the case of the Fisher-Tippett Type 1 (FT-1) distribution, the reduced variate Y = -ln(ln(1/Pi)) in column 10 (G) is plotted against H (column 3). For the 2-parameter Weibull distribution, column 9 is plotted against X = ln(-ln Q); the Q values are given in column 7. In the case of the 3-parameter Weibull distribution, ln(H - A) is plotted against ln(-ln(1 - Pi)); the parameter A is chosen by trial until the best straight line is obtained. The above analysis is repeated for Table 3.
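The FT-1 linearisation and the trial-and-error choice of A for the 3-parameter Weibull distribution might be automated as in the sketch below. It assumes the standard FT-1 form P = exp(-exp(-(H - loc)/scale)) and the Weibull form P = 1 - exp(-((H - A)/B)^k). It regresses the reduced variate on the transformed wave height and treats "best straight line" as the highest R² over a grid of trial A values between zero and the smallest wave height. The function names, the grid and the sample data are hypothetical.

```python
import math

def least_squares(x, y):
    """Slope, intercept and R^2 of the least-squares line y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx, sxy * sxy / (sxx * syy)

def ft1_fit(heights, probs):
    """FT-1: reduced variate Y = -ln(-ln P) is linear in H."""
    y = [-math.log(-math.log(p)) for p in probs]
    a, b, r2 = least_squares(heights, y)
    return {"loc": -b / a, "scale": 1.0 / a, "r2": r2}

def weibull3_fit(heights, probs, n_trials=200):
    """3-parameter Weibull: try location values A and keep the straightest line.

    Assumption: A is searched on an even grid in [0, min(H)).
    """
    y = [math.log(-math.log(1.0 - p)) for p in probs]
    h_min = min(heights)
    best = None
    for j in range(n_trials):
        loc = h_min * j / n_trials
        x = [math.log(h - loc) for h in heights]
        a, b, r2 = least_squares(x, y)
        if best is None or r2 > best["r2"]:
            # Slope is the shape k; intercept is -k*ln(B).
            best = {"loc": loc, "shape": a, "scale": math.exp(-b / a), "r2": r2}
    return best

# Hypothetical wave heights (m) and non-exceedance probabilities.
heights = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
probs = [0.10, 0.35, 0.62, 0.81, 0.92, 0.97]
print("FT-1:", ft1_fit(heights, probs))
print("3-parameter Weibull:", weibull3_fit(heights, probs))
```

A finer grid or a one-dimensional optimiser could replace the fixed grid; the grid is used here only because it mirrors the manual trial procedure.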

Results
The goodness-of-fit (GoF) tests were selected based on the wave evaluation criteria most often cited in the literature (e.g. Bryant et al., 2016; Akpinar et al., 2012). They indicate how well a probability distribution fits the observed wave climate. The results of the GoF tests are given in Table 5. The recommended performance rating for each test was used to assess each probability distribution. For example, R² lies between 0 and 1 and describes how much of the observed dispersion is explained by the prediction. A value of zero means no correlation at all, whereas a value of 1.0 means a perfect fit (Krause et al., 2005). The range of the performance score index (d1) is similar to that of R² and lies between zero (poor performance) and one (excellent performance). The scatter index (SI) is a normalized measure of error; lower values of SI indicate better model performance. The RMSE is a measure of the residuals between the model predictions and measured observations, where larger values indicate greater variance. Correcting the RMSE for bias gives the demeaned RMSE, whereas the normalized form (NRMSE) includes components of both variance and bias. Accordingly, a perfect model has a bias, RMSE (including the demeaned and normalized RMSE) and SI of 0.0.

Based on the above performance rating, the 3-parameter Weibull distribution is scored 1 in the offshore category, with an R² value of 0.9937. Similarly, the 3-parameter Weibull distribution is also scored 1 in the inshore category, its R² value of 0.9935 being the highest. The inshore and offshore wave climates were scored separately in order to account for the different wave height attenuation processes. Table 6 shows the scored results of each probability distribution. The total score of each distribution was obtained by summing the individual point scores obtained from all the GoF tests (Izinyon and Ajumuka, 2013).
The best-fit model for each station was selected based on the lowest total score obtained. In the tabulated results, R² is the coefficient of determination, RMSE(Dm) is the demeaned RMSE, NRMSE is the normalized RMSE, S.I. is the scatter index and P.S. Index is the performance score index.
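The scoring scheme can be illustrated with a short sketch: each distribution receives a rank per metric (1 for the best), the ranks are summed, and the lowest total identifies the best fit. The tie rule used here (tied distributions share the better rank) is an assumption, and the function names and metric values are hypothetical and are not the values of Tables 5 and 6.

```python
def score_distributions(metrics, higher_is_better):
    """Sum per-metric ranks (1 = best) for each distribution.

    metrics: {distribution: {metric: value}}
    higher_is_better: {metric: True if larger values are better}
    Assumption: tied values share the better rank.
    """
    totals = {name: 0 for name in metrics}
    for metric, higher in higher_is_better.items():
        for name in metrics:
            value = metrics[name][metric]
            if higher:
                better = sum(1 for m in metrics.values() if m[metric] > value)
            else:
                better = sum(1 for m in metrics.values() if m[metric] < value)
            totals[name] += 1 + better
    return totals

def best_fit(totals):
    """Distribution with the lowest total score."""
    return min(totals, key=totals.get)

# Hypothetical metric values, for illustration only.
metrics = {
    "Lognormal": {"r2": 0.990, "rmse_dm": 0.020, "nrmse": 0.030, "si": 0.040, "d1": 0.97},
    "FT-1":      {"r2": 0.992, "rmse_dm": 0.025, "nrmse": 0.028, "si": 0.045, "d1": 0.96},
    "Rayleigh":  {"r2": 0.950, "rmse_dm": 0.060, "nrmse": 0.080, "si": 0.110, "d1": 0.88},
}
higher_is_better = {"r2": True, "rmse_dm": False, "nrmse": False, "si": False, "d1": True}

totals = score_distributions(metrics, higher_is_better)
print(totals, "->", best_fit(totals))
```

Scoring the inshore and offshore stations separately, as in the text, would simply mean calling the function once per station with that station's metric values.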

Discussion
The results obtained in this study agree with those found under similar geographic settings in other parts of the world. Nigerian LNG (NLNG) found the 3-parameter Weibull distribution to be the best fit for both the inshore and offshore stations. In contrast, this study found the 2-parameter Lognormal to be the best-fit distribution for the inshore station and Fisher-Tippett Type 1 to be the best-fit probability distribution for the offshore station, followed closely by the 3-parameter Weibull distribution. The findings of this study are also in agreement with the report that the Fisher-Tippett Type 1 distribution is a good fit to 3-hourly data from the North Atlantic and North Sea. WMO-No. 702 (1989) also recommended the 3-parameter Weibull distribution as a good fit, though it is not the best fit in this study. Further, the good performance of the 2-parameter Lognormal distribution for the inshore station may be attributed to the absence of seasonal effects in the significant wave heights. It has been observed that the inshore wave climate has a clear periodicity with the 12-hourly tidal cycle. The offshore wave climate did not show significant periodicity, implying that the offshore wave climate is different from the inshore one, probably because the Bonny bar, which separates the two stations, causes wave breaking in front of it (NLNG, 1990).

Conclusion
In this paper, the significant wave heights inshore and offshore of the Bonny River estuary have been modelled statistically. The probability distributions employed are the Rayleigh, 2-parameter Lognormal, Fisher-Tippett Type 1 (FT-1), 2-parameter Weibull and 3-parameter Weibull distributions, and the goodness-of-fit (GoF) tests are R², demeaned RMSE, normalized RMSE, scatter index (SI) and performance score index (d1). The performance of each distribution was scored according to the GoF test rating: the distribution that best satisfies the performance rating is given a score of 1, the next a score of 2, and so on. The total score of each distribution was obtained by summing all the individual point scores, and the distribution with the lowest total score was adjudged the best fit. In this way, for the inshore station, the Lognormal distribution scored 10, followed closely by the FT-1 and 3-parameter Weibull distributions with 11 points each, the 2-parameter Weibull with 18 and the Rayleigh distribution with 19; consequently, the best-fit distribution for the inshore station is the 2-parameter Lognormal distribution. Similarly, FT-1 is the best-fit distribution for the offshore station. The most important outcomes of this study may be summarized as follows:
(i) The best-fit probability distribution for the inshore station is the 2-parameter Lognormal distribution with a total score of 10, followed closely by the FT-1 and 3-parameter Weibull distributions with scores of 11 points each.
(ii) The best-fit probability distribution for the offshore station is the FT-1 distribution with a total score of 7 points, followed closely by the 3-parameter Weibull distribution with a score of 9 points.
(iii) The wave climates inshore and offshore are different owing to the presence of the Bonny bar, which prevents the penetration of offshore waves.
(iv) Following from (iii), the seasonal effects observed offshore were absent in the inshore waves.
(v) The inshore wave climate is mainly due to swell waves.