Flood Frequency Analysis of Niandan River at Baro, Nigeria

The determination of an appropriate probability distribution is crucial for flood-risk reduction and for mitigating flood-induced damage in flood plains. This paper investigates the selection of an appropriate probability distribution for at-site flood frequency analysis using the annual maximum series of Niandan River at Baro. The model results were subjected to five goodness-of-fit tests (probability plots, the Probability Plot Correlation Coefficient (PPCC), Percent bias (PBIAS), Nash-Sutcliffe efficiency (NSE) and the RMSE-observations standard deviation ratio (RSR)) and to performance scoring/ranking to identify the best-fit distribution. The Normal and Weibull distributions were identified as the top two distributions for Niandan River at Baro. An assessment of other flood frequency analysis studies across Nigeria shows that no single distribution has been found appropriate for the whole country. Consequently, the search for a best-fit distribution for Nigeria will continue.

EV2 respectively. This implies that no probability distribution is universally accepted as the standard tool for flood frequency analysis. Consequently, each country or agency needs to establish the best-fit distribution(s) peculiar to its geographic setting and the prevalent stochastic processes. Rahman et al. (2013) studied the selection of probability distributions for at-site flood frequency analysis in Australia and identified the Log-Pearson 3, generalized extreme value and generalized Pareto distributions as the top three best-fit distributions. Bayazit (1995) evaluated various statistical distributions to determine the best-fit probability distribution for 19 stations, mainly in Europe, and found the GEV superior to the others. Vogel and Wilson (1996) studied the probability distribution of annual maximum, mean and minimum streamflows in the United States and found the generalized extreme value (GEV), three-parameter lognormal (LN3) and log-Pearson Type III (LP3) distributions to be good approximations to the distribution of annual maximum series. The COST Action ES0901 FloodFreq (2013) study undertook a pan-European comparison and evaluation of flood frequency estimation methods and found no standardized European approach. It also observed that in a number of countries (e.g., Austria, Germany, Italy and Spain) the generalized extreme value distribution is among the recommended choices, but a variety of 2- or 3-parameter distributions are also used depending on the region (e.g., Gumbel (EV1), generalized Pareto (GPA) and 3-parameter lognormal (LN3)). Abida and Ellouze (2007) analyzed the probability distribution of flood flows in Tunisia and found the generalized extreme value (GEV) and generalized logistic (GLO) distributions superior to the other candidate distributions.
The suitability of a given model is usually evaluated using goodness-of-fit (GoF) tests such as the Probability Plot Correlation Coefficient (PPCC), Root Mean Square Error (RMSE), Relative Root Mean Square Error (RRMSE), Mean Absolute Error (MAE), the Chi-square test (χ²), the Kolmogorov-Smirnov (KS) test, the Anderson-Darling (AD) test, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and L-moment diagrams (Abida and Ellouze, 2007; Rahman et al., 2013; Rao and Hamed, 2000). Once a model has been selected, the parameters that fit the data must be estimated. The most commonly used methods are (i) the method of moments (MOM), (ii) the maximum likelihood method (MLM) and (iii) probability weighted moments (PWM). The MLM is considered the most efficient method because it provides the smallest sampling variance of the estimated parameters, and hence of the estimated quantiles, compared with the other methods. The MOM is a natural and relatively easy parameter estimation method, while the PWM method gives parameter estimates comparable to those of the maximum likelihood method (Rao and Hamed, 2000).
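As an illustration of the method of moments, the sketch below fits the Gumbel (EV1) distribution, whose scale and location follow directly from the sample standard deviation and mean. It is a minimal example under stated assumptions: the paper does not say which estimation method it used for each distribution, and the function names `gumbel_mom` and `gumbel_quantile` are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_mom(sample):
    """Method-of-moments estimates (location u, scale) for Gumbel EV1."""
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)            # sample standard deviation (n - 1)
    scale = math.sqrt(6.0) * s / math.pi    # from variance = pi^2 * scale^2 / 6
    u = mean - EULER_GAMMA * scale          # from mean = u + gamma * scale
    return u, scale


def gumbel_quantile(u, scale, return_period):
    """Flood magnitude with non-exceedance probability F = 1 - 1/T."""
    f = 1.0 - 1.0 / return_period
    return u - scale * math.log(-math.log(f))
```

Typical use would be `u, scale = gumbel_mom(ams)` followed by `gumbel_quantile(u, scale, 100)` for a 100-year flood, where `ams` is a list of annual maxima.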
This study investigates the best-fit probability distribution model of flood flow for River Niger at Baro. The candidate probability distributions are the Normal, Lognormal, Weibull and Gumbel (EV1) distributions, and their appropriateness is investigated with the PPCC method and probability plots. The goodness-of-fit criteria applied to select the best-fit distribution are probability plots, the probability plot correlation coefficient (PPCC), the coefficient of determination (R²), Percent bias (PBIAS), Nash-Sutcliffe efficiency (NSE) and the RMSE-observations standard deviation ratio (RSR). The paper is organized into four sections. Section 1 gives a concise background to flood frequency analysis, covering the fundamental assumptions on data quality, data models, probability distributions, parameter estimation and goodness of fit. Section 2 contains the materials and methods, including the data and site description, consistency checks, and the PPCC and probability plot methods. Section 3 presents and discusses the results, and Section 4 contains the conclusions and recommendations.

Materials and Methods

2.1 Data and Study Area
The Baro hydrological (gauge) station is located at latitude 08°35' and longitude 06°23'. Its drainage area is 729,510 km², the gauge altitude is 57.22 m above mean sea level, and the station is 698 km from the coast. The annual maximum series (AMS) used for the study were obtained from Mahe and Olivry (1991) and Sangare (2001). The record length of the AMS of Niandan River at Baro is between 1984 and 2000 (54 years).

Preliminary Test of Flood Data
The hydrologic data series was subjected to the following tests: (i) the Spearman rank-order correlation test, (ii) the Theil-Sen trend (TSA) test and (iii) the trend-free pre-whitening (TFPW) method.

Spearman's rho (SP) test
The Spearman's rho test, like the Mann-Kendall test, is a non-parametric test. The method is distribution-free and has almost uniform power for linear and non-linear trends (Dahmen and Hall, 1990; Hamed, 2016). The Spearman rank statistic R_sp and the test statistic t_t are:

$$R_{sp} = 1 - \frac{6\sum_{i=1}^{n} (R_i - i)^2}{n(n^2 - 1)}$$

$$t_t = R_{sp}\sqrt{\frac{n-2}{1 - R_{sp}^2}}$$

where n is the total number of data and R_i is the rank (by magnitude) of the ith observation X_i in the chronologically ordered time series. The null hypothesis H0: R_sp = 0 (there is no trend) is tested against the alternative hypothesis H1: R_sp ≠ 0 (a trend exists if R_sp is less than or greater than zero). H0 is accepted if |t_t| < t(n-2, 1-α/2), the critical value from the Student's t table with n - 2 degrees of freedom at the 5% significance level.
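The test can be sketched as follows: the series is ranked by magnitude (ties receive the average rank, an assumption not stated in the text), each rank is compared with its chronological position, and R_sp and t_t are computed. The function names are hypothetical, and the critical value t(n-2, 1-α/2) is deliberately not computed, because the Python standard library has no Student's t quantile; it would be read from a t table as described above.

```python
import math


def average_ranks(values):
    """Rank values 1..n (smallest = 1), averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_trend(series):
    """Return (R_sp, t_t) comparing magnitude ranks with chronological order."""
    n = len(series)
    ranks = average_ranks(series)
    d2 = sum((r - (i + 1)) ** 2 for i, r in enumerate(ranks))
    r_sp = 1.0 - 6.0 * d2 / (n * (n * n - 1))
    if abs(r_sp) >= 1.0:
        # A perfectly monotonic series makes the t statistic unbounded.
        return r_sp, math.copysign(math.inf, r_sp)
    t_t = r_sp * math.sqrt((n - 2) / (1.0 - r_sp ** 2))
    return r_sp, t_t
```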

Serial autocorrelation
The autocorrelation function is a tool for verifying the independence of a time series and for assessing different types of autocorrelation (Chatfield, 2004). Given a sequence of consecutive data points, form the set of overlapping pairs (x_t, x_{t+1}) for t = 1, ..., n - 1. The general approximate equation for the lag-k autocorrelation coefficient is:

$$r_k = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2} \tag{7}$$

where x_t is an observation, x_{t+k} is the observation k steps later (x_{t+1} for the first-order coefficient r_1), x̄ is the mean of the time series and n is the number of data. Based on the first-order autocorrelation coefficient r_1, the null hypothesis H0: r_1 = 0 (no correlation between two consecutive observations) is tested against the alternative hypothesis H1: r_1 ≠ 0. Dahmen and Hall (1990) defined the critical region u at the 5% level of significance as:

$$\frac{-1 - 1.96\sqrt{n-2}}{n-1} \le r_1 \le \frac{-1 + 1.96\sqrt{n-2}}{n-1} \tag{9}$$

If r_1 falls within these limits, H0 is accepted; otherwise the series is considered serially correlated.
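A minimal sketch of the lag-k coefficient and of the two-sided 5% limits for r_1 is given below. It assumes Anderson-type limits (−1 ± 1.96√(n−2))/(n−1) for a two-sided test; the function names are hypothetical.

```python
import math
import statistics


def autocorrelation(x, k=1):
    """Lag-k serial correlation coefficient r_k."""
    n = len(x)
    mean = statistics.fmean(x)
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def anderson_limits(n, z=1.96):
    """Two-sided 5% limits for r_1 under the hypothesis of independence."""
    half_width = z * math.sqrt(n - 2)
    return (-1.0 - half_width) / (n - 1), (-1.0 + half_width) / (n - 1)
```

For n = 54 the upper limit is about 0.25, so a value such as r_1 = 0.320 would lie outside the limits and the series would be treated as serially correlated.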

Theil-Sen Trend (TSA) Test
The Theil-Sen trend line (Helsel, 2005) is a non-parametric alternative to linear regression that can be used in conjunction with the Mann-Kendall test to estimate the magnitude of a detected trend. Sen's method uses a linear model to estimate the slope of the trend, and the variance of the residuals should be constant in time (Da Silva et al., 2015). Each pairwise slope Q_ik is calculated as:

$$Q_{ik} = \frac{Q_j - Q_k}{j - k}, \quad j > k$$

where Q_j and Q_k are the data values at times j and k, respectively. With a sample size of n, there are N = n(n-1)/2 such pairwise estimates Q_ik. The Theil-Sen estimator of the slope is the median of these N values: the N values of Q_ik are ranked in ascending order, from smallest to largest, and Sen's estimator is calculated as:

$$Q_{med} = \begin{cases} Q_{[(N+1)/2]}, & N \text{ odd} \\ \frac{1}{2}\left(Q_{[N/2]} + Q_{[(N+2)/2]}\right), & N \text{ even} \end{cases}$$

The sign of Q_med reflects the direction of the trend, while its magnitude indicates the steepness of the trend: a positive slope indicates an upward (increasing) trend and a negative slope a downward (decreasing) trend.
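The estimator can be sketched in a few lines, assuming equally spaced annual observations indexed by their position in the list; `theil_sen_slope` is a hypothetical name.

```python
import statistics


def theil_sen_slope(series):
    """Median of all pairwise slopes (Q_j - Q_k) / (j - k) for j > k."""
    n = len(series)
    slopes = [
        (series[j] - series[k]) / (j - k)
        for k in range(n - 1)
        for j in range(k + 1, n)
    ]
    # N = n(n-1)/2 pairwise estimates; statistics.median averages the two
    # middle values when N is even, which matches the even-N rule above.
    return statistics.median(slopes)
```

For the series [0, 2, 1, 3] the six pairwise slopes are -1, 0.5, 0.5, 1, 2 and 2, so the estimator is the mean of the two middle values, 0.75: an upward trend.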

Trend-Free Pre-Whitening Mann-Kendall Procedure (TFPW-MK)
The trend-free pre-whitening procedure was applied to remove the AR(1) process as follows (Yue et al., 2003):

1) The slope (β) of the trend in the sample data is estimated using the approach proposed by Theil (1950) and Sen (1968). The trend is assumed to be linear, and the sample data are detrended by:

$$Q_{d,t} = Q_{o,t} - \beta t$$

where Q_d is the detrended time series, Q_o is the original time series, β is the slope estimated by the Theil-Sen approach and t is time.

2) The lag-1 serial correlation coefficient (r_1) of the detrended time series Q_d is computed (Yue et al., 2003), and the AR(1) component is then removed from Q_d, resulting in the pre-whitened time series:

$$Q_{pw,t} = Q_{d,t} - r_1 Q_{d,t-1} \tag{13}$$

where Q_pw is the pre-whitened time series. The application of Equation 13 is known as the trend-free pre-whitening (TFPW) procedure. The residual series obtained after applying the TFPW procedure (Q_pw) should be an independent series.

3) The identified trend T_t (= βt) and the residual series of Equation 13 are blended by:

$$Q_{bd,t} = Q_{pw,t} + T_t = Q_{d,t} - r_1 Q_{d,t-1} + \beta t \tag{14}$$

where Q_bd is the blended series, which preserves the true trend and is no longer influenced by the effect of serial correlation. In Equation 14, the trend (βt) is added back to the pre-whitened series before the MK test is applied to assess the significance of the trend.
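The three steps can be chained as in the sketch below. It assumes a time index t = 1, ..., n, a lag-1 coefficient computed with the usual sample autocorrelation formula (returning 0 for a constant series), and a Mann-Kendall Z statistic without the tie correction in the variance. All function names are hypothetical; this illustrates the procedure and is not the paper's own code.

```python
import math
import statistics


def theil_sen_slope(q):
    slopes = [(q[j] - q[k]) / (j - k)
              for k in range(len(q) - 1) for j in range(k + 1, len(q))]
    return statistics.median(slopes)


def lag1_autocorrelation(x):
    mean = statistics.fmean(x)
    den = sum((v - mean) ** 2 for v in x)
    if den == 0.0:
        return 0.0  # a constant series carries no serial correlation
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(len(x) - 1))
    return num / den


def mann_kendall_z(x):
    """Mann-Kendall Z statistic (no tie correction in the variance)."""
    n = len(x)
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            diff = x[j] - x[k]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def tfpw_mk(q_obs):
    """Return (beta, r1, blended series Q_bd, MK Z of Q_bd)."""
    t = range(1, len(q_obs) + 1)                                   # t = 1..n
    beta = theil_sen_slope(q_obs)                                  # step 1: slope
    q_d = [q - beta * ti for q, ti in zip(q_obs, t)]               # detrended Q_d
    r1 = lag1_autocorrelation(q_d)                                 # step 2: r_1
    q_pw = [q_d[i] - r1 * q_d[i - 1] for i in range(1, len(q_d))]  # pre-whitened, t = 2..n
    q_bd = [pw + beta * (i + 1) for i, pw in enumerate(q_pw, start=1)]  # step 3: add beta*t back
    return beta, r1, q_bd, mann_kendall_z(q_bd)
```

The blended series is one value shorter than the original because pre-whitening needs the previous observation; its Z statistic is then compared with ±1.96 for a two-sided test at the 5% level.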

Probability Density Functions
In flood frequency analysis, a probability distribution is fitted to the random variable (here the annual maximum flood), and the fitted distribution is then used, either graphically or analytically through its estimated parameters, to extrapolate the recorded events and estimate design events. The probability density functions, cumulative distribution functions and the associated unbiased plotting position formulae are shown in Table 1. If \hat{Q}_i is the estimate of the observed Q_i, the probability plot relationship is:

$$\hat{Q}_i = \alpha + \beta\, g(F_i) \tag{15}$$

where F_i is the plotting-position probability of the ith ordered observation and g(F_i) is the corresponding reduced variate of the hypothesized distribution. If the fitted distribution is exactly the parent distribution, the relationship between the observed Q and g(F_i) should be linear; hence, by plotting the observed data Q against g(F_i), the distributions that give a straight-line relationship on the probability plot can be selected. Table 1 contains the evaluated probability distributions, the x-axis and y-axis coordinates of their probability plots and the unbiased plotting position formulae. The slope (β) and intercept (α) in Equation 15 are parameters of the model.
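Because Table 1 is not reproduced in this text, the sketch below uses assumed choices to show how the probability plot coordinates and the straight-line fit of the plot could be computed: Blom positions (i - 0.375)/(n + 0.25) for the Normal and two-parameter Lognormal, Gringorten positions (i - 0.44)/(n + 0.12) for Gumbel EV1 and a two-parameter Weibull, with ln Q on the y-axis for the Lognormal and Weibull. The intercept and slope are estimated by ordinary least squares, and the correlation of the plot is returned as the PPCC. All names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


# Assumed plotting positions (Table 1 is not reproduced here).
def blom(i, n):
    return (i - 0.375) / (n + 0.25)


def gringorten(i, n):
    return (i - 0.44) / (n + 0.12)


# Each entry: (plotting position, reduced variate g(F), transform applied to Q)
DISTRIBUTIONS = {
    "Normal": (blom, STD_NORMAL.inv_cdf, lambda q: q),
    "Lognormal": (blom, STD_NORMAL.inv_cdf, math.log),
    "Gumbel EV1": (gringorten, lambda f: -math.log(-math.log(f)), lambda q: q),
    "Weibull": (gringorten, lambda f: math.log(-math.log(1.0 - f)), math.log),
}


def probability_plot_fit(ams, name):
    """Least-squares line through the probability plot; returns (alpha, beta, r)."""
    position, g, transform = DISTRIBUTIONS[name]
    y = [transform(q) for q in sorted(ams)]  # ascending order statistics
    n = len(y)
    x = [g(position(i, n)) for i in range(1, n + 1)]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    beta = sxy / sxx
    alpha = my - beta * mx
    r = sxy / math.sqrt(sxx * syy)  # linearity of the plot (the PPCC)
    return alpha, beta, r
```

Running `probability_plot_fit(ams, name)` for each of the four names and comparing the returned r values mirrors the idea of choosing the distribution whose plot is closest to a straight line.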

Probability Plot Correlation Coefficient (PPCC)
The PPCC test was developed by Filliben (1975) as a powerful goodness-of-fit test for normality and non-normality. The test measures the linearity of the probability plot under an assumed distribution, provides a quantitative measure for comparing the relative goodness of fit of fitted distributions, and has sufficient power to discriminate between different hypothesized distributions (Vogel, 1986; Stedinger et al., 1993; Aminataee and Montaseri, 2013). The probability plot correlation coefficient is expressed as:

$$r = \frac{\sum_{i=1}^{n} (Q_i - \bar{Q})(Q_{m,i} - \bar{Q}_m)}{\sqrt{\sum_{i=1}^{n} (Q_i - \bar{Q})^2 \sum_{i=1}^{n} (Q_{m,i} - \bar{Q}_m)^2}}$$

where \bar{Q} and \bar{Q}_m are the mean values of the ordered observations Q_i and the fitted quantiles Q_{m,i}, respectively, and n is the sample size. Given the level of significance (α), determine the critical point (r_cp) of Filliben's test, which depends on the sample size n, and compare the probability plot correlation coefficient (r) against it. If r ≥ r_cp, conclude that the hypothesized distribution (normality, in Filliben's original test) is a reasonable model for the underlying population at the α level of significance. If, however, r < r_cp, reject the null hypothesis and conclude that another distributional model would provide a better fit.
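Filliben's critical points are normally read from published tables. As a swapped-in alternative, not the paper's procedure, r_cp for the normal case can be approximated by Monte Carlo simulation: draw many normal samples of size n, compute r for each, and take the α-quantile of the simulated values. The sketch assumes Blom plotting positions, 2000 simulations and a fixed random seed; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def ppcc_normal(sample):
    """Correlation between the ordered sample and normal quantiles at Blom positions."""
    n = len(sample)
    y = sorted(sample)
    std = NormalDist()
    x = [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ppcc_critical_normal(n, alpha=0.05, n_sim=2000, seed=1):
    """Monte Carlo estimate of r_cp: the alpha-quantile of r under normality."""
    rng = random.Random(seed)
    r_values = sorted(
        ppcc_normal([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(n_sim)
    )
    return r_values[int(alpha * n_sim)]
```

The decision rule is then unchanged: accept the hypothesized distribution when the observed r is at least `ppcc_critical_normal(n)`. More simulations give a more stable estimate at the cost of run time.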

Model Evaluation Statistics
The percent bias (PBIAS), the RMSE-observations standard deviation ratio (RSR) and the Nash-Sutcliffe efficiency (NSE) were used to evaluate the fitted quantiles against the observed quantiles (Moriasi et al., 2007). (i) PBIAS measures the average tendency of the simulated data (Q_sim) to be larger or smaller than the observed data (Q_obs); its optimal value is 0. (ii) RSR is the ratio of the RMSE to the standard deviation of the observed data; the lower the RSR, the lower the RMSE and the better the model simulation, so the best probability distribution is the one with the lowest RSR value.
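The three indices can be sketched as below, assuming the standard definitions of Moriasi et al. (2007), with `obs` holding the ordered observed annual maxima and `sim` the corresponding fitted quantiles. The sign convention makes positive PBIAS indicate underestimation. Function names are hypothetical.

```python
import math


def pbias(obs, sim):
    """Percent bias; positive values indicate underestimation."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def rsr(obs, sim):
    """RMSE divided by the standard deviation of the observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return math.sqrt(sse) / math.sqrt(sst)
```

With these definitions RSR = √(1 − NSE), so the two indices always rank candidate distributions in the same order.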

Analysis, Results and Discussion

4.1 Analysis
The main results comprise the descriptive statistics, the autocorrelation analysis, the Spearman rank-order correlation (SROC) test, the Probability Plot Correlation Coefficient (PPCC) test, the probability plots and Sen's slope estimate of the annual maximum series (AMS). The AMS time series is shown in Figure 2, with a discernible negative trend typical of most time series in the Niger River Basin. Figures 3-6 show the probability plots of Niandan River, while Figures 7-10 present the probability plot correlation coefficient tests. The coordinates of the probability plots are (Φ^{-1}(P_i), Y_i); that is, the probability plot is a plot of the sample quantiles Y_i against the theoretical quantiles, where Φ^{-1} is the inverse of the cumulative standard normal distribution and P_i denotes the plotting-position probabilities. In this paper, Φ^{-1}(P_i) and the theoretical quantile (Q_m) in Equation 15 were calculated using the MS Excel built-in functions NORMSINV(P_i) and NORM.INV(P_i, mean, standard deviation), respectively. The results of the descriptive statistical analysis are summarized in Table 4.1 (T-statistic 4.99). In Table 4.1, the coefficient of variation (CV) lies in the range 0.17 ≤ CV ≤ 0.352, indicating that the AMS is moderately to highly variable; the AMS is also positively skewed, which points to a non-normal distribution. The excess kurtosis coefficient E, defined as E = C_K - 3 where C_K is the kurtosis coefficient, is negative, indicating a platykurtic, non-normal frequency distribution. The autocorrelation coefficient of the original data (r_1 = 0.320) falls outside the critical region given by Equation 9, indicating that the data are serially correlated at the 5% significance level. The SROC test statistic (t) is greater than the critical point of the t-distribution, t_{0.005,51}, at the 5% significance level, as shown in Table 4.1. Consequently, there is no statistically significant long-term trend in the AMS.
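The descriptive statistics can be sketched as follows. The sketch assumes the sample standard deviation (n - 1) for the CV and simple moment estimators for the skewness and kurtosis; the paper may have used bias-corrected versions. The helper `norm_inv` mirrors Excel's NORM.INV(p, mean, sd), and with its default arguments the standard-normal inverse used by NORMSINV(p). All names are hypothetical.

```python
import statistics
from statistics import NormalDist


def descriptive_stats(ams):
    """CV, skewness, kurtosis C_K and excess kurtosis E = C_K - 3."""
    n = len(ams)
    mean = statistics.fmean(ams)
    m2 = sum((q - mean) ** 2 for q in ams) / n  # central moments
    m3 = sum((q - mean) ** 3 for q in ams) / n
    m4 = sum((q - mean) ** 4 for q in ams) / n
    return {
        "mean": mean,
        "cv": statistics.stdev(ams) / mean,  # sample standard deviation (n - 1)
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,            # C_K
        "excess": m4 / m2 ** 2 - 3.0,        # E = C_K - 3 (negative: platykurtic)
    }


def norm_inv(p, mean=0.0, sd=1.0):
    """Inverse normal CDF, like Excel NORM.INV(p, mean, sd)."""
    return NormalDist(mean, sd).inv_cdf(p)
```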
Because the autocorrelation coefficient (r_1) falls outside the critical region, the trend-free pre-whitening approach was applied to the original data following the steps in sub-section 3.1.5. The lag-one autoregressive AR(1) component was thus removed, making the data fit for flood frequency analysis.
The probability plot is a graph of the quantiles of the sample data plotted against the quantiles of the standardized theoretical distribution. The distribution that makes the data plot most nearly as a straight line on its probability plot is the one that most closely resembles the distributional shape of the data. The PPCC test supplements the visual assessment of linearity on the plot (Helsel and Hirsch, 2002). Table 4.2 shows the results of the PPCC test, the probability plots and the model evaluation indices. The best-fit distribution was selected based on the following criteria: probability plots, the probability plot correlation coefficient (PPCC), the coefficient of determination (R²), Percent bias (PBIAS), Nash-Sutcliffe efficiency (NSE) and the RMSE-observations standard deviation ratio (RSR) (Moriasi et al., 2007; Raju and Kumar, 2017).
The acceptance criteria for the evaluation statistics are as follows. (i) R² is a measure of the explanatory power of the model; it ranges from 0 to 1.0, and a value near 1.0 indicates good model performance. (ii) The PPCC test statistic (r) is the linear correlation between the empirical and theoretical quantiles. If r > r_crit,0.05, the hypothesized distribution is an acceptable model; if r < r_crit,0.05, the null hypothesis is rejected. (iii) NSE and PBIAS were used to evaluate the model error by quantifying the efficiency of the model. NSE ranges from −∞ to 1.0; if a model reproduces the observed values perfectly, NSE is 1.0. The optimal value of PBIAS is 0.0, with low-magnitude values indicating accurate model simulation; positive values indicate model underestimation bias and negative values indicate model overestimation bias (Gupta et al., 1999). The distributions in Table 4.2 are plausible models of the underlying population at the 5% level of significance. The Lognormal distribution is rated best in terms of PBIAS, followed by the Normal distribution. The Weibull distribution is rated best in terms of the NSE and RSR indices, followed by the Normal distribution. Based on the five model evaluation criteria and the performance scoring in Table 4.3, the best-fit probability distribution model for Niandan River at Baro is the Normal distribution, followed by the Weibull, with the Lognormal third and Gumbel EV1 fourth.
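The scoring scheme behind Table 4.3 is not described in this section. One simple possibility, shown here only as a hypothetical sketch, is rank-sum scoring: rank the distributions on each numerical criterion (higher is better for PPCC, R² and NSE; lower is better for |PBIAS| and RSR), add the ranks, and order the distributions by total. The visual probability-plot criterion is left out, and tied values within a criterion simply keep their input order.

```python
# Hypothetical criteria: True means a larger value is better. |PBIAS| is
# ranked because its optimum is 0 and the sign only shows the bias direction.
CRITERIA = {
    "ppcc": True,
    "r2": True,
    "nse": True,
    "abs_pbias": False,
    "rsr": False,
}


def rank_distributions(metrics):
    """Rank-sum scoring; metrics maps distribution -> {criterion: value}."""
    totals = {name: 0 for name in metrics}
    for criterion, larger_is_better in CRITERIA.items():
        ordered = sorted(
            metrics,
            key=lambda name: metrics[name][criterion],
            reverse=larger_is_better,
        )
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    # Lowest total rank = best overall (sorted is stable for equal totals).
    return sorted(totals, key=totals.get)
```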

Discussions
These findings disagree with Ehiorobo and Akpejiori (2016) and Ibrahim et al. (2016), who found the Lognormal distribution most suitable for flood frequency analysis at Agenebode on the Niger River and in the Hadejia-Jama'are River Basin, respectively, in Nigeria. Ibrahim et al. (2009) found the Pearson Type 3 distribution the best fit for Guruara River at Jere, while Mamman et al. (2017) found Gumbel EV1 the best-fit probability distribution model for predicting inflows into the Kainji Dam reservoir. Similarly, Izinyon and Ajumuka (2013) evaluated flood prediction models for three flow gauging stations in the upper Benue River Basin and found Gumbel EV1, log-Pearson Type 3 and Lognormal to be the best-fit distributions for River Donga at Manya, River Donga at Donga and River Bantaji at Suntai, respectively. In conclusion, no single distribution can be specified as the best fit for the whole of Nigeria. Finally, the results of Table 4.2 indicate that the Normal distribution is the best-fit distribution and Weibull the second-best distribution for Niandan River at Baro.

Conclusion
This paper investigated the selection of an appropriate probability distribution for at-site flood frequency analysis using the AMS of Niandan River at Baro. Five goodness-of-fit tests were employed to identify the best-fit distribution model: probability plots, the probability plot correlation coefficient (PPCC), Percent bias (PBIAS), Nash-Sutcliffe efficiency (NSE) and the RMSE-observations standard deviation ratio (RSR). The Normal and Weibull distributions were identified as the best and second-best distributions, respectively. A review of flood frequency analysis studies across Nigeria shows that no single distribution has been found superior to the others, as is also the case in countries such as Australia and the United States of America. Consequently, the search for a best-fit distribution for Nigeria will continue in the future.

Vol.11, No.4, 2019