In this study, we used an ARIMA model and a log-linear Poisson autoregressive model to estimate and forecast daily confirmed coronavirus cases in Saudi Arabia. In the study of infectious diseases, count time series linked to incidence, such as the daily number of new cases, are common. Such count time series can be modeled and forecast using a variety of methods, including deterministic models such as the SIR and SEIR models, as well as stochastic models such as discrete- and continuous-time Markov chains and stochastic differential equations.
2.1 The log linear Poisson autoregressive model
To model the daily COVID-19 cases in Saudi Arabia, which are count data, we use a Poisson autoregressive model, in which the expected count is a function of both short-term and long-term dependence in the count time series (see [4,5,6,7]). Following [8], the number of new confirmed cases \(y_{t}\) reported at time t (day) is assumed to follow a Poisson distribution, i.e.
$$y_{t} \sim {\text{Poisson}}\left( { \lambda_{t} } \right)$$
with a log-linear autoregressive specification for the intensity \(\lambda_{t}\):
$$\log \left( { \lambda_{t} } \right) = \alpha + \beta \log \left( {1 + y_{t - 1} } \right) + \gamma \log \left( { \lambda_{t - 1} } \right),$$
(1)
where \(\alpha \in R\) is the intercept and \(\lambda_{t}\) is the expected number of new cases at time t. The coefficient \(\beta \in R\) captures the short-term dependence of \(\lambda_{t}\) on the count observed on the previous day (time t − 1). The term \(\log \left( {1 + y_{t - 1} } \right)\) is included rather than \(\log \left( {y_{t - 1} } \right)\) so that zero counts can be handled. The coefficient \(\gamma \in R\) relates to a trend component and represents the long-term dependence of \(\lambda_{t}\) on all past counts of the observed process, which enter through \(\lambda_{t - 1}\). Unlike a linear specification, the log-linear specification allows negative dependence, because \(\lambda_{t}\) remains positive for any real values of the coefficients.
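The recursion in Eq. (1) can be illustrated with a minimal Python sketch. It is not the estimation code used in this study: the function names, the starting value `lam0` for the intensity, and the choice to condition on the first observation are assumptions made only for illustration. The sketch computes the intensities \(\lambda_{t}\) for given parameter values and evaluates the corresponding Poisson log-likelihood, which a numerical optimizer could then maximize over \(\alpha\), \(\beta\) and \(\gamma\).

```python
import math


def log_linear_intensities(y, alpha, beta, gamma, lam0=1.0):
    """Run the recursion of Eq. (1) over the observed counts y.

    Returns [lam_0, ..., lam_T] for T = len(y). lam_0 = lam0 is an assumed
    starting value, and lam_T is the one-step-ahead intensity for the day
    after the last observation.
    """
    lam = [lam0]
    for y_prev in y:
        # log(lam_t) = alpha + beta * log(1 + y_{t-1}) + gamma * log(lam_{t-1})
        log_lam = alpha + beta * math.log(1 + y_prev) + gamma * math.log(lam[-1])
        lam.append(math.exp(log_lam))
    return lam


def poisson_log_likelihood(y, alpha, beta, gamma, lam0=1.0):
    """Poisson log-likelihood of y[1:], conditioning on y[0] and lam0."""
    lam = log_linear_intensities(y, alpha, beta, gamma, lam0)
    return sum(
        y[t] * math.log(lam[t]) - lam[t] - math.lgamma(y[t] + 1)
        for t in range(1, len(y))
    )


# Hypothetical counts and parameter values, for illustration only.
counts = [0, 2, 5, 9, 14]
lam = log_linear_intensities(counts, alpha=0.3, beta=0.6, gamma=0.2)
forecast_next_day = lam[-1]  # expected count for the day after the sample
```

Because the recursion is written on the log scale, every intensity stays positive even when \(\beta\) or \(\gamma\) is negative, which is the property noted above.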
2.2 ARIMA models
According to Box and Jenkins [9], an ARIMA \(\left( {p,d,q} \right) \times (P,D,Q)^{s}\) model can be written as:
$$\varphi \left( {\rm B} \right)\Phi \left( {{\rm B}^{s} } \right)\nabla^{d} \nabla_{s}^{D} {\rm X}_{t} = \theta \left( {\rm B} \right)\Theta \left( {{\rm B}^{s} } \right)e_{t} ,$$
where \(\left( {p,d,q} \right)\) is the nonseasonal part of the model, \(\left( {P,D,Q} \right)\) is the seasonal part, and \(s\) is the season length (see [7]). Here \(p\), \(d\) and \(q\) denote the autoregressive order, the nonseasonal differencing degree and the moving average order, respectively, and \(P\), \(D\) and \(Q\) denote the seasonal autoregressive order, the seasonal differencing degree and the seasonal moving average order. \({\rm B}\) is the backshift operator, \({\rm B}{\rm X}_{t} = {\rm X}_{t - 1}\), the differencing operators are \(\nabla = 1 - {\rm B}\) and \(\nabla_{s} = 1 - {\rm B}^{s}\), and \(e_{t}\) is the white-noise error term.
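The differencing part of the ARIMA equation can be sketched in a few lines of Python. This is only an illustration of how the nonseasonal and seasonal differences act on a series before the autoregressive and moving average terms are fitted. The function names are hypothetical, and the weekly season length of 7 in the example is an assumption for daily data, not a value reported in this study.

```python
def difference(x, lag=1):
    """Apply (1 - B^lag) once: returns x_t - x_{t-lag} for t >= lag."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def arima_difference(x, d=0, D=0, s=1):
    """Apply D seasonal differences of lag s, then d ordinary differences."""
    out = list(x)
    for _ in range(D):
        out = difference(out, lag=s)
    for _ in range(d):
        out = difference(out, lag=1)
    return out


# Hypothetical daily series: linear trend plus a repeating weekly pattern.
weekly_pattern = [0, 3, 5, 4, 2, -1, -4]
series = [2 * t + weekly_pattern[t % 7] for t in range(28)]

# One seasonal difference (assumed s = 7) removes the weekly pattern;
# one further ordinary difference removes the remaining constant level.
stationary = arima_difference(series, d=1, D=1, s=7)
```

Each difference shortens the series, by `s` observations for a seasonal difference and by one for an ordinary difference, which matters when few observations are available.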
2.3 Evaluation criteria
Two widely used accuracy measures, the mean absolute error (MAE) and the root mean square error (RMSE), are used to assess the performance of each model. They are defined as:
\({\text{MAE}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left| {y_{i} - \tilde{y}_{i} } \right|\),
where \(y_{i}\) and \(\tilde{y}_{i}\) are the actual and predicted values, respectively, and \(N\) is the number of observations.
\({\text{RMSE}} = \sqrt {\frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {y_{i} - \tilde{y}_{i} } \right)^{2} }\).
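As a minimal sketch, the two measures can be computed directly from their definitions. The function names and the example values below are hypothetical and are not results from this study.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical observed and predicted daily counts, for illustration only.
observed = [3, 5, 8]
forecast = [2, 5, 12]
errors = (mae(observed, forecast), rmse(observed, forecast))
```

The RMSE squares each error before averaging, so it penalizes large individual errors more heavily than the MAE. Reporting both shows whether a model's error is dominated by a few large misses.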
2.4 The data
The data for this study were provided by the Saudi Ministry of Health (https://covid19.moh.gov.sa). They consist of the daily confirmed COVID-19 cases in Saudi Arabia from March 3, 2020, to June 10, 2021, and form the basis of the analysis in this study.