Applied Sciences
Article
Prediction Interval Adjustment for Load-Forecasting using Machine Learning
Miguel A. Zuniga-Garcia 1,*, G. Santamaría-Bonfil 2,3,*, G. Arroyo-Figueroa 2,* and Rafael Batres 1,*
1 Tecnologico de Monterrey, School of Engineering and Sciences, Av. Eugenio Garza Sada Sur No. 2501, Col. Tecnologico, Monterrey 64849, Mexico
2 Instituto Nacional de Electricidad y Energías Limpias (INEEL), Av. Reforma 113, Col. Palmira, Cuernavaca CP 62490, Morelos, Mexico
3 CONACYT-INEEL, Instituto Nacional de Electricidad y Energías Limpias (INEEL), Av. Reforma 113, Col. Palmira, Cuernavaca CP 62490, Morelos, Mexico
* Correspondence: miguel.zugar@gmail.com (M.A.Z.-G.); guillermo.santamaria@ineel.mx (G.S.-B.); garroyo@ineel.mx (G.A.-F.); rafael.batres@tec.mx (R.B.)
Received: 9 November 2019; Accepted: 30 November 2019; Published: 4 December 2019

 
Featured Application: Prediction interval adjustment designed to be used in the Real-Time
Electricity Market in Mexico.
Abstract:
Electricity load-forecasting is an essential tool for effective power grid operation and
energy markets. However, the lack of accuracy on the estimation of the electricity demand may
cause an excessive or insufficient supply which can produce instabilities in the power grid or cause
load cuts. Hence, probabilistic load-forecasting methods have become more relevant since these
allow an understanding of not only load-point forecasts but also the uncertainty associated with
them. In this paper, we develop a probabilistic load-forecasting method based on Association Rules
and Artificial Neural Networks for Short-Term Load Forecasting (2 h ahead). First, neural networks
are used to estimate point-load forecasts and the variance between these and observations. Then,
using the latter, a simple prediction interval is calculated. Next, association rules are employed to
adjust the prediction intervals by exploiting the confidence and support of the association rules.
The main idea is to increase certainty regarding predictions, thus reducing prediction interval width
in accordance with the rules found. Results show that the presented methodology provides a narrower
prediction interval without sacrificing accuracy. Prediction interval quality and effectiveness are
measured using the Prediction Interval Coverage Probability (PICP) and the Dawid–Sebastiani Score
(DSS). PICP and DSS per horizon show that the Adjusted and Normal prediction intervals are similar.
Also, probabilistic and point-forecast Mean Absolute Error (MAE) and Root Mean Squared Error
(RMSE) metrics are used. Probabilistic MAE indicates that Adjusted prediction intervals fail by less
than 2.5 MW along the horizons, which is not significant if we compare it to the 1.3 MW failure of the
Normal prediction intervals. Also, probabilistic RMSE shows that the probabilistic error tends to be
larger than the MAE along the horizons, but the maximum difference between Adjusted and Normal
probabilistic RMSE is less than 6 MW, which is also not significant.
Keywords:
prediction intervals; probabilistic electricity demand forecasting; association rules;
artificial neural networks; machine learning
1. Introduction
Load-forecasting is an important tool for decision-makers that helps them in the creation of
policies for planning and operation of the power system [1]. Most of these decisions must be taken
based on electric demand forecasts, and the lack of accuracy in the estimations will lead to an inefficient
decision-making process [2].
Specifically, the lack of accuracy may cause overestimation or underestimation of electricity
demand [3]. The former causes an excessive amount of electricity to be purchased and supplied to the
system, which causes power balance disturbances and instabilities in the power grid. The latter, on the
other hand, leads to a risky operation of the power system by restraining the production of electricity,
which may lead to load cuts directly affecting electricity users.
In particular, short-term demand forecasting is highly difficult due to the quick response it requires,
the amount of information involved, and its complexity. These models must take
into consideration not only the electric consumption pattern of the region but also its regulatory
requirements. For instance, the Mexican Electric Market (MEM) establishes that every short-term load
forecast method must be capable of estimating 8 periods of 15 min ahead (2 h) [4].
Intending to improve the accuracy of load demand forecast, researchers have developed forecast
methods for short-term, mid-term and long-term [5]. Mid-term and long-term load demand forecasting
is closely related to planning activities (e.g., power system maintenance tasks and capacity expansion
planning), whereas short-term forecasts are employed for the ongoing operation (e.g., everyday
unit commitment).
The main characteristic of electricity demand is that it is mostly (if not completely) influenced
by human behavior patterns [6]. In this regard, human behavior follows a certain tendency with
cycling patterns. That is, we humans do mostly the same things (e.g., the time to wake, the time to
go to sleep, job schedule, etc.) on a day-to-day basis around the same time (e.g., wake up early in
the morning). For instance, most productive human adults work on a working week basis. Hence,
a weekday electricity consumption pattern is not only different from weekend patterns, but also
different from holiday patterns (which may fall within the working week). This means that every
period of time has different needs in terms of electricity demand forecasting and those needs are also
different for each type of day.
Regarding a more technical aspect, electricity load-forecasting can be performed by using only
historical measurements or forecast predictors (i.e., future period loads). For instance, load forecasting
can be done 1 h ahead in 15 min intervals using only load data from the past hour, or by also using the
predicted 15 min loads within the 1 h ahead horizon. In particular, it has been stated that the incorporation of
recursive forecast predictors leads to better performance in time series prediction [7]. Furthermore, if for
each forecast horizon (i.e., 15 min ahead) a different model is trained, performance deterioration related
to more distant forecasting horizons can be avoided (for instance, by using individual forecasting
models for every 15 min) [7].
Therefore, an effective load-forecasting model must consider such patterns. In this paper,
we investigate a modeling approach based on association rules (association rules are useful to
describe a model in terms of cause and effect). The proposed approach aims at predicting electricity
demand for two hours ahead in 8 periods of 15 min. The dataset consists of 15-min load demand
measurements from a representative load zone of Mexico. The prediction intervals are estimated using
Artificial Neural Network models. Then, the prediction intervals are adjusted through association-rule
mining algorithms.
2. Literature Review
Unlike other Data Mining (DM) algorithms such as artificial neural network (ANN) or Random
Forest (RF), association rules are not so popular regarding time series prediction. One reason is
that these types of algorithms are usually associated with the design of expert systems which have
fallen into disuse. For instance, in a recent review of DM algorithms applied to electricity time series
forecasting [8], of the more than 100 works reviewed only 6 corresponded to algorithms using
rule-based prediction. Nevertheless, regarding time series prediction, in [9] the authors proposed to use an
ensemble of forecasting algorithms combined using fuzzy rule-based forecasting. The purpose
of the latter is to determine the best weights for each forecasting method, such that the dependence
between forecasting methods and time series statistical properties is aligned. The fuzzy rules are
selected using linguistic association mining given the statistical properties of the time series. Using
classical time series point forecasting methods, the proposed ensemble algorithm is tested against
individual methods and the equal-weights ensemble employing the M3 competition time series. They found
that the proposed ensemble performs slightly better than the tested algorithms.
In a more recent work [10], a modified a priori-based association-rule mining algorithm based on
Continuous Target Sequential Pattern Discovery (CTSPD) is proposed, which is then used to generate a
set of association rules that help in predicting the concentration of air pollutants in New Delhi. In this
work, time-dependent features from the air pollutant time series are identified first to form new
variables (i.e., frequent sequences), which are then used (in the form of association rules) to predict the
concentration of air pollutants. Their results showed that the proposed approach performed better
than the India System of Air Quality and Weather Forecasting and Research (SAFAR). Similarly, in [11]
the authors propose an improved a priori algorithm for temporal data mining of frequent item sets in
time series. This improved algorithm is focused on reducing the computational burden of identifying
all frequent item sets, by constraining temporal relations. In this sense, this algorithm determines
time constraint intervals, which are then used to filter (using the time interval algebra) and mine the
corresponding transactions from the database. The method is compared against the classical a priori
algorithm, obtaining a better performance regarding the storage and time required to mine rules.
On the other hand, an approach to the analysis of the electricity demand required by home
appliances is proposed in [12]. In such work, several unsupervised learning algorithms (among them
association rules) are employed in the identification of appliance energy consumption patterns. Using
sequential rules, the authors found that there exists a heavy interdependence between the usage patterns
of home appliances, the time of use, and the user activities. In the same fashion, a more recent work
related to the analysis of smart metering data using a Big Data infrastructure is presented in [13].
By employing unsupervised data clustering and frequent pattern mining analysis on energy time
series, the authors derived accurate relationships between interval-based events and appliance usages.
Then, these patterns are exploited by a Bayesian Network to predict the short and long-term energy
usage of home appliances. This method is then compared with Support Vector Machines (SVM) and
Multi-layer Perceptron (MLP), outperforming both in all tested forecasting horizons.
In general, there are many works focused on the effective estimation of prediction intervals using
neural networks [14–17]. Although the approaches developed in these works are optimal, none of them
considers adjusting the prediction interval using a rule-based analysis. Also,
some studies apply rule-based analysis to create point forecasts; however, the creation or adjustment
of the prediction interval is not considered [18,19].
3. Materials and Methods
In this section, all the concepts of the developed methodology are described. First, there is a data
preprocessing stage in which the raw data is transformed to be used by machine learning algorithms.
Specifically, in this step time series data is transformed into a tabular form in which every element
of the table is a segment of the original time series. Also, every element of the table is paired with
its corresponding time period. Then, point forecast and prediction interval estimation are performed
using artificial neural network (ANN) models. Specifically, the ANN models perform point forecasts in
a test database and every error is stored. The stored errors are used to estimate the prediction intervals.
At the same time, association-rule mining is performed to extract significant rules by means of the
a priori algorithm. Then, the prediction intervals estimated with the Artificial Neural Networks models
are adjusted by means of the obtained rules. Finally, the prediction intervals and their adjusted versions
are evaluated. In Figure 1, the overarching methodology is shown.
Figure 1. The overarching methodology. PICP: Prediction Interval Coverage Probability. DSS: Dawid–Sebastiani Score.
3.1. Data Preprocessing
The data are from a representative location of Mexico. The exact location of the data cannot be
revealed due to confidentiality reasons. The data are composed of 15-min measurements of
load demand. This means that there are 96 periods of 15 min within a day. It also means that, by the
rules of the Mexican wholesale market, any prediction model must be capable of predicting 8 values into the
future at 15 min intervals (2 h ahead). In Figure 2, the graph of the complete data is shown.
Figure 2. Load time series measured every 15 min.
To understand how the values are distributed in the dataset, we use a histogram. In Figure 3, a graph
of the histogram of the complete data is shown.
Figure 3. The distribution of values in the whole dataset represented by means of a histogram.
The complete dataset consists of 81,128 measurements from 1 October 2015, to 24 January 2018.
Table 1 gives a summary of the statistical properties of the data.
Table 1. Statistical properties of the dataset.
Name Value
Minimum 779.2 MW
Median 1418.6 MW
Mean 1501.8 MW
Maximum 2641.0 MW
3.1.1. Load Data Embedding
For preprocessing, the time series is ordered in the form of a delay embedding. The selected number
of periods for the delay embedding is represented by s. For this paper, s is selected by the rule of thumb
described in [20]. The rule of thumb states that for autoregressive models, at least 50, but preferably more
than 100, observations should be taken. Based on this rule, we decided to select 10 times the number of
horizons h needed for this problem (8 horizons). Therefore, every example is formed by a vector
of 80 values, in which the last value is considered to be the dependent variable described by the rest
of the autoregressive values. Thus, for this paper s = 79. The final objective is to transform the original
time series dataset into a table form. To achieve this transformation, the set of delayed time series is
constructed as follows:
Let L be the set of load measurements:

L = {V_1, V_2, . . . , V_n} (1)

where n is the number of observations.

Let D be the set of delayed time series:

D = {d_1, d_2, . . . , d_m} (2)

where m = n − s and d_i is defined as follows:

d_i = {V_j ∈ L | j ∈ {t − s, . . . , t − 2, t − 1, t}} (3)
where t = i + s.
In summary, the constructed dataset D contains m delayed time series d_i (i ∈ {1, 2, . . . , m}),
and every delayed time series d_i contains values of the L dataset in the form
{V_{t−s}, . . . , V_{t−2}, V_{t−1}, V_t} where t = i + s. It is important to note that every d_i is
distinct because t = i + s, which means that even though every d_i is constructed from
{V_{t−s}, . . . , V_{t−2}, V_{t−1}, V_t} values, t is different for every d_i. In Figure 4,
an example of a delayed time series d is shown.
Figure 4. Graphical representation of a delayed time series.
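To make the construction of D concrete, the sketch below builds the delay-embedded table and pairs each row with a 15-min period; function and variable names such as build_delayed_dataset, as well as the assumed alignment of the first observation with period 1, are illustrative and not taken from the original implementation.

```python
import numpy as np

def build_delayed_dataset(load, s=79, periods_per_day=96):
    """Build the delay-embedded table D from the load series L.

    Each delayed series d_i holds s + 1 consecutive values {V_{t-s}, ..., V_t};
    the last value V_t is returned separately as the dependent variable.
    Each row is also paired with the 15-min period p of V_t.
    """
    load = np.asarray(load, dtype=float)
    m = len(load) - s                  # number of delayed time series
    rows, targets, periods = [], [], []
    for i in range(m):
        t = i + s                      # index of the dependent value V_t
        rows.append(load[i:t])         # V_{t-s}, ..., V_{t-1} (independent values)
        targets.append(load[t])        # V_t (dependent value)
        periods.append(t % periods_per_day + 1)   # assumed period alignment
    return np.array(rows), np.array(targets), np.array(periods)

# Example: X, y, p = build_delayed_dataset(load_series)
```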
For every d_i, V_t represents the dependent variable and {V_{t−s}, . . . , V_{t−2}, V_{t−1}} the independent
variables. Also, every d_i is paired with its corresponding period p ∈ {1, 2, . . . , 96}. This pairing allows
applying machine learning algorithms to subsets defined by each period. Specifically, in the case of
Association Rules, it is necessary to apply a discretization method, albeit the format is essentially
the same. In Section 3.3.1, this discretization process is explained. In the next section, the process of
prediction interval estimation through artificial neural networks is explained.
3.2. Prediction Interval Estimation
A prediction interval (PI) is the estimation of a range in which a load value will fall with a certain
probability [21]. PI estimation is an important part of a forecasting process and it is intended to indicate
the expected uncertainty of a point forecast. Also, PIs allow us to offer a set of values in which a
future value will fall with a given probability, thus creating a probabilistic forecast result. The following is
the general form of a 100(1 − α)% confidence prediction interval expression:
V̂_t(p, h) ± z_α √Var[e(p, h)] (4)
where V̂_t(p, h) is the point forecast of the period p in the horizon h, z_α is the z-score of an empirical
distribution given the probability 100(1 − α)%, and e(p, h) is the empirical distribution of errors of the
forecast method in the period p and horizon h. In Equation (4), the z-score is the parameter that allows
us to modify the prediction interval coverage [22]. In Figure 5, an example of how the z-score modifies
the prediction interval coverage is shown.
Figure 5. Prediction interval modification by means of the z-score.
It is worth mentioning that the z-score value depends on the α value. Specifically,
z-score = Z(100(1 − α)), where Z is a function that estimates the z-score value.
Prediction interval estimation using Equation (4) requires estimation of a set of prediction errors
of a forecast model. In this paper, we use Artificial Neural Networks to generate a prediction model
for each period p.
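As a sketch of Equation (4), the snippet below builds a two-sided interval from a set of stored forecast errors; it assumes a Gaussian z-score obtained with scipy.stats.norm, which is one common choice rather than the exact table-lookup procedure described later in the text.

```python
import numpy as np
from scipy.stats import norm

def prediction_interval(point_forecast, errors, alpha=0.05):
    """Return (lower, upper) bounds of a 100*(1 - alpha)% prediction interval.

    `errors` is the empirical error sample e(p, h) collected on a held-out set.
    """
    z = norm.ppf(1 - alpha / 2)              # two-sided z-score for 1 - alpha coverage
    half_width = z * np.sqrt(np.var(errors))
    return point_forecast - half_width, point_forecast + half_width

# Example: lo, hi = prediction_interval(1450.0, stored_errors, alpha=0.05)
```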
3.2.1. Artificial Neural Network Training and Validation
Artificial Neural Networks (ANN) are models inspired by the central nervous system, which are
made of interconnected neurons [23]. One of the most common ANN paradigms for both classification
and regression is the Multi-Layer Perceptron (MLP). An MLP artificial neural network is composed of
multiple layers of neurons: an input layer, one or more hidden layers, and an output layer. The input
layer is responsible for receiving a given input vector and transforming it into an output that becomes
the input for another layer. A hidden layer transforms the output from the previous layer through a
transfer function. Each neuron receives the input from all the neurons in the preceding layer, multiplies
each input by its corresponding weight, and then adds a bias. In this paper, an ANN with 3 hidden layers
and 11 neurons per layer was implemented.
We selected 3 hidden layers employing a rule described in [24] and then updated in [25]. The rule
indicates that for complex problems, such as time series prediction and computer vision, 3 or more
layers are adequate. Also, [24] states three rules to select the number of hidden neurons; for our
problem we selected the rule that establishes the number of hidden neurons as 2/3 of the number of
input neurons. Thus, the number of hidden neurons would be (79/3) × 2 ≈ 52 neurons (17 neurons
per layer). However, [24] warns that too many neurons per layer may lead to overfitting, so we
still tried to reduce the number of hidden neurons and tested architectures from 17 down to 10 neurons per
layer; 11 neurons per layer was the architecture with an error rate similar to 17 neurons per layer
in terms of MAE (Mean Absolute Error).
The method for training the ANN models in this paper is the Resilient Backpropagation method
described in [26]. Regarding the activation function, despite the existence of several types of activation
functions (linear, tanh, Gaussian, etc.), the sigmoidal function is conventionally used in time
series forecasting; hence, the latter was employed [27]. In Figure 6, a graphical representation of this
configuration is shown.
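A minimal sketch of this architecture using scikit-learn is given below; it is an approximation, since MLPRegressor trains with gradient-based solvers such as Adam rather than the Resilient Backpropagation algorithm used by the authors, and the layer sizes simply follow the description above.

```python
from sklearn.neural_network import MLPRegressor

# 79 autoregressive inputs -> three hidden layers of 11 sigmoid (logistic) units -> 1 output
ann_model = MLPRegressor(
    hidden_layer_sizes=(11, 11, 11),
    activation="logistic",   # sigmoidal activation, as described in the text
    solver="adam",           # stand-in for Rprop, which scikit-learn does not provide
    max_iter=2000,
    random_state=0,
)

# X_train: array of shape (n_samples, 79); y_train: array of shape (n_samples,)
# ann_model.fit(X_train, y_train)
```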
To train the ANN models, create the prediction intervals, and test the prediction intervals, the total
dataset D was divided into three groups: D_Train, D_TrainPI and D_Test. D_Train is composed of the first
70% of the data, D_TrainPI of the following 20%, and D_Test of the last 10%. D_Train is divided into 96 subsets
in an 80% train–20% test format. The elements contained in every D_Train subdivision are sorted randomly.
The subdivided D_Train is used to train an ANN model a per subdivision. As a result, 96 ANN
models were obtained and stored in a dataset A (a_p ∈ A | p ∈ {1, 2, . . . , 96}). In Figure 7, a graphical
representation of the Artificial Neural Networks models training is shown.
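A sketch of the chronological 70/20/10 split and the per-period grouping of D_Train is shown below; it assumes numpy arrays produced by the embedding step and omits the additional 80/20 shuffle inside each period subdivision, so names like split_dataset are illustrative only.

```python
import numpy as np

def split_dataset(X, y, periods):
    """Chronological split into D_Train (70%), D_TrainPI (20%) and D_Test (10%),
    plus a per-period view of D_Train for training the 96 specialist models."""
    n = len(y)
    i1, i2 = int(0.7 * n), int(0.9 * n)
    d_train = (X[:i1], y[:i1], periods[:i1])
    d_trainpi = (X[i1:i2], y[i1:i2], periods[i1:i2])
    d_test = (X[i2:], y[i2:], periods[i2:])

    X_tr, y_tr, p_tr = d_train
    train_by_period = {p: (X_tr[p_tr == p], y_tr[p_tr == p]) for p in range(1, 97)}
    return d_train, d_trainpi, d_test, train_by_period
```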
Figure 6.
A graphical representation of the structure of the Artificial Neural Network used for this work.
Figure 7. Graphical representation of the Artificial Neural Networks models training.
In Figure 8, a graphical representation of the training error per ANN model is shown. In the x
axis of the graph, every model is represented with its corresponding time period.
Once the ANN models were obtained, we proceed to extract the prediction errors using D_TrainPI.
The following section explains this process.
Figure 8. Graphical representation of the training error per Artificial Neural Network model.
3.2.2. Prediction Error Extraction
For prediction error extraction, the obtained ANN models are used to predict the V_t values
contained in D_TrainPI. The ANN models take the values of W_i = {V_{t−s}, . . . , V_{t−2}, V_{t−1}} as
inputs and produce V̂_t(p, h) as output. Every measured error is stored in a separate database (D_e).
The obtained ANN models are purposely trained to predict only 1 horizon ahead (this is done to obtain 96
specialist ANN models, one per period). Therefore, to predict the 8 horizons h needed
for our problem, we use the predictions of the previous models of the last horizon as inputs; thus,
D_e contains a set of e(p, h) prediction errors for every period p and each horizon h. For every e(p, h),
the √Var[e(p, h)] value is estimated to obtain the final prediction interval per horizon and per period.
In Figure 9, a series of graphs of the obtained prediction intervals is shown.
Figure 9. Graphs of the prediction intervals obtained.
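The recursive use of the 96 single-horizon models to reach 8 horizons ahead can be sketched as follows; the function name, the dictionary of per-period models, and the period bookkeeping are assumptions made for illustration.

```python
import numpy as np

def recursive_forecast(models, window, start_period, horizons=8, periods_per_day=96):
    """Predict `horizons` steps ahead by feeding each 1-step prediction
    back into the input window of the next period's specialist model."""
    window = list(window)                        # last s observed loads, most recent last
    preds = []
    period = start_period
    for _ in range(horizons):
        period = period % periods_per_day + 1    # next 15-min period
        model = models[period]                   # specialist ANN for that period
        y_hat = model.predict(np.array(window).reshape(1, -1))[0]
        preds.append(y_hat)
        window = window[1:] + [y_hat]            # slide window, append prediction
    return preds

# Errors for D_e: e = actual_future_loads - np.array(recursive_forecast(...))
```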
We call these prediction intervals the normal prediction intervals. Normal prediction intervals are
used in conjunction with the support values of the association rules method to construct the Adjusted
prediction intervals. In the next section, the extraction of association rules from the dataset is explained.
3.3. Association Rules Extraction
Association rules are a data mining methodology to extract relationships and dependencies between
variables in datasets [28]. Their objective is to identify if-then patterns, which are discovered in databases
using some measures of interest [29]. For this paper, the a priori algorithm (see Appendices A and B) is
used to extract the needed rulesets.
3.3.1. Data Discretization
Numeric data is difficult to use in the a priori algorithm, so a discretization method is needed.
In this paper, the discretization of data is made through the method described in [30], specifically,
the type 7 quantile method. The discretization method is carried out as follows:
Let Q_7(P) be the type 7 quantile of the probability P ∈ {0.1, 0.2, . . . , 1}:

Q_7(P) = (1 − γ) · x_(j) + γ · x_(j+1) (5)

where x_(j) is the j-th order statistic of x, n is the sample size, j = ⌊n · P + m⌋, γ = n · P + m − j
and m = 1 − P. In this paper, X = {d_1 ∪ V_t | V_t ∈ d_i; i ∈ {2, 3, . . . , m}}. Using Equation (5) on the set X,
we can obtain the bins for data discretization. In Table 2, the obtained bins and quantiles are shown.
Table 2. Quantiles obtained using Equation (5).
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
779 1079 1178 1273 1353 1418 1508 1631 1810 2088 2641
Bin 1 Bin 2 Bin 3 Bin 4 Bin 5 Bin 6 Bin 7 Bin 8 Bin 9 Bin 10
The last row of Table 2 indicates the bins corresponding to the quantiles above them. Specifically,
above every bin are its lower and upper limits. Every value of the set X is mapped according to its
corresponding bin, so if a value falls inside the range of a bin, that value is substituted with the bin
number and stored in a dataset L′. Therefore, using the dataset L′ we can construct the transactional
dataset D_Trans as follows:
Let L′ be the set of quantile mapped load measurements:

L′ = {V′_1, V′_2, . . . , V′_n′} (6)

where n′ is the number of observations in L′.

Let D_Trans be the set of delayed quantile mapped time series:

D_Trans = {d′_1, d′_2, . . . , d′_m′} (7)

where m′ = n′ − s and d′_i is defined as follows:

d′_i = {V′_j ∈ L′ | j ∈ {t − s, . . . , t − 2, t − 1, t}} (8)

where t = i + s.
For every d′_i, r_i = V′_t represents the right-hand part of the rule, and
l_i = {V′_{t−s}, . . . , V′_{t−2}, V′_{t−1}} represents the left-hand part of the rule. Also, every d′_i is
paired with its corresponding period p ∈ {1, 2, . . . , 96}.
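The discretization of Equation (5) and the mapping to bin numbers could be implemented as in the sketch below; numpy's default "linear" quantile interpolation corresponds to the type 7 definition, while the helper name and bin convention are assumptions for illustration.

```python
import numpy as np

def discretize_loads(values, n_bins=10):
    """Map load values to bin numbers 1..n_bins using type 7 quantile boundaries.

    numpy's default "linear" interpolation matches the type 7 quantile
    definition of Equation (5). The returned bins can be used to build the
    quantile-mapped series L' and the transactions of D_Trans.
    """
    values = np.asarray(values, dtype=float)
    probs = np.linspace(0, 1, n_bins + 1)           # 0%, 10%, ..., 100%
    edges = np.quantile(values, probs)              # type 7 quantile boundaries
    bins = np.digitize(values, edges[1:-1], right=True) + 1
    return np.clip(bins, 1, n_bins), edges

# Example: binned, edges = discretize_loads(load_series)
```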
The rule extraction process is carried out in D_Trans using the a priori algorithm. In this paper,
the parameters for the a priori algorithm are set so as to obtain rules with minimum support = 0.1 and
confidence = 0.9 (in some periods where rules were not found, support < 0.1 was used). The period
element of the transactions dataset is used to segment the data into 96 subsets, one for every period.
Then, the a priori algorithm is applied to each subset to obtain a ruleset rs for every subset. As a result,
96 rulesets were obtained and stored in a dataset R (rs_p ∈ R | p ∈ {1, 2, . . . , 96}). In Figure 10, a graph
of the distribution of the support value per period is shown. Every period distribution is represented
as a box plot in which the box contains 95% of the support values in period p.
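As an illustration of this step, the sketch below mines rules from one period's transactions with mlxtend; the paper does not name a specific library, so this choice, the one-hot encoding step, and the item naming convention are assumptions.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

def mine_rules_for_period(transactions, min_support=0.1, min_confidence=0.9):
    """Mine association rules from one period's transactions.

    `transactions` is a list of item lists, e.g.
    [["lag1=Bin3", ..., "target=Bin4"], ...] (illustrative item names).
    """
    te = TransactionEncoder()
    onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                          columns=te.columns_)
    frequent = apriori(onehot, min_support=min_support, use_colnames=True)
    rules = association_rules(frequent, metric="confidence",
                              min_threshold=min_confidence)
    return rules[["antecedents", "consequents", "support", "confidence"]]
```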
The process of prediction interval adjustment using the rulesets contained in R is explained in the
next section.
Figure 10. The support value distribution per period of the dataset to extract the rules.
3.4. Prediction Intervals Adjusted by Means of Association Rules Support Metric
The prediction interval is adjusted by subtracting the value of the corresponding rule's support from
the 100(1 − α)% value when estimating the prediction interval. This adjustment occurs only when
a specific rule in rs_p matches the inputs of the a_p ANN model. In this case, Equation (4) can be
re-written as follows:

V̂_t(p, h) ± z_δ √Var[e(p, h)] (9)

where δ = α + β.
The parameter β is a bias to adjust the value of the z-score. The modified z-score will decrease the
prediction interval width or leave it unchanged. The parameter β can take values according to the
following expression:

β = { support(l_i ⇒ r_i), if W′_i = l_i
      0,                   otherwise }     (10)
where W′_i is the quantile mapped version of the ANN model inputs W_i. To modify the value of the
z-score, we re-write the confidence interval probability expression as follows:

100 · (1 − (α + β)) (11)

From the modified confidence interval probability shown in Equation (11), we can obtain the
modified prediction interval confidence. With this value, we can look up the corresponding
modified z-score in any z-score table [22]. This, in fact, gives us the corresponding z-score value,
such that we can modify the coverage of the prediction interval.
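A sketch of the adjustment in Equations (9)–(11), again using a Gaussian quantile function as a stand-in for a z-score table, could look like this:

```python
from scipy.stats import norm

def adjusted_z_score(alpha, rule_support=0.0):
    """Return the z-score for the adjusted coverage 100*(1 - (alpha + beta))%.

    beta equals the support of the matching rule, or 0 when no rule in the
    period's ruleset matches the (discretized) ANN inputs.
    """
    delta = alpha + rule_support
    return norm.ppf(1 - delta / 2)   # two-sided z-score for the reduced coverage

# Example: with alpha = 0.05 and a matching rule of support 0.15,
# adjusted_z_score(0.05, 0.15) < adjusted_z_score(0.05), so the interval narrows.
```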
4. Experiments and Results
To measure the efficiency of the prediction intervals (Normal and Adjusted), we propose the use
of the Prediction Interval Coverage Probability (PICP) [31] and the Dawid–Sebastiani Score (DSS) [32].
We also measure the probabilistic and point-forecast MAE and RMSE per horizon h and the PICP per
period p for a better understanding of the prediction interval efficiency. To evaluate the quality of the
Adjusted prediction interval, three experiments were conducted: All days, Weekdays and Weekends
in dataset D_Test. For every experiment, Normal prediction intervals (Normal) and Adjusted prediction
intervals (Adjusted) are evaluated. The process to implement these evaluations consists of three steps:
1. Calculate the Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) of the point forecasts for each horizon.
2. Calculate the Dawid–Sebastiani Score (DSS) and Prediction Interval Coverage Probability (PICP) along with the probabilistic RMSE and MAE per horizon.
3. Estimate the PICP and the probabilistic RMSE and MAE of the Adjusted prediction interval per period.
In the following section, the implementation of the mentioned metrics is described.
4.1. Prediction Intervals Evaluation Metrics
In this section, prediction intervals evaluation metrics are described. These metrics are helpful to
evaluate and understand both Normal and Adjusted prediction intervals.
4.1.1. PICP (Prediction Interval Coverage Probability)
The PICP is the rate of real values that lie within the prediction interval. The PICP is estimated
using the following equation:
PICP = (1/w) · Σ_{g=1}^{w} θ_g (12)

where w is the number of observations and θ_g is defined by the following equation:

θ_g = { 1, if V_t(p, h) < U(p, h) and V_t(p, h) > L(p, h)
        0, otherwise }     (13)

where U(p, h) = V̂_t(p, h) + z_δ √Var[e(p, h)] and L(p, h) = V̂_t(p, h) − z_δ √Var[e(p, h)].
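A direct implementation of Equations (12) and (13) might look like the following sketch:

```python
import numpy as np

def picp(actual, lower, upper):
    """Prediction Interval Coverage Probability: fraction of actual values
    that fall strictly inside their prediction intervals (Equations (12)-(13))."""
    actual, lower, upper = map(np.asarray, (actual, lower, upper))
    inside = (actual > lower) & (actual < upper)
    return inside.mean()

# Example: picp(y_test, lower_bounds, upper_bounds) -> e.g. 0.95
```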
4.1.2. Probabilistic and Point-Forecast RMSE (Root Mean Squared Error) and MAE (Mean Absolute Error)
Probabilistic and point-forecast error is measured as indicated in Figure 11. Point forecast
corresponds to the predicted load value, Outside Prediction Interval stands for those load measures that
fall above or below the prediction interval range, and Inside Prediction Interval corresponds to the actual
load measure that falls inside of the prediction interval range.
Using all the errors, we estimate the probabilistic and point-forecast MAE and RMSE using the
respective sets of errors. Point-forecast MAE and RMSE help us to estimate the precision of the
forecast method and to understand the prediction intervals in general. Probabilistic MAE
and RMSE help us to better understand the PICP metric result.
Figure 11. Probabilistic and point-forecast errors.
4.1.3. DSS (Dawid–Sebastiani Score)
The Dawid–Sebastiani Score (DSS) helps us to understand the quality of the prediction interval.
The DSS is estimated as indicated in the following equation:
DSS = ((e_k − E[e(p, h)]) / σ_{e(p,h)})² + 2 · log(σ_{e(p,h)}) (14)

where e_k and σ_{e(p,h)} are the k-th error and the standard deviation of the error distribution e(p, h),
respectively. Equation (14) is modified to estimate the DSS of the Adjusted prediction intervals based
on the support of the rules. The following equation describes the modified version of the DSS:

DSS = ((e_k − E[e(p, h)]) / (σ_{e(p,h)} · (1 − E[supp(p, h)])))² + 2 · log(σ_{e(p,h)} · (1 − E[supp(p, h)])) (15)

where supp(p, h) is the set of support values used to adjust the prediction intervals in period p and horizon
h. It is worth mentioning that if no interval were modified, then supp(p, h) would be filled with 0's and
Equation (15) would become Equation (14).
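The two scores could be computed as in the sketch below, where supports holds the rule supports actually applied in a given period and horizon (zeros, or None, when no rule matched) and the per-error scores are averaged, which is one reasonable convention:

```python
import numpy as np

def dss(errors, supports=None):
    """Dawid-Sebastiani Score of Equations (14)-(15), averaged over the errors.

    `supports` contains the rule supports used for the Adjusted intervals;
    passing None (or all zeros) reduces Equation (15) to Equation (14).
    """
    errors = np.asarray(errors, dtype=float)
    mean, sigma = errors.mean(), errors.std()
    if supports is not None:
        sigma = sigma * (1.0 - np.mean(supports))   # shrink spread by the mean support
    scores = ((errors - mean) / sigma) ** 2 + 2.0 * np.log(sigma)
    return scores.mean()

# Example: dss(test_errors)                 -> Normal interval score
#          dss(test_errors, rule_supports)  -> Adjusted interval score
```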
4.2. Results and Discussion
To compare the proposed approach, the Autoregressive Integrated Moving Average (ARIMA)
model and a persistence model are also evaluated. The ARIMA model is a classical time series
forecasting method. This method depends on three parameters: p, which stands for the number of
autoregressive variables; q, which refers to the number of moving average variables; and d, which
indicates the number of times the data needs to be differenced so that the time series becomes stationary.
For experimental purposes, we estimated the ARIMA model using the process described in [33].
The persistence model is often used to know if a forecast model provides better results than any trivial
reference model [34]. First, we present the point-forecast MAE and RMSE. Whereas point-forecast
MAE gives us a general idea of the precision of the forecast model, point-forecast RMSE penalizes
large errors, so if the ANN models tend to return large error values the RMSE will be greatly separated
from point-forecast MAE. In Figure 12, the point-forecast MAE and RMSE for the Persistence, ARIMA
and the proposed model are shown.
As we can observe in Figure 12, the persistence and the ARIMA models work better, for point
forecasts, in the first 5 horizons in comparison to the proposed model. However, the proposed model
point-forecast RMSE follows the same tendency as point-forecast MAE, so we can say that the errors
are consistent along the horizons for the three experiments, unlike the ARIMA and the persistence
model, for which the errors become larger along the horizons. Although we can observe that errors are
consistent along the horizons for the proposed model, we can also observe that point-forecast MAE and
RMSE tend to be larger in the Weekend experiment for the three models. This behavior is expected, as
we suppose that human activities on Weekends are less regular than on Weekdays. Then we present
the DSS and the PICP along with the probabilistic MAE and RMSE. These results are presented per
horizon for all three experiments and the three models evaluated. In Figure 13, the result of
the DSS and the PICP per horizon is shown.
Figure 12. Point-forecast MAE and RMSE for (from top to bottom) the persistence, ARIMA and the proposed model.
Figure 13. DSS and PICP corresponding to (from top to bottom) the persistence model, ARIMA, and the proposed model.
As we can observe in Figure 13, the DSS for the ARIMA and the persistence model is larger than
the measurement of the proposed model for both Adjusted and Normal prediction intervals (larger
values of DSS indicate lower quality of prediction intervals). Also, we can observe that the PICP for
the ARIMA and the persistence model is lower than the measurement of the proposed model for both
Adjusted and Normal prediction intervals. For ARIMA and the proposed model, DSS and PICP per
horizon of Adjusted and Normal prediction intervals are really close along the horizons for the three
experiments, which indicates that for those models, the Adjusted and Normal prediction intervals are
quite similar. Probabilistic MAE and RMSE provide another perspective of this result. In Figure 14,
probabilistic MAE and RMSE for the persistence, ARIMA, and the proposed model is shown.
Figure 14. Probabilistic MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error) for (from top to bottom) the persistence, ARIMA (Autoregressive Integrated Moving Average) and the proposed model.
As we can observe in Figure 14, for ARIMA and the persistence model the probabilistic MAE
and RMSE are larger than those of the proposed model, and they also increase along the horizons.
Although the probabilistic MAE and RMSE for ARIMA and the persistence model are quite similar
between Adjusted and Normal prediction intervals, the probabilistic RMSE value is far from the
probabilistic MAE, which indicates that errors are sometimes very large.
For the proposed model, probabilistic MAE indicates that Adjusted prediction intervals fail by less
than 2.5 MW along the horizons, which is not significant if we compare it to the 1.3 MW failure of the
Normal prediction interval. Also, probabilistic RMSE shows that the probabilistic error tends to be larger
than the MAE along the horizons, but the maximum difference between Adjusted and Normal probabilistic
RMSE is less than 6 MW, which is also not significant. This significance is measured by the ancillary
services requirements [35]. Ancillary services requirements are published daily on the Independent
System Operator official site. For the region where this method is applied, the ancillary services
requirement for the first horizon is a constant value of 25 MW. This means that errors below 25 MW do
not affect the power system significantly. Also, it is worth mentioning that probabilistic MAE and
RMSE are smaller in the Weekend experiment. This behavior may happen because point-forecast RMSE and
MAE are larger in the Weekend experiment, so we can expect prediction intervals to be larger on
Weekends. In general, we can observe that the error metrics of the Adjusted prediction intervals are
similar to the Normal prediction intervals. To better understand this similarity, we make use of the PICP
per period. In Figure 15, the PICP per period for all three experiments and the three models is shown.
Figure 15. PICP comparison of Normal vs Adjusted prediction intervals, and probabilistic MAE/RMSE for the Adjusted prediction intervals per period.
As we can observe in Figure 15, for the ARIMA and the persistence models, the Normal and Adjusted
PICP are similar. However, we can also observe that their RMSE and MAE are larger than those measured for
the proposed approach. Also, it is interesting to observe that ARIMA and persistence models have
larger errors in the periods of 08:00–12:00 and 15:45–19:00, this may be caused by the load change
during the day with respect to the sun position. Also, it is interesting to observe that these errors are
lower in the persistence model for the periods 15:45–19:00. For the proposed approach, we can observe
that probabilistic MAE and RMSE are more stable along the periods than those measured for the ARIMA
and the persistence model. Also, we can observe that although the Adjusted PICP drops to less than
75%, probabilistic MAE indicates that the error is always less than 5 MW, which is not significant.
Also, probabilistic RMSE shows that in most of the periods the error is less than 15 MW, which is also
not significant.
5. Conclusions
A prediction interval creation method is presented. The proposed approach for the
creation of the prediction interval allows modifying the prediction interval by means of an association
rules method. Using the proposed approach, the prediction interval can be reduced as much as the
corresponding support value. We construct prediction intervals using Artificial Neural Network
models and we adjust them by means of rules obtained with the a priori algorithm. Prediction interval
quality and effectiveness are measured by means of Prediction Interval Coverage Probability (PICP)
and the Dawid–Sebastiani Score (DSS). PICP and DSS per horizon show that the Adjusted and Normal
prediction intervals are pretty similar. The proposed approach was compared to the ARIMA model
and a persistence model. The proposed model demonstrated better performance in all the
prediction interval evaluation metrics. Also, probabilistic and point-forecast MAE and RMSE metrics
are used. Probabilistic MAE indicates that Adjusted prediction intervals fail by less than 2.5 MW along
the horizons, which is not significant if we compare it to the 1.3 MW failure of the Normal prediction
intervals. Also, probabilistic RMSE shows that the probabilistic error tends to be larger than MAE along
the horizons, but the maximum difference between Adjusted and Normal probabilistic RMSE is less
than 6 MW, which is also not significant. This work was focused on the prediction interval adjustment,
so as future work we will use an optimization method to select the optimal structure of the ANN
models per period, with the objective of increasing the accuracy of the ANN model predictions. For the
association rules method, the discretization method will be modified to obtain more quantiles so the
rules can be more specific. Also, we will relax the Support and Confidence parameters to enlarge the
diversity of rules and, at the same time, we will include the Confidence metric in the prediction interval
adjustment. Finally, this method will be tested on other datasets such as ERCOT or GEFCom (2012,
2014, 2017).
Author Contributions:
The following are the specific contributions per author: (M.A.Z.-G.) Data curation,
Formal analysis, Investigation, Methodology, Software, Writing—original draft. (G.S.-B.) Formal analysis,
Investigation, Methodology, Visualization, Writing—original draft. (G.A.-F.) Methodology, Supervision, Validation,
Writing—review & editing. (R.B.) Methodology, Supervision, Validation, Writing—review & editing, Project
administration, Funding acquisition.
Funding:
This research was funded by the CONACYT SENER Fund for Energy Sustainability grant number
S0019201401.
Acknowledgments:
This research is a result of the Project 266632 “Laboratorio Binacional para la Gestión
Inteligente de la Sustentabilidad Energética y la Formación Tecnológica” [“Bi-National Laboratory on Smart
Sustainable Energy Management and Technology Training”], funded by the CONACYT SENER Fund for Energy
Sustainability (Agreement: S0019201401)
Conflicts of Interest: The authors declare no conflict of interest.
Appendix A. Association Rules
Association rules are a data mining methodology to extract relationships and dependencies between
variables in datasets. The association-rule formal model is described as follows:
Let I be a set of n binary attributes called items.
I = {i_1, i_2, . . . , i_n} (A1)

Let T be a set of transactions called the database.

T = {t_1, t_2, . . . , t_m} (A2)

Let X be a set of items in I called the left-hand side or antecedent.

X = {i_1, i_2, . . . , i_j}, where j < n (A3)

Let Y be an item in I called the right-hand side or consequent.

Y = i_k, where k ≠ j (A4)

Then, an association rule is an implication of the form:

X ⇒ Y (A5)
Appendix A.1. Measures of Interest
There are two basic measures of interest: support and confidence.
Support is an indication of how frequently the rule appears in the database. Support is estimated
by the following expression.

support(X ⇒ Y) = |X ∪ Y| / |D| (A6)

Confidence is an indication of how frequently the rule has been found to be true. Confidence is
estimated by the following expression.

confidence(X ⇒ Y) = support(X ⇒ Y) / support(X) (A7)

There is a third measure of interest called lift. This measure indicates the ratio of independence
between X and Y. In other words, it indicates whether the rule is not a coincidence. Lift is estimated by the
following expression.

lift(X ⇒ Y) = confidence(X ⇒ Y) / support(Y) (A8)
Any algorithm that is designed to extract association rules from a database must use at least one
of these measures of interest to select reliable rules.
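These three measures could be computed directly from a transaction list as in the sketch below (an illustrative helper, not part of the original paper):

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support, confidence and lift of the rule antecedent => consequent.

    `transactions` is a list of sets of items; `antecedent` is a set of items
    and `consequent` a single item, following Equations (A6)-(A8).
    """
    n = len(transactions)
    antecedent = set(antecedent)
    both = sum(1 for t in transactions if antecedent <= t and consequent in t)
    ante = sum(1 for t in transactions if antecedent <= t)
    cons = sum(1 for t in transactions if consequent in t)

    support = both / n
    confidence = both / ante if ante else 0.0
    lift = confidence / (cons / n) if cons else 0.0
    return support, confidence, lift

# Example:
# rule_metrics([{"a", "b", "c"}, {"a", "b"}, {"b", "c"}], {"a"}, "b")
# -> (0.666..., 1.0, 1.0)
```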
Appendix B. The a priori Algorithm
The most used algorithm for obtaining association rules is the a priori algorithm. It selects the rules
based on the minimum support. The minimum support is set by the user of the algorithm.
The pseudocode of the a priori algorithm (Algorithm A1) is shown as follows:
Algorithm A1 a priori algorithm pseudocode.
C_k: set of candidate elements of size k
L_k: set of frequent elements of size k
Begin
  L_1 = {frequent elements of size 1};
  for (k = 1; L_k ≠ ∅; k++)
    C_{k+1} = candidates selected from L_k
    for each transaction t in database D
      increment the count of the candidates in C_{k+1} that are contained in t
    end
    L_{k+1} = candidates in C_{k+1} that meet the minimum support
  end
End
References
1.
O'Connell, N.; Pinson, P.; Madsen, H.; O'Malley, M. Benefits and challenges of electrical demand response:
A critical review. Renew. Sustain. Energy Rev. 2014, 39, 686–699. [CrossRef]
2.
Alfares, H.K.; Nazeeruddin, M. Electric load forecasting: Literature survey and classification of methods.
Int. J. Syst. Sci. 2002, 33, 23–34. [CrossRef]
3.
Fan, S.; Hyndman, R.J. Short-term load forecasting based on a semi-parametric additive model. IEEE Trans.
Power Syst. 2012, 27, 134–141. [CrossRef]
4.
SENER; Secretaría de Energía (MX). Acuerdo por el que se emite el Manual de Mercado de Energía de Corto Plazo;
Published Reform in 2016-06-17 Second Section; Diario Oficial de la Federación (DOF): Ciudad de México,
México, 2016; pp. 10–76.
5.
Raza, M.Q.; Khosravi, A. A review on artificial intelligence based load demand forecasting techniques for
smart grid and buildings. Renew. Sustain. Energy Rev. 2015, 50, 1352–1372. [CrossRef]
6.
Almeshaiei, E.; Soltan, H. A methodology for Electric Power Load Forecasting. Alex. Eng. J.
2011
, 50, 137–144.
[CrossRef]
7.
Lee, D.; Park, Y.G.; Park, J.B.; Roh, J.H. Very short-Term wind power ensemble forecasting without numerical
weather prediction through the predictor design. J. Electr. Eng. Technol. 2017, 12, 2177–2186.
8.
Martínez-Álvarez, F.; Troncoso, A.; Asencio-Cortés, G.; Riquelme, J. A Survey on Data Mining Techniques
Applied to Electricity-Related Time Series Forecasting. Energies 2015, 8, 13162–13193. [CrossRef]
9.
Burda, M.; Štěpnička, M.; Štěpničková, L. Fuzzy Rule-Based Ensemble for Time Series Prediction: Progresses
with Associations Mining. In Strengthening Links Between Data Analysis and Soft Computing; Springer
International Publishing: Cham, Switzerland, 2015; Volume 315, pp. 261–271. [CrossRef]
10.
Yadav, M.; Jain, S.; Seeja, K.R. Prediction of Air Quality Using Time Series Data Mining. In Opinion Mining of
Saubhagya Yojna for Digital India; Springer: Singapore, 2019; Volume 55, pp. 13–20. [CrossRef]
11.
Wang, C.; Zheng, X. Application of improved time series Apriori algorithm by frequent itemsets in
association rule data mining based on temporal constraint. Evol. Intell. 2019. [CrossRef]
12.
Gajowniczek, K.; Zabkowski, T. Data mining techniques for detecting household characteristics based on
smart meter data. Energies 2015, 8, 7407–7427. [CrossRef]
13.
Singh, S.; Yassine, A. Big Data Mining of Energy Time Series for Behavioral Analytics and Energy
Consumption Forecasting. Energies 2018, 11, 452. [CrossRef]
14.
Khosravi, A.; Nahavandi, S.; Creighton, D. Construction of optimal prediction intervals for load forecasting
problems. IEEE Trans. Power Syst. 2010, 25, 1496–1503. [CrossRef]
15.
Quan, H.; Srinivasan, D.; Khosravi, A.; Nahavandi, S.; Creighton, D. Construction of neural network-based
prediction intervals for short-term electrical load forecasting. In Proceedings of the IEEE Symposium on
Computational Intelligence Applications in Smart Grid (CIASG), Singapore, 16–19 April 2013; pp. 66–72.
16.
Rana, M.; Koprinska, I.; Khosravi, A.; Agelidis, V.G. Prediction intervals for electricity load forecasting using
neural networks. In Proceedings of the International Joint Conference on Neural Networks, Dallas, TX, USA,
4–9 August 2013.
17.
Moulin, L.S.; da Silva, A.P.A. Neural Network Based Short-Term Electric Load Forecasting with Confidence
Intervals. IEEE Trans. Power Syst. 2000, 15, 1191–1196.
18.
Liu, H.; Han, Y.H. An electricity load forecasting method based on association rule analysis attribute
reduction in smart grid. Front. Artif. Intell. Appl. 2016, 293, 429–437.
19.
Chiu, C.C.; Kao, L.J.; Cook, D.F. Combining a neural network with a rule-based expert system approach for
short-term power load forecasting in Taiwan. Expert Syst. Appl. 1997, 13, 299–305. [CrossRef]
20.
Box, G.E.P.; Tiao, G.C. Intervention Analysis with Applications to Economic and Environmental Problems.
J. Am. Stat. Assoc. 1975, 70, 70–79. [CrossRef]
21. Chatfield, C. Time-Series Forecasting, 1st ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2000.
22.
Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice, 2nd ed.; OTexts: Melbourne,
Australia, 2018.
23.
Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer Series in Statistics; Springer
New York Inc.: New York, NY, USA, 2001.
24.
Heaton, J. Introduction to Neural Networks for Java, 2nd ed.; Heaton Research, Inc.: Washington, DC, USA, 2008.
25.
Jeff, H. The Number of Hidden Layers. Available online: https://www.heatonresearch.com/2017/06/01/hidden-layers.html (accessed on 21 August 2017).
26.
Riedmiller, M. Rprop-Description and Implementation Details. Available online: http://www.inf.fu-berlin.de/lehre/WS06/Musterererkennung/Paper/rprop.pdf (accessed on 1 September 2017).
27.
Chang, H.; Nakaoka, S.; Ando, H. Effect of shapes of activation functions on predictability in the echo state
network. arXiv 2019, arXiv:1905.09419.
28.
Agrawal, R.; Imieliński, T.; Swami, A. Mining Association Rules Between Sets of Items in Large Databases.
SIGMOD Rec. 1993, 22, 207–216. [CrossRef]
29.
Frawley, W.J.; Piatetsky-Shapiro, G.; Matheus, C.J. Knowledge Discovery in Databases—An Overview.
Knowl. Discov. Databases 1992, 1–30. [CrossRef]
30. Hyndman, R.J.; Fan, Y. Sample Quantiles in Statistical Packages. Am. Stat. 1996, 50, 361–365.
31.
Quan, H.; Srinivasan, D.; Khosravi, A. Uncertainty handling using neural network-based prediction intervals
for electrical load forecasting. Energy 2014, 73, 916–925. [CrossRef]
32.
Czado, C.; Gneiting, T.; Held, L. Predictive Model Assessment for Count Data. Biometrics
2009
, 65, 1254–1261.
[CrossRef] [PubMed]
33.
Hyndman, R.J.; Khandakar, Y. Automatic Time Series Forecasting: The forecast Package for R. J. Stat. Softw.
2008, 27. [CrossRef]
34.
Coimbra, C.F.; Pedro, H.T. Chapter 15—Stochastic-Learning Methods. In Solar Energy Forecasting and Resource
Assessment; Kleissl, J., Ed.; Academic Press: Boston, MA, USA, 2013; pp. 383–406.
35.
CENACE. Servicios Conexos. Available online: https://www.cenace.gob.mx/SIM/VISTA/REPORTES/
ServConexosSisMEM.aspx (accessed on 30 November 2019).
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).