Design of an Approximate Dynamic Programming based neural controller for Smart Home Energy Management

Demand Side Management (DSM) is the control of consumer energy demand through techniques such as financial incentives, and it has become indispensable in the new smart grid infrastructure. In this study, we propose a DSM scheme in the form of a novel smart home energy management system. The goal, defined in terms of cost, is to manage the home energy system under time-varying prices so that the energy demanded from the grid is reduced as much as possible or shifted to off-peak times. The proposed scheme takes advantage of local energy generation, an energy storage unit and schedulable load. Our offline scheme uses an Adaptive Dynamic Programming (ADP) based algorithm to solve the energy management problem and optimally schedule battery and load operations over a given time horizon. For comparison, we also solve the same problem with the Particle Swarm Optimization (PSO) method. Simulation results show that the ADP algorithm achieves lower costs than PSO owing to its better decision-making ability.


Introduction
In the smart grid infrastructure, Demand Side Management (DSM) is used to modify energy demand on the consumer side. DSM serves purposes such as load balancing and cost reduction, which are necessary in a smart home environment [1][2]. A smart home (SH) system includes basic components such as household appliances, an energy storage unit, local energy generation such as a photovoltaic (PV) array, and a smart meter [3]. An SH scenario is illustrated in Figure 1 [2]. This paper focuses on minimizing the costs of a residential consumer by optimally scheduling battery and load operations based on real-time pricing. Optimal battery and load behavior yields maximum use of local renewable energy and, consequently, minimum energy demand from the main grid.
There is considerable research on the DSM scheduling problem at home scale. Most proposed solutions focus on scheduling appliance operations based on the start time and operating duration of the appliances [4][5][6][7]. References [8][9][10] also developed methods for appliance operation scheduling, but they do not consider an optimal energy resource scheduling policy. Several techniques have been explored for the energy resource scheduling problem. Reference [11] uses a dynamic programming approach for battery charge/discharge management. Reference [12] proposes optimal energy scheduling using a genetic algorithm. Reference [13] presents an efficient approach to renewable resource scheduling based on PSO. Moreover, [14][15][16] developed Adaptive Dynamic Programming (ADP) based algorithms for single- and multi-battery activity control in smart home energy management systems (SHEMS). All these works assume that the load is predefined by the user; they do not consider the portion of load that can be scheduled according to the optimal strategy to help reduce costs. Inspired by these works, we design a new offline SHEMS that considers both energy resource scheduling and the schedulable portion of load. This study also includes flexibility in battery activity, which most of the cited works do not support: the battery behavior is continuous, with any amount of charge or discharge within the permitted ranges, and the battery may be charged from the electrical grid or from local generation and discharged at any time to supply the load and reduce costs. To the best of our knowledge, ADP is the most suitable approach for this problem. We design an Action Dependent Heuristic Dynamic Programming (ADHDP) algorithm that handles two types of constraints: constraints on a single time step and constraints over the total time horizon. Because it is offline, the optimization algorithm knows the state of the system over the total scheduling horizon, so better decisions and larger cost reductions are possible than with online paradigms; decisions are nevertheless made step by step. Section 2 presents the home energy system model. Section 3 describes the ADHDP algorithm design. Section 4 presents the PSO algorithm design. Simulation results are given in Section 5. Finally, Section 6 provides conclusions and ideas for future work.
http://www.ispacs.com/journals/cacsa/2017/cacsa-00076/ International Scientific Publications and Consulting Services

Home energy system model
The proposed home energy system consists of the following components: a connection to the power distribution system, a solar panel, a storage unit, schedulable and nonschedulable load, and an energy management system. Grid energy and solar energy can both supply the load and charge the battery. The battery operates in one of three modes: charge, discharge or idle (Table 1). The nonschedulable portion of the load must be fully supplied at each time step, whereas the schedulable portion must be satisfied over the total time horizon in an optimal way (Table 2). The optimal policy is obtained by running the optimization algorithm. The specifications of the battery model are reported in Table 1 and those of the schedulable load in Table 2. The battery parameters are: $\eta$, the battery efficiency; $BL_0$, the initial battery level; $BL_{\max}$ and $BL_{\min}$, the maximum and minimum battery energy levels; and $Ch_{\max}$/$Dch_{\max}$, the maximum charge/discharge rates. As for the schedulable load parameters, the total schedulable load is the cumulative load demand to be satisfied over the total time horizon, and the minimum and maximum schedulable load are the lower and upper bounds on the schedulable load demand at each time step. Figure 2 illustrates the energy management system described above. The energy management problem aims at minimizing the energy demanded from or sold to the grid, expressed as a cost function subject to constraints. The optimization algorithm takes the system state (renewable energy, nonschedulable load, electricity price and battery level) as input and manages the battery action and the schedulable load demand. The cost function is minimized over the scheduling horizon subject to the following constraints on the control variables, for $k = 1, \dots, T$:

$$-Dch_{\max} \le u(k) \le Ch_{\max}, \qquad BL_{\min} \le BL(k) \le BL_{\max}, \qquad 0 \le L_s(k) \le L_s^{\max}, \qquad \sum_{k=1}^{T} L_s(k) = L_s^{tot}$$

where $u(k)$, the battery charge/discharge amount (positive when charging, negative when discharging), and $L_s(k)$, the schedulable load demand, are the system controls, while $L_n(k)$, the nonschedulable load demand, $R(k)$, the available renewable energy, and $C(k)$, the unitary electricity price, are the system states. $T$ is the scheduling time horizon, $L_s^{\max}$ is the maximum schedulable load and $L_s^{tot}$ is the total schedulable load; the minimum schedulable load is taken to be 0.
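The constraints above can be checked mechanically for any candidate schedule. The Python sketch below tests a schedule against the per-step and horizon constraints and computes the energy exchanged with the grid using the same combination as the utility function (3.10), $L_n + L_s - R + u$. The parameter values, the helper names and the lossless battery update $BL(k+1) = BL(k) + u(k)$ (which ignores the efficiency $\eta$) are assumptions for illustration, not the values of Table 1.

```python
from dataclasses import dataclass


@dataclass
class BatteryParams:
    """Hypothetical battery parameters (not the values of Table 1)."""
    bl0: float = 5.0      # initial battery level BL_0
    bl_min: float = 1.0   # minimum battery level BL_min
    bl_max: float = 10.0  # maximum battery level BL_max
    ch_max: float = 2.0   # maximum charge per time step Ch_max
    dch_max: float = 2.0  # maximum discharge per time step Dch_max


def grid_energy(ln, ls, r, u):
    """Energy exchanged with the grid at one step: positive = bought, negative = sold.

    Same combination as in the utility function (3.10): L_n + L_s - R + u.
    """
    return ln + ls - r + u


def is_feasible(u, ls, bat, ls_max, ls_total, tol=1e-9):
    """Check per-step and horizon constraints for a candidate schedule.

    Assumes lossless dynamics BL(k+1) = BL(k) + u(k), i.e. the efficiency is ignored.
    """
    if len(u) != len(ls):
        raise ValueError("u and ls must cover the same horizon")
    bl = bat.bl0
    for uk, lsk in zip(u, ls):
        if not -bat.dch_max - tol <= uk <= bat.ch_max + tol:  # charge/discharge rate
            return False
        if not -tol <= lsk <= ls_max + tol:                    # per-step schedulable load
            return False
        bl += uk
        if not bat.bl_min - tol <= bl <= bat.bl_max + tol:    # battery level bounds
            return False
    return abs(sum(ls) - ls_total) <= tol                      # total schedulable load
```

For example, with the default parameters, `is_feasible([1.0, -1.0, 0.0], [1.0, 0.5, 0.5], BatteryParams(), ls_max=1.0, ls_total=2.0)` holds, while charging 3.0 in one step violates the rate limit.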

ADHDP Algorithm
Adaptive Critic Designs (ACDs), developed by Werbos, perform optimization based on approximate dynamic programming using neural networks [17][18]. The ACD family includes Heuristic Dynamic Programming (HDP), Dual Heuristic Programming (DHP) and Globalized Dual Heuristic Programming (GDHP) [19]. Since the load and renewable data are stochastic, a model-free action-dependent HDP (ADHDP) is chosen for the design of the SHEMS illustrated in Figure 2. ADHDP produces a control series that minimizes the proposed cost function using two neural networks. The critic network approximates the dynamic programming cost function, i.e., the solution of the Hamilton-Jacobi-Bellman equation, and returns a feedback signal to the action network, which adapts its weights using this signal. The action network thus takes the current state as input and provides a control to the critic so as to minimize the cost function. After an appropriate number of iterations, an optimal policy is obtained. The action network input and output are the system state and the system control, respectively (Figure 3). The critic network input includes the current system state and control, the latter coming from the action network output. Adding previous states and controls to the critic inputs reduces the cost but increases the computational complexity of the optimization process. Following [20], we adopt a reasonable trade-off: the critic receives the state and control at the current and the two previous time steps.
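The critic input layout can be sketched as follows. Each time step contributes the four state variables (nonschedulable load, renewable energy, price, battery level) and the two controls (battery action, schedulable load), so a window of the current and two previous steps gives 3 × 6 = 18 inputs, matching the 18 input neurons of the critic described in the next subsection. The function name, the ordering of the variables and the zero-padding before the first time step are assumptions.

```python
import numpy as np

N_STATE = 4    # L_n(k), R(k), C(k), BL(k)
N_CONTROL = 2  # u(k), L_s(k)
WINDOW = 3     # steps k, k-1, k-2


def critic_input(states, controls, k):
    """Concatenate state and control at steps k, k-1 and k-2 into one vector.

    states: array of shape (T, N_STATE); controls: array of shape (T, N_CONTROL).
    Steps before the start of the horizon are zero-padded (an assumption).
    """
    parts = []
    for lag in range(WINDOW):
        t = k - lag
        if t >= 0:
            parts.append(np.asarray(states[t], dtype=float))
            parts.append(np.asarray(controls[t], dtype=float))
        else:
            parts.append(np.zeros(N_STATE))
            parts.append(np.zeros(N_CONTROL))
    return np.concatenate(parts)


x = critic_input(np.ones((48, N_STATE)), np.ones((48, N_CONTROL)), k=0)
print(x.shape)  # (18,)
```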

Critic neural network
The critic network inputs are the system state and control (action network output) at time steps $k$, $k-1$ and $k-2$. The network contains 18 linear input neurons, 150 sigmoidal hidden neurons and 1 linear output neuron, as shown in Figure 4. The Multilayer Perceptron (MLP) is one of the most efficient neural network structures for approximating nonlinear functions in the ADP field [19], so we adopt an MLP in our optimization method as well. The standard backpropagation (BP) algorithm is used for network training. The critic network computes

$$q_i(k) = \sum_{j=1}^{N_{ci}} w^{(1)}_{c_{ij}}(k)\, x_j(k), \qquad p_i(k) = \sigma\big(q_i(k)\big), \qquad i = 1, \dots, N_{ch}, \qquad J(k) = \sum_{i=1}^{N_{ch}} w^{(2)}_{c_i}(k)\, p_i(k)$$

where $x_j(k)$ are the critic inputs, $\sigma(\cdot)$ is the sigmoidal activation, $N_{ch}$ is the number of critic hidden neurons, $N_{ci}$ the number of critic inputs and $J(k)$ the critic output. The critic output estimates the cost function of dynamic programming:

$$J(k) = \sum_{i=1}^{T-k} \gamma^{\,i-1}\, U(k+i) \qquad (3.9)$$

where $\gamma$, the discount factor, must lie in the range [0,1]; here it is set to 0.8. The utility function $U(k)$ is proposed as

$$U(k) = \Big[\big(L_n(k) + L_s(k) - R(k) + u(k)\big)\, C(k)\Big]^2 \qquad (3.10)$$

The critic output thus estimates the discounted total cost-to-go; in other words, it approximates $J(k)$ at time $k$ as in [21]:

$$J(k) \cong U(k+1) + \gamma\, U(k+2) + \cdots \qquad (3.11)$$

The prediction error of the critic network is defined as

$$e_c(k) = \gamma J(k) - \big[J(k-1) - U(k)\big] \qquad (3.12)$$

and the critic network objective function to be minimized is

$$E_c(k) = \tfrac{1}{2}\, e_c^2(k)$$

The critic weights are updated by a gradient-based rule:

$$w_c(k+1) = w_c(k) + \Delta w_c(k), \qquad \Delta w_c = \big\{\Delta w_c^{(1)}, \Delta w_c^{(2)}\big\}, \qquad \Delta w_c(k) = l_c(k)\left[-\frac{\partial E_c(k)}{\partial w_c(k)}\right]$$

where $l_c$ is the learning rate and $w_c$ contains the critic weights.
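The critic computation and one training step can be sketched in Python with NumPy. The sketch assumes a logistic sigmoid for the hidden layer (the text only says "sigmoidal"), uniform initialization in [-1, 1] and a fixed learning rate; the class and function names are hypothetical. `discounted_cost_to_go` evaluates the cost-to-go for a known utility sequence, which makes it easy to check that the prediction error (3.12) vanishes when the critic output is exact.

```python
import numpy as np


def discounted_cost_to_go(U, k, gamma=0.8):
    """J(k) = sum_{i>=1} gamma**(i-1) * U(k+i) over the remaining horizon, (3.9)/(3.11)."""
    return sum(gamma ** (i - 1) * U[k + i] for i in range(1, len(U) - k))


class Critic:
    """MLP critic: linear inputs, sigmoid hidden layer, one linear output J(k)."""

    def __init__(self, n_in=18, n_hidden=150, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Weights drawn uniformly from [-1, 1], as in step 1 of the optimization process.
        self.w1 = rng.uniform(-1.0, 1.0, size=(n_hidden, n_in))  # input -> hidden
        self.w2 = rng.uniform(-1.0, 1.0, size=n_hidden)          # hidden -> output
        self.lr = lr

    def forward(self, x):
        q = self.w1 @ x               # hidden-layer inputs q_i(k)
        p = 1.0 / (1.0 + np.exp(-q))  # logistic sigmoid (assumed activation)
        return float(self.w2 @ p), p  # J(k) and hidden outputs p_i(k)

    def train_step(self, x, j_prev, utility_k, gamma=0.8):
        """One gradient step on E_c = e_c**2 / 2; returns e_c before the update."""
        j_k, p = self.forward(x)
        e_c = gamma * j_k - (j_prev - utility_k)  # prediction error (3.12)
        g = e_c * gamma                           # dE_c / dJ(k)
        grad_w2 = g * p
        grad_w1 = np.outer(g * self.w2 * p * (1.0 - p), x)  # uses w2 before its update
        self.w2 -= self.lr * grad_w2
        self.w1 -= self.lr * grad_w1
        return e_c
```

With `U = [1.0, 2.0, 3.0, 4.0]`, the exact cost-to-go gives $J(0) = 2 + 0.8 \cdot 3 + 0.64 \cdot 4 = 6.96$ and $J(1) = 6.2$, so $\gamma J(1) - [J(0) - U(1)] = 4.96 - 4.96 = 0$, as (3.12) requires at convergence.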
Figure 5: Action neural network

Action neural network
The action network consists of 4 linear input neurons, 150 sigmoidal hidden neurons and 2 sigmoidal output neurons, and it is trained with the BP algorithm. The input to the action network is the current system state, composed of $L_n(k)$, $R(k)$, $C(k)$ and the battery level $BL(k)$. The output of the action network is the control decision, composed of $u(k)$ and $L_s(k)$.
The action network structure is shown in Figure 5. The input to the $i$-th hidden neuron is

$$h_i(k) = \sum_{j=1}^{N_{ai}} w^{(1)}_{a_{ij}}(k)\, x_j(k), \qquad i = 1, \dots, N_{ah} \qquad (3.22)$$

where $x_j(k)$ are the action inputs, $N_{ah}$ is the number of action hidden neurons and $N_{ai}$ the number of action inputs. The control decision $u(k)$ at time $k$ is used to compute the battery level (BL) at the next time step. The action outputs are defined so that the battery and load constraints are guaranteed to be met.
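One possible way to make the action outputs respect the constraints is sketched below; the paper does not spell out its own mapping here, so this is a hypothetical construction. The two sigmoid outputs $y_1, y_2 \in (0,1)$ are rescaled into the currently admissible ranges: the battery action into the interval allowed by both the rate limits and the level limits (assuming lossless dynamics $BL(k+1) = BL(k) + u(k)$), and the schedulable load into $[0, \min(L_s^{\max}, \text{remaining load})]$.

```python
def map_action_output(y1, y2, bl, scheduled_so_far, *, bl_min, bl_max,
                      ch_max, dch_max, ls_max, ls_total):
    """Map sigmoid outputs y1, y2 in (0, 1) to controls (u, L_s) that respect
    the per-step constraints. Hypothetical mapping; assumes BL(k+1) = BL(k) + u(k)
    and that the current battery level bl already lies within [bl_min, bl_max]."""
    lo = max(-dch_max, bl_min - bl)  # most negative admissible u(k)
    hi = min(ch_max, bl_max - bl)    # most positive admissible u(k)
    u = lo + y1 * (hi - lo)
    remaining = max(0.0, ls_total - scheduled_so_far)  # load still to be scheduled
    ls = y2 * min(ls_max, remaining)
    return u, ls
```

As a design note, this mapping guarantees that the total schedulable load is never exceeded, but not that it is reached exactly; the horizon-level equality still has to be driven by the cost or the performance measure.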

Optimization process
The optimization process shown in Figure 6 has the following steps:
1) Randomly initialize the action and critic network weights (values in the range [-1,1]).
2) Update the critic network weights using (3.14) to (3.18), then update the action network using (3.26) to (3.28).
3) Compute the performance measure (PM) to check the system performance. If the PM decreases, the new action network weights are accepted; otherwise, the old weights with a small perturbation added are taken as the current weights. Then restart from step 2.
The network updates are carried out at each time step of the total horizon; once the last time step is processed, the resulting control series is used to evaluate the action network through the performance measure. The PM, given by equation (3.31), is defined from the cost function (2.1) and the constraints of Section 2. A small random perturbation (in the range [-0.1,0.1]) is applied in step 3 so that the system does not remain trapped in a local minimum [16]. After an appropriate number of iterations, when the PM no longer decreases appreciably from one iteration to the next, the cost is considered minimal and the optimal solution over the total time horizon is reached.
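The outer loop of steps 2 and 3 can be sketched as below. The networks are abstracted as a flat weight vector, `train_pass` stands for one pass of critic and action updates over the whole horizon, and `performance_measure` for the PM; both are hypothetical placeholders. As an assumption, a rejected pass restarts from the last accepted weights plus uniform noise in [-0.1, 0.1], and the loop stops when an accepted pass improves the PM by less than `tol`.

```python
import numpy as np


def optimize(w0, train_pass, performance_measure, n_iter=500, tol=1e-6,
             perturb=0.1, seed=0):
    """Accept-or-perturb loop around repeated training passes (steps 2 and 3)."""
    rng = np.random.default_rng(seed)
    best_w = np.array(w0, dtype=float)
    best_pm = performance_measure(best_w)
    w = best_w.copy()
    for _ in range(n_iter):
        candidate = train_pass(w)                 # step 2 over the whole horizon
        cand_pm = performance_measure(candidate)  # step 3
        if cand_pm < best_pm:
            gain = best_pm - cand_pm
            best_w, best_pm, w = candidate, cand_pm, candidate
            if gain < tol:                        # PM reduction no longer considerable
                break
        else:
            # Reject: restart from the last accepted weights plus a small perturbation.
            w = best_w + rng.uniform(-perturb, perturb, size=best_w.shape)
    return best_w, best_pm
```

For a toy check, `optimize(np.array([1.0, 1.0]), lambda w: 0.9 * w, lambda w: float(np.sum(w ** 2)))` accepts every pass and stops once the PM gain falls below `tol`, while a `train_pass` that always worsens the PM leaves the initial weights in place.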

PSO algorithm
In this section, the energy management problem is solved by the PSO method [22]. Given its capabilities, the PSO results are used as a benchmark for evaluating ADHDP performance. PSO offers simplicity, the ability to handle the size and nonlinearity of the problem, and it often converges on problems where other methods fail. PSO applies the two equations (4.32) and (4.33) to each particle in a swarm. The vector $x_i(n)$ denotes the position of particle $i$ (a candidate solution) and the velocity vector $v_i(n)$ represents its movement:

$$x_i(n) = x_i(n-1) + v_i(n) \qquad (4.32)$$

$$v_i(n) = w\, v_i(n-1) + c_1 r_1 \big(p_i - x_i(n-1)\big) + c_2 r_2 \big(g - x_i(n-1)\big) \qquad (4.33)$$

where $n$ is the iteration index, $i$ the particle number, $w$ the inertia factor, $p_i$ the best position found so far by particle $i$ and $g$ the best position found so far by the whole swarm; $c_1$ and $c_2$ are positive correction factors and $r_1$, $r_2$ are two random vectors in the range [0,1]. Because the velocity is stochastic, an uncontrolled trajectory may arise and lead to a useless particle, so upper and lower velocity bounds are necessary:

$$\text{if } v_i(n) > v_{\max}, \text{ then } v_i(n) = v_{\max} \qquad (4.34)$$

$$\text{if } v_i(n) < -v_{\max}, \text{ then } v_i(n) = -v_{\max} \qquad (4.35)$$

The value of $v_{\max}$ can be selected by trial and error. Our PSO energy management scheme is an optimization process that computes the optimal controls ($u(k)$ and $L_s(k)$) for every time step $k$ of the horizon. The optimization is offline, so it yields an optimal solution knowing the system state, $R(k)$, $L_n(k)$ and $C(k)$, over the total scheduling horizon. The cost function is as defined by (2.1), and the constraints of Section 2 are handled through a penalty: the Heaviside step function of the constraints is multiplied by a penalty factor, which takes a high value (1000 here). The optimization process is described in Figure 7.
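A compact PSO implementing (4.32) to (4.35) is sketched below in Python with NumPy, together with the Heaviside penalty term. The parameter values (`w = 0.7`, `c1 = c2 = 1.5`, 30 particles, 200 iterations, `v_max` equal to 20% of the search range) are assumptions, since the paper tunes them by trial and error. Positions are not clipped to the bounds, leaving constraint handling to the penalty, and each constraint is assumed to be expressed as a value $g \le 0$ when satisfied.

```python
import numpy as np


def pso(objective, dim, lower, upper, n_particles=30, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, v_max=None, seed=0):
    """Minimize `objective` with a basic global-best PSO (assumed parameters)."""
    rng = np.random.default_rng(seed)
    lower = np.broadcast_to(np.asarray(lower, dtype=float), (dim,))
    upper = np.broadcast_to(np.asarray(upper, dtype=float), (dim,))
    if v_max is None:
        v_max = 0.2 * (upper - lower)  # assumption; tuned by trial and error in the text
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = np.zeros_like(x)
    p_best = x.copy()
    p_val = np.array([objective(xi) for xi in x])
    g_best = p_best[np.argmin(p_val)].copy()
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # (4.33)
        v = np.clip(v, -v_max, v_max)                                # (4.34)-(4.35)
        x = x + v                                                    # (4.32)
        vals = np.array([objective(xi) for xi in x])
        improved = vals < p_val
        p_best[improved] = x[improved]
        p_val[improved] = vals[improved]
        g_best = p_best[np.argmin(p_val)].copy()
    return g_best, float(p_val.min())


def penalized(cost, violations, penalty=1000.0):
    """Cost plus penalty * sum of Heaviside(g) over constraint values g (g > 0 = violated)."""
    return cost + penalty * float(np.sum(np.heaviside(violations, 0.0)))
```

For instance, `pso(lambda z: float(np.sum(z ** 2)), 3, -5.0, 5.0)` converges close to the origin, and `penalized(5.0, np.array([-1.0, 0.5, 0.0]))` adds one penalty of 1000 for the single violated constraint.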

Simulation results
In this section, the simulation results of the ADHDP method are compared with those obtained by PSO, and the differences in their behavior are discussed. The home energy system described in Section 2 is used to implement the proposed energy management scheme. The aim is to minimize the energy drawn from or supplied to the power grid, or to shift grid energy demand to off-peak times, over a given time horizon. The simulations use historical data as input over a 48-hour time horizon.

Data
The nonschedulable load profile, taken from [15], is shown in Figure 8. The daily load pattern is divided into 24 slots, each representing one hour of the day. Real-time pricing is used to shift grid energy demand from peak to off-peak times; Figure 9 shows the chosen real-time prices, taken from [23]. For the simulations, four solar irradiation scenarios (four months of the year) for a United States city, Austin, were chosen from the data sets in [24]. The renewable energy produced at the output of the PV system is computed as

$$R(k) = A\, \eta_{PV}\, I(k) \qquad (5.36)$$

where $I(k)$ is the solar irradiation at time $k$ (energy per square metre), $A = 10\ \mathrm{m}^2$ is the area of the PV system, $\eta_{PV} = 100\%$ is the efficiency and $R(k)$ is the renewable energy available at the output of the PV system. Figure 10 shows the renewable energy profiles for the four months in Austin over a 48-hour horizon covering the first two days of each month.
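Equation (5.36) is a per-hour scaling of the irradiation series. A minimal sketch with the stated area $A = 10$ m² and efficiency 100% follows; the irradiation values are made up for illustration and are not data from [24].

```python
def pv_energy(irradiation, area=10.0, efficiency=1.0):
    """Renewable energy R(k) = A * eta_PV * I(k) for each time step k, eq. (5.36).

    irradiation: solar irradiation per unit area for each hourly slot.
    area: PV area A in square metres (10 m^2 in the text).
    efficiency: PV efficiency eta_PV as a fraction (100% in the text).
    """
    return [area * efficiency * i for i in irradiation]


# Hypothetical irradiation values for three consecutive hours (illustration only).
print(pv_energy([0.0, 0.25, 0.5]))  # [0.0, 2.5, 5.0]
```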

Results
This section presents the simulation results obtained by the ADHDP energy management system. The results in Figure 11 refer to months with low renewable energy production (such as April). The figure also reports the PSO results, so that the improvements achieved by the proposed ADHDP scheme can be seen in the schedulable load and battery level plots.
As illustrated in Figure 11, in both the ADHDP and PSO optimization processes the optimization variables (schedulable load and battery level) interact reasonably to reach the minimum cost. The advantage of ADHDP over PSO is that ADHDP relies heavily on renewable energy when the PV output is high, whereas PSO does not; moreover, when the PV output is low (especially when electricity prices are high), ADHDP reduces the schedulable load demand or turns it off completely. Thanks to this better decision-making ability, ADHDP reaches a lower cost than PSO. PSO does not focus on the renewable energy peak and distributes the load demand over the scheduling horizon; nevertheless, its two optimization variables interact so that the load demand is higher when prices are low than when prices are high, and when prices are high the battery discharges to help supply the load. The results in Figure 12 refer to months with high renewable energy production (such as August) and show the same behavior as Figure 11. Table 3 reports the results of ADHDP and PSO for the four case studies (the four months for Austin), ordered from the highest to the lowest renewable energy production. According to these results, ADHDP considerably improves performance with respect to PSO, which indicates its effectiveness for home energy management. Table 4 shows the ability of both algorithms to satisfy the schedulable load. Mismatches between the load satisfied by the optimization and the actual load demand can be compensated by online tuning, which is not provided in this study; an example is available in [2]. The amount of energy wasted at the output of the PV system in the ADHDP and PSO optimization processes is presented in Table 5. As illustrated in Figures 11 and 12, ADHDP emphasizes the use of renewable energy over grid energy, whereas PSO does not; therefore, less renewable energy is wasted with ADHDP. Table 5 confirms this fact.

Conclusion and future work
In this study, a new demand side management scheme for a smart home scenario has been presented. The proposed scheme, formulated as an energy management problem, is solved by an adaptive dynamic programming method; PSO is also used to solve the same problem as a basis for comparison. The results show that ADHDP outperforms PSO in terms of cost savings and battery and load behavior. The goal is to find a decision-making policy that minimizes cost over a number of stages, establishing a trade-off between immediate and future costs. Dynamic programming provides a formal framework for this trade-off, and ADHDP makes it possible to apply dynamic programming without a system model. Accordingly, ADHDP behaves more dynamically and makes better decisions than PSO. As future work, we plan to design a detailed online load scheduler to support our offline controller. The online load scheduler would correct mismatches between historical and real data, while the offline controller enables better optimization and provides an optimal 24- or 48-hour-ahead load and battery activity pattern.

Figure 1: A smart home scenario

Figure 2: Energy flows within the home energy system

Figure 3: ADHDP scheme for smart home energy management

Figure 10: Renewable energy profiles of four months of the year for Austin city

Figure 11: Simulation results for the first two days of April for Austin city obtained by the ADHDP and PSO algorithms. Each plot reports the normalized values of electricity price, available renewable energy, schedulable load and battery energy level.

Figure 12: Simulation results for the first two days of August for Austin city obtained by the ADHDP and PSO algorithms. Each plot reports the normalized values of electricity price, available renewable energy, schedulable load and battery energy level.

Table 2: Schedulable load parameters for a 48-hour time horizon.

Table 3: Energy costs obtained by the ADHDP and PSO methods.

Table 4: Satisfied schedulable load obtained by the ADHDP and PSO methods.

Table 5: Wasted energy at the output of the PV system obtained by the ADHDP and PSO methods.