Deep Learning Empowered Task Offloading for Mobile Edge Computing in Urban Informatics

Driven by the industrialization of smart cities, numerous interconnected mobile devices and novel applications have emerged in urban environments, providing great opportunities to realize industrial automation. In this context, autonomous driving is an attractive application, which leverages large amounts of sensory information for smart navigation while posing intensive computation demands on resource-constrained vehicles. Mobile edge computing (MEC) is a potential solution to alleviate the heavy burden on the devices. However, the varying states of multiple edge servers as well as a variety of vehicular offloading modes make efficient task offloading a challenge. To cope with this challenge, we adopt a deep Q-learning approach for designing optimal offloading schemes, jointly considering the selection of the target server and the determination of the data transmission mode. Furthermore, we propose an efficient redundant offloading algorithm to improve task offloading reliability in the case of vehicular data transmission failure. We evaluate the proposed schemes based on real traffic data. Results indicate that our offloading schemes have great advantages in optimizing system utilities and improving offloading reliability.


I. INTRODUCTION
Along with the advancement of the Internet of Things (IoT) in industrial application scenarios, urban life patterns are undergoing a tremendous change [1]. A high level of interconnection between heterogeneous smart devices brings the possibility of enabling industrial automation, which improves operation and increases productivity with less human labor, or none at all [2].
Autonomous driving is one of the most attractive industrial automation applications. With a large number of on-board sensors and actuators as well as advanced control systems, autonomous vehicles are capable of interpreting sensory information and identifying appropriate navigation paths. In addition, with the aid of information infrastructure in urban areas and the introduction of Intelligent Transportation Systems (ITS), smart vehicles provide a pervasive and promising platform for realizing a broad range of novel mobile applications, such as augmented reality, natural language processing, and interactive gaming [3]. However, the process of understanding a highly dynamic and complex traffic environment while making real-time driving decisions involves processing a great volume of sensory data and requires intensive computation. Due to the constraints on on-board computation power, supporting these real-time and computationally intensive tasks and applications on vehicles is a big challenge.
Mobile Edge Computing (MEC), where heavy computation tasks are offloaded to cloud resources placed at the edge of mobile networks, has emerged as a promising approach to cope with growing computing demands [4]. Fig. 1 represents a typical scenario of applying the MEC technique to intelligent traffic applications. Aided by MEC, computation-intensive and safety-oriented tasks in the context of autonomous driving can be offloaded from vehicles to edge servers. However, the high mobility of vehicles may undermine the availability of MEC services. Thus, novel solutions are necessary to address this issue in order to guarantee both the reliability and the efficiency of vehicular task offloading. However, very few works have investigated integrated management of computing and communication resources in a multi-server vehicular edge computing network, and task offloading reliability has not been incorporated in the recent literature.
To bridge this gap, in this paper we focus on task offloading in an MEC-enabled vehicular network, and present an approach that optimizes MEC system performance while also improving offloading reliability. The main contributions of this paper are as follows:
• We present an MEC-enabled LTE-V network, where the influence of various vehicular communication modes on task offloading performance is qualitatively analyzed.
• By applying a deep Q-learning approach, we propose optimal target MEC server determination and transmission mode selection schemes, which maximize the utilities of the offloading system under given delay constraints.
• To cope with transmission failures in vehicular networks, we design an efficient redundant offloading algorithm, which ensures offloading reliability while improving the gained utilities.
The remainder of the paper is organized as follows. In Section II, we review related work. The vehicular edge computing system model is presented in Section III. A deep Q-learning based offloading scheme is described in Section IV. In Section V, we investigate offloading reliability. Performance evaluation is presented in Section VI. Finally, we conclude our work in Section VII.

II. RELATED WORK
To meet the demands from computationally intensive vehicular applications, some studies have investigated applying MEC approach in vehicular networks. In [7], the authors proposed an energy efficient resource allocation for vehicular fog computing centers. In [8], an MEC-based architecture was used in urban traffic management in a distributed and adaptive service manner. In [9], the authors designed a fog vehicular computing framework that integrates resources from both edge server and remote cloud. In [10], the authors unveiled underutilized vehicular computing resources, and put them into use for providing efficient computational support to MEC servers. To efficiently merge MEC technology in vehicular networks, the authors in [11] introduced a collaborative task offloading and output transmission mechanism. In [12], the authors designed an MEC service migration scheme that ensures vehicles always connect to the nearest MEC entities. Although these studies have provided some insights about MEC-enabled vehicular applications, the effects of vehicular communication on the design of task offloading strategies have not been thoroughly investigated.
Benefiting from the fast commercialization of LTE systems, LTE-V has become one of the key technologies in vehicular networks. Several recent works have focused on analytical models and implementations of LTE-V. In [13], direct vehicle communication was utilized to offload data transmission from vehicles with poor-quality links to infrastructure. Taking into account the high mobility of vehicles, the authors in [14] designed a wireless link formulation mechanism, where the beamwidths between vehicular communication pairs were optimized. In [15], the authors discussed key building blocks of 5G networks in the context of vehicular communications. However, joint V2I and V2V transmission schemes in a multiple MEC server scenario have not been considered in the previous studies.
Machine learning is a branch of artificial intelligence that studies systems which acquire knowledge from data. Recently, various learning techniques have been deployed for scheduling task offloading. In [16], the authors proposed an online learning based workload offloading scheme for mobile edge computing systems with renewable power supply. In order to reduce resource consumption in task offloading, the authors in [17] formalized intelligent offloading metric prediction using a machine learning based approach. Deep Q-learning is a powerful tool for policy optimization, and has been utilized in various decision processes. For instance, the authors in [18] designed an integrated resource management scheme for connected vehicles using a deep reinforcement learning approach. The authors in [19] used deep Q-learning to schedule voltage and frequency for real-time systems on embedded devices. In [20], this learning approach was adopted in designing a video streaming framework. Moreover, deep Q-learning can also be used in the traffic domain. To relieve traffic congestion at highway junctions, the authors in [21] applied deep Q-learning to traffic simulation studies and vehicle pathway optimization. However, the potential of learning-based approaches has not been explored for designing scheduling algorithms for vehicular edge computing applications. Furthermore, the mobile characteristics of vehicles and the reliability of vehicular task offloading have not been considered in previous studies.
Different from these studies, in this paper we concentrate on task offloading in an LTE-V network, and propose optimal offloading schemes that jointly schedule vehicular communication and edge computing through a deep Q-learning approach.

Autonomous driving of the vehicles requires various kinds of sensory data processing. Moreover, urban informatics infrastructure-aided mobile applications may also pose computing requirements on the vehicles. We model these data processing and computing demands as computation tasks. Various tasks may have different characteristics. For instance, autonomous navigation has strict delay constraints, while entertainment applications do not impose a critical delay requirement. According to this consideration, we classify these tasks into G types. A type-i task is described by four terms as κ_i = {f_i, g_i, t_i^{max}, ς_i} [22]. Here, f_i and g_i are the size of the task input data and the amount of required computation, respectively. To provide timely responses in various traffic contexts, the tasks are time sensitive, and t_i^{max} is the maximum delay tolerance of task κ_i. The offloading system gains utility ς_i ∆t from the completion of task κ_i, where ∆t is the time saved in accomplishing κ_i compared to t_i^{max}. The probability of a task belonging to type i is denoted as β_i, with Σ_{i∈G} β_i = 1.
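The four-term task description above can be sketched as a small data model. This is an illustrative sketch only; the names TaskType, utility, and sample_task_type are not part of the paper's system.

```python
import random
from dataclasses import dataclass

@dataclass
class TaskType:
    # One field per term of kappa_i = (f_i, g_i, t_max_i, varsigma_i)
    f: float          # size of the task input data
    g: float          # amount of required computation
    t_max: float      # maximum delay tolerance
    varsigma: float   # utility gained per unit of saved time

def utility(task: TaskType, t_total: float) -> float:
    # System utility varsigma_i * (t_max_i - t_total) for a task finished in t_total
    return task.varsigma * (task.t_max - t_total)

def sample_task_type(types, betas):
    # Draw a task type with probabilities beta_i (sum of beta_i equals 1)
    return random.choices(types, weights=betas, k=1)[0]
```

A task finished faster than its deadline yields a positive utility proportional to the saved time, matching the ς_i ∆t definition in the text.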

III. SYSTEM MODEL
The road is covered by a heterogeneous vehicular network. Besides the cellular network provided by a Base Station (BS), there is an LTE-V network composed of mobile vehicles and M Road Side Units (RSUs) deployed along the road. The set of these RSUs is denoted as M. The cellular network and the vehicular network operate on different, non-overlapping spectrum. Compared to the BS, which has seamless coverage and a high data transmission cost, the RSUs conversely provide spotty coverage and inexpensive access service. The costs for using a unit of spectrum of the cellular network and of the vehicular network in a unit time are c_c and c_v, respectively, and we have c_c > c_v.
The BS is equipped with an MEC server, denoted as Serv_0, through a wired connection. In addition, each RSU hosts an MEC server; these servers are denoted as Serv_1, Serv_2, ..., Serv_M, respectively. The MEC servers get data directly from the BS or RSUs to which they are attached. Let {W_0, W_1, W_2, ..., W_M} denote the computing capacities of these servers. As the server at the BS can serve the vehicles located on the whole road, we consider that the capacity of Serv_0 is much higher than that of the servers deployed at the RSUs. Each MEC server is modeled as a queuing network whose input is the offloaded tasks. Arriving tasks are first cached at an MEC server, and then served with a first-come-first-served policy. A server utilizes all of its computing resources to execute the currently served task. The cost for a task using a unit of computing resource in a unit time is c_x.
Each vehicle has a cellular and an LTE-V radio interface, which work on different spectrum and enable multiple communication paradigms. In the heterogeneous network formed by overlapping of the BS and the RSUs coverage, vehicles can offload their tasks to MEC servers through multiple modes. We name the task file transmission between a vehicle and the BS as vehicle-to-BS (V2B). When a vehicle turns to the LTE-V network for task offloading, the file can be transmitted to an MEC server in a mode with joint vehicle-to-vehicle (V2V) and vehicle-to-RSU (V2R) transmission.
In self-driving vehicles, real-time traffic information, such as position, speed, and heading direction, can be gathered by vehicular sensors [23]. Furthermore, channel state information can also be detected by these vehicles. All this information, together with the descriptions of generated vehicular tasks, is transmitted to a control center through the cellular network. Spectrum is allocated for this information transmission in addition to the spectrum used for task offloading. Based on the collected information, the control center can utilize the communication resources of the heterogeneous network as well as the computing resources of the MEC servers, and efficiently schedule task offloading.
The scheduling and resource management operate in a discrete time model with fixed-length time frames. The length of a frame is denoted as τ. In each time frame, a vehicle generates a computing task with probability P_g. Enabled by advanced LTE technology, we consider that a task file transmission is completed within one time frame. In addition, a task offloading vehicle can only choose one transmission mode.
The communication topology between vehicles and infrastructure remains constant during one frame. However, the topology may change across time frames due to the mobility of the vehicles. To facilitate the modeling of these dynamic relations, we divide the road into E segments. The position of a vehicle on the road is denoted by the index of its segment e, where 1 ≤ e ≤ E. We consider that vehicles in the same road segment have an identical distance to a communication infrastructure.
In assessing the network performance, we focus on the upstream communication process that offloads tasks from vehicles to MEC servers in various modes. We consider that all vehicles have fixed transmission power for a given transmission mode, i.e., power P tx,b in V2B mode and power P tx,v in V2R and V2V modes. In addition, these vehicles have enough storage for caching task files.
In the case of V2B mode, the assignment of spectrum to vehicles is orthogonal, and there is no collision between V2B communication vehicles. For receiving a task file from a V2B mode vehicle, the signal to interference plus noise ratio (SINR) at the BS is given as

γ_{v,b} = P_{tx,b} G_r L_0^{-1} d_{v,b}^{-α} / P_w,   (1)

where d_{v,b} is the distance between the transmitting vehicle and the BS, G_r is the antenna gain at the BS, L_0 and α are the path loss at a reference unit distance and the path loss exponent, respectively, and P_w is the power of additive white Gaussian noise. When vehicles choose LTE-V communication in V2R or V2V modes, collisions may occur due to spectrum reuse between communication pairs working in these modes. In such a case, the SINR at receiver r is calculated as

γ_{v,r} = P_{tx,v} L_0^{-1} d_{v,r}^{-α} / ( Σ_{j∈V} P_{tx,v} L_0^{-1} d_{j,r}^{-α} + P_w ),   (2)

where V is the set of other vehicles that communicate in the same spectrum within the interference range, and d_{v,r} and d_{j,r} are the distances from the transmitting vehicle and from interfering vehicle j to receiver r, respectively. The receiver r can be either an RSU or a relay vehicle. Let γ_min be the minimum SINR at a receiver under the premise that the received data can be decoded. Given a static network topology and spectrum resource allocation, we can obtain the feasible communication pairs whose SINR is no less than γ_min. These pairs form the potential ways to offload task files from vehicles to MEC servers.
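The two SINR expressions can be sketched as follows. Function and parameter names are illustrative, and linear (non-dB) units are assumed.

```python
def sinr_v2b(p_tx_b, g_r, l0, alpha, d_vb, p_w):
    # SINR at the BS for a V2B transmission: orthogonal spectrum, so no interference
    return p_tx_b * g_r * (1.0 / l0) * d_vb ** (-alpha) / p_w

def sinr_v2x(p_tx_v, l0, alpha, d_vr, interferer_dists, p_w):
    # SINR at receiver r (RSU or relay vehicle) with co-channel interferers
    interference = sum(p_tx_v * (1.0 / l0) * d ** (-alpha) for d in interferer_dists)
    return p_tx_v * (1.0 / l0) * d_vr ** (-alpha) / (interference + p_w)

def feasible_pairs(pairs, gamma_min):
    # Keep only communication pairs whose SINR is no less than gamma_min
    return [(tx, rx) for tx, rx, gamma in pairs if gamma >= gamma_min]
```

With an empty interferer list, sinr_v2x reduces to the interference-free case, which is how the feasible V2V/V2R pairs are screened against γ_min.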

IV. OPTIMAL OFFLOADING SCHEMES IN A LEARNING APPROACH
In this section, we formulate an optimal offloading problem and model it as a Markov decision process. Based on deep Q-learning, an approach that incorporates deep learning with Q-functions, joint MEC server selection and offloading mode determination strategies are obtained.

A. Problem Formulation
In a given time frame, for a vehicle that is located in road segment e and generates task κ i , we use x i,e = 1 to indicate the task offloading to Serv 0 through V2B mode. Similarly, we use y i,e,m = 1 and z i,e,m = 1 to indicate the task offloading to Serv m in a V2R mode and in a joint V2V and V2R mode, respectively. Otherwise, these indicators are set to 0.
The proposed optimal task offloading problem, which maximizes the utility of the offloading system under the constraints of task delay, is formulated as follows:

max_{x,y,z} Σ_{j=1}^{n} ς_{i_j} ( t_{i_j}^{max} − t_{i_j,e_j}^{total} )   (3)
s.t. x_{i,e_j} ∈ {0, 1}, y_{i,e_j,m} ∈ {0, 1}, z_{i,e_j,m} ∈ {0, 1},
     x_{i,e_j} + Σ_{m∈M} ( y_{i,e_j,m} + z_{i,e_j,m} ) = 1,
     t_{i,e_j}^{total} ≤ t_i^{max}, ∀j ∈ {1, ..., n},

where n is the number of tasks generated in a time frame, e_j is the road segment index of vehicle j's location, and H_{e_j} is the number of transmission hops. q_c and q_v are the amounts of spectrum resources allocated for each task file offloaded through the cellular network and the LTE-V network, respectively. R_{v,b,e_j} is the transmission rate for offloading a task file from a vehicle at road segment e_j to the BS, which can be given as R_{v,b,e_j} = q_c log_2(1 + γ_{v,b}). R_{v,r,e_j} and R_{v,j,e_j} can be calculated similarly based on the allocated spectrum q_v and SINR γ_{v,r}.
In (3), the first three constraints indicate that a task file can only be transmitted through one mode. The fourth constraint requires that the time cost for offloading task κ_i stays within its delay constraint. Here t_{i,e_j}^{total} is the total time cost for completing a type-i task generated by a vehicle located at road segment e_j. Given the task offloading strategies, t_{i,e_j}^{total} can be written as

t_{i,e_j}^{total} = x_{i,e_j} ( f_i / R_{v,b,e_j} + t_0^{wait} + g_i / W_0 ) + Σ_{m∈M} y_{i,e_j,m} ( f_i / R_{v,r,e_j} + t_m^{wait} + g_i / W_m ) + Σ_{m∈M} z_{i,e_j,m} ( (H_{e_j} − 1) f_i / R_{v,j,e_j} + f_i / R_{v,r,e_j} + t_m^{wait} + g_i / W_m ),   (4)

where t_0^{wait} and t_m^{wait} are the waiting times of the task in Serv_0 and Serv_m, respectively. The values of the waiting times are discussed in the following subsection.
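The per-mode delay can be illustrated with a small sketch, assuming Shannon-style rates R = q · log2(1 + γ) as in the text; the function names and argument layout are hypothetical.

```python
import math

def v2b_delay(f, q_c, gamma_vb, t_wait0, g, w0):
    # Delay via V2B: transmission to the BS + queueing at Serv_0 + execution
    rate = q_c * math.log2(1.0 + gamma_vb)
    return f / rate + t_wait0 + g / w0

def v2v_v2r_delay(f, q_v, gamma_vj, gamma_vr, hops, t_wait_m, g, w_m):
    # Delay via joint V2V/V2R: (H - 1) relay hops, one final V2R hop,
    # then queueing and execution at Serv_m
    r_vj = q_v * math.log2(1.0 + gamma_vj)  # V2V relay-hop rate
    r_vr = q_v * math.log2(1.0 + gamma_vr)  # final V2R hop rate
    return (hops - 1) * f / r_vj + f / r_vr + t_wait_m + g / w_m
```

Comparing these two delays for a given task is exactly the trade-off the scheduler faces when choosing a transmission mode.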

B. Markov Decision Approach
As each MEC server is modeled as a queuing system, the current serving state of a server may affect the time cost for accomplishing the following tasks. To choose the offloading target server efficiently, the offloading strategy taken by each task in time frame l depends on the characteristics of current vehicle network as well as the server states in frame l − 1. Thus, we can formulate (3) as a Markov decision process, and solve it in a Markov decision approach [24].
The state of the offloading system at time frame l is defined as S^l = (s_0^l, s_1^l, ..., s_M^l), where s_0^l is the total computation required by the tasks queuing in Serv_0 at frame l. Similarly, s_1^l, ..., s_M^l denote the required computation of the tasks queuing in Serv_1, Serv_2, ..., Serv_M at time frame l, respectively. The actions taken by the control center at frame l can be shown as a^l = {X^l, Y^l, Z^l}, where X^l = {x_{i,e}^l}, Y^l = {y_{i,e,m}^l}, and Z^l = {z_{i,e,m}^l} are the sets of task offloading strategies with various transmission modes and offloading targets for the tasks generated at frame l, respectively.
To facilitate the analysis of the effects of the actions on the system states, we introduce the variable ĉ_m^l, m ∈ {0, 1, ..., M}, which denotes the amount of computation performed by Serv_m in time frame l. We define ĉ_m^l as

ĉ_m^l = min { s_m^l, W_m τ }.   (5)

Then the state transition between time frames l and l + 1 can be written as

s_m^{l+1} = s_m^l − ĉ_m^l + Σ_{j∈A_m^l} g_{i_j},   (6)

where A_m^l is the set of tasks offloaded to Serv_m at frame l. When action a^l is taken in state S^l, the gained average utility in time frame l is

U^l = (1/n) Σ_{j=1}^{n} ς_{i_j} ( t_{i_j}^{max} − t_{i_j,e_j,l}^{total} ),   (7)

where t_{i,e_j,l}^{total} is defined in (4) with the waiting times determined by the state S^l, i.e., t_m^{wait} = s_m^l / W_m. In order to maximize the utility of the offloading system, we need to obtain an optimal strategy π*, which consists of offloading actions for various tasks in different time frames. π* can now be expressed as

π* = arg max_π E [ Σ_l η^l U^l ],   (8)

where η is a discount factor that trades off the immediate utility against later ones, 0 < η ≤ 1.
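A minimal sketch of the per-frame server dynamics described above, assuming each server drains at most W_m · τ of queued computation per frame; the serve and step helpers are illustrative names.

```python
def serve(s_m, w_m, tau):
    # Computation taken by Serv_m in one frame: bounded by both the queued
    # work s_m and the server capacity W_m * tau
    return min(s_m, w_m * tau)

def step(state, capacities, tau, arrivals):
    # One state transition S^l -> S^{l+1}: each server drains the served work
    # and enqueues the computation of newly offloaded tasks.
    # arrivals[m] is the total computation offloaded to server m this frame.
    return [s - serve(s, w, tau) + arrivals[m]
            for m, (s, w) in enumerate(zip(state, capacities))]
```

Iterating step under a fixed offloading policy yields exactly the trajectory of states whose discounted utilities the learned strategy maximizes.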

C. Deep Q-learning Based Offloading Scheme
To derive the optimal offloading strategy π*, we turn to reinforcement learning. Reinforcement learning is a main branch of machine learning, where agents take sequences of actions that maximize the discounted future reward with corresponding strategies in various states. Thus, a Markov decision process can be considered as a reinforcement learning problem. Under a given offloading strategy π, the gained average system utility from taking action a^l in state S^l can be expressed as a Q-function:

Q_π(S^l, a^l) = E [ Σ_{k=0}^{∞} η^k U^{l+k} | S^l, a^l, π ].   (9)

Then the optimal value of the Q-function is

Q*(S^l, a^l) = U^l + η E [ max_{a^{l+1}} Q*(S^{l+1}, a^{l+1}) ],   (10)

where the maximum utility as well as the optimal offloading strategy can be derived by value and strategy iteration. Q-learning, a classical reinforcement learning algorithm, can be used to perform these iterations.
In each iteration, the value of the Q-function in the learning process is updated as

Q(S^l, a^l) ← Q(S^l, a^l) + α ( U^l + η max_{a^{l+1}} Q(S^{l+1}, a^{l+1}) − Q(S^l, a^l) ),   (11)

where α is the learning rate. However, the states of the offloading system consist of the amounts of required computation queuing in the MEC servers, whose values are continuous. It is hard to find the optimal solution by discretizing the state space, so the Q-learning approach cannot be directly implemented to solve our proposed Markov decision problem. To address this issue, we turn the Q-function into a function approximator, a functional form that is easy to handle in the optimal action acquisition process. Here we choose a multi-layered neural network as a nonlinear approximator that is able to capture the complex interactions among various states and actions. Based on this Q-function estimation, we utilize deep Q-learning to obtain the optimal offloading strategy π* [25].
We refer to the proposed neural-network based approximator as the Q-network, where θ is the set of parameters of the network. With the help of the Q-network, the Q-function in (9) can be estimated as Q(S^l, a^l) ≈ Q'(S^l, a^l; θ) [26]. Q' is trained to converge to the real Q values over iterations. Based on Q', the optimal offloading strategy in each state is derived from the actions that lead to the maximum utility. The chosen action at frame l can now be written as a^{l*} = arg max_{a^l} Q'(S^l, a^l; θ).
In the learning process, an experience replay technique is utilized to improve learning efficiency, where the learning experience at each time frame is stored in a replay memory [25]. The experience consists of the observed state transitions as well as the utilities gained from actions. The experience gained at time frame l is expressed as (S^l, a^l, U^l, S^{l+1}). During Q-learning updates, a batch of stored experience drawn randomly from the replay memory is used as samples to train the parameters of the Q-network. The goal of the training is to minimize the difference between Q(S^l, a^l) and Q'(S^l, a^l; θ). We define a loss function to denote this difference as

Loss(θ^l) = E [ ( Q_tar^l − Q'(S^l, a^l; θ^l) )^2 ],   (12)

where θ^l is the set of parameters of the Q-network at time l, and Q_tar^l is the learning target, which denotes the optimal value of the Q-function in frame l:

Q_tar^l = U^l + η max_{a^{l+1}} Q'(S^{l+1}, a^{l+1}; θ^l).   (13)

We deploy a gradient descent approach to modify θ. The gradient derived by differentiating Loss(θ^l) is

∇_{θ^l} Loss(θ^l) = E [ 2 ( Q'(S^l, a^l; θ^l) − Q_tar^l ) ∇_{θ^l} Q'(S^l, a^l; θ^l) ].   (14)

Then θ^l is updated according to

θ^{l+1} = θ^l − δ ∇_{θ^l} Loss(θ^l),   (15)

where δ is a scalar step size.
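The replay memory, learning target, and loss described above can be sketched as follows; the class and function names are illustrative, not the paper's implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    # Experience replay: store (S_l, a_l, U_l, S_{l+1}) tuples, sample random batches
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # old experience is evicted automatically

    def store(self, state, action, utility, next_state):
        self.buf.append((state, action, utility, next_state))

    def sample(self, batch_size):
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))

def q_target(utility, next_q_values, eta):
    # Learning target: U^l + eta * max_a' Q'(S^{l+1}, a'; theta)
    return utility + eta * max(next_q_values)

def td_loss(q_value, target):
    # Squared difference between the network estimate and the target
    return (target - q_value) ** 2
```

Sampling uniformly from the buffer breaks the temporal correlation of consecutive frames, which is the point of experience replay.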
In order to avoid local optima while balancing exploration and exploitation in the learning process, we adopt an ε-greedy policy. Randomly selected actions are taken with probability ε to explore better offloading strategies; otherwise, optimal actions are chosen with probability 1 − ε to exploit the learned experience, where 0 < ε < 1. The proposed deep Q-learning based offloading scheme is shown in Algorithm 1.
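The ε-greedy selection step can be sketched as:

```python
import random

def epsilon_greedy(q_values, epsilon):
    # Explore a uniformly random action with probability epsilon,
    # otherwise exploit the current Q estimates (argmax)
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```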
Algorithm 1 Deep Q-learning based task offloading
Initialization: Initialize the Q-network with weights θ, the action-value function Q, and the experience replay buffer.
1: For a given steady vehicular traffic flow Do
2: Select a random action a^l with probability ε, otherwise choose the action a^l = arg max_a Q'(S^l, a; θ);
3: Execute action a^l, derive the next state S^{l+1}, and obtain the offloading utility U^l according to (6) and (7);
4: Store the experience (S^l, a^l, U^l, S^{l+1}) into the experience replay buffer;
5: Get a batch of samples from the replay memory, and calculate the loss function Loss(θ^l) according to (12);
6: Calculate the gradient of Loss(θ^l) with respect to θ^l according to (14);
7: Update θ^l according to (15);
8: End For

V. RELIABLE OFFLOADING IN VEHICULAR EDGE COMPUTING NETWORKS

Although the deep Q-learning based schemes provide an optimal way to offload tasks with maximum utility, task file transmission failures may seriously undermine the offloading efficiency. Unlike the V2B transmission mode, where vehicles communicate in orthogonal spectrum, vehicle communication pairs in V2V mode may work on the same spectrum resources and cause serious interference to each other. In addition, in V2V transmission, the wireless interfaces of some chosen relay vehicles may suddenly be occupied by vehicular applications newly generated in these vehicles during the file delivery. Task transmission as well as the offloading process may be endangered or even interrupted by this interference.
In this section, we focus on offloading reliability in the joint V2V and V2R mode. First, we investigate the number of transmission hops in this joint offloading mode. Recall that γ_min is the minimum SINR at a receiver under the premise that the received data can be decoded. The maximum distance of one-hop vehicular communication without co-channel interference can thus be calculated as

d_max = ( P_{tx,v} / (L_0 γ_min P_w) )^{1/α}.   (16)

Theorem 1. Given the traffic density of the vehicles on the road as ρ and the probability of a vehicle generating a task as P_g, the mean value of the farthest communication distance between two vehicles can be approximated as d̄ ≈ d_max − ∆d, where ∆d is a distance reduction with ∆d > ln(1/σ) / (2(1 − P_g)ρ), σ is a coefficient, and 0 < σ ≪ 1.

Proof. See Appendix A.
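These quantities can be computed numerically. This sketch assumes the one-hop range solves P_{tx,v} L_0^{-1} d^{-α} / P_w ≥ γ_min for d, and takes ∆d at its stated lower bound; both are modeling assumptions of the sketch, not values fixed by the paper.

```python
import math

def d_max(p_tx_v, l0, gamma_min, p_w, alpha):
    # Maximum one-hop distance without co-channel interference (assumed form)
    return (p_tx_v / (l0 * gamma_min * p_w)) ** (1.0 / alpha)

def mean_farthest_distance(dmax, sigma, p_g, rho):
    # Theorem 1 approximation d_bar ~= d_max - delta_d, with delta_d taken
    # at its lower bound ln(1/sigma) / (2 * (1 - P_g) * rho)
    delta_d = math.log(1.0 / sigma) / (2.0 * (1.0 - p_g) * rho)
    return dmax - delta_d
```

Denser traffic (larger ρ) or a higher task-generation probability shrinks ∆d, moving the mean farthest V2V distance closer to d_max.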
For a vehicle located in road segment e that chooses MEC server Serv_m as its offloading target, according to Theorem 1, the number of transmission hops from the originating vehicle to RSU m is

H_{e,m} = ⌈ d_{e,m} / d̄ ⌉,   (17)

where d_{e,m} is the distance between road segment e and RSU m. Among these transmission hops, H_{e,m} − 1 hops are V2V communication and the last hop is V2R communication.
Next, we analyze the interference brought by vehicles operating on the same frequency to the V2V and V2R communications. As stated above, each transmitting vehicle randomly chooses q_v spectrum from the Q_v resources for file delivery. The total number of orthogonal spectral resource blocks for vehicular communication is n_r = Q_v / q_v. We consider that vehicle v sends a computing task file to receiver r, where r can be either a vehicle in V2V mode or an RSU in V2R mode. Let d_{v,r} denote the distance between v and r, and d_{j,r} be the distance between r and its jth nearest interfering vehicle. The following lemma presents the statistical distribution of the interference distance d_{j,r}.

Lemma 1. For vehicular communication, the probability distribution of the distance between receiving vehicle r and its jth nearest interfering vehicle is

f_{d_{j,r}}(x) = (2 P_g ρ / n_r)^j x^{j−1} e^{−2 P_g ρ x / n_r} / (j − 1)!.   (18)

Proof. See Appendix B.
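As a numerical sanity check: assuming co-channel interferers form a one-dimensional Poisson process of intensity λ on both sides of the receiver (an assumption of this sketch), the distance to the jth nearest interferer follows an Erlang-type density, and the sketch below verifies it integrates to one.

```python
import math

def interferer_distance_pdf(x, j, lam):
    # Erlang-form density of the distance to the j-th nearest point of a
    # 1-D Poisson process of intensity lam on both sides of the receiver
    return ((2.0 * lam) ** j * x ** (j - 1)
            * math.exp(-2.0 * lam * x) / math.factorial(j - 1))
```

Its mean is j / (2λ): doubling the co-channel density halves the expected interference distance, which is why spreading transmitters over more resource blocks (larger n_r) reduces interference.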
According to Theorem 1 and (2), the total time consumed by transmitting the file of a type-i task from vehicle v to RSU m can be calculated as

t_{v,m,i}^{total} = (H_{e,m} − 1) f_i / R_{v,j} + f_i / R_{v,r},   (19)

where R_{v,j} and R_{v,r} are the V2V and V2R transmission rates, respectively. In (19), the interference distance d_{j,r} is a random variable distributed as shown in Lemma 1. Now we derive the probability distribution of t_{v,m,i}^{total}. To facilitate the analysis of the impact of co-channel vehicular communication on t_{v,m,i}^{total}, we only consider the two nearest interfering vehicles in (19). It is noteworthy that the analysis approach can be extended to a scenario with multiple interfering vehicles.

Lemma 2. Let x and y be independent random variables with probability distributions f_x(x) and f_y(y), respectively, and let b_0 and c_0 be positive constants. The probability distribution of z = b_0 (x^{−α} + y^{−α}) + c_0 is

f_z(z) = ∫_0^∞ f_x(x) f_y( ((z − c_0)/b_0 − x^{−α})^{−1/α} ) (1/(α b_0)) ((z − c_0)/b_0 − x^{−α})^{−(α+1)/α} dx.   (20)

Proof. See Appendix C.
Let z = Σ_{j=1}^{2} P_{tx,v} L_0^{-1} d_{j,r}^{-α} + P_w, whose probability distribution f_z(z), given in (21), follows from Lemmas 1 and 2.

Lemma 3. Let z be a random variable with probability density f_z(z), and let w = 1/log_2(1 + a_0/z), where a_0 is a positive constant. The probability distribution of w is given as

f_w(w) = f_z( a_0 / (2^{1/w} − 1) ) ( a_0 2^{1/w} ln 2 ) / ( w^2 (2^{1/w} − 1)^2 ).   (22)

According to Lemma 3, we obtain the probability distributions of the V2V and V2R hop transmission times in (23) and (24), respectively. Based on (23) and (24), we get the following theorem.
Theorem 2. The probability distribution of t_{v,m,i}^{total} can be expressed as shown in (25).

Considering that a computing task needs to be accomplished under its delay constraint, we define a link as reliable if its task file transmission is completed within a given time threshold. For a type-i task, the time threshold is t_i^{max} − t_m^{wait} − g_i / W_m. Thus, the reliability of transmitting a type-i task is defined as

Pr_{f,i} = Pr( t_{v,m,i}^{total} ≤ t_i^{max} − t_m^{wait} − g_i / W_m ).   (26)

Let ϑ be a reliability threshold. We aim to keep the task file transmission reliability no less than ϑ, i.e., Pr_{f,i} ≥ ϑ. To enhance the reliability, we use redundant offloading. For each computing task offloaded through the joint V2V and V2R mode, the vehicle that generates the task sends copies of the task file to the target MEC server over multiple separated transmission paths. The offloading is accomplished when any copy reaches the MEC server and is executed there. We consider a scenario where a vehicle located at road segment e offloads a type-i task to Serv_m. Let N_{i,e,m} denote the number of paths required for offloading the task to Serv_m. From the perspective of a single task, and ignoring the interference between multiple redundant transmissions, to assure reliability, N_{i,e,m} should be no less than N_{low,i,e,m}, which is given as

N_{low,i,e,m} = ⌈ ln(1 − ϑ) / ln(1 − Pr_{f,i}) ⌉.   (27)

However, in the offloading system, several tasks may be generated simultaneously. The average number of tasks generated by the vehicles located in one road segment in the same time frame is denoted as N̄_task. According to our proposed deep Q-learning based task offloading scheme, tasks of the same type generated by vehicles in the same road segment are offloaded in an identical manner. Thus, we can obtain the average number of type-i tasks that are generated at road segment e and offloaded to Serv_m through the joint V2V and V2R mode. If all these tasks are offloaded with N_{i,e,m} redundant transmission paths, serious interference may occur between the paths.
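With independent redundant paths, the bound in (27) reduces to a standard redundancy calculation. A sketch, assuming path failures are independent (the same assumption the single-task analysis makes):

```python
import math

def n_low(reliability_single, theta):
    # Minimum path count so at least one copy arrives in time with
    # probability >= theta: 1 - (1 - Pr)^N >= theta
    return math.ceil(math.log(1.0 - theta) / math.log(1.0 - reliability_single))

def combined_reliability(reliability_single, n_paths):
    # Success probability with n independent redundant paths
    return 1.0 - (1.0 - reliability_single) ** n_paths
```

For example, a per-path reliability of 0.5 needs four independent copies to exceed a 0.9 threshold; with interference between copies the true requirement only grows, which is why (27) serves as a lower bound.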
Affected by this interference, task transmission reliability decreases while transmission delay increases. Consequently, the offloading utility decreases. Thus, there exists a maximum number of redundant file transmissions for each type of task in a given road segment. As it is hard to derive a closed-form expression of the relation between N_{i,e,m} and t_{v,m,i}^{total}, we take a search approach to obtain the optimal number of redundant transmission paths. To improve the search efficiency while excluding invalid values, we present the upper and lower bounds of the redundant number. We first analyze the lower bound and give the following theorem.

Theorem 3. N_{low,i,e,m} given in (27) is a lower bound of the number of redundant paths required to assure reliability when redundant-transmission interference is considered.

Proof. From (19), we can find that as N_{i,e,m} increases, more interference may be brought to the task file transmission and t_{v,m,i}^{total} becomes longer. Thus, the transmission reliability shown in (26) decreases. Let Pr'_{f,i} and N'_{low,i,e,m} denote the reliability and the minimum number of paths required for offloading transmission in the scenario with redundant-transmission interference, respectively. Based on the above analysis, we have Pr'_{f,i} < Pr_{f,i}. Since 0 < 1 − Pr_{f,i} < 1, according to (27), we get N_{low,i,e,m} < N'_{low,i,e,m}. Thus, N_{low,i,e,m} can act as the lower bound of the search range.
The final version of record is available at http://dx.doi.org/10.1109/JIOT.2019.2903191

Next, we investigate the upper bound of N_{i,e,m}. In a given system state, to maximize the total offloading utility shown in (3), type-i tasks at road segment e should choose the offloading mode that brings them higher utility, where i ∈ [1, G] and e ∈ [1, E]. Motivated by this consideration, the maximum redundant number of task transmissions, N_{max,i,e,m}, can be obtained by comparing the utilities gained through various offloading modes. Considering that the computing capacity of Serv_0 at the BS is much more powerful than that of the servers equipped at the RSUs, and that no interference between transmitting vehicles exists in the cellular network, offloading tasks to Serv_0 in V2B mode can be taken as a backup approach, although the communication cost of V2B mode is higher than that of V2V mode. Thus, we focus on comparing the utilities gained from offloading type-i tasks to Serv_0 and Serv_m. Considering that N_{i,e,m} redundant task transmissions may be generated simultaneously, in this scenario the density of interfering vehicles is given as P_g ρ (1 − β_i + N_{i,e,m} β_i). According to Lemma 1, the probability distribution of the interference distance in this case, shown in (30), is obtained by replacing the density P_g ρ with P_g ρ (1 − β_i + N_{i,e,m} β_i). Based on (30) and Theorem 2, we get the average time cost t̄_{v,m,i}^{total} for transmitting a type-i task from a vehicle located in road segment e to RSU m, presented in (31). According to the utility definition in (3), the utility U_r gained from offloading a type-i task to Serv_m with redundant transmission is given in (32), where the total time cost t̄_{i,r}^{total} = t̄_{v,m,i}^{total} + t_m^{wait} + g_i / W_m. From (30) and (32), we can see that more redundant transmission paths lead to more serious interference and a higher offloading cost.
Since redundant task-file transmission is undertaken only when U_r exceeds the utility gained from offloading in V2B mode, the maximum number of redundant paths for type-i tasks generated at road segment e to Serv_m is given in (33). Based on the lower and upper bounds in (27) and (33), respectively, the optimal number of redundant transmissions can be searched within a limited interval and thus obtained efficiently. The reliable offloading scheme that prevents offloading failure through optimal redundant transmission is illustrated in Algorithm 2.
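The bounded search over [N_min, N_max] can be sketched as a short helper; the `utility` callable standing in for the evaluation of (32) is hypothetical:

```python
def optimal_redundancy(n_min, n_max, utility):
    """Exhaustively search the bounded interval [n_min, n_max] for the
    redundancy count maximizing the offloading utility (a stand-in for (32))."""
    best_n, best_u = None, float("-inf")
    for n in range(n_min, n_max + 1):
        u = utility(n)
        if u > best_u:
            best_n, best_u = n, u
    return best_n, best_u
```

Because the interval is bounded by (27) and (33), this exhaustive scan runs in O(N_max − N_min) utility evaluations.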

VI. NUMERICAL RESULTS
In this section, we evaluate the performance of the proposed task offloading schemes based on real traffic data, which consist of 1.4 billion GPS traces of more than 14000 taxis recorded over 21 days in a city of China [27]. In the recorded urban area, we select three roads, which are shown in Fig. 3. The corresponding geographical information is obtained from Google Maps. To increase the scale of the simulation, background traffic flow is added to the roads.

Algorithm 2 Reliable offloading in vehicular edge computing networks
Initialization: Offloading strategies obtained from Algorithm 1.
1: Sort road segments according to γ_{v,b} from high to low;
2: For each road segment with γ_{v,b} from high to low Do
3:   For each type of task generated in this road segment to be offloaded to Serv_m, m ∈ {1, 2, ..., M} Do
4:     Compute type-i task's transmission reliability Pr_{f,i} according to (26);
5:     if Pr_{f,i} < ϑ then
6:       Compute N_{min,i,e,m} and N_{max,i,e,m} according to (27) and (33), respectively;
7:       Search N_{opt,i,e,m} = arg max U_r according to (32), where N_{min,i,e,m} ≤ N_{opt,i,e,m} ≤ N_{max,i,e,m};
8:       Compute reliability Pr_{f,i} with redundant transmission;
9:       if Pr_{f,i} ≥ ϑ then
10:        Offload type-i tasks generated on segment e to Serv_m over N_{opt,i,e,m} redundant paths;
11:      else
12:        Offload type-i tasks generated on segment e to Serv_0 through V2B mode;
13:        Update the states of Serv_0 and Serv_m;
14:      end if
15:    end if
16:  End For
17: End For
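The control flow of Algorithm 2 can be sketched in Python. This is a structural sketch only: the quantities of (26), (27), (32), and (33) are injected as callables because their closed forms are defined in the paper, the dict-based segment/task representation is an assumption, and the server-state update of step 13 is omitted:

```python
def reliable_offloading(segments, theta, reliability, bounds, utility, with_redundancy):
    """Skeleton of Algorithm 2: decide, per task, between redundant V2V
    offloading to Serv_m and V2B fallback to Serv_0."""
    plan = {}
    # Steps 1-2: visit road segments in decreasing V2B channel quality gamma_{v,b}.
    for seg in sorted(segments, key=lambda s: s["gamma_vb"], reverse=True):
        for task in seg["tasks"]:
            if reliability(task) >= theta:          # Pr_{f,i} via (26) already meets target
                plan[task] = ("v2v", 1)             # keep the single-path strategy (assumed)
                continue
            n_min, n_max = bounds(task)             # (27) and (33)
            n_opt = max(range(n_min, n_max + 1),
                        key=lambda n: utility(task, n))   # argmax of U_r in (32)
            if with_redundancy(task, n_opt) >= theta:
                plan[task] = ("v2v", n_opt)         # N_opt redundant paths to Serv_m
            else:
                plan[task] = ("v2b", 1)             # fall back to Serv_0 over V2B
    return plan
```

Sorting by γ_{v,b} lets segments with the best cellular fallback be decided first, so later V2B fallbacks account for already-committed load.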
We consider a scenario in which there are one BS and five RSUs on each selected road. We set the computing capacity W_0 = 1000 units, and the capacities of the MEC servers equipped on the RSUs are randomly taken from [100, 200] units. The amounts of allocated spectrum resources q_c and q_v are 20 MHz and 10 MHz, respectively [28]. L_0 and α are 47.86 dB and 2.75, respectively. The transmission powers P_{tx,b} and P_{tx,v} are set to 33 dBm and 30 dBm, respectively [29].

Fig. 4 presents the impact of real traffic on the obtained average utility of a task under different offloading schemes. It is clear that our proposed deep Q-learning scheme yields higher offloading utility than the other schemes, especially in the non-rush period from 12:00 to 16:00. The reason is that our scheme jointly considers transmission efficiency and the load states of the MEC servers, whereas the scheme that chooses the target server according to the vehicles' best transmission path and the scheme that selects the MEC server according to server state each take only one factor into account; the ignored factor may seriously affect offloading efficiency. In the game-theoretic approach, the vehicles running on a road segment act as players that compete for task offloading services to obtain higher utility. Since each vehicle determines its offloading strategy independently, from the perspective of its own interests, and ignores cooperation between multiple vehicles, the system performance degrades. In the greedy algorithm, each vehicle chooses its offloading strategy in a distributed manner: in each time frame, a vehicle determines its optimal strategy according to the currently observed communication environment and the service states of the MEC servers. From Fig. 4 we see that the greedy algorithm attains lower average offloading utility than our proposed learning approach. Task execution on an MEC server takes a certain amount of time, especially when the server handles a large number of tasks.
Thus, offloading strategies adopted in previous time frames may affect system performance in subsequent frames. Although the greedy algorithm jointly optimizes the file transmission path and MEC server selection in the current frame, it ignores these follow-up effects. In contrast, our proposed learning scheme incorporates both effects into the design of the offloading strategies, thus leading to better performance.

Fig. 5 compares the performance of our proposed deep Q-learning offloading approach and the greedy algorithm, implemented on a road with different numbers of BSs and various traffic densities. From the figure we can clearly see that our proposed deep Q-learning approach yields better utility than the greedy algorithm in both scenarios: a single BS and three BSs. The reason is that our approach incorporates the performance relations between subsequent frames into the design of the offloading strategies. We further find that the difference between the utilities gained by the same approach in scenarios with different numbers of BSs becomes larger as the traffic density grows. Since the MEC servers equipped on the BSs take only part of the vehicular computing tasks and the transmission cost of V2B mode is higher than that of V2V mode, a number of tasks are preferentially offloaded to the MEC servers equipped on RSUs through V2V transmissions. However, as the traffic density increases, V2V communication interference worsens, and more vehicles choose V2B mode to offload their tasks.

Fig. 6 shows the convergence of our proposed deep Q-learning based offloading scheme. From this figure, we see that the learning process takes about 8000 time frames to reach the optimal offloading strategies under different vehicle densities ρ. It is noteworthy that in practice the proposed learning-based offloading scheme is executed offline, which means that the actual duration of execution has little effect on the performance of the task offloading application of interest.
For a given steady vehicular traffic flow and computing capacity of the MEC servers, we may obtain the optimal offloading strategies for vehicles with various states in advance. The strategy set is stored in the control center, and can be obtained and applied on vehicles directly.
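The difference between the greedy rule and the learning scheme discussed above can be made concrete with a one-line tabular Q-learning update: with discount γ = 0 the update degenerates to the greedy, single-frame rule, while γ > 0 propagates the effect of the current offloading decision onto subsequent frames. The dict-of-dicts table layout is an illustrative assumption, not the paper's deep-network implementation:

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q[s][a] toward the bootstrapped
    target r + gamma * max_a' Q[s_next][a']."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
```

Because the target includes the discounted value of the next state, congestion caused on an MEC server in the current frame lowers the estimated value of decisions that lead into that state, which a per-frame greedy rule cannot capture.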
In Fig. 7, we compare the impact of the task generation probability P_g on the proportion of V2B-mode offloading under various traffic densities. It can be seen that, with our proposed offloading scheme, the proportion of task transmissions in V2B mode becomes higher as P_g increases. Offloading a large number of generated tasks through vehicular communication may cause serious interference and impair offloading efficiency; consequently, a larger share of tasks is offloaded to Serv_0 through V2B mode as P_g increases. Fig. 7 also illustrates that, compared to the case of P_g = 0.1, the proportion with P_g = 0.7 rises faster with increasing traffic density when ρ is small, and changes slowly when ρ is large. This result indicates that offloading in V2B mode may alleviate the transmission burden on vehicular networks when the number of generated tasks is relatively small. However, due to the resource constraints of Serv_0, this approach cannot continuously improve offloading utility when the number of generated tasks is high.

Fig. 8 compares the average reliability of different offloading schemes under various traffic densities. Our proposed offloading scheme with an adaptive number of redundant transmission paths achieves the highest offloading reliability. Although the offloading scheme with a fixed number of redundant paths yields higher reliability than the scheme without redundant data transmission, its performance degrades as ρ increases, especially when ρ exceeds 0.08. Due to the interference between vehicular communication pairs, in high-traffic-density scenarios too many redundant transmissions may further aggravate the interference and worsen offloading reliability.

Fig. 9 illustrates the average utility of a task for different offloading schemes under various traffic densities. As higher traffic density leads to worse vehicular communication performance, more redundant transmissions, with the corresponding communication cost, are required to ensure offloading reliability. As a result, the utility of our proposed adaptive redundant scheme decreases as ρ grows. However, owing to the adaptive number of redundant paths and the cooperation of multiple transmission modes, which help reduce the offloading cost, our scheme still achieves higher utility. It is noteworthy that when ρ is above 0.08, the utility of the scheme with a fixed number of redundant paths is lower than that of the offloading scheme without any redundant transmission. The reason is that in high-traffic-density scenarios the fixed-number scheme causes significant interference in vehicular communication, which reduces the data delivery rate while increasing offloading delay. Therefore, the fixed-number scheme improves offloading reliability at the cost of a large utility loss.

VII. CONCLUSION
In this paper, we have studied task offloading in a heterogeneous vehicular network with multiple MEC servers. To maximize task offloading utility, we first designed an optimal offloading scheme with joint MEC server selection and transmission mode determination using a deep Q-learning approach. Then, we focused on reliable offloading in the presence of task transmission failures, and proposed an adaptive redundant offloading algorithm that ensures offloading reliability while improving system utility. The numerical results illustrate the gains of the proposed schemes in offloading vehicular tasks with optimal utility while respecting the reliability and latency constraints.

APPENDIX A
PROOF OF THEOREM 1.
Consider an interval (−d^max_{v,r}, d^max_{v,r}) centered at the origin. When a point process consisting of V points is distributed in this interval, the Euclidean distance d_E between the origin and the farthest point follows the conditional probability distribution given in (34) [30]. The traffic density of the vehicles running on the road is ρ and the task generation probability is P_g. Since the vehicles generating tasks cannot receive data from other ones, the density of potential V2V relay vehicles is (1 − P_g)ρ. Then, the probability that υ vehicles are located in the road segment (−d^max_{v,r}, d^max_{v,r}) is calculated as in (35). According to (34) and (35), we obtain the probability distribution of d_E in (36). Let ∆d be the distance reduction from d^max_{v,r}, and let v_f be the farthest vehicle with which the vehicle placed at the origin can communicate. The probability that vehicle v_f lies in the road segment (d^max_{v,r} − ∆d, d^max_{v,r}) is given in (37). Given a coefficient σ with 0 < σ ≪ 1, when ∆d > −ln σ/(2(1 − P_g)ρ), vehicle v_f is located in this road segment with probability close to 1. Thus, the mean value of the distance between v_f and the vehicle located at the origin is d^max_{v,r} − ∆d/2.
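Assuming the probability that v_f lies in the trailing segment is 1 − e^{−2(1−P_g)ρ∆d} (a plausible reconstruction of the lost expression, given the Poisson relay density (1 − P_g)ρ and the form of the threshold), the ∆d threshold can be checked numerically:

```python
import math

def delta_d_threshold(sigma, p_g, rho):
    """Smallest Delta_d such that v_f lies within (d_max - Delta_d, d_max)
    with probability at least 1 - sigma, under the assumed model."""
    return -math.log(sigma) / (2.0 * (1.0 - p_g) * rho)

def prob_vf_in_segment(delta_d, p_g, rho):
    # P(at least one relay vehicle in the trailing segment of length Delta_d)
    return 1.0 - math.exp(-2.0 * (1.0 - p_g) * rho * delta_d)
```

Plugging the threshold back into the probability recovers exactly 1 − σ, confirming that any larger ∆d places v_f in the segment with probability arbitrarily close to 1.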

APPENDIX B
PROOF OF LEMMA 1.
For vehicular communication from vehicle v to receiver r, as each transmitting vehicle randomly chooses a spectrum block from the n_r blocks, the average number of vehicles between r and its nearest interfering vehicle, which works in the same spectrum block as r, is h = Σ_{k=1}^{∞} k (1 − 1/n_r)^k (1/n_r) = n_r − 1.
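The geometric series for h can be verified numerically; the closed form n_r − 1 follows from the standard identity Σ_{k≥1} k q^k = q/(1 − q)² with q = 1 − 1/n_r:

```python
def avg_vehicles_between(n_r, terms=10_000):
    """Partial sum of h = sum_{k>=1} k * (1 - 1/n_r)^k * (1/n_r): the expected
    number of vehicles between r and its nearest co-channel interferer when
    each vehicle picks one of n_r spectrum blocks uniformly at random."""
    p = 1.0 / n_r
    return sum(k * p * (1.0 - p) ** k for k in range(1, terms + 1))
```

For instance, with n_r = 4 the partial sum converges to 3, i.e. on average three vehicles using other blocks separate r from its nearest interferer.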
Considering that the locations of vehicles on the road follow a Poisson distribution with density ρ, the probability distribution of the distance between the nearest interfering vehicle and r can be derived accordingly. Let {d_{1,r}, d_{2,r}, ...} denote the distances between r and the interfering vehicles, ordered from near to far. Since the locations of these vehicles are independent and identically distributed, the distance between two adjacent vehicles follows an exponential distribution. Integrating f_k(k) and f_s(s) into (41), we can derive (20).
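The thinning argument underlying Lemma 1 can be sanity-checked with a small Monte Carlo: on a Poisson street of density ρ, each vehicle independently picks one of n_r blocks, so the distance from r to its nearest co-channel interferer on one side should average n_r/ρ. The one-sided setup is a simplifying assumption of this sketch:

```python
import random

def mean_nearest_interferer(rho, n_r, trials=20000):
    """Monte Carlo estimate of the mean distance from receiver r to the
    nearest vehicle transmitting in r's spectrum block, walking outward
    along exponential inter-vehicle gaps of a Poisson street."""
    total = 0.0
    for _ in range(trials):
        d = 0.0
        while True:
            d += random.expovariate(rho)        # gap to the next vehicle
            if random.randrange(n_r) == 0:      # this vehicle picked r's block
                total += d
                break
    return total / trials
```

The estimate approaches n_r/ρ, consistent with the thinned Poisson process of density ρ/n_r whose nearest-point distance is exponentially distributed.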