Enabling Real-Time AI Edge Video Analytics

This paper introduces a novel distributed AI model for managing edge-based intelligent analytics in real time, such as those required for smart video surveillance. The novelty lies in decomposing applications into several functions that are linked together to form virtual function chains, taking both computational and communication limitations into account. Both theoretical analysis and simulation of a real-case scenario show that the proposed model can enable real-time surveillance analytics on a low-cost edge network. Finally, a caching mechanism is proposed and evaluated, which further reduces the operational costs of the edge network.


I. INTRODUCTION
Artificial Intelligence (AI), as expressed by the latest developments in Machine Learning (ML) and Deep Learning (DL), has produced numerous models that are mature enough to reach the market at scale during the next few years. The main reasons for this are the improvements in data-capturing devices, the re-engineering of several ML algorithms and the release of ML and DL toolkits, like PyTorch and TensorFlow [1]. Video analytics is an umbrella term for applications like object tracking, pedestrian detection, face recognition, behavioral analytics, etc. The common business-technology model for deploying AI surveillance services nowadays is Cloud Computing [2], where the captured video streams are uploaded to a centralized data center. The resulting round-trip time limits the throughput of the service and, in many cases, significantly reduces the Quality of Service (QoS). This leads service providers either to reduce the number of processed frames per second (fps) or to lower the resolution of the captured videos, nullifying the advances of new video sensors, like UHD and HDR sensors. Edge Computing has been proposed as a computing paradigm in which data are processed 'near' the devices that generate them, on an infrastructure comprising many low-capacity devices [3]. While this paradigm addresses the large round-trip times of Cloud Computing, the QoS is now limited by the capacity of the edge devices. Not only academia but, lately, industry as well has focused on Edge Computing, by providing software (e.g. Google TensorFlow Lite©) and hardware (e.g. NVIDIA Jetson AGX Xavier© and AWS DeepLens©) solutions suitable for edge processing. This paper proposes a novel distribution framework which explores the Virtual Function Chaining (VFC) concept, inspired by Software-Defined Networks, and enables real-time inference for surveillance applications at the Edge.
In this model, an AI smart video analytics service can be decomposed into a set of Virtual Functions (VFs), which can be deployed on different edge devices. Using these VFs, a VFC is created which processes the streaming data in a distributed fashion. The contributions of this work mainly include: (i) A model for designing the optimum setup of a VFC, in terms of VF instances and VF placement on the edge devices. Each VF may appear in a VFC as several instances and in several interconnected VFCs. (ii) A system architecture which seamlessly integrates VFCs. The services of the proposed architecture are mainly hosted on the Virtual Function Orchestrator (VFO). (iii) An optimization framework for effectively deploying AI video surveillance services on the Edge. (iv) A prototype of the described architecture, which is used to evaluate the described models and provide a proof of concept, in terms of effectiveness and feasibility, of VFCs as a model for real-time AI surveillance applications. (v) A caching mechanism, which demonstrates the scalability of the proposed model when multiple services are required. While the proposed model is inspired by the Service Function Chaining (SFC) concept, it alters and extends several of its features in order to meet the requirements of video analytics services. First, it introduces a load-balancing mechanism tied to the desired QoS, which monitors the output of the service and rearranges the VFC automatically. Additionally, it extends the SFC model by allowing one VF instance to be part of several VFCs (e.g. face detection, gender identification, etc.).
Several research studies have focused on enabling edge computing to support demanding latency-sensitive applications. The author of [4] has proposed a scheme for handling mass video data coming from city surveillance services on heterogeneous digital devices. Zhou et al. [5] have described a model for offloading the cloud by utilizing an edge meshed network. Li et al. [6] have proposed a general virtualization architecture, based on VMs, mainly focusing on its networking aspects. Chen et al. [7] have described an architecture which explores fog computing as a processing infrastructure for supporting dynamic urban surveillance streams. Dautov et al. [8] have performed a comparison study among cloud, fog and edge computing for supporting intelligent surveillance applications. The authors of [9] provide a survey of the applications that can be supported by edge computing. The author of [10] provides a holistic vision of surveillance applications on edge / fog computing paradigms, where the basic concepts, challenges and opportunities are discussed. Finally, Chen et al. [11] have proposed a distributed deep learning model for video surveillance systems, while Puthal et al. [12] have presented a novel approach to load balancing of distributed edge servers. Compared with the current state-of-the-art literature, our model does not require any special virtualization (e.g. Virtual Machines) [6] or distribution [13] (e.g. Apache Spark©) middleware in order to perform the real-time calculation of AI analytics, relieving the edge devices of the substantial overhead these approaches require. Additionally, surveillance applications are decomposed into virtual functions that are deployed on nodes with available resources. Such functions are scaled up based on demand.
The rest of this paper is organized as follows. Section II describes the VFC model, while Section III includes the system implementation details. In Section IV, the results from the experiments are discussed, and the conclusions and future work are drawn in Section V.

II. MODEL FORMULATION
Aiming to enable the edge as a real-time inference mechanism for AI video analytics services, a generalization framework is proposed, according to which a video analytics service is decomposed into a set of VFs, creating a VFC. The proposed model aims to facilitate the efficient offloading of surveillance cloud services to a cooperative distributed edge environment. The basic principles of the proposed model (Figure 1) are the following: (i) Each surveillance service is decomposed into a set of processes. Each process implements certain tasks, like image enhancement, edge detection, etc. (ii) Each process instantiates a VF and is deployed on an edge node. Each VF comprises three parallel threads: the Input Queue, the Output Queue and the Running agent. (iii) A surveillance service is realized by a VFC, similar to the service function chaining proposed by the IETF WG [14]. A VFC must include at least one instance of each VF. The main concept of the model proposed by [14] includes network services, like firewall and packet filtering. (iv) The VFO manages the allocation of the VFs to the physical devices and establishes the communication channels among them. Additionally, the VFO monitors the performance criteria of the service (e.g. processed frames/sec, total cost, etc.) and performs actions in order to meet them.
Each service is described by a VFC and, in general, a single VF can have multiple instances within the edge environment. When a user subscribes to a service (e.g. object detection, etc.), the VFO instantiates the VFC by performing the following tasks: (i) it calculates the required number of instances for each VF, in order to meet the service's QoS criteria, (ii) it allocates the VFs to edge devices and (iii) it establishes communication channels between the edge devices. A VF instance can be part of several service chains. Table I summarizes the formulations of the main entities of the proposed modeling framework.

TABLE I: Formulations of the main entities
Entity | Formulation | Description
Surveillance service | fps | Processed frames / sec the service requires (QoS)
Surveillance service | {VF_j} | Set of processes that comprise the service
Surveillance service | cpu load | Required CPU instructions per frame
Surveillance service | oudata | Amount of data a virtual function produces after processing the input data
Edge Node | m | CPU instructions / sec the device can execute
Edge Node | c(l) | Cost function of the device when performing l CPU instructions. Cost is a general term which includes battery life, maintenance cost, etc.
Edge Node | r(l) | Required time to process l CPU instructions
Edge Node | bw | Communication bandwidth among edge nodes a and b
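The three-thread structure of a VF stated in principle (ii) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the prototype's implementation: the class name `VirtualFunction`, the `receive`/`send` callables standing in for the communication channels, and the shutdown sentinel are all hypothetical.

```python
import queue
import threading


class VirtualFunction:
    """Hypothetical VF: Input Queue thread, Running agent thread, Output Queue thread."""

    _STOP = object()  # sentinel passed down the pipeline to shut the threads down

    def __init__(self, task, receive, send):
        self.task = task        # per-frame processing step (e.g. edge detection)
        self.receive = receive  # returns the next incoming item, or None at end of stream
        self.send = send        # forwards one result to the next VF of the chain
        self.inbox = queue.Queue()
        self.outbox = queue.Queue()
        self.threads = [
            threading.Thread(target=self._input_queue),
            threading.Thread(target=self._running_agent),
            threading.Thread(target=self._output_queue),
        ]

    def _input_queue(self):
        # Buffer incoming data so that reception never blocks on processing.
        while (item := self.receive()) is not None:
            self.inbox.put(item)
        self.inbox.put(self._STOP)

    def _running_agent(self):
        # Apply the VF task to each buffered item, in arrival order.
        while (item := self.inbox.get()) is not self._STOP:
            self.outbox.put(self.task(item))
        self.outbox.put(self._STOP)

    def _output_queue(self):
        # Forward results so that processing never blocks on transmission.
        while (item := self.outbox.get()) is not self._STOP:
            self.send(item)

    def start(self):
        for thread in self.threads:
            thread.start()

    def join(self):
        for thread in self.threads:
            thread.join()


# Usage: integers stand in for frames, a multiplication stands in for the VF task.
frames = iter([1, 2, 3, None])
results = []
vf = VirtualFunction(task=lambda x: x * 10,
                     receive=lambda: next(frames),
                     send=results.append)
vf.start()
vf.join()
```

Decoupling reception, processing and transmission with two queues lets a slow network link or a slow task delay only its own thread, which is one plausible motivation for the three-thread design.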

A. Problem definition
The VFO node needs to assign the VFC to the edge environment. In order to achieve this, the following constrained general assignment problem needs to be solved: Problem 1: Determine the number of VF instances and assign each instance to an edge device, such that the video analytics are generated while maintaining the required fps and the total network cost is minimal.
Problem 1 can be formulated as in eqs. (1)-(5), where $n$ is the number of edge devices, $m$ is the number of VFs, $t_{i,j}$ is the required time for device $D_i$ to execute $VF_j$ and process the data produced from a single frame, and $instances_j$ is the number of required instances for $VF_j$. Finally, $W_{i,i'}$ is the bandwidth of the communication link between nodes $i$ and $i'$, which host two adjacent VFs, $j$ and $j+1$. This formulation describes a model which aims to minimize the total cost of the service (eq. 1) while meeting the QoS constraints (eq. 5) with the minimum number of VF instances (eq. 2), requiring that each VF instance must be assigned to exactly one edge device (eq. 3) and that each edge device can undertake no more than one VF instance (eq. 4). This is an NP-Hard problem [15], which requires substantial computational time to be solved. In order to acquire a feasible solution within an acceptable timeframe, we decouple Problem 1 into two sub-problems: (A) the VF instances sub-problem and (B) the assignment (placement) sub-problem.
Sub-problem A aims to identify the minimum number of instances for each VF. As discussed in the previous section, one instance of each VF must be deployed on the VFC in order to support the service. Let $GVF = [VF_1, VF_2, \ldots, VF_m]$ be the set of the first instances of each VF. Each one of these VFs will be deployed on a different edge device. At this point of the assignment workflow, the allocation cost is not considered. Instead, we check whether there is a feasible placement such that the QoS constraint is met. This results in the relaxation of eqs. (6)-(8).
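The feasibility check behind this relaxation can be sketched in a few lines of Python. The sketch rests on several assumptions that are not taken from the paper: the per-frame time of VF j on device i is modeled as `vf_loads[j] / device_rates[i]` (a linear stand-in for the device function r), the resulting time is taken as the sum of the per-VF times, and an exhaustive search over placements replaces the Hungarian solver, which is only practical for a handful of devices. All function names are hypothetical.

```python
from itertools import permutations


def best_single_instance_assignment(device_rates, vf_loads):
    """Return (t_star, assignment) for one instance per VF and at most one VF per device.

    device_rates[i]: CPU instructions / sec of device i (m in Table I)
    vf_loads[j]:     CPU instructions per frame of VF j (cpu load in Table I)
    assignment[j]:   index of the device that hosts VF j
    """
    if len(vf_loads) > len(device_rates):
        raise ValueError("at least one device per VF is required")
    best = None
    # Enumerate every injective mapping of VFs to devices (brute force).
    for devices in permutations(range(len(device_rates)), len(vf_loads)):
        total = sum(load / device_rates[d] for load, d in zip(vf_loads, devices))
        if best is None or total < best[0]:
            best = (total, devices)
    return best


def supports_service(t_star, fps):
    """QoS test: the per-frame time must not exceed the frame period 1 / fps."""
    return t_star <= 1.0 / fps


# Usage with made-up numbers: three devices, a chain of two VFs.
t_star, assignment = best_single_instance_assignment(
    device_rates=[4e9, 2e9, 1e9], vf_loads=[1e9, 4e9])
feasible_at_half_fps = supports_service(t_star, fps=0.5)
feasible_at_one_fps = supports_service(t_star, fps=1.0)
```

If the pipeline throughput is instead governed by its slowest stage, replacing `sum` with `max` turns the same search into a bottleneck assignment; which reading matches the original equations cannot be determined from the surviving text.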
Regarding $t_{ij}$, it can be calculated using the $r_d()$ function of edge device $d$. Thus $t_{ij} = r_i(N_j)$, where $N_j$ is the number of CPU instructions required by $VF_j$ to complete its task. Eq. (7) reflects the fact that each VF of the GVF set must be assigned to only one node, and eq. (8) that each edge node cannot undertake more than one VF. Sub-problem A can be solved in polynomial time by modeling it as a Min Cost Flow problem, which is a widely studied problem [16]. The utilized solver is a typical Hungarian algorithm. This process results in an allocation of the GVF with the minimum required time that the network can support. Let $t^*$ be the resulting time. If $t^* \le 1/fps_{serv}$, then the edge network can support the service without having to replicate a subset of the VFs. In this case, we can reformulate the assignment problem as a constrained mixed integer problem, Sub-problem B (eqs. 9-12). The objective function (eq. 9) of this formulation aims to minimize the total cost of the VFC deployment on the edge network, while fulfilling the QoS constraints of the service (eq. 12), assigning all VFs to a device (eq. 10) and limiting the number of VFs deployed on a device (eq. 11). This problem can be solved in polynomial time by utilizing Constraint Programming [17].
1) Solving Problem 1: If $t^* > 1/fps_{serv}$, then the computational capacity of the edge devices is insufficient to support the service's QoS if only one instance of each VF is deployed. In order to tackle this, we draw inspiration from the recently launched concept of Cloud-native functions [18], which dynamically adjust the number of their instances in order to handle the incoming requests. Thus, we propose a mechanism according to which a VF can be launched on multiple devices, which share the data coming from the previous VF of the VFC following a round-robin approach.
Thus, if we deploy a second instance of $VF_k$, the required time for the function to process the data related to a single frame changes from $r_k(l)$ to $r_k(l)/2 + b$, where $b$ is the time overhead incurred by splitting the data on $VF_{k-1}$ (the previous VF in the chain) and merging the data on $VF_{k+1}$ (the next VF in the chain), assuming that the two instances of the VF are deployed on identical nodes.
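As a small worked illustration of this replication rule, the Python sketch below deals frames to instances in round-robin order and evaluates the per-frame time. Extending the two-instance expression to `single_time / instances + overhead` for an arbitrary number of identical nodes is an assumption made here for illustration; the text only states the two-instance case. Function names are hypothetical.

```python
from itertools import cycle


def replicated_time(single_time, instances, overhead):
    """Per-frame time of a VF spread over `instances` identical nodes.

    With one instance there is no split/merge step, so no overhead is added.
    """
    if instances < 1:
        raise ValueError("a VF needs at least one instance")
    if instances == 1:
        return single_time
    return single_time / instances + overhead


def round_robin(frames, instances):
    """Deal frames to instance indices 0..instances-1 in turn."""
    buckets = [[] for _ in range(instances)]
    for frame, target in zip(frames, cycle(range(instances))):
        buckets[target].append(frame)
    return buckets


# Usage: five frames over two instances; 1.0 s per frame, 0.25 s split/merge overhead.
shares = round_robin(range(5), instances=2)
two_instance_time = replicated_time(1.0, instances=2, overhead=0.25)
```

The example makes the trade-off visible: replication only pays off when the overhead is smaller than the time saved, that is, when `overhead < single_time * (1 - 1 / instances)`.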
This case leads us to the formulation of a new sub-problem (sub-problem C). Its objective is to identify the minimum number of replicated instances of each VF that need to be deployed on the edge network in order to meet the QoS constraints. Let $\vec{S} = [s_1, s_2, \ldots, s_m]$ represent the number of instances for each one of the VFs, with $s_i$ being an integer greater than or equal to one ($s_i \ge 1$). Thus, we derive the problem formulation of eqs. (13)-(14), where $N_j$ is the number of instructions per frame required for executing $VF_j$ on device $f$, with $f'$ undertaking $VF_{j+1}$. Eq. (13) drives our model to produce solutions with the minimum total number of new VF instances, while eq. (14) satisfies the QoS of the surveillance service. Unlike the problem formulated by eqs. (9)-(12), this is a non-linear mixed integer problem, which required the utilization of the active set solver APOPT [19]. Let the result of this sub-problem be $instances = [ins_1, ins_2, \ldots, ins_m]$. Using $instances$, we can revisit Sub-problem B and solve the allocation problem as before. The difficulty with this last sub-problem is that the utilized nodes $f$ and $f'$ are unknown. This is to be expected, because we seek the number of VF instances with regard to the computational time, which depends not only on the VF load but also on the node that will undertake the VF. We resolve this issue by using Algorithm 1. This algorithm supports two approaches: (a) the worstCase scenario, where the edge device used to calculate eq. (14) is the one with the lowest processing capacity, and (b) the bestCase scenario, where the device with the highest processing capacity is used. Algorithm 1 receives as input the processing fps implied by the QoS and the allocation of the initial instances of the VFs. As depicted in Fig. 3 ('model' plots), both approaches converge to the desired processing fps.
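The worstCase and bestCase starting points can be illustrated with a simplified Python sketch. It is neither Algorithm 1 nor the APOPT formulation: it assumes that every VF stage must individually fit into the frame period 1 / fps, that a stage with s instances takes `load / (s * rate) + overhead`, and that communication bandwidth can be ignored. Under those assumptions the stages are independent, so the smallest count per stage also minimizes the total. All names and numbers are hypothetical.

```python
def min_instances(vf_loads, device_rate, fps, overhead, limit=64):
    """Smallest instance count per VF so that each stage fits into 1 / fps."""
    budget = 1.0 / fps
    counts = []
    for load in vf_loads:
        single = load / device_rate
        for s in range(1, limit + 1):
            # One instance runs without split/merge overhead.
            stage_time = single if s == 1 else single / s + overhead
            if stage_time <= budget:
                counts.append(s)
                break
        else:
            raise ValueError("QoS cannot be met within the instance limit")
    return counts


def worst_and_best_case(vf_loads, device_rates, fps, overhead):
    """Instance counts assuming the slowest and the fastest device, respectively."""
    worst = min_instances(vf_loads, min(device_rates), fps, overhead)
    best = min_instances(vf_loads, max(device_rates), fps, overhead)
    return worst, best


# Usage: loads and rates in arbitrary instruction units, 1 fps, 0.25 s overhead.
worst, best = worst_and_best_case(
    vf_loads=[1.0, 4.0, 8.0], device_rates=[2.0, 4.0], fps=1.0, overhead=0.25)
```

The two vectors bracket the true requirement: the actual placement uses a mix of devices, so the number of instances it needs for each VF lies between the bestCase and worstCase counts.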
The reported results have been derived from a simulation framework that was developed in order to evaluate the reported approach (github.com/blind-review-process). The two functions used in Algorithm 1, addReplicate() and removeReplicate(), calculate for each VF the improvement (for addReplicate()) or the regression (for removeReplicate()) that a new instance will have on the total cost. Let $\{v_i\}$ be these values. Then, we choose the VF which minimizes the difference $|fps_{current} - v_i|$. In each iteration, Sub-problem B is solved.
[Algorithm 1 (fragment): if ($fps_{real} < fps_{current} - \ldots$) then ...]
To evaluate the accuracy of the proposed algorithm for solving Problem 1, it was compared against a greedy algorithm. The total average extra cost imposed by the proposed approach was ≈ 1.97%, while the time required to solve the problem was ≈ 5.61 sec for the proposed approach and $7.89 \times 10^3$ sec for the greedy algorithm (Fig. 2). The solvers, which have been implemented using the GEKKO suite [20], were executed on an Intel i7 2.8GHz (8-core) with 8GB of RAM.

III. SYSTEM IMPLEMENTATION
Aiming to evaluate the model described in Section II, both a simulation and a prototype edge network have been used, on which all the functionalities necessary to support AI real-time video analytics of surveillance services have been developed and deployed. The simulation framework modeled each edge device as a Linux process. The Linux commands cpulimit and ulimit were utilized to mock specific computational capacities for each 'device'. Each video analytics service has been modeled as a set of n VFs, where n is a random integer ∈ [3,5]. Each VF could be either a light VF, a moderate VF or a heavy VF, each with corresponding computational characteristics. Each VF may be identical to another VF with a probability of 15%, enabling the caching mechanism described in the following sections. Two different setups have been implemented. Setup I deploys a much simpler distribution model (as described in Section IV), while Setup II utilizes the VFC model with the caching mechanism.
The implemented edge network comprises 6 Raspberry PI 3 (model B+) devices, with a Quad Core 64-bit CPU @ 1.2GHz and 1GB RAM, and 2 Raspberry PI 4 devices, with a Quad Core Cortex-A72 (ARM v8) 64-bit CPU @ 1.5GHz and 4GB RAM, running Raspbian OS. The feed from the camera was mocked as a video file from the VIRAT dataset [21]. Two video analytics services have been deployed on the edge network, Service A and Service B, requiring gender and age classification respectively (pre-trained deep learning models, based on [22]). Both Service A and Service B decompose into 4 VFs.
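The way the simulated services are drawn can be sketched as follows. This is a hypothetical reconstruction rather than the code of the simulation framework: the relative loads attached to the light, moderate and heavy classes are invented placeholders, and "identical to another VF" is interpreted as reusing a VF that was already created for an earlier service.

```python
import random

# Placeholder relative CPU loads per VF class; the source gives no figures.
VF_LOADS = {"light": 1, "moderate": 4, "heavy": 16}


def make_services(count, seed=0, share_probability=0.15):
    """Draw `count` services, each a chain of 3 to 5 VFs given as (vf_id, kind)."""
    rng = random.Random(seed)
    pool = []       # every distinct VF created so far
    services = []
    for _ in range(count):
        chain = []
        for _ in range(rng.randint(3, 5)):
            # A VF may only be shared if it is not already part of this chain.
            reusable = [vf for vf in pool if vf not in chain]
            if reusable and rng.random() < share_probability:
                vf = rng.choice(reusable)
            else:
                vf = (len(pool), rng.choice(sorted(VF_LOADS)))
                pool.append(vf)
            chain.append(vf)
        services.append(chain)
    return services


services = make_services(count=20, seed=1)
```

VFs that appear in more than one chain are exactly the ones whose results a cache can reuse across services.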

IV. RESULTS
One can argue that the VFC approach adds a lot of complexity to the management of the edge network, while the same results could be reached if multiple agents were deployed on the network and each agent executed the full stack of VFs. This approach would only require one device to distribute the frames among the agents (Setup I). Yet, despite its simplicity, this approach has two main drawbacks: (i) the total cost of the edge network is greater than with the VFC approach, and (ii) in case of multiple services, there is no room for sharing data among the services. The next section reports an analysis of these aspects, based on experiments held on the edge network described in the previous section. Fig. 5 and Fig. 4 show that the VFC model enables the support of a specific number of services with fewer edge devices and with substantially lower total cost, as the service demand scales. Regarding the real edge network, in order to support Service A, the VFO deployed 1 instance of VF1, 2 of VF2, 4 of VF3 and 1 instance of VF4. For Service B, the resulting instances were 1, 3, 3 and 1 for VF1, VF2, VF5 and VF6 respectively. The calculated VFCs are in accordance with the VF characteristics, as the most demanding VFs (VF3 and VF5) participated in the chains with the largest number of instances.

A. Caching mechanism
Services A and B can share VF1() and VF2(). For this, we have developed a caching mechanism, according to which, when a node executes a VF for data related to frame k, it stores the results in a stack for a specific timeframe. If another video analytics service requests the same VF to process data related to an already processed frame, the node retrieves the results and forwards them to the next VF of the VFC, without recalculation. A set of experiments has been conducted in order to reveal the benefits of the VFC approach. The results reported in Fig. 6 were collected after a 10-minute run of the system for each setup. Four different setups were tested (Table II). The setups were configured in order to support the QoS constraints of Services A ($fps = 1/12$) and B ($fps = 1/10$). Fig. 6 illustrates that the QoS has been reached for all setups, i.e. the edge network was able to support all setups. When comparing the total cost of Setups I and II, however, we notice that the VFC model outperforms the typical 'all-in-one' setup by saving approximately 51.2% of the cost. Finally, the caching mechanism tested in Setup IV decreases the total cost by 47.3% when compared with Setup III.
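The caching behaviour described at the start of this subsection can be sketched as a small time-limited store. The sketch is an assumption-laden illustration, not the prototype's code: it keys results by a (VF, frame) pair in a dictionary rather than the stack mentioned above, and the class name `ResultCache`, the `ttl_seconds` parameter and the injectable clock are hypothetical.

```python
import time


class ResultCache:
    """Keeps VF results per (vf_id, frame_id) for a limited timeframe."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so that expiry can be tested
        self._store = {}        # (vf_id, frame_id) -> (expiry_time, result)

    def get_or_compute(self, vf_id, frame_id, compute):
        """Return the cached result if still valid, otherwise run `compute`."""
        now = self.clock()
        key = (vf_id, frame_id)
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        result = compute()
        self._store[key] = (now + self.ttl, result)
        return result

    def evict_expired(self):
        """Drop entries whose timeframe has passed."""
        now = self.clock()
        for key in [k for k, (expiry, _) in self._store.items() if expiry <= now]:
            del self._store[key]


# Usage with a fake clock: two services request the same VF for frame 42.
fake_now = [0.0]
cache = ResultCache(ttl_seconds=5.0, clock=lambda: fake_now[0])
calls = []


def run_shared_vf():
    calls.append("run")
    return ["placeholder result"]


first = cache.get_or_compute("VF1", 42, run_shared_vf)   # first service: computed
second = cache.get_or_compute("VF1", 42, run_shared_vf)  # second service: reused
fake_now[0] = 6.0
third = cache.get_or_compute("VF1", 42, run_shared_vf)   # timeframe elapsed: recomputed
```

Keying on the frame identifier is what allows a second service to skip the shared VFs entirely; the timeframe bounds memory use on devices with as little as 1GB of RAM.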

V. CONCLUSIONS AND FUTURE WORK
Edge computing is expected to be an important part of the AI industry during the next few years. Its advantages lie not only in its proximity to the data being processed, but also in data protection and safety, which are debatable in the Cloud computing paradigm. This paper proposes a novel concept for enabling real-time AI applications on an Edge network. Our proposal is based on VFCs, which are used to distribute an AI application across the edge network in a scalable fashion. After providing a mathematical model for our system, we report the results of a real-case scenario, where the system has been implemented and tested in various setups. A caching mechanism is also described, which extends the capacity of our system even further. The experiments have provided evidence that this approach can be used to undertake heavy-load AI applications and handle them in real time. As the next step of our work, we plan to extend our model so that it can handle node failures, by adding a migration mechanism to our architecture. Finally, a more extensive comparison framework will be developed, aiming to compare the performance of the proposed model against general distribution frameworks like Apache Spark©.