Programmable Metasurface Based Multicast Systems: Design and Analysis

This paper considers a multi-antenna multicast system with a programmable metasurface (PMS) based transmitter. Taking into account the finite-resolution phase shifts of PMSs, a novel beam training approach is proposed, which achieves performance comparable to that of the exhaustive beam searching method but with much lower time overhead. Then, a closed-form expression for the achievable multicast rate is presented, which is valid for arbitrary system configurations. In addition, for certain asymptotic scenarios, simple approximate expressions for the multicast rate are derived. Closed-form solutions are obtained for the optimal power allocation scheme, and it is shown that equal power allocation is optimal when the pilot power or the number of reflecting elements is sufficiently large. However, it is desirable to allocate more power to weaker users when there is a large number of RF chains. The analytical findings indicate that, with large pilot power, the multicast rate is determined by the weakest user. Also, increasing the number of radio frequency (RF) chains or reflecting elements can significantly improve the multicast rate, and as the phase shift number becomes larger, the multicast rate first improves and then gradually converges to a limit. Moreover, increasing the number of users significantly degrades the multicast rate, but this rate loss can be compensated by implementing a large number of reflecting elements.

and industry. In general, the applications of PMSs can be divided into two categories. One typical application is to use the PMS as a passive relay to assist the communication from the transmitter to the receiver [7][8][9]. Specifically, the PMS is deployed between the transmitter and the receiver. Each PMS is connected to a controller, which communicates with the transmitter via a separate wireless control link for coordination and for exchanging channel state information (CSI), and smartly adjusts the phase shifts of the reflecting elements. Such a communication mode is especially useful when the direct link between the transmitter and receiver is blocked [10][11][12]. For example, assuming that no line-of-sight communication is present, the work [10] investigated a PMS-aided multiple-input single-output (MISO) communication system, showing that the use of a PMS increases the system throughput by at least 40% without requiring any additional energy consumption. There are also some works studying the utilization of PMSs in the presence of direct links [7,13]. However, using the PMS as a passive relay has two main disadvantages in practical systems.
• First, the PMS is far from the transmitter, making it difficult to obtain information (e.g., CSI) from the transmitter, due to its passive architecture. To tackle this problem, a two-mode PMS model was proposed in [14,15], where the PMS is equipped with a controller that switches between a receiving mode for CSI and a reflecting mode for data transmission. However, realizing the receiving mode requires the deployment of receive radio frequency (RF) chains, leading to higher hardware cost.
• Second, as pointed out in [16], placing the PMS right at the transmitter or receiver causes less power loss than deploying it between the transmitter and receiver.
To overcome these drawbacks, another, more practical application is to use the PMS as a component of the transmitter. Specifically, the PMS is deployed right at the transmitter, and each PMS cooperates with an RF chain. The signal transmitted from the RF chain is reflected by the PMS with little power loss, owing to the very short distance between the RF chains and the PMS. Moreover, the PMS controller is connected to the base station (BS), making it easier for the PMS to access the CSI, thereby facilitating the joint design of the phase shifts and the digital beamformer. Furthermore, experimental results have demonstrated that the PMS-based transmitter is feasible [18,19]. For instance, a PMS-based transmitter presented in [18] has realized single-carrier quadrature phase shift keying (QPSK) transmission over the air, achieving a data rate of 2.048 Mbps, which is comparable to that achieved by the conventional method but with much lower hardware complexity. Later on, the work [19] realized a PMS-based 8-phase shift keying (8PSK) transmitter, which achieves a higher data rate of 6.144 Mbps over the air.
However, very few works have investigated the theoretical limits of communication systems with a PMS-based transmitter [18,19]. Also, the existing experiments all focus on the scenario with only a single RF chain. Motivated by these observations, in this paper, we propose a PMS-based transmitter including multiple RF chains for multicast communication systems, taking into account finite phase shifts, and present a detailed analysis of the achievable system performance. To the best of our knowledge, this is the first attempt to provide a theoretical analysis for communication systems with a PMS-based transmitter. The main contributions of this paper are summarized as follows:
• A novel channel estimation scheme including phase shift beam training and equivalent channel estimation is proposed. Simulation results show that the proposed phase shift beam training algorithm achieves good performance with much lower time overhead.
• A closed-form expression is derived for the achievable rate of individual users, which enables efficient evaluation of the multicast rate and reveals the impact of key system parameters on the user rate.
• For some asymptotic scenarios, such as large pilot power, a large number of RF chains, and a large number of reflecting elements, closed-form solutions are derived for the optimal power control coefficients and the corresponding multicast rate.
The remainder of the paper is organized as follows. In Section II, we introduce the PMS-based multicast system, while in Section III, we propose a channel estimation scheme including phase shift beam training and equivalent channel estimation. Then, the achievable rate is derived in Section IV, based on which we investigate the optimal power control coefficients and give a detailed analysis of the multicast rate in Section V. Numerical results and discussions are provided in Section VI, and finally Section VII concludes the paper.
Notation: Boldface lower case and upper case letters are used for column vectors and matrices, respectively. The superscripts $(\cdot)^*$, $(\cdot)^T$, $(\cdot)^H$, and $(\cdot)^{-1}$ stand for the conjugate, transpose, conjugate-transpose, and matrix inverse, respectively. Also, the Euclidean norm and absolute value are denoted by $\|\cdot\|$ and $|\cdot|$, respectively. In addition, $\mathbb{E}\{\cdot\}$ is the expectation operator, and $\mathrm{tr}(\cdot)$ represents the trace. The symbol $j$ in $e^{j\theta}$ denotes the imaginary unit. Finally, $z \sim \mathcal{CN}(0, \delta^2)$ denotes a circularly symmetric complex Gaussian random variable (RV) $z$ with zero mean and variance $\delta^2$, and $z \sim \mathcal{N}(0, \delta^2)$ denotes a real-valued Gaussian RV.
II. SYSTEM MODEL
We consider a single-cell multicast system as illustrated in Fig. 1, where the BS, equipped with a PMS-based transmitter, communicates with a group of $K$ single-antenna users.
The partially connected architecture is adopted, which is realized by aligning the beam of each directional horn antenna to the corresponding sub-PMS consisting of $L = N/N_{RF}$ non-overlapping reflecting elements, where $N_{RF}$ is the number of RF chains (antennas) and $N$ is the total number of reflecting elements.$^{2,3}$ The $i$-th sub-PMS consists of the $L$ reflecting elements corresponding to the $i$-th RF chain. Each element of the $i$-th sub-PMS behaves like a keyhole. During the uplink transmission period, the reflecting element combines all the received signals and re-scatters the combined signal to the $i$-th RF chain, while during the downlink period, the reflecting element combines the signal from the $i$-th RF chain and re-scatters it as if from a point source.
Since the PMS is close to the BS, the channel between them can be modeled as a line-of-sight (LOS) channel. Specifically, the channel from the $i$-th antenna (RF chain) to the $i$-th sub-PMS is given by $\mathbf{g}_{B2P,i}^T = \alpha_{B2P}\mathbf{a}_i^T$, where $\alpha_{B2P}$ denotes the path loss coefficient, which depends on the antenna gain $G$, the effective area $A_e$ of each reflecting element perpendicular to the direction of propagation, and the distance $d_{B2P}$ from the BS to the PMS. The vector $\mathbf{a}_i^T$ is the array response vector of the $i$-th sub-PMS, whose elements have unit amplitude.
Let $\mathbf{c} = \beta[e^{j\theta_1}, \ldots, e^{j\theta_n}, \ldots, e^{j\theta_N}]^T$ denote the phase shift beam, where $\theta_n \in [0, 2\pi)$ is the phase shift of the $n$-th reflecting element and $\beta$ is the amplitude coefficient. The amplitude coefficient is given by $\beta = \gamma\alpha_{B2P}$, with $\gamma \in [0, 1]$ depicting the energy reflection efficiency of the PMS, while the impact of the array response vector $\mathbf{a}_i$ is reflected in the phase shifts of $\mathbf{c}$.
In practice, the reflecting elements are controlled by digital-to-analog converters (DACs), and hence have finite phase shifts due to the limited DAC resolution. Without loss of generality, we use $\mathcal{Q}$ to denote the set of all possible values of $\theta_n$, which has a cardinality of $M_{ph}$. Similarly, the set of all possible phase shift beams is denoted by $\mathcal{C}$, which has a cardinality of $M = M_{ph}^N$. We assume block-fading channels, i.e., the channels remain the same during each coherence interval and vary independently between different coherence intervals. The entire communication process can be separated into two phases during each coherence interval, namely, channel estimation and multicasting transmission, which we elaborate on in the ensuing sections.

Footnote 2: Please note that the distance between the BS and the PMS is related to the carrier wavelength. In general, a smaller carrier wavelength implies a shorter distance.

Footnote 3: It is worth noting that the proposed PMS transmitter architecture is different from the hybrid analog and digital beamforming transceiver structure. First, the methods to realize the partially connected architecture are different. In the proposed architecture, the partially connected architecture is realized by aligning the beam of a directional horn antenna to the corresponding sub-metasurface, while in the hybrid architecture, it is realized by connecting each RF chain to a subarray via phase shifters. Second, in the proposed architecture, phase shifts are realized by the passive metasurface, while in the hybrid architecture, the adjustment of signal phases is realized by phase shifters, which in general require complex circuits. Moreover, in the more general fully connected case, at each reflecting element, signals from different RF chains are first combined and then reflected with the same phase shift, while at each antenna of the fully connected hybrid architecture, the signals from different RF chains are first adjusted with different phase shifts by different phase shifters and then combined together.
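To make the size of the beam set concrete, the following minimal Python sketch enumerates the phase set and the beam set for a toy configuration. The uniform spacing of the phases and the names `phase_set` and `beam_set` are assumptions made for illustration; the text above only fixes the cardinalities $M_{ph}$ and $M = M_{ph}^N$.

```python
import cmath
import itertools
import math


def phase_set(m_ph):
    """Assumed uniform phase set Q = {2*pi*m/M_ph : m = 0, ..., M_ph - 1}."""
    return [2.0 * math.pi * m / m_ph for m in range(m_ph)]


def beam_set(n_elements, m_ph, beta=1.0):
    """Enumerate every beam c = beta * [e^{j theta_1}, ..., e^{j theta_N}]."""
    return [
        tuple(beta * cmath.exp(1j * theta) for theta in thetas)
        for thetas in itertools.product(phase_set(m_ph), repeat=n_elements)
    ]


# Toy example: N = 4 reflecting elements with 1-bit phase shifts (M_ph = 2).
beams = beam_set(n_elements=4, m_ph=2)
print(len(beams))  # M = M_ph ** N = 16 candidate beams
```

Even this toy case shows the exponential growth in $N$ that motivates avoiding an exhaustive search over the beam set.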

III. CHANNEL ESTIMATION
The proposed channel estimation scheme consists of two steps. In the first step, beam training is performed to acquire the optimal phase shift beam. In the second step, the equivalent channels are estimated.

A. Beam Training
Since the cardinality of the phase shift beam set increases exponentially with the number of reflecting elements, the complexity of the conventional exhaustive beam searching approach quickly becomes prohibitive. To address this, we propose a novel beam training algorithm.
Specifically, during the beam training phase, all $K$ users simultaneously transmit unmodulated frequency tones to the BS. For user $k$, the transmitted signal is denoted by $x_k = \sqrt{p_k}\,s$, where $p_k$ is the power and $s$ is the frequency tone of unit power.
For any $k$, we assume that $\mathbb{E}\{\|\sqrt{p_k}\,\mathbf{g}_k\|^2\} = \varepsilon_r$, with $\varepsilon_r$ being the average received power. Also, $\mathbf{g}_k$ denotes the channel between the PMS and the $k$-th user and is defined as $\mathbf{g}_k = \sqrt{\alpha_k}\,\mathbf{h}_k$, where $\alpha_k$ models the large-scale fading, and $\mathbf{h}_k$ models the small-scale fading, with elements being independent and identically distributed (i.i.d.) $\mathcal{CN}(0, 1)$ RVs. Furthermore, $\alpha_k$ is assumed to be constant and known a priori. After simplifying $\mathbb{E}\{\|\sqrt{p_k}\,\mathbf{g}_k\|^2\}$, we have $N\alpha_k p_k = \varepsilon_r$.
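The relation $N\alpha_k p_k = \varepsilon_r$ fixes the training power of each user once its large-scale fading is known. As a small illustrative sketch (the function names and the numerical values are hypothetical), the following Python code computes $p_k$ and checks the average received power by simulation:

```python
import random


def training_power(eps_r, n_elements, alpha_k):
    """Solve N * alpha_k * p_k = eps_r for the training power p_k."""
    return eps_r / (n_elements * alpha_k)


def average_received_power(p_k, alpha_k, n_elements, trials=20000, seed=0):
    """Monte Carlo estimate of E{|| sqrt(p_k) g_k ||^2} with g_k = sqrt(alpha_k) h_k."""
    rng = random.Random(seed)
    sigma = 0.5 ** 0.5  # real and imaginary parts of a CN(0, 1) entry
    total = 0.0
    for _ in range(trials):
        norm_sq = sum(rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
                      for _ in range(n_elements))
        total += p_k * alpha_k * norm_sq
    return total / trials


p_k = training_power(eps_r=2.0, n_elements=8, alpha_k=1e-3)
print(p_k, average_received_power(p_k, 1e-3, 8))
```

Users with weaker large-scale fading are assigned proportionally more training power, so that all users contribute the same average received power $\varepsilon_r$.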
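The proposed beam training method works in a bisection manner, namely, at each stage, nearly half of the available beams are eliminated. For instance, at the $i$-th stage, the BS chooses from the current beam set $\mathcal{C}_i$ a pair of beams $\mathbf{c}_{(i,j)}$, $j = 1, 2$, which have the weakest correlation. As such, the received signal after combining can be written as

$$\mathbf{r}_{(i,j)} = \mathbf{C}_{(i,j)} \sum_{k=1}^{K} \mathbf{g}_k x_k + \mathbf{n}_{(i,j)}, \quad (1)$$

where $\mathbf{n}_{(i,j)}$ is the noise and $\mathbf{C}_{(i,j)} \in \mathbb{C}^{N_{RF} \times N}$ is a block diagonal matrix defined by

$$\mathbf{C}_{(i,j)} = \mathrm{blkdiag}\left(\mathbf{c}_{(i,j),1}^T, \ldots, \mathbf{c}_{(i,j),N_{RF}}^T\right), \quad (2)$$

with $\mathbf{c}_{(i,j),n}^T$ being the part of $\mathbf{c}_{(i,j)}$ associated with the $n$-th sub-PMS. The BS then compares the two received signals and sets $j^\star = \arg\max_{j} \|\mathbf{r}_{(i,j)}\|^2$ and $j^{-\star} = \arg\min_{j} \|\mathbf{r}_{(i,j)}\|^2$. It is intuitive that the optimal beam is more likely to have a stronger correlation with $\mathbf{c}_{(i,j^\star)}$. With this key observation, the number of training beams can be approximately halved by removing the beams which have a weaker correlation with $\mathbf{c}_{(i,j^\star)}$. Specifically, each beam $\mathbf{c} \in \mathcal{C}_i$ satisfying $\mathbf{c}_{(i,j^\star)}^H \mathbf{c} \le \mathbf{c}_{(i,j^{-\star})}^H \mathbf{c}$ is removed, and the remaining beams make up a new beam set $\mathcal{C}_{i+1}$. The process then continues until the cardinality of $\mathcal{C}_{i+1}$ becomes one. The pseudo-code of the proposed beam training method is summarized in Algorithm 1.

Algorithm 1: Beam training algorithm. Initialize: stage number $i = 0$, the training beam set of the first stage $\mathcal{C}_1 = \mathcal{C}$.

Remark 1. Since the proposed beam training method works in a bisection manner, a much lower complexity of $O(\log_2(M_{ph}^N))$ can be achieved, compared with the complexity $O(M_{ph}^N)$ of exhaustive beam searching.

Proposition 1. When both $M_{ph}$ and $\varepsilon_r$ are sufficiently large, the ideal phase shift beam obtained by Algorithm 1 can be approximated by

$$\mathbf{c}_{opt} = \left[\mathbf{c}_{opt,1}^T, \ldots, \mathbf{c}_{opt,N_{RF}}^T\right]^T, \quad \mathbf{c}_{opt,n} = \left[c_{opt,n}^1, \ldots, c_{opt,n}^L\right]^T, \quad (3)$$

where $c_{opt,n}^l = \beta \frac{h_{sum,n}^{l*}}{|h_{sum,n}^l|}$, with $h_{sum,n}^l = \sum_{k=1}^{K} h_{k,n}^l$.

Proof. For notational convenience, we drop the subscript $(i, j)$ in (1), and we have

$$\mathbf{r} = \mathbf{C} \sum_{k=1}^{K} \sqrt{\alpha_k p_k}\,\mathbf{h}_k s + \mathbf{n} \overset{(a)}{=} \sqrt{\frac{\varepsilon_r}{N}}\,\mathbf{C} \sum_{k=1}^{K} \mathbf{h}_k s + \mathbf{n} \overset{(b)}{\approx} \sqrt{\frac{\varepsilon_r}{N}}\,\mathbf{C} \sum_{k=1}^{K} \mathbf{h}_k s, \quad (4)$$

where (a) is according to $N\alpha_k p_k = \varepsilon_r$ and (b) follows the fact that $\frac{\varepsilon_r}{N}$ is sufficiently large. Since the objective is to find the optimal phase shift beam $\mathbf{c}$ maximizing $\|\mathbf{r}\|$, we have the following equivalent optimization problem:

$$\max_{\mathbf{c} \in \mathcal{C}} \ \|\mathbf{r}\|^2. \quad (5)$$

Leveraging (2) and (4), and recalling that $s$ has unit power, we can express $\|\mathbf{r}\|^2$ as

$$\|\mathbf{r}\|^2 = \frac{\varepsilon_r}{N} \sum_{n=1}^{N_{RF}} \left| \mathbf{c}_n^T \sum_{k=1}^{K} \mathbf{h}_{k,n} \right|^2,$$

where $\mathbf{h}_{k,n}$ denotes the channel vector between the $k$-th user and the $n$-th sub-PMS. Based on the above equation, the optimization problem (5) can be rewritten as

$$\max_{\mathbf{c} \in \mathcal{C}} \ \sum_{n=1}^{N_{RF}} \left| \mathbf{c}_n^T \sum_{k=1}^{K} \mathbf{h}_{k,n} \right|^2.$$

Since the number of phase shifts, i.e., $M_{ph}$, is sufficiently large, we relax the elements of $\mathbf{c}$ to be complex numbers with continuous phases and fixed amplitudes, and obtain the following optimization problem:

$$\max_{\mathbf{c}} \ \sum_{n=1}^{N_{RF}} \left| \sum_{l=1}^{L} c_n^l \sum_{k=1}^{K} h_{k,n}^l \right|^2 \quad \text{s.t.} \ |c_n^l| = \beta.$$

Denote $h_{sum,n}^l = \sum_{k=1}^{K} h_{k,n}^l$. It is obvious that the phase of $c_n^l$ should equal that of $h_{sum,n}^{l*}$, so that every term $c_n^l h_{sum,n}^l$ is real and non-negative and the terms add coherently, which completes the proof.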
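The bisection idea of the beam training procedure can be illustrated with a short Python sketch. It is not a reproduction of Algorithm 1: the uniform phase grid, the use of $|\mathbf{a}^H\mathbf{b}|$ as the correlation measure, the tie handling that always keeps the stronger training beam and always drops the weaker one, and all function names are assumptions made so that the toy example is self-contained and guaranteed to terminate.

```python
import cmath
import itertools
import math
import random


def make_codebook(n_elements, m_ph):
    """All beams with phases on an assumed uniform M_ph-point grid (beta = 1)."""
    phases = [2.0 * math.pi * m / m_ph for m in range(m_ph)]
    return [tuple(cmath.exp(1j * p) for p in combo)
            for combo in itertools.product(phases, repeat=n_elements)]


def correlation(a, b):
    """|a^H b|, used here as the correlation measure (an assumption)."""
    return abs(sum(x.conjugate() * y for x, y in zip(a, b)))


def received_energy(beam, h_sum, n_rf, noise_std, rng):
    """||r||^2 with r_n = c_n^T h_sum,n + noise on each of the N_RF chains."""
    per_chain = len(beam) // n_rf
    energy = 0.0
    for n in range(n_rf):
        block = range(n * per_chain, (n + 1) * per_chain)
        r_n = sum(beam[i] * h_sum[i] for i in block)
        r_n += complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        energy += abs(r_n) ** 2
    return energy


def bisection_beam_training(codebook, h_sum, n_rf, noise_std=0.0, seed=0):
    """Shrink the beam set stage by stage until a single beam remains."""
    rng = random.Random(seed)
    beams = list(codebook)
    stages = 0
    while len(beams) > 1:
        # Train with the pair of remaining beams that is least correlated.
        first, second = min(itertools.combinations(beams, 2),
                            key=lambda pair: correlation(*pair))
        e_first = received_energy(first, h_sum, n_rf, noise_std, rng)
        e_second = received_energy(second, h_sum, n_rf, noise_std, rng)
        best, worst = (first, second) if e_first >= e_second else (second, first)
        # Keep only beams that are more correlated with the stronger beam.
        beams = [c for c in beams
                 if c is best or (c is not worst
                                  and correlation(best, c) > correlation(worst, c))]
        stages += 1
    return beams[0], stages


rng = random.Random(1)
n_rf, per_chain, m_ph = 2, 2, 2
h_sum = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_rf * per_chain)]
codebook = make_codebook(n_rf * per_chain, m_ph)
beam, stages = bisection_beam_training(codebook, h_sum, n_rf)
print(stages, "training stages for", len(codebook), "candidate beams")
```

Each stage costs two training transmissions, so the overhead is governed by the number of stages rather than by the size of the beam set; how close the stage count gets to $\log_2$ of the beam set size depends on how evenly each comparison splits the remaining beams.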

B. Equivalent Channel Estimation
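Denote $\bar{\mathbf{h}}_k \triangleq \mathbf{C}\mathbf{h}_k \in \mathbb{C}^{N_{RF} \times 1}$ and define the equivalent channel between the BS and the $k$-th user as $\bar{\mathbf{g}}_k = \sqrt{\alpha_k}\,\bar{\mathbf{h}}_k \in \mathbb{C}^{N_{RF} \times 1}$. Note that $\mathbf{C}$ is the phase shift matrix corresponding to the optimal phase shift beam obtained in the beam training phase.
Then we estimate the equivalent channel through uplink training, where all $K$ users simultaneously transmit orthogonal pilot sequences to the BS. Let $\tau_c$ be the length of the coherence interval (in symbols), and $\tau_p$ be the uplink training duration (in symbols) per coherence interval such that $\tau_p < \tau_c$. Denote the pilot sequence used by the $k$-th user, $k = 1, 2, \ldots, K$, by $\boldsymbol{\varphi}_k$, and the $N_{RF} \times \tau_p$ received pilot matrix at the BS by $\mathbf{Y}_p$. In $\mathbf{Y}_p$, $\rho_p$ is the normalized signal-to-noise ratio (SNR) of each pilot symbol, and $\mathbf{W}_p \in \mathbb{C}^{N_{RF} \times \tau_p}$ is the additive white Gaussian noise (AWGN) matrix, whose elements are i.i.d. $\mathcal{CN}(0, 1)$ RVs.
To estimate $\bar{\mathbf{g}}_k$, we first multiply $\mathbf{Y}_p$ by $\boldsymbol{\varphi}_k$, which, owing to the orthogonality of the pilot sequences, yields a noisy observation of $\bar{\mathbf{g}}_k$ with noise term $\mathbf{n}_{p,k} = \mathbf{W}_p\boldsymbol{\varphi}_k$. The BS then adopts the minimum mean-square error (MMSE) method to estimate the equivalent channel. As such, the equivalent channel $\bar{\mathbf{g}}_k$ can be decomposed as $\bar{\mathbf{g}}_k = \hat{\mathbf{g}}_k + \mathbf{e}_k$, where $\hat{\mathbf{g}}_k$ is the estimate of $\bar{\mathbf{g}}_k$ and $\mathbf{e}_k$ is the estimation error.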
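The despreading step that precedes MMSE estimation can be sketched in a few lines of Python. The pilot model $\mathbf{Y}_p = \sqrt{\tau_p\rho_p}\sum_k \bar{\mathbf{g}}_k\boldsymbol{\varphi}_k^H + \mathbf{W}_p$, the DFT-based orthonormal pilots, and the function names are assumptions for illustration only; the sketch merely shows how multiplying by $\boldsymbol{\varphi}_k$ removes the other users' pilots.

```python
import cmath
import math
import random


def dft_pilots(num_users, tau_p):
    """Orthonormal pilot sequences taken from DFT columns (an assumed choice)."""
    return [[cmath.exp(-2j * math.pi * k * t / tau_p) / math.sqrt(tau_p)
             for t in range(tau_p)] for k in range(num_users)]


def received_pilots(channels, pilots, tau_p, rho_p, noise_std, rng):
    """Assumed model: Y_p = sqrt(tau_p * rho_p) * sum_k g_k phi_k^H + W_p."""
    n_rf = len(channels[0])
    scale = math.sqrt(tau_p * rho_p)
    y = [[0j] * tau_p for _ in range(n_rf)]
    for g_k, phi_k in zip(channels, pilots):
        for n in range(n_rf):
            for t in range(tau_p):
                y[n][t] += scale * g_k[n] * phi_k[t].conjugate()
    for n in range(n_rf):
        for t in range(tau_p):
            y[n][t] += complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
    return y


def despread(y, phi_k):
    """Multiply Y_p by phi_k to isolate the contribution of user k."""
    return [sum(row[t] * phi_k[t] for t in range(len(phi_k))) for row in y]


rng = random.Random(0)
K, tau_p, n_rf, rho_p = 3, 4, 2, 10.0
channels = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_rf)]
            for _ in range(K)]
pilots = dft_pilots(K, tau_p)
y_p = received_pilots(channels, pilots, tau_p, rho_p, noise_std=0.0, rng=rng)
observation = despread(y_p, pilots[1])  # equals sqrt(tau_p * rho_p) * g_1 when noiseless
```

With noise present, the despread vector is the scaled channel plus $\mathbf{W}_p\boldsymbol{\varphi}_k$, which is the observation an MMSE estimator would then process.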
To obtain the distribution of the estimated equivalent channel, we first give an important proposition corresponding to the distribution of the equivalent channel.
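Proposition 2. With a finite number of phase shifts, the elements of $\bar{\mathbf{h}}_k = \mathbf{C}\mathbf{h}_k$ can be modeled as i.i.d. random variables $\mathcal{CN}(u, \delta^2)$ with

$$u = \frac{L\beta M_{ph}}{2\sqrt{\pi K}} \sin\frac{\pi}{M_{ph}}, \qquad \delta^2 = L\beta^2\left(1 - \frac{M_{ph}^2}{4\pi K}\sin^2\frac{\pi}{M_{ph}}\right).$$

Proof. See Appendix A.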
Remark 2. From Proposition 2, we can see that the deployment of the PMS can enhance the equivalent channel compared to the case without the PMS. Specifically, the strength of the channel without the PMS is $\alpha_k$, while the strength with the PMS is given by $\alpha_k(u^2 + \delta^2)$, indicating that an asymptotic gain in the order of $O(L^2)$ can be achieved. This is because the PMS not only achieves the phase shift beamforming gain of order $L$ but also captures an inherent aperture gain of order $L$ by collecting more signal power.
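The $O(L^2)$ scaling in Remark 2 can be checked numerically. The sketch below assumes the closed forms $u = Lu_0$ and $\delta^2 = L(\beta^2 - u_0^2)$ with $u_0 = \frac{\beta M_{ph}}{2\sqrt{\pi K}}\sin\frac{\pi}{M_{ph}}$, which follow from the steps outlined in Appendix A; these expressions and the function names are assumptions of this illustration.

```python
import math


def equivalent_channel_moments(L, beta, m_ph, k_users):
    """Return (u, delta^2) of the equivalent channel under the assumed closed forms."""
    u0 = beta * m_ph * math.sin(math.pi / m_ph) / (2.0 * math.sqrt(math.pi * k_users))
    delta0_sq = beta ** 2 - u0 ** 2
    return L * u0, L * delta0_sq


def channel_strength(alpha_k, L, beta, m_ph, k_users):
    """Equivalent channel strength alpha_k * (u^2 + delta^2)."""
    u, delta_sq = equivalent_channel_moments(L, beta, m_ph, k_users)
    return alpha_k * (u ** 2 + delta_sq)


# Doubling L should roughly quadruple the strength once L is large.
ratio = (channel_strength(1.0, 20000, 1.0, 4, 4)
         / channel_strength(1.0, 10000, 1.0, 4, 4))
print(ratio)
```

For small $L$ the linear term $\delta^2$ still matters, so the ratio approaches 4 only as $L$ grows.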
Based on Proposition 2 and the MMSE estimation property, $\mathbf{e}_k$ and $\hat{\mathbf{g}}_k$ are complex Gaussian distributed, and they are independent of each other. Their distributions are characterized by Proposition 3.
Proof. See Appendix B.

IV. ACHIEVABLE RATE ANALYSIS
During the multicasting phase, the BS utilizes the estimated equivalent CSI to precode the signals. To keep the processing simple, the BS adopts the transmit matched filter (MF) $\mathbf{W} = \hat{\mathbf{G}}^*$, then the received signal at all users is given by

where $\bar{\mathbf{G}} = [\bar{\mathbf{g}}_1, \ldots, \bar{\mathbf{g}}_k, \ldots, \bar{\mathbf{g}}_K]$, $\rho$ is the total average transmit power (normalized by the noise power), $\mathbf{s} = [s, \ldots, s, \ldots, s]^T$ is the data symbol vector satisfying $\mathbb{E}\{|s|^2\} = 1$, and $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_K)$ denotes the noise.
Noticing that $\bar{\mathbf{G}} = \hat{\mathbf{G}} + \mathbf{E}$, the above equation can be rewritten as

Then, the received signal at the $k$-th user is given by

where (a) follows the fact that $\mathbb{E}\{\|\mathbf{w}_i\|^2\} = u_{p,i}^2 + \delta_{p,i}^2$.
Next, without loss of generality, let us focus on the achievable rate of the $k$-th user. We consider the realistic case where the $k$-th user does not have access to the instantaneous CSI of the effective channel gain. Instead, the detection of the desired signal $s$ is based on the statistical CSI. As such, we can rewrite $y_k$ as

Capitalizing on the results in [21], the achievable rate of the $k$-th user can be expressed as$^5$

with $A_k$ and $B_k$ being the desired signal power and leakage power, respectively. Then, we have the following important result:
Theorem 1. The achievable rate of the $k$-th user is given by (22).
Proof. Refer to Appendix C.
Theorem 1 presents a closed-form expression for the achievable rate, which reveals the impact of key system parameters, such as the number of phase shifts, reflecting elements, RF chains, and users, as well as the impact of imperfect channel estimation on the achievable rate. For instance, $R_k$ is an increasing function with respect to $N_{RF}$. Besides, it can be seen that the desired signal power decreases with the equivalent channel estimation error, indicating that the achievable rate can be improved by enhancing the channel estimation accuracy, for example by increasing the pilot power.

Footnote 5: It is worth noting that this expression is derived under the assumption of the transmit MF and the realistic case where the users have no access to the instantaneous CSI of the effective channel gain.
After deriving the individual rate for any user $k \in \{1, 2, \ldots, K\}$, the multicast rate $R$ can be obtained as $R = \min_{k} R_k$.

V. POWER CONTROL
To maximize the multicast rate, we formulate the following power control problem:

In the general setting, the above optimization problem is non-convex and hence difficult to solve. To address this, we consider several asymptotic regimes, where closed-form solutions can be derived.
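The objective of this power control problem is the multicast rate, i.e., the smallest per-user rate, since the common message must be decodable by every user. As a minimal sketch, assuming each per-user rate takes the form $\log_2(1 + \mathrm{SINR}_k)$ with hypothetical SINR values supplied by the caller, the objective can be evaluated as follows:

```python
import math


def user_rate(sinr):
    """Per-user rate in bits/s/Hz, assuming the form log2(1 + SINR)."""
    return math.log2(1.0 + sinr)


def multicast_rate(sinrs):
    """Multicast rate: the minimum rate over all users in the group."""
    if not sinrs:
        raise ValueError("at least one user is required")
    return min(user_rate(s) for s in sinrs)


# Hypothetical SINRs of three users; the weakest user limits the rate.
print(multicast_rate([3.0, 1.0, 7.0]))  # 1.0
```

Because only the minimum matters, any power control policy helps the multicast rate only to the extent that it raises the rate of the currently weakest user.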

A. Large pilot power
We first consider the scenario where the pilot power is sufficiently large, and we have the following important result:
Theorem 2. As $\rho_p \to \infty$, the optimal power control coefficients are $\eta_k = \frac{1}{K}$, $k = 1, \ldots, K$, and the corresponding multicast rate is given by an expression in which the term $\frac{M_{ph}}{\pi}\sin\frac{\pi}{M_{ph}}$ appears.
Proof. Refer to Appendix D.
Theorem 2 shows that, with large pilot power, the multicast rate is an increasing function with respect to $L$. This is because the equivalent channel can be enhanced by increasing the number of reflecting elements. Also, as the amplitude reflection coefficient increases, the achievable rate becomes larger, because a larger amplitude reflection coefficient implies less power loss when the transmit signal is reflected by the PMS. In addition, the multicast rate is a decreasing function with respect to $K$. This is reasonable because, with fewer users, highly directional beams can be obtained. Furthermore, the multicast rate is constrained by the large-scale fading coefficient of the weakest user, but this negative effect of the weakest user can be compensated by increasing the number of reflecting elements or RF chains.

B. A large number of RF chains
Theorem 3. When $L$ is fixed while $N_{RF} \to \infty$, the optimal power control coefficients are

and the corresponding multicast rate is given by

Proof. Refer to Appendix E.
Theorem 3 implies that with a large number of RF chains, the effect of noise vanishes, and the multicast rate is determined by the channel conditions of all users. Moreover, the maximum signal to interference plus noise ratio (SINR) is proportional to N RF , indicating that increasing the number of RF chains can significantly improve the multicast rate. Besides, increasing the pilot power can improve the multicast rate, due to more accurate channel estimation.
Proposition 4. The power control coefficient $\eta_k$ is a decreasing function with respect to $\alpha_k$, indicating that more power should be allocated to users with poor channel conditions.
Proof. Utilizing the results given by Proposition 3, the optimal power control coefficient can be rewritten as η k =

C. A large number of reflecting elements
Theorem 4. When $N_{RF}$ is fixed while $L \to \infty$, the optimal power control coefficients are

and the multicast rate is given by

Proof. Refer to Appendix F.
Theorem 4 shows that with a large L, the effect of noise as well as the equivalent channel estimation error vanishes. The reason is that a large number of reflecting elements can significantly enhance the equivalent channel. Also, as the number of reflecting elements increases, the amplitude reflection coefficient becomes irrelevant, indicating that increasing the number of reflecting elements can compensate for the power loss caused by PMS reflection. In addition, the SINR is proportional to N = N RF L, which implies that the multicast rate can be greatly improved by increasing the number of reflecting elements.
1) The impact of phase shift number:
Proposition 5. With large $L$, the multicast rate is an increasing function with respect to the phase shift number $M_{ph}$. Furthermore, when the phase shift number is sufficiently large, the multicast rate is given by

Proof. Starting from $R$ given in Theorem 4, we can see that

Proposition 5 is rather intuitive, since a highly accurate beam can be obtained with high-resolution phase shifts. Moreover, as the phase shift number becomes sufficiently large, the multicast rate becomes independent of $M_{ph}$ and gradually converges to a limit, indicating that the gain of using high-resolution phase shifts diminishes gradually.
2) The impact of user number:
Proposition 6. With a large number of reflecting elements, the multicast rate is a decreasing function with respect to the user number. Furthermore, with a large number of users, the multicast rate is given by

Proof. Starting from Theorem 4, we can easily obtain the desired result.
From Proposition 6, it can be seen that with massive users, the SINR is inversely proportional to the user number, which implies that increasing the number of users can severely degrade the multicast rate. To compensate for this rate loss, it is desirable to employ a large number of reflecting elements.
3) Increasing $N_{RF}$ vs. increasing $L$: Although a higher rate can be achieved by increasing either $N_{RF}$ or $L$, it is better to increase the number of reflecting elements rather than the number of RF chains, because the power consumption and hardware cost of the PMS are much lower than those of RF chains. In addition, with massive reflecting elements, the negative effects of noise, estimation error, and the amplitude reflection coefficient can all be effectively compensated, while with a large number of RF chains, only the effect of noise can be mitigated.

VI. NUMERICAL RESULTS
In this section, we provide numerical results to illustrate the performance of the PMS-based multicast system, as well as to verify the performance of the proposed channel estimation scheme. The considered system is assumed to operate at the frequency of $f_c = 4.25$ GHz with a bandwidth of 180 kHz,$^6$ and the coherence time is $\sqrt{\frac{9}{16\pi f_m^2}}$, with the maximum Doppler shift given by $f_m = 1$ Hz. The noise power spectral density is −169 dBm/Hz. The channel from the transmitter to the user is modeled as Rayleigh fading. The large-scale fading coefficient is given by $\alpha = L_d^{-\lambda}$, where $\lambda = 3$ is the path loss exponent and $L_d$ is the transmission distance. The gain of each horn antenna is 20 dBi. The PMS, deployed 1 m away from the BS, consists of $N_{RF}$ sub-metasurfaces, each of which consists of $L$ reflecting elements with a size of 12 × 12 mm$^2$. The impact of the PMS is reflected in the phase shift beam $\mathbf{c}$. Unless otherwise specified, the optimal phase shift beam given by Algorithm 1 is adopted. In addition, we assume that the $K$ users are uniformly distributed in a disk with radius $R = 200$ m. For each analytical result, 1000 random realizations of the large-scale fading profile are generated. The numerical results are obtained by averaging over 1000 independent small-scale fading realizations for each realization of the large-scale channels.
Fig. 2 illustrates the performance of the proposed beam training scheme, where the normalized equivalent channel strength (NECS), i.e., the equivalent channel strength normalized by the ideal equivalent channel strength, is defined as $\frac{\mathbb{E}\{\|\mathbf{C}\mathbf{H}\|^2\}}{\mathbb{E}\{\|\mathbf{C}_{opt}\mathbf{H}\|^2\}}$, with $\mathbf{C}_{opt}$ being the ideal phase shift matrix given by Proposition 1. For comparison, the performance of the exhaustive scheme and the random selection scheme is also presented. As expected, the proposed beam training scheme significantly outperforms the random selection scheme over the entire SNR regime.
Moreover, the performance of the proposed beam training scheme is close to that of the exhaustive scheme, regardless of the available number of phase shifts.
Fig. 3 shows the multicast rate with different numbers of RF chains, where the analytical results are generated according to Theorem 1. As can be readily observed, the numerical results match the analytical results exactly, thereby validating the correctness of the analytical expressions. Moreover, the multicast rate saturates in the high SNR regime due to imperfect channel estimation. In addition, we can see that the multicast rate improves as the number of RF chains increases. The reason is that a large number of RF chains leads to higher diversity gains.

Footnote 6: In practice, the metasurface can only handle a limited bandwidth, because the same phase shifts must be applied in the entire band. How to design a metasurface operating in a wider frequency band remains to be studied.

Fig. 4 presents the multicast rate with different numbers of reflecting elements (per sub-PMS) and reflection coefficients, where the "Approximate Results" curve is generated according to Theorem 2. As expected, the approximations match the numerical results well, especially with a larger $L$. Moreover, we can see that increasing $L$ can significantly improve the multicast rate, because of the enhanced equivalent channel. Also, the multicast rate improves as the reflection coefficient $\beta$ becomes larger, due to the lower power loss caused by PMS reflection.
Fig. 5 shows the impact of the number of RF chains on the multicast rate with different pilot powers, where the curve associated with "Approximate Results" is plotted according to Theorem 3. As the number of RF chains becomes larger, the gap between the "Approximate Results" curve and the "Numerical Results" curve becomes smaller, which verifies our analytical results in Theorem 3.
Moreover, we can see that as the number of RF chains becomes larger, the multicast rate keeps increasing without a ceiling, indicating that a large number of RF chains would significantly improve the multicast rate. Also, the multicast rate increases with the pilot power, due to more accurate channel estimation.
Fig. 6 illustrates the impact of the number of reflecting elements (per sub-PMS) on the multicast rate, where we generate the "Approximate Results" curve according to Theorem 4. As can be readily observed, the "Approximate Results" curve matches the "Numerical Results" curve well, thereby validating the correctness of Theorem 4. Moreover, we can see that as $L$ becomes larger, the multicast rate keeps growing without a ceiling, which implies that increasing the number of reflecting elements (per sub-PMS) can always improve the multicast rate. Also, it can be observed that the multicast rate becomes larger as the phase shift number increases, due to more accurate beam training.
Fig. 7 illustrates the impact of the phase shift number on the multicast rate, where the "Limit" curve is plotted according to Proposition 5. As the phase shift number becomes larger, the multicast rate gradually approaches the limit given by Proposition 5, which verifies our analytical results. Moreover, a higher multicast rate limit can be achieved by increasing the number of reflecting elements. This is because, with massive phase shifts, the multicast rate is mainly dominated by the number of reflecting elements. Besides, it can be seen that the multicast rate achieved with only a few phase shifts is comparable to that with massive phase shifts. For instance, when $L = 100$, the multicast rate with 4 phase shifts is about 94% of that with 20 phase shifts.
Fig. 8 depicts the impact of the user number on the multicast rate, where the curve associated with "Approximate Results" is generated according to Proposition 6.
As can be readily observed, the approximation is very tight, thereby verifying our analytical expressions. Moreover, the multicast rate is a decreasing function with respect to the number of users, which indicates that increasing the number of users always degrades the multicast rate. The reason is that a large number of users leads to poorly directional beams. In addition, we can see that increasing the number of reflecting elements can compensate for the rate loss caused by the increase in the user number. For example, when the user number grows from 20 to 40, the multicast rate with $L = 100$ drops from 3 bits/s/Hz to 2 bits/s/Hz. However, by increasing $L$ to 200, the multicast rate can remain unchanged at 3 bits/s/Hz.
Fig. 9 compares the proposed PMS transmitter with a traditional multi-antenna transmitter. We can see that in the low-SNR regime, the proposed PMS transmitter performs worse than the traditional multi-antenna transmitter, due to the power loss caused by PMS reflection as well as by signal propagation from the BS to the PMS. As the SNR increases, the proposed PMS transmitter becomes superior to the traditional multi-antenna transmitter. Moreover, the rate gap becomes larger as the number of reflecting elements increases, due to both the increased beam gain and the aperture gain of the PMS.
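The simulation parameters stated at the beginning of this section can be turned into derived quantities with a few lines of Python. This is only a sketch: the coherence time uses the common rule of thumb $T_c = \sqrt{9/(16\pi f_m^2)}$, the minimum user distance `min_distance_m` is a hypothetical guard against a singular path loss at the disk center, and the function names are illustrative.

```python
import math
import random

BANDWIDTH_HZ = 180e3
NOISE_PSD_DBM_PER_HZ = -169.0
MAX_DOPPLER_HZ = 1.0
PATH_LOSS_EXPONENT = 3.0
CELL_RADIUS_M = 200.0


def noise_power_dbm(bandwidth_hz=BANDWIDTH_HZ, psd_dbm_per_hz=NOISE_PSD_DBM_PER_HZ):
    """Total noise power over the band, in dBm."""
    return psd_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)


def coherence_time_s(max_doppler_hz=MAX_DOPPLER_HZ):
    """Coherence time from the assumed rule of thumb sqrt(9 / (16*pi*f_m^2))."""
    return math.sqrt(9.0 / (16.0 * math.pi * max_doppler_hz ** 2))


def draw_large_scale_fading(num_users, rng, radius_m=CELL_RADIUS_M,
                            exponent=PATH_LOSS_EXPONENT, min_distance_m=1.0):
    """Drop users uniformly in a disk and return alpha = distance ** (-exponent)."""
    alphas = []
    for _ in range(num_users):
        # sqrt of a uniform variate gives a uniform density over the disk area.
        distance = max(radius_m * math.sqrt(rng.random()), min_distance_m)
        alphas.append(distance ** (-exponent))
    return alphas


print(round(noise_power_dbm(), 2), round(coherence_time_s(), 3))
alphas = draw_large_scale_fading(5, random.Random(0))
```

One such draw of the large-scale fading coefficients corresponds to a single realization of the large-scale fading profile described in the setup.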

VII. CONCLUSION
This paper has investigated the performance of the PMS-based multicast system, taking into account the limited resolution of phase shifts. A novel beam training algorithm has been proposed, which achieves performance comparable to that of the exhaustive search scheme with much lower time overhead. Then, an exact closed-form expression for the individual user rate has been derived. Moreover, several concise asymptotic approximations for the multicast rate have been presented. The analytical findings suggest that deploying a large number of RF chains or reflecting elements can greatly improve the multicast rate. Besides, as the phase shift number increases, the multicast rate gradually saturates, and the multicast rate is a decreasing function with respect to the number of users. Furthermore, with a large number of RF chains, it is better to allocate more power to users with poor channel conditions, whereas with large pilot power or massive reflecting elements, equal power allocation is desirable.

APPENDIX A PROOF OF PROPOSITION 2
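Without loss of generality, we focus on the $n$-th element of $\bar{\mathbf{h}}_k$: $\bar{h}_{k,n} = \mathbf{c}_n^T \mathbf{h}_{k,n} = \sum_{l=1}^{L} c_n^l h_{k,n}^l$. Since we assume a large number of reflecting elements and a limited number of RF chains, $L = N/N_{\mathrm{RF}}$ is large. According to the central limit theorem, $\bar{h}_{k,n}$ approximately follows the complex normal distribution $\mathcal{CN}(L u_0, L \delta_0^2)$, where $u_0$ and $\delta_0^2$ are the mean and variance of $c_n^l h_{k,n}^l$, respectively. In the following, we derive $u_0$ and $\delta_0^2$.

(1) Compute $u_0$

Denote the phase error resulting from the finite phase shift number $M_{\mathrm{ph}}$ by $\Delta\theta$, and we have
$$u_0 = E\{c_n^l h_{k,n}^l\} \overset{(a)}{=} E\{e^{j\Delta\theta}\}\, E\{c_{\mathrm{opt},n}^l h_{k,n}^l\}, \tag{32}$$
where (a) is obtained according to $c_n^l = e^{j\Delta\theta} c_{\mathrm{opt},n}^l$. We start with the computation of $E\{e^{j\Delta\theta}\}$:
$$E\{e^{j\Delta\theta}\} \overset{(a)}{=} \frac{M_{\mathrm{ph}}}{2\pi} \int_{-\pi/M_{\mathrm{ph}}}^{\pi/M_{\mathrm{ph}}} e^{j\theta}\, d\theta = \frac{M_{\mathrm{ph}}}{\pi} \sin\frac{\pi}{M_{\mathrm{ph}}}, \tag{33}$$
where (a) follows from the fact that $\Delta\theta \in \left(-\frac{\pi}{M_{\mathrm{ph}}}, \frac{\pi}{M_{\mathrm{ph}}}\right)$ is uniformly distributed.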
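The expectation of the phase-error factor can be checked numerically. The following is a minimal sketch, assuming only that $\Delta\theta$ is uniform on $(-\pi/M_{\mathrm{ph}}, \pi/M_{\mathrm{ph}})$, in which case $E\{e^{j\Delta\theta}\} = \frac{M_{\mathrm{ph}}}{\pi}\sin\frac{\pi}{M_{\mathrm{ph}}}$; the function names are hypothetical and chosen here for illustration.

```python
import cmath
import math


def phase_error_gain(m_ph: int) -> float:
    """Closed form of E{exp(j*dtheta)} for dtheta ~ U(-pi/m_ph, pi/m_ph)."""
    return (m_ph / math.pi) * math.sin(math.pi / m_ph)


def phase_error_gain_numeric(m_ph: int, n_grid: int = 20000) -> complex:
    """Midpoint-rule average of exp(j*dtheta) over the same interval."""
    half_width = math.pi / m_ph
    step = 2.0 * half_width / n_grid
    total = 0j
    for i in range(n_grid):
        theta = -half_width + (i + 0.5) * step  # midpoint of the i-th cell
        total += cmath.exp(1j * theta)
    return total / n_grid


if __name__ == "__main__":
    # The gain approaches 1 as the number of phase shifts grows.
    for m_ph in (2, 4, 8, 16):
        print(m_ph, phase_error_gain(m_ph), abs(phase_error_gain_numeric(m_ph)))
```

The gain increases monotonically toward 1 with $M_{\mathrm{ph}}$, which is in line with the observation that the multicast rate saturates as the phase shift number grows.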
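Then, using the result given by Proposition 1, we can express $E\{c_{\mathrm{opt},n}^l h_{k,n}^l\}$ as
$$E\{c_{\mathrm{opt},n}^l h_{k,n}^l\} = \beta E\left\{ \frac{h_{\mathrm{sum},n}^{l*}}{|h_{\mathrm{sum},n}^l|}\, h_{k,n}^l \right\}.$$
Recall that $h_{\mathrm{sum},n}^l = \sum_{k=1}^{K} h_{k,n}^l$, so the following equation holds:
$$\sum_{k=1}^{K} E\{c_{\mathrm{opt},n}^l h_{k,n}^l\} = \beta E\{|h_{\mathrm{sum},n}^l|\}.$$
By noticing that $|h_{\mathrm{sum},n}^l|$ follows a Rayleigh distribution and the variance of $h_{\mathrm{sum},n}^l$ is $K$, we have
$$E\{|h_{\mathrm{sum},n}^l|\} = \frac{\sqrt{\pi K}}{2}.$$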

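Since $E\{c_{\mathrm{opt},n}^l h_{k_1,n}^l\} = E\{c_{\mathrm{opt},n}^l h_{k_2,n}^l\}$ holds for any $(k_1, k_2)$, we have
$$E\{c_{\mathrm{opt},n}^l h_{k,n}^l\} = \frac{\beta}{K} E\{|h_{\mathrm{sum},n}^l|\} = \frac{\beta\sqrt{\pi}}{2\sqrt{K}}. \tag{34}$$
Substituting (33) and (34) into (32), we obtain
$$u_0 = \frac{\beta M_{\mathrm{ph}}}{2\sqrt{\pi K}} \sin\frac{\pi}{M_{\mathrm{ph}}}.$$

(2) Compute $\delta_0^2$

Recall that $|c_{\mathrm{opt},n}^l| = \beta$, and we have
$$E\{|c_n^l h_{k,n}^l|^2\} = \beta^2 E\{|h_{k,n}^l|^2\} = \beta^2,$$
based on which, we obtain
$$\delta_0^2 = E\{|c_n^l h_{k,n}^l|^2\} - |u_0|^2 = \beta^2 - u_0^2.$$
Finally, by noticing that $u = L u_0$ and $\delta^2 = L \delta_0^2$, we complete the proof.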
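The per-element statistics used in this proof can be checked with a small Monte Carlo experiment. The sketch below is illustrative only and rests on assumptions stated here rather than taken from the proof text: the per-user coefficients $h_{k,n}^l$ are i.i.d. $\mathcal{CN}(0,1)$, the ideal coefficient is $\beta\, h_{\mathrm{sum}}^{*}/|h_{\mathrm{sum}}|$, and the phase is rounded to the nearest of $M_{\mathrm{ph}}$ uniformly spaced levels. Under these assumptions the mean should be close to $\frac{\beta M_{\mathrm{ph}}}{2\sqrt{\pi K}}\sin\frac{\pi}{M_{\mathrm{ph}}}$ and the variance close to $\beta^2 - u_0^2$. All function names are hypothetical.

```python
import cmath
import math
import random


def quantize_phase(phase: float, m_ph: int) -> float:
    """Round a phase to the nearest multiple of 2*pi/m_ph."""
    step = 2.0 * math.pi / m_ph
    return step * round(phase / step)


def simulate_u0_delta0(k_users, m_ph, beta=1.0, trials=100_000, seed=0):
    """Empirical mean and variance of c * h_k for one reflecting element."""
    rng = random.Random(seed)
    sigma = math.sqrt(0.5)  # per-dimension std so that E{|h|^2} = 1 (assumed)
    acc = 0j
    acc_sq = 0.0
    for _ in range(trials):
        h = [complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
             for _ in range(k_users)]
        h_sum = sum(h)
        ideal_phase = -cmath.phase(h_sum)  # phase of conj(h_sum) / |h_sum|
        c = beta * cmath.exp(1j * quantize_phase(ideal_phase, m_ph))
        x = c * h[0]  # users are statistically identical, so pick user 0
        acc += x
        acc_sq += abs(x) ** 2
    mean = acc / trials
    return mean, acc_sq / trials - abs(mean) ** 2


def theory_u0_delta0(k_users, m_ph, beta=1.0):
    """Closed-form mean and variance under the assumptions listed above."""
    gain = (m_ph / math.pi) * math.sin(math.pi / m_ph)
    u0 = beta * gain * math.sqrt(math.pi) / (2.0 * math.sqrt(k_users))
    return u0, beta ** 2 - u0 ** 2


if __name__ == "__main__":
    print(simulate_u0_delta0(k_users=4, m_ph=4))
    print(theory_u0_delta0(k_users=4, m_ph=4))
```

The mean shrinks as $1/\sqrt{K}$, which matches the observation that a larger number of users leads to less directional beams.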

APPENDIX B PROOF OF PROPOSITION 3
According to the property of MMSE estimation, we have

and $\mathbf{1}_{N_{\mathrm{RF}} \times 1}$ denotes an $N_{\mathrm{RF}} \times 1$ vector whose elements are all 1.

APPENDIX C PROOF OF THEOREM 1
In the following, we calculate $A_k$ and $B_k$, respectively.
where (a) follows from the fact that $E\{\mathbf{e}_k^T\} = \mathbf{0}$.
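We start with the calculation of the first term:

Let us focus on the evaluation of $E\{\hat{\mathbf{g}}_k^T \hat{\mathbf{g}}_m^* \hat{\mathbf{g}}_n^T \hat{\mathbf{g}}_k^*\}$, which is carried out case by case. For $m = k \neq n$, we have
$$E\{\hat{\mathbf{g}}_k^T \hat{\mathbf{g}}_k^* \hat{\mathbf{g}}_n^T \hat{\mathbf{g}}_k^*\} = N_{\mathrm{RF}}^2 u_{p,n} u_{p,k}^3 + N_{\mathrm{RF}}^2 u_{p,n} u_{p,k} \delta_{p,k}^2 + N_{\mathrm{RF}} u_{p,n} u_{p,k} \delta_{p,k}^2.$$
d) for $m = n = k$, we have

Combining a), b), c), d) and e) together, we obtain

Then, we calculate $B_k^{(2)}$:

Noticing that $B_k = \rho B^{(1)}$

Recall that $u_{p,k}^2 = \alpha_k u^2$ and $\delta_{p,k}^2 + \delta_{e,k}^2 = \alpha_k \delta^2$. The above equation can be rewritten as

Combining 1) and 2), we obtain the desired result.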
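The three-term closed form appearing in this proof, $N_{\mathrm{RF}}^2 u_{p,n} u_{p,k}^3 + N_{\mathrm{RF}}^2 u_{p,n} u_{p,k} \delta_{p,k}^2 + N_{\mathrm{RF}} u_{p,n} u_{p,k} \delta_{p,k}^2$, can be sanity-checked by simulation. The sketch below is not the derivation itself. It assumes that the channel estimates are independent across users, that $\hat{\mathbf{g}}_k$ has i.i.d. entries with real mean $u_{p,k}$ and circularly symmetric complex Gaussian fluctuation of variance $\delta_{p,k}^2$, and that the expression corresponds to the moment $E\{\|\hat{\mathbf{g}}_k\|^2\, \hat{\mathbf{g}}_n^T \hat{\mathbf{g}}_k^*\}$ with $n \neq k$. Function names are hypothetical.

```python
import math
import random


def closed_form_moment(n_rf, u_n, u_k, var_k):
    """Three-term expression quoted in the proof of Theorem 1."""
    return (n_rf ** 2 * u_n * u_k ** 3
            + n_rf ** 2 * u_n * u_k * var_k
            + n_rf * u_n * u_k * var_k)


def draw_estimate(rng, n_rf, mean, var):
    """One vector with i.i.d. entries: real mean plus CN(0, var) noise (assumed)."""
    s = math.sqrt(var / 2.0)
    return [complex(mean + rng.gauss(0.0, s), rng.gauss(0.0, s))
            for _ in range(n_rf)]


def monte_carlo_moment(n_rf, u_n, u_k, var_n, var_k, trials=100_000, seed=1):
    """Empirical E{ ||g_k||^2 * (g_n^T g_k^*) } for independent g_k, g_n."""
    rng = random.Random(seed)
    acc = 0j
    for _ in range(trials):
        g_k = draw_estimate(rng, n_rf, u_k, var_k)
        g_n = draw_estimate(rng, n_rf, u_n, var_n)
        norm_sq = sum(abs(x) ** 2 for x in g_k)                    # g_k^T g_k^*
        cross = sum(a * b.conjugate() for a, b in zip(g_n, g_k))   # g_n^T g_k^*
        acc += norm_sq * cross
    return acc / trials


if __name__ == "__main__":
    print(monte_carlo_moment(2, 0.8, 1.0, 0.3, 0.5))
    print(closed_form_moment(2, 0.8, 1.0, 0.5))
```

Note that the variance of the $n$-th user's estimate does not enter the closed form, since only its mean survives the expectation when $n \neq k$.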

APPENDIX D PROOF OF THEOREM 2
As $\rho_p \to \infty$, we have $u_{p,i} = \sqrt{\alpha_i}\, u$ and $\delta_{p,i}^2 = \alpha_i \delta^2$, based on which, the achievable rate of the $k$-th user can be expressed as

greater than $\sqrt{\eta_k}\, \delta^2$, due to the fact that $L$ and $K$ are large in general. Thus, ignoring the term $\sqrt{\eta_k}\, \delta^2$ in (59), we have the following approximation:

Next, we consider the maximum power control problem, which can be formulated as max

According to (60), we can observe

based on which, the above optimization problem can be rewritten as

where we define $\alpha_{\min} \triangleq \min_{k=1,\ldots,K} \alpha_k$.
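Noticing that the objective function is an increasing function with respect to $\sum_{i=1}^{K} \sqrt{\eta_i}$, the optimization problem is equivalent to maximizing $\sum_{i=1}^{K} \sqrt{\eta_i}$ under the same constraints. Denote $x_i \triangleq \sqrt{\eta_i}$. Then we have a convex problem:
$$\max_{x_1,\ldots,x_K} \; \sum_{i=1}^{K} x_i \quad \text{s.t.} \quad \sum_{i=1}^{K} x_i^2 = 1, \quad x_i \geq 0, \; i = 1, \ldots, K.$$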
By applying the KKT conditions, we obtain the optimal power control coefficients $\eta_i = \frac{1}{K}$, $i = 1, \ldots, K$. Finally, substituting the optimal coefficients into the objective function of (63), we complete the proof.
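For completeness, the KKT step can be spelled out as follows. This is a sketch under the assumption that the reformulated problem is to maximize $\sum_{i=1}^{K} x_i$ subject to $\sum_{i=1}^{K} x_i^2 = 1$ and $x_i \geq 0$; the Lagrange multiplier $\lambda$ is notation introduced here for illustration.

```latex
\begin{align*}
\mathcal{L}(\mathbf{x}, \lambda) &= \sum_{i=1}^{K} x_i - \lambda \Big( \sum_{i=1}^{K} x_i^2 - 1 \Big), \\
\frac{\partial \mathcal{L}}{\partial x_i} &= 1 - 2 \lambda x_i = 0
  \;\Rightarrow\; x_i = \frac{1}{2\lambda}, \quad i = 1, \ldots, K, \\
\sum_{i=1}^{K} x_i^2 &= \frac{K}{4 \lambda^2} = 1
  \;\Rightarrow\; \lambda = \frac{\sqrt{K}}{2}, \quad
  x_i = \frac{1}{\sqrt{K}}, \quad \eta_i = x_i^2 = \frac{1}{K}.
\end{align*}
```

The nonnegativity constraints are inactive at this point, and the Cauchy-Schwarz inequality $\sum_{i=1}^{K} x_i \leq \sqrt{K \sum_{i=1}^{K} x_i^2} = \sqrt{K}$ confirms that the stationary point is the global maximizer.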

APPENDIX E PROOF OF THEOREM 3
Starting from Theorem 1, we have

Recall that $u_{p,k} = \sqrt{\alpha_k}\, u$, and the above equation can be written as

With the optimal power control coefficients, the following equation holds: $\mathrm{SINR}_1 = \cdots = \mathrm{SINR}_k = \cdots = \mathrm{SINR}_K$, which can be simplified as

Then, substituting (72) into (67), we have

Finally, substituting the results given by Proposition 3 into (73) yields the desired result.
Neglecting the small terms that do not scale with $L^2$, the above equation can be simplified as

which is an increasing function with respect to $\sum_{i=1}^{K} \sqrt{\eta_i}$. Then, following a process similar to that in the proof of Theorem 2, we obtain the optimal maximum power control coefficients $\eta_k = \frac{1}{K}$, $k = 1, \ldots, K$. Finally, substituting the optimal power control coefficients into (76) yields the desired result.