Optimal Investment in Research and Development Under Uncertainty

This paper explores the optimal expenditure rate that a firm should employ to develop a new technology and pursue the registration of the related patent. We consider an economic environment with industrial competition among firms operating in the same sector and in the presence of uncertainty in knowledge accumulation. We tackle a stochastic optimal control problem with random horizon and solve it theoretically by adopting a dynamic programming approach. An extensive numerical analysis suggests that the optimal expenditure rate is a decreasing function in time, and its sensitivity to uncertainty depends on the stage of the race. The odds for the firm to preempt the rivals nonlinearly depend on the degree of competition in the market.


Introduction
The decision to develop a new technology and, once it has been fully developed, the selection of the appropriate time for taking out a patent are very complex issues. On the one hand, the sooner the firm takes the patent out, the sooner it earns exclusive rights in production and commercialization of the patented good. On the other hand, patenting requires a very costly commitment of resources in the research and development (R&D) phase, and introducing a new product in the market also means facing a highly uncertain demand for it. For these reasons, R&D investments must be carefully phased by R&D-intensive firms. In addition, the R&D investment decision entails a dynamic decision-making process, and it is a very relevant topic, given the huge amount of money at stake. Thus, the issue has attracted much attention in the literature. Related models can be grouped into two streams. The first is mainly concerned with the qualitative characterization of the time pattern of the R&D expenditure rate over the completion time of the project. The decision-making process is analyzed in the context of a single firm without rivalry [1][2][3][4][5][6]. The second stream considers R&D activities in a competitive setting. Some studies deal with a static context [7,8], others adopt a multistage approach [9][10][11], and more recent ones take advantage of optimization theory in a continuous framework [12][13][14][15][16][17].
In this paper, we take the stance of an R&D-intensive firm, which has to decide its optimal investment policy with flexible termination time in an uncertain and competitive environment. Investing resources will produce a possible, but not certain, benefit in the future, against a certain commitment of resources now. This tension is at the basis of investment decisions. The investment policy can be considered optimal whenever it maximizes the expected discounted net return from the project. In particular, we treat the optimal selection of the expenditure rate to be employed by a firm to develop R&D policies as a stochastic optimal control problem. As in [18][19][20] and the ensuing literature, the state variable is modeled as a controlled diffusion process, and the project is assumed to be evaluated at any point in time during its conduct. The state variable, i.e., the project status, is measured in terms of the monetary value of knowledge accumulated by the firm's R&D program, in comparison with other competing products currently in the market. The presence of competition is formalized through the introduction of an exogenous random time representing the date at which a rival wins the race. In doing this, we are consistent with the common practice followed in the literature of modeling competition as a sudden negative random occurrence. Net of the presence of this negative jump, we assume the state variable to follow geometric Brownian motion dynamics, and we motivate the plausibility of this assumption at length from an economic point of view. The control variable is the aforementioned expenditure rate. It is assumed to be a stochastic process, which describes the monetary outflow dedicated to R&D, hence feeding deterministically into the value of the state variable. More specifically, the drift term of the state variable accounts for diminishing returns due to increased R&D efforts.
The horizon of the problem is random in that the natural conclusion of the R&D process can occur for three different reasons: (i) the firm abandons the project; (ii) the firm takes out the patent; and (iii) a rival preempts the firm. The first and the second case occur when the monetary value of knowledge accumulation reaches one of two suitably chosen thresholds (i.e., at a prespecified exit time), while the third case occurs, as already said, at a given stopping time. Unlike many contributions, including those incorporating uncertain knowledge accumulation [13,22], in which optimal expenditure on R&D is either zero or at the maximum permitted rate, in this model the optimal expenditure rate is allowed to vary over time with the level of accumulated knowledge. Hence, this model is more interesting and realistic, even if more difficult to solve.
We adopt a strategy based on the derivation of a maximum principle, namely the dynamic programming principle (DPP). Moving from the DPP, we prove that the value function is a classical solution of a second-order ordinary differential equation, the so-called Hamilton-Jacobi-Bellman (HJB) equation, and then derive the optimal strategies in feedback form through a Verification Theorem. The solution procedure presents several mathematical difficulties. First, the introduction of a stochastic horizon in the problem invalidates a great part of dynamic programming theory as commonly known in the literature. To deal with this point, we rely on [23], which formalizes a DPP for a wide class of stochastic control problems with exit time. Second, the HJB equation is stated in a formal sense, i.e., the value function is assumed to be twice differentiable. Unfortunately, the value function is not generally regular enough. Hence, a preliminary discussion on the existence and uniqueness of the solution of the HJB equation in a weak sense, namely the viscosity sense, is needed. This step is in line with several works dealing with viscosity solutions of HJB equations [24,25]. Existence and uniqueness theorems for viscosity solutions of HJB equations in the presence of a stochastic horizon can be found in [26,27]. Third, the regularity of the value function is an aspect of central relevance. In order to prove that the value function is twice differentiable, we first prove that it belongs to a certain Sobolev space, and then we obtain the result by applying a suitable embedding theorem.
In general, stochastic optimal control problems are complex because of their dynamic and stochastic nature and for the presence of complex constraints. Nevertheless, in our specific context, explicit solutions are available. However, we validate the theoretical framework through an extensive numerical analysis to provide some economic insights. In particular, a numerical algorithm is provided to perform a sensitivity analysis of the optimal expenditure rate. This numerical scheme is based on the finite difference discretization of the HJB equation, coupled with a fixed point scheme to deal with its nonlinearity.
One of the main predictions of the model is a declining expenditure rate over time. This finding is consistent with [12,28], where the authors find that the equilibrium investment rates should decrease in time. By contrast, the result contradicts the one presented by Zuckerman and coauthors in [6,17], where, in a setup close to ours, the authors find the expenditure rate to be monotonically increasing over time. As we will see below, the difference between the contributions of Zuckerman and coauthors and our paper is mainly due to a modeling assumption on the knowledge accumulation. The behavior of the expenditure rate is further characterized by the analysis of its sensitivity with respect to uncertainty in knowledge accumulation. In particular, the firm's reaction to changes in uncertainty is contingent upon the stage of the race. That is, an increase in uncertainty will engender a limited increase in the rate when the race is in an early stage. Conversely, the increase in the rate will be higher if the change in uncertainty occurs in a later stage. Moreover, uncertainty in knowledge accumulation affects the odds for the firm to preempt the rivals in a nonlinear fashion. The higher the uncertainty, the higher the odds when competition is severe, and the lower the odds under mild competition. A prescription in terms of policy interventions follows from these results. If authorities' intervention increases uncertainty about the returns to R&D in the near future, the resulting movement of the aggregate investment rate is unpredictable, since it depends on circumstances such as the stage of the race, the degree of competitiveness in the market, and their combinations.
The paper proceeds as follows. Section 2 sets out the economic issue we want to deal with and presents the model. Section 3 solves the control problem by employing a dynamic programming approach. Section 4 analyzes the sensitivity of the numerical solution with respect to some relevant parameters of the model. Finally, Sects. 5 and 6 discuss the results obtained and draw some concluding remarks.

The Model
We consider a patent protection decision model for a firm operating in a stochastic environment with competition. The decision is modeled as the solution of a stochastic optimal control problem. To this end, we first introduce a filtered probability space (Ω, F, {F_t}_{t≥0}, P) supporting a standard Brownian motion W, with the filtration generated by W and augmented by N, where N is the collection of all the sets of measure zero under P, i.e., N := {A ∈ F : P(A) = 0}. Since the Brownian motion is a continuous process, {F_t}_{t≥0} is right continuous. Hence, the usual conditions apply to the filtration.
Let X = {X(t)}_{t≥0} be a stochastic process representing the time-dependent monetary value of the technological knowledge accumulated by the firm's R&D program, in comparison with other competing products currently in the market. We will refer to X simply as technological knowledge, representing the state variable of the problem. Its evolution is assumed to be driven by a controlled stochastic differential equation with initial data as follows:
dX(t) = α(C(t)) X(t) dt + β X(t) dW(t), X(0) = x, (1)
where (i) β ∈ ]0, +∞[ is the volatility term of the process X. It may capture fluctuations in X due to layoffs and/or to the monetary component, such as changes in taxation, exchange rate depreciations, new criteria for the evaluation of knowledge accumulated, etc. (ii) C = {C(t)}_{t>0} represents the expenditure rate. It is modeled as a stochastic process with support [0, 1] such that C(t) is F_t-progressively measurable, for each value of the technological knowledge between the absorbing barriers 0 and K of the dynamics. In particular, 0 is associated with the bad situation in which a similar technology is introduced and protected by a rival of the firm, while K is the situation in which the firm takes out the patent first. Equation (1) constitutes one of the main building blocks of the model. For this reason, we further motivate the economic plausibility of our modeling assumption. In [20], it is pointed out that a diffusion process can be successfully employed to describe the monetary value of accumulated knowledge. Interestingly, this viewpoint also appears to be natural from more practical considerations, as reported in [29] for the remarkable case of Sony. Furthermore, [30,31] show that firms not only learn, but also forget because of turnover and layoffs. It follows that, if there is organizational forgetting, a firm's stock of experience can decrease over time, and the ups and downs of the process in (1) can capture this realistic feature.
Differently, assuming a one-sided non-decreasing process, as in [6,17], most of the features just mentioned could not be captured. By assuming α(C(t)) increasing and strictly concave, we are consistent with the empirical findings that document decreasing returns to scale in R&D [32]. In particular, it represents the deterministic component of knowledge accumulation, deriving from the expenditure in R&D. Put differently, when a firm invests a given amount of money, it cannot precisely know how much of the expenditure will turn into knowledge, but this piece of information can be known up to a certain level of (un)certainty. The deterministic drift plays this role in the sense that it represents the deterministic yield of the R&D investment, although at a decreasing rate, while β captures the remaining uncertainty in knowledge accumulation, i.e., in R&D returns.
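To make the role of the drift and of β concrete, the following minimal sketch simulates the controlled geometric Brownian motion dX = α(C) X dt + β X dW for a constant expenditure rate c, stopping each path when it first reaches K. The function name `simulate_knowledge`, the constant control, the choice α(c) = √c, the time step, and the parameter values are illustrative assumptions, not part of the model specification.

```python
import numpy as np

def simulate_knowledge(x0, c, beta, K, dt=0.01, t_max=20.0, n_paths=1000, seed=0):
    """Simulate dX = sqrt(c) X dt + beta X dW with absorption at K.

    Assumes a constant expenditure rate c in [0, 1] and alpha(c) = sqrt(c).
    Returns the final states and the first hitting times of K (inf if never hit).
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_max / dt))
    x = np.full(n_paths, float(x0))
    hit_time = np.full(n_paths, np.inf)
    alive = np.ones(n_paths, dtype=bool)
    # Log-drift of a geometric Brownian motion with constant coefficients.
    log_drift = (np.sqrt(c) - 0.5 * beta**2) * dt
    for k in range(1, n_steps + 1):
        z = rng.standard_normal(n_paths)
        # Exact log-normal step, applied only to paths that are still running.
        x[alive] *= np.exp(log_drift + beta * np.sqrt(dt) * z[alive])
        reached = alive & (x >= K)
        hit_time[reached] = k * dt
        x[reached] = K  # absorb at the upper barrier
        alive &= ~reached
        if not alive.any():
            break
    return x, hit_time

if __name__ == "__main__":
    x, hit = simulate_knowledge(x0=2.0, c=0.25, beta=0.4, K=10.0)
    print("share of paths reaching K:", np.isfinite(hit).mean())
```

With β = 0 the path is deterministic and reaches K at the time implied by the drift alone; raising β spreads the hitting times around that value, which is the sense in which β captures the residual uncertainty in R&D returns.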
Given this premise, we analyze the patent race played by the firm up to the registration of the patent. The timing of the end of the race is unknown and depends on the stochastic dynamics of X in (1). Therefore, we need to introduce the random time at which the process X reaches the absorbing barriers 0 or K. We denote by T the set of the stopping times in [0, +∞] and by τ the first time at which X reaches 0 or K, i.e., τ := inf{t ≥ 0 : X(t) ∈ {0, K}}; it follows that τ ∈ T. We also denote by σ ∈ T the random time at which a rival beats the firm, i.e., it introduces and protects a similar (or the same) technology.
To try to win the race, the firm can act optimally by deciding its expenditure rate. That is, it can optimally set a stochastic expenditure at time t as a share of X(t) devoted to R&D. Hence, the admissible region, i.e., the functional space containing the admissible controls, can be defined as follows: A := {C : [0, +∞[ × Ω → [0, 1] such that C is F_t-progressively measurable}. The objective of our analysis is to maximize the firm's expected discounted net returns. To this end, it is necessary to distinguish between some cases.
On the one hand, if the value of technological knowledge reaches the absorbing barrier K before σ, the technology developed is protected and introduced in the market by the firm. In this case, P(τ < σ) = 1 and a monetary return, M_K > 0, accrues to the firm. This amount can be interpreted as the benefits from the innovation produced by the patent. On the other hand, if the value of technological knowledge reaches the absorbing barrier 0, or equivalently if a rival firm introduces and protects the innovation first, we consistently assume that X(σ) = 0, so that P(τ = σ) = 1. In this case, the monetary return to the firm is M_0 ≥ 0. Of course, M_K > M_0. The paradigmatic case of M_0 = 0 corresponds to the winner-takes-all hypothesis and will be discussed below.
To play the race, at each point in time t, the firm incurs the R&D expenses C(t)X(t), until the exit time σ ∧ τ. The optimal expenditure rate C* ∈ A can be found by maximizing the firm's expected discounted net value J, defined as follows: J : where Λ(0) = M_0 and Λ(K) = M_K, i.e., Λ maps the technological attainment at the end of the game into the monetary value of the prize. We also denote by E_x the expected value conditional on X(0) = x, and e^{−δ} is a continuous one-period discount factor, with δ > 0. The value function V :
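As a reading aid, the verbal description above (R&D expenses C(t)X(t) paid until the exit time σ ∧ τ, a terminal prize given by Λ, and discounting at rate δ) is consistent with a functional of the following form. This is a sketch under those assumptions, not necessarily the exact expression intended; the notation J(x, C) follows the one used later in the text.

```latex
\begin{align*}
J(x, C) &:= \mathbb{E}_x\!\left[ -\int_0^{\sigma \wedge \tau} e^{-\delta t}\, C(t)\, X(t)\, dt
          \;+\; e^{-\delta (\sigma \wedge \tau)}\, \Lambda\bigl(X(\sigma \wedge \tau)\bigr) \right], \\
V(x) &:= \sup_{C \in \mathcal{A}} J(x, C), \qquad x \in [0, K].
\end{align*}
```

Under this reading, the first term collects the discounted cost of the R&D effort, while the second is the discounted prize, equal to M_K if the firm patents first and to M_0 otherwise.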

Dynamic Programming and Optimal Strategies
The control problem described in the previous section is solved by adopting a dynamic programming approach. To this end, we refer to [23], in which a DPP for a rather wide class of optimal control problems with random horizon has been proved. Hence, here we state the DPP by adapting [23] to our case (Theorem 3.1). From Theorem 3.1, the HJB equation can be directly derived. Such an equation is stated formally, in the sense that the value function V in (2) is a classical solution of the HJB equation only under the necessary regularity conditions.
with the relaxed boundary conditions for x ∈ {0, K }: The proof is rather standard and is omitted. Existence and uniqueness of the classical solution of the HJB equation (3) with boundary conditions (4)-(5) are needed in order to obtain the optimal strategies of the control problem. The following theorem guarantees such conditions. The proof is quite technical, and for this reason, it is confined to the electronic supplementary material.
Then (C*, X*(t)) is optimal at x if and only if u( See the electronic supplementary material for the proof. We notice that the Verification Theorem is grounded on the regularity of the value function, which is a classical solution of the HJB equation. This fact highlights the usefulness of Theorem 3.3. The next step consists in providing an explicit form of the optimal strategies and trajectories through the so-called closed loop equation: where I is the inverse function of α'. Denote by X̄ the solution of the closed loop equation dX̄(t) = α(C*(X̄(t))) X̄(t) dt + β X̄(t) dW(t), X̄(0) = x. Then, by setting C̄(t) := C*(X̄(t)), we have J(x, C̄) = V(x) and the pair (C̄, X̄) is optimal for the control problem.
The proof stems from the Verification Theorem, starting from the existence and uniqueness of the solution of the state Eq. (1). Theorem 3.5 explicitly determines the optimal strategies for our stochastic control problem.
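To see how a feedback rule of this kind typically arises, consider a simplified version of the problem in which the rival's random time σ is ignored (an assumption made here only for illustration). With state dynamics dX = α(C) X dt + β X dW, running cost C X, and discount rate δ, the standard HJB equation and its first-order condition would read as sketched below. The specialization in the last line uses α(c) = √c, the functional form adopted in the numerical experiments; the exact equations and boundary conditions of the model may differ.

```latex
\begin{align*}
\delta V(x) &= \sup_{c \in [0,1]} \bigl\{ \alpha(c)\, x\, V'(x) - c\, x \bigr\}
              + \tfrac{1}{2}\, \beta^{2} x^{2}\, V''(x), \qquad x \in (0, K), \\
\alpha'(c^{*}) &= \frac{1}{V'(x)}
  \;\Longrightarrow\;
  c^{*}(x) = (\alpha')^{-1}\!\left( \frac{1}{V'(x)} \right)
  \qquad \text{(interior optimum, } V'(x) > 0\text{)}, \\
\alpha(c) = \sqrt{c}:\quad
\alpha'(c) &= \frac{1}{2\sqrt{c}}
  \;\Longrightarrow\;
  c^{*}(x) = \min\Bigl\{ 1,\; \tfrac{1}{4}\bigl(V'(x)\bigr)^{2} \Bigr\}.
\end{align*}
```

The strict concavity of α is what makes the maximizer unique, and the cap at 1 reflects the support [0, 1] of the expenditure rate.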

Sensitivity Analysis
The experiments are carried out considering α(c) = √c. Similar results are also obtained with other growth rate functions, e.g., α(c) = c^(2/3). Theoretical results allow us to derive some interesting static sensitivity analysis. Specifically, the expenditure rate C*(x) in Theorem 3.5 can be studied. First of all, we assume K = 10 and M_K = 10. Figures 1 and 2 show the optimal expenditure rate and the optimal expenditure as functions of the initial data x along with the value function, for different values of δ and β, considering α(c) = √c. We notice that: (i) higher values of δ make the value function shift downward, while both the optimal expenditure curves shift upward (see Fig. 1); (ii) higher values of β engender a slight downward shift in the value function and a shift in the opposite direction for the expenditure curves (see Fig. 2, left panel), and this effect is stronger for low values of δ (see Fig. 2, right panel); (iii) for given δ and β, both the value function and the optimal expenditure are increasing in the initial value of the technological knowledge, contrary to the optimal expenditure rate, which slopes downward (see Figs. 1, 2). In Fig. 3, we analyze the behavior of our solution with respect to x for different values of M_K, setting β = 0.4 and δ = 0.1. We notice that the higher the monetary return to the race, namely M_K, the higher the value function and the expenditure curves, as expected.
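A numerical scheme of the kind described in the Introduction (finite differences for the HJB equation plus a fixed-point iteration for its nonlinearity) can be sketched as follows. This is a minimal illustration under explicit assumptions, not the scheme actually used for the figures: it ignores the rival's random time σ, takes the HJB equation to be δV = max over c in [0, 1] of {√c x V' − c x} + ½β²x²V'' with V(0) = M_0 and V(K) = M_K, uses an upwind difference for V', and implements the fixed point as a policy iteration. The function name `solve_hjb` and the grid size are hypothetical.

```python
import numpy as np

def solve_hjb(K=10.0, M_K=10.0, M_0=0.0, beta=0.4, delta=0.1,
              n=200, tol=1e-8, max_iter=200):
    """Policy iteration on a finite-difference grid for the assumed HJB equation.

    Returns the grid x, the value function V and the feedback rate c on the grid.
    """
    x = np.linspace(0.0, K, n + 1)
    h = x[1] - x[0]
    i = np.arange(1, n)              # interior nodes
    c = np.zeros(n + 1)              # start from the "no investment" policy
    V = np.zeros(n + 1)
    for _ in range(max_iter):
        # Linear system for the value of the current policy c.
        drift = np.sqrt(c[i]) * x[i] / h             # upwind (forward) difference
        diff = 0.5 * beta**2 * x[i]**2 / h**2        # central second difference
        A = np.zeros((n + 1, n + 1))
        b = np.zeros(n + 1)
        A[i, i - 1] = -diff
        A[i, i] = delta + drift + 2.0 * diff
        A[i, i + 1] = -drift - diff
        b[i] = -c[i] * x[i]                          # running cost c * x
        A[0, 0] = A[n, n] = 1.0                      # Dirichlet boundary conditions
        b[0], b[n] = M_0, M_K
        V_new = np.linalg.solve(A, b)
        # Policy update from the first-order condition for alpha(c) = sqrt(c).
        p = np.empty(n + 1)
        p[:-1] = np.diff(V_new) / h
        p[-1] = p[-2]
        c = np.clip(0.25 * np.maximum(p, 0.0)**2, 0.0, 1.0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return x, V, c

if __name__ == "__main__":
    x, V, c = solve_hjb()
    for xi in (1.0, 2.0, 5.0, 9.0):
        j = int(np.argmin(np.abs(x - xi)))
        print(f"x = {x[j]:5.2f}  V = {V[j]:7.4f}  c* = {c[j]:6.4f}")
```

The upwind difference keeps the system matrix diagonally dominant for any policy, so each linear solve is well posed; the policy update is the discrete counterpart of the feedback rule c* = min{1, (V')²/4}.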

A Dynamic Sensitivity Analysis
The time-varying nature of the optimal expenditure rate {C̄(t)}_{t≥0} of Theorem 3.5 is considered here. To this end, we perform a Monte Carlo simulation of the process (1). More precisely, we fix a starting value X_0 = 2 and simulate 1,000,000 possible scenarios for the process {X(t)}_{t≥0}. By means of the proposed numerical scheme, the optimal expenditure rate {C̄(t)}_{t≥0} is obtained numerically, as C̄(t) = C*(X(t)), for each simulated scenario {X(t)}_{t≥0}. Figure 4 shows the mean value of {C̄(t)}_{t≥0} over the scenarios and the optimal total expenditure. The upper part of Fig. 4 shows that higher values of δ correspond to higher expenditure curves, that is, higher discount rates make the firm more eager to invest. As regards the sensitivity with respect to β (see the lower panel of Fig. 4), in an early stage of the race the total expenditure seems scarcely sensitive to β, while as the race proceeds it is higher for lower values of β; the converse holds for the rate. Notice that both sensitivities of the expenditure, the positive one with respect to the discount rate and the negative one with respect to uncertainty, are in line with the real options prediction, but our stylized framework enables us to add more information to those results, as we can trace the optimal investment over time, rather than considering it as a lump sum cost. We will discuss this point in greater detail in Sect. 5.
(Caption note: Parameters as in Fig. 4. Extended results are reported in the electronic supplementary material.)
In Table 1, we also report the odds, i.e., the chance that the firm takes out the patent before σ, and the expected time of success (when success occurs, namely when the firm takes out the patent) for different values of σ. The odds increase as δ increases, because the latter brings about an increase in the total optimal expenditure, whereas they decrease as β increases only for σ ≥ 5. Notice also that, for a combination of low uncertainty and high discount rate (β = 0.05, δ = 0.1), all the simulations obtain a success, i.e., X(t) = K, within 6 years. We will also devote more attention to this point in Sect. 5. Considering only the simulations for which the firm takes out the patent, it is worth noting that the expected time of success is generally decreasing in δ. The behavior of the expected time of success with respect to β is rather mixed. In particular, it increases with β when σ is large (σ ≥ 9), decreases when σ is small (σ ≤ 6), and has an inverted U-shape for intermediate values of σ (7 ≤ σ ≤ 8).
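The odds and the expected time of success can be estimated by Monte Carlo along the following lines. The sketch treats σ as a fixed deadline for each run (as suggested by the tabulation over values of σ) and takes the feedback policy as an input; the function name `patent_odds`, the sample size, and the placeholder policy c(x) = 1/(1 + x) are illustrative assumptions and do not reproduce the optimal rule C* or the figures in Table 1.

```python
import numpy as np

def patent_odds(policy, x0=2.0, K=10.0, beta=0.4, sigma=8.0,
                dt=0.01, n_paths=20000, seed=0):
    """Estimate P(X reaches K before sigma) and the mean time of success.

    policy: vectorized map from knowledge levels x to expenditure rates,
            clipped to [0, 1]; alpha(c) = sqrt(c) is assumed.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    success_time = np.full(n_paths, np.nan)
    alive = np.ones(n_paths, dtype=bool)
    n_steps = int(round(sigma / dt))
    for k in range(1, n_steps + 1):
        idx = np.flatnonzero(alive)
        if idx.size == 0:
            break
        c = np.clip(policy(x[idx]), 0.0, 1.0)
        z = rng.standard_normal(idx.size)
        # Log-Euler step of dX = sqrt(c) X dt + beta X dW with feedback control.
        x[idx] *= np.exp((np.sqrt(c) - 0.5 * beta**2) * dt + beta * np.sqrt(dt) * z)
        won = idx[x[idx] >= K]
        success_time[won] = k * dt
        alive[won] = False
    succeeded = ~np.isnan(success_time)
    odds = succeeded.mean()
    mean_time = success_time[succeeded].mean() if succeeded.any() else np.nan
    return odds, mean_time

if __name__ == "__main__":
    placeholder_policy = lambda x: 1.0 / (1.0 + x)   # hypothetical, decreasing in x
    for beta in (0.05, 0.2, 0.4):
        odds, t_succ = patent_odds(placeholder_policy, beta=beta)
        print(f"beta = {beta:4.2f}  odds = {odds:6.4f}  mean success time = {t_succ:6.3f}")
```

Replacing the placeholder policy with an interpolation of a numerically computed C*(x) would give estimates comparable in spirit to those reported in the table.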
terms of sensitivities to the model parameters with particular emphasis on the effects of competition. As far as the expenditure rate is concerned, its characterizing features can be summarized as follows: (i) for given x, it is an increasing function of the discount rate (Fig. 1); (ii) it is a decreasing function of the initial value of the knowledge, x (Figs. 1, 2). This result is at odds with [6,17], in which the authors find an increasing function of the knowledge accumulated in a non-competitive and in a competitive environment, respectively. This sharp contrast is due to the fact that the articles just cited consider {X(t)}_{t≥0} as a one-sided non-decreasing jump process.
By contrast, in our context, since {X(t)}_{t≥0} is assumed to evolve as a diffusion process and, furthermore, is a relative measure of knowledge accumulated, the path of {X(t)}_{t≥0} over time is not restricted to be increasing. In addition, the result is consistent with [12], but it is important to stress that [12] claims that the result can be obtained by abandoning the assumption of a memoryless process for knowledge accumulation in favor of a long memory process. Here, we prove that the same result can be achieved by still adopting a memoryless process, such as in (1), coupled with decreasing returns to scale in R&D. Moreover, (iii) the sensitivity of C̄(t) with respect to uncertainty in knowledge accumulation, formalized by β, changes over time, depending on the stage of the race: if an increase in uncertainty occurs, then the optimal expenditure rate will increase slightly in an early stage of the process and markedly in a late stage (Fig. 4). Finally, (iv) the optimal expenditure rate decreases over time, more steeply in the initial part of the race, and tends to flatten out subsequently (Fig. 4). Again, this result is at odds with that in [6,17], where the expenditure rate increases monotonically over time, and is consistent with [12] and [28], where it is stated that the equilibrium investment rates should decrease in time.
From an economic standpoint, it is realistic to think of R&D as requiring a remarkable effort in terms of resources invested; however, after a certain period of time, the resources will reach a critical mass, such that investment proceeds, but at a declining or constant rate. This phenomenon is referred to as the pure knowledge effect and is due to the fact that a firm's past R&D efforts contribute to increasing the odds. This effect has important implications in terms of strategic interactions. In particular, a firm that is behind in the race has the possibility to catch up. This pattern of strategic interactions is more consistent with action-reaction than with increasing dominance, the latter being characterized by increasing investments as resources are accumulated. Indeed, empirical studies [33][34][35][36] lend more support to action-reaction than to increasing dominance. In [37], the authors find a negative relationship between R&D intensity and a firm's explorative activities, the latter being considered as a proxy of knowledge accumulation. Moreover, the presence of β in the dynamics of {X(t)}_{t≥0} captures a nonnegligible aspect of the uncertainty surrounding R&D investment decisions, as firms must consider rivalry dynamically and not only in terms of the termination of the race σ. Indeed, β plays this role, capturing the level of the dynamic competition, and contributes to revealing the decreasing path of the expenditure rate over time. For this reason, our results seem to be more realistic than those of [6,17].
As regards the simulated odds, their behavior is consistent with the optimal total expenditure path. For instance, an increase in the discount rate causes the expenditure rate to increase, in turn making the odds increase. A very interesting fact concerning the firm's behavior in a competitive environment arises from the role played by the uncertainty parameter β in the knowledge accumulation dynamics. We have noticed that, as β increases, the odds increase only when competition is more severe, i.e., for small σ. This behavior can be explained by the fact that, when competition is severe, the chance for the firm to preempt the rivals is due mainly to the volatility of knowledge accumulation, i.e., to the presence of high riskiness as well as great opportunities. Put another way, the deterministic component of the knowledge evolution, α(C(t))dt, does not suffice to reach the level of knowledge enabling the firm to patent; a favorable random event, βdW(t), with β sufficiently large and a positive shock, is needed. Hence, in this situation, the smaller the β, the less likely the event is to occur. When competition is less severe, i.e., for larger values of σ, the effect of the deterministic component becomes more relevant, increasing the chances of taking out a patent; hence, uncertainty is regarded as hindering this occurrence, and higher β decreases the odds.
At this point, it is clear that the firm's behavior is crucially affected by uncertainty and that its response depends on different circumstances, such as high/low competition, early/late stage of the race, and their interactions. It follows that the effects of an increase in uncertainty about the future returns to R&D caused by authorities' intervention, such as uncertain and unclear taxation rules, are unpredictable both in terms of aggregate investment, depending on which stage the different competing firms are in, and in terms of firms' achievements, depending on the degree of competition in the market. This finding is consistent with the evidence arising from the European Framework Program (FP), which is a policy program aimed at overcoming a set of failures hindering the innovation process. This program is composed of a large number of instruments [38, Table 1], and each instrument is designed to address a given type of failure. In [38], on the basis of an empirical study, the authors show that firms do not perceive the differences due to this variety of instruments. Thus, the final effect of the program is unpredictable because firms, from the policy maker's point of view, do not appropriately select the instruments, and the authors even claim that developing overly complex instruments is counterproductive. Last, but not least, keeping instruments simple and stable over time should also reduce the costs of public administration.
As far as the value function is concerned, its behavior is not surprising, as it monotonically increases as the discount rate and the uncertainty parameter decrease, or as the initial value and the final prize, M_K, increase. The response of the value function to the possible perturbations can be viewed as a sort of validation of the results obtained for the expenditure rate.

Conclusions
This paper belongs to the stream of dynamic patent race models with risky choices in a continuous framework. The study focuses on the firm's optimal policy, which maximizes the expected discounted net return from the project. The evolution law of the state variable captures competition at any time. The patent race is a very complex phenomenon entangled with multiple sources of uncertainty. In this respect, the model presented can be considered as capturing a realistic situation and takes a small but significant step forward in the understanding of this aspect of industrial economics.