Intriguing yet simple skewness - kurtosis relation in economic and demographic data distributions; pointing to preferential attachment processes

In this paper, we propose that relations between high order moments of data distributions, for example between the skewness (S) and kurtosis (K), allow to point to theoretical models with understandable structural parameters. The illustrative data concerns two cases: (i) the distribution of income taxes and (ii) that of inhabitants, after aggregation over each city in each province of Italy in 2011. Moreover, from the rank-size relationship, for either S or K, in both cases, it is shown that one obtains the parameters of the underlying (hypothetical) modeling distribution: in the present cases, the 2-parameter Beta function, - itself related to the Yule-Simon distribution function, whence suggesting a growth model based on the preferential attachment process.


Introduction
Characteristics of distributions of variables is a never ending subject of investigations in many fields of economic research. There is much work for example on testing conditional convergence in variance and skewness for estimating (multivariate or not) normality [8,19,25,34,37,45], or on improving goodness of fit in regressions [17]. In fact, drawing inference on the parameters of (regression or agent based) models is a basic statistical problem which fortunately may provide interesting discoveries [33,48,49], not only on strict economic problems, but also in related sociological or demographic ones [13][14][15]18,29,55].
There has been already much interesting work on recurrence relations between high moments (μ i , i ≥ 3) of order statistics distributions [3,36]. Knowledge of such moments are of interest for drawing inference about the scaling parameters. Balakrishnan et al. [9] have reviewed many recurrence relations and identities for several continuous distributions. Pertinently, in view of the following, let us point to Thomas and Samuel [53] analysis of recurrence relations for the Beta distribution moments, a distribution of wide application, in its both continuous and discrete form [6,28,38] The skewness and kurtosis are officially the third and fourth moment, μ 3 and μ 4 , respectively, of a distribution, where μ i is usually centered on the mean; μ 2 is of course the variance, sometimes called σ 2 . Mathematical statistics textbooks and software packages usually calculate the Fisher-Pearson coefficient of skewness and the kurtosis respectively, which by extension become the (commonly accepted measure of) skewness and kurtosis of the distribution, respectively; we use such notations here below. It is known since Pearson [47] and Wilkins [56] that the kurtosis has a theoretical lower bound related to the skewness K ≥ aS 2 + b. Based on several types of experimental data, one has indeed observed that the quadratic relationship holds as an envelope of scattered data (see references below). In fact, p and q can be empirically fitted constants, which might have some interesting meaning, as pertinently shown by Guszejnov et al. [23]. It has recently been discussed by Cristelli et al. [16] that a more general form of the K−S relationship could be provided, i.e.
with ν = 4/3. However the findings (and interpretation) have been questioned: such an exponent might be due to the data too limited size [12]. Much of such experimental data, for which Equation (3) is obeyed, pertains to magnetohydrodynamics, meteorological and medical data. Moreover, most of the data where such relations are found pertain to time series.
We provide another set of cases in which one finds ν = 2, in an unusual set of data pertinent to a complex geo-sociological-economic realm. Equation (4), with ν 2, is found to hold for the wealth and population distribution of Italy cities, aggregated at the provincial level. We tie this specific finding to a statistical process inferring a Beta-distribution, or Polya urn dynamics, as its universal dynamics.
In Section 2, we very briefly recall previous experimental data analysis with similar findings, pertinent to the present report, but obviously in quite (in scientific terms) scattered fields of investigations, in order to stress the originality of the present ones.
In Section 3, we explain the system complexity which we investigated. Notice at once, that there is no time dependence, therefore the cloud of points, from which other works infer a parabola-type relation between S and K, is here strictly 'reduced' to a collapsing-like situation, rendering the parameter values being much more precise in view of describing the dynamical process, if we are allowed to discriminate between (so-called) hard and soft science. Such a section contains also the main methodological investigation methods, results and related comments. Section 4 is devoted to the description of a theoretical Polya urn model which is related to the developed complex system arguments. Specifically, a preferential attachment system is introduced whose stochastic law follows a Beta distribution, whence whose parameters could be also calibrated for defining skewness and kurtosis of a set of data.
In Section 5, we offer some conclusive remarks and provide also suggestions for future research directions.
Some tables containing descriptive statistics of the data and the disaggregation of them at a provincial level are reported in the appendix.

Literature review
As mentioned in Section 1, there are several reports pointing to the veracity of Equation (4), with ν = 2, in various research fields: the greatest occurrence is in turbulence, among the most recent see in magneto-hydrodynamics [10,23,32] and in atmospheric physics [52] . In this respect, refer also to Alberghi et al. [2], where the authors discuss the parameters conditions to be satisfied in the context of air vertical velocity in the atmospheric boundary layer for having a certain relation between skewness and kurtosis. In these cases, in order to have different points in the S−K plane, the authors usually repeat the experiment or evaluate the moments on different time windows of the same series. For further pertinent references, see, e.g. [44,50].
Related to the above-mentioned papers, there is a study in the context of geophysics of oil production time series forecasting [20].
Cristelli et al. [16] analysed the relation between skewness and kurtosis for earthquakes and daily price returns (on the S&P500) and identified two power-law regimes of non-Gaussianity, on the kurtosis versus skewness plots, but Celikoglu and Tirnakli [12] demonstrated that the proposed 'universal' relation between skewness and kurtosis, in fact is not universal and originates only due to the small number of data points in the data sets considered.
For completeness, on K−S relation consideration in the financial domain, let us mention related work on S 2 − K bounds for unimodal distributions by Klaassen et al. [31], and more recently by McDonald et al. [39] and Kerman and McDonald [30] with a discussion about modeling some 'popular income distributions' with exponential generalized beta functions.
Other cases where a simple K−S relation is found are in surface roughness analysis [26,51]. It is also worth mentioning the contributions in medical and biological fields; they pertain to fluctuation responses in the visual cortex [40][41][42] or ventricular fibrillation [21]; an apparently comprehensive review about 'biological and psychological aspects of the K-S can be found in [11].
One might put in parallel to the above a paper reporting a 4/3 relation between skewness and kurtosis of aesthetic score distributions in a photo aesthetics dataset, generated from an online voting survey [46].
Thus, it maybe observed that most of the data pertains to time series analysis, many to human reaction time, and a few to various (laboratory or not) produced crystalline samples. However, to the best of our knowledge, not many observations of a peculiar relationship between K and S seem to have been reported on socio-demography aspects.

Italy economic and demographic data
A statistical assessment of regional wealth inequalities over Italy (IT) has been previously provided based on aggregated tax income size data [7,[13][14][15]43].
Let it be known that IT is nowadays (since 2010) made up of 8092 cities distributed over 110 provinces. To provide some better understanding of the paper aims and results, the IT administrative structure can be briefly described as follows. Italy is clustered in 20 nonoverlapping regions, and each region contains one or more provinces, which in turn are composed by cities along with their territories (comuni). Thus, each city belongs to only one province, and each province is contained in only one region.
The gross domestic product (GDP) in Italy was worth about 2276 billion USD in 2011. The population of Italy fell 'slightly' below 60 millions, then.
The economic data were obtained from (and by) the Research Center of the Italian Ministry of Economics and Finance (MEF). The population data source is the Italian Institute of Statistics (ISTAT). In particular, data on the population are extracted from the elaborations of the 15th Italian Census, performed by ISTAT in 2011. We have disaggregated contributions at a municipal level for 2011, in order to obtain the aggregated tax income (ATI), ATI c and the number of inhabitants N inhab,c , for each city c.

K-S relation analysis
What we care about is the distributions of the skewness and of the kurtosis: these are the distributions of interest.
The interesting table is Table 1. It is seen that the corresponding (in some sense 'average' S and K are positive and not small. The most relevant point seems to be the existence of (2) negative K values: −1.30750 for BT (Barletta-Andria-Trani, Apulia region) in the ATI case and −0.9319 for BT and −0.7511 for RG (Ragusa, Sicily) for the N inhab . Table 1. Summary of (rounded) statistical characteristics for the distribution of S and K for ATI of cities in IT provinces and for the distribution of the number of inhabitants in IT cities in the various (N p = 110) provinces. Note: The last two lines will be useful in the assessment of the outliers, see below On the other extreme TO (Torino province in Piedmont region) has the largest K and S for both cases; for information TO contains the largest number of cities (315).
The obtained corner results for these specific provinces are in line with historical and empirical 'evidence'. In fact, Torino represents the core of the industrial production of Italy, being the headquarter of FIAT. Thus, the related province has an unequal distribution of richness and population, with an asymmetry of positive type. It is also expected that the tails of the distribution are heavy, being Torino (the city) one of the largest cities in Italyin terms of number of inhabitants and ATI -and since the same province contains some of the smallest cities of the country.
For what concerns BT and RG, they are two of the less populated and poor provinces in Italy, with a large part of small cities. Hence, a platycurtic distribution for the ATI and N inhab is what everyone with a fair level of knowledge of the Italian reality should expect.
The K-S relationship for the distribution of ATI of cities in the 110 provinces is shown in Figure 1.
The K-S relationship for the distribution of the number of inhabitants in cities in the 110 provinces is shown in Figure 2. The relation is pretty smooth in both cases, and recall those found in magnetohydrodynamics and other studies of time dependent systems; see pertinent references in Section 2.
Thereafter, in accord with previous literature, we try two fits: a polynomial of degree 2 but without linear term, i.e. Equation (3), or the pseudo-parabolic polynomial-like form Equation (4).
The parameters p and q are given in Table 2, together with the corresponding regression coefficient. To leave ν as a free fit parameter is seen not to be a drastic improvement. Thus, one can expect, since ν 2 that a simple interpretation or modelisation based on well established statistical distribution is in order; see Section 4.  It can be noticed that p 1, but q is negative, since there is a negative kurtosis for the distributions in a couple of provinces (see Table 1).

Rank-size analysis
The above findings remind us that there is a general relationship between skewness and kurtosis within Pearson's distribution system. Therefore, in order to pursue towards some understanding of this sort of K−S relation, we propose to develop a complementary analysis. Instead of considering the K and S values as belonging to some continuous distribution, we are using a method, the rank-size analysis method, which allows to study a distribution of 'quantities' when the orders of magnitudes can be rather different, and when the values have some imprecise error bar -as it always occurs in such economic and sociological surveys. In such a methodology, the (K and S here) values are supposed to belong to discrete distributions which are regularly sampled. Thus, we write the S and K data in an ascending size (regular) order, independently of each other, for the ATI and the number of inhabitants, respectively, i.e. giving the rank r = 1 to the lowest S and to the lowest K values, etc.
The most simple rank-size law is thought to be a power law of the rank r -leading to the Zipf plot, y ∼ r −α . It is often modified for including an upper tail cut off as through the Yule-Simon law, In order to take into account a possible change of curvature in the data, if some falling off seems to occur visually, at the highest ranks, Equation (5) can be then written [4][5][6] as where +1 is introduced in order to avoid a singular point in the fit at the highest rank r M = N, if ξ 0. This also emphasizes that an upper tail towards infinity is rather meaningless, since the upper rank r M is necessarily finite.
In view of taking into account a better fit at low and high rank, one can further generalize Equation (6) to a five parameter-free equation [6]: where the parameter tis reminiscent of Mandelbrot's generalization of Zipf 's law at low rank, while allows some flexibility at the highest rank -where usually the error bar on  Table 3.
the data can be rather influential in defining r. The shape of the curve in Equation (7) is sensitive to the variations of and [6].
Here, neglecting any low-rank free parameter ( ) of dubious origin, but still allowing for some flexibility on the upper rank value divergence, we approximate Equation (7) by One is allowed to imagine that a Generalized Discrete Beta function, like Equation (8), reminds the reader of the Pearson-Type I distribution, supported in the relevant rank interval [0, N]. In so doing, the corresponding best fits of the various S and K ranked data can be found for the distributions of city ATI for 2011 and of the number of inhabitants of the IT cities according to the 2011 census, distributed over the 110 provinces, respectively.  Table 3.  Figure 3 displays the rank-size relation for K and S for the distribution of ATI aggregated over cities in the IT 110 provinces in 2011, and the best fits by Equation (8). In the same spirit, Figure 4 shows the rank-size relation for K and S, with fits by Equation (8), for the distribution of the number of inhabitants aggregated over cities in the IT 110 provinces according to the 2011 census. The fit parameters are found in Table 3. The values (and error bars) on the ξ 4 and γ 4 , together with the value of the regression coefficient, R 2 , are quite convincing of the existence of an inflection point in the rank-size data. Nevertheless, observe the variety of values for ξ 4 and γ 4 .

Preferential attachment
Our argument for suggesting a model stems from the historical view that cities do not appear nor grow stochastically, Moreover, there is a postulate on demography that ghettos form according to peer status, in particular, due to the wealth of the population: rich and poor group themselves in clusters. In so doing, the number of inhabitants is somewhat related to the wealth. A similar type of process can be imagined, mutatis mutandis, for such different qualities: the so-called preferential attachment process . Such a process can be defined as a settlement procedure in urn theory, where additional balls are added and distributed continuously to the urns (cities, in this model) composing the system. The obtained model is the general Polya urn [35]. In our context, the rule of such an addition follows an increasing function of the number of balls already contained in the urns. The settlement formation obeys a Yule process, with a log-normal initial distribution of the population of the settlements.
In general, such a process contemplates also the creation of new urns. In such a general framework, this model is associated with the Yule-Simon distribution [54,55] whose density function f is where a is a positive integer, b > 0, B(a; b) is the Euler Beta function with (x) being the standard Gamma function [1,22]. Explicitly, denotes the Beta-function; a random variable X is Beta-distributed if its probability density function (pdf) obeys In practical words, newly created urn starts out with k 0 balls and further balls are added to urns at a rate proportional to the number k that they already have plus a constant a ≥ −k 0 .
With these definitions, the fraction P(k) of urns (areas) having k balls (cities) in the limit of long time is given by for k ≥ 0 (and zero otherwise). In such a limit, the preferential attachment process generates a long-tailed distribution following a hyperbolic (Pareto) distribution, i.e. a power law, in its tail.

K and S parametrization
The relevant Beta-function moments, i.e. K and S, are given by Johnson and Kotz [27, pp. 40-44], and recalled by Hanson [24], in terms of a and b parameters of the Beta function for the normalized variables: In order to develop the algebra, one also introduces a so-called help variable [24] Notice that if Equation (3) holds, then allowing a theoretical estimate at once, e.g. if p and q are as in Table 2, and a possible comparison to empirical results, shown on the last line in Table A2. In fact, ρ = a + b. Thus, one can obtain a and b from: which leads to 2 solutions for a: If the skewness is positive then the larger solution will be the value of b otherwise the larger solution will be the value of a [24]. The relevant values of the best-fitted Beta can be read from Table 3. Figures 5 and 6 illustrate the cumulative distribution functions of the calibrated Beta distribution for the normalized ATI and N inhab cases, respectively. We notice that the shapes of the distributions are similar, and suggest a high concentration around the small values of the Beta distribution.  Table 2.   Table 2.

Conclusion
This paper explores the relationship between skewness S and kurtosis K for series of demographical and economical data. The considered sample is taken from the Italian National Institute of Statistics (number of inhabitants) and the Ministry of Economics and Finance (income taxes) for the Italian cities, aggregated at a provincial level; the reference year is 2011. The existence of a quadratic and a power law relation between skewness and kurtosis is illustrated, with fits which are both visually quite appealing and markedly statistically sound. These findings support and add to the empirical literature on the connections between K and S; this is the first time that such quadratic and power laws rules are found for socio-economic surveys.
It should seem interesting to search for the general conditions leading to distributions with such apparently simple relations.
In our presently investigated 'socio-economic case', the empirical results have been supported by a theoretical argument based on the Polya urn. In so doing, one gets the datadriven calibration of the parameters of a Beta distribution, which adds further insights on the different nature of economic and demographic data.

Disclosure statement
No potential conflict of interest was reported by the authors.

ORCID
Marcel Ausloos http://orcid.org/0000-0001-9973-0019 Table A1 lists the ATI (in EUR) of each provinces in IT, given in alphabetical order of their legal acronym, the province population, and for general information, the number of cities in the relevant province, in 2011.

Appendix. Descriptive statistics
In Table A2, one displays a summary of the (rounded) statistical characteristics for the distribution of city ATI ATI c,p of each IT provinces (N p = 110) in 2011, for the distribution of the number of inhabitants in a given province, N inhab,c and for information the number of cities in the relevant province, N c,p .
In Table A2, we also give σ/μ, called the coefficient of variation (CV), allowing to have some confidence in a relatively peaked distribution, if CV is not too large. For completeness, we also provide the immediately deduce value of an indirect measure, 3(μ − m)/σ . Figures A1-A4 provide a view of the empirical distributions of the skewness and kurtosis for either the ATI or the number of inhabitants of the IT provinces. The shapes of the distributions for S and for K are quite similar. In both cases some outliers emerge, according to the definition of outliers as those values outside the interval (μ − 2σ , μ + 2σ ); see the values reported in Table 1 and just compare them with the histograms in Figures A1-A4.   Notice that both distributions of S and K seem to be more concentrated around their mean values for N inhab rather than for ATI, hence suggesting a more evident regularity in the asymmetry and in the peaks of the distribution across provinces in the former case. Moreover, Figures A1-A4 show also that skewness and kurtosis of our specific dataset, unexpectedly, seem to depart from a normal asymptotic distribution.