Religion-based Urbanization Process in Italy: Statistical Evidence from Demographic and Economic Data

This paper analyzes some economic and demographic features of Italians living in cities whose name contains that of a Saint (hagiotoponyms). Demographic data come from the 15th Italian Census (2011), while the economic wealth of such cities is explored through their recent (2007-2011) aggregated tax income (ATI). This cultural problem is treated from various points of view. First, the exact list of hagiotoponyms is obtained through linguistic and religiosity criteria. Next, the distribution of such cities across the Italian regions is examined. Demographic and economic perspectives are also offered at the Saint level, i.e. by calculating the cumulated number of inhabitants and the cumulated ATI "per Saint", as well as the corresponding relative values, which take the Saint popularity into account. Frequency-size plots and cumulative distribution function plots on the one hand, and scatter plots and rank-size plots between the various quantities on the other hand, are shown and discussed in order to assess the importance of correlations between the variables. It is concluded that rank-rank correlations point to a strong Saint effect, which explains what Saint-based toponyms actually imply when economic and demographic data are compared.


Introduction
The impact of religion on the demographic and economic evolution of societies has been clearly stated in several studies. One should mention the (not recent but very interesting) survey on this topic by Iannaccone (1998), but also Iannaccone (1991), Ellison and Sherkat (1995), Lehrer (1995, 1999), Waters et al. (1995) and, more recently, Zhang (2008), Yeatman and Trinitapoli (2008) and Connor (2011). The quoted papers deal with the social dimension of religion in relation to themes like marriage, education, population flows and socio-cultural economics. A field of research quite neglected by economists and social scientists is the influence of religion on the urbanization process, along with an exploration of its historical grounds.
It is indeed widely accepted that many cities have developed from various seeds, e.g. water sources, river bridges, ores, ..., but many exist due to some "religious purpose", in a broad sense (see Durkheim, 1968, Eliade, 1978, Dubuisson, 2003, Dennett, 2006, Stark and Bainbridge, 1979). The cult allowed splitting, thus necessarily juxtaposing, the human experience of reality into sacred and profane space and time. In the Catholic religion, much religious activity is performed through the cult of Saints, considered as intercessors with "God". Their bones and relics could even make "miracles". The cult of Saints emerged in the 3rd century and gained momentum from the 4th to the 6th century. It formed from Greek and Roman veneration of divinities, heroes, and rulers. It was established at "sanctified locations" which in turn could attract pilgrims, "priests", and merchants, and which thus grew in population size and became cities. Needless to say, it took many centuries before some strong organization around the cult of some Saint developed. It is well known that rival clergy, up to bishops and their cliques, fought lengthy battles over who had the right to claim the Saint for their own community, even scorning "rival Saints". Whether economic conditions were in play at those times is not a question raised here, the answer seeming obvious: one of the causes of the Protestant Reformation is indeed well known. In Italy (IT, hereafter), known as a basically Catholic country and closely tied to the seat of the papacy, the cult of Saints has long been an important motivation, to the point that several cities bear the name of a Saint (Webb, 1996). This paper explores the main characteristics of the Italian society when isolating citizens living in the "cities with a Saint name", i.e. hagiotoponym cities (see Reading, 1996).
Specifically, this religion-based cultural aspect of the Italian urbanization process is described from a demographic and an economic point of view. In particular, we aim at exploring what Saint-based toponyms imply in terms of the comparison between economic and demographic data. One may thus first wonder how many of these cities appeared and where they are located. It also seems relevant to explore how important they are with respect to the population size. These questions can be justified as being related to studies on social science and urban planning. Some motivation also arises from touristic, geographical, and cultural points of view. A third concern, related to the previous one, is of linguistic origin, i.e. a search for the statistical distribution of names of Saints attached to city names. The reader is referred to the Dictionary of Farmer and Hugh (1987). A related question concerns the rare Saints, e.g. those occurring only once, so called hapaxes: how many of them are there? The question "why are such Saints rare?" is outside the purpose of the present report. Nevertheless, how they are geographically distributed is an original question. A fourth motivation of our investigation, not the least, is based on an economic concern. How does a hagiotoponym city behave with respect to others? Moreover, is a specific Saint "better", in an economic sense, than another? Therefore, the first question tackled in this report is: which cities, and how many, in IT, have a "Saint" name in their appellation? The methodology (not so trivial as might be thought at first) is explained in Sect. 2. In particular, the data acquisition needs some very careful work, as explained in Sect. 2.1.
Note that a distinction will be made between male and female Saints. In Sect. 2.2, we define the economic and demographic quantities which have retained our attention. Moreover, it seems of complementary interest to observe the occurrence of less frequent Saints, e.g. dislegomena Saints (those occurring twice). Indeed, one might have asked, at the time such cities arose, whether some novelty in names (or cult), some rarity or, a contrario, some popularity of a Saint would have been (and still is) beneficial within some "competition". In studies of cities, one often examines the rank-size or size-frequency relationships, in accordance with Vining (1977). Their analysis is the main content of the figures displayed in Appendix B and commented in Sect. 3. For what concerns the rank-size analysis, a brief comment is needed. Often, Zipf's law is able to fit the relationship between the size and the rank of a given variable. Such a regularity does not have a strong theoretical ground (see Krugman, 1995), and it is shown not to provide a robust analysis of the connections between demographic and economic data in our specific context. In this respect, it is important to note that Zipf's law fails in several cases (see e.g. Giesen and Sudekun, 2011; Soo, 2007; Cordoba, 2008; Garmestani et al., 2008; Vitanov and Ausloos, 2015). For this major reason, we have considered more general rank-size rules based on Lavalette's law (Lavalette, 1966). In doing this, we have found robust best-fit results, with high levels of R² and interesting interpretations. More specifically, in order to observe how many Saints (or cities) are concerned by the above questions, plots for the rank-size, size-frequency and cumulative distribution function are presented in order to find an appropriate empirical law: see Sect. 3.
The density of Saints, with respect to the whole country and with respect to regions, should give some idea of the "touristic spreading" of "catholicity" (or of some cult) at the time the communities were created. Thus, whether there is any geographical distribution, with special Saints in some area, is discussed as well; see Sect. 3.2. Next, since cities have often developed around churches or chapels, the question is raised whether there is any relation between "cities with Saint names" (hagiotoponyms) and present city population size. Moreover, are the most popular Saints associated with "large" cities? Is the cumulative distribution of interest? In so doing, one may next wonder about the wealth of such cities. How do they fare nowadays? This is examined in Sect. 3.5 through the (modern) aggregated tax income (ATI) of such cities, which represents the aggregate contribution that the citizens of each city provide to the national Gross Domestic Product (GDP). The final answer to these questions is found through rank-rank correlation studies, in Sect. 3.6. Section 3 also contains a comparison between the overall set of IT cities and the set of hagiotoponyms with respect to demographic and economic variables. It seems that the hagiotoponym cities build a reality similar to the IT one, when the biggest cities are removed as outliers. We strongly emphasize this condition or constraint. Note that there is no discussion here below concerning parishes, churches, chapels or folk life events involving Saints. This would demand a very complex survey and is thus not studied here. In this respect, an interesting paper is Kim (2007) on pilgrimage and towns in medieval Christianity. We recommend reading it both for the outlined ideas and for the bibliography. Nevertheless, we stress that pilgrimage towns do not necessarily have a Saint for name. The cities we examine here below are in fact very rarely pilgrimage towns, though they have local Saint activities.
Moreover, questions on why religious cities grow or fail, and why some Saints are more popular than others, are left for further work; see Stark (1996). This very interesting work differs from ours, certainly in the lands concerned (England vs. Italy), the timing, and the scientific approach, but is highly complementary. We emphasize that we add much to this sort of investigation by also considering economic questions.

Construction of the dataset
In IT, there are 8092 cities or, better, municipalities (in Italian: comuni) distributed among 20 regions nowadays. Thus, data on population size and on ATI have been collected at the municipality level, before selecting the cities of interest. Specifically, we want to identify the municipalities which have a toponym related to a Saint, so called hagiotoponyms. Sometimes, it is not so easy to understand which Saint is truly meant. For example, Giovanni could be the "apostle" or the "baptist". However, we consider that he is most likely Giovanni the apostle. Indeed, the baptist is now called "Giovanni Battista" or "Gianbattista". A check of the official list of Saints in Farmer (1987) indicates that Giovanni can also be associated with more recent people, but this is irrelevant to a city's original appellation. The identification of the true Saint is sometimes not possible. Therefore, we decided not to pursue such a task further. The Saint name in our list will point to a unique Saint, who is then considered to be the representative element of the category of Saints with the same name. It might be relevant to remove this constraint in more advanced religious studies. We also understand that a Saint having given his/her name to a city is not necessarily a bona fide Catholic Saint, but only a "Saint by tradition" (see Farmer, 1987). We do not consider this ambiguity to be a drawback for our considerations, on the contrary.

Data acquisition
The investigated data are of three different kinds: population data, economic data, and city names.
• The source of the population data is the Italian Institute of Statistics (ISTAT). In particular, data on the population are extracted from the elaborations of the 15th Italian Census, performed by ISTAT in 2011.
The population data taken for the municipalities are: number of inhabitants, males and females, number of families, people living in a family, average number of family members, people living as cohabitants and not as a family. Not all such statistical data can be examined here with respect to our concerns, but we recommend the data sources for subsequent work.
• The data collection mentioned above relies on the identification of Italian cities having a toponym recalling a Saint. Such an identification has been a tedious stage. It was implemented in several phases, which seem worth presenting in order to justify the subsequent data. Firstly, a preliminary list of cities has been constructed through the application of four sorting procedures. Secondly, the preliminary list has been refined, and some municipalities have been removed on the basis of our own established criteria. After this second phase, the list of municipalities with a "Saint" appellation has been revised one by one. The applied procedure is detailed below.
The preliminary list starts from the premise that some specific toponyms might come from deformations of the original Saint name. Therefore, the string "san" (which, at first sight, seems rather informative, being also contained in the words "santo", "santa", "sant") would in fact lead to the removal of a number of acceptable municipalities from the full list of cities (like Camposampiero, in Veneto, which derives from San Pietro). Thus, a first sorting procedure applies the string "sa" to the entire collection of 8092 elements. The resulting list contains 1321 municipalities. The second sorting procedure applies the strings "San ", "Santo ", "Santa ", "Sant' " (note the blank space at the end of the strings) to the list of 1321 municipalities.
This second sorting has produced 636 municipalities containing at least one of the strings, and the remaining 685 without them. The third sorting procedure has been implemented on the group of 685 municipalities, in order to further facilitate the identification of the municipalities of interest. The string "san" (without a space after the word) has been employed: 180 municipalities contain "san" and the other 505 do not. The two groups of 180 and 505 have been manually checked line by line, and all cases have been carefully examined. In particular, as expected, there exists a number of municipalities whose toponym is a linguistic transformation of a Saint name. We call them strange cases. As an example, Samugheo, in Sardegna, derives from San Michele. For each strange case, the (possible) corresponding Saint name has been assessed by reading historical information on the controversial municipality. This information has been taken from the official website of the municipality and/or from Wikipedia. When no information is available, the candidate strange case has been removed from the preliminary list of municipalities (like Santadi, in Sardegna). All "strange cases" accepted as valid hagiotoponyms are listed in Table 1 (ESM).
Hence, in the groups with and without the string "san", 28 and 23 municipalities, respectively, have been selected for inclusion in the preliminary list. Some side findings emerged from checking the municipalities after the third sorting procedure. First of all, Madonna del Sasso has been included in the preliminary list, Madonna being an Italian name for the "Virgin Mary", a human Saint. We also became aware of the existence of some municipalities in Valle d'Aosta with French Saint names and in Trentino-Alto Adige with German Saint names. Thereafter, we performed a fourth sorting procedure on the total list of 8092 municipalities, employing the strings "donna", "dame", "Frau" to search for the equivalent of the Virgin Mary not only in Italian, but also in French and German, respectively. The fourth sorting procedure allowed us to add to the preliminary list 1 further municipality not already included, Rhemes-Notre-Dame. The preliminary list thus collects 636+28+23+1=688 municipalities. Some exclusions have then been implemented (see ESM).
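The four sorting procedures can be illustrated by a minimal Python sketch. It is not the code used for the paper: the helper `split_by_substrings`, the ten-name toy list, and the choice of case-insensitive matching for "sa", "san" and the Marian strings are assumptions made for illustration only, and "Sant'" is matched here without a trailing blank. The manual, line-by-line check of the resulting groups is not reproduced.

```python
def split_by_substrings(names, substrings, case_sensitive=False):
    """Split names into (matching, non_matching) lists by substring presence."""
    needles = list(substrings) if case_sensitive else [s.lower() for s in substrings]
    hits, misses = [], []
    for name in names:
        haystack = name if case_sensitive else name.lower()
        if any(needle in haystack for needle in needles):
            hits.append(name)
        else:
            misses.append(name)
    return hits, misses


# Hypothetical toy sample standing in for the 8092 municipality names.
municipalities = [
    "San Pietro in Casale", "Sant'Agata Bolognese", "Santa Maria Capua Vetere",
    "Camposampiero", "Samugheo", "Sassari", "Santadi", "Madonna del Sasso",
    "Rhemes-Notre-Dame", "Milano",
]

# First sorting: keep every name containing "sa" (assumed case-insensitive).
step1, _ = split_by_substrings(municipalities, ["sa"])

# Second sorting: explicit Saint prefixes, matched exactly as written.
explicit, rest = split_by_substrings(
    step1, ["San ", "Santo ", "Santa ", "Sant'"], case_sensitive=True
)

# Third sorting: split the remainder by "san"; both groups go to manual checking.
with_san, without_san = split_by_substrings(rest, ["san"])

# Fourth sorting: Marian strings, searched in the full list.
marian, _ = split_by_substrings(municipalities, ["donna", "dame", "Frau"])

print(len(step1), explicit, with_san, without_san, marian)
```

On this toy list, Camposampiero and Samugheo end up in the group without "san", which is why that group also has to be read by hand, while Santadi falls in the "san" group and is only rejected at the manual stage.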
After the constitution of the final list, we have assigned the related Saint to each of the 637 municipalities. The sum of the frequencies of Saint appellations is 639 rather than 637. Indeed, there are two municipalities whose toponym contains a couple of Saints (Santi Cosma and Damiano in Lazio and San Marzano di San Giuseppe in Puglia). For these particular cases, the available municipality data, both in terms of population size and economic features, have been shared equally between the two Saints of the couple. The resulting distribution of the hagiotoponyms at a regional level can be found in Figure 1.

Data treatment
To exemplify the procedure, consider that we made two Tables: one with the number of inhabitants and one with the <ATI> values (the ATI averaged over 2007-2011), both in decreasing order, according to the Saint name, distinguishing between males and females. The top and bottom of each Table are shown in Tables 7, 8, 9 and 10 (ESM) in order to outline the number ranges. Table 1 contains the statistical indicators computed for <ATI> and population data.
To highlight the role played by each Saint, population and economic raw data have been further treated in two ways.
• Data have been cumulated over the name of the Saints as follows:

ATI(x) = \sum_{j \in F(x)} ATI_j, \qquad POP(x) = \sum_{j \in F(x)} POP_j,

where x denotes the name of a Saint. Thus, F(x) is the set of municipalities whose toponym derives from the Saint x, and represents the popularity of x. Specifically, the cardinality of F(x), denoted as |F(x)|, is the frequency of x. ATI and POP are the intuitive notations for the ATI values and for the number of inhabitants, respectively. In particular, ATI_j and ATI(x) (POP_j and POP(x)) represent the ATI (number of inhabitants) datum for the municipality j and the Saint x, respectively.
• Furthermore, all data have been cumulated and then averaged to take into account the popularity |F(x)| of the Saint x:

\overline{ATI}(x) = \frac{ATI(x)}{|F(x)|}, \qquad \overline{POP}(x) = \frac{POP(x)}{|F(x)|}.
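The two treatments above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the function name `aggregate_by_saint` and the numbers are hypothetical, the averaged values are taken to be the cumulated values divided by the frequency |F(x)|, and a municipality naming two Saints shares its data equally between them while counting once in the frequency of each.

```python
from collections import defaultdict


def aggregate_by_saint(records):
    """Cumulate and average municipality data per Saint.

    records: iterable of (municipality, saints, population, ati), where
    saints is the list of Saint names found in the toponym.
    """
    pop = defaultdict(float)
    ati = defaultdict(float)
    freq = defaultdict(int)
    for _name, saints, population, income in records:
        share = 1.0 / len(saints)  # equal sharing for two-Saint toponyms
        for saint in saints:
            pop[saint] += population * share
            ati[saint] += income * share
            freq[saint] += 1       # each Saint counts the municipality once
    return {
        saint: {
            "F": freq[saint],
            "POP": pop[saint],
            "ATI": ati[saint],
            "POP_avg": pop[saint] / freq[saint],
            "ATI_avg": ati[saint] / freq[saint],
        }
        for saint in freq
    }


# Hypothetical toy records (not data from the paper).
records = [
    ("City A", ["Pietro"], 1000, 10.0),
    ("City B", ["Pietro"], 3000, 30.0),
    ("City C", ["Cosma", "Damiano"], 2000, 20.0),
]
per_saint = aggregate_by_saint(records)
print(per_saint["Pietro"])
print(per_saint["Cosma"])
```

With this convention the frequencies sum to one more than the number of municipalities for each two-Saint toponym, which matches the 639 appellations found for 637 municipalities.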

Statistical analysis of the data
First of all, we provide a description of the dataset through the main statistical indicators. Then, some specific aspects of the considered dataset will be treated, on the basis of the methodological techniques described in the Appendix.

Main statistical indicators
The computation of the main statistical indicators leads to some interesting outcomes:
• it appears that there are 91 cities with a female Saint name, for 31 different (female) names, as reported in Tables 5 and 6 (ESM); the most popular is Santa Maria (including Marie), occurring 23 times, well ahead of Sant'Agata (12 times). Our statistical analysis shows that the distribution is rather skewed (skewness ∼ 3.63), with kurtosis ∼ 13.39;
• from Tables 5 and 6 (ESM), it appears that there are 546 cities with one (or two) male Saint name(s). The most popular is San Pietro (43 times), but San Giovanni (36 times) is not far off. There are 175 different names. (Recall that the two double-Saint toponyms, Cosma and Damiano, and Marzano and Giuseppe, are counted as if there were 548 different cities.) For these 548 hagiotoponyms, our statistical analysis shows that the distribution is rather skewed (skewness ∼ 4.57), with kurtosis ∼ 23.93;
• there are 206 different Saints (175 males + 31 females) within the above grouping rules for 639 hagiotoponyms. This popularity (F) distribution is rather skewed (skewness ∼ 4.55), with kurtosis ∼ 24.0.
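Skewness and kurtosis such as those quoted in the list above can be obtained from the list of Saint frequencies with a short Python sketch. The estimator is an assumption: the text does not state whether population or sample moments were used, nor whether the kurtosis is the plain or the excess one. The function below uses population moments and the plain (non-excess) kurtosis, and the frequency list is a hypothetical toy example.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, kurtosis) from population central moments.

    The kurtosis is the plain one (3 for a Gaussian); subtract 3 to get
    the excess kurtosis.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical frequencies: four hapax Saints and one popular Saint.
toy_frequencies = [1, 1, 1, 1, 5]
skew, kurt = skewness_and_kurtosis(toy_frequencies)
print(round(skew, 3), round(kurt, 3))
```

A single popular name among many rare ones is enough to produce a clearly positive skewness, which is the qualitative pattern reported for the Saint frequencies.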
The values of the statistical indicators in the above list illustrate a hagiotoponym-type of urbanization mostly related to a few very popular Saints (Pietro and Giovanni for males, Maria and Agata for females), but with a very large number of cities recalling infrequent Saints. This is fully in line with the deep cult of very important and popular religious figures like the Virgin (Maria), the first Pope (Pietro), etc., along with widespread local traditions related to less famous Saints. Table 1 collects the main statistical indicators related to the raw ATI and number of inhabitants data. This Table can be compared with the main statistical indicators associated with ATI and number of inhabitants when the entire set of IT cities is considered (Table 2). It is evident that the minimum values of ATI and number of inhabitants share the same order of magnitude in the overall IT and hagiotoponym cases, while the maximum ones are remarkably different (the maximum for IT is much greater than that for hagiotoponyms, for both variables). However, IT cities are on average slightly richer and more populated than hagiotoponyms. Further information is brought by other statistical indicators: IT cities are noticeably more volatile than the set of hagiotoponyms (higher variance for both population and ATI), and also exhibit a greater level of kurtosis. All these facts indicate that the hagiotoponyms describe a hypothetical IT situation in which the main outliers are removed, both for the number of inhabitants and for the ATI. Table 3 collects the main statistical indicators for the cumulated quantities ATI(x) and POP(x) and for their popularity-averaged counterparts, ATI(x)/|F(x)| and POP(x)/|F(x)|, with respect to the independent variable x. From a methodological point of view, the closest integer has been taken for the population values. Moreover, at these levels, and thereafter, we have not distinguished between male and female names.
The female-to-male ratios, both for the number of hagiotoponym cities and for the number of different Saints (in both cases ∼ 0.17), and the proportions of female Saints in the overall counts (in both cases ∼ 0.14), are not small. Nevertheless, distinguishing genders is not expected to bring much to the discussion. Moreover, it should be obvious to the reader that taking the gender into account would triple the number of curves, figures, columns in Tables, or Tables. However, we do not dismiss the interest of such an investigation in the future, for a complementary paper.

Saints frequency regional disparities
It can be observed from Table 2 (ESM) that, on average, ∼ 8% of the municipalities in the Italian regions are hagiotoponyms. However, Valle d'Aosta is an outlier in the hagiotoponym distribution, since about 22% of the municipalities in that region contain a Saint name. Recall that Valle d'Aosta is an autonomous region, with many historical connections with France. In fact, in France, more than 5000 municipalities, out of about 35000, i.e. about 15%, contain a Saint name.
Among the more extreme Italian cases, Lombardia and Trentino-Alto Adige are "very poor" in hagiotoponyms (∼ 5%), though these are regions in the North of IT, like Valle d'Aosta. In contrast, Calabria, a southern region, has a much above average hagiotoponym content (∼ 14%). Note that the largest female/all hagiotoponym ratios occur in Umbria (0.6666), Sicilia (0.375) and Calabria (0.1864). In contrast, the largest male/all hagiotoponym ratios occur in Friuli-Venezia Giulia (0.9412) and in Lazio and Valle d'Aosta (0.9375). Basilicata has a noticeable feature: there is no female Saint name in any city. Finally, from the point of view of the search for an empirical law, the finite size effect of the data is remarkably emphasized when searching for the hagiotoponym frequency distribution; see inset of Fig. 2. Taking such a finite size into account, the rank-size relationship for the 20 regions can be well reproduced by a fit with a function such as Eq. (1.6) in ESM. To our knowledge, there is no known fundamental reason for the validity of such a function in this type of consideration; its use is rather ad hoc. Only mathematically plausible arguments (Naumis and Cocho, 2008) are known, but they seem hardly applicable in our case.

Saints frequency empirical distributions
It has been shown above that there are 206 different Saint names. A rank-size (Zipf) plot on classical axes, Fig. 3, can thus be presented, i.e. the number of cities carrying the name of a specific Saint, ranked in decreasing order of the Saint popularity. In so doing, we focus only on a linguistic-like approach: how many times the Saint occurs (its "size") in hagiotoponym cities, as a function of the rank. The display, for the various genders and the whole data set, indicates smoothly decreasing data, as if a Zipf law existed. However, according to Fig. 4, which displays the same data as Fig. 3 but on a log-log plot, the rank(r)-size(s) relationship for the number of times a city has the name of a Saint (i ∼ male (m), female (f) or all (a) cases) is clearly hardly represented by a mere power law. A more appropriate fit is through a Zipf-Mandelbrot law, Eq. (1.4) in ESM, with downward curving at low rank. The parameter values are given in the Figure. Nevertheless, note the slight king effect for Maria (ν ≤ 0), i.e. the distortion effect due to the outlier Maria; see Laherrère and Sornette (1998). Note also that the power law decay at high rank is very similar for each gender, with an exponent (ζ) close to 1. Such fits are rather remarkable since the regression coefficient R² ≥ 0.99. A log-log display of the frequency-size relationship, Eq. (1.3) in ESM, i.e. the frequency of the size, for the Italian cities bearing a Saint name, is shown in

Population size considerations
The number of inhabitants in the 639 hagiotoponym cities is presented on a linear-linear plot in Fig. 7. Visually, the data look like a smooth, hyperbolic-like decay, with an exponent close to 1. However, the R² value is pretty low, i.e. ∼ 0.54. Alas, as better seen in Fig. 8, there is no nice simple fit by an empirical law with few free parameters. Indeed, Fig. 8 presents a log-log plot of the number of inhabitants versus rank in the 639 Italian hagiotoponym cities. Visually, from the data scattering, it cannot be expected that a simple empirical law can be found: four simple laws are indicated, but none leads to a convincing regression coefficient. The Zipf-Mandelbrot law has R² ∼ 0.98, but is far from being visually appealing at high rank. It can be concluded, at this stage, that the sampling is far from a random one. Note that these simple fits for the "all Saints" case (Fig. 7 and Fig. 8) are not good enough to suggest a decomposition between males and females in further work. In view of the change in curvature of the data near the middle of the rank range, it is inappropriate to consider a fit by a power law or any other purely convex function. Instead, it seems that a fit by a 3-parameter function with an inflection point, such as that given in Eq. (1.7) in ESM, is more appealing. This is shown in Fig. 9, on a semi-log display of the number of inhabitants in the 639 Italian cities bearing a Saint name, ranked in decreasing order of the number of inhabitants; a fit by such a function shows a convincing R² ∼ 0.992. Moreover, the fit for r ≤ 350 is visually quite appealing. These results can be compared with those for the overall 8092 IT cities. There is an evident presence of outliers at high rank, as the histogram in Fig. 10 clearly shows. This fact is further confirmed by the fits for IT, which appear to be of higher quality when outliers are removed.
In this respect, the comparison between the best 3-parameter Lavalette fit in the two cases, all 8092 cities and removal of the 80 highest rank cities, is quite informative, the latter being more visually appealing than the former (see Figures 11 and 12). The case of low rank is also quite interesting. Fig. 13 shows that the low rank IT cities are nicely fitted by a 3-parameter Lavalette curve, with R² ∼ 0.986. To conclude, when cities are ranked with respect to the number of inhabitants, the sample of hagiotoponym cities behaves closely to the overall IT cities when outliers are removed.
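A rank-size fit with an inflection point, of the kind discussed above, can be sketched as follows. Eq. (1.7) of the ESM is not reproduced in this section, so the functional form is an assumption: a three-parameter Lavalette-type law y(r) = κ r^(-γ) (N - r + 1)^ξ, with N the number of ranked items. The fitting technique is also a simplification chosen for self-containment: ordinary least squares on the logarithm of the sizes, which is not necessarily the procedure behind the reported fits, and the R² it returns is computed in log space. The function names and the synthetic data are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_lavalette3(sizes):
    """Fit log y = log(kappa) - gamma*log(r) + xi*log(N - r + 1) by least squares."""
    y = sorted(sizes, reverse=True)  # rank 1 is the largest size
    n = len(y)
    design = [(1.0, math.log(r), math.log(n - r + 1)) for r in range(1, n + 1)]
    target = [math.log(v) for v in y]  # sizes must be strictly positive
    # Normal equations (A^T A) beta = A^T b for the three coefficients.
    ata = [[sum(d[i] * d[j] for d in design) for j in range(3)] for i in range(3)]
    atb = [sum(d[i] * t for d, t in zip(design, target)) for i in range(3)]
    b0, b1, b2 = solve_linear(ata, atb)
    fitted = [b0 + b1 * d[1] + b2 * d[2] for d in design]
    mean_t = sum(target) / n
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return {"kappa": math.exp(b0), "gamma": -b1, "xi": b2,
            "r2_log": 1.0 - ss_res / ss_tot}


# Synthetic check: data generated exactly from the assumed law.
N = 200
synthetic = [1000.0 * r ** -0.8 * (N - r + 1) ** 0.5 for r in range(1, N + 1)]
params = fit_lavalette3(synthetic)
print(params)
```

The factor (N - r + 1)^ξ bends the curve downward at the end of the rank range on a log-log plot, which is what a pure power law cannot do.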

Economic considerations
The <ATI> of all the 639 Italian cities containing a Saint name, over the period 2007-2011, is displayed on log-log axes in Fig. 14. It is seen on this figure that the simple laws tested, i.e. power, exponential, log, do not point to a plausibly simple empirical relationship. In fact, in view of the change in curvature of the data near the middle of the rank range, it is inappropriate to consider a fit by a power law or any other purely convex function. Instead, it seems that a fit by a 3-parameter function with an inflection point, such as that given in Eq. (1.7) (ESM), is more appealing. A semi-log display of the <ATI> of the 639 Italian hagiotoponym cities, ranked in decreasing order of their ATI, is shown in Fig. 15. A fit by such a 3-parameter function is convincing for r ≤ 350, with an acceptable R² ∼ 0.989. The latter is smaller than in the case of the population size, in Fig. 9, but the exponents seem quite similar.
Also in this case, the comparison between the outcomes of the economic analysis of the hagiotoponyms and that of the IT cities may be useful. Here, we refer the reader to Cerqueti and Ausloos (2015), where rank-size rules are applied at a national as well as at a regional level for cities in IT. The size is given by the ATI, and data are disaggregated at the municipality level. Cerqueti and Ausloos show that IT national economic data seem to be well described by a 3-parameter Lavalette curve, even if the distortion effect due to the presence of outliers is of high magnitude. This is the so-called king and viceroy effect; see Section 4.2 in the cited paper. As for the validity of the Zipf-Mandelbrot law, the hagiotoponym sample behaves like the overall IT and the majority of IT regions, in that the law statistically fails in all cases, the Lazio region being a remarkable exception. Hence, we can reasonably state that the set of hagiotoponym cities proxies IT cities without outliers, when ATI is considered in the context of rank-size rules.

Study of the correlations
In view of the above findings, considering that similar empirical laws seem to hold for the city population, Fig. 9, and the city wealth, Fig. 15, as a function of the rank in the relevant variable, it seems of interest to investigate whether such ranks are correlated. An answer is obtained from so called scatter plots (see Bradley, 2007), which give some insight into the correspondence between data lists when the measures themselves are of less interest than their relative ordering.
First, a log-log display of the scatter plot of the <ATI> versus the number of inhabitants, for the 639 Italian hagiotoponym cities, is presented in Fig. 16. The best power law fit indicates a loosely compact set of points along a quasi linear function. A correlation seems plausible, but there are a few outliers. Next, the corresponding scatter plot for the cumulated variables, i.e. ATI(x) and POP(x), thus for the 206 Saints, can be observed in Fig. 17 on log-log axes. Again, a fine power law with an exponent close to 1 is found. Third, Fig. 18 is the corresponding log-log display of the scatter plot of the ATI averaged over 5 years and the number of inhabitants, in cities corresponding to the 206 Saints, but reduced according to the frequency (popularity) of the Saint, i.e. ATI(x)/|F(x)| and POP(x)/|F(x)|. Again the exponent of such a power law fit is close to 1. The regression coefficient is the highest for the cumulated data, as should be expected. However, the R² is much lower (∼ 0.8) in the latter case, indicating much scattering, whence a rather strong deviation from a perfectly correlated popularity (frequency) effect. In order to quantify the correlation between two (necessarily equal size) sets, e.g. between population and ATI data, Kendall's τ rank measure (Kendall, 1938) is usefully calculated. First of all, it is important to note that N = 206 implies p + q = N(N-1)/2 = 21115 when there are no ties, p and q being the number of concordant and discordant pairs, respectively. However, in the present case, two hapax Saints (Bassano and Sosti) have the same cumulated number of inhabitants (2209), whence p + q = 21114. The relevant data, i.e. the Kendall τ, Eq. (1.8) in ESM, and correlation statistics of ranking order between (i) POP(x) and ATI(x); (ii) POP(x) and F; (iii) ATI(x) and F; (iv) POP(x)/|F(x)| and F; (v) ATI(x)/|F(x)| and F, for the 206 Saints (x), with notations as in the text, are given in Table 5.
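The counting of concordant and discordant pairs described above can be made concrete with a short Python sketch. Eq. (1.8) of the ESM is not reproduced here; the sketch assumes the form τ = (p - q)/(p + q), with tied pairs left out of both p and q, which is consistent with p + q dropping from 21115 to 21114 when one pair is tied. The function names and the four-point toy data are hypothetical.

```python
from itertools import combinations


def kendall_counts(xs, ys):
    """Count concordant (p), discordant (q) and tied pairs of two equal-length lists."""
    p = q = ties = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            p += 1
        elif sign < 0:
            q += 1
        else:
            ties += 1  # a tie in either variable is neither concordant nor discordant
    return p, q, ties


def kendall_tau(xs, ys):
    """Kendall tau in the assumed form (p - q) / (p + q)."""
    p, q, _ties = kendall_counts(xs, ys)
    return (p - q) / (p + q)


# Number of pairs for N = 206 Saints without ties.
n_saints = 206
n_pairs = n_saints * (n_saints - 1) // 2

# Hypothetical toy data: one swapped pair out of six.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
p, q, ties = kendall_counts(xs, ys)
tau = kendall_tau(xs, ys)
print(n_pairs, p, q, ties, round(tau, 4))
```

Under this form, τ and the concordance ratio carry the same information, since p/q = (1 + τ)/(1 - τ); for instance τ ∼ 0.85 corresponds to p/q ∼ 12.3.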
Figs. 20 and 21 are useful for a visual inspection of the correlations among the variables in this data set. The marked variations between the various τ coefficients allow interesting observations. First of all, one can compute the Kendall coefficient between the number of inhabitants and <ATI> for the 637 hagiotoponym cities, and obtain 0.81. Even admitting that the wealth regime of the inhabitants likely differs from city to city, it is reasonable to expect that the total <ATI> of a city is in a somewhat direct (simple) relation with its number of inhabitants. In fact, there are variations in the <ATI>/N (ATI per inhabitant) values, for which we found: mean ∼ 9813; median ∼ 9593; standard deviation ∼ 3893. Nevertheless, the relative pair ranking concordance ratio p/q is ∼ 9.7. It may be concluded that there is a high concordance.

It is interesting to point out the analogies and disparities between hagiotoponyms and IT with respect to the Kendall rank correlation, the variables under scrutiny being ATI and number of inhabitants. The Kendall τ for the IT cities is reported in Table 4. It is immediately seen that the Kendall τ is substantially the same for hagiotoponyms (τ ∼ 0.849) and for IT (τ ∼ 0.850). From this, it can be concluded that there is a strong regularity in the correlations between the two types of city ranking in IT. Moreover, it is also important to stress that the discrepancies between IT and hagiotoponyms are found in the number of inhabitants and in the ATI, not in how the corresponding ranks are associated. This further confirms what was already found for the relationship between IT and Saint cities.
The matter is quite different for the cumulated data rank correlations observed at the Saint level: τ ≈ 0.850 or τ ≈ 0.788, for which p/q ∼ 12.35 or p/q ∼ 8.44, respectively for the total cumulated data per Saint or for the reduced value taking into account the Saint popularity: columns (i) and (ii) in Table 5. This large variation in p/q is, surprisingly, pointing to a redistribution of the ranks, following some sort of randomization. The effect is amplified when the correlations with the Saint popularity are examined: in this case, τ ∼ 0.510 for POP(x), with the ratio p/q ∼ 3.08, and a similar τ ∼ 0.518 is found for the rank correlation between ATI(x) and the frequency F of a Saint; in the latter case, the ratio p/q ∼ 3.15 (see columns (iii) and (iv) in Table 5). Therefore, it can be concluded that the cumulated population of cities for a given Saint is far from being strongly correlated with the Saint popularity. When the reduced data per Saint are considered, the ranking correlations with respect to the Saint popularity lead to a very small τ ∼ 0.068 or τ ∼ 0.102. Note that the ratios p/q are ∼ 1.15 and ∼ 1.22; see columns (v) and (vi) in Table 5. From a purely statistical perspective, this result comes very close to indicating independence of the rank sets. Finally, on one hand, this indicates how careful one should be in drawing conclusions from one statistical indicator only. On the other hand, it points to a "Saint" effect, either positive or negative, depending on the considered variable.
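The τ values and the concordance ratios p/q quoted above are two views of the same count. As a minimal sketch, assuming τ = (p − q)/(p + q) and ignoring ties, the ratio r = p/q fixes τ through τ = (r − 1)/(r + 1); the helper names below are hypothetical.

```python
def tau_from_ratio(ratio):
    """Kendall tau implied by the concordance ratio p/q, assuming no ties."""
    return (ratio - 1.0) / (ratio + 1.0)


def ratio_from_tau(tau):
    """Inverse relation: concordance ratio p/q implied by a given tau."""
    return (1.0 + tau) / (1.0 - tau)


# Ratios quoted in the text for the Saint-level rank correlations.
quoted_ratios = (12.35, 8.44, 3.08, 3.15, 1.15, 1.22)
for ratio in quoted_ratios:
    print(f"p/q = {ratio:5.2f} -> tau = {tau_from_ratio(ratio):.3f}")
```

Under these assumptions the first four ratios give τ close to 0.850, 0.788, 0.510 and 0.518, and the last two give τ close to 0.07 and 0.10. Since |τ| cannot exceed 1, the relation also offers a quick sanity check on any reported pair of values.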

Conclusion
In summary, the objective of the study was to describe Italian society with reference to one of its key aspects: the cult of Catholic Saints and its reflection in the toponyms of Italian cities. With this aim, we have found that:
(0) there exists an exhaustive list of cities in IT whose toponym derives from a human Saint name. Establishing it required a linguistic approach besides a religiosity filter. The resulting dataset can be used for subsequent studies dealing with related problems;
(i) there exists a rather simple empirical law for the distribution of Saint names as hagiotoponyms of cities, in particular in IT, and the hapax Saints have been identified;
(ii) there exists a rather simple empirical law for the distribution of population sizes for such cities;
(iii) there exists a rather simple empirical law for the wealth distribution of such cities, measured through their Aggregated Tax Income;
(iv) there exists some correlation between such data. Specifically, there is a high concordance between the ranking of the average ATI and that of the population for the hagiotoponyms;
(v) there are qualitatively reasonable causes which can be given for the findings.
The empirical laws are not trivial ones, and thereby call for further mathematical investigation. This remark holds for both the city population and the ATI. Correlations exist between the two variables, but with some loose ties because of the outliers. Our conclusion on city population vs. ATI correlations tends to indicate that the cities with hagiotoponyms are not drastically different from those in the rest of IT. In particular, it seems that hagiotoponym cities may serve as a proxy for the overall IT cities when outliers are removed, at both the economic and the demographic level. Moreover, the rank-rank relationship between ATI and number of inhabitants, obtained by employing the Kendall τ, leads to a robust outcome of concordance. The results on Saint popularity, in fine, seem to indicate a lack of correlation between the latter and the two main variables which we have examined.

On (v), further explanations are needed. Let us distinguish female and male names. The most important cult to have developed in Christianity is that of the Virgin Mary, whence St. Maria is naturally an appealing Saint name for establishing a cult in a city, and has given her name to several places. Yet, it seems interesting that she is less popular than Pietro and Giovanni, both of whom were the closest apostles to Jesus, according to Christian tradition. Interestingly, Martino comes third, before Giorgio. Martin is also very popular in France and other countries, where his name is usually more popular (Ausloos, unpublished) than Giovanni (Jean) and Pietro (Pierre). Giorgio's 4th place is interesting: his name occurs often in IT, but also all over Europe. The popularity of George is likely due to the religious role played by Saint George, who killed the dragon symbolizing the devil and its hell. Agatha, as the second most popular female Saint, is also interesting.
She is one of the most important martyrs of Sicilian Christianity: she has been venerated at least as far back as the sixth century, "because" she had her breasts cut off, whence her interest for cults by women seeking gynecological miracles. Why men, who usually led the populations in those early Christian times, would give her name to a city is nevertheless an open question.

Table 5 (ESM). Observe the convex curvature of the cumulative distribution function for the male data, with a slight king effect for males, due to the large number (101) of hapax male Saints, i.e. in the "small size" region.

Visually, this plot appears to display smoothly decaying data, but the regression coefficient R² is pretty low; alas, as "better" seen in Fig. 8, it is not possible to find a "nicely simple" fit.

see text) Saints generating Italian hagiotoponyms, cities ranked in decreasing order of the number of inhabitants; four simple law fits are shown with their corresponding regression coefficient. Visually, from the data scattering, it cannot be expected that a simple empirical law can be found. Indeed, none of the four simple laws indicated gives a nice regression coefficient. Note that these trial fits with simple empirical laws for the "all Saints" case, i.e. Fig. 7 and Fig. 8, are not nice enough to suggest a decomposition between males and females. It is also worth noting that Fig. 8 confirms that, even if a regularity of power-law type does not apply to an entire set of data, such a regularity may exist for qualified subclusters. This is precisely the case of the hagiotoponym Italian cities, for which four "regimes" seem to emerge. One is for the 7 or 8 most frequent (in population number) names. The next regime contains about 30 names, and the next one about 50. The final regime contains about 100 names. The first and third regimes have approximately the same power law exponent.

outliers.
It seems that the Lav3 function (green) fits the C data very well; in contrast, the fit for the 8092 is not so good.

Figure 12: Also in this case, K represents the 8092 cities, while C is the set of the upper 8092 − 80 cities, thus removing 80 outliers. As in Fig. 12, one barely sees that the Lav3 function (green) fits the C data very well; in contrast, the fit for the 8092 is not satisfactory.

Since these trial fits with simple empirical laws in the "all Italian Saints" case are not convincing enough to represent the data, they suggest pursuing further a data decomposition between males and females.

is called the rank-size scaling law and has often been applied to city sizes. The particular case α = 1 is thought to represent a desirable situation, in which forces of concentration balance those of decentralization. Such a case is called the rank-size rule. The interested reader is referred to Gabaix (1999), Gibrat (1957). Hence, the rank-size relationship has been frequently identified and sufficiently discussed to allow us to base much of the present investigation on such a simple law. This may be "simply" because the rank-size relationship can be applied to a wide range of specific situations (see Martínez-Mekler et al., 2009) and because Zipf's law can be obtained in different models: one example is tied to the maximization of the entropy concept in Chen (2012); another stems from the law of proportionate effect, the so-called Gibrat's law (Gibrat, 1957). Thus, let us express Zipf's law in other words: the rank-frequency relationship, i.e. the relationship between the number f (frequency) of occurrences of an "event" and its rank r (Hill, 1974), consists of an inverse power law:
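An inverse power law of this kind is usually estimated by ordinary least squares on log-log axes. The sketch below is only an illustration of that technique, assuming the generic form y_r = a · r^(−α) with hypothetical names a and alpha; it is not the authors' fitting code, and the synthetic sizes are not the paper's data.

```python
import math


def fit_power_law(sizes):
    """Least-squares fit of y_r = a * r**(-alpha) on log-log axes.

    Sizes must be positive; they are ranked in decreasing order first.
    Returns (a, alpha, r_squared).
    """
    ranked = sorted(sizes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(size) for size in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return math.exp(intercept), -slope, r_squared


# Synthetic sizes obeying the rank-size rule exactly (alpha = 1).
synthetic_sizes = [1000.0 / rank for rank in range(1, 51)]
a, alpha, r_squared = fit_power_law(synthetic_sizes)
```

On real city data the fit is weighted towards the many high-rank (small) cities, which is one reason why a king or queen effect at low rank hardly changes the regression coefficient.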

Appendix
A display of the rank-size (or rank-frequency, two names for the same concept) relationship for cities bearing a Saint name, ranked in decreasing order of their "frequency", Eq. (1.2), is shown in Fig. 4. The best (least-squares) corresponding power law fits are indicated; the gender (f or m) is distinguished beside the overall (a) size (s) data. It can be observed that the power law fits look quite acceptable, being almost perfect in the "female case".

Another law is attributed to Zipf: the size-frequency relationship, i.e. the link between the frequency f and the size s of an "event", is in this case also an inverse power law. Of course, deviations from the simple law often occur, as illustrated in Fig. 7 and Fig. 14. In fact, there is no obligation for the size-frequency data to be enveloped by a purely convex or purely concave function (Egghe and Waltman, 2011). A so-called "king effect" (Laherrère and Sornette, 1998), i.e. a sharp upturn at low r values, often exists. A leveling off at low r, the queen effect, can also occur, as in Fig. 4 (see also Ausloos, 2013). In such a case, a Zipf-Mandelbrot-like (ZM), sometimes called Bradford-Zipf-Mandelbrot-like (BZM), law might be considered as more realistic (Fairthorne, 1969). It involves three parameters (J*, ν and ζ).

Moreover, it can also be asked how many times one can find an "event" greater than some size s, i.e. within the size-frequency relationship. Pareto (1896) found that the cumulative distribution function of such events follows an inverse power of s, or in other words, P(Y > s) ∼ s^(−κ), where Y is the random variable of the size, P a probability measure and κ a scalar. Again, this is quite an approximation, as illustrated in Fig. 6. It is important to observe that the Yule-Simon distribution can be used to reproduce a Zipf law, but it introduces an exponential cutoff in the upper tail (Rose et al., 2002).
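Two of the quantities discussed above can be sketched numerically. The block below assumes the commonly used Zipf-Mandelbrot form J*/(ν + r)^ζ, which may differ in detail from the authors' equation, and builds an empirical counterpart of the Pareto-type cumulative distribution as the fraction of observations larger than each size. All names and toy values are hypothetical.

```python
def zipf_mandelbrot(rank, j_star, nu, zeta):
    """Assumed ZM form: j_star / (nu + rank) ** zeta (nu = 0 gives Zipf)."""
    return j_star / (nu + rank) ** zeta


def empirical_ccdf(values):
    """Return [(s, fraction of observations strictly greater than s)].

    One point per distinct size s, in increasing order of s.
    """
    n = len(values)
    points = []
    for size in sorted(set(values)):
        larger = sum(1 for value in values if value > size)
        points.append((size, larger / n))
    return points


# Hypothetical toy frequencies: three hapaxes, two names used twice,
# one name used five times.
toy_frequencies = [1, 1, 1, 2, 2, 5]
ccdf = empirical_ccdf(toy_frequencies)

# With nu > 0 the curve levels off at low rank (queen-like flattening).
flattened = [zipf_mandelbrot(r, 100.0, 2.0, 1.0) for r in (1, 2, 3)]
```

Plotting the `ccdf` points on log-log axes is one simple way to inspect whether an inverse power of s is a reasonable approximation over the available range.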
The stretched exponentials (Laherrère and Sornette, 1998) and log-normal distributions (Montroll and Shlesinger, 1983) usually reproduce one of the tails but not the other. Usually, such deviations do not dramatically change the correlation coefficient, since the tails do not have a great impact upon this coefficient. Moreover, such distributions assume a long (infinite) tail, which is antagonistic to the concept of finite sizes, when there is a true maximum rank r_M. When an inflection point occurs, the 2-parameter form, the so-called Lavalette function (Lavalette, 1966), of the rank-frequency (or rank-size) relationship is given by Eq. (1.6), and its simple generalization into a 3-parameter free function by Eq. (1.7). Note also that the role of r as independent variable in Zipf's law, specifically Eq. (1.1), is taken by the ratio r/(N − r + 1) between the descending and the ascending ranking numbers. The semi-logarithmic graph shows a reverse sigmoidal S-shape (or an inverse N-shape), which cannot be provided by Zipf's law. By the way, in a double-logarithmic diagram the downward deviation from Zipf's straight line at high rank is much emphasized. More complicated forms with many more parameters generalize the Lavalette form (see Ausloos, 2014a, 2014b and Voloshynovska, 2011).

(1.9) similar to the classical one, when the distribution can be approximated by the normal distribution, with mean zero and variance, in order to emphasize the significance of the coefficient τ. From a purely statistical perspective, under the null hypothesis of independence of the rank sets, such a sampling would have an expected value τ = 0 and Z = 0. A website (Wessa, 2012) allows immediate calculation of τ.
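Both ingredients of this passage can be illustrated briefly. The sketch assumes a two-parameter Lavalette form in which the Zipf variable r is replaced by r/(N − r + 1), with hypothetical parameter names kappa and chi, and it uses the standard large-sample variance of Kendall's τ under independence, 2(2N + 5)/(9N(N − 1)), for the Z statistic. Neither is claimed to reproduce the authors' exact Eqs. (1.6) and (1.9).

```python
import math


def lavalette(rank, n, kappa, chi):
    """Assumed 2-parameter Lavalette form: kappa * (r / (n - r + 1)) ** (-chi)."""
    return kappa * (rank / (n - rank + 1)) ** (-chi)


def kendall_z(tau, n):
    """Z = tau / sigma_tau, using the normal approximation under independence."""
    sigma_tau = math.sqrt(2.0 * (2 * n + 5) / (9.0 * n * (n - 1)))
    return tau / sigma_tau


# Hypothetical parameters: 9 ranked items, kappa = 100, chi = 0.5.
n_items = 9
curve = [lavalette(r, n_items, 100.0, 0.5) for r in range(1, n_items + 1)]

# Significance of a tau of 0.85 measured on N = 206 ranked items.
z_value = kendall_z(0.85, 206)
```

The curve equals kappa at the middle rank, where r = N − r + 1, and bends downwards at high rank, which is the deviation from Zipf's straight line mentioned above. For N = 206 and τ = 0.85 the Z value is of the order of 18, far from the value 0 expected under independence.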
frequently, Salvo) is the usual Italian appellation of Jesus, who is not a Saint. Item 17 denotes a village which was named Sancti Boni or Castrum Bonum at the time of its foundation, to revere the holy goodness. However, it does not point to a specific Saint, and has been removed from the preliminary list. Nevertheless, we kept Michele (11 times) and Raffaele (once) as bona fide Saints, although they are not humans, but archangels. However, they are so anthropomorphic that they can here be assimilated to human Saints.

Linguistic transformations
Linguistic transformations have been treated case by case. As an example, San Lorenzello can be identified with San Lorenzo. Therefore, they belong to the same class (that of Lorenzo). In several "difficult" situations, the identification procedure has been analogous to that of the strange cases, i.e. reading the historical notes about the municipalities. The case of Monte San Pietrangeli, in Marche, is a nice example. We did not find any Saint named Pietrangeli. The historical notes provide a different name for this municipality, still used by its inhabitants, which is Monsampietro. The reference Saint is then Pietro, the first Pope. Such linguistic transformation cases have been collected in Table 3 (Electronic supplementary material).

As hinted above, other complex cases are the municipalities with a hagiotoponym written in a foreign language. There are some (16) French names in Valle d'Aosta and some (9) German names in Trentino Alto Adige. For the German names, the bilingualism of that Region applies, and there is a (legal) Italian counterpart (translation). The adopted criterion has been to translate, when possible, the French names into Italian, and to take only the available Italian translation of the German names. Results are shown in Table 4 (Electronic supplementary material). The number of different Saints in the final list of municipalities is thus 206, as reported in Tables 5 and 6 (Electronic supplementary material), comprising 31 females and 175 males, of which 15 and 101, respectively, are attributed to only 1 city (hapaxes); see

Region                 Cities  Hagiotoponyms  Saint frequency  Males  Females
…                         377             26               26     20        6
SICILIA                   390             40               40     25       15
TOSCANA                   287             20               20     17        3
TRENTINO ALTO ADIGE       333             16               16     14        2
UMBRIA                     92              5                5      3        2
VALLE D'AOSTA              74             16               16     15        1
VENETO                    581             54               54     45        9
Total                    8092            637              639    548       91

Table 2: Summary table of the hagiotoponym cities in IT. Data are disaggregated at the regional level, with regions in alphabetical order.
The discrepancy between the number of hagiotoponyms (637) and the Saints "frequency" (639) is due to the presence of cities with a name containing two Saints: Cosma and Damiano, Marzano and Giuseppe, see ( * ) in the