A network-based measure of the socio-economic roots of the migration flows

This paper provides a unified view for defining a measure of the social and economic reasons behind migration flows. To this aim, worldwide migration flows are presented in the context of complex networks and a related socio-economic indicator is conceptualized. The ingredients of the indicator also include the economic strengths of the countries and how they behave in terms of community structure, where “community” is to be understood in the sense of how countries interact in terms of immigration and emigration. Empirical analyses on a wide set of real data validate the theoretical framework, hence giving a comprehensive quantitative view of the roots of worldwide migration flows.


Introduction
The movement of individuals from their origin countries to new ones represents, and has represented in the past, one of the most pervasive demographic phenomena at the international level. A number of reasons can be found behind migrations, and most of them are of a socio-economic nature. Migrants often leave their countries to escape from ethnic tensions and conflicts, civil wars, poverty or the absence of a satisfactory labour market. This paper aims at providing a new perspective for exploring the socio-economic reasons behind migration flows. Specifically, we propose a novel socio-economic indicator with the purpose of defining a synthetic measure of the incoming and outgoing flows between countries in terms of: (i) the economic characteristics of the origin and destination countries; (ii) the interconnectedness of the countries in the overall context of migration flows; (iii) the size of the migration flows from the origin country to the destination one.
The issues listed in points (i)-(iii) are reasonably assumed to broadly include the entire set of motivations for explaining migration flows. As a very relevant example, consider the motivations for migrating related to social connections with people living in a given destination country. In this case, the roots of migration can be found in the economic strength of the hosting country (which gave the migrants' connections the possibility to take a position in its socio-economic context) and in how easily the final destination can be reached through other transit countries.
Several papers have dealt with the analysis of the roots of migrations. Migration is a global phenomenon, which is the effect of many different factors. For instance, sociological and economic motivations have driven (and still drive) migrations (see [2], [13], [15]). A crucial hypothesis is that migration flows go from poor to rich countries. This results from disequilibrium in the global distribution of income and wealth, which encourages movement towards richer countries in the attempt to improve living conditions (see [4], [19]). Other theories about migrations are related to the spatial distance between countries, as well as to the mass of the migrants. A well-explored literature focuses on the "gravity model", in which migration is directly related to the populations of the origin and destination countries, and inversely related to the distance between them ([1], [3], [12]). For instance, in [18] the mass of migrant flows decreases as spatial distance increases, due to travel time, monetary costs and also cultural differences, so that shorter distances facilitate back-and-forth migration.
An interesting perspective on the roots of migration is offered by [23]. Such a point of view is rather different from ours, since the authors deal with a concept of migration related to changes of opinion, while we consider the socio-economic contextualization of the migrants. The physical nature of migration is well explained in [24], where the motion of a substance (or of humans) is modelled over the nodes of a network. The paper is of particular interest because it assumes that the nodes can also exhibit an attracting behaviour. In this respect, our paper develops arguments similar to those of [24], in that countries are able to polarize migrants. Such a property is here captured by the proposed indicator.
Global migration flows between countries can naturally be described in terms of complex networks (see [9], [22]). However, the network analysis of internal migration patterns has not been explored enough. A recent contribution in this respect is given by [11], where the network structure of US migration is analysed on a yearly basis, identifying the communities that arise as well as their trends.
One of the most recent and prominent contributions is due to [25], where the network of global migration is considered and empirically explored over the period 1990-2013. That paper uses temporal and cross-sectional exponential random graph methods for identifying the determinants of migrations and assessing the presence of noticeable clusters in migration flows.
We feel close to [25]. Indeed, the quoted paper advances a methodological proposal for simultaneously combining different aspects of geographic, demographic, economic, religious, linguistic and historical nature, all related to migration. In the present paper too, a unified framework is developed and the determinants of migration are evaluated jointly. However, to the best of our knowledge, this is the first paper proposing a synthesis of such elements through a novel indicator, which is able to capture in a unified framework the universe of socio-economic motivations for explaining the immigration phenomenon.
Our approach is based on the theory of complex networks, which seems particularly appropriate for our purposes. Countries are viewed as nodes of a weighted network, whose directed weighted arcs model both the presence and the size of the migration flows from an origin country to a destination one. In so doing, any node has an in-weight and an out-weight associated with each adjacent node, capturing both immigration and emigration phenomena.
The interconnected structure is represented by means of the weighted clustering coefficient of the nodes of the network, with a special reference to immigration for destination countries and emigration for origin ones. In particular, we here adopt the definitions proposed by [8] and [20].
Indeed, the interconnection plays a fundamental role in determining the migration patterns. A migration from a country to another could be the result not only of a direct movement but also the effect of the transition through a third country. In other words, the presence of weighted triangles in this network matters.
The economic strength of the countries is identified through their GDPs.
All the considered quantities have been suitably normalized, in order to derive a relative indicator and to allow comparisons. Moreover, we propose a global analysis of the immigration and emigration flows related to single countries and sets of countries by means of a local indicator related to the connections between pairs of countries. In so doing, we are able to present and discuss a comprehensive view of the reasons behind the migration flows.
In dealing with the definition of a social indicator in a complex network environment, we are in line with a wide strand of the literature. The mainstream literature in social network analysis focuses on local network measures, such as vertex centralities, in order to assess the power or influence of a node/link in a whole system, as well as on global network parameters, such as assortativity, average path length or average clustering coefficient. For instance, in [16] the author provides a composite index that measures the bundles of institutions channelling the positive effect on economic prosperity.
The rest of the paper is organized as follows. Section 2 describes the network of migration flows. This Section also provides the basic notation used throughout the paper. Section 3 contains the construction of the socio-economic indicator, which is the core of the study. Section 4 is devoted to the empirical experiments and to the discussion of the obtained results. The last Section offers some concluding remarks.

The network of migration flows
First, we briefly recall the network terminology that will be used in the paper. For a reference on graph and network theory, the reader can refer to [14], [21].
A graph $G = (V, E)$ is a pair of sets $V$ and $E$, where $V$ is the set of $N$ vertices (or nodes) and $E$ is the set of $M$ pairs (links) of vertices of $V$; if $(i, j) \in E$ or $(j, i) \in E$, then vertices $i$ and $j$ are adjacent.
A link $(i, i) \in E$ is called a loop. A weight $w_{ij} > 0$ can be associated with each link $(i, j)$, so that a weighted graph is obtained. The adjacency relationships between vertices of $G$ are described by a non-negative, real $N$-square matrix $A$ (the adjacency matrix). The degree $d_i$ of a node $i$ is the number of links incident with it. In the case of weighted graphs, the so-called strength of the node can also be computed. Formally, it is simply the sum of the weights over all links incident with it:
$$s_i = \sum_{j \in V} w_{ij}.$$
A graph is directed if it is characterized by directed links (or arcs) between nodes. In this case $(i, j) \in E$ does not imply $(j, i) \in E$. The in-degree $d_i^{in}$ of a node $i$ is the number of links pointing towards $i$; the out-degree $d_i^{out}$ of $i$ is accordingly defined as the number of links originating from $i$.
The total degree of a node is the sum of in-degree and out-degree, $d_i^{tot} = d_i^{in} + d_i^{out}$. In-strength and out-strength are defined analogously as:
$$s_i^{in} = \sum_{j \in V} w_{ji}, \qquad s_i^{out} = \sum_{j \in V} w_{ij}.$$
The total strength is the sum of in- and out-strength.
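As a minimal sketch of the definitions above, the following Python fragment computes in/out degrees and in/out strengths from a weight matrix. The function name `degrees_and_strengths`, the toy matrix `W`, and the convention that an arc counts towards the degree only when its weight is positive are illustrative assumptions, not part of the paper.

```python
def degrees_and_strengths(w):
    """Return (in_deg, out_deg, in_str, out_str) for a directed weighted graph.

    w[i][j] is the weight of the arc i -> j. Assumption: an arc contributes
    to the degree only when its weight is strictly positive; loops are ignored.
    """
    n = len(w)
    others = lambda i: (j for j in range(n) if j != i)
    out_deg = [sum(1 for j in others(i) if w[i][j] > 0) for i in range(n)]
    in_deg = [sum(1 for j in others(i) if w[j][i] > 0) for i in range(n)]
    out_str = [sum(w[i][j] for j in others(i)) for i in range(n)]
    in_str = [sum(w[j][i] for j in others(i)) for i in range(n)]
    return in_deg, out_deg, in_str, out_str


# Hypothetical 3-node example (rows are origins, columns are destinations).
W = [[0.0, 0.5, 0.0],
     [0.1, 0.0, 0.3],
     [0.0, 0.1, 0.0]]
in_deg, out_deg, in_str, out_str = degrees_and_strengths(W)
total_deg = [a + b for a, b in zip(in_deg, out_deg)]
```

The total degree and total strength follow by summing the "in" and "out" lists element by element.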
We model the migrants' flow through a network, where nodes are countries and links identify the migrants moving from one country to another. Since a migration can in principle occur between any two countries $i$ and $j$ and in both directions, the network $G$ is complete and all nodes are connected through bilateral directed links. Given the nature of the migration phenomenon, loops are not present in the network; therefore the total number of arcs is $M = N(N - 1)$.
The analysis will be performed on a single period, taking a "picture" of one reference year. For the analysis, we consider the entire set of countries according to the United Nations (UN) classification 1, extended in order to also take into account dependencies 2, areas of special sovereignty 3 and states in free association 4. In particular, we use 231 distinct countries.
Links are suitably weighted by relative amounts of migration flows. Specifically, we denote by $MIGR_{ij} \in \mathbb{N}$ the number of people migrating from the origin country $i$ to a destination country $j$. Given $i, j \in V$, the weight of the link $(i, j)$ is defined as:
$$w_{ij} = \frac{MIGR_{ij}}{\sum_{h, k \in V} MIGR_{hk}}. \tag{3}$$
Note that, since the network is complete, the absence of a migration flow from $i$ to $j$ is not associated with a missing link $(i, j)$ but, rather, with an existing link with $w_{ij} = 0$. This allows us to capture in our analysis also the effect of relations between countries where no migrants' flows are observed.
We also assign to each node an attribute in order to quantify a specific characteristic of the country. More precisely, for each $i \in V$, we denote by $\alpha_i \in [0, 1]$ the relative weight of the node $i$ in the overall system. In our context, $\alpha_i$ represents the relative richness of country $i$, which is measured through the ratio between the GDP (Gross Domestic Product) of the country $i$ and the total GDP of all the countries populating the network. Formally:
$$\alpha_i = \frac{GDP_i}{\sum_{k \in V} GDP_k},$$
where $GDP_i$ is the GDP of country $i$ at the reference year. It is evident that $\sum_{k \in V} \alpha_k = 1$; hence $\alpha_i$ represents the proportion of richness of the country $i$ over the totality of the countries.
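The two normalizations described in this Section (link weights as shares of the total migrant count, node attributes as shares of total GDP) can be sketched as follows. The function names and the toy figures in `MIGR` and `GDP` are hypothetical and only serve as an illustration.

```python
def flow_weights(migr):
    """Normalize a matrix of migrant counts so that all weights sum to 1.

    migr[i][j] is the number of people migrating from i to j; the diagonal
    is assumed to be zero (no loops).
    """
    total = sum(sum(row) for row in migr)
    if total == 0:
        raise ValueError("no migration flows observed")
    return [[m / total for m in row] for row in migr]


def gdp_shares(gdp):
    """Return each country's share of the total GDP of the network."""
    total = sum(gdp)
    return [g / total for g in gdp]


# Hypothetical data for three countries.
MIGR = [[0, 300, 0],
        [50, 0, 100],
        [0, 50, 0]]
GDP = [200.0, 700.0, 100.0]

W = flow_weights(MIGR)    # a zero flow stays in the matrix as a zero weight
ALPHA = gdp_shares(GDP)   # the shares sum to 1 by construction
```

Keeping zero flows as explicit zero weights mirrors the remark above that a missing flow is an existing link with null weight.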

Construction of the socio-economic indicator
In this Section we propose a new socio-economic indicator of the flow of migrants from a country i to a country j. This indicator aims at providing a measure of the economic and social reasons behind the migration flows.
Indeed, there is wide evidence that migrants generally move from poor countries to rich ones, preferring locations highly connected with others so as to be free to move again in the future from the hosting country to a new one. A socio-economic indicator related to the migrant flows should reflect both aspects, modelling both the macroeconomic status of the considered countries and the neighbourhood structure associated with each country.
To this end, we include in the definition the GDP of the country/node $i$ (i.e. the parameter $\alpha_i$) as well as the local clustering coefficient of $i$. Furthermore, the definition of the local clustering coefficient takes into account both the bidirectional links and the weights.
4 Examples are self-governing states in free association, such as Niue and the Cook Islands, which are in free association with New Zealand.
Since the distribution of weights is highly skewed, because of significant differences in migrant stock between countries, the well-known clustering coefficients proposed in [8] may fail to depict the heterogeneity of countries, providing values close to zero. Furthermore, as shown in [20], the coefficient proposed in [8] may fail when weighted complete graphs are considered 5. Hence, we follow the idea proposed in [20] of considering weights in a binary clustering, and we also adapt the coefficient to our framework in order to capture the effects of links with $w_{ij} = 0$. In particular, as in [20], we modify the adjacency matrix by fixing a threshold $s \in [0, \max_{i,j} w_{ij}]$ and defining $A_s$, whose elements are:
$$a_{ij}^{s} = \begin{cases} 1 & \text{if } w_{ij} \geq s, \\ 0 & \text{otherwise,} \end{cases} \qquad i \neq j.$$
Hence, $A_s$ is the adjacency matrix describing the existing links in the network having weight $w_{ij}$ at or above the threshold $s$.
The idea is to capture the mean cluster prevalence of the network by looking at a zoomed-out level where only the strongest edges (i.e. edges with a weight at or above a given threshold) are visible.
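The thresholding step can be illustrated with a short sketch. It assumes the rule "keep the arc when its weight is at or above the threshold" stated above; the function name `threshold_adjacency` and the example matrix are hypothetical.

```python
def threshold_adjacency(w, s):
    """Binary adjacency matrix keeping arcs i -> j (i != j) with w[i][j] >= s."""
    n = len(w)
    return [[1 if i != j and w[i][j] >= s else 0 for j in range(n)]
            for i in range(n)]


# Hypothetical weights; a zero entry is an existing arc with null weight.
W = [[0.0, 0.6, 0.0],
     [0.1, 0.0, 0.2],
     [0.0, 0.1, 0.0]]

A_all = threshold_adjacency(W, 0.0)     # s = 0 keeps every arc: complete graph
A_strong = threshold_adjacency(W, 0.2)  # only the two strongest arcs survive
```

With $s = 0$ the zero-weight links are retained, so the complete structure of the migration network is recovered; raising $s$ progressively hides the weaker flows.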
To this end, we compute a local clustering coefficient $C_i(A_s)$ for directed networks. The coefficient was proposed in [8] for the binary and directed case, and it is conveniently adjusted to our framework. In particular, the denominator is arranged so as to also include in the analysis those links having weight equal to zero. In other words, we take into account that migrants from an origin country can hypothetically move to every other destination country. Therefore, the total degree that appears in the denominator of the original formula in [8] is here replaced by the total degree of a node in a complete graph (i.e. $2(N - 1)$). As in [8], bilateral arcs (whose number is $2(N - 1)$ in a complete graph) have to be removed from the formula, as they represent "false" triangles, being formed by $i$ and by a pair of directed arcs pointing to the same node, e.g., $i \to j$ and $j \to i$.
Since the migrants' flow for a country is both incoming and outgoing, we are able to take into account different patterns among neighbouring countries, reflected by specific clustering coefficients ($C_i^{in}(A_s)$ and $C_i^{out}(A_s)$). Also in this case we refer to the in- and out-clustering coefficients proposed in [8], adapting them to our specific context.
In-clustering considers triangles such that there are two arcs incoming into $i$ ($j \to i$, $k \to i$, $j \to k \lor k \to j$), i.e. "in" triangles. Hence, it could give an idea of the movement of incoming migrants, also passing through neighbouring countries. Analogously, out-clustering focuses on triangles such that there are two arcs coming out of $i$ ($i \to j$, $i \to k$, $j \to k \lor k \to j$), i.e. "out" triangles, giving an idea of the emigration towards neighbouring countries.
The process is repeated varying the threshold $s$, and the clustering coefficient $C_i$ of a node $i$ is computed as the average of the values $C_i(A_s)$. In a similar way, we can also obtain $C_i^{in}$ and $C_i^{out}$ by averaging over all values of $C_i^{in}(A_s)$ and $C_i^{out}(A_s)$ respectively. Notice that both local in- and out-clustering coefficients are in the range $[0, 1]$, for each $i \in V$.
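The following sketch shows one possible reading of the in- and out-clustering computation. It is not the paper's formula: it assumes that "in" ("out") triangles are counted over ordered pairs of neighbours and divided by $(N-1)(N-2)$, the value the count takes in a complete directed graph, so that the coefficients lie in $[0, 1]$. The grid of thresholds is also an assumption, as the text does not specify it, and all function names are hypothetical.

```python
def threshold_adjacency(w, s):
    """Binary adjacency matrix keeping arcs i -> j (i != j) with w[i][j] >= s."""
    n = len(w)
    return [[1 if i != j and w[i][j] >= s else 0 for j in range(n)]
            for i in range(n)]


def in_out_clustering(a):
    """In- and out-clustering of every node of a binary directed graph.

    Assumed normalization: (N-1)(N-2), i.e. the number of ordered pairs of
    distinct neighbours a node has in a complete graph (requires N >= 3).
    """
    n = len(a)
    if n < 3:
        raise ValueError("at least three nodes are needed to form a triangle")
    denom = (n - 1) * (n - 2)
    c_in, c_out = [], []
    for i in range(n):
        tri_in = tri_out = 0
        for j in range(n):
            for k in range(n):
                if i == j or i == k or j == k:
                    continue
                tri_in += a[j][i] * a[k][i] * a[j][k]    # j -> i, k -> i, j -> k
                tri_out += a[i][j] * a[i][k] * a[j][k]   # i -> j, i -> k, j -> k
        c_in.append(tri_in / denom)
        c_out.append(tri_out / denom)
    return c_in, c_out


def averaged_clustering(w, thresholds):
    """Average the in/out clustering over a (user-chosen) list of thresholds."""
    n = len(w)
    sum_in, sum_out = [0.0] * n, [0.0] * n
    for s in thresholds:
        c_in, c_out = in_out_clustering(threshold_adjacency(w, s))
        sum_in = [x + y for x, y in zip(sum_in, c_in)]
        sum_out = [x + y for x, y in zip(sum_out, c_out)]
    m = len(thresholds)
    return [x / m for x in sum_in], [x / m for x in sum_out]
```

Because each ordered pair $(j, k)$ is visited, a triangle closed by $j \to k$ and one closed by $k \to j$ are both counted, which matches the "$j \to k \lor k \to j$" description of the triangles.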
As already mentioned above, such a quantity provides information on the neighbourhood structure associated with a country $i$ in terms of migration flows: a high value of $C_i^{in}$ (or $C_i^{out}$) is associated with a country $i$ with a large quantity of incoming (or outgoing) flows. We study them separately in order to test for possibly different patterns in incoming and outgoing flows. Now, we provide the definition of the indicator at the level of a single directed migration flow between two countries, and then we define how to combine them at country, macro-area or global level.
We introduce the local economic indicator associated with the migration flows from country $i$ to country $j$, denoted by $I_{ij}$, as follows:
$$I_{ij} = w_{ij}\left[\alpha_i\left(1 + C_i^{out}\right) - \alpha_j\left(1 + C_j^{in}\right)\right]. \tag{10}$$
Definition (10) offers a measure of the economic and social ground of the migration flows from country $i$ to country $j$. In particular, it takes into account the difference between the economic strengths (measured through the $\alpha$'s) of the involved countries, which is remarkably amplified if the considered country forms a tight community with the adjacent countries (high value of the clustering coefficient). In particular, the clustering coefficient of a country $i$ acts in formula (10) as a one-period rate of return for the economic parameter $\alpha_i$. In this respect, the term $\alpha_i(1 + C_i^{out})$ (or $\alpha_j(1 + C_j^{in})$) can be viewed as an indicator which provides a synthesis of the economic strength of $i$ and of the level of the "out" ("in") community around it.
It is worth listing the properties of the quantity defined in (10).
• $I_{ij} \in [-2, 2]$, for each $i, j \in V$. This is a very important property, allowing a consistent comparison among migration flows in terms of the economic wealth of the involved countries.
• $I_{ij} > 0$ ($I_{ij} < 0$) when $w_{ij} > 0$ and $\alpha_i(1 + C_i^{out}) > \alpha_j(1 + C_j^{in})$ ($\alpha_i(1 + C_i^{out}) < \alpha_j(1 + C_j^{in})$). This means that the origin country can be briefly seen as richer and more (poorer and less) interconnected, in terms of migration flows, with the surrounding environment than the destination one.
The situation $I_{ij} > 0$ can be characterized with more detailed remarks. In this case, one has
$$\alpha_i > \alpha_j \, \frac{1 + C_j^{in}}{1 + C_i^{out}} \quad \text{or, equivalently,} \quad C_i^{out} > \frac{\alpha_j\left(1 + C_j^{in}\right)}{\alpha_i} - 1.$$
Thus, the socio-economic indicator $I_{ij}$ is positive when the origin country $i$ has economic strength or community level above some thresholds that also depend on the parameters related to the destination country.
The same arguments can be suitably reversed to explain the situation $I_{ij} < 0$.
One can expect that the case $I_{ij} > 0$ will be the exception rather than the rule. Empirical data will support this intuition (see the next Section).
• $I_{ij} = -2$ is associated with the case in which $\alpha_i = 0$, $\alpha_j = 1$, $C_j^{in} = 1$ and $w_{ij} = 1$. Thus, this extreme case captures the situation in which the total amount of migrants flows from the poorest country (even one with null GDP) to the richest one, and the richest country has the largest possible sense of community with its neighbourhood (i.e., the maximal level of clustering coefficient). The community level of the origin country does not play any role in this specific situation.
• $I_{ij} = 2$ is associated with the case in which $\alpha_i = 1$, $\alpha_j = 0$, $C_i^{out} = 1$ and $w_{ij} = 1$. The total amount of migrants flows from the richest country to the poorest one; also in this case the poorest country is the one with null GDP and the richest country has the maximal clustering coefficient. Symmetrically with the previous case, the community level of the destination country does not matter.
• $I_{ij} = 0$ when $\alpha_i(1 + C_i^{out}) = \alpha_j(1 + C_j^{in})$ and/or $w_{ij} = 0$. This neutral case is then related to a migration flow between countries sharing an identical quantitative synthesis of economic and community levels and/or to the absence of migrants from $i$ to $j$. Thus, this situation of fairness might depend on migration flows between identical countries, to be understood under a joint economic and community perspective.
Clearly, no country has null GDP. Thus, the bounds $\pm 2$ are extremal cases and cannot be achieved in practical analysis.
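The properties listed above can be checked numerically with a small sketch. It assumes that the local indicator has the form $I_{ij} = w_{ij}[\alpha_i(1 + C_i^{out}) - \alpha_j(1 + C_j^{in})]$, a form consistent with the bounds and the extreme cases just described; the function name and the intermediate numerical values are hypothetical.

```python
def local_indicator(w_ij, alpha_i, alpha_j, c_out_i, c_in_j):
    """Local indicator of the flow i -> j under the assumed functional form.

    w_ij     : normalized weight of the flow from i to j, in [0, 1]
    alpha_i  : GDP share of the origin country, in [0, 1]
    alpha_j  : GDP share of the destination country, in [0, 1]
    c_out_i  : out-clustering of the origin, in [0, 1]
    c_in_j   : in-clustering of the destination, in [0, 1]
    """
    return w_ij * (alpha_i * (1.0 + c_out_i) - alpha_j * (1.0 + c_in_j))


# Extreme cases discussed in the text.
upper = local_indicator(1.0, 1.0, 0.0, 1.0, 0.0)   # richest -> poorest
lower = local_indicator(1.0, 0.0, 1.0, 0.0, 1.0)   # poorest -> richest
neutral = local_indicator(0.0, 0.2, 0.7, 0.5, 0.5)  # no migrants from i to j

# A hypothetical intermediate case: a poorer origin and a richer destination
# with the same clustering level give a negative indicator.
typical = local_indicator(0.6, 0.2, 0.7, 0.5, 0.5)
```

In the intermediate case the bracket equals $0.2 \cdot 1.5 - 0.7 \cdot 1.5 = -0.75$, so the indicator is negative, in line with the expectation that positive values are the exception.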
Moving from the indicators computed for each pair of countries, we can provide a measure of the migration network with the aim of analysing the socio-economic reasons of the phenomenon at country, macro-area or global level. We define the country ("in" and "out") socio-economic indicators of the migration flows, $I_i^{in}$ and $I_i^{out}$, as weighted means of the local socio-economic indicators. It is worth pointing out that $I_i^{out}$ is lower than zero when emigrants of country $i$ move on average to a destination country with a higher economic strength or community level. Conversely, $I_i^{in}$ is lower than zero when immigrants of country $i$ come from origin countries with a higher economic strength or community level.
The approach can be extended by aggregating values at different levels. For instance, we can derive both indicators for a set $K$ (of size $k$) of countries $i \in K \subset V$ (for instance all countries of the same macro area or the same continent). It is noteworthy that the aggregate indicators can be derived as the weighted average of the country indicators, with weights equal to the in- (out-) strengths (i.e. $s_i^{in}$ and $s_i^{out}$). For $K = V$, we obtain the global ("in" and "out") socio-economic indicators of the migration flows.
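The aggregation steps can be sketched as follows. This is only one plausible reading, under loudly stated assumptions: the country "out" indicator is taken as the sum of the local indicators of outgoing flows divided by the out-strength, and the "in" indicator as the sum over incoming flows divided by the in-strength with the sign reversed, so that it is negative when immigrants come from stronger countries, as the text describes. Function names and data are hypothetical.

```python
def country_indicators(local, w):
    """Country-level 'in' and 'out' indicators (assumed definitions).

    local[i][j] is the local indicator of the flow i -> j and w[i][j] its
    normalized weight. Countries without flows get a null indicator.
    """
    n = len(w)
    i_in, i_out = [], []
    for i in range(n):
        s_out = sum(w[i][j] for j in range(n) if j != i)
        s_in = sum(w[j][i] for j in range(n) if j != i)
        tot_out = sum(local[i][j] for j in range(n) if j != i)
        tot_in = sum(local[j][i] for j in range(n) if j != i)
        i_out.append(tot_out / s_out if s_out > 0 else 0.0)
        # Assumption: the sign is reversed so that the destination is the reference.
        i_in.append(-tot_in / s_in if s_in > 0 else 0.0)
    return i_in, i_out


def aggregate(values, strengths, group):
    """Strength-weighted average of country indicators over a set of countries."""
    total = sum(strengths[i] for i in group)
    if total == 0:
        return 0.0
    return sum(values[i] * strengths[i] for i in group) / total


# Hypothetical two-country example: country 0 is the weaker one.
W = [[0.0, 0.75],
     [0.25, 0.0]]
LOCAL = [[0.0, -0.3],
         [0.1, 0.0]]
I_in, I_out = country_indicators(LOCAL, W)
out_strength = [0.75, 0.25]
global_out = aggregate(I_out, out_strength, group=[0, 1])
```

Passing the full node set as `group` gives the global indicator; passing the countries of one macro area gives the corresponding aggregate.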

Empirical experiments
In order to test the performance of the indicator proposed in the previous Section, we consider data based on the international migrant stock in 2015. In particular, the Population Division of the Department of Economic and Social Affairs of the United Nations Secretariat provides the international community with timely and accessible population data and analysis of population trends and development outcomes for all countries and areas of the world. As regards the data we deal with, the dataset presents the estimates of the total migrant stock at mid-year 2015 by origin and destination.
The estimates are available for all countries and areas of the world, based on official statistics on the foreign-born or the foreign population (mainly based on population censuses but also on population registers and nationally representative surveys). Specific adjustments are also made by the Department in order to exclude both foreign-born persons naturalized in their country of residence and children born to international migrants who live in countries where citizenship is granted upon birth (i.e. birthright citizenship based on jus soli). Finally, the refugee statistics reported by international agencies have also been used by the Department in order to capture the number of persons recognized as refugees or who find themselves in refugee-like situations (see [6] for details). We have at our disposal data for 231 countries, also including dependencies and areas of special sovereignty. To give a brief idea of the heterogeneity between different countries, we display in
As described in Section 2, a network is used to describe the migrants' flow, where nodes are countries (i.e. 231 nodes) and weighted directed links, computed as in (3), represent migration between origin and destination countries. The network is plotted in Figure 2.
We can compute some classical network indicators. Values of degree and node strength provide meaningful insights. Rankings of the top ten countries according to these indicators are reported in Table 1. As regards destination countries, we observe the case of Chile, characterized by a low number of immigrants (in-strength is approximately 0.27%, at the 67th position) but also by a significant heterogeneity of people, who come from 209 different origin countries 8. Furthermore, of the twenty countries with the largest in-degree, fifteen are in Europe. This result is also an effect of the millions of migrants and refugees who crossed into Europe over the last years.
According to the "out" indicators, we observe that India has the highest out-strength but also a high out-degree, meaning that Indian people are widely distributed across different countries of the world. A different situation is instead observed for Mexico which, despite a large diaspora (i.e. a high out-strength), has 98% of its emigrants settled in the USA.
6 We did not use the classical grouping based on quartiles in order to emphasize differences on the right tail of the distribution.
7 According to the U.S. Census Bureau, an increase of 12.6 million has been observed since the turn of the century. Immigrants now comprise 13.5% of the U.S. population, roughly one out of eight residents, the highest share in 106 years.
8 Immigration has indeed contributed to the demographics and the history of this South American nation.
Table 1: Top 10 countries ranked in terms of in/out-degree and in/out-strength.
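Rankings such as those of Table 1, and concentration figures such as the share of a country's emigrants settled in its main destination, can be obtained along the following lines. This is an illustrative sketch with hypothetical names and toy data; it does not reproduce the paper's figures.

```python
def top_k(values, names, k=10):
    """Return the k (name, value) pairs with the largest values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [(names[i], values[i]) for i in order[:k]]


def top_destination_share(w, i):
    """Share of the out-strength of node i that goes to its main destination."""
    out = [w[i][j] for j in range(len(w)) if j != i]
    total = sum(out)
    return max(out) / total if total > 0 else 0.0


# Hypothetical three-country example.
NAMES = ["A", "B", "C"]
W = [[0.0, 0.5, 0.0],
     [0.1, 0.0, 0.3],
     [0.0, 0.1, 0.0]]
out_strength = [sum(row) for row in W]     # the diagonal is zero
ranking = top_k(out_strength, NAMES, k=2)
share_b = top_destination_share(W, 1)      # B sends most emigrants to C
```

A country with a high out-strength but a top-destination share close to one has an almost unidirectional diaspora, which is the pattern described above for Mexico.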
In order to capture the neighbourhood structure of each destination and origin country in terms of migrants, we compute directed clustering coefficients as described in Section 3. Through the clustering coefficient, we want to capture both the number of relations between different countries and the number of migrants involved.
It is noticeable that the coefficient captures the main patterns but also some specific behaviours (see Figure 3). For instance, we observe on average a very low in-clustering for the African continent and part of Western Asia. Migrants in these countries come in general from countries that are not related to each other, leading to a low coefficient. An exception is represented by South Africa, which is characterized by a high clustering because of different kinds of migrants. In particular, over the last years, neighbouring people of different nationalities who have fled political and economic instability at home (Nigerians, Zimbabweans, Congolese and Somalis, for instance) have gone to South Africa in the hope of employment and political protection (see [10] for an analysis of intra-continental migration within Africa).
As regards the "out" values, it is noteworthy that Mexico does not belong to the very high category, despite a huge number of people living abroad. As said before, this is a consequence of an almost unidirectional diaspora towards the USA. Furthermore we observe that North America, Europe,
The indicator proposed in the previous Section has been computed at country level in order to emphasize differences between countries in terms of immigrants and emigrants (see Figure 4). As regards the origin country, we observe a prevalence of negative values. It means that, on average, origin countries have economic strength and community structure lower than the destination countries.
In particular, the movement of migrants is often towards countries with a greater GDP. Exceptions are represented by countries with a very high level of GDP and clustering. The United States, China and the Russian Federation are the only three countries with an average index significantly greater than zero.
Other countries with a high GDP level (such as Japan and the UK) have a weighted average (see formula (12)) lower than zero because of the significant number of migrants towards countries with a higher
and East Europe is here caught.
The case of Nigeria is also very singular. On the one hand, as Africa's most populous country, Nigeria, with an estimated population of 150 million and over 250 ethnic groups, deals with a range of migration issues, from massive internal and regional migration to brain drain and a large, well-educated diaspora in the West (mainly towards the United States and the United Kingdom) that it sees as key to future development (indeed $I_i^{out} = -6.3\%$). On the other hand, the country is characterized by immigration from other African countries (such as Benin, Chad, Liberia and Mali) with a lower level of GDP and community (indeed $I_i^{in} = 0.6\%$). In particular, this is also a consequence of the development of the ECOWAS 10 treaty between countries of Western Africa, aimed at strengthening regional economic integration through progressively free movement of goods, capital, and people and at consolidating states' efforts to maintain peace, stability, and security (see [7]).
10 The Economic Community of West African States (ECOWAS) Treaty is a multilateral agreement signed by the member states that made up the Economic Community of West African States. Benin, Burkina Faso, Cape Verde, Cote d'Ivoire, Gambia, Guinea, Guinea Bissau, Liberia, Mali, Mauritania, Niger, Nigeria, Senegal, Sierra Leone and Togo have accordingly agreed to a Revised Treaty of 24th July, 1993.
Figure 4: Country-level indicators of the migration flows (see formulas (11) and (12)). In the top figure, a negative index (green colour) means that people of that country migrated on average to countries with a greater economic strength and community level.
In general, results confirm that migrants move toward destination countries with a higher community and GDP level.
As regards the destination countries (bottom figure), we observe a negative index when, on average, immigrants come from richer or more interconnected countries.
Indicators have also been computed by grouping countries according to a macro-area classification. We followed the composition of geographical regions used by the Statistics Division of the United Nations in its publications and databases. Each country or area is shown in one region only. These geographic regions are based on continental regions, further subdivided into sub-regions and intermediary regions drawn so as to obtain greater homogeneity in terms of population size and accuracy of demographic statistics. Results are displayed in Table 2.
It is noteworthy that emigrants move on average towards countries and regions with a higher GDP or community structure. Exceptions are represented by Northern America and Eastern Asia. However, Eastern Asia is characterized by a significant heterogeneity, with a positive index for China and very negative values for the Republic of Korea and Hong Kong. As already stressed before, at the bottom we have the Caribbean and Central America, with a significant emigration towards other countries of the North and South American continents.
Looking instead at the values of $I_K^{in}$, Northern America and all European sub-regions show positive indices. The effect of recent migration flows to the EU ([17]) explains these values. Eastern Europe also seems more affected by immigration than in the past. As many people working in this sector have been suggesting for some time now, Europe's restrictive policies and tightened borders do not deter overall migration but instead merely divert flows towards different routes.
Table 2: Indicators for the UN macro areas (see formulas (14) and (13) respectively). Indicators are ranked in decreasing order.

Conclusions
Immigration is now a prominent feature in the economic, social, and political landscape of many countries. The reception and assimilation of immigrants is indeed a significant economic and social phenomenon in many former emigration countries. This paper provides a unified view of the analysis of migration of a social and economic nature. To this aim, worldwide migration is presented as a network and a related socio-economic indicator is proposed. Both the macroeconomic status and the community structure of origin and destination countries are taken into account.
Indeed, the interconnection plays a fundamental role in determining the migration patterns. The interconnectedness of the network is here represented by means of the weighted clustering coefficient, referring to both immigration for destination countries and emigration for origin ones.
Empirical analysis confirms that migrants move, on average, toward destination countries with a higher community and GDP level. Furthermore, the indicator also proves effective in capturing patterns of migration flows, showing at different "observation scales" the peculiarities of countries or macro-areas of the world. Hence, we are able to capture distinctive behaviours related to specific contexts.