7.1 Population forecasting model
7.1.1 Purpose
The mathematical model of demographic indicators of socio-economic development (SED), hereinafter the Model, calculates forecasts of demographic values (population, birth rate, mortality, migration) for settlements (cities, regions, countries).
The set of demographic indicators is part of the concept and methodology for assessing the impact of investment projects on socio-economic development. The green frame in the figure highlights the place of calculated indicators in the general structure of SED indicators.
Figure. System of indicators used in assessing the impact of investment projects on socio-economic development
Demographic SED indicators are basic indicators that characterize the state and development of the city. They serve as inputs for calculating social indicators (provision with socio-economic facilities, real disposable income of the population) and economic development indicators (consumption volumes, payroll funds, and local taxes). The logical links between the demographic indicators included in the Model and other basic SED indicators are presented in the figure.
Figure. Relationships of the Model’s demographic indicators with other basic SED indicators
From an incomplete series of retrospective values, the Model produces both long-term (decades) and medium-term (years) forecasts, restores missing values, and corrects “anomalous” values in the retrospective period.
Main tasks of the demographic model:

1. Determination of stable trend coefficients describing changes in indicators over time.
2. Restoration of retrospective values of indicators using interpolation to improve the quality of municipal statistics.
3. Calculation of the baseline (inertial) forecast of demographic indicators for the calculation period.
4. Calculation of scenario-based demographic indicators that take into account possible “man-made” events (investment project, expansion of territories, catastrophic phenomena). Such an event occurs at a single point in time (a “jump”) and, at the choice of the Model user, either changes the trend or leaves it unchanged.
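Task 2 (restoration of missing retrospective values) can be illustrated with a minimal Python sketch. The source only says that interpolation is used; linear interpolation between the nearest known neighbours, the function name `fill_missing`, and the sample series are assumptions made for illustration.

```python
def fill_missing(series):
    """Fill None gaps that lie between two known values by linear interpolation.

    Assumption: gaps at the very start or end of the series are left as None,
    because there is no second neighbour to interpolate from.
    """
    filled = list(series)
    known = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(known, known[1:]):
        rise = series[right] - series[left]
        for i in range(left + 1, right):
            filled[i] = series[left] + rise * (i - left) / (right - left)
    return filled


# Hypothetical yearly population series with two missing years.
population = [1000.0, None, None, 1030.0, 1040.0]
restored = fill_missing(population)
print(restored)
```

In the Model itself the restored values would presumably come from the fitted curves rather than from a straight line; the sketch only shows the mechanics of filling a gap.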
7.1.2 Main parameters, indicators, and notation
The Model allows for forecast calculations for the following demographic indicators:
| № | Demographic Indicator Name | Unit |
|---|---|---|
| 1 | Population | people |
| 2 | Number of births | people |
| 3 | Birth rate | ‰ |
| 4 | Number of deaths | people |
| 5 | Mortality rate | ‰ |
| 6 | Number of arrivals | people |
| 7 | Arrival rate | ‰ |
| 8 | Number of departures | people |
| 9 | Departure rate | ‰ |
| 10 | Natural increase (decrease) | people |
| 11 | Natural increase rate | ‰ |
| 12 | Migration increase (decrease) | people |
| 13 | Migration increase rate | ‰ |
The following notation for parameters and auxiliary indicators is used in the Model:
| № | Symbol | Name | Unit |
|---|---|---|---|
| 1 | \(t\) | Time (calendar date), year (month, day, hour) | year |
| 2 | \(N\) | Population size | people |
Actual values according to the Federal State Statistics Service (FSSS)
| № | Symbol | Name | Unit |
|---|---|---|---|
| 3 | \(N_{[20ХХ]}\) | Actual population as of January 1 of 20ХХ | people |
| 4 | \(B_{[20ХХ]}\) | Number of births in 20ХХ | people/year |
| 5 | \(D_{[20ХХ]}\) | Number of deaths in 20ХХ | people/year |
| 6 | \(E_{[20ХХ]} = B_{[20ХХ]} - D_{[20ХХ]}\) | Natural increase (decrease) in 20ХХ | people/year |
| 7 | \(I_{[20ХХ]}\) | Number of arrivals for permanent residence in 20ХХ | people/year |
| 8 | \(O_{[20ХХ]}\) | Number of departures for permanent residence in 20ХХ | people/year |
| 9 | \(M_{[20ХХ]} = I_{[20ХХ]} - O_{[20ХХ]}\) | Migration increase (decrease) in 20ХХ | people/year |
| 10 | \(∆N_{[20ХХ]} = E_{[20ХХ]} + M_{[20ХХ]}\) | Population increase (decrease) in 20ХХ | people/year |
| 11 | \(K_{Р[20ХХ]} = 2000·B_{[20ХХ]} / (N_{[20ХХ]} + N_{[20ХХ + 1]})\) | Birth rate, equal to the number of births in 20ХХ per thousand of the average population in 20ХХ | ‰/year |
| 12 | \(K_{С[20ХХ]} = 2000·D_{[20ХХ]} / (N_{[20ХХ]} + N_{[20ХХ + 1]})\) | Mortality rate, equal to the number of deaths in 20ХХ per thousand of the average population in 20ХХ | ‰/year |
| 13 | \(K_{Е[20ХХ]} = K_{Р[20ХХ]} - K_{С[20ХХ]}\) | Natural increase (decrease) rate | ‰/year |
| 14 | \(K_{П[20ХХ]} = 2000·I_{[20ХХ]} / (N_{[20ХХ]} + N_{[20ХХ + 1]})\) | Migration arrival rate, equal to the number of arrivals in 20ХХ per thousand of the average population in 20ХХ | ‰/year |
| 15 | \(K_{У[20ХХ]} = 2000·O_{[20ХХ]} / (N_{[20ХХ]} + N_{[20ХХ + 1]})\) | Migration departure rate, equal to the number of departures in 20ХХ per thousand of the average population in 20ХХ | ‰/year |
| 16 | \(K_{М[20ХХ]} = K_{П[20ХХ]} - K_{У[20ХХ]}\) | Migration increase (decrease) rate | ‰/year |
| 17 | \(K_{[20ХХ]} = K_{Е[20ХХ]} + K_{М[20ХХ]}\) | Population increase (decrease) rate | ‰/year |
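The derived quantities in rows 6–17 follow mechanically from the five primary series. The Python sketch below works one year through them. All figures and the names `rate_per_thousand`, `k_birth`, etc. are hypothetical; only the formulas come from the table. The factor 2000 is 1000 (per thousand) divided by the 1/2 in the average of the two January 1 populations.

```python
def rate_per_thousand(events, n_start, n_end):
    """Events per 1000 of average population: 2000 * events / (N_start + N_end)."""
    return 2000.0 * events / (n_start + n_end)


# Hypothetical figures for one year (not taken from the source).
n_start, n_end = 100_000, 100_400      # population on January 1 of year and year + 1
births, deaths = 1_100, 1_000
arrivals, departures = 2_300, 2_000

natural_increase = births - deaths                              # E = B - D
migration_increase = arrivals - departures                      # M = I - O
population_increase = natural_increase + migration_increase     # dN = E + M

k_birth = rate_per_thousand(births, n_start, n_end)
k_death = rate_per_thousand(deaths, n_start, n_end)
k_arrival = rate_per_thousand(arrivals, n_start, n_end)
k_departure = rate_per_thousand(departures, n_start, n_end)

k_natural = k_birth - k_death
k_migration = k_arrival - k_departure
k_total = k_natural + k_migration

# In consistent statistics the balance dN should equal the change in population.
print(population_increase == n_end - n_start)
print(round(k_total, 3))
```

Checking the balance \(∆N = E + M\) against the difference of consecutive January 1 populations is one simple way to spot “anomalous” values in the source statistics.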
Calculated values
| № | Symbol | Name | Unit |
|---|---|---|---|
| 18 | \(N(t)\) | Calculated population at time t | people |
| 19 | \(N'(t)=dN/dt\) | Rate of change in population | people/year |
| 20 | \(\nu_{B}(t)\) | Calculated birth rate coefficient at time t | 1/year |
| 21 | \(\nu_{D}(t)\) | Calculated mortality rate coefficient at time t | 1/year |
| 22 | \(B_{(20XX)}\) | Calculated number of births in 20ХХ | people/year |
| 23 | \(D_{(20XX)}\) | Calculated number of deaths in 20ХХ | people/year |
| 24 | \(E_{(20XX)} = B_{(20XX)} - D_{(20XX)}\) | Calculated natural increase (decrease) in 20ХХ | people/year |
| 25 | \(w_{I}(t)\) | Migration arrival rate at time t | people/year |
| 26 | \(w_{O}(t)\) | Migration departure rate at time t | people/year |
| 27 | \(I_{(20XX)}\) | Calculated number of arrivals in 20ХХ | people/year |
| 28 | \(O_{(20XX)}\) | Calculated number of departures in 20ХХ | people/year |
| 29 | \(M_{(20XX)} = I_{(20XX)} - O_{(20XX)}\) | Calculated migration increase (decrease) in 20ХХ | people/year |
| 30 | \(∆N_{(20XX)} = E_{(20XX)} + M_{(20XX)}\) | Calculated population increase (decrease) in 20ХХ | people/year |
Approximation errors
| № | Symbol | Name | Unit |
|---|---|---|---|
| 31 | \(\varepsilon_{N_{[20XX]}} = N_{(20XX)} - N_{[20XX]}\) | Absolute error in population as of January 1 of 20ХХ | people |
| 32 | \(\varepsilon_{B_{[20XX]}} = B_{(20XX)} - B_{[20XX]}\) | Absolute error in number of births in 20ХХ | people |
| 33 | \(\varepsilon_{D_{[20XX]}} = D_{(20XX)} - D_{[20XX]}\) | Absolute error in number of deaths in 20ХХ | people |
| 34 | \(\varepsilon_{E_{[20XX]}} = E_{(20XX)} - E_{[20XX]}\) | Absolute error in natural increase (decrease) in 20ХХ | people |
| 35 | \(\varepsilon_{I_{[20XX]}} = I_{(20XX)} - I_{[20XX]}\) | Absolute error in number of arrivals in 20ХХ | people |
| 36 | \(\varepsilon_{O_{[20XX]}} = O_{(20XX)} - O_{[20XX]}\) | Absolute error in number of departures in 20ХХ | people |
| 37 | \(\varepsilon_{M_{[20XX]}} = M_{(20XX)} - M_{[20XX]}\) | Absolute error in migration increase (decrease) in 20ХХ | people |
| 38 | \(\varepsilon_{∆N_{[20XX]}} = ∆N_{(20XX)} - ∆N_{[20XX]}\) | Absolute error in population increase (decrease) in 20ХХ | people |
Statistical estimates
| № | Symbol | Name | Unit |
|---|---|---|---|
| 39 | \(P\) | Confidence probability (taken as 90% in calculations) | % |
| 40 | \(z_{[P]}\) | Quantile of Student’s t-distribution corresponding to probability P | number |
| 41 | \(err(t)\) | Relative error | % |
Calculation results and forecasting
| № | Symbol | Name | Unit |
|---|---|---|---|
| 42 | \(N(t),\nu_{B}(t),\nu_{D}(t),B(t),D(t),…\) | Calculated dependencies of population (birth rate, mortality rate, number of births, deaths, etc.) on time | number |
| 43 | \(H(t)=N(t)e^{(z_{[P]}err(t))}\) | Upper bounds of the forecast with confidence probability P | number |
| 44 | \(L(t)=N(t)e^{(-z_{[P]}err(t))}\) | Lower bounds of the forecast with confidence probability P | number |
7.1.3 Used formulas and assumptions
In the most general form, the rate of population change over time, \[\begin{align*} N'(t)=dN/dt, \end{align*}\] can be written as: \[\begin{align*} &N'(t)=N(t)[\nu_{B}(N(t),t,S(N(t),t),Ext(t),\ldots)-\nu_{D}(N(t),t,\ldots)]+\\ &+[w_{I}(N(t),t,S(N(t),t),Ext(t),\ldots)-w_{O}(N(t),t,\ldots)], \quad (1) \end{align*}\] where \(N(t)\) is the population size as a function of time. The birth rate \(\nu_{B}(\ldots)\) and mortality \(\nu_{D}(\ldots)\) coefficients, as well as the migration arrival \(w_{I}(\ldots)\) and departure \(w_{O}(\ldots)\) rates, depend on the population \(N\), time \(t\), the population “structure” \(S\) (age-sex, educational, national, etc.), and external conditions \(Ext\) (natural and climatic conditions, inflation rate, dollar exchange rate, oil price, employment, housing, epidemics, disasters).
Two levels of detail are considered in the proposed model.

Level I is intended for modeling and analyzing SED as a whole. Population size is calculated and then divided into three age groups:

- younger than working age,
- working age,
- older than working age.

For Level I, equation (1) is reduced to (2): \[\begin{align*} N'(t)=N(t)[\nu_{B}(t)-\nu_{D}(t)]+[w_{I}(t)-w_{O}(t)], \quad (2) \end{align*}\] where the functions \(\nu_{B}(t)\), \(\nu_{D}(t)\), \(w_{I}(t)\), and \(w_{O}(t)\) absorb the dependencies on “structure”, population size \(N\), time \(t\), and external conditions without detailing them.
The main modeling problem is that the functions \(\nu_{B}(t)\), \(\nu_{D}(t)\), \(w_{I}(t)\), and \(w_{O}(t)\) are unknown, meaning the problem is “inverse”, which increases the solution complexity.
Level II is designed for detailed modeling and analysis of the development of the demographic system. For Level II, equations (3) are built separately for each i-th age group, with the addition of the “aging” process, that is, the transition from one age group to the next: \[\begin{align*} N'_{i}(t)=N_{i}(t)[\nu_{B_{i}}(t)-\nu_{D_{i}}(t)]+[w_{I_{i}}(t)-w_{O_{i}}(t)]+[u_{I_{i}}(t,S)-u_{O_{i}}(t,S)], \quad (3) \end{align*}\] where \(u_{I_{i}}(t,S)\) and \(u_{O_{i}}(t,S)\) are the entrance and exit functions of the i-th age group, linked by \(u_{I_{(i+1)}}(t,S)=u_{O_{i}}(t,S)\). These functions are known and are determined by the age-sex structure \(S\) and time \(t\).
7.1.4 Model Input Data
The input data for the Level I model are retrospective time series from FSSS (where available):
- \(N_{[2000]}, N_{[2001]}, …, N_{[2023]}\) – actual population as of January 1, people
- \(B_{[2000]}, B_{[2001]}, …, B_{[2022]}\) – number of births in 20ХХ, people/year
- \(D_{[2000]}, D_{[2001]}, …, D_{[2022]}\) – number of deaths in 20ХХ, people/year
- \(I_{[2000]}, I_{[2001]}, …, I_{[2022]}\) – number of arrivals in 20ХХ, people/year
- \(O_{[2000]}, O_{[2001]}, …, O_{[2022]}\) – number of departures in 20ХХ, people/year
7.1.5 Description of total calculation indicators
Total calculation indicators include parameters for each demographic indicator:
- Year – the year characterizing the period;
- Period, start – the start date of the period;
- Period, end – the end date of the period;
- Fact – actual value from city statistics;
- Plan/Project – value of increase (decrease) caused by managed impact;
- Jump – binary parameter (TRUE if “Plan/Project” is entered);
- Calculation, average – the most expected forecast value;
- Calculation, upper bound – upper bound of the 90% corridor;
- Calculation, lower bound – lower bound of the 90% corridor.
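One row of the total calculation indicators can be pictured as a small record. The Python sketch below is only an illustration: the class name `IndicatorRow` and the field names are hypothetical, and deriving `jump` from the presence of a Plan/Project value is an assumption based on the description of the Jump parameter.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class IndicatorRow:
    """One period of one demographic indicator (field names are hypothetical)."""
    year: int
    period_start: date
    period_end: date
    fact: Optional[float]           # actual value from statistics, None if missing
    plan_project: Optional[float]   # increase (decrease) caused by managed impact
    calc_average: float             # most expected forecast value
    calc_upper: float               # upper bound of the 90% corridor
    calc_lower: float               # lower bound of the 90% corridor

    @property
    def jump(self) -> bool:
        """TRUE when a Plan/Project value has been entered."""
        return self.plan_project is not None


# Hypothetical forecast row without a project-driven change.
row = IndicatorRow(2025, date(2025, 1, 1), date(2025, 12, 31),
                   fact=None, plan_project=None,
                   calc_average=1_050_000.0,
                   calc_upper=1_060_000.0, calc_lower=1_040_000.0)
print(row.jump)
```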
7.1.6 Detailed description of the calculation algorithm
Step 1. Selection of functions \(\nu_{B}(t)\), \(\nu_{D}(t)\), \(w_{I}(t)\), and \(w_{O}(t)\). Four-parameter sigmoidal curves are used: \[\begin{align*} & \nu_{B}(t)=\frac{a_{B}+b_{B}(t/d_{B})^{c_{B}}}{1+(t/d_{B})^{c_{B}}}, & (4B) \\ & \nu_{D}(t)=\frac{a_{D}+b_{D}(t/d_{D})^{c_{D}}}{1+(t/d_{D})^{c_{D}}}, & (4D) \\ & w_{I}(t)=\frac{a_{I}+b_{I}(t/d_{I})^{c_{I}}}{1+(t/d_{I})^{c_{I}}}, & (4I) \\ & w_{O}(t)=\frac{a_{O}+b_{O}(t/d_{O})^{c_{O}}}{1+(t/d_{O})^{c_{O}}} & (4O) \end{align*}\] where a, b, c, d are the parameters of the corresponding curve (subscripts omitted): a is the value at t = 0 (January 1, 1999); b is the limit value as \(t \to \infty\); d is the inflection date; c determines the slope at the inflection point.
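The curves (4B)–(4O) share one functional form, so a single helper covers all four. The Python sketch below checks the roles of the parameters numerically; the function name `sigmoid4` and the parameter values are hypothetical. Note that at \(t = d\) the curve is exactly halfway between a and b.

```python
def sigmoid4(t, a, b, c, d):
    """Four-parameter curve (a + b*(t/d)**c) / (1 + (t/d)**c) for t >= 0, d > 0, c > 0."""
    x = (t / d) ** c
    return (a + b * x) / (1.0 + x)


# Hypothetical parameters of a birth rate coefficient, 1/year.
a, b, c, d = 0.012, 0.009, 3.0, 15.0

at_start = sigmoid4(0.0, a, b, c, d)       # equals a
at_d = sigmoid4(d, a, b, c, d)             # midpoint (a + b) / 2
far_future = sigmoid4(1.0e6, a, b, c, d)   # approaches b
print(at_start, at_d, far_future)
```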
Step 2. Determination of initial parameter values using the least squares method.
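Step 3.1. Solution of differential equation (2) for the retrospective period using an implicit scheme of the Euler type. Scheme (6) averages the right-hand side of (2) over both ends of the step \(\delta t\), which makes it the trapezoidal rule. Equation (2) is replaced by: \[\begin{align*} &N(t_{j}+\delta t)\left(1-\frac{\nu_{B}(t_{j}+\delta t)\delta t}{2}+\frac{\nu_{D}(t_{j}+\delta t)\delta t}{2}\right)=N(t_{j})\left(1+\frac{\nu_{B}(t_{j})\delta t}{2}-\frac{\nu_{D}(t_{j})\delta t}{2}\right)+\\ &+\delta t\left(\frac{w_{I}(t_{j})+w_{I}(t_{j}+\delta t)}{2}-\frac{w_{O}(t_{j})+w_{O}(t_{j}+\delta t)}{2}\right). \quad (6) \end{align*}\]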
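Because (6) is linear in the unknown \(N(t_{j}+\delta t)\), each step reduces to one division. The Python sketch below implements that step and marches it over an interval. The function names `step` and `solve` and the constant coefficients in the example are assumptions; in the Model the four functions would be the fitted curves from Step 1. With zero migration and constant coefficients, equation (2) has the exact solution \(N_{0}e^{(\nu_{B}-\nu_{D})t}\), which gives a convenient check.

```python
import math


def step(n, t, dt, nu_b, nu_d, w_i, w_o):
    """Advance N from t to t + dt using scheme (6)."""
    g0 = nu_b(t) - nu_d(t)                # net natural growth coefficient at t
    g1 = nu_b(t + dt) - nu_d(t + dt)      # the same at t + dt
    w0 = w_i(t) - w_o(t)                  # net migration rate at t
    w1 = w_i(t + dt) - w_o(t + dt)        # the same at t + dt
    rhs = n * (1.0 + g0 * dt / 2.0) + dt * (w0 + w1) / 2.0
    return rhs / (1.0 - g1 * dt / 2.0)


def solve(n0, t0, t_end, dt, nu_b, nu_d, w_i, w_o):
    """Return the list of N values on the grid t0, t0 + dt, ..., t_end."""
    steps = round((t_end - t0) / dt)
    values = [n0]
    for j in range(steps):
        values.append(step(values[-1], t0 + j * dt, dt, nu_b, nu_d, w_i, w_o))
    return values


# Hypothetical constant coefficients and no migration.
values = solve(1000.0, 0.0, 10.0, 0.1,
               lambda t: 0.02, lambda t: 0.01,
               lambda t: 0.0, lambda t: 0.0)
exact = 1000.0 * math.exp(0.01 * 10.0)
print(values[-1], exact)
```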
Step 3.2. Calculated values of other indicators (births, deaths, arrivals, departures) are determined.
Step 4. Error assessment. Sum of squared errors \(∑\varepsilon^{2}\) is calculated.
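The error measure of Step 4 can be sketched in Python as follows. The absolute error follows the definition \(\varepsilon = \text{calculated} - \text{actual}\) from the notation table. Skipping years with no actual value, the function names, and the sample numbers are assumptions; the source does not say whether errors are weighted or normalized before summation.

```python
def absolute_errors(calculated, actual):
    """Calculated minus actual for each year; None where the actual value is missing."""
    return [None if a is None else c - a for c, a in zip(calculated, actual)]


def sum_squared_errors(calculated, actual):
    """Sum of squared absolute errors over the years that have an actual value."""
    return sum(e * e for e in absolute_errors(calculated, actual) if e is not None)


# Hypothetical calculated and actual population values.
calculated = [1000.0, 1010.0, 1021.0, 1030.0]
actual = [1001.0, None, 1019.0, 1030.0]

errors = absolute_errors(calculated, actual)
sse = sum_squared_errors(calculated, actual)
print(errors, sse)
```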
Step 5. Refinement of parameters a, b, c, d using the Newton method.
Step 6. Iteration of steps 3-5 until minimum error is reached.
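The refinement loop of Steps 4–6 can be illustrated on a deliberately reduced problem. The Python sketch below refines a single parameter (b of one sigmoidal curve) with Newton's method, using finite-difference derivatives of the sum of squared errors. This is an assumption-laden simplification: the Model refines all parameters a, b, c, d and evaluates the error after solving equation (2), whereas here the curve is fitted directly to synthetic data. The names `sse`, `newton_refine`, and all numbers are hypothetical.

```python
def sigmoid4(t, a, b, c, d):
    """Four-parameter curve (a + b*(t/d)**c) / (1 + (t/d)**c)."""
    x = (t / d) ** c
    return (a + b * x) / (1.0 + x)


def sse(b, times, observed, a, c, d):
    """Sum of squared errors of the curve as a function of parameter b only."""
    return sum((sigmoid4(t, a, b, c, d) - y) ** 2 for t, y in zip(times, observed))


def newton_refine(f, p, h=1e-4, tol=1e-10, max_iter=50):
    """Minimise f(p) over one parameter: p <- p - f'(p) / f''(p), central differences."""
    for _ in range(max_iter):
        grad = (f(p + h) - f(p - h)) / (2.0 * h)
        curv = (f(p + h) - 2.0 * f(p) + f(p - h)) / (h * h)
        if curv <= 0.0:            # not a local minimum direction, stop refining
            break
        delta = grad / curv
        p -= delta
        if abs(delta) < tol:       # Step 6: stop when the update is negligible
            break
    return p


# Synthetic "observations" generated with a known b, then recovered from a poor start.
a, b_true, c, d = 0.012, 0.009, 3.0, 15.0
times = list(range(1, 24))
observed = [sigmoid4(t, a, b_true, c, d) for t in times]

b_fitted = newton_refine(lambda b: sse(b, times, observed, a, c, d), 0.02)
print(b_fitted)
```

The curve is linear in b, so the error is quadratic in b and Newton's method lands on the minimum almost immediately. For c and d the error surface is not quadratic, which is why the procedure has to be iterated.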
Steps 7.1 and 7.2. Solution of equation (2) for the forecast period.
Step 8. Construction of upper and lower bounds: \[\begin{align*} &H(t)=N(t)e^{(z_{[P]}err(t))}, & (8) \\ &L(t)=N(t)e^{(-z_{[P]}err(t))}. & (9) \end{align*}\]
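Formulas (8) and (9) place the bounds symmetrically around \(N(t)\) on a logarithmic scale, so that \(H(t)\,L(t) = N(t)^{2}\). A minimal Python sketch follows. It assumes that \(err(t)\) enters the exponent as a fraction (0.01 for 1%), which the source does not state explicitly; the function name and the numeric values of `z` and `err` are hypothetical.

```python
import math


def forecast_bounds(n, err, z):
    """Return (lower, upper) = (N * exp(-z * err), N * exp(z * err)).

    Assumption: err is a relative error expressed as a fraction, not in percent.
    """
    factor = math.exp(z * err)
    return n / factor, n * factor


# Hypothetical forecast value, relative error and quantile.
n, err, z = 1_000_000.0, 0.01, 1.7
lower, upper = forecast_bounds(n, err, z)
print(lower, upper)
```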
7.1.7 Scope of permissible application of the mathematical model
The model is suitable for population forecasting during natural and gradual development. It cannot by itself predict sudden changes (wars, disasters) or administrative changes (territory expansion); such events can only be introduced by the Model user as scenario “jumps” (see 7.1.1).
7.1.8 Accuracy assessment of mathematical models
On retrospective data the Model shows high accuracy (errors of up to 0.18% for population). The table below compares forecasts for 2020 with the actual values.
Population size, people
| City | Forecast for 2020* | Fact | Deviation |
|---|---|---|---|
| Voronezh | 1,057,863 | 1,058,261 | -0.04% |
| Krasnodar | 939,856 | 932,629 | 0.77% |
| Ufa | 1,128,730 | 1,128,787 | -0.01% |
| Arkhangelsk | 346,861 | 346,979 | -0.03% |
| Yuzhno-Sakhalinsk | 202,908 | 200,636 | 1.13% |
* forecast based on 2000–2019 data.
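The “Deviation” column is consistent with the relative difference between forecast and fact, taken with respect to the fact. The Python sketch below recomputes it from the table values; the formula \(100\,(\text{forecast} - \text{fact})/\text{fact}\) is an assumption inferred from the numbers, since the source does not define the column.

```python
def deviation_percent(forecast, fact):
    """Relative deviation of the forecast from the fact, in percent."""
    return 100.0 * (forecast - fact) / fact


# (forecast for 2020, fact) taken from the table above.
cities = {
    "Voronezh": (1_057_863, 1_058_261),
    "Krasnodar": (939_856, 932_629),
    "Ufa": (1_128_730, 1_128_787),
    "Arkhangelsk": (346_861, 346_979),
    "Yuzhno-Sakhalinsk": (202_908, 200_636),
}

deviations = {city: round(deviation_percent(f, a), 2) for city, (f, a) in cities.items()}
for city, value in deviations.items():
    print(f"{city}: {value:+.2f}%")
```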