April 6, 2021 in Origin-Destination Pairs
Worldwide Air Passenger Flows Estimation

In 2019, almost 4.4 billion air passengers were carried worldwide, following steady growth since 2002 that stalled only recently because of the COVID-19 crisis [1]. This number illustrates the importance of travel, not only for the global economy, where it provides thousands of jobs and drives the development of national infrastructure, but also as a key enabler of cultural exchange between citizens of different countries.
The global number of air passengers is an easy data point to obtain. It is much harder to derive actionable insights from it, such as estimates of the origins and destinations, at city-pair level, of the millions of trips taking place every day. This information is vital for planning purposes, not only for airlines but for all the actors who depend on air transportation and air traffic flows.
Even though air traffic has decreased considerably because of the COVID-19 crisis, these estimates remain highly relevant as planning tools. Airlines need access to relevant data to adapt their schedules to new demand. For example, some countries may be experiencing increasing demand for domestic destinations while international demand is globally low. The relevant questions for airlines and other actors that depend on travel therefore include which destinations to serve and what the demand is at origin-destination city level.
Providing Data and Relevant Insights
To help answer those questions, travel technology company Amadeus provides data and relevant insights as part of its data and intelligence products [2]. In this context, Amadeus has worked on the problem of global traffic estimation for years and already offers products and services that provide this information (e.g., the application Amadeus Traffic Analytics).
The company helps different travel providers connect to the travel ecosystem by building critical solutions that help airlines and airports, hotels and railways, search engines, travel agencies, tour operators and other travel players, run their operations and improve the travel experience, billions of times a year, all over the world. Amadeus has a global presence by operating in 190+ markets, processing 600+ million travel bookings, and providing IT solutions that impacted 1.9+ billion passengers in 2019 [3].
In this new work, Amadeus joined forces with industrial practitioners and academics to research possible improvements to the current models used for traffic analytics. Two dimensions of improvement were considered: 1) the accuracy of the estimates and 2) their robustness. In particular, the team focused on estimating the worldwide number of airline passengers by origin-destination (O-D) city pair on a monthly basis. The number of passengers per flight can be estimated by statistical methods, but this does not directly yield the number of travelers at O-D level, because passengers on the same airplane may not share the same initial origin or final destination.
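A small, made-up example may help show why per-flight counts are not enough. The sketch below assumes a hypothetical three-city network in which A and C are connected only through B; the city names, passenger numbers and the helper `leg_loads` are illustrative and do not come from the study. Two different O-D demand patterns produce exactly the same number of passengers on every flight.

```python
def leg_loads(od_demand, routes):
    """Sum the passengers carried on each leg, given O-D demand and the legs each O-D pair uses."""
    loads = {}
    for od, passengers in od_demand.items():
        for leg in routes[od]:
            loads[leg] = loads.get(leg, 0) + passengers
    return loads

# Hypothetical network: travelers from A to C must connect in B.
routes = {
    ("A", "B"): [("A", "B")],
    ("B", "C"): [("B", "C")],
    ("A", "C"): [("A", "B"), ("B", "C")],
}

# Two different (invented) O-D demand patterns.
scenario_1 = {("A", "B"): 100, ("B", "C"): 100, ("A", "C"): 0}
scenario_2 = {("A", "B"): 40, ("B", "C"): 40, ("A", "C"): 60}

loads_1 = leg_loads(scenario_1, routes)
loads_2 = leg_loads(scenario_2, routes)
print(loads_1)
print(loads_2)
```

Both scenarios put 100 passengers on each leg, so the leg counts alone cannot tell them apart. Additional information, such as itinerary probabilities, is needed to choose between them.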
Estimating the number of passengers per O-D pair supports several uses:
- Analyzing the evolution of demand for each O-D pair over time. A growing origin-destination route may prompt an airline to open new services (i.e., flights) serving at least one segment and, conversely, to close routes with low demand.
- Estimating traveler flows entering or leaving a given city, which constitute a significant economic indicator. Again, it is possible to know the number of passengers arriving at a particular airport but not their origin, which is economically important.
- Anticipating the spread of infectious diseases such as Ebola and Zika (COVID-19 was not known at the time of this project). This information may be of interest to institutions like UNESCO or WHO for planning and simulation purposes.
Beyond providing these socioeconomic indicators, solving the problem quickly is a prerequisite for more advanced applications, including airline network optimization. Network optimization consists of evaluating the effect of adding or removing one or more flights on the profitability of different markets. A market is defined precisely by the number of passengers on an O-D route, while profitability is directly related to the aircraft load factor and passengers' willingness to pay.
Deducing an O-D matrix from partial data on segments is not a new problem in itself. It has been studied by many operations researchers, including Bierlaire [4, 5], whose work gives a clear survey of the existing approaches. However, it is a challenging task in aviation. The problem in the airline industry is far larger than its counterparts in other areas, which makes traditional solution methods unattractive. Additionally, data on a global scale, covering all airlines and airports (more than 3,000 airports, giving more than 10 million potential O-D pairs), is unavailable to most companies, except service providers who work with all airlines.
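As a rough check on these orders of magnitude, the short sketch below counts potential O-D pairs under the assumption that a pair is an ordered pair of distinct airports. The article does not state how pairs are counted, so this convention and the helper name `ordered_od_pairs` are assumptions.

```python
def ordered_od_pairs(n_airports):
    """Number of ordered (origin, destination) pairs between distinct airports."""
    return n_airports * (n_airports - 1)

# With exactly 3,000 airports, just under 9 million ordered pairs are possible.
pairs_for_3000 = ordered_od_pairs(3000)

# Smallest number of airports for which the count exceeds 10 million.
min_airports = 1
while ordered_od_pairs(min_airports) <= 10_000_000:
    min_airports += 1

print(pairs_for_3000, min_airports)
```

Under this counting convention, a little over 3,160 airports is enough to exceed 10 million potential pairs, and the count grows quadratically with the number of airports.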
Solving the Problem
The problem takes as input the flow of passengers leaving and arriving at each airport; an estimated number of passengers on each flight; lower bounds (the limit below which the flight is canceled) and upper bounds (the capacity of the plane) on the number of passengers that can be transported on each flight; and the possible itineraries for each O-D pair, with the probability of using each of them (again estimated by statistical methods). The quantity to be determined is the number of passengers for each O-D pair. The problem has been modeled as a mathematical programming problem with different objective functions corresponding to standard norms (L1 and L2). The formulation with the L1 norm is a linear programming model, and the formulation with the L2 norm is a convex quadratic program (a constrained least-squares problem). Each formulation has been solved with optimization software (CPLEX) on randomly generated instances and on a real-life instance. The L2 formulation has also been solved with a Lagrangian relaxation approach.
In the case of randomly generated instances, for which the true O-D matrix is known, statistics measuring the deviation from the real O-D matrix, or from the estimated number of passengers on each leg, have been computed. The ability to compute a matrix close to the real O-D matrix is not the only requirement. A model estimating O-D matrices should also be “robust” to slight perturbations of the data. Since all data used by the models are statistically estimated, their values do not correspond exactly to what happens in practice. For example, the possible itineraries for each O-D pair and the probability of using one of them are statistical data that may deviate from the true values. A “good” model should thus not be sensitive to these deviations. Obtaining two completely different O-D matrices, one for a probability of 0.1 of using a given itinerary and the other for a probability of 0.15 for the same itinerary, is not desirable.
The numerical results show that, for randomly generated instances, the L2 norm formulation performs surprisingly better than the linear one in terms of both O-D matrix quality and processing time, but neither is robust to data perturbation. A heuristic method (called the balanced algorithm) based on the L2 formulation has been developed to ensure robustness. This method has been used to compute an O-D flow matrix for a real-life instance with 234 airports, 5,315 legs (direct flights) and 200,410 itineraries. Partial information on the O-D flows considered can be seen in Figure 1.
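To give a feel for the least-squares (L2) idea, here is a minimal sketch on a toy network. It is not the authors' model: it fits leg volumes only, keeps just the non-negativity of O-D flows, omits the airport flows and the lower and upper bounds on flights, and replaces CPLEX and Lagrangian relaxation with a plain projected gradient method. The network, the numbers and the function names `leg_matrix` and `estimate_od` are all hypothetical.

```python
def leg_matrix(od_pairs, legs, itineraries):
    """Build A, where A[l][k] is the share of O-D pair k's passengers using leg l."""
    A = [[0.0] * len(od_pairs) for _ in legs]
    for k, od in enumerate(od_pairs):
        for probability, path in itineraries[od]:
            for leg in path:
                A[legs.index(leg)][k] += probability
    return A

def estimate_od(A, volumes, iterations=2000):
    """Minimise 0.5 * ||A x - v||^2 subject to x >= 0 by projected gradient descent."""
    n_legs, n_od = len(A), len(A[0])
    # 1 / ||A||_F^2 is a conservative step size (never above 1 / largest eigenvalue of A^T A).
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n_od
    for _ in range(iterations):
        residual = [sum(A[l][k] * x[k] for k in range(n_od)) - volumes[l]
                    for l in range(n_legs)]
        gradient = [sum(A[l][k] * residual[l] for l in range(n_legs))
                    for k in range(n_od)]
        # Gradient step, then projection onto the non-negative orthant.
        x = [max(0.0, x[k] - step * gradient[k]) for k in range(n_od)]
    return x

# Hypothetical toy network: A-C is served directly (70%) or via B (30%).
legs = ["A-B", "B-C", "A-C"]
od_pairs = [("A", "B"), ("B", "C"), ("A", "C")]
itineraries = {
    ("A", "B"): [(1.0, ["A-B"])],
    ("B", "C"): [(1.0, ["B-C"])],
    ("A", "C"): [(0.7, ["A-C"]), (0.3, ["A-B", "B-C"])],
}
# Leg volumes generated from an invented "true" demand of 100, 80 and 50 passengers.
volumes = [115.0, 95.0, 35.0]

estimate = estimate_od(leg_matrix(od_pairs, legs, itineraries), volumes)
print([round(value, 3) for value in estimate])
```

In this toy case there are as many legs as O-D pairs, so the leg volumes determine the demand uniquely. Real instances have far more O-D pairs than legs, which is why the choice of objective and the additional constraints matter.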
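The article does not say which deviation statistics were used. As an illustration only, the sketch below computes two common ones, a mean absolute deviation and a relative L2 deviation, between an estimated and a reference O-D vector. The function names and the sample numbers are invented for the example.

```python
import math

def mean_absolute_deviation(estimate, reference):
    """Average absolute difference in passengers per O-D pair."""
    return sum(abs(e - r) for e, r in zip(estimate, reference)) / len(reference)

def relative_l2_deviation(estimate, reference):
    """Euclidean distance between the two vectors, relative to the size of the reference."""
    distance = math.sqrt(sum((e - r) ** 2 for e, r in zip(estimate, reference)))
    return distance / math.sqrt(sum(r ** 2 for r in reference))

# Invented values: passengers on three O-D pairs.
reference_od = [100.0, 80.0, 50.0]
estimated_od = [103.0, 76.0, 50.0]

mad = mean_absolute_deviation(estimated_od, reference_od)
rel = relative_l2_deviation(estimated_od, reference_od)
print(mad, rel)
```

The same measures can serve as a simple robustness check: solve the model once with an itinerary probability of 0.1 and once with 0.15, then compare the two resulting O-D vectors. A large deviation between them signals the sensitivity described above.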

O-D flow estimation is a general problem in transportation science [4]. Our heuristic has been designed to generate O-D solutions that are less sensitive to perturbations in itinerary probabilities or leg volume estimates. These parameters also exist in other transport networks, so our method can be used in a similar way elsewhere, at least in the road traffic context.
References
- https://data.worldbank.org/indicator/IS.AIR.PSGR
- https://amadeus.com/en/portfolio/airlines/traffic-analytics
- https://amadeus.com/en/about
- Michel Bierlaire and Ph. L. Toint, 1995, "MEUSE: An origin-destination matrix estimator that exploits structure," Transportation Research Part B: Methodological, Vol. 29, No. 1, pp. 47-60.
- Michel Bierlaire, 1996, "Mathematical models for transportation demand analysis," Ph.D. thesis, Facultés Universitaires Notre-Dame de la Paix de Namur.
Serigne Gueye is an associate professor (HDR) at Avignon University, Laboratoire d'Informatique d'Avignon (LIA) in France. Rodrigo Acuna-Agost is head of AI Research at Amadeus. Ezequiel Geremia is a senior data scientist at Median Technologies. Philippe Michelon is a professor at Avignon University, Laboratoire de Mathématiques Avignon (LMA) in France. Thiago Gouveia is a professor at Instituto Federal de Educação Ciência e Tecnologia da Paraíba in Brazil.