Computing the driver acceptance probability or … the DRAC of Beat

Published in

Beat Engineering Blog

Feb 16, 2022

Beat connects millions of passengers with nearby available drivers in real time. Matching is a core part of what makes the Beat app work. In this blog post we describe how we use data and Machine Learning-powered features to increase our drivers' happiness.

Introduction

When you're on the move, every moment matters. At Beat we use technology to match passengers and drivers in the real world efficiently and reliably. Matching a driver to a passenger's ride request is a core task for any ride-hailing application. It relates directly to many business metrics that indicate the quality of the provided service, such as passenger pickup time and driver/passenger cancellation rates. At Beat we put extra emphasis on this task: a whole domain, named after the task (the Matching domain), works specifically on deploying the best solutions to help both passengers and drivers by reducing wait times and maximizing earnings.

Matching based on pickup time

Our existing solution consists of three major steps: the Retrieval Step, the Filtering Step and the Ranking Step. Figure 1 provides a visual representation of these steps.

The Retrieval Step identifies and retrieves all drivers positioned within a meaningful area around the passenger's location; in other words, all drivers that are spatially close to the passenger. Restricting the search this way keeps the computational cost of the later steps low without sacrificing service quality, since distant drivers are unlikely to be good matches anyway.
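A minimal sketch of the Retrieval Step, assuming each driver is described by a latitude/longitude pair and that the "meaningful area" is a fixed radius around the passenger. The `Driver` type, `retrieve_nearby` and the 3 km default radius are hypothetical illustrations, not Beat's actual values. It uses a brute-force haversine scan for clarity; a production system would more likely query a spatial index.

```python
import math
from dataclasses import dataclass
from typing import List

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Driver:
    driver_id: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def retrieve_nearby(drivers: List[Driver], pax_lat: float, pax_lon: float,
                    radius_km: float = 3.0) -> List[Driver]:
    """Retrieval step: keep only drivers within radius_km of the passenger."""
    return [d for d in drivers
            if haversine_km(d.lat, d.lon, pax_lat, pax_lon) <= radius_km]


# Usage: a passenger in central Lima, one driver ~0.5 km away and one ~8 km away.
drivers = [Driver("a", -12.05, -77.04), Driver("b", -12.12, -77.03)]
print([d.driver_id for d in retrieve_nearby(drivers, -12.0464, -77.0428)])
```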

Next comes the Filtering Step. It applies a set of business rules that eliminate drivers from further consideration. Examples of such rules are checking whether the driver is available or already occupied with another ride, whether the driver offers the service the passenger requested, and whether fraudulent activity between the driver and the passenger is likely.
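The Filtering Step can be pictured as a list of boolean rules that a driver must all pass. This is a sketch under stated assumptions: `Candidate`, `RideRequest` and the rule functions are hypothetical, and the fraud rule is modelled as a precomputed set of flagged driver/passenger pairs because the post does not say how fraud likelihood is assessed.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    driver_id: str
    is_available: bool          # False if occupied with another ride
    services: FrozenSet[str]    # services the driver offers, e.g. {"taxi"}


@dataclass(frozen=True)
class RideRequest:
    passenger_id: str
    service: str


# A rule returns True if the driver may stay in the pool for this request.
Rule = Callable[[Candidate, RideRequest], bool]


def is_available(c: Candidate, r: RideRequest) -> bool:
    return c.is_available


def offers_service(c: Candidate, r: RideRequest) -> bool:
    return r.service in c.services


def make_fraud_rule(flagged_pairs: FrozenSet[Tuple[str, str]]) -> Rule:
    """Hypothetical fraud rule: exclude (driver, passenger) pairs flagged as suspicious."""
    def not_suspicious(c: Candidate, r: RideRequest) -> bool:
        return (c.driver_id, r.passenger_id) not in flagged_pairs
    return not_suspicious


def filter_candidates(candidates: Iterable[Candidate], request: RideRequest,
                      rules: List[Rule]) -> List[Candidate]:
    """Filtering step: keep a driver only if every business rule passes."""
    return [c for c in candidates if all(rule(c, request) for rule in rules)]
```

Keeping each rule as a separate function makes the strict include/exclude semantics explicit: a single failing rule removes the driver, regardless of how well they would rank.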

The previous steps produce a pool of drivers who are all eligible to take the ride. The final step, the Ranking Step, is responsible for determining the most suitable driver in this pool to be assigned to the ride request. We use the pickup time, i.e. the time required for the driver to reach the passenger, as the ranking criterion of the Ranking Step. This means that the driver who can reach the passenger fastest ranks first. Using pickup time as a ranking criterion is intuitive from a business point of view, as it keeps passenger pickup times low and driver utilization high.
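A minimal sketch of pickup-time ranking, assuming a pickup ETA in seconds has already been estimated for every eligible driver (how Beat computes the ETA is not covered here). The function names are hypothetical; ties are broken by driver id only to make the output deterministic.

```python
from typing import Dict, List, Optional


def rank_by_pickup_eta(eta_seconds: Dict[str, float]) -> List[str]:
    """Ranking step: order eligible driver ids by pickup ETA, fastest first."""
    return sorted(eta_seconds, key=lambda d: (eta_seconds[d], d))


def best_driver(eta_seconds: Dict[str, float]) -> Optional[str]:
    """Return the top-ranked driver, or None if the pool is empty."""
    ranking = rank_by_pickup_eta(eta_seconds)
    return ranking[0] if ranking else None


etas = {"d1": 240.0, "d2": 95.0, "d3": 410.0}
print(rank_by_pickup_eta(etas))  # ['d2', 'd1', 'd3']
```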

Figure 1. Retrieval, filtering and ranking phase for solving the assignment task.

The Retrieval Step exists only to make the task computationally easier, and the Filtering Step applies business rules that strictly include or exclude drivers from the pool of eligible drivers for the request. The actual choice of driver is therefore made by the Ranking Step, which uses the pickup ETA as its single criterion. Although this makes sense, one can argue that more factors come into play when considering the best match between a driver and a passenger. For example, a ride request along a route the driver drives frequently might be more appealing to them. The ride's fare or the surge at the pickup point might also factor into the driver's decision.

Matching based on the DRAC Score

The Matching domain's ML team has developed a Machine Learning model that takes into account various aspects that contribute to keeping drivers happy and engaged, and outputs the probability of each outcome, i.e. of the driver accepting or declining the ride request. Examples of aspects taken into consideration are the following:

the ride's spatial characteristics, such as the driver's position, the driver's preference for particular routes, and increased demand for rides at the origin or destination
the ride's temporal characteristics, such as time of day and day of week
the driver's status, such as request fulfilment, inactivity time and stamina
other factors, such as weather conditions for the particular ride or driver
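To make the list above concrete, here is a hypothetical sketch of how one driver/request pair could be flattened into a numeric feature vector. Every field name is an illustrative stand-in for one of the listed aspects, not DRAC's real feature set. Hour of day and day of week use a sine/cosine encoding, a common technique (assumed here, not confirmed by the post) so that 23:00 and 00:00 end up close together.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class MatchContext:
    # All fields are hypothetical stand-ins for the aspects listed above.
    pickup_distance_km: float   # spatial: driver position relative to pickup
    route_familiarity: float    # spatial: share of past rides on similar routes (0..1)
    origin_demand: float        # spatial: demand index at the pickup point
    destination_demand: float   # spatial: demand index at the drop-off point
    hour_of_day: int            # temporal: 0..23
    day_of_week: int            # temporal: 0 = Monday .. 6 = Sunday
    fulfilment_rate: float      # driver status: share of requests fulfilled (0..1)
    inactivity_minutes: float   # driver status: time since last ride
    hours_online: float         # driver status: rough proxy for stamina
    is_raining: bool            # other: weather


def cyclical(value: float, period: float) -> Tuple[float, float]:
    """Map a periodic value onto the unit circle so the period wraps around."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def build_features(ctx: MatchContext) -> Dict[str, float]:
    """Flatten one driver/request pair into a numeric feature dict."""
    hour_sin, hour_cos = cyclical(ctx.hour_of_day, 24)
    dow_sin, dow_cos = cyclical(ctx.day_of_week, 7)
    return {
        "pickup_distance_km": ctx.pickup_distance_km,
        "route_familiarity": ctx.route_familiarity,
        "origin_demand": ctx.origin_demand,
        "destination_demand": ctx.destination_demand,
        "hour_sin": hour_sin,
        "hour_cos": hour_cos,
        "dow_sin": dow_sin,
        "dow_cos": dow_cos,
        "fulfilment_rate": ctx.fulfilment_rate,
        "inactivity_minutes": ctx.inactivity_minutes,
        "hours_online": ctx.hours_online,
        "is_raining": 1.0 if ctx.is_raining else 0.0,
    }
```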

Using the data above, we train an ML model that encodes drivers' preferences. The model produces a score that quantifies how likely the driver is to accept the ride. We name this model DRAC, formed from the words DRiver and ACceptance.
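The key change from the previous scheme is that eligible drivers are ranked by predicted acceptance probability instead of by pickup time. The sketch below illustrates this with a logistic function as a stand-in scorer; the post does not state which model family DRAC uses, and the weights, bias and feature names are made up for illustration only.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical weights; a trained model would learn these from historical
# accept/decline outcomes.
WEIGHTS: Dict[str, float] = {
    "pickup_distance_km": -0.4,
    "route_familiarity": 1.2,
    "origin_demand": 0.3,
}
BIAS = 0.5


def acceptance_probability(features: Dict[str, float]) -> float:
    """Stand-in for a DRAC-style score: logistic P(accept | features)."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_by_acceptance(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Rank eligible drivers by predicted acceptance probability, highest first."""
    scored = [(d, acceptance_probability(f)) for d, f in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


pool = {
    "d_near": {"pickup_distance_km": 0.5, "route_familiarity": 0.0, "origin_demand": 1.0},
    "d_far": {"pickup_distance_km": 1.5, "route_familiarity": 0.9, "origin_demand": 1.0},
}
print([d for d, _ in rank_by_acceptance(pool)])  # ['d_far', 'd_near']
```

With these made-up weights, a driver 1.5 km away on a familiar route outranks one 0.5 km away on an unfamiliar route, which is the kind of trade-off that ranking on pickup time alone cannot express.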

The DRAC model is trained on a schedule on Kubeflow, a popular ML platform built on Kubernetes. Figure 2 depicts the deployed pipeline. We retrieve data from Hive and transform it into model features using PySpark. The feature set is the input to the model training component. Once training is done, the model's performance characteristics are checked. If all checks pass, we store the trained model in S3. The Kubeflow pipeline also opens a pull request against the GitHub repository that hosts the model; the pull request includes the serialized model and all resources required to test the model again. Finally, we notify all stakeholders about the training run through Slack messages.
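The "check, then publish" step at the end of the pipeline can be sketched as a simple quality gate. This is plain Python, not Kubeflow code: the metric names and thresholds are hypothetical, and `store_model`, `open_pull_request` and `notify` are hypothetical hooks standing in for the S3 upload, the GitHub pull request and the Slack message. We also assume stakeholders are notified whether or not the checks pass.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical quality gates: metric name -> ("min" | "max", limit).
# The post does not say which metrics DRAC is checked against.
THRESHOLDS: Dict[str, Tuple[str, float]] = {
    "roc_auc": ("min", 0.75),
    "log_loss": ("max", 0.55),
}


def failed_checks(metrics: Dict[str, float],
                  thresholds: Dict[str, Tuple[str, float]] = THRESHOLDS) -> List[str]:
    """Return a human-readable reason for every gate the metrics fail."""
    failures: List[str] = []
    for name, (kind, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif kind == "min" and value < limit:
            failures.append(f"{name}: {value:.3f} < {limit}")
        elif kind == "max" and value > limit:
            failures.append(f"{name}: {value:.3f} > {limit}")
    return failures


def publish_if_healthy(metrics: Dict[str, float],
                       store_model: Callable[[], None],
                       open_pull_request: Callable[[], None],
                       notify: Callable[[str], None]) -> bool:
    """Store and propose the model only if every check passes; notify either way."""
    failures = failed_checks(metrics)
    if failures:
        notify("DRAC training rejected: " + "; ".join(failures))
        return False
    store_model()
    open_pull_request()
    notify("DRAC training passed all checks: model stored and pull request opened")
    return True
```

Injecting the side effects as callables keeps the gate itself easy to unit-test without touching S3, GitHub or Slack.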

Figure 2. DRAC’s Kubeflow training pipeline.

AB testing

Offline performance evaluation is a strong indication of a model's quality, and in the case of DRAC the offline metrics showed a highly performant model. However, only after we integrate DRAC into Beat's online matching flow can the model demonstrate its contribution to Beat's business KPIs. Using Beat's internal AB testing platform, we have run several AB tests that evaluate the DRAC solution against the current version of the Matching algorithm, which uses only the passenger pickup time in its ranking phase.
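The post does not describe how Beat's internal AB testing platform decides significance. For a rate metric such as driver acceptance rate, one standard choice is a two-proportion z-test, sketched below with the standard library only; the function name and the example counts are hypothetical and are not Beat's data.

```python
import math
from typing import Tuple


def two_proportion_z_test(accepted_a: int, offered_a: int,
                          accepted_b: int, offered_b: int) -> Tuple[float, float, float]:
    """Compare acceptance rates of control (a) and treatment (b).

    Returns (relative_lift, z, two_sided_p_value) using the pooled-variance z-test.
    """
    p_a = accepted_a / offered_a
    p_b = accepted_b / offered_b
    pooled = (accepted_a + accepted_b) / (offered_a + offered_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / offered_a + 1 / offered_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return (p_b - p_a) / p_a, z, p_value


# Hypothetical counts: 5,000 of 10,000 requests accepted in control,
# 5,500 of 10,000 in treatment.
lift, z, p = two_proportion_z_test(5000, 10000, 5500, 10000)
print(f"relative lift={lift:.1%}, z={z:.2f}, p={p:.2g}")
```

Note that a "relative lift" (change divided by the control rate) differs from an absolute change in percentage points; reporting which one is meant avoids ambiguity.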

Across several internal business KPIs, we observe that the rate at which drivers accept ride requests increased by 13.3% on average across the different markets. In addition, finding a match for a ride request takes on average 15% less time. Beat also benefits from an increased number of completed rides. However, there is also a negative side effect: passengers cancel their requests more frequently.

Conclusion

Matching drivers to ride requests is a critical function of every ride-hailing app. Over time, we've made our matching technology aware of more factors in order to create a seamless pickup experience for both passengers and drivers. Beat's Matching domain has developed a Machine Learning approach that takes a large number of factors into account when searching for the most suitable driver for a ride. AB testing the DRAC solution shows significant improvements in almost all of the business KPIs examined.

DRAC, however, captures only the driver's perspective on matching. Because the driver selected by DRAC is not necessarily the one who can reach the passenger fastest, applying the model can increase pickup times, which passengers may reject by cancelling their requests. Including the passenger's perspective is therefore an essential component of the matching scheme. Our future efforts will center not only on improving the DRAC model but also on a component that expresses the passenger's perspective on Matching.

If you found this article interesting and you are looking for your new challenge, check out our open roles.

About the authors

The Machine Learning team within Beat's Matching domain is a cross-craft team of Machine Learning Engineers and Data Engineers that operates across the domain and is responsible for delivering Data Science and Machine Learning-powered features. We work on the complete Data Science project life cycle: from formalizing product requirements, to collecting and analyzing relevant data, to training ML models and putting them into production. We have already delivered solutions for tasks such as efficiently matching drivers with passengers and estimating trip duration.

We are Akis Kontonasios, Vassilis Stathopoulos and Odysseas Bournas.