Imagine you want to implement demand-based pricing for a train service. Your goal is to encourage riders to use the train during non-peak times. So, you want a system that dynamically adjusts the pricing in real time to make it financially attractive for the riders. As such, you want a system that convinces the riders to consider riding when the trains are less crowded.

Such a system usually has different pricing strategies and tries to optimize two things. It tries to balance the ridership throughout the day, and it tries to increase the total revenue from the ridership. Now, you might ask, “Why not just use traditional mathematical rule-based optimization?” It turns out that these rule-based systems aren’t practical anymore because of the sheer complexity of these scenarios. So, the goal of the learning system is to reach a state of (1) spread-out ridership and (2) revenue that at least covers the costs.

Now, imagine that the system has already been trained. How do you check that this system optimizes these two factors based on some given input? For example, imagine that you would provide a test data set where a bunch of people take a train from some location to some other location at the same price and at the peak time (e.g., 8am). But that’s not what the railway agency wants.

The system is now asked to suggest different travel times and different prices to the riders in such a way that the ridership spreads out and that the total revenue at least covers the costs for the railway agency. This implies that the system will return a ridership and price distribution. For example, imagine that the system (based on the provided input) suggests that 60% of the people should take the train at the peak time (e.g., 8am) for the highest possible price, 25% should take the train at a non-peak time (e.g., 7am) at a lower price (e.g., at a 10% discount), and 15% should take the train at a non-peak time (e.g., 9am) at an even lower price (e.g., at a 15% discount). These are fake numbers; don’t confuse them with reality. Nevertheless, the question is how to check whether that’s the optimal solution. Well, you can’t. You simply can’t check whether that’s the best solution that can be achieved. You can only check whether the system performed the desired optimization based on your expectations.

Based on the way you’ve trained the system, you derive expectations about how the system should react on new input data. For example, you could define a range of the average ridership spread and a range for the average revenue you expect for new input data. Then you could compare these expectations to the result of the system. You don’t check for binary results (e.g., cats in images) anymore. It’s more like measuring results in statistical terms. You measure the statistical significance of the results to determine that the system doesn’t randomly distribute the ridership and adjust the prices. The system must do this according to the learned patterns. Although that’s fuzzy, it’s the only statement you can make at the end of your testing day.