In contrast to rule-based systems, learning systems have a far more ambitious goal. The vision of AI research, which turns out to be more a hope than a concrete vision, is to implement general AI through the learning capability of these systems. The hope is that a learning system is, in principle, unlimited in its ability to simulate intelligence; it is said to have adaptive intelligence. The ability to learn produces adaptive intelligence, and adaptive intelligence means that existing knowledge can be changed or discarded and new knowledge can be acquired. In other words, these systems build their rules on the fly. That is what makes learning systems so different from rule-based systems. A neural network is an instance of a learning system.

**Bottom Line.** Rule-based systems rely on explicitly stated and static models of a domain. Learning systems create their own models.

This sounds like learning systems do some black magic. They don’t. Let’s have a look at the following analogy by Kasper Fredenslund to demystify learning systems by figuring out what we actually mean by learning.

The difference between rule-based systems and learning systems boils down to who does the learning (e.g., a computer system or a human being). For example, imagine a group of human real estate agents. This group represents a rule-based system.

This group then tries to predict house prices based on some given information about the houses (e.g., location, build year, size). Furthermore, imagine that the agents already know the prices of some example houses upfront, together with the location, build year, and size of each house. This means that the agents have already been trained. The agents then try to predict the price of new houses based on the given information. To do that, the agents would most probably start building a model; e.g., imagine they develop a model in the form of a simple linear equation:

*price = location · w₁ + build year · w₂ + size · w₃*

We can easily imagine that the agents would then use a combination of their intuition and their experience (knowledge base) to come up with values for the weights to approximate the price of a house they have never seen before. By doing this for more and more houses, they would accumulate more and more data, and would then (most probably) start tweaking the model itself (e.g., the form of the equation) or the parameters (weights) of the model to minimize the difference between the predicted price and the actual price. So, the key to learning is feedback. It is nearly impossible to learn anything without it (Steve Levitt). That’s (in rough terms) how rule-based systems work.
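As a sketch, the agents' model and its feedback signal might look like this in code. All of the feature values and weights below are hypothetical, picked purely for illustration:

```python
# The agents' linear model: price = location*w1 + build_year*w2 + size*w3.
# The weights are the values an agent might pick from intuition and
# experience (hypothetical numbers here).
def predict_price(location, build_year, size, w1, w2, w3):
    return location * w1 + build_year * w2 + size * w3

# Feedback: compare the prediction against a known actual price.
predicted = predict_price(location=8, build_year=1995, size=120,
                          w1=5000.0, w2=10.0, w3=900.0)
actual = 170_000.0
error = actual - predicted  # the signal the agents would learn from
```

Seeing this error for many houses is exactly the feedback loop described above: the agents adjust `w1`, `w2`, and `w3` (or the form of the equation itself) until the errors shrink.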

In contrast, a learning system says, “Screw the agents; we don’t need human beings!” A learning system (in a nutshell) just automates the process of tweaking the weights to minimize the difference between the actual price and the predicted price, just like the human agents would do. A measure of this difference between the actual price and the predicted price is what is known as the utility function (cost function) of a learning system.
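In practice, this difference is wrapped in a concrete measure; the mean squared error is one common choice (among several), sketched here:

```python
# Mean squared error: one common cost function. It averages the
# squared differences between predicted and actual prices, so larger
# mistakes are penalized more heavily.
def mean_squared_error(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

mean_squared_error([100.0, 200.0], [110.0, 190.0])  # -> 100.0
```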

The goal of a learning system is to minimize that function, and the system does this by tweaking the weights in such a way that the function is minimized. Hence, learning just means finding the right weights to minimize the utility function. In learning systems (e.g., deep neural networks), the gradients needed to optimize (minimize or maximize) the utility function are computed through a process called backpropagation, and the optimization itself is carried out with traditional optimization techniques.

These optimization techniques (e.g., gradient descent, stochastic gradient descent) are indeed rule-based techniques, because they follow fixed mathematical rules: they use the computed gradient to adjust the weights and biases of a neural network so as to optimize its utility function. How all this is done in practice varies greatly (e.g., supervised and unsupervised learning), but this example isn’t too far from reality.
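The whole loop can be sketched with plain gradient descent on the house-price model. Everything below is illustrative: the dataset is made up (and pre-scaled to roughly [0, 1], which real pipelines also do to keep gradients well-behaved), and the learning rate and iteration count are arbitrary choices:

```python
# Toy gradient descent: learn weights for
#   price = location*w1 + build_year*w2 + size*w3
# by repeatedly stepping against the gradient of the mean squared error.
houses = [
    # (location, build_year, size, price) -- hypothetical, scaled values
    (0.8, 0.5, 0.6, 0.70),
    (0.3, 0.9, 0.4, 0.45),
    (0.6, 0.2, 0.9, 0.75),
]

w = [0.0, 0.0, 0.0]   # start with arbitrary weights
lr = 0.5              # learning rate (step size)

for _ in range(5000):
    # Gradient of the mean squared error with respect to each weight.
    grad = [0.0, 0.0, 0.0]
    for *features, price in houses:
        predicted = sum(wi * xi for wi, xi in zip(w, features))
        err = predicted - price
        for i, xi in enumerate(features):
            grad[i] += 2 * err * xi / len(houses)
    # The rule-based step: move each weight against its gradient.
    w = [wi - lr * gi for wi, gi in zip(w, grad)]

# After training, the mean squared error should be close to zero.
mse = sum(
    (sum(wi * xi for wi, xi in zip(w, f)) - p) ** 2
    for *f, p in houses
) / len(houses)
```

Note how mechanical the update rule is: compute the gradient, step against it, repeat. That is the sense in which the optimization inside a learning system is itself rule-based.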

**Bottom Line.** Optimization is at the heart of learning systems. Learning systems rely heavily on rule-based techniques (e.g., mathematical optimization).