If we want to build a recommendation system for a service or platform, we can approach the goal in many ways. Given current trends, in particular the fashion for deep learning, we may get the impression that only very complex algorithms lead to success. That is not the case. Recommender systems apply in scenarios where many users interact with many items, and the system suggests items that similar users have chosen.
The challenge with more complex approaches is that they can be difficult to tune and interpret. In other words, they can be very powerful but require a lot of knowledge to implement properly. Association analysis, on the other hand, is relatively light on math concepts and easy to explain to laypeople. In addition, it is an unsupervised learning tool that looks for hidden patterns, so the need for data preparation and feature engineering is limited. It is a good starting point for certain kinds of data exploration and can pave the way for more insightful approaches to the data.
The first technique we will show here is association analysis, which attempts to find common patterns of items in large data sets. An important assumption is that we built the unsupervised rules at the 10-digit GPI level and the drug name level. We did not want to separate recommendations by dose, so a dose-specific product such as Lipitor Oral Tablet 10 MG is not treated as a separate item.
The main idea of apriori is to look for association rules between drugs based on the basket (the set of products that a person buys). It is important to remember that rules can only come from users who buy more than one item. Prescription users (SingleCare clients in this case) buy far fewer drugs than shoppers buy products in an everyday supermarket trip. Therefore, we first kept only the customers who bought more than one drug. For simplicity, we created an artificial basket from one year of each user's history. Based on our experiments and their results, we then narrowed the selection to customers who buy more than 4 drugs per year. Keep in mind that after preprocessing, we create the rules from 3 important components: member_id, drug_name, and count_of_drug (computed separately for each drug and client).
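The basket construction described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the rows are made-up sample data, the function name build_baskets is hypothetical, and "more than 4 drugs per year" is interpreted here as more than 4 distinct drugs, which the text does not spell out.

```python
from collections import defaultdict

# Hypothetical preprocessed rows: (member_id, drug_name, count_of_drug),
# already restricted to one year of history. All values are made up.
rows = [
    ("m1", "atorvastatin", 4), ("m1", "lisinopril", 3), ("m1", "metformin", 2),
    ("m1", "amlodipine", 1), ("m1", "ibuprofen", 1),
    ("m2", "amoxicillin", 1), ("m2", "ibuprofen", 2),
    ("m3", "atorvastatin", 6),
]


def build_baskets(rows, min_distinct_drugs=5):
    """Group drugs per member and keep members with enough distinct drugs.

    Assumption: "more than 4 drugs" means at least 5 distinct drug names.
    """
    baskets = defaultdict(set)
    for member_id, drug_name, count_of_drug in rows:
        if count_of_drug > 0:
            baskets[member_id].add(drug_name)
    return {
        member: drugs
        for member, drugs in baskets.items()
        if len(drugs) >= min_distinct_drugs
    }


baskets = build_baskets(rows)
for member, drugs in baskets.items():
    print(member, sorted(drugs))
```

In this sample only m1 survives the filter: m2 bought two distinct drugs, and m3 bought a single drug many times, which cannot produce a rule between drugs.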
Important terminology:
Support is the relative frequency with which a rule shows up in the data. In many instances, you will look for high support to make sure the relationship is useful. However, low support can also be valuable if you are trying to find “hidden” relationships. Support is one of the measures of interest that describe the usefulness and certainty of a rule. A support of 5% means that 5% of the transactions in the database follow the rule.
Confidence is a measure of the reliability of the rule. Take the rule "customers who buy amoxicillin and vitamin C also buy ibuprofen": a confidence of 0.5 means that in 50% of the cases where amoxicillin and vitamin C were purchased, the purchase also included ibuprofen. For product recommendation, a 50% confidence may be perfectly acceptable, but in a medical situation this level may not be high enough.
Lift is the ratio of the observed support of a rule to the support expected if the two sides of the rule were independent. The basic rule of thumb is that a lift value close to 1 means the two sides are independent. Lift values > 1 are generally more “interesting” and could indicate a useful rule pattern.
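To make the three measures concrete, here is a small worked example in plain Python. It is a sketch on made-up baskets, not our production code: the helper names support, confidence, and lift are chosen for illustration, and the data is constructed so that the amoxicillin and vitamin C rule has a confidence of 0.5.

```python
# Made-up baskets; each set is one customer's basket.
baskets = [
    {"amoxicillin", "vitamin C", "ibuprofen"},
    {"amoxicillin", "vitamin C", "ibuprofen"},
    {"amoxicillin", "vitamin C"},
    {"amoxicillin", "vitamin C"},
    {"ibuprofen"},
    {"vitamin C"},
    {"amoxicillin"},
    {"metformin"},
]


def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Share of baskets with the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)


def lift(baskets, antecedent, consequent):
    """Confidence divided by the baseline frequency of the consequent."""
    return confidence(baskets, antecedent, consequent) / support(baskets, consequent)


antecedent = {"amoxicillin", "vitamin C"}
consequent = {"ibuprofen"}

rule_support = support(baskets, antecedent | consequent)   # 2 of 8 baskets = 0.25
rule_confidence = confidence(baskets, antecedent, consequent)  # 0.25 / 0.5 = 0.5
rule_lift = lift(baskets, antecedent, consequent)          # 0.5 / 0.375, about 1.33

print(f"support={rule_support:.2f} confidence={rule_confidence:.2f} lift={rule_lift:.2f}")
```

Ibuprofen appears in 3 of 8 baskets overall (37.5%) but in half of the baskets that contain amoxicillin and vitamin C, which is why the lift is above 1.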
The most interesting part of the algorithm is that rules have a direction: one drug is the antecedent and the other is the consequent. For example, "a patient who buys a cancer drug (antecedent) also buys a painkiller (consequent)" can be a strong rule. The reverse does not hold: not everyone suffering from pain has cancer. We can filter out this kind of recommendation automatically with the apriori algorithm's parameters. Based on the strongest rules (filtered by lift), we obtained some interesting results:
- more than 60% of the rules involve popular drugs (the top 50 of all available drugs).
- more than 50% of the recommended drugs are cheaper for clients.
- over 40% of the strong apriori rules increase the company's profitability.
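The direction of a rule, discussed above with the cancer drug and painkiller example, can also be illustrated with a short sketch. The baskets, the item names, and the MIN_CONFIDENCE threshold are all hypothetical and are not the parameters we used. Note that lift is symmetric, so in this sketch it is the confidence threshold that separates the two directions.

```python
from itertools import permutations

# Made-up baskets: few cancer patients, many painkiller buyers.
baskets = (
    [{"cancer_drug", "painkiller"}] * 2
    + [{"painkiller", "antacid"}] * 6
    + [{"antacid"}] * 2
)

MIN_CONFIDENCE = 0.6  # hypothetical threshold, chosen for this example only


def support(items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


rules = []
for antecedent, consequent in permutations(["cancer_drug", "painkiller"], 2):
    conf = support({antecedent, consequent}) / support({antecedent})
    lift = conf / support({consequent})
    rules.append((antecedent, consequent, conf, lift))
    print(f"{antecedent} -> {consequent}: confidence={conf:.2f} lift={lift:.2f}")

# Keep only the directions that clear the confidence threshold.
kept = [(a, c) for a, c, conf, _ in rules if conf >= MIN_CONFIDENCE]
print(kept)
```

Both directions have the same lift here, but every cancer drug buyer also buys a painkiller (confidence 1.0), while only a quarter of painkiller buyers buy the cancer drug (confidence 0.25), so only the first direction is kept.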