Hokkaido University Research Profiles

Information and Communication

Recommendation Techniques Using the Bandit Method

Online learning technology that maximizes cumulative gain while acquiring knowledge

We are researching a recommendation method that maximizes the user's cumulative satisfaction by balancing two kinds of recommendations: items the user is likely to prefer (use of knowledge) and items likely to reveal more about the user's preferences (acquisition of knowledge).

Content of research

In today's internet society, recommendation technology that works well benefits both the provider and the recipient of a service. Recommendation is not a one-time event but an iterative process with feedback at each step, and that feedback covers only the items that were actually recommended. To improve the accuracy of later recommendations, it is therefore important to recommend not only items the user is likely to like given the feedback history (knowledge utilization), but also items whose feedback is likely to yield more information about the user's preferences (knowledge acquisition). The bandit method aims to maximize user satisfaction by balancing the utilization and acquisition of knowledge. We are developing a recommendation system based on this method.
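The balance between utilization and acquisition can be illustrated with UCB1, a standard bandit algorithm. The following is only a minimal sketch under stated assumptions, not the system described here: the function name `ucb1`, the simulated yes/no feedback, and the `true_means` values are all hypothetical. Each item's score is its average feedback so far (knowledge utilization) plus a bonus that grows for rarely recommended items (knowledge acquisition).

```python
import math
import random


def ucb1(true_means, rounds, seed=0):
    """Run UCB1 on simulated items with yes/no feedback.

    true_means[a] is the (hypothetical) probability that item a is liked.
    Returns (total_reward, counts), where counts[a] is how often item a
    was recommended.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms   # times each item was recommended
    sums = [0.0] * n_arms   # total feedback received per item
    total = 0.0

    for t in range(1, rounds + 1):
        if t <= n_arms:
            # Recommend every item once so each has an estimate.
            arm = t - 1
        else:
            # Average feedback (utilization) + exploration bonus (acquisition).
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        # Feedback is observed only for the item that was recommended.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return total, counts
```

Calling `ucb1([0.2, 0.5, 0.8], rounds=5000)` should concentrate most recommendations on the third item while still occasionally trying the other two.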

  • Results of simulation experiments using artificial data.
    Setting: in each round, recommendation mails are sent to 50 pairs selected from the 1000 x 1000 possible (user, item) pairs.
    The proposed methods (UCBVB, UCBPMF) achieve a higher cumulative average evaluation value after 100 rounds.
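The round structure of this experiment (choose a fixed number of (user, item) pairs, observe feedback only for those pairs, repeat) can be sketched as follows. This is an illustration under assumptions, not UCBVB or UCBPMF, whose details are not given here: it scores each pair independently with a simple count-based upper confidence bound, and the names `select_pairs` and `simulate`, the Gaussian noise level, and the `true_ratings` input are hypothetical. In the 1000 x 1000 setting, 50 pairs over 100 rounds covers at most 5,000 of 1,000,000 pairs, so independent per-pair estimates like these are only suitable for small demonstrations.

```python
import heapq
import math
import random


def select_pairs(means, counts, t, k):
    """Return the k (user, item) pairs with the largest UCB scores at round t."""
    def score(pair):
        u, i = pair
        n = counts[u][i]
        if n == 0:
            return math.inf  # pairs never tried are selected first
        return means[u][i] + math.sqrt(2 * math.log(t) / n)

    pairs = [(u, i) for u in range(len(means)) for i in range(len(means[0]))]
    return heapq.nlargest(k, pairs, key=score)


def simulate(true_ratings, rounds, k, seed=0):
    """Return the average observed evaluation over all recommendations sent."""
    rng = random.Random(seed)
    n_users, n_items = len(true_ratings), len(true_ratings[0])
    means = [[0.0] * n_items for _ in range(n_users)]
    counts = [[0] * n_items for _ in range(n_users)]
    total, sent = 0.0, 0

    for t in range(1, rounds + 1):
        for u, i in select_pairs(means, counts, t, k):
            # Noisy feedback, observed only for the pairs that were selected.
            r = true_ratings[u][i] + rng.gauss(0, 0.1)
            counts[u][i] += 1
            means[u][i] += (r - means[u][i]) / counts[u][i]  # running mean
            total += r
            sent += 1

    return total / sent
```

For example, `simulate(true_ratings, rounds=100, k=3)` with a small 5 x 5 table of hypothetical ratings returns the cumulative average evaluation value for that run.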

Potential for social implementation

  • Recommendations
  • Sending direct mail
  • Ad delivery

Appealing points to industry and local governments

This method maximizes the cumulative evaluation value of the choices made in a bandit setting, in which choices are repeatedly selected from a set and feedback is obtained only for the choices actually made. Its use is not limited to recommendations.

2022/5/27 Released