Bandits and Online Learning

How can agents learn efficiently while making decisions online?

This theme studies algorithms that learn from feedback obtained through online decision making. I focus on achieving stable learning under partial feedback and in environments that change over time.

Key questions

How can agents learn efficiently from partial feedback?
How can stable learning be achieved in environments that change over time?

Related Publications

AAMAS 2026 (Extended abstract)

Time-Varyingness in Auction Breaks Revenue Equivalence

Yuma Fujimoto, Kaito Ariu, Kenshi Abe

Theme: Bandits & Online Learning

arXiv

WSDM 2025 (Industry day talks)

Efficient Creative Selection in Online Advertising using Top-Two Thompson Sampling

Daiki Katsuragawa, Yusuke Kaneko, Kaito Ariu, Kenshi Abe

Theme: Bandits & Online Learning

Paper

SIGIR 2023 (Short Paper)

Exploration of Unranked Items in Safe Online Learning to Re-Rank

Hiroaki Shiino, Kaito Ariu, Kenshi Abe, Riku Togashi

Theme: Bandits & Online Learning

arXiv

ICML 2022

Thresholded LASSO Bandit

Kaito Ariu, Kenshi Abe, Alexandre Proutière

Theme: Bandits & Online Learning

arXiv

A Practical Guide of Off-Policy Evaluation for Bandit Problems

Masahiro Kato, Kenshi Abe, Kaito Ariu, Shota Yasui

Theme: Bandits & Online Learning

arXiv

AAAI 2020 Workshop on Reinforcement Learning in Games

Online Learning for Bidding Agent in First Price Auction

Gota Morishita, Kenshi Abe, Kazuhisa Ogawa, Yusuke Kaneko

Theme: Bandits & Online Learning

Paper
