<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>阿部 拳之</title><link>https://bakanaouji.github.io/ja/</link><description>Recent content on 阿部 拳之</description><generator>Hugo -- gohugo.io</generator><language>ja</language><copyright>© 2026 阿部 拳之</copyright><lastBuildDate>Tue, 07 Jul 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://bakanaouji.github.io/ja/index.xml" rel="self" type="application/rss+xml"/><item><title>Asymmetric Perturbation in Solving Bilinear Saddle-Point Optimization</title><link>https://bakanaouji.github.io/ja/publications/asymmetric-perturbation-bilinear-icml-2026/</link><pubDate>Tue, 07 Jul 2026 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/asymmetric-perturbation-bilinear-icml-2026/</guid><description/></item><item><title>Time-Varyingness in Auction Breaks Revenue Equivalence</title><link>https://bakanaouji.github.io/ja/publications/time-varyingness-auction-aamas-2026/</link><pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/time-varyingness-auction-aamas-2026/</guid><description/></item><item><title>Policy Testing in Markov Decision Processes</title><link>https://bakanaouji.github.io/ja/publications/policy-testing-mdp-aistats-2026/</link><pubDate>Sat, 02 May 2026 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/policy-testing-mdp-aistats-2026/</guid><description/></item><item><title>Policy Testing in Markov Decision Processes</title><link>https://bakanaouji.github.io/ja/publications/policy-testing-mdp-neurips-2025/</link><pubDate>Sat, 06 Dec 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/policy-testing-mdp-neurips-2025/</guid><description/></item><item><title>Last Iterate Convergence in Monotone Mean Field 
Games</title><link>https://bakanaouji.github.io/ja/publications/monotone-mean-field-games-neurips-2025/</link><pubDate>Wed, 03 Dec 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/monotone-mean-field-games-neurips-2025/</guid><description/></item><item><title>Learning from Delayed Feedback in Games via Extra Prediction</title><link>https://bakanaouji.github.io/ja/publications/learning-from-delayed-feedback-neurips-2025/</link><pubDate>Wed, 03 Dec 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/learning-from-delayed-feedback-neurips-2025/</guid><description/></item><item><title>Asymmetric Perturbation in Solving Bilinear Saddle-Point Optimization</title><link>https://bakanaouji.github.io/ja/publications/asymmetric-perturbation-bilinear-ibis-2025/</link><pubDate>Wed, 12 Nov 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/asymmetric-perturbation-bilinear-ibis-2025/</guid><description/></item><item><title>Policy Testing in Markov Decision Processes</title><link>https://bakanaouji.github.io/ja/publications/policy-testing-mdp-ibis-2025/</link><pubDate>Wed, 12 Nov 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/policy-testing-mdp-ibis-2025/</guid><description/></item><item><title>Unified Convergence Guarantees for Learning with General Payoff Perturbations in Extensive-Form Games</title><link>https://bakanaouji.github.io/ja/publications/unified-convergence-guarantees-efg-ibis-2025/</link><pubDate>Wed, 12 Nov 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/unified-convergence-guarantees-efg-ibis-2025/</guid><description/></item><item><title>ゲームにおける時間遅れフィードバックからの学習</title><link>https://bakanaouji.github.io/ja/publications/learning-from-delayed-feedback-ibis-2025/</link><pubDate>Wed, 12 Nov 2025 00:00:00 
+0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/learning-from-delayed-feedback-ibis-2025/</guid><description/></item><item><title>オークション環境の時間変動による収入同値の破れ</title><link>https://bakanaouji.github.io/ja/publications/time-varyingness-auction-ibis-2025/</link><pubDate>Wed, 12 Nov 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/time-varyingness-auction-ibis-2025/</guid><description/></item><item><title>共通トレンドを考慮した加法報酬モデルに基づく非定常バンディットアルゴリズム</title><link>https://bakanaouji.github.io/ja/publications/additive-reward-non-stationary-bandit-ibis-2025/</link><pubDate>Wed, 12 Nov 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/additive-reward-non-stationary-bandit-ibis-2025/</guid><description/></item><item><title>不完全情報展開型ゲームの求解における利得摂動に関する研究</title><link>https://bakanaouji.github.io/ja/publications/perturbation-under-sampling-efg-fit-2025/</link><pubDate>Wed, 03 Sep 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/perturbation-under-sampling-efg-fit-2025/</guid><description/></item><item><title>Return-Aligned Decision Transformer</title><link>https://bakanaouji.github.io/ja/publications/return-aligned-decision-transformer-tmlr-2025/</link><pubDate>Sun, 08 Jun 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/return-aligned-decision-transformer-tmlr-2025/</guid><description/></item><item><title>周期的なゼロ和ゲームにおけるマルチエージェント学習</title><link>https://bakanaouji.github.io/ja/publications/synchronization-periodic-zero-sum-jsai-2025/</link><pubDate>Tue, 27 May 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/synchronization-periodic-zero-sum-jsai-2025/</guid><description/></item><item><title>日本語大規模言語モデルの自己学習によるアライメントの実験評価</title><link>https://bakanaouji.github.io/ja/publications/alignment-evaluation-llm-jsai-2025/</link><pubDate>Tue, 27 May 2025 00:00:00 
+0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/alignment-evaluation-llm-jsai-2025/</guid><description/></item><item><title>不完全情報展開型ゲームの求解における利得摂動に関する研究</title><link>https://bakanaouji.github.io/ja/publications/perturbation-under-sampling-efg-jsai-2025/</link><pubDate>Tue, 27 May 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/perturbation-under-sampling-efg-jsai-2025/</guid><description/></item><item><title>Global Behavior of Learning Dynamics in Zero-Sum Games with Memory Asymmetry</title><link>https://bakanaouji.github.io/ja/publications/global-behavior-zero-sum-games-aamas-2025/</link><pubDate>Wed, 21 May 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/global-behavior-zero-sum-games-aamas-2025/</guid><description/></item><item><title>Nash Equilibrium and Learning Dynamics in Three-Player Matching m-Action Games</title><link>https://bakanaouji.github.io/ja/publications/three-player-matching-games-aamas-2025/</link><pubDate>Wed, 21 May 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/three-player-matching-games-aamas-2025/</guid><description/></item><item><title>Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment</title><link>https://bakanaouji.github.io/ja/publications/regularized-best-of-n-naacl-2025/</link><pubDate>Wed, 30 Apr 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/regularized-best-of-n-naacl-2025/</guid><description/></item><item><title>Boosting Perturbed Gradient Ascent for Last-Iterate Convergence in Games</title><link>https://bakanaouji.github.io/ja/publications/boosting-perturbed-gradient-ascent-iclr-2025/</link><pubDate>Thu, 24 Apr 2025 00:00:00 
+0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/boosting-perturbed-gradient-ascent-iclr-2025/</guid><description/></item><item><title>不完全情報展開型ゲームの求解における利得摂動に関する研究</title><link>https://bakanaouji.github.io/ja/publications/perturbation-under-sampling-efg-ipsj-2025/</link><pubDate>Thu, 13 Mar 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/perturbation-under-sampling-efg-ipsj-2025/</guid><description/></item><item><title>Efficient Creative Selection in Online Advertising using Top-Two Thompson Sampling</title><link>https://bakanaouji.github.io/ja/publications/creative-selection-online-advertising-wsdm-2025/</link><pubDate>Tue, 11 Mar 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/creative-selection-online-advertising-wsdm-2025/</guid><description/></item><item><title>大規模言語モデルのためのアライメントデータ合成手法の実験的評価</title><link>https://bakanaouji.github.io/ja/publications/alignment-evaluation-llm-nlp-2025/</link><pubDate>Mon, 10 Mar 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/alignment-evaluation-llm-nlp-2025/</guid><description/></item><item><title>Approximate State Abstraction for Markov Games</title><link>https://bakanaouji.github.io/ja/publications/state-abstraction-markov-games-aaai-2025/</link><pubDate>Thu, 27 Feb 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/state-abstraction-markov-games-aaai-2025/</guid><description/></item><item><title>Synchronization behind Learning in Periodic Zero-Sum Games Triggers Divergence from Nash equilibrium</title><link>https://bakanaouji.github.io/ja/publications/synchronization-periodic-zero-sum-aaai-2025/</link><pubDate>Thu, 27 Feb 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/synchronization-periodic-zero-sum-aaai-2025/</guid><description/></item><item><title>Evaluation of Best-of-N Sampling Strategies for Language Model 
Alignment</title><link>https://bakanaouji.github.io/ja/publications/evaluation-best-of-n-tmlr-2025/</link><pubDate>Sat, 15 Feb 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/evaluation-best-of-n-tmlr-2025/</guid><description/></item><item><title>The Power of Perturbation under Sampling in Solving Extensive-Form Games</title><link>https://bakanaouji.github.io/ja/publications/perturbation-under-sampling-efg-arxiv-2025/</link><pubDate>Tue, 28 Jan 2025 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/perturbation-under-sampling-efg-arxiv-2025/</guid><description/></item><item><title>Filtered Direct Preference Optimization</title><link>https://bakanaouji.github.io/ja/publications/filtered-dpo-emnlp-2024/</link><pubDate>Tue, 12 Nov 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/filtered-dpo-emnlp-2024/</guid><description/></item><item><title>（不完全情報）展開型ゲームにおける零分散の利得摂動手法</title><link>https://bakanaouji.github.io/ja/publications/perturbation-under-sampling-efg-ibis-2024/</link><pubDate>Mon, 04 Nov 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/perturbation-under-sampling-efg-ibis-2024/</guid><description/></item><item><title>Evaluation of Best-of-N Sampling Strategies for Language Model Alignment</title><link>https://bakanaouji.github.io/ja/publications/evaluation-best-of-n-ibis-2024/</link><pubDate>Mon, 04 Nov 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/evaluation-best-of-n-ibis-2024/</guid><description/></item><item><title>Filtered Direct Preference Optimization: 選好データセットの質に基づくフィルタリング手法の提案</title><link>https://bakanaouji.github.io/ja/publications/filtered-dpo-ibis-2024/</link><pubDate>Mon, 04 Nov 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/filtered-dpo-ibis-2024/</guid><description/></item><item><title>Last Iterate Convergence in Monotone Mean Field 
Games</title><link>https://bakanaouji.github.io/ja/publications/monotone-mean-field-games-ibis-2024/</link><pubDate>Mon, 04 Nov 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/monotone-mean-field-games-ibis-2024/</guid><description/></item><item><title>Synchronization behind Learning in Periodic Zero-Sum Games Triggers Divergence from Nash equilibrium</title><link>https://bakanaouji.github.io/ja/publications/synchronization-periodic-zero-sum-ibis-2024/</link><pubDate>Mon, 04 Nov 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/synchronization-periodic-zero-sum-ibis-2024/</guid><description/></item><item><title>ベイズリスク選好最適化：報酬モデル不要のオンライン選好最適化手法</title><link>https://bakanaouji.github.io/ja/publications/bayes-risk-preference-optimization-ibis-2024/</link><pubDate>Mon, 04 Nov 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/bayes-risk-preference-optimization-ibis-2024/</guid><description/></item><item><title>マルコフ決定過程における良方策検定手法の提案</title><link>https://bakanaouji.github.io/ja/publications/policy-testing-mdp-ibis-2024/</link><pubDate>Mon, 04 Nov 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/policy-testing-mdp-ibis-2024/</guid><description/></item><item><title>二人零和マルコフゲームにおける状態抽象化に関する研究</title><link>https://bakanaouji.github.io/ja/publications/state-abstraction-markov-games-fit-2024/</link><pubDate>Wed, 04 Sep 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/state-abstraction-markov-games-fit-2024/</guid><description/></item><item><title>Policy Gradient Algorithms with Monte-Carlo Tree Search for Non-Markov Decision Processes</title><link>https://bakanaouji.github.io/ja/publications/policy-gradient-mcts-rlc-2024/</link><pubDate>Sat, 10 Aug 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/policy-gradient-mcts-rlc-2024/</guid><description/></item><item><title>Filtered Direct Preference 
Optimization</title><link>https://bakanaouji.github.io/ja/publications/filtered-dpo-icml-2024/</link><pubDate>Fri, 26 Jul 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/filtered-dpo-icml-2024/</guid><description/></item><item><title>Regularized Best-of-N Sampling to Mitigate Reward Hacking for Language Model Alignment</title><link>https://bakanaouji.github.io/ja/publications/regularized-best-of-n-icml-2024/</link><pubDate>Fri, 26 Jul 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/regularized-best-of-n-icml-2024/</guid><description/></item><item><title>Adaptively Perturbed Mirror Descent for Learning in Games</title><link>https://bakanaouji.github.io/ja/publications/adaptively-perturbed-mirror-descent-icml-2024/</link><pubDate>Tue, 23 Jul 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/adaptively-perturbed-mirror-descent-icml-2024/</guid><description/></item><item><title>Model-Based Minimum Bayes Risk Decoding</title><link>https://bakanaouji.github.io/ja/publications/model-based-mbr-icml-2024/</link><pubDate>Tue, 23 Jul 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/model-based-mbr-icml-2024/</guid><description/></item><item><title>RLHFにおける分布シフトの評価</title><link>https://bakanaouji.github.io/ja/publications/distribution-shift-evaluation-rlhf-jsai-2024/</link><pubDate>Tue, 28 May 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/distribution-shift-evaluation-rlhf-jsai-2024/</guid><description/></item><item><title>二人零和ゲームにおける突然変異駆動型正則化先導者追従法の終極反復収束</title><link>https://bakanaouji.github.io/ja/publications/mutation-driven-ftrl-ipsj-j-2024/</link><pubDate>Wed, 15 May 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/mutation-driven-ftrl-ipsj-j-2024/</guid><description/></item><item><title>Scalable and Provably Fair Exposure Control for Large-Scale Recommender 
Systems</title><link>https://bakanaouji.github.io/ja/publications/scalable-fair-exposure-control-www-2024/</link><pubDate>Tue, 14 May 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/scalable-fair-exposure-control-www-2024/</guid><description/></item><item><title>Learning Fair Division from Bandit Feedback</title><link>https://bakanaouji.github.io/ja/publications/learning-fair-division-aistats-2024/</link><pubDate>Thu, 02 May 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/learning-fair-division-aistats-2024/</guid><description/></item><item><title>研修医配属における地域間格差を調整する制約のモンテカルロ木探索</title><link>https://bakanaouji.github.io/ja/publications/medical-residency-match-ipsj-2024/</link><pubDate>Fri, 15 Mar 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/medical-residency-match-ipsj-2024/</guid><description/></item><item><title>二人零和マルコフゲームにおける状態抽象化法に関する研究</title><link>https://bakanaouji.github.io/ja/publications/state-abstraction-markov-games-ipsj-2024/</link><pubDate>Fri, 15 Mar 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/state-abstraction-markov-games-ipsj-2024/</guid><description/></item><item><title>Memory Asymmetry Creates Heteroclinic Orbits to Nash Equilibrium in Learning in Zero-Sum Games</title><link>https://bakanaouji.github.io/ja/publications/memory-asymmetry-heteroclinic-orbits-aaai-2024/</link><pubDate>Thu, 22 Feb 2024 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/memory-asymmetry-heteroclinic-orbits-aaai-2024/</guid><description/></item><item><title>A Slingshot Approach to Learning in Monotone Games</title><link>https://bakanaouji.github.io/ja/publications/adaptively-perturbed-mirror-descent-ibis-2023/</link><pubDate>Sun, 29 Oct 2023 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/adaptively-perturbed-mirror-descent-ibis-2023/</guid><description/></item><item><title>Learning in 
Multi-Memory Games Triggers Complex Dynamics Diverging from Nash Equilibrium</title><link>https://bakanaouji.github.io/ja/publications/multi-memory-games-ibis-2023/</link><pubDate>Sun, 29 Oct 2023 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/multi-memory-games-ibis-2023/</guid><description/></item><item><title>Zero-Variance Perturbation Utility for Extensive-Form Games</title><link>https://bakanaouji.github.io/ja/publications/perturbation-under-sampling-efg-ibis-2023/</link><pubDate>Sun, 29 Oct 2023 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/perturbation-under-sampling-efg-ibis-2023/</guid><description/></item><item><title>オンライン環境において公平な資源配分を実現するアルゴリズムに関する研究</title><link>https://bakanaouji.github.io/ja/publications/learning-fair-division-ibis-2023/</link><pubDate>Sun, 29 Oct 2023 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/learning-fair-division-ibis-2023/</guid><description/></item><item><title>オンライン環境において公平な資源配分を実現するアルゴリズムに関する研究</title><link>https://bakanaouji.github.io/ja/publications/learning-fair-division-fit-2023/</link><pubDate>Wed, 06 Sep 2023 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/learning-fair-division-fit-2023/</guid><description/></item><item><title>研修医配属における地域間格差を調整するための制約のモンテカルロ木探索</title><link>https://bakanaouji.github.io/ja/publications/medical-residency-match-fit-2023/</link><pubDate>Wed, 06 Sep 2023 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/medical-residency-match-fit-2023/</guid><description/></item><item><title>Learning in Multi-Memory Games Triggers Complex Dynamics Diverging from Nash Equilibrium</title><link>https://bakanaouji.github.io/ja/publications/multi-memory-games-ijcai-2023/</link><pubDate>Tue, 22 Aug 2023 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/multi-memory-games-ijcai-2023/</guid><description/></item><item><title>Exploration of Unranked 
Items in Safe Online Learning to Re-Rank</title><link>https://bakanaouji.github.io/ja/publications/safe-online-learning-to-rerank-sigir-2023/</link><pubDate>Mon, 24 Jul 2023 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/safe-online-learning-to-rerank-sigir-2023/</guid><description/></item><item><title>Why Guided Dialog Policy Learning performs well? Understanding the role of adversarial learning and its alternative</title><link>https://bakanaouji.github.io/ja/publications/guided-dialog-adversarial-arxiv-2023/</link><pubDate>Thu, 13 Jul 2023 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/guided-dialog-adversarial-arxiv-2023/</guid><description/></item><item><title>オンライン環境において公平な資源配分を実現するアルゴリズムに関する研究</title><link>https://bakanaouji.github.io/ja/publications/learning-fair-division-jsai-2023/</link><pubDate>Tue, 06 Jun 2023 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/learning-fair-division-jsai-2023/</guid><description/></item><item><title>二人零和展開型ゲームにおける突然変異付き乗算型重み更新に関する研究</title><link>https://bakanaouji.github.io/ja/publications/mutation-mwu-efg-jsai-2023/</link><pubDate>Tue, 06 Jun 2023 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/mutation-mwu-efg-jsai-2023/</guid><description/></item><item><title>Last-Iterate Convergence with Full and Noisy Feedback in Two-Player Zero-Sum Games</title><link>https://bakanaouji.github.io/ja/publications/last-iterate-full-noisy-feedback-aistats-2023/</link><pubDate>Tue, 25 Apr 2023 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/last-iterate-full-noisy-feedback-aistats-2023/</guid><description/></item><item><title>タスク指向対話システムの方策学習への Decision Transformerの適用</title><link>https://bakanaouji.github.io/ja/publications/decision-transformer-dialogue-nlp-2023/</link><pubDate>Mon, 13 Mar 2023 00:00:00 
+0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/decision-transformer-dialogue-nlp-2023/</guid><description/></item><item><title>タスク指向対話における強化学習を用いた対話方策学習への敵対的学習の役割の解明</title><link>https://bakanaouji.github.io/ja/publications/guided-dialog-adversarial-nlp-2023/</link><pubDate>Mon, 13 Mar 2023 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/guided-dialog-adversarial-nlp-2023/</guid><description/></item><item><title>オンライン環境において公平な資源配分を実現するアルゴリズムに関する研究</title><link>https://bakanaouji.github.io/ja/publications/learning-fair-division-ipsj-2023/</link><pubDate>Thu, 02 Mar 2023 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/learning-fair-division-ipsj-2023/</guid><description/></item><item><title>研修医配属における地域間格差を調整するための制約のモンテカルロ木探索</title><link>https://bakanaouji.github.io/ja/publications/medical-residency-match-ipsj-2023/</link><pubDate>Thu, 02 Mar 2023 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/medical-residency-match-ipsj-2023/</guid><description/></item><item><title>Last-Iterate Convergence with Full- and Noisy-Information Feedback in Two-Player Zero-Sum Games</title><link>https://bakanaouji.github.io/ja/publications/last-iterate-full-noisy-feedback-ibis-2022/</link><pubDate>Sun, 20 Nov 2022 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/last-iterate-full-noisy-feedback-ibis-2022/</guid><description/></item><item><title>Thresholded Lasso Bandit</title><link>https://bakanaouji.github.io/ja/publications/thresholded-lasso-bandit-ibis-2022/</link><pubDate>Sun, 20 Nov 2022 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/thresholded-lasso-bandit-ibis-2022/</guid><description/></item><item><title>ビームサーチ推論のための強化学習</title><link>https://bakanaouji.github.io/ja/publications/policy-gradient-mcts-ibis-2022/</link><pubDate>Sun, 20 Nov 2022 00:00:00 
+0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/policy-gradient-mcts-ibis-2022/</guid><description/></item><item><title>公平性を考慮した大規模推薦システム</title><link>https://bakanaouji.github.io/ja/publications/scalable-fair-exposure-control-ibis-2022/</link><pubDate>Sun, 20 Nov 2022 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/scalable-fair-exposure-control-ibis-2022/</guid><description/></item><item><title>Fair Matrix Factorisation for Large-Scale Recommender Systems</title><link>https://bakanaouji.github.io/ja/publications/scalable-fair-exposure-control-recsys-2022/</link><pubDate>Fri, 23 Sep 2022 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/scalable-fair-exposure-control-recsys-2022/</guid><description/></item><item><title>二人零和ゲームにおける突然変異駆動型Follow-The-Regularized-Leaderの終極反復収束</title><link>https://bakanaouji.github.io/ja/publications/mutation-driven-ftrl-fit-2022/</link><pubDate>Tue, 13 Sep 2022 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/mutation-driven-ftrl-fit-2022/</guid><description/></item><item><title>Mutation-Driven Follow the Regularized Leader for Last-Iterate Convergence in Zero-Sum Games</title><link>https://bakanaouji.github.io/ja/publications/mutation-driven-ftrl-uai-2022/</link><pubDate>Tue, 02 Aug 2022 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/mutation-driven-ftrl-uai-2022/</guid><description/></item><item><title>Anytime Capacity Expansion in Medical Residency Match by Monte Carlo Tree Search</title><link>https://bakanaouji.github.io/ja/publications/medical-residency-match-ijcai-2022/</link><pubDate>Tue, 26 Jul 2022 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/medical-residency-match-ijcai-2022/</guid><description/></item><item><title>Thresholded LASSO Bandit</title><link>https://bakanaouji.github.io/ja/publications/thresholded-lasso-bandit-icml-2022/</link><pubDate>Tue, 19 Jul 2022 00:00:00 
+0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/thresholded-lasso-bandit-icml-2022/</guid><description/></item><item><title>二人零和ゲームにおける突然変異付きレプリケータダイナミクスを用いた学習アルゴリズムに関する研究</title><link>https://bakanaouji.github.io/ja/publications/mutation-driven-ftrl-jsai-2022/</link><pubDate>Tue, 14 Jun 2022 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/mutation-driven-ftrl-jsai-2022/</guid><description/></item><item><title>クールノー競争におけるマルチエージェント強化学習に関する研究</title><link>https://bakanaouji.github.io/ja/publications/multi-agent-rl-cournot-competition-ipsj-2022/</link><pubDate>Thu, 03 Mar 2022 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/multi-agent-rl-cournot-competition-ipsj-2022/</guid><description/></item><item><title>二人零和ゲームにおける突然変異付きレプリケータダイナミクスを用いた学習アルゴリズムに関する研究</title><link>https://bakanaouji.github.io/ja/publications/mutation-driven-ftrl-ipsj-2022/</link><pubDate>Thu, 03 Mar 2022 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/mutation-driven-ftrl-ipsj-2022/</guid><description/></item><item><title>Computing Strategies of American Football via Counterfactual Regret Minimization</title><link>https://bakanaouji.github.io/ja/publications/american-football-cfr-aaai-2022/</link><pubDate>Mon, 28 Feb 2022 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/american-football-cfr-aaai-2022/</guid><description/></item><item><title>Direct Expected Quadratic Utility Maximization for Mean-Variance Controlled Reinforcement Learning</title><link>https://bakanaouji.github.io/ja/publications/direct-quadratic-utility-maximization-neurips-2021/</link><pubDate>Mon, 13 Dec 2021 00:00:00 
+0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/direct-quadratic-utility-maximization-neurips-2021/</guid><description/></item><item><title>見間違えのある繰り返しゲームのためのActor-Critic型強化学習</title><link>https://bakanaouji.github.io/ja/publications/misperception-repeated-games-ibis-2021/</link><pubDate>Wed, 10 Nov 2021 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/misperception-repeated-games-ibis-2021/</guid><description/></item><item><title>見間違えのある繰り返しゲームのためのActor-Critic型強化学習</title><link>https://bakanaouji.github.io/ja/publications/misperception-repeated-games-orsj-2021/</link><pubDate>Thu, 16 Sep 2021 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/misperception-repeated-games-orsj-2021/</guid><description/></item><item><title>見間違えのある繰り返し囚人のジレンマにおける方策勾配法に関する研究</title><link>https://bakanaouji.github.io/ja/publications/misperception-repeated-games-fit-2021/</link><pubDate>Wed, 25 Aug 2021 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/misperception-repeated-games-fit-2021/</guid><description/></item><item><title>反実仮想後悔最小化によるアメリカンフットボールにおけるオフェンス戦略の均衡推定</title><link>https://bakanaouji.github.io/ja/publications/american-football-cfr-fit-2021/</link><pubDate>Wed, 25 Aug 2021 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/american-football-cfr-fit-2021/</guid><description/></item><item><title>Off-Policy Exploitability-Evaluation in Two-Player Zero-Sum Markov Games</title><link>https://bakanaouji.github.io/ja/publications/off-policy-exploitability-evaluation-aamas-2021/</link><pubDate>Wed, 05 May 2021 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/off-policy-exploitability-evaluation-aamas-2021/</guid><description/></item><item><title>二人零和マルコフゲームにおけるオフ方策評価のためのQ学習</title><link>https://bakanaouji.github.io/ja/publications/off-policy-q-learning-markov-games-gpw-2020/</link><pubDate>Sat, 14 Nov 2020 00:00:00 
+0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/off-policy-q-learning-markov-games-gpw-2020/</guid><description/></item><item><title>A Practical Guide of Off-Policy Evaluation for Bandit Problems</title><link>https://bakanaouji.github.io/ja/publications/off-policy-evaluation-bandits-guide-arxiv-2020/</link><pubDate>Fri, 23 Oct 2020 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/off-policy-evaluation-bandits-guide-arxiv-2020/</guid><description/></item><item><title>Online Learning for Bidding Agent in First Price Auction</title><link>https://bakanaouji.github.io/ja/publications/bidding-agent-first-price-auction-aaai-2020/</link><pubDate>Sat, 08 Feb 2020 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/bidding-agent-first-price-auction-aaai-2020/</guid><description/></item><item><title>花札におけるナッシュ均衡戦略の計算</title><link>https://bakanaouji.github.io/ja/publications/nash-equilibrium-strategy-hanafuda-ibis-2019/</link><pubDate>Wed, 20 Nov 2019 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/nash-equilibrium-strategy-hanafuda-ibis-2019/</guid><description/></item><item><title>A Simple Heuristic for Bayesian Optimization with A Low Budget</title><link>https://bakanaouji.github.io/ja/publications/bayesian-optimization-low-budget-arxiv-2019/</link><pubDate>Mon, 18 Nov 2019 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/bayesian-optimization-low-budget-arxiv-2019/</guid><description/></item><item><title>Black-box最適化に対するBudgetを考慮した探索空間の初期化</title><link>https://bakanaouji.github.io/ja/publications/bayesian-optimization-low-budget-jsai-2019/</link><pubDate>Tue, 04 Jun 2019 00:00:00 
+0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/bayesian-optimization-low-budget-jsai-2019/</guid><description/></item><item><title>非定常多腕バンディットアルゴリズムを用いたハイパーパラメータ最適化フレームワークの提案</title><link>https://bakanaouji.github.io/ja/publications/non-stationary-bandit-hpo-ibis-2018/</link><pubDate>Sun, 04 Nov 2018 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/non-stationary-bandit-hpo-ibis-2018/</guid><description/></item><item><title>活用と探索の釣り合いを考慮した事例ベース政策最適化</title><link>https://bakanaouji.github.io/ja/publications/exemplar-policy-optimization-ee-jpnsec-2017/</link><pubDate>Mon, 13 Mar 2017 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/exemplar-policy-optimization-ee-jpnsec-2017/</guid><description/></item><item><title>多峰性景観下での自然進化戦略による事例ベース政策最適化</title><link>https://bakanaouji.github.io/ja/publications/exemplar-policy-optimization-multimodal-ssi-2016/</link><pubDate>Tue, 06 Dec 2016 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/exemplar-policy-optimization-multimodal-ssi-2016/</guid><description/></item><item><title>自然進化戦略を用いた事例ベース政策最適化</title><link>https://bakanaouji.github.io/ja/publications/exemplar-policy-optimization-nes-sice-se-2016/</link><pubDate>Mon, 07 Mar 2016 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/publications/exemplar-policy-optimization-nes-sice-se-2016/</guid><description/></item><item><title>Bandits and Online Learning</title><link>https://bakanaouji.github.io/ja/research/bandits-online-learning/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/research/bandits-online-learning/</guid><description>How can we learn efficiently while making decisions in an online environment?</description></item><item><title>Fairness in Recommender Systems and Allocation</title><link>https://bakanaouji.github.io/ja/research/fairness-recsys-allocation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://bakanaouji.github.io/ja/research/fairness-recsys-allocation/</guid><description>How can limited resources and opportunities be allocated fairly?</description></item><item><title>Language Model Alignment and Preference Optimization</title><link>https://bakanaouji.github.io/ja/research/language-model-alignment/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/research/language-model-alignment/</guid><description>How can language model outputs be aligned with human preferences?</description></item><item><title>Learning Dynamics and Equilibrium Computation in Games</title><link>https://bakanaouji.github.io/ja/research/learning-dynamics-equilibrium-games/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/research/learning-dynamics-equilibrium-games/</guid><description>Which learning algorithms converge quickly to a Nash equilibrium?</description></item><item><title>Reinforcement Learning and Sequential Decision Making</title><link>https://bakanaouji.github.io/ja/research/reinforcement-learning-sequential-decision/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://bakanaouji.github.io/ja/research/reinforcement-learning-sequential-decision/</guid><description>How should policies be improved and evaluated in sequential decision making?</description></item></channel></rss>