Rewards-in-Context: Multi-objective Alignment of Foundation Models with
Dynamic Preference Adjustment
We consider the problem of multi-objective alignment of foundation models with human preferences, which is a critical step towards helpful and harmless AI systems. However, it is generally costly and unstable to fine-tune large foundation models using reinforcement learning (RL), and the multi-dimensionality, heterogeneity, and conflicting nature of human preferences …