But is it really in Rome? Limitations of the ROME model editing technique
An incomplete list of projects I'd like to work on in 2023
(Linkpost) Results for a survey of tool use and workflows in alignment research
How learning efficiently applies to alignment research
Differential Training Process: Delaying capabilities until inner aligned