But is it really in Rome? Limitations of the ROME model editing technique

2 min read

An incomplete list of projects I'd like to work on in 2023

1 min read

(Linkpost) Results for a survey of tool use and workflows in alignment research

1 min read

How learning efficiently applies to alignment research

2 min read

Differential Training Process: Delaying capabilities until inner aligned

3 min read