Difficulties in Building an AI Safety Startup
Gaining clarity on Automated Alignment Research
Better model diffing is needed
Automating AI Safety: What we can do today
AI Alignment Project Ideas
How much I'm paying for AI productivity software (and the future of AI use)
The importance of Entropy
Accelerating AI Alignment Research (Talk)
Using data attribution for AI alignment
Quantum Computing, Photonics, and Energy Bottlenecks for AGI
AI Insights #1: How Misalignment Could Lead to Takeover & Necessary Safety Properties
My current research and request for collaborators
But is it really in Rome? Limitations of the ROME model editing technique
An incomplete list of projects I'd like to work on in 2023
(Linkpost) Results for a survey of tool use and workflows in alignment research
How learning efficiently applies to alignment research
Differential Training Process: Delaying capabilities until inner aligned
Near-Term AI capabilities probably bring low-hanging fruits for global poverty/health
Is the "Valley of Confused Abstractions" real?
Foresight for AGI Safety Strategy
Notes on Cicero
Detail about factual knowledge in Transformers
Current Thoughts on my Learning System
What does "Effective" in EA mean to you?
Helping organizations survive disasters (and potentially avoid them altogether)
I'll be in Berkeley for SERI MATS for the next 2 months
AI Alignment YouTube Playlists
A descriptive, not prescriptive, overview of current AI Alignment Research
A survey of tool use and workflows in alignment research
Interesting Applications of GPT-3: Elicit