AI Alignment Project Ideas (Oct 2, 2024)
I quickly wrote up some rough project ideas for ARENA and LASR participants, so I figured I'd share
How much I'm paying for AI productivity software (and the future of AI use)
This post is broken down into two parts:
1. Which AI productivity tools am I currently using?
2. Why does
The importance of Entropy
Imagine you're building a sandcastle on the beach. As you carefully shape the sand, you're creating
Accelerating AI Alignment Research (Talk)
I gave a keynote talk on how we should be thinking about accelerating AI alignment (safety) research. This is a
Using data attribution for AI alignment
This is a post on a recent paper I thought was cool. I give some follow-up project ideas after.
In-Run
Quantum Computing, Photonics, and Energy Bottlenecks for AGI
💡 Note: I wrote this post in less than a day and didn't want to spend more time on
AI Insights #1: How Misalignment Could Lead to Takeover & Necessary Safety Properties
AI safety insights number 1: risks of misaligned AI takeover, key properties of AGI safety plans, and dangers of autonomous AI agents maximizing rewards in unintended ways as models advance.
My current research and request for collaborators
I wrote this as a bio for EAG Bay Area 2024. I'm sharing this here because it gives
But is it really in Rome? Limitations of the ROME model editing technique
I just published a new post on LessWrong. It's about the causal tracing and model editing paper (ROME)
An incomplete list of projects I'd like to work on in 2023
Wrote up a short (incomplete) bullet-point list of the projects I'd like to work on in 2023. Here