Current Projects

These are projects I'm actively working on.

Language models as tools for accelerating alignment (practical work)

The goal of this project is to first use language models as tools that accelerate the work of other alignment researchers, and then progressively try to use them to automate alignment research.

We started by creating a dataset of alignment-relevant texts. The dataset may end up being used for the following:

  • Fine-tuning language models. One option is to take models like GPT-NeoX or the newer OPT models, fine-tune them on the entire dataset, and then fine-tune them again on a more specific task. Another is to choose specific tasks and fine-tune GPT-3 models on them.
  • Semantic search combined with fine-tuned GPT models, to quickly extract relevant information.
  • AI-assisted writing. Helping alignment researchers organize their thoughts faster, write blog posts faster, and get criticism from GPT-Eliezer and GPT-Christiano.
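
To make the semantic search item a bit more concrete, here is a minimal sketch of the retrieval step. It is an illustration under loud assumptions, not the project's implementation: TF-IDF word vectors stand in for the learned embeddings a real system would likely use, and the example documents, the query, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for entries in the alignment dataset.
DOCS = [
    "Inner alignment concerns whether a learned model pursues the objective it was trained on.",
    "Reward hacking happens when an agent exploits flaws in its reward function.",
    "Interpretability research tries to understand the internal computations of neural networks.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(tokens, idf):
    """Build a sparse TF-IDF vector (term -> weight), ignoring unknown terms."""
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def build_index(docs):
    """Return one sparse TF-IDF vector per document, plus the IDF table."""
    tokenized = [tokenize(doc) for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    # Smoothed IDF, so every term seen in the corpus gets a positive weight.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    vectors = [vectorize(tokens, idf) for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, vectors, idf, top_k=3):
    """Rank documents by similarity to the query and return (doc, score) pairs."""
    query_vec = vectorize(tokenize(query), idf)
    scored = sorted(((cosine(query_vec, vec), i) for i, vec in enumerate(vectors)), reverse=True)
    return [(docs[i], score) for score, i in scored[:top_k]]


if __name__ == "__main__":
    vectors, idf = build_index(DOCS)
    for doc, score in search("agent exploits reward function", DOCS, vectors, idf):
        print(f"{score:.3f}  {doc}")
```

In a fuller pipeline, the top-ranked passages would then be handed to a fine-tuned model for extraction or summarization; that step is left out here because it depends on which model and API end up being used.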

Research into Language Models as Tools

Within this research direction, and as part of MATS, I'm investigating how to make language models better at tasks such as distillation in the context of AI alignment.