Research Engineer, Prompting
Anthropic
Posted over a month ago
Description

About the role:

Anthropic’s AI technology is amongst the most capable and safe in the world. However, large language models are a new type of intelligence, and the art of instructing and evaluating them in a way that delivers the best results is still in its infancy — it’s a hybrid between research, engineering, and behavioral science. We’re bringing rigor to this new discipline by applying a range of techniques to systematically discover and document prompting best practices, using our models to improve training and evaluation, developing prompt self-improvement techniques to automatically optimize the model’s performance on any given task, and finding ways of making it easy for our customers to do the same.

Given that this is a nascent field, we ask that you share in your application a specific project you're proud of: prompting, model evaluation, synthetic data generation, model finetuning, or an application built on LLMs. Ideally this project should show off a complex and clever prompting architecture, a systematic evaluation of an LLM's behavior in response to different prompts, or an example of using LLMs for a relevant ML task such as careful dataset curation and processing. There is no preferred task; we just want to see how you create and experiment with prompts. You can also include a short description of the process you used, or of any roadblocks you hit and how you dealt with them, but this is not a requirement.

Responsibilities:

  • Develop automated prompting techniques for our models (e.g., extensions to the Metaprompter)
  • Finetune new capabilities into Claude that maximize Claude’s performance or ease of use given particular prompting innovations
  • Lead automated evaluation of Claude models and prompts across the training and product lifecycle
  • Help create and optimize data mixes for model training
  • Develop and systematically test new, creative, and original prompting strategies for a wide range of research tasks relevant to our fine-tuning and end-product efforts
  • Help create and maintain the infrastructure required for efficient prompt iteration and testing
  • Develop future Anthropic products built on top of Claude
  • Stay up to date with the latest research in prompting and model orchestration, and share knowledge with the team
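
To give a flavor of the "automated prompting techniques" work, here is a minimal, hypothetical sketch of a prompt self-improvement loop. This is not Anthropic's actual Metaprompter: `call_model` and `rewrite_prompt` are stand-ins for real LLM calls, stubbed with trivial deterministic logic purely so the control flow is visible.

```python
# Hypothetical sketch of an automatic prompt-improvement loop.
# `call_model` and `rewrite_prompt` stand in for real LLM calls.

from typing import List, Tuple


def call_model(prompt: str, task_input: str) -> str:
    """Stub for a model call: a trivial deterministic transform."""
    if "UPPERCASE" in prompt:
        return task_input.upper()
    return task_input


def rewrite_prompt(prompt: str) -> List[str]:
    """Stub for a metaprompt step that proposes candidate rewrites."""
    return [prompt, prompt + " Respond in UPPERCASE."]


def score(output: str, expected: str) -> float:
    return 1.0 if output == expected else 0.0


def improve_prompt(
    seed: str, test_set: List[Tuple[str, str]], rounds: int = 3
) -> Tuple[str, float]:
    """Greedy hill-climb: keep whichever candidate prompt scores best on average."""
    best, best_score = seed, -1.0
    for _ in range(rounds):
        for candidate in rewrite_prompt(best):
            avg = sum(score(call_model(candidate, x), y) for x, y in test_set) / len(test_set)
            if avg > best_score:
                best, best_score = candidate, avg
    return best, best_score


if __name__ == "__main__":
    tests = [("hello", "HELLO"), ("claude", "CLAUDE")]
    prompt, acc = improve_prompt("Echo the input.", tests)
    print(prompt, acc)
```

A real system would replace the stubs with model calls and a much richer rewrite proposal step; the point of the sketch is only the evaluate-rewrite-select loop.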

You may be a good fit if you:

  • Have significant ML research or software engineering experience
  • Have at least a high-level familiarity with the architecture and operation of large language models
  • Have extensive prior experience exploring and testing language model behavior
  • Have spent time prompting and/or building products with language models
  • Have good communication skills and an interest in working with other researchers on difficult prompting tasks
  • Have a passion for making powerful technology safe and societally beneficial
  • Stay up to date and informed by taking an active interest in emerging research and industry trends
  • Enjoy pair programming (we love to pair!)

Strong candidates may also have:

  • An advanced degree in computer science, mathematics, statistics, physics, or a related technical field, or an advanced degree in a relevant non-technical field alongside evidence of programming experience
  • Experience with large-scale model training and evaluation
  • Experience with language modeling with transformers
  • Experience with reinforcement learning
  • Experience with large-scale ETL

Representative projects:

  • Building the prompting and model orchestration for a production application backed by a language model
  • Finetuning Claude to maximize its performance when a particular prompting technique is used
  • Building and testing an automatic prompt optimizer or automatic LLM-driven evaluation system for judging a prompt’s performance on a task
  • Implementing a novel retrieval, tool use, sub-agent, or memory architecture for language models
  • Building a scaled model evaluation framework driven by model-based evaluation techniques
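
As a hedged illustration of the model-based ("LLM-as-judge") evaluation pattern mentioned above: a judge model grades each output, and scores are averaged per candidate prompt. Everything here is hypothetical — in practice `judge` would itself prompt a grader model with a rubric, but it is stubbed with a keyword check so the harness structure stands alone.

```python
# Sketch of a model-based ("LLM-as-judge") evaluation harness.
# In a real system, `judge` would prompt a grader model with a rubric;
# here it is stubbed with a keyword check for illustration.

from statistics import mean
from typing import Callable, Dict, List


def judge(task: str, output: str) -> float:
    """Stub grader: score 1.0 if the output mentions the task keyword."""
    return 1.0 if task.lower() in output.lower() else 0.0


def evaluate_prompts(
    prompts: Dict[str, Callable[[str], str]], tasks: List[str]
) -> Dict[str, float]:
    """Run each candidate prompt-policy over all tasks and average the judge scores."""
    return {name: mean(judge(t, policy(t)) for t in tasks) for name, policy in prompts.items()}


if __name__ == "__main__":
    tasks = ["summarize", "translate"]
    prompts = {
        "terse": lambda t: "done",
        "verbose": lambda t: f"Here is my attempt to {t} the text.",
    }
    print(evaluate_prompts(prompts, tasks))
    # "verbose" averages 1.0; "terse" averages 0.0
```

Scaling this up — parallel model calls, rubric-driven judges, confidence checks on the judge itself — is essentially the "scaled model evaluation framework" project described above.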

Deadline to apply: None. Applications will be reviewed on a rolling basis. 
