About the role:
Anthropic’s AI technology is among the most capable and safe in the world. However, large language models are a new type of intelligence, and the art of instructing and evaluating them in a way that delivers the best results is still in its infancy; it is a hybrid of research, engineering, and behavioral science. We’re bringing rigor to this new discipline by applying a range of techniques to systematically discover and document prompting best practices, using our models to improve training and evaluation, developing prompt self-improvement techniques that automatically optimize the model’s performance on any given task, and finding ways to make it easy for our customers to do the same.
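To give a concrete flavor of what “prompt self-improvement” can look like, here is a minimal sketch of a greedy prompt-optimization loop built on the Anthropic Python SDK. It is an illustration only, not the Metaprompter or any actual Anthropic tooling; the model name, the string-match grader, and the helper functions are all placeholder assumptions:

```python
# Minimal, illustrative prompt self-improvement loop. Not the Metaprompter
# or any internal Anthropic tooling; the model name and the string-match
# grader below are placeholder assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-3-5-sonnet-latest"  # placeholder model name

def run_prompt(system_prompt: str, task_input: str) -> str:
    """Run the candidate system prompt on one task input."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": task_input}],
    )
    return response.content[0].text

def score(output: str, reference: str) -> float:
    """Toy grader: substring match. A real system might use a model judge."""
    return float(reference.lower() in output.lower())

def improve(system_prompt: str, failures: list[str]) -> str:
    """Ask the model to rewrite the prompt, given inputs it failed on."""
    rewrite = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "Rewrite the system prompt below so that it handles the "
                "failing inputs correctly. Return only the new prompt.\n\n"
                f"PROMPT:\n{system_prompt}\n\nFAILING INPUTS:\n"
                + "\n".join(failures)
            ),
        }],
    )
    return rewrite.content[0].text

def optimize(prompt: str, dataset: list[tuple[str, str]], rounds: int = 3) -> str:
    """Greedy hill-climb: keep a rewritten prompt only if it scores higher."""
    best = prompt
    best_score = sum(score(run_prompt(best, x), y) for x, y in dataset)
    for _ in range(rounds):
        failures = [x for x, y in dataset if score(run_prompt(best, x), y) == 0.0]
        if not failures:
            break
        candidate = improve(best, failures)
        cand_score = sum(score(run_prompt(candidate, x), y) for x, y in dataset)
        if cand_score > best_score:
            best, best_score = candidate, cand_score
    return best
```

A production version of this loop might generate many candidate rewrites per round, hold out evaluation data to avoid overfitting the prompt, and replace the substring grader with a model-based judge.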
Given that this is a nascent field, we ask that you include in your application a specific project you’re proud of: a prompting architecture, a model evaluation, a synthetic data generation pipeline, a model finetune, or an application built on LLMs. Ideally this project should show off a complex and clever prompting architecture, a systematic evaluation of an LLM’s behavior in response to different prompts, or an example of using LLMs for a relevant ML task such as careful dataset curation and processing. There is no preferred task; we just want to see how you create and experiment with prompts. You can also include a short description of your process or any roadblocks you hit and how you dealt with them, but this is not a requirement.
Responsibilities:
- Develop automated prompting techniques for our models (e.g., extensions to the Metaprompter)
- Finetune new capabilities into Claude that maximize its performance or ease of use given particular prompting innovations
- Lead automated evaluation of Claude models and prompts across the training and product lifecycle
- Help create and optimize data mixes for model training
- Develop and systematically test new and creative prompting strategies for a wide range of research tasks relevant to our finetuning and product efforts
- Help create and maintain the infrastructure required for efficient prompt iteration and testing
- Develop future Anthropic products built on top of Claude
- Stay up-to-date with the latest research in prompting and model orchestration, and share knowledge with the team
You may be a good fit if you:
- Have significant ML research or software engineering experience
- Have at least a high-level familiarity with the architecture and operation of large language models
- Have extensive prior experience exploring and testing language model behavior
- Have spent time prompting and/or building products with language models
- Have good communication skills and an interest in working with other researchers on difficult prompting tasks
- Have a passion for making powerful technology safe and societally beneficial
- Stay informed by taking an active interest in emerging research and industry trends
- Enjoy pair programming (we love to pair!)
- Have an advanced degree in computer science, mathematics, statistics, physics, or a related technical field, or an advanced degree in a relevant non-technical field alongside evidence of programming experience
Strong candidates may also have experience with:
- Large-scale model training and evaluation
- Language modeling with transformers
- Reinforcement learning
- Large-scale ETL
Representative projects:
- Building the prompting and model orchestration for a production application backed by a language model
- Finetuning Claude to maximize its performance when a particular prompting technique is used
- Building and testing an automatic prompt optimizer or an automatic LLM-driven evaluation system for judging a prompt’s performance on a task
- Implementing a novel retrieval, tool use, sub-agent, or memory architecture for language models
- Building a scaled model evaluation framework driven by model-based evaluation techniques (see the sketch after this list)
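As an illustration of the last two projects, here is a minimal sketch of a model-graded (“LLM-as-judge”) evaluation harness built on the Anthropic Python SDK. This is not Anthropic’s internal evaluation framework; the model name, rubric, and toy test cases are placeholder assumptions:

```python
# Minimal, illustrative LLM-as-judge harness for scoring a prompt on a task.
# Not Anthropic's internal evaluation framework; the model name, rubric,
# and test cases below are placeholder assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-3-5-sonnet-latest"  # placeholder model name

JUDGE_RUBRIC = (
    "You are grading a model answer. Reply with exactly PASS if the answer "
    "is correct, helpful, and faithful to the question, and FAIL otherwise.\n\n"
    "Question: {question}\nAnswer: {answer}"
)

def complete(system_prompt: str, question: str) -> str:
    """Answer one test question using the prompt under evaluation."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=512,
        system=system_prompt,
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text

def judge(question: str, answer: str) -> bool:
    """Model-based grading: a second model call decides pass or fail."""
    verdict = client.messages.create(
        model=MODEL,
        max_tokens=5,
        messages=[{
            "role": "user",
            "content": JUDGE_RUBRIC.format(question=question, answer=answer),
        }],
    )
    return verdict.content[0].text.strip().upper().startswith("PASS")

def evaluate(system_prompt: str, questions: list[str]) -> float:
    """Return the judged pass rate of the prompt over the test questions."""
    passes = sum(judge(q, complete(system_prompt, q)) for q in questions)
    return passes / len(questions)

if __name__ == "__main__":
    cases = ["What is 17 * 23?", "What is the capital of Australia?"]  # toy cases
    print(evaluate("You are a careful assistant. Think step by step.", cases))
```

At scale, a harness like this might use separate prompter and judge models, structured per-task rubrics, and repeated sampling to control for judge variance.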
Deadline to apply: None. Applications will be reviewed on a rolling basis.