Join the Teams Building What’s Next

Our portfolio companies are hiring. Explore opportunities to join visionary teams across a range of sectors and growth stages.

AI Safety Red Teaming Lead

ActiveFence

Software Engineering, Data Science
United Kingdom
Posted on Dec 8, 2025

  • Intelligence
  • UK
  • Management
  • Full-time

Description

ActiveFence is seeking an experienced and detail-oriented Red Teaming Lead to oversee complex research and delivery efforts focused on identifying and mitigating risks in Generative AI systems. In this role, you will lead a multidisciplinary team conducting adversarial testing, risk evaluations, and data-driven analyses that strengthen AI model safety and integrity.

You will be responsible for ensuring high-quality project delivery, from methodology design and execution to client communication and final approval of deliverables. This position combines hands-on red teaming expertise with operational leadership, strategic thinking, and client-facing collaboration.

Key Responsibilities

Operational and Quality Leadership

  • Oversee the production of datasets, reports, and analyses related to AI safety and red teaming activities.
  • Review and approve deliverables to ensure they meet quality, methodological, and ethical standards.
  • Deliver final outputs to clients following approval and provide actionable insights that address key risks and vulnerabilities.
  • Offer ongoing structured feedback on the quality of deliverables and the efficiency of team workflows, driving continuous improvement.

Methodology and Research Development

  • Design and refine red teaming methodologies for new Responsible AI projects.
  • Guide the development of adversarial testing strategies that target potential weaknesses in models across text, image, and multimodal systems.
  • Support research initiatives aimed at identifying and mitigating emerging risks in Generative AI applications.

Client Engagement and Collaboration

  • Attend client meetings to address broader methodological or operational questions.
  • Represent the red teaming function in cross-departmental collaboration with other ActiveFence teams.

Requirements

Must Have

  • Proven background in red teaming, AI safety research, or Responsible AI operations.
  • Demonstrated experience managing complex projects or teams in a technical or analytical environment.
  • Strong understanding of adversarial testing methods and model evaluation.
  • Excellent communication skills in English, both written and verbal.
  • Exceptional organizational ability and attention to detail, with experience balancing multiple priorities.
  • Confidence in client-facing environments, including presenting deliverables and addressing high-level questions.

Nice to Have

  • Advanced academic or research background in AI, computational social science, or information integrity.
  • Experience authoring or co-authoring publications, white papers, or reports in the fields of AI Safety, Responsible AI, or AI Ethics.
  • Engagement in professional or academic communities related to Responsible AI, trust and safety, or machine learning security.
  • Participation in industry or academic conferences.
  • Familiarity with developing or reviewing evaluation frameworks, benchmarking tools, or adversarial datasets for model safety testing.
  • Proven ability to mentor researchers and foster professional development within technical teams.
  • A proactive, research-driven mindset and a passion for ensuring safe, transparent, and ethical AI deployment.

About ActiveFence

ActiveFence is the leading provider of security and safety solutions for online experiences, safeguarding more than 3 billion users, top foundation models, and the world’s largest enterprises and tech platforms every day.

As a trusted ally to major technology firms and Fortune 500 brands that build user-generated and GenAI products, ActiveFence empowers security, AI, and policy teams with low-latency Real-Time Guardrails and a continuous Red Teaming program that pressure-tests systems with adversarial prompts and emerging threat techniques. Powered by deep threat intelligence, unmatched harmful-content detection, and coverage of 117+ languages, ActiveFence enables organizations to deliver engaging, trustworthy experiences at global scale while operating safely and responsibly across all threat landscapes.