Senior Site Reliability Engineer

David AI

Software Engineering

San Francisco, CA, USA

USD 160k-220k / year + Equity

Posted on May 30, 2026

Location

San Francisco

Employment Type

Full time

Location Type

On-site

Department

Engineering

Compensation

$160K – $220K, offers equity

About David AI

David AI is the first audio data research company. We bring an R&D approach to data, developing datasets with the same rigor AI labs bring to models. Our mission is to bring AI into the real world, and we believe audio is the gateway. Speech is versatile, accessible, and human; it fits naturally into everyday life. As audio AI advances and new use cases emerge, high-quality training data is the bottleneck. This is where David AI comes in.

David AI was founded in 2024 by a team of former Scale AI engineers and operators. In less than a year, we’ve brought on most FAANG companies and AI labs as customers. We recently raised a $50M Series B from Meritech, NVIDIA, Jack Altman (Alt Capital), Amplify Partners, First Round Capital, and other Tier 1 investors.

Our team is sharp, humble, ambitious, and tight-knit. We’re looking for the best research, engineering, product, and operations minds to join us on our mission to push the frontier of audio AI.

About our Engineering team

At David AI, our engineers build the pipelines, platforms, and models that transform raw audio into high-signal data for leading AI labs and enterprises. We're a tight-knit team of product engineers, infrastructure specialists, and machine learning experts focused on building the world’s first audio data research company.

We move fast, own our work end-to-end, and ship to production daily. Our team designs real-time pipelines handling terabytes of speech data and deploys cutting-edge generative audio models.

About this role

As a Senior Site Reliability Engineer at David AI, you will shape and build the foundation for reliability, observability, and scalability across our infrastructure. Working closely with our engineering and product teams, you’ll help ensure our systems are resilient, efficient, and designed to scale as the company grows.

In this role, you will

  • Own David AI’s observability stack, including monitoring, alerting, logging, and tracing, to provide engineers with clear visibility into system health, reliability, and performance.

  • Partner closely with product and platform engineering teams to design systems that are scalable, resilient, and reliable from day one, not as an afterthought.

  • Design and implement secure, scalable cloud infrastructure on AWS using Terraform and modern DevOps tooling to support rapid product and research iteration.

  • Lead improvements across deployment pipelines, CI/CD systems, and incident response processes to reduce downtime, improve operational efficiency, and strengthen engineering velocity.

  • Define and evolve the foundation of SRE practices at David AI, influencing reliability culture, tooling standards, operational excellence, and best practices across the engineering organization.

Your background looks like

  • 5+ years of experience in Site Reliability, Infrastructure, or Platform Engineering supporting large-scale SaaS or cloud systems.

  • Hands-on experience applying security best practices in production systems and cloud infrastructure.

  • Strong experience building and running reliable, highly available, and scalable systems.

  • Hands-on experience with AWS, Terraform, container orchestration (such as Kubernetes), and cloud networking basics.

  • Experience implementing and maintaining observability tooling across monitoring, logging, alerting, and tracing (e.g., Prometheus, Grafana, Datadog, or similar).

  • Comfortable working in fast-paced teams and collaborating closely with product, ML, and engineering teams.

  • Bachelor’s degree in Computer Science or related field, or equivalent practical experience.

Bonus points if you have

  • Past experience in an early-stage startup environment, especially defining SRE culture and tooling from scratch.

  • Familiarity with incident management automation or self-healing infrastructure patterns.

Some technologies we work with

Next.js, TypeScript, TailwindCSS, Node.js, tRPC, PostgreSQL, AWS, Temporal, WebRTC, FFmpeg.

Benefits

  • Unlimited PTO.

  • Top-notch health, dental, and vision coverage with 100% coverage for most plans.

  • FSA & HSA access.

  • 401k access.

  • Meals 2x daily through DoorDash + snacks and beverages available at the office.

  • Unlimited company-sponsored Barry’s classes.
