## The Google Approach to Service Management Software engineers perform sysadmin functions What is SRE? It’s what happens when you ask a software engineer to design an operations team. Composition: - 50-60% Google software engineers - 40-50% candidates close to Google software engineer qualifications with strong sysadmin skills (i.e. UNIX system internals and networking) Want a team of people that: 1. Get bored of doing things by hand 2. Can write software to automate things by hand Focus on constant engineering: - without it, operations load increases, more staff needed to keep up - ops group will grow linearly with service size, then with traffic Google places 50% cap on ops work for all SREs (tickets, manual tasks, etc.) to ensure time is spent writing software keeping the service stable and operable Cap is an upper bound, ops work should become self-sustainable and SREs should almost entirely work on development tasks Google has found SRE approach works well for running large-scale systems. Goal is to make Google’s systems run themselves, so teams invite: - rapid innovation - large acceptance of change Number of SREs scales sublinearly with system size Easy transfers between dev and ops teams due to shared skillsets Difficulties: limited talent pool, SRE and development candidates overlap [[DevOps or SRE?]]