Incident Management (SRE class implements DevOps)

In the previous video, Liz and Seth discussed how to make systems observable and how observability helps us diagnose failing systems. However, they didn't cover what to do when an incident is beyond the capabilities of a single person. In this video, you'll learn about the most important part of the incident management process – people.

In the stressful moments of a system failure, it's important to define clear, precise roles for everyone involved in an incident. A team with too few people can quickly become overwhelmed, but one with too many can end up duplicating work (i.e., too many hands on the keyboard). Learn how SREs effectively manage incidents with clearly defined roles and responsibilities such as operations manager, planning manager, communications manager, logistics manager, and more. Seth and Liz also discuss techniques for managing lengthy and exponentially complex incidents.

Contact Liz and Seth:
https://twitter.com/lizthegrey
https://twitter.com/sethvargo

You can watch more episodes of the playlist here → http://bit.ly/2PPL6f0

Subscribe to the Google Cloud Platform channel for more cloud content → http://bit.ly/GCloudPlatform
