Document Type
|
:
|
BL
|
Record Number
|
:
|
851436
|
Title & Author
|
:
|
The site reliability workbook : practical ways to implement SRE / edited by Betsy Beyer [and 4 others].
|
Publication Statement
|
:
|
Sebastopol, CA : O'Reilly Media, 2018.
|
Page. NO
|
:
|
1 online resource
|
ISBN
|
:
|
1492029459
|
|
:
|
1492029475
|
|
:
|
9781492029458
|
|
:
|
9781492029472
|
|
:
|
1492029505
|
|
:
|
9781492029502
|
Bibliographies/Indexes
|
:
|
Includes bibliographical references and index.
|
Contents
|
:
|
How SRE relates to DevOps -- Foundations. Implementing SLOs -- SLO engineering case studies -- Alerting on SLOs -- Eliminating toil -- Simplicity -- Practices. On-call -- Incident response -- Postmortem culture: learning from failure -- Managing load -- Introducing non-abstract large system design -- Data processing pipelines -- Configuration design and best practices -- Configuration specifics -- Canarying releases -- Processes. Identifying and recovering from overload -- SRE engagement model -- SRE: reaching beyond your walls -- SRE team lifecycles -- Organizational change management in SRE -- A. Example SLO document -- B. Example error budget policy -- C. Results of postmortem analysis.
|
|
:
|
Intro; Copyright; Table of Contents; Foreword I; Foreword II; Preface; Conventions Used in This Book; Using Code Examples; O'Reilly Safari; How to Contact Us; Acknowledgments; Chapter 1. How SRE Relates to DevOps; Background on DevOps; No More Silos; Accidents Are Normal; Change Should Be Gradual; Tooling and Culture Are Interrelated; Measurement Is Crucial; Background on SRE; Operations Is a Software Problem; Manage by Service Level Objectives (SLOs); Work to Minimize Toil; Automate This Year's Job Away; Move Fast by Reducing the Cost of Failure; Share Ownership with Developers
|
|
:
|
Use the Same Tooling, Regardless of Function or Job Title; Compare and Contrast; Organizational Context and Fostering Successful Adoption; Narrow, Rigid Incentives Narrow Your Success; It's Better to Fix It Yourself; Don't Blame Someone Else; Consider Reliability Work as a Specialized Role; When Can Substitute for Whether; Strive for Parity of Esteem: Career and Financial; Conclusion; Part I. Foundations; Chapter 2. Implementing SLOs; Why SREs Need SLOs; Getting Started; Reliability Targets and Error Budgets; What to Measure: Using SLIs; A Worked Example
|
|
:
|
Moving from SLI Specification to SLI Implementation; Measuring the SLIs; Using the SLIs to Calculate Starter SLOs; Choosing an Appropriate Time Window; Getting Stakeholder Agreement; Establishing an Error Budget Policy; Documenting the SLO and Error Budget Policy; Dashboards and Reports; Continuous Improvement of SLO Targets; Improving the Quality of Your SLO; Decision Making Using SLOs and Error Budgets; Advanced Topics; Modeling User Journeys; Grading Interaction Importance; Modeling Dependencies; Experimenting with Relaxing Your SLOs; Conclusion; Chapter 3. SLO Engineering Case Studies
|
|
:
|
Evernote's SLO Story; Why Did Evernote Adopt the SRE Model?; Introduction of SLOs: A Journey in Progress; Breaking Down the SLO Wall Between Customer and Cloud Provider; Current State; The Home Depot's SLO Story; The SLO Culture Project; Our First Set of SLOs; Evangelizing SLOs; Automating VALET Data Collection; The Proliferation of SLOs; Applying VALET to Batch Applications; Using VALET in Testing; Future Aspirations; Summary; Conclusion; Chapter 4. Monitoring; Desirable Features of a Monitoring Strategy; Speed; Calculations; Interfaces; Alerts; Sources of Monitoring Data; Examples
|
|
:
|
Managing Your Monitoring System; Treat Your Configuration as Code; Encourage Consistency; Prefer Loose Coupling; Metrics with Purpose; Intended Changes; Dependencies; Saturation; Status of Served Traffic; Implementing Purposeful Metrics; Testing Alerting Logic; Conclusion; Chapter 5. Alerting on SLOs; Alerting Considerations; Ways to Alert on Significant Events; 1: Target Error Rate ≥ SLO Threshold; 2: Increased Alert Window; 3: Incrementing Alert Duration; 4: Alert on Burn Rate; 5: Multiple Burn Rate Alerts; 6: Multiwindow, Multi-Burn-Rate Alerts; Low-Traffic Services and Error Budget Alerting
|
Abstract
|
:
|
An expansion on Google's SRE practices, providing 'worked examples' for each essential facet of this area of IT, prepared in cooperation with Google Cloud customers and based on their experiences. Covers methodology for running services at scale and for starting SRE in a greenfield or brownfield setting.
|
Subject
|
:
|
Computer engineering.
|
Subject
|
:
|
Information technology-- Management.
|
Subject
|
:
|
Reliability (Engineering)
|
Subject
|
:
|
BUSINESS & ECONOMICS-- Industrial Management.
|
Subject
|
:
|
BUSINESS & ECONOMICS-- Management Science.
|
Subject
|
:
|
BUSINESS & ECONOMICS-- Management.
|
Subject
|
:
|
BUSINESS & ECONOMICS-- Organizational Behavior.
|
Dewey Classification
|
:
|
658.4038
|
LC Classification
|
:
|
T58.64
|
Added Entry
|
:
|
Beyer, Betsy.
|
|
:
|
Kawahara, Kent.
|
|
:
|
Murphy, Niall Richard.
|
|
:
|
Rensin, David K.
|
|
:
|
Thorne, Stephen.
|