Windows Updates Master Runbook v2 (Consolidated)

Document Control

Source package: screenshots + workbook (CriticalAppsSQLCluster.xlsx tracker)
Revision note: reviewed 07/02/2026 to add the new GIS servers and procedures and to reflect decommissioned infrastructure.
Scope: Windows Updates lifecycle for SQL, SCOM, Always On and failover cluster servers, including batch operations and SCCM Software Center procedures.

1) General Process Information

Objective

Standardize Windows update execution across standalone servers, failover clusters, and SQL Always On environments with controlled failover, restart, and validation.

Core Principles

Track progress in the approved tracker (e.g., CriticalAppsSQLCluster.xlsx).
Start with the systems that have the longest synchronization delay (SCOM/SQL primary paths).
Validate before and after each role move/failover/restart.
Execute only in approved change windows.

High-Level Flow

Check update status and readiness.
Move servers to the correct batch (if needed).
Apply updates (interactive or scheduled).
Perform failover/failback procedures (where applicable).
Validate synchronization, services, and evidence.

2) Check Update Status

Script Method

On the designated admin host (example: dfs-asufs-51), use:

C:\temp\CheckUpdateStatus1.ps1

Steps

Open PowerShell as Administrator.
Run the script against the current batch list.
To check a different list, edit the server list variable (line 2 of the script).
Review results for:
- pending updates
- pending reboot
- install state
Save outputs (updates.csv + evidence screenshots).
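The review step can be sketched in Python. This assumes the script's results are exported to updates.csv with columns named Server, PendingUpdates and PendingReboot. Those column names are hypothetical and should be matched to whatever CheckUpdateStatus1.ps1 actually writes. Before the change, pending updates are expected, so only a pending reboot is flagged. After the change, any update that is still pending is flagged too.

```python
import csv
import io

def triage(csv_text, after_change=False):
    """Return {server: [issues]}; an empty list means the server is clear.

    Column names (Server, PendingUpdates, PendingReboot) are assumptions
    about the updates.csv export, not confirmed script output.
    Before the change only a pending reboot blocks; after the change,
    updates that are still pending are flagged as well.
    """
    issues = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        found = []
        if row["PendingReboot"].strip().lower() == "true":
            found.append("pending reboot")
        if after_change and int(row["PendingUpdates"]) > 0:
            found.append("updates still pending")
        issues[row["Server"]] = found
    return issues

sample = (
    "Server,PendingUpdates,PendingReboot,InstallState\n"
    "SRV-A,3,False,Available\n"   # hypothetical rows
    "SRV-B,0,True,Installed\n"
)
print(triage(sample))  # {'SRV-A': [], 'SRV-B': ['pending reboot']}
```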

Notes

Run the script before and after the change.
Treat a pending reboot as a blocker unless the reboot is explicitly planned.

3) Calendar / Change Window

Before Change

Confirm approved window and stakeholder communications.
Confirm business/service owner approval.
Confirm dependency windows (jobs, backups, integrations).

During Change

Timestamp each major action.
Validate after each failover/move.
Escalate promptly if synchronization takes longer than the expected threshold.
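A small Python sketch of the "timestamp each action, escalate on slow sync" discipline follows. The two-hour default threshold is an assumption taken from the SCOM note in section 6. The class and method names are illustrative only.

```python
from datetime import datetime, timedelta

class ChangeLog:
    """Timestamped record of major actions during a change window."""

    def __init__(self, sync_threshold=timedelta(hours=2)):
        # Assumed default: the ~2 hour SCOM old-primary sync figure from section 6.
        self.sync_threshold = sync_threshold
        self.entries = []

    def record(self, action, when=None):
        """Append (timestamp, action); defaults to the current local time."""
        self.entries.append((when if when is not None else datetime.now(), action))

    def needs_escalation(self, sync_started, now=None):
        """True once synchronization has run longer than the expected threshold."""
        now = now if now is not None else datetime.now()
        return now - sync_started > self.sync_threshold

log = ChangeLog()
start = datetime(2026, 3, 14, 22, 0)
log.record("Fail over AG to secondary", when=start)
print(log.needs_escalation(start, now=start + timedelta(hours=3)))  # True
```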

After Change

Record completion, validation evidence, and deviations.

4) Move Batches

Purpose

Use this procedure when servers are in the wrong Software Center batch or are missing updates.

Procedure

Send a request to the operations team (example: SGITTHybridCloudOperations).
Include the server list, current batch, target batch, reason and evidence.
Copy the affected support owners.
Link the request/ticket ID in the change tracker.
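The Python sketch below checks that a batch-move request contains every field the procedure lists before it is sent. The field names and the plain-text body format are assumptions. The actual channel to the operations team may be email or a ticket form.

```python
REQUIRED_FIELDS = ("servers", "current_batch", "target_batch", "reason", "evidence")

def build_batch_move_request(**fields):
    """Check the fields the Move Batches procedure asks for and render a request body."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return "\n".join([
        "Request: move servers to a different Software Center batch",
        "Servers: " + ", ".join(fields["servers"]),
        f"Current batch: {fields['current_batch']}",
        f"Target batch: {fields['target_batch']}",
        f"Reason: {fields['reason']}",
        f"Evidence: {fields['evidence']}",
    ])

body = build_batch_move_request(
    servers=["SRV-A", "SRV-B"],            # hypothetical server names
    current_batch="5A", target_batch="5B",
    reason="secondary nodes listed in primary wave",
    evidence="updates.csv attached",
)
print(body)
```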

Batch Mapping Example (Reference)

Batch 5A – Secondary nodes: C75403
Batch 5A – Primary nodes: C75404
Batch 5B – Secondary nodes: C75406
Batch 5B – Primary nodes: C75407

Use this pattern to separate primary/secondary update waves.
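The mapping can be held as a lookup table keyed by batch and node role, as in this Python sketch. The IDs are the reference values listed above. Returning secondaries before primaries is an assumption based on the order of that list, not a stated rule.

```python
# Collection IDs copied from the batch mapping example (reference values only).
BATCH_COLLECTIONS = {
    ("5A", "secondary"): "C75403",
    ("5A", "primary"): "C75404",
    ("5B", "secondary"): "C75406",
    ("5B", "primary"): "C75407",
}

def collection_for(batch, role):
    """Look up the collection ID for a batch and node role."""
    return BATCH_COLLECTIONS[(batch.upper(), role.lower())]

def update_waves(batch):
    """Split a batch into two waves; secondaries first is an assumed ordering."""
    return [collection_for(batch, "secondary"), collection_for(batch, "primary")]

print(update_waves("5b"))  # ['C75406', 'C75407']
```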

5) Microsoft System Center (Software Center)

Option 1 — Interactive Install

Launch Software Center.
In Available Software, select only entries with Type = Update.
Click Install Selected.
Monitor status: Waiting to install → Installing → Finished/Requires restart.
If prompted, click Restart (or reboot per policy).
After reboot, verify host/services health.

Option 2 — Scheduled Install

Select all required updates (Type = Update).
Click Schedule Selected.
In scheduling prompt, set:
- Install outside business hours
- Restart automatically after installation if needed
Click Change software installation settings.
In Options → Work Information:
- define business hours correctly,
- set valid workdays,
- confirm non-working install window.
Click Apply, then OK.
Confirm updates show as Scheduled to install after ....
Validate completion after schedule window.
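The next sketch helps confirm that the business hours entered under Work Information produce the intended install window. It checks whether a given moment falls outside those hours. The 08:00-17:00, Monday-Friday defaults are placeholders. The function is a sanity check of the intended configuration, not a model of how Software Center evaluates its settings internally.

```python
from datetime import datetime, time

def outside_business_hours(moment, start=time(8, 0), end=time(17, 0),
                           workdays=frozenset(range(5))):
    """True if `moment` is outside the configured business hours.

    Defaults (08:00-17:00, Monday-Friday, weekday() numbering with Monday=0)
    are placeholders; use the values actually configured in Software Center.
    """
    if moment.weekday() not in workdays:
        return True
    return not (start <= moment.time() < end)

print(outside_business_hours(datetime(2026, 3, 13, 22, 0)))  # Friday 22:00 -> True
```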

Notes

Scheduled mode is preferred for production.
Incorrect business-hours settings can cause updates to install outside the approved window.

6) SCOM Servers During Change

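Why SCOM first

After failover, the old primary database can take a long time to resynchronize (up to ~2 hours). Starting with SCOM lets that sync run while the remaining servers are updated.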

Steps

Verify that updates installed correctly.
Verify that Always On synchronization is healthy.
Perform the failover.
Once the new primary is synchronized, restart the old primary (there is no need to wait for the old primary's databases to fully recover first).
Continue with the remaining servers, periodically checking the old primary's sync progress.
Fail back (if needed) once all databases are synchronized.
Validate synchronization on the final primary.
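"Periodically check old primary sync progress" amounts to a polling loop with a timeout, sketched here in Python. `is_synchronized` is a caller-supplied, hypothetical probe. One option would be a check of `synchronization_state_desc` in `sys.dm_hadr_database_replica_states`. The two-hour timeout and five-minute interval are assumptions.

```python
import time
from datetime import timedelta

def wait_for_sync(is_synchronized, timeout=timedelta(hours=2), interval_s=300,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll is_synchronized() until it returns True or the timeout elapses.

    Returns True on sync, False on timeout (the escalation point in section 3).
    sleep/clock are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout.total_seconds()
    while True:
        if is_synchronized():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable. In a real run, the defaults wait in wall-clock time.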

Notes

Operations Manager SQL: fail over to the counterpart in the same site.
Data Warehouse SQL: there is only one node per site, so fail over between the corresponding nodes.
Cosmos config note:
- INST_01: DFS-COSSQL-51A, DFS-COSSQL-51C
- INST_02: DFS-COSSQL-51B, DFS-COSSQL-51G
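The Cosmos note can be expressed as a partner lookup, as in this Python sketch. It assumes each instance is exactly the two-node pair listed. The function name is illustrative.

```python
COSMOS_PAIRS = {
    "INST_01": ("DFS-COSSQL-51A", "DFS-COSSQL-51C"),
    "INST_02": ("DFS-COSSQL-51B", "DFS-COSSQL-51G"),
}

def failover_partner(instance, node):
    """Return the other node of an instance's pair, or raise if node is not a member."""
    first, second = COSMOS_PAIRS[instance]
    node = node.upper()
    if node == first:
        return second
    if node == second:
        return first
    raise ValueError(f"{node} is not a node of {instance}")

print(failover_partner("INST_02", "dfs-cossql-51b"))  # DFS-COSSQL-51G
```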

7) Always On (Example: VAVM1327 / VAVM3773)

These servers may require a manual process because they are not fully covered by the update job in the GLOBAL domain.

Procedure

Open Failover Cluster Manager.
- Check roles, topology and shared-disk expectations.
Open SSMS.
- Ensure no critical jobs running.
- Confirm the behavior of AO-related jobs.
In Always On High Availability:
- Availability Groups → Dashboard
- verify replicas green
- run the failover wizard to the synchronous counterpart
Refresh dashboard and verify synchronization.
If dashboard shows reverting/transient state:
- do not force restart immediately
- validate role state and primary ownership first
Restart server only after role/sync validation.
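The restart gate above can be sketched in Python. The state names are the real `synchronization_state_desc` values that SQL Server reports in `sys.dm_hadr_database_replica_states`. The function, the return strings and the node and database names in any usage are hypothetical.

```python
# synchronization_state_desc values as reported by
# sys.dm_hadr_database_replica_states (and shown on the AG dashboard).
TRANSIENT_STATES = {"REVERTING", "INITIALIZING"}

def restart_decision(current_primary, expected_primary, db_states):
    """Gate the restart of the old primary after an AG failover.

    db_states maps database name -> synchronization_state_desc.
    """
    if current_primary != expected_primary:
        return "stop: primary ownership is not where expected"
    states = set(db_states.values())
    if states & TRANSIENT_STATES:
        return "wait: transient state, re-check role and sync before restarting"
    if states != {"SYNCHRONIZED"}:
        return "wait: not all databases are SYNCHRONIZED"
    return "restart allowed"
```

A transient state such as REVERTING yields "wait", never "restart". This follows the instruction not to force a restart immediately.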

Failback to Original Primary

Log on to the original primary.
Check for pending updates (allow time for the server to settle after restart).
In SSMS (Availability Groups), connect to the current primary and verify that no jobs are running.
Run the failback wizard to the original primary.
Validate synchronization and database online status.
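The pre-failback checks can be collected into a single list of blockers, as in this Python sketch. The inputs are assumed to be gathered by hand or by other tooling. A planned failover in synchronous-commit mode does require the target replica to be SYNCHRONIZED, so that state is treated as mandatory here.

```python
def failback_blockers(pending_updates, running_jobs, original_primary_state):
    """List reasons not to start the failback wizard yet; empty means go.

    pending_updates: count still pending on the original primary.
    running_jobs: job names currently running on the current primary.
    original_primary_state: its synchronization_state_desc.
    """
    blockers = []
    if pending_updates > 0:
        blockers.append(f"{pending_updates} update(s) still pending on original primary")
    if running_jobs:
        blockers.append("jobs running: " + ", ".join(sorted(running_jobs)))
    if original_primary_state != "SYNCHRONIZED":
        blockers.append(f"original primary is {original_primary_state}")
    return blockers

print(failback_blockers(0, [], "SYNCHRONIZED"))  # []
```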

8) Failover Clusters (Shared Disks)

Example nodes

DFS-GALASQL-03A
DFS-CPCFSQL-51A

Procedure

Confirm updates are installed on all relevant nodes (a pending-restart state is acceptable if expected).
In Failover Cluster Manager, identify active node.
Verify disk mount ownership on active node.
In SSMS, confirm that no jobs are running and none are scheduled to start during the move.
Move SQL roles to target node:
- Roles → Move → Select Node
Verify role transition and disk remount on target node.
Confirm SQL service and DB availability.
Restart as required from Software Center, then re-validate.
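The post-move verification can be sketched in Python. It assumes the role owner, disk owners and SQL service state have already been read from Failover Cluster Manager or another tool. Node names in the usage are hypothetical, and "Cluster Disk N" only follows the default disk naming.

```python
def verify_role_move(target_node, role_owner, disk_owners, sql_service_running):
    """Return problems found after Roles > Move > Select Node; empty list means OK.

    disk_owners maps cluster disk name -> node that currently owns it.
    """
    problems = []
    if role_owner != target_node:
        problems.append(f"role owned by {role_owner}, expected {target_node}")
    for disk, owner in sorted(disk_owners.items()):
        if owner != target_node:
            problems.append(f"{disk} still owned by {owner}")
    if not sql_service_running:
        problems.append("SQL Server service not running on target node")
    return problems

print(verify_role_move("NODE-2", "NODE-2", {"Cluster Disk 1": "NODE-2"}, True))  # []
```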

9) Post-Change Validation Checklist

[ ] Updates installed successfully
[ ] Pending reboot cleared
[ ] Key services healthy
[ ] SQL jobs normal
[ ] AO/cluster synchronization green
[ ] Databases online and available
[ ] Evidence attached in ticket/tracker
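The Python sketch below renders the checklist in the same [ ]/[x] style and lists the open items, which is useful when pasting evidence into the tracker. The item texts are copied from the checklist. The function name is illustrative.

```python
CHECKLIST = [
    "Updates installed successfully",
    "Pending reboot cleared",
    "Key services healthy",
    "SQL jobs normal",
    "AO/cluster synchronization green",
    "Databases online and available",
    "Evidence attached in ticket/tracker",
]

def render_checklist(done):
    """Render the checklist in [ ]/[x] style and return (text, open_items)."""
    lines = [f"[{'x' if item in done else ' '}] {item}" for item in CHECKLIST]
    open_items = [item for item in CHECKLIST if item not in done]
    return "\n".join(lines), open_items

text, open_items = render_checklist({"Updates installed successfully"})
print(text)
```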

10) Troubleshooting Notes

Long sync after first failover can be normal on high-transaction DBs.
If role status is inconsistent, verify the current primary and running jobs before restarting.
If updates are missing, validate the batch assignment and trigger a policy refresh.
Re-run update status script after each significant step.

11) Suggested AI Wiki Structure

operations/windows-updates/windows-updates-master-runbook-v2.md (this file)
operations/windows-updates/check-update-status.md
operations/windows-updates/move-batches.md
operations/windows-updates/software-center-scheduled-install.md
operations/windows-updates/alwayson-failover-quick-checklist.md

12) Scheduling Split + CM Preparation (Friday/Saturday)

Note

Batches are split into:
- Friday (inside business hours)
- Saturday (outside business hours)

Steps to follow

Create a CM for the servers highlighted for Thursday/Friday before the automatic restart.
Create a second CM for Saturday for the remaining servers.
GIS and OCHA application servers require a special graceful shutdown procedure before restart.
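The split into two CMs can be sketched in Python. It assumes the tracker can be reduced to (name, highlighted, app) rows. The `highlighted` flag and the app labels are hypothetical representations of the tracker's highlighting. GIS and OCHA servers are also flagged for the graceful shutdown procedures.

```python
GRACEFUL_SHUTDOWN_APPS = {"GIS", "OCHA"}

def plan_change_records(servers):
    """Group servers into the two CMs and flag those needing a graceful shutdown.

    servers: iterable of (name, highlighted, app) where highlighted=True marks
    the Thursday/Friday servers in the tracker (a hypothetical field).
    """
    plan = {"cm_thu_fri": [], "cm_saturday": [], "graceful_shutdown": []}
    for name, highlighted, app in servers:
        plan["cm_thu_fri" if highlighted else "cm_saturday"].append(name)
        if app in GRACEFUL_SHUTDOWN_APPS:
            plan["graceful_shutdown"].append(name)
    return plan
```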

Referenced procedure docs

GIS ArcGIS VMs_V5
OCHA OneGMS v2

13) Automatic Launch via SQL Jobs on ARS

CM timing note

Create the corresponding CM on the Monday of the week in which Windows Updates will be applied, so that tickets do not stay open longer than necessary.
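The CM creation date can be computed from the update date, as in this Python sketch. It assumes weeks run Monday to Sunday, so a Saturday update maps back to the Monday five days earlier.

```python
from datetime import date, timedelta

def cm_creation_date(update_date):
    """Monday of the Monday-to-Sunday week that contains update_date."""
    return update_date - timedelta(days=update_date.weekday())

print(cm_creation_date(date(2026, 3, 14)))  # 2026-03-09
```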

Execution model

Windows Updates installation is triggered automatically through SQL Jobs on ARS.

Job hosts by domain

DPKO domain host: DFS-ARSPROLIST
- ASU_DBA - Install Windows Updates Batch 5A
- ASU_DBA - Install Windows Updates Batch 5B
GLOBAL domain host: ARS-M-LIS-001.global.un.org
- ASU_DBA - Install Windows Updates Batch 5A GLOBAL
- ASU_DBA - Install Windows Updates Batch 5B GLOBAL
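The host and job naming above follows a regular pattern, sketched here as a Python lookup. The pattern is inferred from the four listed job names. It should be rechecked if new batches are added.

```python
JOB_HOSTS = {
    "DPKO": "DFS-ARSPROLIST",
    "GLOBAL": "ARS-M-LIS-001.global.un.org",
}

def wu_job(domain, batch):
    """Return (host, job name) for a domain and batch, following the listed naming."""
    domain = domain.upper()
    name = f"ASU_DBA - Install Windows Updates Batch {batch.upper()}"
    if domain == "GLOBAL":
        name += " GLOBAL"
    return JOB_HOSTS[domain], name
```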

File location (both servers)

E:\Projects\_wu

Referenced files

Install_Windows_Updates_Batch_5A.ps1
Install_Windows_Updates_Batch_5B.ps1
updates5A.txt
updates5B.txt
Windows Update PowerShell script.txt
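Per-batch file paths can be derived from the shared folder, as in this Python sketch. It assumes, from the file names alone, that each batch pairs one install script with one update-list file. `PureWindowsPath` builds Windows-style paths without touching the filesystem.

```python
from pathlib import PureWindowsPath

WU_DIR = PureWindowsPath(r"E:\Projects\_wu")

def batch_files(batch):
    """Return (install script, update list) for a batch; pairing inferred from file names."""
    b = batch.upper()
    return (WU_DIR / f"Install_Windows_Updates_Batch_{b}.ps1",
            WU_DIR / f"updates{b}.txt")

script, updates = batch_files("5a")
print(updates)  # E:\Projects\_wu\updates5A.txt
```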