Windows Updates Master Runbook v2 (Consolidated)
Document Control
- Source package: screenshots + workbook (CriticalAppsSQCluster.xlsx / CriticalAppsSQLCluster tracker context)
- Revision note: version reviewed on 07/02/2026 to include the new GIS servers/procedures and to apply changes following infrastructure decommissioning.
- Scope: Windows Updates lifecycle for SQL/SCOM/AlwaysOn/Failover clusters, including batch operations and SCCM Software Center procedures.
1) General Process Information
Objective
Standardize Windows update execution across standalone servers, failover clusters, and SQL Always On environments with controlled failover, restart, and validation.
Core Principles
- Track progress in the approved tracker (e.g., CriticalAppsSOCluster.xlsx).
- Start with systems where synchronization delay is longest (SCOM/SQL primary paths).
- Validate before and after each role move/failover/restart.
- Execute only in approved change windows.
High-Level Flow
- Check update status and readiness.
- Move servers to the correct batch (if needed).
- Apply updates (interactive or scheduled).
- Perform failover/failback procedures (where applicable).
- Validate synchronization, services, and evidence.
2) Check Update Status
Script Method
On designated admin host (example in source: dfs-asufs-51), use:
C:\temp\CheckUpdateStatus1.ps1
Steps
- Open PowerShell as Administrator.
- Run script against current batch list.
- If checking another list, edit server list variable (line 2 in source note).
- Review results for:
  - pending updates
  - pending reboot
  - install state
- Save outputs (updates.csv + evidence screenshots).
Notes
- Run before and after change.
- Treat pending reboot state as blocker unless explicitly planned.
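The blocker rule in the notes above can be sketched as a small filter over the saved output. This is a minimal illustration only: the column names (`Server`, `PendingUpdates`, `PendingReboot`) and the sample rows are assumptions, since the source does not describe the layout of updates.csv.

```python
import csv
import io


def find_blockers(csv_text, planned_reboots=frozenset()):
    """Return servers with a pending reboot that was not explicitly planned."""
    blockers = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        pending = row["PendingReboot"].strip().lower() == "true"
        if pending and row["Server"] not in planned_reboots:
            blockers.append(row["Server"])
    return blockers


# Hypothetical sample data; real column names may differ.
SAMPLE = """Server,PendingUpdates,PendingReboot
srv-a,0,False
srv-b,2,True
srv-c,0,True
"""

# srv-c has a planned reboot, so only srv-b is reported as a blocker.
print(find_blockers(SAMPLE, planned_reboots={"srv-c"}))  # ['srv-b']
```

Running the same check before and after the change gives two comparable blocker lists for the evidence record.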
3) Calendar / Change Window
Before Change
- Confirm approved window and stakeholder communications.
- Confirm business/service owner approval.
- Confirm dependency windows (jobs, backups, integrations).
During Change
- Timestamp each major action.
- Validate after each failover/move.
- Escalate quickly if sync exceeds expected threshold.
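The escalation rule above depends on timestamps being recorded for each action. A minimal sketch of the comparison follows; the 2-hour default is an assumption borrowed from the SCOM note in section 6, and the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: 2 hours, taken from the SCOM sync note; adjust per system.
EXPECTED_SYNC_THRESHOLD = timedelta(hours=2)


def should_escalate(failover_time, now, threshold=EXPECTED_SYNC_THRESHOLD):
    """Return True when synchronization has run longer than the threshold."""
    return (now - failover_time) > threshold


started = datetime(2026, 2, 7, 20, 0)
print(should_escalate(started, datetime(2026, 2, 7, 21, 30)))  # False
print(should_escalate(started, datetime(2026, 2, 7, 22, 15)))  # True
```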
After Change
- Record completion, validation evidence, and deviations.
4) Move Batches
Purpose
Use this procedure when servers are in the wrong Software Center batch or are missing updates.
Procedure
- Send request to operations team (example: SGITTHybridCloudOperations).
- Include: server list, current batch, target batch, reason, evidence.
- Copy affected support owners.
- Link request/ticket ID in change tracker.
Batch Mapping Example (Reference)
- Batch 5A – Secondary nodes: C75403
- Batch 5A – Primary nodes: C75404
- Batch 5B – Secondary nodes: C75406
- Batch 5B – Primary nodes: C75407
Use this pattern to separate primary/secondary update waves.
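For scripted lookups, the reference mapping can be held as a small table keyed by batch and node role. This is a sketch only: the identifiers come from the example above, while the helper names `batch_id` and `describe` are hypothetical.

```python
# Identifiers copied from the reference mapping; treat them as examples.
BATCH_IDS = {
    ("5A", "secondary"): "C75403",
    ("5A", "primary"): "C75404",
    ("5B", "secondary"): "C75406",
    ("5B", "primary"): "C75407",
}


def batch_id(batch, role):
    """Look up the identifier for a batch/role pair (case-insensitive)."""
    return BATCH_IDS[(batch.upper(), role.lower())]


def describe(identifier):
    """Return a readable label for an identifier, or None if unknown."""
    for (batch, role), value in BATCH_IDS.items():
        if value == identifier:
            return f"Batch {batch} {role} nodes"
    return None


print(batch_id("5a", "Primary"))  # C75404
print(describe("C75406"))         # Batch 5B secondary nodes
```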
5) Microsoft System Center (Software Center)
Option 1 — Interactive Install
- Launch Software Center.
- In Available Software, select only entries with Type = Update.
- Click Install Selected.
- Monitor status: Waiting to install → Installing → Finished / Requires restart.
- If prompted, click Restart (or reboot per policy).
- After reboot, verify host/services health.
Option 2 — Scheduled Install
- Select all required updates (Type = Update).
- Click Schedule Selected.
- In scheduling prompt, set:
  - Install outside business hours
  - Restart automatically after installation if needed
- Click Change software installation settings.
- In Options → Work Information:
  - define business hours correctly,
  - set valid workdays,
  - confirm non-working install window.
- Click Apply, then OK.
- Confirm updates show as Scheduled to install after ....
- Validate completion after schedule window.
Notes
- Scheduled mode preferred for production.
- Wrong business-hours config can cause off-window installs.
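To see why the business-hours setting matters, the check below sketches how an "install outside business hours" decision depends on the configured hours and workdays. The defaults (09:00 to 17:00, Monday to Friday) are assumptions for illustration, not values from the source, and the logic is a simplification rather than Software Center's actual behaviour.

```python
from datetime import datetime, time

# Assumed defaults for illustration: Monday=0 ... Friday=4, 09:00-17:00.
DEFAULT_WORKDAYS = frozenset({0, 1, 2, 3, 4})


def is_outside_business_hours(moment, start=time(9, 0), end=time(17, 0),
                              workdays=DEFAULT_WORKDAYS):
    """Return True when `moment` falls outside the configured business hours."""
    if moment.weekday() not in workdays:
        return True
    return not (start <= moment.time() < end)


print(is_outside_business_hours(datetime(2026, 2, 6, 10, 0)))  # Friday 10:00 -> False
print(is_outside_business_hours(datetime(2026, 2, 6, 22, 0)))  # Friday 22:00 -> True
print(is_outside_business_hours(datetime(2026, 2, 7, 10, 0)))  # Saturday 10:00 -> True
```

If the workdays were misconfigured so that Friday is not a workday, the first call would return True and an install could start during real working hours.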
6) SCOM Servers During Change
Why first
The old primary DB can take a long time to synchronize after failover (up to ~2 hours), so SCOM servers are handled first.
Steps
- Verify updates installed correctly.
- Verify Always On synchronization healthy.
- Perform failover.
- Once the new primary is synchronized, restart the old primary (there is no need to wait for full DB recovery on the old primary first).
- Continue remaining servers; periodically check old primary sync progress.
- Perform failback when all DBs synchronized (if needed).
- Validate final primary sync.
Notes from source
- Operations Manager SQL: fail over to the counterpart in the same site.
- Data Warehouse SQL: only one node per site; fail over between corresponding nodes.
- Cosmos config note:
  - INST_01: DFS-COSSQL-51A, DFS-COSSQL-51C
  - INST_02: DFS-COSSQL-51B, DFS-COSSQL-51G
7) Always On (Example: VAVM1327 / VAVM3773)
These servers may require a manual process (they are not fully covered by the update job in the global domain).
Procedure
- Open Failover Cluster Manager.
- Check roles, topology, shared disk expectations.
- Open SSMS.
- Ensure no critical jobs running.
- Confirm AO-related jobs behavior.
- In Always On High Availability:
  - Availability Groups → Dashboard
  - verify replicas green
  - run failover wizard to synchronous counterpart
- Refresh dashboard and verify synchronization.
- If dashboard shows reverting/transient state:
  - do not force restart immediately
  - validate role state and primary ownership first
- Restart server only after role/sync validation.
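The pre-failover checks in this procedure can be read as a gate: the failover wizard runs only when nothing blocks it. The sketch below models that gate with hand-entered values; the `Replica` fields, the function name, and the node names are hypothetical, and nothing here queries SQL Server.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    is_primary: bool
    synchronous: bool   # synchronous-commit replica
    healthy: bool       # shown green on the AG dashboard


def failover_blockers(replicas, target_name, critical_jobs_running):
    """Return a list of reasons why a planned failover should not start yet."""
    blockers = []
    if critical_jobs_running:
        blockers.append("critical jobs are running")
    unhealthy = [r.name for r in replicas if not r.healthy]
    if unhealthy:
        blockers.append("replicas not green: " + ", ".join(unhealthy))
    target = next((r for r in replicas if r.name == target_name), None)
    if target is None:
        blockers.append("target replica not found")
    elif target.is_primary:
        blockers.append("target is already primary")
    elif not target.synchronous:
        blockers.append("target is not a synchronous replica")
    return blockers


# Hypothetical two-node availability group.
nodes = [
    Replica("node-a", is_primary=True, synchronous=True, healthy=True),
    Replica("node-b", is_primary=False, synchronous=True, healthy=True),
]
print(failover_blockers(nodes, "node-b", critical_jobs_running=False))  # []
```

An empty list means the checks passed; any entry is a reason to pause and validate before running the wizard.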
Failback to Original Primary
- Log onto original primary.
- Check pending updates (allow settle period).
- In SSMS/AG, connect to current primary and verify no jobs running.
- Run failback wizard to original primary.
- Validate synchronization and database online status.
8) Failover Clusters (Shared Disks)
Example nodes in source
- DFS-GALASQL-03A
- DFS-CPCFSQL-51A
Procedure
- Confirm updates installed on all relevant nodes (including pending restart state if expected).
- In Failover Cluster Manager, identify active node.
- Verify disk mount ownership on active node.
- In SSMS, confirm jobs not running and no immediate conflict schedules.
- Move SQL roles to target node:
  - Roles → Move → Select Node
- Verify role transition and disk remount on target node.
- Confirm SQL service and DB availability.
- Restart as required from Software Center and re-validate.
9) Post-Change Validation Checklist
- [ ] Updates installed successfully
- [ ] Pending reboot cleared
- [ ] Key services healthy
- [ ] SQL jobs normal
- [ ] AO/cluster synchronization green
- [ ] Databases online and available
- [ ] Evidence attached in ticket/tracker
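When the checklist is tracked in a script or tracker export, unfinished items can be listed mechanically. A minimal sketch, assuming results are recorded as a dictionary of item name to True/False (a hypothetical format); items missing from the dictionary count as not done.

```python
CHECKLIST = [
    "Updates installed successfully",
    "Pending reboot cleared",
    "Key services healthy",
    "SQL jobs normal",
    "AO/cluster synchronization green",
    "Databases online and available",
    "Evidence attached in ticket/tracker",
]


def outstanding_items(results):
    """Return checklist items that are missing or not marked True, in order."""
    return [item for item in CHECKLIST if not results.get(item, False)]


# Example: one item failed and one was never recorded.
results = {item: True for item in CHECKLIST}
results["Pending reboot cleared"] = False
del results["Evidence attached in ticket/tracker"]
print(outstanding_items(results))
# ['Pending reboot cleared', 'Evidence attached in ticket/tracker']
```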
10) Troubleshooting Notes
- Long sync after first failover can be normal on high-transaction DBs.
- If role status is inconsistent, verify current primary and jobs before restart.
- If updates missing, validate batch assignment + policy refresh.
- Re-run update status script after each significant step.
11) Suggested AI Wiki Structure
- operations/windows-updates/windows-updates-master-runbook-v2.md (this file)
- operations/windows-updates/check-update-status.md
- operations/windows-updates/move-batches.md
- operations/windows-updates/software-center-scheduled-install.md
- operations/windows-updates/alwayson-failover-quick-checklist.md
12) Scheduling Split + CM Preparation (Friday/Saturday)
Note
- Batches are split into:
  - Friday (inside business hours)
  - Saturday (outside business hours)
Steps to follow
- Create a CM for servers highlighted for Thursday/Friday before automatic restart.
- Create a second CM for Saturday for the remaining servers.
- GIS and OCHA application servers require a special graceful shutdown procedure before restart.
Referenced procedure docs
- GIS ArcGIS VMs_V5
- OCHA OneGMS v2
13) Automatic Launch via SQL Jobs on ARS
CM timing note
- Create corresponding CM on Monday of the same week in which Windows Updates will be applied, to avoid tickets remaining open too long.
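The timing rule above amounts to simple date arithmetic: step back from the update date to the Monday of the same week. A small sketch, with a hypothetical function name:

```python
from datetime import date, timedelta


def cm_creation_date(update_date):
    """Return the Monday of the week containing `update_date`."""
    # date.weekday() is 0 for Monday, so subtracting it lands on Monday.
    return update_date - timedelta(days=update_date.weekday())


print(cm_creation_date(date(2026, 2, 6)))  # Friday   -> 2026-02-02
print(cm_creation_date(date(2026, 2, 7)))  # Saturday -> 2026-02-02
```

Both the Friday and the Saturday batch of a given week map to the same Monday, so the two CMs can be raised together.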
Execution model
- Windows Updates installation is triggered automatically through SQL Jobs on ARS.
Job hosts by domain
- DPKO domain host: DFS-ARSPROLIST
  - ASU_DBA - Install Windows Updates Batch 5A
  - ASU_DBA - Install Windows Updates Batch 5B
- GLOBAL domain host: ARS-M-LIS-001.global.un.org
  - ASU_DBA - Install Windows Updates Batch 5A GLOBAL
  - ASU_DBA - Install Windows Updates Batch 5B GLOBAL
File location (both servers)
E:\Projects\_wu
Referenced files
- Install_Windows_Updates_Batch_5A.ps1
- Install_Windows_Updates_Batch_5B.ps1
- updates5A.txt
- updates5B.txt
- Windows Update PowerShell script.txt