# Windows Updates Master Runbook v2 (Consolidated)

## Document Control
- **Source package:** screenshots and the tracker workbook (`CriticalAppsSQLCluster.xlsx`)
- **Revision note:** `07/02/2026 reviewed version to include GIS new servers/procedures and apply changes after decommissioned infrastructure.`
- **Scope:** Windows Updates lifecycle for SQL/SCOM/AlwaysOn/Failover clusters, including batch operations and SCCM Software Center procedures.

---

## 1) General Process Information

### Objective
Standardize Windows update execution across standalone servers, failover clusters, and SQL Always On environments with controlled failover, restart, and validation.

### Core Principles
1. Track progress in the approved tracker (e.g., `CriticalAppsSQLCluster.xlsx`).
2. Start with systems where synchronization delay is longest (SCOM/SQL primary paths).
3. Validate before and after each role move/failover/restart.
4. Execute only in approved change windows.

### High-Level Flow
1. Check update status and readiness.
2. Move servers to the correct batch (if needed).
3. Apply updates (interactive or scheduled).
4. Perform failover/failback procedures (where applicable).
5. Validate synchronization, services, and evidence.

---

## 2) Check Update Status

### Script Method
On the designated admin host (example in source: `dfs-asufs-51`), run:

```powershell
C:\temp\CheckUpdateStatus1.ps1
```

### Steps
1. Open PowerShell as Administrator.
2. Run script against current batch list.
3. To check a different list, edit the server-list variable (line 2, per the source note).
4. Review results for:
   - pending updates
   - pending reboot
   - install state
5. Save outputs (`updates.csv` + evidence screenshots).

### Notes
- Run **before** and **after** change.
- Treat a pending reboot as a blocker unless the reboot is explicitly planned.
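
The review in step 4 and the pending-reboot rule can be partly automated once the results are in CSV form. The following is a minimal sketch, not the script used in the source: the column names `Server`, `PendingUpdates`, and `PendingReboot`, the function `Get-UpdateBlocker`, and the sample server names are all assumptions, because the actual layout of `updates.csv` is not shown.

```powershell
# Hypothetical column names (Server, PendingUpdates, PendingReboot); adjust to
# whatever CheckUpdateStatus1.ps1 actually writes to updates.csv.
function Get-UpdateBlocker {
    param(
        [Parameter(Mandatory)] [object[]] $Row,
        # Servers whose reboot is already planned in the change are not blockers.
        [string[]] $PlannedReboot = @()
    )
    foreach ($r in $Row) {
        $pendingReboot = "$($r.PendingReboot)" -eq 'True'
        if ($pendingReboot -and ($PlannedReboot -notcontains $r.Server)) {
            [pscustomobject]@{ Server = $r.Server; Reason = 'Pending reboot not planned' }
        }
    }
}

# Inline sample data; in practice this would be: Import-Csv .\updates.csv
$rows = @'
Server,PendingUpdates,PendingReboot
SRV-A,0,False
SRV-B,2,True
SRV-C,0,True
'@ | ConvertFrom-Csv

$blockers = @(Get-UpdateBlocker -Row $rows -PlannedReboot 'SRV-C')
$blockers | Format-Table -AutoSize
```

Passing the planned reboots explicitly keeps the "unless explicitly planned" exception visible in the evidence rather than buried in the script.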

---

## 3) Calendar / Change Window

### Before Change
- Confirm approved window and stakeholder communications.
- Confirm business/service owner approval.
- Confirm dependency windows (jobs, backups, integrations).

### During Change
- Timestamp each major action.
- Validate after each failover/move.
- Escalate promptly if synchronization exceeds the expected threshold (Section 6 notes that an old primary can take up to ~2 hours to resync).
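
Timestamping and the escalation check can be sketched with two small helpers. Both function names are hypothetical, and the 2-hour default is an assumption taken from the "up to ~2 hours" note in Section 6, not an agreed escalation threshold.

```powershell
# Build one sortable, timestamped line per major action.
function New-ChangeLogEntry {
    param(
        [Parameter(Mandatory)] [string] $Action,
        [datetime] $At = (Get-Date)
    )
    '{0}  {1}' -f $At.ToString('yyyy-MM-dd HH:mm:ss'), $Action
}

# Return $true when the time elapsed since failover exceeds the threshold.
function Test-SyncOverdue {
    param(
        [Parameter(Mandatory)] [datetime] $FailoverAt,
        [Parameter(Mandatory)] [datetime] $Now,
        # Assumed default; replace with the threshold agreed for the change.
        [timespan] $Threshold = (New-TimeSpan -Hours 2)
    )
    ($Now - $FailoverAt) -gt $Threshold
}

# Example usage with fixed times so the result is reproducible.
$failoverAt = [datetime]'2026-01-10 21:05:00'
$entry      = New-ChangeLogEntry -Action 'Failover started' -At $failoverAt
$overdue    = Test-SyncOverdue -FailoverAt $failoverAt -Now $failoverAt.AddHours(3)
```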

### After Change
- Record completion, validation evidence, and deviations.

---

## 4) Move Batches

### Purpose
Use this procedure when servers are in the wrong Software Center batch or are missing updates.

### Procedure
1. Send request to operations team (example: `SGITTHybridCloudOperations`).
2. Include: server list, current batch, target batch, reason, evidence.
3. Copy affected support owners.
4. Link request/ticket ID in change tracker.

### Batch Mapping Example (Reference)
- **Batch 5A – Secondary nodes:** `C75403`
- **Batch 5A – Primary nodes:** `C75404`
- **Batch 5B – Secondary nodes:** `C75406`
- **Batch 5B – Primary nodes:** `C75407`

> Use this pattern to separate primary/secondary update waves.
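
As an illustration, the mapping above and the request contents from step 2 can be combined into a small helper that drafts the request text. This is a sketch only: `New-BatchMoveRequest` and the server names in the example are hypothetical, and the identifiers are copied from the reference mapping without asserting what kind of ID they are.

```powershell
# Identifiers copied from the batch mapping example above.
$BatchId = @{
    '5A' = @{ Secondary = 'C75403'; Primary = 'C75404' }
    '5B' = @{ Secondary = 'C75406'; Primary = 'C75407' }
}

function New-BatchMoveRequest {
    param(
        [Parameter(Mandatory)] [string[]] $Server,
        [Parameter(Mandatory)] [string] $CurrentBatch,
        [Parameter(Mandatory)] [ValidateSet('5A', '5B')] [string] $TargetBatch,
        [Parameter(Mandatory)] [ValidateSet('Primary', 'Secondary')] [string] $Role,
        [Parameter(Mandatory)] [string] $Reason
    )
    $target = $BatchId[$TargetBatch][$Role]
    @(
        "Servers:       $($Server -join ', ')"
        "Current batch: $CurrentBatch"
        "Target batch:  $TargetBatch ($Role nodes, $target)"
        "Reason:        $Reason"
    ) -join "`n"
}

# Example with placeholder server names.
$req = New-BatchMoveRequest -Server 'SRV-A', 'SRV-B' -CurrentBatch '5A' `
    -TargetBatch '5B' -Role Secondary -Reason 'Missing updates'
$req
```

Evidence and the ticket link still need to be attached by hand; the helper only keeps the request fields consistent.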

---

## 5) Microsoft System Center (Software Center)

### Option 1: Interactive Install
1. Launch Software Center.
2. In **Available Software**, select only entries with **Type = Update**.
3. Click **Install Selected**.
4. Monitor status: `Waiting to install` → `Installing` → `Finished/Requires restart`.
5. If prompted, click **Restart** (or reboot per policy).
6. After reboot, verify host/services health.

### Option 2: Scheduled Install
1. Select all required updates (**Type = Update**).
2. Click **Schedule Selected**.
3. In scheduling prompt, set:
   - **Install outside business hours**
   - **Restart automatically after installation if needed**
4. Click **Change software installation settings**.
5. In **Options → Work Information**:
   - define business hours correctly,
   - set valid workdays,
   - confirm non-working install window.
6. Click **Apply**, then **OK**.
7. Confirm updates show as `Scheduled to install after ...`.
8. Validate completion after schedule window.

### Notes
- Scheduled mode is preferred for production.
- An incorrect business-hours configuration can cause installs outside the approved window.
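
To sanity-check a planned install time against the business hours configured in step 5, a small test function can help. This is a sketch under assumptions: the function name, the 09:00 to 18:00 default hours, and the Monday to Friday default workdays are illustrative and must be replaced with the values actually set in **Work Information**.

```powershell
function Test-OutsideBusinessHours {
    param(
        [Parameter(Mandatory)] [datetime] $When,
        [int] $StartHour = 9,     # assumed business-hours start
        [int] $EndHour   = 18,    # assumed business-hours end (exclusive)
        [System.DayOfWeek[]] $WorkDay = @('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday')
    )
    $isWorkDay = $WorkDay -contains $When.DayOfWeek
    $inHours   = ($When.Hour -ge $StartHour) -and ($When.Hour -lt $EndHour)
    # Outside business hours means: not (a workday AND within working hours).
    -not ($isWorkDay -and $inHours)
}

# 2026-01-09 is a Friday, 2026-01-10 a Saturday.
$fridayAfternoon   = Test-OutsideBusinessHours -When ([datetime]'2026-01-09 14:00')
$fridayNight       = Test-OutsideBusinessHours -When ([datetime]'2026-01-09 22:00')
$saturdayAfternoon = Test-OutsideBusinessHours -When ([datetime]'2026-01-10 14:00')
```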

---

## 6) SCOM Servers During Change

### Why first
The old primary's databases can take a long time to resynchronize after failover (up to ~2 hours), so start with the SCOM servers.

### Steps
1. Verify that updates installed correctly.
2. Verify that Always On synchronization is healthy.
3. Perform the failover.
4. Once the new primary is synchronized, restart the old primary; there is no need to wait for the old primary's databases to finish recovering first.
5. Continue with the remaining servers, periodically checking the old primary's sync progress.
6. Perform failback once all databases are synchronized (if needed).
7. Validate synchronization on the final primary.
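
The synchronization checks in steps 2, 4, and 6 can also be done from a query instead of the dashboard. The sketch below is one possible approach, not the method from the source: the function name `Test-AgReadyForFailover` and the sample rows are hypothetical, and running the query (for example with `Invoke-Sqlcmd`) assumes the SqlServer module is available. The gate applies to synchronous-commit replicas; a healthy asynchronous-commit replica reports `SYNCHRONIZING` and would fail it.

```powershell
# T-SQL to run on the current primary; returns one row per database replica.
$SyncQuery = @'
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar ON ar.replica_id = drs.replica_id;
'@

# Return $true only if every row is SYNCHRONIZED and HEALTHY.
function Test-AgReadyForFailover {
    param([Parameter(Mandatory)] [object[]] $State)
    $notReady = @($State | Where-Object {
        $_.synchronization_state_desc -ne 'SYNCHRONIZED' -or
        $_.synchronization_health_desc -ne 'HEALTHY'
    })
    $notReady.Count -eq 0
}

# Sample rows standing in for the query result (hypothetical names).
$sample = @(
    [pscustomobject]@{ replica_server_name = 'NODE-A'; database_name = 'DB1'
                       synchronization_state_desc = 'SYNCHRONIZED'
                       synchronization_health_desc = 'HEALTHY' }
    [pscustomobject]@{ replica_server_name = 'NODE-B'; database_name = 'DB1'
                       synchronization_state_desc = 'SYNCHRONIZING'
                       synchronization_health_desc = 'PARTIALLY_HEALTHY' }
)
$ready = Test-AgReadyForFailover -State $sample
```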

### Notes from source
- Operations Manager SQL: fail over to the counterpart in the same site.
- Data Warehouse SQL: only one node per site; fail over between the corresponding nodes.
- Cosmos config note:
  - `INST_01: DFS-COSSQL-51A, DFS-COSSQL-51C`
  - `INST_02: DFS-COSSQL-51B, DFS-COSSQL-51G`

---

## 7) Always On (Example: VAVM1327 / VAVM3773)

These servers may require a manual process (they are not fully covered by the update job in the global domain).

### Procedure
1. Open **Failover Cluster Manager**.
   - Check roles, topology, shared disk expectations.
2. Open **SSMS**.
   - Ensure no critical jobs are running.
   - Confirm the behavior of AO-related jobs.
3. In **Always On High Availability**:
   - open Availability Groups → Dashboard
   - verify that all replicas are green
   - run the failover wizard to the synchronous counterpart
4. Refresh dashboard and verify synchronization.
5. If dashboard shows reverting/transient state:
   - do not force restart immediately
   - validate role state and primary ownership first
6. Restart server only after role/sync validation.

### Failback to Original Primary
1. Log onto original primary.
2. Check for pending updates (allow a settle period).
3. In SSMS, connect to the current primary of the AG and verify that no jobs are running.
4. Run failback wizard to original primary.
5. Validate synchronization and database online status.
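
For reference, a planned manual failover (and likewise the failback) can be expressed in T-SQL as `ALTER AVAILABILITY GROUP ... FAILOVER`, executed on the target synchronous secondary. The helper below only builds that statement so it can be reviewed and pasted into SSMS. It is a sketch: the function name and the group name `AG1` are hypothetical, and the source itself uses the wizard.

```powershell
function New-AgFailoverStatement {
    param([Parameter(Mandatory)] [string] $AvailabilityGroup)
    # Double any closing bracket so the name stays a valid bracketed identifier.
    $name = $AvailabilityGroup.Replace(']', ']]')
    "ALTER AVAILABILITY GROUP [$name] FAILOVER;"
}

# Run the resulting statement while connected to the TARGET secondary replica,
# and only after the replicas show as synchronized.
$stmt = New-AgFailoverStatement -AvailabilityGroup 'AG1'
$stmt
```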

---

## 8) Failover Clusters (Shared Disks)

### Example nodes in source
- `DFS-GALASQL-03A`
- `DFS-CPCFSQL-51A`

### Procedure
1. Confirm updates are installed on all relevant nodes (a pending-restart state is acceptable if expected).
2. In Failover Cluster Manager, identify the active node.
3. Verify disk mount ownership on the active node.
4. In SSMS, confirm that no jobs are running and that none are scheduled to start imminently.
5. Move SQL roles to target node:
   - Roles → Move → Select Node
6. Verify role transition and disk remount on target node.
7. Confirm SQL service and DB availability.
8. Restart as required from Software Center and re-validate.
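
The role move in step 5 has a PowerShell equivalent in the FailoverClusters module (`Get-ClusterNode`, `Get-ClusterGroup`, `Move-ClusterGroup`). The sketch below covers only the target-selection logic so it can be tried without a cluster. `Select-MoveTarget`, the node names, and the role name in the comment are hypothetical.

```powershell
# Pick the first node that is Up and is not the current owner of the role.
function Select-MoveTarget {
    param(
        [Parameter(Mandatory)] [object[]] $Node,       # objects with Name and State
        [Parameter(Mandatory)] [string] $CurrentOwner
    )
    $Node |
        Where-Object { "$($_.State)" -eq 'Up' -and $_.Name -ne $CurrentOwner } |
        Select-Object -First 1 -ExpandProperty Name
}

# Sample data standing in for Get-ClusterNode output.
$nodes = @(
    [pscustomobject]@{ Name = 'NODE-A'; State = 'Up' }
    [pscustomobject]@{ Name = 'NODE-B'; State = 'Down' }
    [pscustomobject]@{ Name = 'NODE-C'; State = 'Up' }
)
$target = Select-MoveTarget -Node $nodes -CurrentOwner 'NODE-A'

# On a cluster node, the move itself would then look like this
# (role name is a placeholder):
#   Move-ClusterGroup -Name 'SQL Server (MSSQLSERVER)' -Node $target
```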

---

## 9) Post-Change Validation Checklist

- [ ] Updates installed successfully
- [ ] Pending reboot cleared
- [ ] Key services healthy
- [ ] SQL jobs normal
- [ ] AO/cluster synchronization green
- [ ] Databases online and available
- [ ] Evidence attached in ticket/tracker

---

## 10) Troubleshooting Notes

- Long sync after first failover can be normal on high-transaction DBs.
- If role status is inconsistent, verify current primary and jobs before restart.
- If updates are missing, validate the batch assignment and the policy refresh.
- Re-run update status script after each significant step.

---

## 11) Suggested AI Wiki Structure

- `operations/windows-updates/windows-updates-master-runbook-v2.md` (this file)
- `operations/windows-updates/check-update-status.md`
- `operations/windows-updates/move-batches.md`
- `operations/windows-updates/software-center-scheduled-install.md`
- `operations/windows-updates/alwayson-failover-quick-checklist.md`

---

## 12) Scheduling Split + CM Preparation (Friday/Saturday)

### Note
- Batches are split into:
  - **Friday** (inside business hours)
  - **Saturday** (outside business hours)

### Steps to follow
1. Create a **CM** for servers highlighted for Thursday/Friday before automatic restart.
2. Create a second **CM** for Saturday for the remaining servers.
3. **GIS and OCHA application servers require a special graceful shutdown procedure** before restart.

### Referenced procedure docs
- `GIS ArcGIS VMs_V5`
- `OCHA OneGMS v2`

---

## 13) Automatic Launch via SQL Jobs on ARS

### CM timing note
- Create the corresponding **CM on the Monday** of the week in which Windows Updates will be applied, so that tickets do not remain open too long.
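
The Monday rule is simple date arithmetic. A minimal sketch follows, assuming weeks run Monday to Sunday; the function name `Get-CmCreationDate` is hypothetical.

```powershell
function Get-CmCreationDate {
    param([Parameter(Mandatory)] [datetime] $UpdateDate)
    # DayOfWeek is Sunday = 0 ... Saturday = 6; shift so that Monday = 0.
    $offset = (([int]$UpdateDate.DayOfWeek) + 6) % 7
    $UpdateDate.Date.AddDays(-$offset)
}

# 2026-01-09 is a Friday and 2026-01-10 a Saturday; both fall in the week
# that starts on Monday 2026-01-05.
$fridayCm   = Get-CmCreationDate -UpdateDate ([datetime]'2026-01-09')
$saturdayCm = Get-CmCreationDate -UpdateDate ([datetime]'2026-01-10')
```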

### Execution model
- Windows Updates installation is triggered automatically through SQL Jobs on ARS.

### Job hosts by domain
- **DPKO domain host:** `DFS-ARSPROLIST`
  - `ASU_DBA - Install Windows Updates Batch 5A`
  - `ASU_DBA - Install Windows Updates Batch 5B`

- **GLOBAL domain host:** `ARS-M-LIS-001.global.un.org`
  - `ASU_DBA - Install Windows Updates Batch 5A GLOBAL`
  - `ASU_DBA - Install Windows Updates Batch 5B GLOBAL`
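
The host and job names above follow a regular pattern, which the sketch below turns into a lookup. The function name `Get-UpdateJob` is hypothetical; the hosts and job names are copied from the list. How the job is started is not described in the source. One standard option is `msdb.dbo.sp_start_job`, shown only as a comment.

```powershell
# Job hosts per domain, as listed above.
$JobHost = @{
    DPKO   = 'DFS-ARSPROLIST'
    GLOBAL = 'ARS-M-LIS-001.global.un.org'
}

function Get-UpdateJob {
    param(
        [Parameter(Mandatory)] [ValidateSet('DPKO', 'GLOBAL')] [string] $Domain,
        [Parameter(Mandatory)] [ValidateSet('5A', '5B')] [string] $Batch
    )
    $name = "ASU_DBA - Install Windows Updates Batch $Batch"
    if ($Domain -eq 'GLOBAL') { $name += ' GLOBAL' }
    [pscustomobject]@{ Server = $JobHost[$Domain]; JobName = $name }
}

$job = Get-UpdateJob -Domain 'GLOBAL' -Batch '5B'
# A manual start, if ever needed, could use (assumption, not from the source):
#   EXEC msdb.dbo.sp_start_job @job_name = N'<JobName>';
```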

### File location (both servers)
- `E:\Projects\_wu`

### Referenced files
- `Install_Windows_Updates_Batch_5A.ps1`
- `Install_Windows_Updates_Batch_5B.ps1`
- `updates5A.txt`
- `updates5B.txt`
- `Windows Update PowerShell script.txt`
