Chapter 1: System Components
1.1 System Architecture
The system architecture is organized into two primary deployment boundaries: the Data Center / Core, which hosts authoritative identity and policy services, and the Sites / Edge, which contains the enforcement points closest to end users and devices. The two boundaries are connected by redundant WAN links (MPLS/SD-WAN), and each site has local survivability capabilities so that critical access continues during WAN interruptions.
The authoritative identity plane (AD/LDAP, IdP, PKI) resides in the core, ensuring a single source of truth for all identity decisions. Policy engines (RADIUS, TACACS+, NAC) run in redundant configurations: active-active, N+1, or primary/secondary, as listed per module in the table below. Enforcement points at the edge consume policy decisions via standardized protocols and apply them at the port, SSID, or application level.
Module Relationships & Flows
| Module | Role | Key Interactions | Redundancy Model |
|---|---|---|---|
| AD/LDAP Directory | Authoritative identity, groups, device objects | Provides identity attributes to RADIUS, NAC, PAM; receives HR feed | Multi-master replication; ≥2 DCs at different sites |
| IdP (SAML/OIDC) | User authentication, MFA, conditional access tokens | Issues tokens to ZTNA/Apps; consumes AD/LDAP; receives MDM/EDR posture | Active-active or cloud HA; failover within seconds |
| PKI/CA | Certificate issuance and revocation for EAP-TLS | Issues device certs via SCEP/EST; OCSP/CRL queried by RADIUS | Offline root + online issuing CA; OCSP cluster |
| RADIUS | 802.1X network access authentication and authorization | Queries AD/LDAP; validates PKI; returns VLAN/ACL/SGT to switch/AP | N+1 cluster behind load balancer or DNS SRV |
| TACACS+ | Device admin login + command authorization + accounting | Queries AD; logs commands to SIEM; enforces role-based command sets | Primary/secondary pair; fallback to local (restricted) |
| NAC | Profiling, posture assessment, quarantine, guest portal | Integrates with switches/APs via RADIUS/SNMP/API; queries MDM/EDR | HA policy nodes; posture data replicated |
| ZTNA/VPN | Remote access enforcement with per-app policy | Integrates with IdP for MFA and conditional access; queries NAC/EDR for posture | Global load balancer; multi-region gateways |
| PAM | Privileged session brokering, vault, recording | Brokers SSH/RDP to network devices; integrates with TACACS+; logs to SIEM | Clustered vaults; session recording replicated |
| SIEM | Log correlation, alerts, retention, dashboards | Ingests from all modules via TLS syslog/CEF; NTP-synchronized | Distributed collectors; hot/warm/cold storage tiers |
| NTP | Time synchronization for all nodes | Provides synchronized time to all infrastructure components | Internal stratum hierarchy; ≥2 stratum-1 sources |
Core / Optional / Support Distinction
| Category | Components | Rationale |
|---|---|---|
| Core | AD/IdP, PKI, RADIUS/TACACS+, SIEM+NTP, Switch/AP enforcement | Required for basic identity-based access control and audit |
| Optional | Advanced posture checks (EDR/MDM), SGT/TrustSec, SOAR playbooks | Enhances security posture; adds complexity and licensing cost |
| Support | CMDB, ITSM ticketing, vulnerability scanners for posture signals | Improves operational governance and change control |
1.2 Components and Functions
Each component in the identity authentication stack has a well-defined primary responsibility, set of inputs and outputs, engineering metrics, and associated mismatch risks. The component grid below provides a structured reference for procurement, capacity planning, and acceptance testing. Understanding these interdependencies is critical for avoiding cascading failures during deployment or operational changes.
| Component | Primary Responsibility | Inputs | Outputs | Key Metrics | Mismatch Risks |
|---|---|---|---|---|---|
| AD/LDAP Directory | Authoritative identities, groups, device objects | HR feed, admin changes | LDAP/Kerberos, group attributes | Replication latency <15 min; DC uptime ≥99.9% | Stale groups cause over-privilege; replication delay breaks login |
| IdP (SAML/OIDC) | User authentication, MFA, conditional access | User credentials, device signals | Tokens/assertions | Auth latency p95 <1.5s; MFA success rate | Token misconfig causes app lockout |
| PKI/CA | Cert issuance/revocation for EAP-TLS | CSR, device identity | Cert, CRL/OCSP | OCSP p95 <300ms; revocation propagation <1h | Weak templates enable spoofing; revocation delay |
| RADIUS | 802.1X authN, dynamic authZ, accounting | EAP requests, identity attrs | Accept/Reject + VLAN/ACL/SGT | Auth p95 <800ms; success rate ≥99% | Too-strict policies cause mass outages |
| TACACS+ | Admin authN/Z + command control | Admin login, role | Permit/deny commands + logs | Command log completeness 100% | Shared accounts break accountability |
| NAC | Profiling, posture, quarantine, guest | DHCP/SNMP/EDR/MDM | Role/VLAN/ACL changes, CoA | Quarantine time <60s | False profiling disrupts IoT operations |
| Switch/AP/WLC | Enforcement at access edge | RADIUS decisions | Port state, VLAN/ACL/SGT | 802.1X stability; failover time | Firmware limitations break EAP-TLS |
| ZTNA/VPN | Remote access enforcement | IdP tokens, device posture | Tunnel/app access | Connection p95 <3s; step-up accuracy | Split tunnel errors cause data leakage |
| PAM | Privileged session brokering | Approvals, vault creds | Session recording, audited access | Recording coverage 100%; checkout TTL | Bypass paths allow unrecorded admin |
| SIEM | Correlation, alerts, retention | Logs from all modules | Alerts, dashboards, evidence | Ingest EPS headroom 30%; retention 180–365d | Time drift ruins correlation |
| NTP | Time sync for audit integrity | Stratum sources | Time offset control | Offset <100ms inside domain | Unsynced clocks invalidate audit |
| Firewall | Network segmentation and policy enforcement | Traffic flows, ACL rules | Permit/deny, logging | Throughput; rule hit rate; latency | Overly permissive rules negate segmentation |
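
For acceptance testing, the p95 latency targets in the Key Metrics column can be checked against measured samples. The following is a minimal sketch, assuming latencies are collected in milliseconds and that a nearest-rank percentile is acceptable; the names `P95_TARGETS_MS`, `p95`, and `meets_target` are illustrative and not part of any product API.

```python
# p95 latency targets in milliseconds, copied from the Key Metrics column.
# The dictionary keys are illustrative labels, not product identifiers.
P95_TARGETS_MS = {
    "IdP": 1500,
    "OCSP": 300,
    "RADIUS": 800,
    "ZTNA/VPN": 3000,
}


def p95(samples):
    """Return the nearest-rank 95th percentile of a non-empty sample list."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_target(component, samples_ms):
    """True when the measured p95 is strictly below the component target."""
    return p95(samples_ms) < P95_TARGETS_MS[component]
```

A call such as `meets_target("RADIUS", latencies)` could then be run per component during a load test; the sampling window and sample count are left to the test plan.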
1.3 Working Principles
Startup / Initialization
Before any authentication can occur, the system must establish its trust anchors and verify temporal consistency. This initialization sequence ensures that all components are operating from a known-good state before accepting live traffic.
- Establish trust anchors: CA root distribution to endpoints/switches; IdP signing keys published; RADIUS shared secrets configured and rotated.
- Verify time: NTP sync for all nodes; alert if offset exceeds threshold (100 ms internal, 1 s external). No authentication should proceed while time on any policy node is unsynchronized.
- Load policies: RADIUS/TACACS+ policy sets, NAC profiles, conditional access rules, PAM roles — all version-tagged and staged before enforcement.
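
The time-verification step above can be sketched as a simple startup gate. This is an illustrative sketch only: the `NodeClock` structure and function names are hypothetical, and it assumes that a policy node counts as "unsynced" when it either has no NTP sync or exceeds its offset threshold.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds from the initialization checklist (seconds).
INTERNAL_MAX_OFFSET_S = 0.100
EXTERNAL_MAX_OFFSET_S = 1.0


@dataclass
class NodeClock:
    """Hypothetical view of one node's clock state."""
    name: str
    offset_s: Optional[float]   # None means the node has no NTP sync
    internal: bool = True       # internal nodes use the 100 ms threshold
    policy_node: bool = False   # e.g. RADIUS, TACACS+, NAC policy nodes


def offset_alert(node: NodeClock) -> bool:
    """True when the node is unsynced or its offset exceeds the threshold."""
    if node.offset_s is None:
        return True
    limit = INTERNAL_MAX_OFFSET_S if node.internal else EXTERNAL_MAX_OFFSET_S
    return abs(node.offset_s) > limit


def auth_allowed(nodes) -> bool:
    """Block authentication while any policy node fails the time check."""
    return not any(n.policy_node and offset_alert(n) for n in nodes)
```

In this sketch a drifting switch raises an alert but does not block authentication; only policy nodes gate the start of live traffic, which matches the checklist wording.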
Normal Operation
During steady-state operation, the system processes three primary authentication flows, each with distinct protocols and decision paths. All flows emit structured audit events to SIEM with correlation IDs.
- Wired/Wi-Fi: Endpoint presents cert → RADIUS validates chain + revocation → maps identity to role → returns VLAN/ACL/SGT → switch/AP enforces.
- Admin access: User authenticates to PAM (MFA) → session to device uses TACACS+ for command-level authorization → accounting logs to SIEM.
- Remote access: IdP authenticates + conditional access evaluates → ZTNA grants app access with least privilege; abnormal risk triggers step-up or block.
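
The wired/Wi-Fi flow above (validate chain and revocation, map identity to role, return VLAN/ACL/SGT) can be sketched as a single decision function. This is a simplified illustration, not a RADIUS implementation: the group names, VLAN IDs, ACL names, and SGT values are placeholders, and certificate validation is assumed to have been done by the caller.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder role-to-enforcement attributes; not values from this design.
ROLE_ATTRIBUTES = {
    "employee": {"vlan": 10, "acl": "ACL-EMPLOYEE", "sgt": 100},
    "contractor": {"vlan": 20, "acl": "ACL-CONTRACTOR", "sgt": 200},
}

# Placeholder directory-group-to-role mapping, checked in priority order.
GROUP_TO_ROLE = [
    ("grp-employees", "employee"),
    ("grp-contractors", "contractor"),
]


@dataclass
class Decision:
    accept: bool
    reason: str
    attributes: Optional[dict] = None


def authorize(chain_valid: bool, revoked: bool, groups: set) -> Decision:
    """Mirror the flow: chain check, revocation check, role mapping."""
    if not chain_valid:
        return Decision(False, "invalid certificate chain")
    if revoked:
        return Decision(False, "certificate revoked")
    for group, role in GROUP_TO_ROLE:
        if group in groups:
            # The switch or AP would enforce these attributes on the port/SSID.
            return Decision(True, f"role={role}", ROLE_ATTRIBUTES[role])
    return Decision(False, "no role mapped")
```

Rejecting identities with no mapped role is a design choice in this sketch; a deployment could instead place them in a restricted default VLAN.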
Exception & Recovery Chains
| Exception Trigger | System Behavior | Recovery Action | SIEM Alert |
|---|---|---|---|
| Expired device certificate | Endpoint placed in remediation VLAN; NAC triggers certificate renewal workflow; user notified | MDM/auto-enrollment issues new cert; endpoint re-authenticates | Alert if the failure count exceeds the configured threshold |
| Compromised admin account attempts forbidden command | TACACS+ denies command; PAM session flagged; SOAR opens incident | Password reset + token revoke; session terminated; incident documented | Critical alert — privilege abuse |
| Time drift on a switch causes invalid accounting timestamps | NTP monitoring alert; device moved to maintenance window; logs tagged "time_untrusted" | NTP fix applied; post-fix validation includes log continuity test | Warning alert — audit integrity risk |
| RADIUS cluster node failure | Client failover to secondary node; site survivability policy activates if WAN also fails | Node restored or replaced; health check confirms recovery | Major alert — AAA availability |
| OCSP responder unavailable | Critical zones fail-closed; standard zones allow only remediation VLAN | OCSP responder restored; cached responses expire; normal auth resumes | Major alert — PKI trust chain |
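
The OCSP row in the table above distinguishes fail-closed behavior for critical zones from a remediation-VLAN fallback for standard zones. A minimal sketch of that branching follows, assuming the zone classification is already known for each access port or SSID; the `Zone` enum, the status strings, and the return values are hypothetical labels.

```python
from enum import Enum
from typing import Optional


class Zone(Enum):
    CRITICAL = "critical"
    STANDARD = "standard"


def ocsp_outage_action(zone: Zone) -> str:
    """Behavior when revocation status cannot be obtained."""
    if zone is Zone.CRITICAL:
        return "reject"            # fail-closed
    return "remediation_vlan"      # limited access until OCSP is restored


def authorize_with_ocsp(zone: Zone, ocsp_status: Optional[str]) -> str:
    """ocsp_status is "good", "revoked", or None if the responder is down."""
    if ocsp_status == "good":
        return "accept"
    if ocsp_status == "revoked":
        return "reject"
    return ocsp_outage_action(zone)
```

Any status other than an explicit "good" or "revoked" is treated as an outage here, so an unexpected responder answer never results in full access.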