Chapter 2: Design Methods

Engineering principles, failure analysis, decision logic, and key design dimensions

2.1 Design Principles and Basis

A sound identity authentication architecture rests on engineering principles that are actionable rather than abstract: each one can be verified during design review, commissioning, and audit. The table below pairs every principle with its technical basis and a verification method, so the design stays traceable from concept to acceptance test.

#	Principle	Technical Basis	Verification Method
1	Single authoritative identity — choose one primary source per identity type (human/device/service)	Audit integrity; prevents split-brain identity decisions	Identity source inventory; no duplicate provisioning paths
2	Certificate-based auth for devices (EAP-TLS) — eliminate shared secrets	Cryptographic binding; phishing-resistant; non-repudiation	EAP-TLS coverage report; no PEAP-MSCHAPv2 in production zones
3	Least privilege by default — deny-all baseline, explicit allow by role	Breach containment; limits lateral movement	RBAC test cases; no "catch-all" permit rules
4	Separation of duties (SoD) — split identity admin, policy admin, security audit roles	Fraud prevention; compliance requirement	Role matrix review; no single account with all privileges
5	Policy-as-code mindset — version-control AAA/NAC policies; changes require peer review	Change safety; rollback capability; drift detection	Git history for policy files; change approval records
6	Fail-safe behavior defined per zone — fail-closed for critical; limited fail-open with compensating controls	Availability vs. security balance; zone-based risk tolerance	Game-day simulation; verify fallback VLAN ACL restrictions
7	Immutable and time-synced audit — NTP everywhere; near-real-time log forwarding; tamper-evident storage	Forensics integrity; compliance evidence	NTP drift dashboard; SIEM ingest gap monitoring
8	Segment by identity, not IP alone — dynamic VLAN/ACL/SGT and micro-segmentation	Zero trust principle; IP-based segmentation is insufficient	Role-to-VLAN mapping tests; SGT policy verification
9	Lifecycle automation — joiner/mover/leaver triggers; certificate auto-enroll; service account rotation	Operational reliability; eliminates manual error	Provisioning SLA reports; cert renewal lead-time monitoring
10	Compatibility with legacy — managed exceptions (MAB, PSK, MAC allowlist) with tighter ACLs	Gradual migration; avoids big-bang disruption	Exception register with expiry dates; monthly review
11	Observability first — every decision emits structured logs; dashboards exist before mass rollout	Troubleshootability; baseline before enforcement	SIEM dashboard review; correlation ID present in all log types
12	Redundancy and capacity headroom — AAA/IdP/SIEM sized for peak auth storms	Resilience; Monday-morning and power-recovery scenarios	Stress test at 3× normal load; failover drill results
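
As an illustration of principle 3 and its verification method, the following is a minimal Python sketch of a deny-all baseline with explicit allow rules, plus a check for "catch-all" permits. The `Rule` type, the role and resource names, and the use of `*` as a wildcard marker are all assumptions made for the example, not part of any specific AAA or NAC product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    role: str
    resource: str
    action: str


# Hypothetical explicit-allow rules; anything not listed here is denied.
ALLOW_RULES = frozenset({
    Rule("network-admin", "core-switch", "configure"),
    Rule("helpdesk", "user-directory", "read"),
})


def is_allowed(role, resource, action, rules=ALLOW_RULES):
    """Return True only when an exact allow rule exists (deny-all baseline)."""
    return Rule(role, resource, action) in rules


def find_catch_all(rules):
    """Return rules whose role, resource, or action is the wildcard '*'."""
    return [r for r in rules if "*" in (r.role, r.resource, r.action)]
```

In a design review, `find_catch_all` could run against the version-controlled rule set (principle 5) so that a wildcard permit fails the check before it reaches production.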

2.2 Failure Causes → Recommendations

Understanding failure patterns is as important as designing for success. The following table documents nine major failure mechanisms observed in enterprise identity authentication deployments, along with their consequences, avoidance strategies, and verification methods. Each entry represents a real-world failure mode that has caused production outages or security incidents.

Failure Mechanism	What Happens	Avoidance / Recommendation	Verification Method
Weak identity mapping	Users get wrong VLAN/ACL; over-privilege or under-privilege	Normalize attributes; enforce naming conventions; use role registry	Sampling + RBAC tests; role/VLAN correlation report
RADIUS overload	Auth timeouts; mass disconnect during peak (Monday morning, power recovery)	HA cluster + load balancer + caching; size for peak × storm factor	Stress test at peak auth/s; verify N+1 headroom
PKI misconfigured templates	Cert spoofing or unusable certs; EAP-TLS failures	EKU constraints; key size policy (RSA 2048+ or ECDSA P-256); auto-enroll controls	Template review checklist; cert chain validation test
Poor fallback design	Outage escalates; critical staff locked out	Define zone-based fallback VLAN with minimal ACL; pre-approve break-glass list	Game-day simulation; fallback VLAN ACL verification
Guest portal bypass	Untrusted devices access internal resources	Strict guest VLAN isolation + DNS filtering + account expiry	Penetration test + NAC isolation reports
Shared admin accounts	No accountability; audit trail broken	Unique admin IDs + PAM + MFA; forbid shared credentials; alert on shared use	Device config audit; TACACS+ accounting review
Missing time sync	Logs unusable in incident; correlation impossible	Enforce NTP on all nodes; monitor drift; block devices with excessive drift	SIEM drift dashboard; NTP offset alerts
Mis-scoped conditional access	Legitimate users blocked; helpdesk flood	Pilot rings; exceptions with approval; telemetry-based tuning before enforcement	CA policy monitoring; false-positive rate tracking
Inconsistent log formats	Correlation impossible; forensics fail	Adopt CEF/JSON schema; normalize fields; parser regression tests	SIEM parser tests; field presence validation
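
The RADIUS overload row recommends sizing for "peak × storm factor" with N+1 headroom. That arithmetic can be sketched in Python as follows. The function name and the example figures (400 auth/s peak, a 3x storm factor, 500 auth/s per node) are hypothetical; real per-node capacity should come from a stress test.

```python
import math


def required_aaa_nodes(peak_auth_per_s, storm_factor, node_capacity_per_s):
    """Nodes needed to absorb peak x storm factor, plus one spare (N+1)."""
    if min(peak_auth_per_s, storm_factor, node_capacity_per_s) <= 0:
        raise ValueError("all inputs must be positive")
    design_load = peak_auth_per_s * storm_factor
    # Round up: a fractional node still has to be a whole appliance.
    n = math.ceil(design_load / node_capacity_per_s)
    return n + 1


# Hypothetical figures: 1200 auth/s design load / 500 per node = 2.4,
# rounded up to 3 nodes, plus 1 spare.
print(required_aaa_nodes(400, 3, 500))  # 4
```

With the spare included, the cluster can lose one node and still carry the design load, which is the condition the stress test in the table is meant to confirm.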

2.3 Core Design / Selection Logic

The design selection process follows a structured decision tree that classifies access type, evaluates device management status and certificate readiness, and maps to the appropriate authentication and authorization method. This ensures that every access scenario has a deterministic policy outcome, with no ambiguous or undefined states.

Figure 2.1: Core Design Decision Tree — Access type classification leading to EAP-TLS, ZTNA+MFA, or PAM+TACACS+ outcomes
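
The figure itself is not reproduced here, so the following Python sketch shows only one plausible reading of the decision tree. The three access types, the branch order, and the "managed exception" fallback for devices without certificates (borrowed from principle 10) are assumptions. The point it illustrates is the deterministic property: every input maps to exactly one outcome, and unknown input raises an error instead of defaulting silently.

```python
from enum import Enum


class AccessType(Enum):
    CAMPUS = "campus"          # wired or wireless 802.1X
    REMOTE = "remote"          # VPN or ZTNA gateway
    PRIVILEGED = "privileged"  # device administration


def select_method(access_type, managed, cert_ready):
    """Map one access scenario to exactly one outcome; raise on unknown input."""
    if access_type is AccessType.PRIVILEGED:
        return "PAM+TACACS+"
    if access_type is AccessType.REMOTE:
        # Assumption: remote access uses ZTNA+MFA regardless of device state.
        return "ZTNA+MFA"
    if access_type is AccessType.CAMPUS:
        if managed and cert_ready:
            return "EAP-TLS"
        return "managed exception (restricted ACL)"
    raise ValueError(f"unclassified access type: {access_type!r}")


# Enumerate every combination to confirm that no state is left undefined.
for t in AccessType:
    for managed in (True, False):
        for cert in (True, False):
            print(t.value, managed, cert, "->", select_method(t, managed, cert))
```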

Decision Steps

Classify identities: human / device / service / privileged. Each type has different trust requirements and lifecycle management needs.
Define trust: certificate issuance model, revocation model (CRL/OCSP), factor requirements (single/MFA/phishing-resistant).
Choose enforcement points: switch/AP (802.1X), VPN/ZTNA gateway, PAM jump host, firewall — each with appropriate protocol support.
Select policy language: RBAC baseline + ABAC for conditions (device posture, location, risk score, time-of-day).
Define exception handling: legacy endpoints, IoT devices, break-glass admins — each with documented compensating controls and expiry.
Define observability and acceptance: success rates, latency targets, coverage percentages, audit completeness criteria.
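
The policy-language step (RBAC baseline plus ABAC conditions) can be sketched as a two-stage check. This is a minimal Python illustration under stated assumptions: the role-to-segment map, the 0 to 100 risk scale, the risk threshold of 70, and the 06:00 to 22:00 access window are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    posture_compliant: bool
    risk_score: int  # hypothetical 0-100 scale
    hour: int        # local hour of day, 0-23


# Hypothetical RBAC baseline: role -> permitted network segments.
ROLE_SEGMENTS = {"engineer": {"dev", "corp"}, "contractor": {"corp"}}


def authorize(role, segment, ctx, max_risk=70, hours=range(6, 22)):
    """RBAC grants the baseline; ABAC conditions can only narrow it."""
    if segment not in ROLE_SEGMENTS.get(role, set()):
        return False  # no baseline grant, so deny
    if not ctx.posture_compliant:
        return False  # device posture condition
    if ctx.risk_score > max_risk:
        return False  # risk score condition
    return ctx.hour in hours  # time-of-day condition
```

Keeping the ABAC conditions strictly restrictive means a condition can never grant access that the role baseline did not already allow, which keeps the policy consistent with a deny-all default.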

2.4 Key Design Dimensions

Every design decision involves trade-offs across multiple dimensions. The following table provides a structured framework for evaluating those trade-offs, ensuring that performance, reliability, maintainability, compatibility, cost, and compliance requirements are all addressed in the design documentation.

Dimension	Key Considerations	Engineering Targets	Trade-off Notes
Performance / UX	Auth latency, roaming stability, captive portal usability	RADIUS p95 < 800 ms; roam interruption < 150 ms; portal onboarding < 2 min	Tighter security (more checks) increases latency
Stability / Reliability	HA design, failover time, policy rollback capability	AAA availability ≥ 99.9%; failover < 30 s; rollback < 5 min	Active-active adds complexity and cost
Maintainability	Certificate renewal automation, template versioning, runbooks	Cert renewal lead time 14–30 days; zero manual cert operations at scale	Automation requires PKI maturity and MDM integration
Compatibility / Extensibility	Vendor support for CoA, SGT, APIs; future integration hooks	Standard protocols only (EAP-TLS, RADIUS, TACACS+, SAML/OIDC)	Vendor-proprietary features risk lock-in
Life-cycle cost (LCC) / TCO	Licensing + ops time + incident cost reduction	3-year net cost after incident-cost savings; FTE ops effort per 1000 users	EAP-TLS reduces long-term incident costs; NAC posture adds licensing overhead
Energy / Environmental	Appliance power consumption, PoE budgeting for edge sensors/APs	PoE budget with 15% safety margin; closet thermal within vendor limits	High-density AP deployments require careful PoE planning
Compliance	Audit retention, SoD, encryption standards, privacy minimization	Log retention 180–365 days; TLS for all log transport; SoD enforced in roles	Longer retention increases storage cost; privacy rules may limit log fields
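
Two of the engineering targets above lend themselves to a quick calculation: the RADIUS p95 latency target and the downtime implied by an availability target. The Python sketch below uses the nearest-rank percentile definition, which is one of several in common use, and the latency samples are invented for illustration.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of latency samples."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def downtime_budget_hours(availability, period_hours=365 * 24):
    """Hours of downtime allowed per period at a given availability."""
    return (1 - availability) * period_hours


# Hypothetical samples: with only five values, the slowest one is the p95.
samples = [120, 250, 310, 400, 950]
print(p95(samples) < 800)  # False: 950 ms misses the 800 ms target

# 99.9% availability leaves roughly 8.76 hours of downtime per year.
print(round(downtime_budget_hours(0.999), 2))
```

A small sample set makes the p95 sensitive to a single slow authentication, so acceptance measurements are better taken over a full peak period.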