Chapter 2: Design Methods
2.1 Design Principles and Basis
Sound identity authentication architecture rests on a set of executable engineering principles — not abstract guidelines, but actionable rules that can be verified during design review, commissioning, and audit. Each principle below is paired with a technical basis and a verification method, ensuring that the design remains traceable from concept to acceptance test.
| # | Principle | Technical Basis | Verification Method |
|---|---|---|---|
| 1 | Single authoritative identity — choose one primary source per identity type (human/device/service) | Audit integrity; prevents split-brain identity decisions | Identity source inventory; no duplicate provisioning paths |
| 2 | Certificate-based auth for devices (EAP-TLS) — eliminate shared secrets | Cryptographic binding; phishing-resistant; non-repudiation | EAP-TLS coverage report; no PEAP-MSCHAPv2 in production zones |
| 3 | Least privilege by default — deny-all baseline, explicit allow by role | Breach containment; limits lateral movement | RBAC test cases; no "catch-all" permit rules |
| 4 | Separation of duties (SoD) — split identity admin, policy admin, security audit roles | Fraud prevention; compliance requirement | Role matrix review; no single account with all privileges |
| 5 | Policy-as-code mindset — version-control AAA/NAC policies; changes require peer review | Change safety; rollback capability; drift detection | Git history for policy files; change approval records |
| 6 | Fail-safe behavior defined per zone — fail-closed for critical; limited fail-open with compensating controls | Availability vs. security balance; zone-based risk tolerance | Game-day simulation; verify fallback VLAN ACL restrictions |
| 7 | Immutable and time-synced audit — NTP everywhere; near-real-time log forwarding; tamper-evident storage | Forensics integrity; compliance evidence | NTP drift dashboard; SIEM ingest gap monitoring |
| 8 | Segment by identity, not IP alone — dynamic VLAN/ACL/SGT and micro-segmentation | Zero trust principle; IP-based segmentation is insufficient | Role-to-VLAN mapping tests; SGT policy verification |
| 9 | Lifecycle automation — joiner/mover/leaver triggers; certificate auto-enroll; service account rotation | Operational reliability; eliminates manual error | Provisioning SLA reports; cert renewal lead-time monitoring |
| 10 | Compatibility with legacy — managed exceptions (MAB, PSK, MAC allowlist) with tighter ACLs | Gradual migration; avoids big-bang disruption | Exception register with expiry dates; monthly review |
| 11 | Observability first — every decision emits structured logs; dashboards exist before mass rollout | Troubleshootability; baseline before enforcement | SIEM dashboard review; correlation ID present in all log types |
| 12 | Redundancy and capacity headroom — AAA/IdP/SIEM sized for peak auth storms | Resilience; Monday-morning and power-recovery scenarios | Stress test at 3× normal load; failover drill results |
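
Principle 3 (deny-all baseline, explicit allow by role) and its verification method (no "catch-all" permit rules) can be sketched as a small policy check. This is a minimal illustration, not a real AAA/NAC policy format: the `Rule` type, the `*` wildcard convention, and the role and resource names are all assumptions made for the example.

```python
from dataclasses import dataclass

WILDCARD = "*"  # assumed convention for a catch-all field


@dataclass(frozen=True)
class Rule:
    role: str      # role supplied by the authoritative identity source
    resource: str  # resource or zone the role may reach
    action: str    # e.g. "read" or "admin"


def is_allowed(rules, role, resource, action):
    """Deny-all baseline: permit only when an explicit rule matches exactly."""
    return Rule(role, resource, action) in rules


def find_catch_all(rules):
    """Return every rule that uses a wildcard in any field."""
    return [r for r in rules if WILDCARD in (r.role, r.resource, r.action)]


# Hypothetical policy used only for illustration.
POLICY = frozenset({
    Rule("netops", "core-switch", "admin"),
    Rule("helpdesk", "core-switch", "read"),
})
```

A design review could run `find_catch_all` over the version-controlled policy file and fail the change if the result is non-empty, which also fits the policy-as-code principle in row 5.
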
2.2 Failure Causes → Recommendations
Understanding failure patterns is as important as designing for success. The following table documents nine major failure mechanisms observed in enterprise identity authentication deployments, along with their consequences, avoidance strategies, and verification methods. Each entry represents a real-world failure mode that has caused production outages or security incidents.
| Failure Mechanism | What Happens | Avoidance / Recommendation | Verification Method |
|---|---|---|---|
| Weak identity mapping | Users get wrong VLAN/ACL; over-privilege or under-privilege | Normalize attributes; enforce naming conventions; use role registry | Sampling + RBAC tests; role/VLAN correlation report |
| RADIUS overload | Auth timeouts; mass disconnect during peak (Monday morning, power recovery) | HA cluster + load balancer + caching; size for peak × storm factor | Stress test at peak auth/s; verify N+1 headroom |
| PKI misconfigured templates | Cert spoofing or unusable certs; EAP-TLS failures | EKU constraints; key size policy (RSA 2048+ or ECDSA P-256); auto-enroll controls | Template review checklist; cert chain validation test |
| Poor fallback design | Outage escalates; critical staff locked out | Define zone-based fallback VLAN with minimal ACL; pre-approve break-glass list | Game-day simulation; fallback VLAN ACL verification |
| Guest portal bypass | Untrusted devices access internal resources | Strict guest VLAN isolation + DNS filtering + account expiry | Penetration test + NAC isolation reports |
| Shared admin accounts | No accountability; audit trail broken | Unique admin IDs + PAM + MFA; forbid shared credentials; alert on shared use | Device config audit; TACACS accounting review |
| Missing time sync | Logs unusable in incident; correlation impossible | Enforce NTP on all nodes; monitor drift; block devices with excessive drift | SIEM drift dashboard; NTP offset alerts |
| Mis-scoped conditional access | Legitimate users blocked; helpdesk flood | Pilot rings; exceptions with approval; telemetry-based tuning before enforcement | CA policy monitoring; false-positive rate tracking |
| Inconsistent log formats | Correlation impossible; forensics fail | Adopt CEF/JSON schema; normalize fields; parser regression tests | SIEM parser tests; field presence validation |
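
The RADIUS overload row recommends sizing for peak × storm factor and verifying N+1 headroom. The arithmetic can be sketched as follows. The function name, the per-node capacity figure, and the example numbers are assumptions for illustration; real per-node throughput has to come from a stress test of the actual platform.

```python
import math


def required_nodes(peak_auth_per_s, storm_factor, node_capacity_per_s):
    """Nodes needed to carry peak x storm factor, plus one spare (N+1)."""
    if min(peak_auth_per_s, storm_factor, node_capacity_per_s) <= 0:
        raise ValueError("all inputs must be positive")
    design_load = peak_auth_per_s * storm_factor
    # N: smallest node count whose combined capacity covers the design load.
    n = math.ceil(design_load / node_capacity_per_s)
    # +1 so the cluster still carries the design load with one node down.
    return n + 1


# Hypothetical inputs: 400 auth/s normal peak, 3x storm, 500 auth/s per node.
nodes = required_nodes(400, 3, 500)
```

With these assumed inputs the design load is 1,200 auth/s, which needs three nodes to carry and a fourth for N+1.
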
2.3 Core Design / Selection Logic
The design selection process follows a structured decision tree that classifies access type, evaluates device management status and certificate readiness, and maps to the appropriate authentication and authorization method. This ensures that every access scenario has a deterministic policy outcome, with no ambiguous or undefined states.
Decision Steps
- Classify identities: human / device / service / privileged. Each type has different trust requirements and lifecycle management needs.
- Define trust: certificate issuance model, revocation model (CRL/OCSP), factor requirements (single/MFA/phishing-resistant).
- Choose enforcement points: switch/AP (802.1X), VPN/ZTNA gateway, PAM jump host, firewall — each with appropriate protocol support.
- Select policy language: RBAC baseline + ABAC for conditions (device posture, location, risk score, time-of-day).
- Define exception handling: legacy endpoints, IoT devices, break-glass admins — each with documented compensating controls and expiry.
- Define observability and acceptance: success rates, latency targets, coverage percentages, audit completeness criteria.
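
The decision steps above can be sketched as a single function that returns exactly one outcome for every input, which is what "deterministic policy outcome" means in practice. The mapping below is an illustrative assumption, not the document's definitive policy: the outcome labels, the `managed` and `has_valid_cert` inputs, and the choice to deny anything that matches no branch are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Identity(Enum):
    HUMAN = "human"
    DEVICE = "device"
    SERVICE = "service"
    PRIVILEGED = "privileged"


@dataclass(frozen=True)
class Request:
    identity: Identity
    managed: bool         # endpoint is enrolled in device management (assumed input)
    has_valid_cert: bool  # certificate issued and not revoked (assumed input)


def select_method(req):
    """Map an access request to one authentication outcome; default is deny."""
    if req.identity is Identity.PRIVILEGED:
        return "pam-jump-host+mfa"
    if req.managed and req.has_valid_cert:
        return "eap-tls"
    if req.identity is Identity.DEVICE:
        # Legacy or IoT endpoint: managed exception with tighter ACL and expiry.
        return "mab-exception-restricted-acl"
    return "deny"
```

Because the final branch is an unconditional deny, no combination of inputs can fall through to an undefined state, and each branch can be covered by a test case during design review.
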
2.4 Key Design Dimensions
Every design decision involves trade-offs across multiple dimensions. The following table provides a structured framework for evaluating those trade-offs, ensuring that performance, reliability, maintainability, compatibility, cost, energy, and compliance requirements are all addressed in the design documentation.
| Dimension | Key Considerations | Engineering Targets | Trade-off Notes |
|---|---|---|---|
| Performance / UX | Auth latency, roaming stability, captive portal usability | RADIUS p95 latency < 800 ms; roam interruption < 150 ms; portal onboarding < 2 min | Tighter security (more checks) increases latency |
| Stability / Reliability | HA design, failover time, policy rollback capability | AAA availability ≥99.9%; failover <30s; rollback <5min | Active-active adds complexity and cost |
| Maintainability | Certificate renewal automation, template versioning, runbooks | Cert renewal lead time 14–30 days; zero manual cert operations at scale | Automation requires PKI maturity and MDM integration |
| Compatibility / Extensibility | Vendor support for CoA, SGT, APIs; future integration hooks | Standard protocols only (EAP-TLS, RADIUS, TACACS+, SAML/OIDC) | Vendor-proprietary features risk lock-in |
| Life-cycle cost (LCC) / TCO | Licensing + ops time + incident cost reduction | Track 3-year net cost after incident reduction and FTE ops effort per 1,000 users | EAP-TLS reduces long-term incident costs; NAC posture adds licensing overhead |
| Energy / Environmental | Appliance power consumption, PoE budgeting for edge sensors/APs | PoE budget with 15% safety margin; closet thermal within vendor limits | High-density AP deployments require careful PoE planning |
| Compliance | Audit retention, SoD, encryption standards, privacy minimization | Log retention 180–365 days; TLS for all log transport; SoD enforced in roles | Longer retention increases storage cost; privacy rules may limit log fields |
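
Two of the engineering targets in the table lend themselves to automated acceptance checks: the RADIUS p95 latency target and the PoE budget with a 15% safety margin. The sketch below is illustrative only. It assumes p95 is computed with the nearest-rank method, and it interprets the safety margin as "total port draw must not exceed 85% of the supply budget"; both are assumptions, as are the function names and sample values.

```python
def p95(samples_ms):
    """95th percentile by the nearest-rank method."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_target(samples_ms, target_ms=800):
    """True when measured p95 latency is below the target."""
    return p95(samples_ms) < target_ms


def poe_within_budget(port_draw_w, supply_budget_w, margin=0.15):
    """True when total port draw fits inside the budget minus the margin."""
    return sum(port_draw_w) <= supply_budget_w * (1 - margin)


# Hypothetical measurements: 19 fast authentications and one slow outlier.
latencies = [200] * 19 + [5000]
latency_ok = meets_latency_target(latencies)

# Hypothetical closet: ten 30 W access points on a 370 W PoE supply.
poe_ok = poe_within_budget([30] * 10, 370)
```

In this example the single 5,000 ms outlier does not affect p95, so the latency check passes, and ten 30 W ports (300 W) fit within 85% of 370 W (314.5 W). An eleventh port at the same draw would exceed the margin.
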