In today's digital landscape, where data is the lifeblood of businesses, data security solutions must be more than just robust – they need to be resilient. Fortanix Data Security Manager (DSM) SaaS, a global service offering crucial security services like cryptographic key lifecycle management and data encryption, understands this imperative.
But what happens when the stakes are high and downtime is simply not an option? This blog post delves into the architectural design, operational procedures, and development lifecycle that underpin the resilience of Fortanix DSM SaaS.
Setting the Bar High: Resiliency Objectives
Fortanix DSM SaaS is frequently deployed in mission-critical applications, demanding unwavering reliability. To meet these demands, Fortanix has established clear resiliency objectives:
- High Availability (HA): Aiming for continuous availability, exceeding the 99.95% uptime SLA.
- Disaster Recovery (DR): Striving for zero Recovery Time Objective (RTO) and Recovery Point Objective (RPO).
- Capacity: Maintaining predictable performance under peak loads and DDoS attacks, with easy scalability.
- Latency: Ensuring API request latency remains unaffected by load or congestion.
- Fairness: Guaranteeing equitable service for all tenants, preventing "noisy neighbor" issues.
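To put the 99.95% figure in perspective, it helps to convert it into a downtime budget. The short Python sketch below does that arithmetic. The function name and the 30-day and 365-day periods are illustrative assumptions, not terms taken from the Fortanix SLA.

```python
def downtime_budget_minutes(sla_percent: float, period_days: float) -> float:
    """Return the minutes of downtime an uptime SLA permits over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


# Illustrative periods (assumptions): a 30-day month and a 365-day year.
monthly = downtime_budget_minutes(99.95, 30)
yearly = downtime_budget_minutes(99.95, 365)

print(f"{monthly:.1f} min/month, {yearly / 60:.2f} h/year")
# prints "21.6 min/month, 4.38 h/year"
```

In other words, a 99.95% SLA leaves room for roughly 21.6 minutes of downtime per month, which is why the stated objective is to exceed it rather than merely meet it.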
The Foundation: DSM SaaS Architecture
The resilience of DSM SaaS is built upon a robust architecture, deployed across six independent global regions. Each region comprises a cluster of DSM nodes, powered by FIPS 140-2 Level 3 validated Fortanix FX2200 HSM appliances, spread across three distinct data centers. This regional isolation ensures that failures in one region don't impact others.
Key architectural elements include:
- Network Topology: Utilizing Anycast routing and redundant network providers for reliable connectivity and DDoS protection.
- Compute Cluster: An active-active cluster design, enabling every node to respond to requests and ensuring seamless load balancing.
- Data and Storage Layer: Employing a distributed Cassandra database with strong consistency, ensuring data integrity and availability.
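For readers unfamiliar with how a distributed database like Cassandra can offer strong consistency, the usual rule is that the number of replicas acknowledging a write plus the number consulted on a read must exceed the replication factor, so that every read overlaps with the latest write. The sketch below illustrates that rule only. The replication factor of 3 is an assumption chosen to mirror the three data centers per region; the actual DSM SaaS replication and consistency settings are not described in this post.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """Reads are guaranteed to see the latest write when W + R > RF."""
    return write_acks + read_acks > replication_factor


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be lost while quorum operations still succeed."""
    return replication_factor - quorum(replication_factor)


rf = 3  # assumption: one replica per data center in a region
q = quorum(rf)

print(q)                                  # 2
print(is_strongly_consistent(rf, q, q))   # True: quorum writes + quorum reads overlap
print(is_strongly_consistent(rf, 1, 1))   # False: a read may miss the latest write
print(tolerated_failures(rf))             # 1
```

Under these assumed settings, quorum reads and writes keep working with one replica (for example, one site) unavailable, which is the property that ties the storage layer to the availability goals.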
This architecture enables:
- HA: Redundancy at every level, ensuring uninterrupted service even during site failures.
- Capacity: Scalability and DDoS protection for consistent performance.
- Latency: Optimized routing and load balancing for minimal delays.
- Fairness: Quality of Service (QoS) mechanisms to prevent resource hogging.
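This post does not say which QoS mechanism DSM SaaS uses, but a per-tenant token bucket is one common way to keep a noisy neighbor from starving other tenants. The following is a minimal sketch of that general technique, not the Fortanix implementation. The class names, rate, and capacity are all hypothetical.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill according to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantLimiter:
    """Keeps one independent bucket per tenant (hypothetical helper)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, tenant_id):
        bucket = self.buckets.get(tenant_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[tenant_id] = bucket
        return bucket.allow()


# A fake clock keeps the demonstration deterministic.
now = [0.0]
limiter = TenantLimiter(rate=5.0, capacity=10.0, clock=lambda: now[0])

noisy_allowed = sum(limiter.allow("tenant-a") for _ in range(100))
quiet_allowed = limiter.allow("tenant-b")

print(noisy_allowed)  # 10: tenant-a exhausts only its own burst allowance
print(quiet_allowed)  # True: tenant-b is unaffected
```

Because each tenant draws from its own bucket, a burst from one tenant is throttled without consuming capacity that belongs to another.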
Building Quality In: Software Development Lifecycle (SDLC)
Fortanix prioritizes quality through a rigorous SDLC, featuring:
- Extensive Quality Assurance (QA): Thorough testing, including unit, integration, performance, and regression testing.
- Staging Environment: A scaled-down replica for testing new releases before production deployment.
- Staged Software Updates: Gradual rollouts across regions and nodes, minimizing disruption.
This SDLC ensures high availability and minimizes the risk of disruptive software updates.
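As a rough illustration of what a staged rollout looks like in practice, the sketch below upgrades one node at a time, region by region, and halts at the first failed health check so that the remaining nodes and regions stay on the previous release. It is a simplified model under stated assumptions. The region and node names, the `upgrade` and `healthy` callbacks, and the halt-on-failure policy are hypothetical and do not describe Fortanix's actual tooling.

```python
from typing import Callable, Dict, List, Tuple


def staged_rollout(
    regions: Dict[str, List[str]],
    upgrade: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> Tuple[bool, List[str]]:
    """Upgrade nodes one at a time; stop at the first unhealthy node.

    Returns (completed, nodes_upgraded_so_far).
    """
    upgraded: List[str] = []
    for region, nodes in regions.items():
        for node in nodes:
            upgrade(node)
            upgraded.append(node)
            if not healthy(node):
                # Halt: later nodes and regions keep the old release.
                return False, upgraded
    return True, upgraded


# Hypothetical topology: two regions with three nodes each.
regions = {
    "region-1": ["r1-n1", "r1-n2", "r1-n3"],
    "region-2": ["r2-n1", "r2-n2", "r2-n3"],
}

# Simulate a release that fails its health check on the second node.
completed, touched = staged_rollout(
    regions,
    upgrade=lambda node: None,
    healthy=lambda node: node != "r1-n2",
)

print(completed)  # False
print(touched)    # ['r1-n1', 'r1-n2']
```

In this simulation the faulty release reaches only two of six nodes, and the second region is never touched, which is the blast-radius limit that gradual rollouts are meant to provide.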
Keeping the Lights On: SaaS Operations
A dedicated 24/7 SaaS Operations team ensures continuous monitoring, rapid response to issues, and proactive capacity planning. Key operational practices include:
- Monitoring and Alerting: Real-time monitoring of all systems and immediate alerts for any anomalies.
- Disaster Drills: Regular simulations to prepare for potential disasters.
- Capacity Planning: Proactive scaling to meet future demand.
- Communication and Notification: Transparent communication with customers through status updates and notifications.
- Change Procedure: Rigorous change management processes.
These operational practices contribute to high availability, disaster recovery preparedness, and optimal performance.
Looking Ahead: Future Enhancements and Catastrophic Scenarios
Fortanix is committed to continuous improvement, with plans to introduce Virtual Private Cloud (VPC) support for enhanced network resiliency.
While DSM SaaS is designed for exceptional resilience, Fortanix acknowledges the possibility of extremely rare catastrophic scenarios, such as simultaneous failures of all sites in a region. However, robust backup and recovery procedures are in place to mitigate these risks.
Conclusion: Security and Reliability, Hand in Hand
Fortanix DSM SaaS exemplifies the commitment to building not just a secure platform, but a resilient one. Through a combination of robust architecture, rigorous development practices, and dedicated operations, Fortanix ensures that customers can rely on DSM SaaS for their critical security needs. In a world where data security is non-negotiable, Fortanix DSM SaaS stands as a testament to the power of resilience.
To learn more about the resiliency of Fortanix DSM, download this white paper.