
High Availability / Disaster Recovery (HA/DR)

Multi-Region Network Diagram (Detailed View)

Kindly review the architecture diagrams created for your institution

Mechanisms for HA & Multi-Cloud

The IgniteConnex Cloud Environment is designed using Azure Application Service Plans, Traffic Manager, and Application Gateways with WAF2 components. This setup is designed for a high level of redundancy and fault tolerance. Traffic Manager routes web traffic in an Active/Passive configuration, directing requests to an available region. Each region comprises multiple data centers with automatic scalability, load balancing, and failover configured within distributed nodes. Regions remain synchronized so that each holds the latest information. This approach also forms a strong foundation for any "Multi-Cloud" initiatives an organization may have, spreading risk across multiple cloud providers. With this architecture, we can automate the high availability concerns listed below.
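
As an illustration of the Active/Passive idea, the sketch below picks the preferred healthy region and falls back to the next one when the preferred region is unhealthy. It is a minimal model only, not the Traffic Manager API: the `Endpoint` class, the `select_endpoint` function, and the region names are hypothetical, and it assumes that a lower priority number means "more preferred".

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical model of one regional endpoint."""
    name: str
    priority: int   # assumption: lower number = more preferred
    healthy: bool   # result of the most recent health probe


def select_endpoint(endpoints: Iterable[Endpoint]) -> Optional[Endpoint]:
    """Return the most preferred healthy endpoint, or None if all are down."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.priority)


# Normal operation: the active (priority 1) region receives the traffic.
regions = [
    Endpoint("primary-region", priority=1, healthy=True),
    Endpoint("secondary-region", priority=2, healthy=True),
]
active = select_endpoint(regions)

# Failover: the primary is unhealthy, so the passive region takes over.
degraded = [
    Endpoint("primary-region", priority=1, healthy=False),
    Endpoint("secondary-region", priority=2, healthy=True),
]
fallback = select_endpoint(degraded)
```

Here `active` is the primary region and `fallback` is the secondary region; if every endpoint is unhealthy the function returns `None`, which a caller would treat as a full outage.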

High Availability Concern Areas:

  • Redundancy
  • Load balancing
  • Failover mechanism
  • Disaster recovery plan
  • Monitoring and alerting
  • Geographic distribution
  • Scalability
  • Regular maintenance and updates
  • Highly available database
  • Cloud-based services
  • Reducing single points of failure

Disaster Recovery: Measuring Recovery Time & Recovery Point Objectives (RTO/RPO)

The critical metrics that guide our Disaster Recovery decisions, and how we handle business-impacting downtime events, are the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO). Note that, as an integration platform as a service, we do our best to be as "hands off" with your data as possible. Ultimately, we establish the plumbing that lets your information flow between systems; when there's a leak or a burst pipe, we know about it and respond. However, events that occur on the point solutions we integrate with are out of our control.

  • Recovery Time Objective (RTO): RTO represents the maximum allowable downtime for a system or service after a disaster occurs. It is the target time within which you aim to restore the service to normal operation. The following examples show how we calculate RTO, considering various factors:
    • Example 1: E-commerce Website
      • Let's say you operate an e-commerce website. If your average hourly revenue is $10,000 and your RTO is set at 4 hours, an outage that runs to the full RTO puts roughly $40,000 of revenue at risk (4 hours × $10,000 per hour). You cannot afford more than 4 hours of downtime during a disaster: any downtime beyond this period could result in significant further revenue loss and customer dissatisfaction.
    • Example 2: Customer Support System
      • For a customer support system, you may calculate the RTO based on how long your team can tolerate not having access to the system. If your customer support team relies heavily on the system to respond to customer inquiries, an RTO of 1 hour may be essential to avoid disruptions in service.
  • Recovery Point Objective (RPO): RPO defines the maximum acceptable amount of data loss measured in time. It represents the point in time to which you can recover data after a disaster occurs. Calculating RPO involves considering factors such as data sensitivity and how frequently data changes.
    • Example 1: Financial Data Management
      • In a financial data management system, the RPO might be set to 1 hour. This means that in the event of a disaster, you should be able to recover data up to the most recent state within the last hour. If data is lost beyond this point, it could lead to financial discrepancies and compliance issues.
    • Example 2: Content Management System
      • For a content management system, where frequent updates occur, an RPO of 15 minutes might be appropriate. This ensures that data losses are minimal, and users can access the latest content even after recovery.
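
The examples above can be sketched as a few small checks. This is a simplified illustration rather than a description of how IgniteConnex measures incidents: the function names are hypothetical, the revenue model is assumed to be linear (hourly revenue times hours down), and the timestamps are made up for the example.

```python
from datetime import datetime, timedelta


def revenue_exposure(hourly_revenue: float, downtime_hours: float) -> float:
    """Revenue at risk for an outage, assuming a simple linear model."""
    return hourly_revenue * downtime_hours


def within_rto(outage_start: datetime, restored_at: datetime,
               rto: timedelta) -> bool:
    """True if the service was restored within the RTO."""
    return restored_at - outage_start <= rto


def within_rpo(last_recovery_point: datetime, outage_start: datetime,
               rpo: timedelta) -> bool:
    """True if the data lost (time since the last recovery point) fits the RPO."""
    return outage_start - last_recovery_point <= rpo


# E-commerce example: $10,000 per hour and a 4-hour RTO.
exposure = revenue_exposure(10_000, 4)  # 40000

# Illustrative incident: outage at 12:00, restored at 15:30,
# last recoverable copy of the data taken at 11:20.
outage = datetime(2024, 1, 1, 12, 0)
rto_ok = within_rto(outage, datetime(2024, 1, 1, 15, 30), timedelta(hours=4))
rpo_ok = within_rpo(datetime(2024, 1, 1, 11, 20), outage, timedelta(hours=1))
```

In this made-up incident both objectives are met: the service is back after 3.5 hours against a 4-hour RTO, and 40 minutes of data is at risk against a 1-hour RPO.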

Calculating RTO and RPO requires a careful balance between cost, technology, and business needs. Striking the right balance ensures that your disaster recovery plan is aligned with the criticality of the services you provide and the expectations of your customers. Once determined, RTO and RPO guide the design and implementation of the disaster recovery strategy, including backup frequency, replication intervals, and failover mechanisms to meet the desired objectives.
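
One way to see how an RPO constrains backup frequency or replication intervals is to compare it with the worst-case data loss for a given schedule. The sketch below assumes that the worst case is one full interval plus any lag before a copy becomes usable; both function names and the `copy_lag` parameter are hypothetical and serve only as illustration.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta,
                         copy_lag: timedelta = timedelta(0)) -> timedelta:
    """Assumed worst case: a disaster strikes just before the next copy
    completes, so one full interval plus the copy lag is lost."""
    return interval + copy_lag


def schedule_meets_rpo(interval: timedelta, rpo: timedelta,
                       copy_lag: timedelta = timedelta(0)) -> bool:
    """True if the schedule's worst-case loss stays within the RPO."""
    return worst_case_data_loss(interval, copy_lag) <= rpo


rpo = timedelta(minutes=15)   # the content management example
lag = timedelta(minutes=2)    # hypothetical time for a copy to land

ten_minute_ok = schedule_meets_rpo(timedelta(minutes=10), rpo, lag)
fifteen_minute_ok = schedule_meets_rpo(timedelta(minutes=15), rpo, lag)
```

Under these assumptions a 10-minute interval satisfies the 15-minute RPO, while a 15-minute interval does not once the 2-minute lag is included. Tighter objectives therefore call for more frequent copies, which is where the cost side of the balance comes in.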

RTO/RPO for Managed Integrations (Premium Support Only)

Our RTO and RPO targets map directly to our Support Levels. See the reference below from the "Production Support Levels" section for more information. In addition, please visit https://trust.igniteconnex.com and download our reports to learn more about how we handle HA/DR and incidents as a platform.

Severity Level | Incident Definition | Response Time (Acknowledgement)
Critical | The service is completely down or inaccessible, resulting in a major business impact. | Response in accordance with Support Level. RTO is defined on a per-project basis considering the monetary impact of the service to the business while in a downtime state due to a disaster event. For RPO, since IgniteConnex is a middleware solution, it does not store data longer than 24 hours.