In this module, we will learn:
- An introduction to what's new in VMware Cloud Director Availability 4.0, highlighting key features and business value.
- An introduction to what's new in VMware vCloud Availability 3.5.
Note - This module is purely an overview of the solution and updates made in recent releases. The hands-on configuration portion of the lab starts in Module 2.
Special thanks go to Daniel Paluszek for his series of blog posts on VMware Cloud Director Availability 4.0, on which this lab is based.
Review Core Capabilities of VMware Cloud Director Availability
Let's briefly look at the core capabilities of VMware Cloud Director Availability:
- Intuitive Disaster Recovery as a Service (DRaaS) protection with wizard-driven workflows to protect virtual machines (VMs) or vApps. VMs and vApps can be replicated and recovered between VMware Cloud Director sites, or from on-premises to VMware Cloud Director and vice versa.
- A single on-premises appliance installation for ease of deployment and simplicity for customers replicating to provider clouds. It supports a migration path and DR functionality from vSphere 6.5+ U3.
- The capability of each deployment to serve as both a source and a recovery site. There are no dedicated source and destination sites: the replication flow is symmetrical and can be started and managed from either the source or the recovery site, so the UI can be accessed from anywhere with the correct context.
- Policy controls for providers: the maximum number of replicated VMs, the maximum number of retained replications (stored instances), and a minimum RPO as low as 5 minutes. Providers can apply these to one or many VDCs or replications via SLA policies, which helps to control storage costs and provide tiered services to customers.
- Secure tunneling through a TCP proxy between sites, with built-in encryption and optional compression.
- Multi-tenant support native to the VMware Cloud Director hierarchy, and now in-context DRaaS, giving customers a very simple view and the ability to act directly in VMware Cloud Director.
Overview of VMware Cloud Director Availability 4.0
Formerly known as vCloud Availability (vCAv), VMware Cloud Director Availability 4.0 (VCDA) provides a simple, cost-effective, and secure onboarding, migration, and disaster recovery as a service solution to or between multi-tenant VMware clouds.
Part of the VMware Cloud Provider Platform, VMware Cloud Director Availability has been designed from the ground up to dramatically simplify cloud onboarding, enable cost-effective availability and recovery, and better secure operations for cloud providers and their end customers.
From a functionality perspective, VCDA is the convergence of multiple products and covers the following use cases:
- Disaster Recovery: With the ongoing risk of human error, natural disasters, and cyber-attacks, disaster recovery is a more critical part of a cloud provider's portfolio than ever. VCDA offers intuitive Disaster Recovery as a Service protection with wizard-driven workflows to protect virtual machines (VMs) or vApps, and replication and recovery of VMs and vApps between VMware Cloud Director sites, or from on-premises to VMware Cloud Director and vice versa.
- Cloud to Cloud Migrations: Cloud providers must help customers migrate from clouds where their workloads are already running.
- Cloud Onboarding: With so many workloads moving off-premises, cloud providers need to help customers onboard efficiently.
What’s New in VMware Cloud Director Availability 4.0?
VMware Cloud Director Availability 4.0 enhancements and new capabilities focus on three main themes: Service Operation, Service Consumption, and Platform Integration. We will drill into these themes as we review what is new with 4.0 below.
For cloud providers that are already running vCAv, an in-place upgrade from vCloud Availability 3.0.5 and 3.5.1 is possible. VMware Cloud Director Availability 4.0 doesn't bring any significant architectural changes: the real news is the new features and how they can maximize your capability and value as a cloud provider.
Service Operation – Resource UI Visualization
With VCDA 4.0, providers and tenants can comprehensively monitor resource utilization. The provider can see, per organization or provider VDC (pVDC), what would be required to ensure proper failover for any protected workloads. Tenants can see what resources are required for each respective orgVDC, along with specific utilization on a per-VM basis.
Service Operation – Resource Manageability
First, storage consumption reporting is made available in the user interface as well as via the API. While tenants and providers have full visibility of the disk space used by each virtual machine replication, providers can also aggregate storage reporting from an organization's perspective.
Just like storage utilization, VCDA 4.0 provides visibility into network utilization (via the UI and API) for each protected workload. Besides network metering, network throttling has been improved in this release: bandwidth throttling was previously available only at the system level, but 4.0 introduces the ability to throttle on a per-policy basis (at the organization level). This change gives the provider granular, per-tenant control. Bandwidth throttling allows you to enforce:
- limits for replication data traffic from the on-premises to cloud sites, which is appropriate for expensive bandwidth areas.
- a global limit for the total incoming replication traffic from all cloud sites.
Public API enhancements
While the UI has greatly improved to visualize this data, a provider requires automation to retrieve these statistics and export them for metering and monitoring requirements.
VMware Cloud Director Availability now provides public APIs that help to onboard tenants, see the storage usage, the network settings, and the compute resource usage of the replications on a per-org/tenant basis.
It allows a provider to gain visibility into the entire platform and organizations: this is a great addition for cloud providers to be able to further automate delivery of the service to customers and configuration of the service, speeding up the time to consumption, whilst minimizing configuration errors and providing consistency.
The API is provided as a Swagger definition (as are all APIs for Cloud Director Availability), making it easy for cloud providers to use it from their preferred programming languages.
One can view the entire API structure on code.vmware.com.
Service Operation – Logging
With VMware Cloud Director Availability 4.0, VCDA logs can be exported to a syslog collector (such as vRealize Log Insight). Tasks and events are also shown within the native VCD interface. This is a great addition, especially when paired with the new UI functionality.
From the Provider UI screen, the administrator can establish event notifications along with the syslog configuration.
Service Consumption – SLA Profiles
VMware Cloud Director Availability 4.0 introduces SLA Profiles, making protection configuration intuitive for customers. Previously, tenants had to make many decisions that required implicit knowledge of how disaster recovery works. Now, customers can select an SLA Profile that the cloud provider has created for them, which can be as simple as a Gold, Silver, and Bronze nomenclature, essentially hiding complex wizard-driven choices from the customers.
For more advanced customers, these profiles can be overridden and modified at the point of usage, but this is optional for each customer.
In short, the premise of an SLA Profile is to provide a grouping of target settings when protecting a workload. This minimizes the amount of time spent configuring for protection. Think of this as a logical grouping construct.
The end value for a tenant is that protecting a workload takes a single selection. There is no need to think about which settings to configure: just select the SLA profile, and done.
Service Consumption – SLA Profile Properties
A provider can either leverage the default SLA profiles (Gold, Silver, and Bronze) available upon VMware Cloud Director Availability 4.0 deployment/upgrade, or create additional ones and assign them as required to one or multiple organizations. The following properties define an SLA profile:
- Target Recovery Point Objective (RPO)
- Retention policy for point in time instances
- Timeslot to delay the initial synchronization
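As a rough illustration of the "logical grouping" idea, the three properties above can be modelled as a small record. This is only a sketch: the class name, field names, and all example values are assumptions made for illustration, and they are not the product's defaults or its API. Retention is simplified here to a single count of instances.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SlaProfile:
    """The three properties listed above, grouped under one name."""
    name: str
    target_rpo: timedelta          # Target Recovery Point Objective
    retained_instances: int        # retention for point-in-time instances (simplified)
    initial_sync_delay: timedelta  # timeslot to delay the initial synchronization


# Example values are invented for illustration; they are not the product defaults.
PROFILES = {
    p.name: p
    for p in (
        SlaProfile("Gold", timedelta(minutes=15), 14, timedelta(0)),
        SlaProfile("Silver", timedelta(hours=4), 7, timedelta(hours=2)),
        SlaProfile("Bronze", timedelta(hours=24), 1, timedelta(hours=8)),
    )
}

# A tenant only picks a name; the grouped settings come along with it.
silver = PROFILES["Silver"]
print(silver.name, silver.target_rpo, silver.retained_instances)
```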
Service Consumption – SLA Profiles and Policy
Policies didn't disappear with this release; they are still mandatory for any organization using VMware Cloud Director Availability. To further explain SLA profiles in relation to policies:
- Think of the policy as the parent state of any protection; policies are the authoritative state for organizations.
- Then, the SLA profile is the child object that must respect the policy exposed to the organization.
- Both operate independently; however, they can work in unison to accomplish the stated protection goal.
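The parent/child relationship described above can be sketched as a simple validation step. This is a hypothetical model, assuming a policy that carries a minimum RPO and a maximum number of retained instances; the class and field names are invented for illustration and do not reflect how the product stores or checks these objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Provider-side limits for an organization (the parent object)."""
    min_rpo_minutes: int
    max_retained_instances: int


@dataclass(frozen=True)
class SlaProfile:
    """Tenant-selectable settings (the child object)."""
    name: str
    rpo_minutes: int
    retained_instances: int


def violations(policy, profile):
    """Return the reasons why the profile does not respect the policy."""
    problems = []
    if profile.rpo_minutes < policy.min_rpo_minutes:
        problems.append(
            f"{profile.name}: RPO {profile.rpo_minutes} min is below the "
            f"policy minimum of {policy.min_rpo_minutes} min"
        )
    if profile.retained_instances > policy.max_retained_instances:
        problems.append(
            f"{profile.name}: {profile.retained_instances} retained instances "
            f"exceed the policy maximum of {policy.max_retained_instances}"
        )
    return problems


silver_policy = Policy(min_rpo_minutes=60, max_retained_instances=8)
print(violations(silver_policy, SlaProfile("Silver", 240, 7)))  # no problems
print(violations(silver_policy, SlaProfile("Gold", 15, 14)))    # two problems
```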
SLA profiles are a key construct in Cloud Director Availability: this new capability is designed to give the provider better control, to configure replications faster by avoiding complex decisions, and to establish service profiles based on application or organization requirements. As a tenant, protecting a workload has never been so easy!
- Select the workload(s) to protect.
- Select the destination organization Virtual Data Center (oVDC) and the Storage Policy.
- Select the SLA profile from a drop-down list.
- Confirm and done!
Service Consumption – SLA Profiles and Policy Explained with an Example
This example showcases how a Silver SLA Profile can be paired with a Silver Policy for an organization.
- SLA profiles are optional. One can continue with Policies alone, but we highly recommend SLA profiles to simplify operations.
- SLA profiles only pertain to protections. They are not utilized for migrations at this time.
- A provider can revise an SLA profile only if no active protections are consuming it. Please be aware of this consideration.
Service Consumption – Stored Instances
This is a great new feature. With VMware Cloud Director Availability 4.0, one can preserve a point-in-time instance of a protected workload. A stored instance supersedes the stated retention policy, but is still controlled by the provider: within the Policy framework, the provider sets the maximum number of stored instances per replication.
- Upon selecting the Store button, the user is asked to confirm making the specific instance permanent.
This can be removed or deleted at any time. Do note, storage of any instances impacts any associated disk utilization over a period of time.
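To make the interaction between retention and stored instances concrete, here is a minimal sketch. It assumes that stored instances are simply exempt from rotation and do not count towards the number of rotating instances; that detail, the function name, and the tuple layout are assumptions of this example, not documented product behavior.

```python
def apply_retention(instances, keep_latest):
    """Return the instances that survive rotation, oldest first.

    instances   -- list of (timestamp, stored) tuples in any order
    keep_latest -- how many of the newest non-stored instances to retain
    """
    ordered = sorted(instances)
    rotating = [inst for inst in ordered if not inst[1]]
    # Guard against keep_latest == 0, where a [-0:] slice would keep everything.
    survivors = set(rotating[-keep_latest:]) if keep_latest > 0 else set()
    return [inst for inst in ordered if inst[1] or inst in survivors]


history = [(1, False), (2, True), (3, False), (4, False), (5, False)]
print(apply_retention(history, keep_latest=2))
# [(2, True), (4, False), (5, False)]
```

The stored instance at timestamp 2 outlives the newer rotated instances, which is why stored instances add to disk utilization over time.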
Service Consumption – Live Disk Resizing
In the source site with vCenter Server 7.0 and ESXi 7.0, when resizing the disk of the source virtual machine, VMware Cloud Director Availability now automatically resizes the protected virtual machine disk in the destination site, running any supported vSphere version. The existing replication instances are retained. Failover and Test Failover are supported from any instance, before and after the disk resizing. Until the replication instances before the resizing expire, the replica files continue to take the additional storage space. To save the storage space, you can remove the old replication instances or change the retention policy.
Platform Integration - User Experience
In terms of platform integration, there has been a lot of work to make Cloud Director Availability simpler and easier to consume. This includes improvements such as:
- Icons now represent the main operations.
- Improved workflows and reduced steps in wizards.
- Each workload now has a sub-context menu that shows the details, instances, tasks, traffic, and disk utilization.
- Multi-selection is allowed and provides context menus based on the actual selection.
- In-context protection and status from the VMware Cloud Director UI: It is easier to ascertain protection status and run protective actions right from the vApp and/or VM in the VMware Cloud Director interface, rather than from the VMware Cloud Director Availability plugin. Protected vApps and virtual machines now have a protection indicator badge in the interface of VMware Cloud Director. The protection badge is visible to providers in VMware Cloud Director with VMware Cloud Director Availability 4.0 in the source and in the destination cloud site.
Support and Interoperability with NSX-T
Interoperability with NSX-T: Support for NSX-T is key for cloud providers who are evolving their VMware Cloud Director platform in line with VMware's networking strategy. In on-premises sites and in cloud sites, you can now replicate workloads that use NSX-T networks. You can also use environments with mixed NSX-V and NSX-T networks, and you can replicate workloads from NSX-V to NSX-T, and from NSX-T to NSX-V. VMware Cloud Director Availability does not apply or translate the networking features that VMware Cloud Director and NSX-T do not support. When replicating from NSX-V to NSX-T, VMware Cloud Director Availability replicates the source site's NAT-routed vApp networks and bridged vApp networks, but does not replicate the DHCP service.
Note: please refer to the VMware Product Interoperability Matrices (https://www.vmware.com/resources/compatibility/sim/interop_matrix.php) to confirm which versions of VCDA, VCD, and NSX-T are supported together.
Usage Meter Integration
VMware vCloud Usage Meter integration: The management interface indicates that vCloud Usage Meter meters the Cloud Service instance. Any workload protected during the calendar month counts towards the Monthly Usage Units and is shown in the Monthly Usage Report. When vCloud Usage Meter has not requested metering information for more than three days, a warning message now appears in the management interface. Automatic metering by Usage Meter has been supported since Usage Meter 3.6.1 Hot Patch 3.
In addition, another great capability added to VMware Cloud Director Availability 4.0 is multi-NIC traffic conditioning.
Multi-NIC traffic conditioning enables configuring multiple NICs, allowing advanced network setups with higher security requirements and optimized replication data traffic. Providers can configure on-premises tunnel environments to send replication traffic from on-premises to the cloud via nic1 (connected to a public network), and have it leave the tunnel towards a cloud replicator via nic2 (connected to a private network). Replicator traffic can be similarly conditioned: tunnel to replicator over nic1 (internal provider network), and replicator to ESXi over nic2 (high-throughput replication data network).
VMware Cloud Director Availability 4.0 Configuration Limits
The tested uptime, concurrency, and scale limits of VMware Cloud Director Availability 4.0 are:
- Tenants with active protection paired to a cloud site - 500
- Active protections across tenants to a cloud site - 15000
- Protected virtual machine size - 15 TB. The maximum protected VM size depends on the size of a single datastore available in the cloud.
- Cloud Replicator Appliance instances per cloud site - 30
- Active protections per Cloud Replicator Appliance instance - 2000
- Concurrent Protect & Test Failover & Failover & Reverse protect operations from on-premises to a cloud site - 200
- Concurrent Protect & Test Failover & Failover operations from a cloud site to a cloud site - 200
- Concurrent Reverse protect operations from a cloud site to a cloud site - 200
Note: please refer to https://configmax.vmware.com for the latest configuration limits.
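As a quick worked example of how the per-site and per-appliance limits above relate, the following sketch computes the smallest number of Cloud Replicator Appliances that keeps each appliance within its tested limit. The function name is hypothetical and the result is only arithmetic on the figures quoted above, not a sizing recommendation.

```python
import math

# Tested limits quoted in the list above.
MAX_PROTECTIONS_PER_SITE = 15000
MAX_PROTECTIONS_PER_REPLICATOR = 2000


def min_replicators(active_protections):
    """Smallest appliance count that respects the tested per-appliance limit."""
    if not 0 <= active_protections <= MAX_PROTECTIONS_PER_SITE:
        raise ValueError("outside the tested per-site limit")
    return max(1, math.ceil(active_protections / MAX_PROTECTIONS_PER_REPLICATOR))


print(min_replicators(4500))   # 3
print(min_replicators(15000))  # 8
```

Note that the per-site limit of 15000 active protections is reached with 8 appliances, well below the tested maximum of 30 appliances per site.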
What's New in VMware vCloud Availability 3.5
Let's review the new technical additions that are now part of VMware vCloud Availability 3.5. VMware vCloud Availability 3.5, or vCAv, is a comprehensive Disaster Recovery as a Service and Migration platform. Here are a few of the features that were added in VMware vCloud Availability 3.5:
- Bandwidth UI Visibility
- Automatic vCAv Metering by Usage Meter - We already reviewed this enhancement earlier in the module.
- Regional Data Center Support and Flexible Network Connectivity
- Enhanced Grouping and Protection Workflows
- Operational Enhancements
What's New in VMware vCloud Availability 3.5 - Bandwidth UI Visibility
- Within the Provider User Interface (UI), we can see traffic utilization on a per organization basis, but also on a per-VM/vApp basis.
- Moreover, these traffic statistics can be exported out to a tab-delimited file (TSV) for ease of consumption.
- Within the on-premises vSphere plugin, we can see similar data available on a per-org or workload basis.
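Once the traffic statistics are exported to a TSV file, they can be processed with standard tooling. The sketch below sums traffic per organization using Python's csv module; the column names (org, workload, bytes) and the sample rows are assumptions for illustration, so check the header row of a real export before adapting it.

```python
import csv
import io
from collections import Counter

# Stand-in for an exported file; column names and values are hypothetical.
SAMPLE_TSV = (
    "org\tworkload\tbytes\n"
    "acme\tweb01\t1000\n"
    "acme\tdb01\t2500\n"
    "globex\tapp01\t700\n"
)


def traffic_per_org(tsv_text):
    """Sum the transferred bytes for each organization in a TSV export."""
    totals = Counter()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        totals[row["org"]] += int(row["bytes"])
    return dict(totals)


print(traffic_per_org(SAMPLE_TSV))  # {'acme': 3500, 'globex': 700}
```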
What's New in VMware vCloud Availability 3.5 - Regional Data Center Support and Flexible Network Connectivity
Regional Data Center Support
One of the main requests was the ability to optimally route vCAv traffic. This is extremely important for multiple regional data centers inside of a single VCD instance. Let's start off with a diagram that lays out this scenario.
Previously, we could only deploy a single vCAv instance per VCD platform. In the diagram below, we have multiple Texas-based data centers where workloads and organizations reside.
If we had an on-premises customer that wanted to protect workloads to Austin, this would have to traverse to El Paso first (since the vCAv stack resides there), then route to Austin. Not an efficient use of resources. Moreover, this did not work when crossing multiple SSO domains.
What's New in VMware vCloud Availability 3.5 - Regional Data Center Support and Flexible Network Connectivity (cont'd)
With vCAv 3.5
With 3.5, the logic and foundation have been changed to allow for multiple vCAv deployments per VCD instance. Moreover, we can optimally route traffic directly to Austin once control traffic traverses the VCD instance. Therefore, we now have a design like the one in the image.
This also allows for support of multiple SSO domains.
What's New in VMware vCloud Availability 3.5 - Flexible Network Connectivity
Let's review this new enhancement for flexible northbound connections for Cloud Providers. Cloud Providers want to control replication (protection) traffic and consume it across multiple northbound links: Wide Area Network (WAN), Virtual Private Network (VPN), Direct Connection, and so forth.
With vCAv 3.5, the platform introduces flexibility to allow for multiple endpoint inbound connections. Let's review the image.
From the image, we can see we have our traditional WAN connectivity that terminates to the Tunnel via a DNAT rule for inbound connections. Nothing different there.
However, we can now route other types of northbound traffic. In the example above, we have a dedicated VPN connection. The traffic routes to the vCAv Tunnel which then terminates on the Cloud Replication Manager (CRM) instance.
Previously, this was not allowed. All traffic had to route through the external endpoint URL, which typically sat on the WAN.
The result is further flexibility for Cloud Providers that allows for optimal routing of vCAv traffic.
What's New in VMware vCloud Availability 3.5 - Enhanced Grouping and Protection Workflows
For on-premises to cloud replications, you can create a collection of virtual machines in a single container, managed and replicated as a single unit. You can specify the virtual machines' boot order and boot delays, and protect or migrate them as a single vApp replication in the destination cloud site. During the protection workflow, one can select Advanced Settings and choose specific VMDKs to protect or exclude.
In the destination cloud site, the grouped virtual machines are represented as a vApp replication. In this vApp, the relations between the virtual machines are:
- The boot order works from top to bottom.
- By default, there is no set boot delay. The start wait is measured as the time that passed after the boot of the previous virtual machine.
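The two rules above can be turned into a small schedule calculation. This is a simplified sketch: it assumes each start wait is counted from the moment the previous virtual machine is powered on and ignores how long a guest actually takes to boot. The function name and the sample VM names are invented for the example.

```python
def boot_schedule(vms):
    """Compute when each VM starts, in seconds after the operation begins.

    vms -- list of (name, start_wait_seconds) in boot order, top to bottom;
           a start wait of 0 models the default of no boot delay
    """
    schedule = []
    elapsed = 0
    for name, start_wait in vms:
        elapsed += start_wait  # wait measured from the previous VM's start
        schedule.append((name, elapsed))
    return schedule


vapp = [("db01", 0), ("app01", 120), ("web01", 60)]
print(boot_schedule(vapp))
# [('db01', 0), ('app01', 120), ('web01', 180)]
```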
You can perform replication operations on the vApp or on a single virtual machine from the vApp.
Failing over one of the virtual machines from a vApp replication results in two vApp replications with the same name in the destination site. One replication contains the failed over virtual machine and the other replication contains the remaining virtual machines that are not failed over.
What's New in VMware vCloud Availability 3.5 - vApp Network Properties
For on-premises to cloud, or cloud to cloud replications, you can set the target network settings of a vApp or virtual machine. vCloud Availability applies these network settings in the target cloud site, after a migration, failover, or a test failover.
- For cloud to cloud replications, vCloud Availability replicates all types of source vApp networks in the target cloud site: isolated, bridged (direct), and fenced (NAT-routed) networks. vCloud Availability also replicates the source network settings, such as IP pools, NAT routes, firewall rules, and DNS settings, in the target site.
- For the on-premises to cloud replications, vCloud Availability creates a new bridged vApp network in the target cloud site and you can configure the vApp network settings.
What's New in VMware vCloud Availability 3.5 - Datastore Evacuation
We can now migrate protected instances to another storage policy (datastore) for operational maintenance activities. This allows our tenants to continue to meet their specified recovery point objectives (RPOs) while not disrupting any existing protected workload.
For a provider to execute a datastore evacuation, select Datastores on the left panel. From there, it's as easy as hitting the Execute button and selecting your destination Storage Policy.
What's New in VMware vCloud Availability 3.5 - Replicator Maintenance Mode
vCAv can put a replicator in maintenance mode during any operational activities. Again, this minimizes any potential tenant/organization disruption. The Cloud Provider executes these operations within the Replication Manager UI from the CRM instance.
- To put a replicator into maintenance mode, select the specified replicator and click Enter Maintenance Mode from the UI.
- Upon completion, we can see the replicator in maintenance mode while all traffic now resides on the remaining replicator instance for this specific site.
Once the maintenance operation has been completed, we can then exit Maintenance Mode.
What's New in VMware vCloud Availability 3.5 - Rebalancing of Replications
To distribute the incoming replications evenly over all vCloud Availability Replicator instances in the site, you can rebalance the replications.
vCloud Availability assigns all new replications to the vCloud Availability Replicator with the fewest replications in the site. After adding an extra vCloud Availability Replicator instance, vCloud Availability assigns all new replications to the new vCloud Availability Replicator instance. Replications that existed before adding the new vCloud Availability Replicator instance remain assigned to the previous vCloud Availability Replicator instances. The result is an unequal balance of the number of replications per vCloud Availability Replicator instance. You can see how many replications are assigned to each vCloud Availability Replicator instance and rebalance the replications. This operation migrates the replications from vCloud Availability Replicator instances with more replications to vCloud Availability Replicator instances with fewer replications.
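The assignment and rebalancing behavior described above can be illustrated with a toy model. This is a sketch only: it tracks replication counts per replicator and moves one replication at a time until the counts differ by at most one. The stopping rule, the function names, and the replicator names are assumptions of this example, not a description of the product's internal algorithm.

```python
def assign(counts):
    """Place a new replication on the replicator with the fewest replications."""
    target = min(counts, key=counts.get)
    counts[target] += 1
    return target


def rebalance(counts):
    """Move replications from the busiest to the least busy replicator.

    Stops when the counts differ by at most one and returns the list of
    (source, destination) moves that were made.
    """
    moves = []
    while True:
        busiest = max(counts, key=counts.get)
        idlest = min(counts, key=counts.get)
        if counts[busiest] - counts[idlest] <= 1:
            return moves
        counts[busiest] -= 1
        counts[idlest] += 1
        moves.append((busiest, idlest))


# An existing replicator holds 10 replications; a new, empty one is added.
counts = {"replicator-1": 10, "replicator-2": 0}
print(assign(counts))   # replicator-2 receives the next new replication
moves = rebalance(counts)
print(counts, len(moves))
```

In this run the new replication lands on the empty replicator, and rebalancing then makes four moves to reach six and five replications respectively.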
This brings us to the end of reviewing new enhancements in VMware Cloud Director Availability 4.0 and 3.5.
Congratulations on completing Module 1!!