During provisioning of a StorSimple Cloud Appliance (SCA), the user gets the generic error “current operation failed due to an internal service error”
Common causes of failure and troubleshooting steps
1. Insufficient number of available hosted services or virtual CPU cores within the subscription.
Each subscription type allows a certain total number of hosted services and cores. If provisioning the SCA would push the hosted service count or core count over the subscription's limit, SCA provisioning can fail with the above error. The user should retry SCA provisioning after raising the limits, or after deleting unused services or virtual machines to free up compute cores. The procedure for requesting a limit increase is explained here.
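The check described above is simple arithmetic, and can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the usage and limit figures must be read from the subscription by the user, and the number of cores and hosted services an SCA needs is passed in as a parameter rather than assumed here (the default of one hosted service is an assumption).

```python
def has_headroom(used, limit, required):
    """Return True if `required` more units still fit under `limit`."""
    return used + required <= limit


def can_provision(cores_used, core_limit, cores_required,
                  services_used, service_limit, services_required=1):
    """Check both quotas that the text names: cores and hosted services.

    services_required=1 is an assumption for illustration, not a
    documented StorSimple value.
    """
    return (has_headroom(cores_used, core_limit, cores_required)
            and has_headroom(services_used, service_limit, services_required))
```

If either check fails, the options are the ones listed above: raise the limit, or delete unused services or virtual machines before retrying.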
2. Changes to Subscription or Account settings.
Any change to the Subscription or Storage Account settings during SCA provisioning can cause provisioning to fail. If there was a recent change to the Azure Subscription type or to the Storage Account settings, a retry should succeed. If SCA provisioning still fails after a retry, the cause is more likely the network connectivity or the vNet configuration used for this machine.
3. Failure to obtain and set a static IP address.
Generating an IP address and configuring it as a static IP are required steps for SCA provisioning. If generating an IP from the selected vNet fails, a message appears asking the customer to choose a different vNet. However, if an IP is available but cannot be configured as static, an “internal service error” is reported. The operational logs reflect this case with the message “IP address is invalid” after a “creating a VM” message. This is caused by a race condition, and a retry is expected to provision the SCA successfully.
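The log pattern described above can be checked mechanically. The sketch below assumes the operational log has already been exported as an ordered list of message strings; that export format and the function name are assumptions, and only the two message fragments come from the text above.

```python
def looks_like_static_ip_race(messages):
    """Return True if "IP address is invalid" follows a "creating a VM" message.

    `messages` is an ordered list of log message strings (assumed format).
    Matching is case-insensitive and substring-based.
    """
    seen_vm_creation = False
    for message in messages:
        text = message.lower()
        if "creating a vm" in text:
            seen_vm_creation = True
        elif seen_vm_creation and "ip address is invalid" in text:
            return True
    return False
```

When this returns True, the guidance above applies: retry the provisioning.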
4. Failure due to Azure IaaS availability and networking issues.
SCA provisioning depends on the availability of the IaaS service during VM instantiation. Sometimes the IaaS layer runs into host clustering issues at the infrastructure layer, and the StorSimple service times out after waiting for 30 minutes. The provisioning job then stops at 30% with the failure message “Creation of virtual device failed due to an internal service error”. A retry will succeed once the underlying Azure data center issues are fixed. If the job progresses to 60% before failing with the message “Preparation of virtual device failed due to an internal service error”, one of the VM initialization steps failed (creation of the 4 TB disk pool/journal/data or metadata, creation of users, or password randomization). If the error message is “Virtual device provisioning timed out waiting for registration”, the device registration call was not received by the watchdog service (a service that polls the management service, data service, etc.).
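The three failure messages above map to three different stages, which can be sketched as a small lookup. The stage labels and the function name are hypothetical; the message fragments and their meanings are taken from the paragraph above.

```python
def classify_failure(message):
    """Map a provisioning failure message to the stage that failed.

    Returns one of "creation", "preparation", "registration" or "unknown".
    The labels are illustrative, not StorSimple terminology.
    """
    text = message.lower()
    if "timed out waiting for registration" in text:
        # The watchdog service never received the device registration call.
        return "registration"
    if "creation of virtual device failed" in text:
        # Job stops at 30%: IaaS availability issue, retry later.
        return "creation"
    if "preparation of virtual device failed" in text:
        # Job reaches 60%: a VM initialization step failed.
        return "preparation"
    return "unknown"
```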
There are two main reasons for failure during the VM preparation stage:
i) The device registration key expired. In this case, the user is advised to regenerate the key and retry SCA provisioning.
ii) The DNS configuration on the vNet blocks internet connectivity. This is common when the customer has ExpressRoute configured on the vNet associated with the SCA. The following steps help validate connectivity:
1. Create a Windows Server 2012 VM using the same configuration (same storage account, vNet, and subnet).
2. Log in remotely to the VM using the credentials given during VM creation.
3. Open a command prompt (cmd) inside the VM.
4. Run “nslookup windows.net”.
5. If nslookup fails, an internet connectivity failure is preventing the SCA from registering with the StorSimple service. The customer's vNets in Azure have no direct route out to the internet over the ExpressRoute network unless they are attached to a proxy server. Advise the customer to work with their ISP to allow SCA outbound traffic through the firewall to the internet. Once direct internet connectivity is allowed on the Azure vNets via ExpressRoute, SCA provisioning should work.
6. If nslookup passes, a network connectivity issue with the vNet is unlikely; the support team is advised to obtain the VHD for further investigation.
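Where Python is available on the test VM, the name-resolution check in steps 4 to 6 can also be scripted. The sketch below is an approximate equivalent of the nslookup test, not a replacement for it: it uses the standard library's socket.getaddrinfo, and the function names and the injectable resolver parameter are illustrative assumptions.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if `hostname` resolves to at least one address.

    `resolver` defaults to socket.getaddrinfo and is injectable so the
    check can be exercised without network access.
    """
    try:
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        # Name resolution failed, comparable to nslookup failing.
        return False


def next_step(dns_ok):
    """Translate the result into the guidance given in steps 5 and 6."""
    if dns_ok:
        return "Network issue unlikely; obtain the VHD for further investigation."
    return "Internet connectivity is blocked; review firewall/proxy routing."


if __name__ == "__main__":
    print(next_step(can_resolve("windows.net")))
```

Run on the test VM created in step 1, this prints the recommended next step for that vNet.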