December 22nd, 2022 13:00
VxRail Ansible Module, Cluster Expansion questions
Thanks in advance for any help.
I am trying to use the Ansible module to do a cluster expansion with a node that was recently removed and RASRed, and I am running into some trouble. I am adding the node following the example in the module's README and then, as in the sample, looping to check the job. It appears to fail at that step. However, when I run the add again, it reports "There is another expansion request in progress". So one theory is that it did some work but is waiting for some other input. That said, I never saw a task or event in vCenter for the ID we use with Ansible to automate this. Digging deeper, I also noticed that there is a discovered node with the same name (I did not add it to the cluster manually, because I wanted to test this with the VxRail modules). So I am wondering whether I missed a step.
Additionally, I have a question about the cluster expansion itself. Since none of its variables specify the vSphere cluster the node should join, how does it determine that? Is that beyond the scope of the module, and if so, what steps should I take to add a recently RASRed node to a VxRail environment and to the appropriate datacenter and cluster in vSphere?
I mostly ask because the discovery looks the same as it did before we started, so I am not sure what the module did. Also, since it says it is still adding the node but I see no activity in vCenter, I am wondering what exactly it did or is doing.
BTW, it has now been almost 24 hours and the node is still not there, so it is not a timing issue.
Shortened version of the questions:
1. How do I see what the module is doing behind the scenes (what log, etc., should I look at)?
2. Since the cluster expansion module does not take VMware datacenter or cluster info, how does it determine where the node goes?
3. Since it says "There is another expansion request in progress.", how do I determine what it is doing (practically the same as question 1)?
The installed VxRail Ansible Modules version is 1.4.0 and the VxM version is 7.0.401.
Thanks again for any help.
Craig__
January 4th, 2023 05:00
Hi LokiX,
Thanks for reaching out with your questions.
The Cluster Expansion Module first validates the compatibility of the node being added and, if validation passes, performs the cluster expansion.
Details on the execution of the module dellemc_vxrail_cluster_expansion.py can be found in the log /tmp/vxrail_ansible_cluster_expansion.log. The location of each module's log file is listed in our documentation.
The Cluster Expansion Module calls the VxRail public API /v1/cluster/expansion to add a node, and that API does not take a VMware datacenter or cluster parameter. Instead, the user specifies the vxmip (the VxM IP address) in the playbook that calls the public API, and the node is added to the cluster managed by that VxM.
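To illustrate that the target cluster is implied by the VxM address rather than by any datacenter or cluster parameter, here is a minimal Python sketch that only builds (does not send) such a request. Assumptions: the `/rest/vxm` base path, HTTP basic authentication, the example IP and credentials, and the placeholder body are all illustrative; the real request schema for /v1/cluster/expansion is in the VxRail API reference for your release.

```python
import base64
import json
import urllib.request


def build_expansion_request(vxm_ip: str, username: str, password: str,
                            body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the cluster expansion endpoint on one VxM.

    ASSUMPTION: the public API is served under /rest/vxm; check the API
    reference for your VxRail release before relying on this path.
    """
    # The only "where does the node go" input is the VxM address: the node
    # joins whichever vSphere cluster this VxM manages.
    url = f"https://{vxm_ip}/rest/vxm/v1/cluster/expansion"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical placeholder body: the real expansion spec has many more fields.
req = build_expansion_request("192.0.2.10", "administrator@vsphere.local",
                              "secret", {"placeholder": "see API reference"})
print(req.get_method(), req.full_url)
```

Pointing the same request at a different VxM IP is what would place the node in a different cluster.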
After running the cluster expansion playbook, addnode.yml, you can review the details of the cluster expansion job in /tmp/vxrail_ansible_cluster_expansion.log.
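For a quick first pass over that log, a small keyword filter is often enough. This is a sketch under an assumption: the log's line format is not documented in this thread, so it does a plain case-insensitive keyword match (the keyword list is arbitrary) rather than parsing fields.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Iterable

LOG_PATH = Path("/tmp/vxrail_ansible_cluster_expansion.log")

# ASSUMPTION: keywords chosen for illustration; adjust to what the log actually contains.
KEYWORDS = re.compile(r"error|exception|in progress", re.IGNORECASE)


def filter_lines(lines: Iterable[str], pattern: re.Pattern = KEYWORDS) -> list[str]:
    """Keep only lines matching the pattern, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def interesting_lines(path: Path = LOG_PATH) -> list[str]:
    """Return matching lines from the expansion log, or [] if the file does not exist."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8", errors="replace") as fh:
        return filter_lines(fh)


if __name__ == "__main__":
    for line in interesting_lines():
        print(line)
```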
Ansible has an odd behavior that cannot be changed: async monitoring loops print [FAILED - RETRYING] for each iteration in which the monitored condition is not yet met. For example, while waiting for the node addition to complete, you will likely see many lines like this in the log:
FAILED - RETRYING: Check if cluster expansion job is completed...The node addition is still ongoing (99 retries left).
FAILED - RETRYING: Check if cluster expansion job is completed...The node addition is still ongoing (98 retries left).
FAILED - RETRYING: Check if cluster expansion job is completed...The node addition is still ongoing (97 retries left).
FAILED - RETRYING: Check if cluster expansion job is completed...The node addition is still ongoing (96 retries left).
…
Even though it says FAILED, only the loop's success condition failed; the operation is simply still in progress.
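To make the "FAILED - RETRYING" wording concrete, here is a minimal Python sketch of the same pattern: a loop that re-checks a job and prints a retry message each time the job is not finished yet. The state names `IN_PROGRESS`, `COMPLETED` and `FAILED`, and the `get_state` callable, are assumptions for illustration; this is not the module's code.

```python
import time
from typing import Callable, Iterable, Optional


def poll_until_done(get_state: Callable[[], str],
                    done_states: Iterable[str] = ("COMPLETED", "FAILED"),
                    retries: int = 100,
                    delay: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Re-check a job until it reaches a terminal state.

    Loosely mirrors an Ansible task with until/retries/delay: one initial
    attempt plus up to `retries` re-checks.
    """
    done = set(done_states)
    state: Optional[str] = None
    for attempt in range(retries + 1):
        state = get_state()
        if state in done:
            return state  # terminal, but may still be FAILED, so inspect it
        remaining = retries - attempt
        if remaining == 0:
            break
        # Not an error: the until-condition is simply not met yet.
        print(f"FAILED - RETRYING: job still {state} ({remaining} retries left)")
        sleep(delay)
    raise TimeoutError(f"job still in state {state!r} after {retries} retries")
```

In a real script, `get_state` would wrap a status query for the expansion request; that call is left out because its exact path depends on the VxRail API version.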
When a node expansion on VxRail fails, you have to cancel the failed expansion job before starting over. We have a module for this, dellemc_vxrail_cluster_expansion_cancel, but it is part of the 1.5.0 release, scheduled for this month.
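The rule above (a failed expansion has to be cancelled before a new one can start) can be written as a small decision helper that a wrapper script might call before re-running the playbook. This is a hypothetical sketch: `next_step`, the `Action` values and the state strings are illustrative names, and the actual cancel call (the 1.5.0 module or the corresponding API) is deliberately not shown.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    START = "start a new expansion"
    WAIT = "wait for the running request to finish"
    CANCEL_THEN_START = "cancel the failed request, then start again"


def next_step(last_state: Optional[str]) -> Action:
    """Decide what to do before re-running the expansion playbook.

    ASSUMPTION: the state strings are illustrative, not confirmed VxRail values.
    """
    if last_state == "IN_PROGRESS":
        # A new request now would be rejected as "another expansion request in progress".
        return Action.WAIT
    if last_state == "FAILED":
        # A failed expansion still blocks new ones until it is cancelled.
        return Action.CANCEL_THEN_START
    # No earlier request, or it completed: nothing blocks a new expansion.
    return Action.START


print(next_step("FAILED").value)
```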
I hope this helps. If you have any further questions please let me know.
Regards,
Craig.
LokiX
January 17th, 2023 12:00
Thanks again, Craig__.
We hope we have it figured out, but we are still getting the "cluster expansion is already in progress" error. We are on 1.4.0, which of course does not have the expansion cancel module. Also, our API is at 7.0.401, not 7.0.410. Any idea whether 1.5.0 will work with 7.0.401? If not, is there a way to clear the previously running expansion outside of the module until we can upgrade the API and the module?
LokiX
January 18th, 2023 14:00
By calling the API directly, we were able to cancel the expansion and try again. Interestingly, it now appears to run through an expansion validation, and then, in the loop that checks the expansion, it comes back 10-15 minutes later with the error:
"response_error": [
"There are not enough disks to form a vSAN disk group on host XZXXZ89"
]
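When scripting around the job check, it can help to surface the `response_error` list rather than only the failed status. A minimal sketch, assuming only the key name shown above; because where that key sits in the full status payload is not known here, the hypothetical `response_errors` function searches nested dictionaries and lists for it.

```python
from typing import Any, Dict, List


def response_errors(status: Dict[str, Any]) -> List[str]:
    """Collect messages from any 'response_error' list in a job status payload.

    ASSUMPTION: only the key name comes from the output quoted in this thread;
    its position in the payload is unknown, so the structure is searched.
    """
    found: List[str] = []
    stack: List[Any] = [status]
    while stack:
        item = stack.pop()
        if isinstance(item, dict):
            for key, value in item.items():
                if key == "response_error" and isinstance(value, list):
                    found.extend(str(v) for v in value)
                else:
                    stack.append(value)
        elif isinstance(item, list):
            stack.extend(item)
    return found
```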
We actually have enough disks, and this same task has worked multiple times in the past through the GUI. Are there any known issues where this reports not enough vSAN disks when there actually are enough?
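For comparing what the host reports with the rule the error message refers to, here is a minimal sketch of the vSAN (original storage architecture) disk-group constraint: each disk group needs exactly one cache-tier device and one to seven capacity devices, with at most five disk groups per host. Only disks that vSAN can actually claim count, so a disk that is present but not eligible does not help. The `Disk` record and its field values are hypothetical, not a VxRail or vSphere API type.

```python
from dataclasses import dataclass
from typing import Iterable

MAX_GROUPS_PER_HOST = 5


@dataclass
class Disk:
    name: str
    tier: str       # "cache" or "capacity" (hypothetical labels)
    eligible: bool  # True if vSAN could claim the disk


def max_disk_groups(disks: Iterable[Disk]) -> int:
    """How many vSAN OSA disk groups the eligible disks could form."""
    disks = list(disks)
    cache = sum(1 for d in disks if d.eligible and d.tier == "cache")
    capacity = sum(1 for d in disks if d.eligible and d.tier == "capacity")
    # Each group needs exactly one cache device and at least one capacity device.
    return min(cache, capacity, MAX_GROUPS_PER_HOST)
```

A result of 0 corresponds to the "not enough disks to form a vSAN disk group" situation, even if the host physically has disks of both tiers.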
Thanks in advance.
LokiX
January 20th, 2023 07:00
I can confirm that adding the node manually in the GUI does not hit the same issue, since the required vSAN disks are there. I am trying to figure out why the module behaves differently. Any advice would be greatly appreciated.