Ansible: Eliminating Compatibility Gaps

A few weeks ago, I posted a piece on a similar compatibility issue, one that caused functional impairment. This time, I have to solve a backward compatibility problem: the module exists in both environments, but earlier releases are missing some operators.

The story started when the json_query filter was moved from Ansible Core to the community collections, and a few of the most critical roles stopped working altogether. Let's take a look at the domain control role, which performs the following steps:

  • Identifies which WebLogic Machine this instance belongs to and selects NodeManager details from the inventory.
  • Runs only WebLogic domain components that are registered on this machine.

For example, the inventory variable looks similar to the data structure below:

domains:
   sample_domain:
      machines:
         machine1: 
            host:  "wlshost1.domain.com"
            port:  5050
            tls:   yes
         machine2:
            host:  "wlshost2.domain.com"
            port:  5050
            tls:   yes
            
      servers:
      

Sample Data Object

To find the machine record that matches the current instance with core filters only:

- name: Identify the Local Node Manager 
  set_fact:
    machine: "{{ domains[dname].machines|dict2items|
             selectattr('value.host','contains',inventory_hostname)|
             first }}"
  vars:
    dname: "sample_domain"

Filter dictionary by the current inventory hostname.

The chain of filters in the set_fact task is:

  • Select the 'machines' dictionary from the domain 'sample_domain'.
  • Convert the dictionary into a list of items:
    [{ "key": "machine1", "value": { ... } }, { ... }]
  • Keep only the items whose attribute "value.host" contains the current instance name, inventory_hostname.
  • Return the first entry only, turning the list into a single object.
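To see what the filter chain works with, it can help to print the intermediate list. The task below is a minimal sketch for inspection only; it is not part of the original role, and the commented result is the expected shape for the sample inventory rather than captured output.

```yaml
# Illustrative debug task: show the list that dict2items builds
# from the 'machines' dictionary before any filtering happens.
- name: Show the intermediate list produced by dict2items
  debug:
    msg: "{{ domains[dname].machines | dict2items }}"
  vars:
    dname: "sample_domain"
# Expected shape of msg (each dictionary entry becomes a key/value pair):
#   - key: machine1
#     value: { host: "wlshost1.domain.com", port: 5050, tls: true }
#   - key: machine2
#     value: { host: "wlshost2.domain.com", port: 5050, tls: true }
```

The selectattr step then addresses the nested field as 'value.host', which is why the attribute name carries the "value." prefix.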

The set_fact task works on both Ansible 2.9 and Ansible 2.10+, but here's the deal: some of the domains use the hostname "localhost" for a machine host to prevent access to the NodeManager from the network. In such cases, the role fails because the attribute value does not contain the current hostname.
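To make the failing case concrete, here is a hypothetical inventory fragment (the domain name local_domain is invented for illustration). The string "localhost" will normally not contain the real inventory hostname, so the 'contains' test rejects every item and the first filter is left with an empty list.

```yaml
# Hypothetical example: a domain whose NodeManager is bound to localhost only.
domains:
   local_domain:
      machines:
         machine1:
            host:  "localhost"   # never matches a name like wlshost1.domain.com
            port:  5050
            tls:   yes
```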

The new task uses the Jinja2 test 'in' to check the attribute value against a list of accepted values. Here is the version that works on Ansible 2.11+ without a fault:

- name: Identify the Local Node Manager 
  set_fact:
    machine: "{{ domains[dname].machines|dict2items|
             selectattr('value.host','in',['localhost',
             inventory_hostname])|
             first }}"
  vars:
    dname: "sample_domain"

Multi-value Test Operator

In this version, selectattr tests the attribute value against multiple values and picks the host and port that match the current node. The value 'localhost' acts as a wildcard, since a machine registered as 'localhost' matches on any host. Unfortunately, my old controllers failed to run this task because they do not support the 'in' test.

The final and universal solution is to run a regex test against the attribute value. This operator has been available since the dawn of time. So the final expression is:

- name: Identify the Local Node Manager 
  set_fact:
    machine: "{{ domains[dname].machines|dict2items|
             selectattr('value.host','search','localhost$|'+
             inventory_hostname+'$')|
             first }}"
  vars:
    dname: "sample_domain"

Test the Attribute Value Against Complex Regexp

The search operator tests if the attribute value ends with 'localhost' or (the pipe sign) if it ends with the current hostname. Now, old and new Ansible engines successfully run the same code.
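One caveat: inventory_hostname is concatenated into the pattern as is, so the dots in a name like wlshost1.domain.com act as regex wildcards. A slightly stricter variant is sketched below. It assumes the regex_escape filter is available on every controller that runs the role, which is worth checking on the oldest one before relying on it.

```yaml
# Illustrative variant, not the original task: escape the hostname so that
# characters such as '.' are matched literally instead of as regex wildcards.
- name: Identify the Local Node Manager
  set_fact:
    machine: "{{ domains[dname].machines | dict2items |
             selectattr('value.host', 'search',
             'localhost$|' + (inventory_hostname | regex_escape) + '$') |
             first }}"
  vars:
    dname: "sample_domain"
```

The pattern still anchors only the end of the string, so it keeps the same "ends with" semantics as the task above.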

Today's takeaways:

  • Upgrade your infrastructure to keep up with all the new features and capabilities, but be mindful of the outdated systems where new features and capabilities invariably backfire on you.
  • In a feature-rich system, you can always find a middle-ground solution that works on all systems, old or new.
  • You could spend six years developing code for Ansible and still learn new tricks almost every day.