Setting up server-to-server trust relationships in Ansible

At my new(ish) job, I’ve taken on a DevOps role in addition to development. We’ve been working with the excellent configuration management tool Ansible to configure and maintain our servers.

One of the specific tasks I worked on was setting up backup and replication from our primary database server to a remote recovery server. This required setting up known_hosts and authorized_keys entries on each server so they could talk to each other over ssh, without a password, in either direction.

A lot of the documentation and tutorials on Ansible don’t really have a great way to do this. The most common approach I saw was putting the private keys for the servers into ansible-vault, which requires either pre-generating a key for each server locally or using a shared key for all servers. I figured there had to be a way to generate the private keys on the servers themselves, and then copy the public keys between them.

I did manage to create a solution I’m pretty happy with, though it required thinking about Ansible a little differently than a lot of the tutorials and documentation encourage. For the most part, Ansible best practices encourage you to think about your configuration goals in terms of roles. Playbooks are almost always shown as simple compositions of roles, and rarely have a significant number of tasks of their own. This is fine when you’re thinking about configuration tasks that affect a single server in isolation, but it doesn’t work so well for tasks that have to touch multiple servers, like copying public keys between two boxes.

Fortunately, there are a number of other ways to organize Ansible’s work besides the typical example of one file, one play, and multiple roles. For starters, you can actually have multiple plays in a single YAML file. When you run ansible-playbook against a file like that, it runs each play in sequence. That allowed me to do something like this:

  • Play 1: Gather information about all the servers
  • Play 2: Generate ssh keys on the first group of servers and then fetch the public key for each
  • Play 3: Generate ssh keys on the second group of servers and then fetch the public key for each
  • Play 4: Add the fetched public keys to the first group of servers
  • Play 5: Add the fetched public keys to the second group of servers
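In skeleton form, that single file looks something like this (play bodies elided; the group names match the full playbook later in this post):

- name: gather facts about all dbs
  hosts: db
  # ...

- name: create keys for replicas
  hosts: replicas
  # ...

- name: create keys for primary
  hosts: primary-db
  # ...

- name: add keys to primary
  hosts: primary-db
  # ...

- name: add keys to backup
  hosts: backup-dbs
  # ...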

This has a couple of key advantages over the other methods I’ve seen.

1. Each server gets a unique private key, giving you finer control over your infrastructure. A single server can be removed from the trust relationship without affecting any others.

2. Each private key never leaves the server it was generated on. There’s no risk of leaking a key in source control, or of storing it in an insecure location.

We’ll take a look at each of those plays in isolation, and then look at the playbook as a whole.

Play 1

Play 1 is a simple dummy play that ensures we gather facts about all the servers involved first, so we have everything we need for the known_hosts entries later. It targets an inventory group named db, which should contain every database server involved in the trust relationships you’re trying to set up. The only task it has will never run; the play’s real work is the implicit fact-gathering step that Ansible runs at the start of every play.

- hosts: db
  name: gather facts about all dbs
  tasks:
    - fail: msg=""
      when: false

Play 2

Play 2 is where I create the keys for all our replica database servers. We create the keys through a standard user task. The magic, though, is that we subsequently use the fetch module to download a copy of each public key and store it in a temporary directory on the machine running Ansible. This will be important later.

The second important piece is that we add the primary as a known host. There are two steps to this. First, we use ssh-keygen to check whether the primary is already a known host. If it’s not, we use ssh-keyscan to read its key and append it to the known_hosts file. In my case, a single server represents the primary, so I can store its address in the variable postgresql_primary. If you’re working with multiple machines or dynamic IP addresses, you can instead loop over the built-in groups dictionary to add every server in a group as a known host; a sketch of that variant follows the play below.
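For reference, postgresql_primary (like the config_user and postgresql_backup variables used below) is an ordinary Ansible variable; one common place to define these is a group_vars file. The values here are purely illustrative:

# group_vars/all.yml -- hypothetical values, substitute your own
config_user: deploy            # the user Ansible connects as
postgresql_primary: 10.0.1.10  # address of the primary database server
postgresql_backup: 10.0.2.10   # address of the backup/recovery server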

- name: create keys for replicas
  hosts: replicas
  user: "{{ config_user }}"
  sudo: yes
  sudo_user: postgres

  tasks:
    - name: Generating postgres user and ssh key
      user: name=postgres group=postgres generate_ssh_key=yes
      sudo: yes
      sudo_user: root

    - name: Downloading pub key
      fetch: src=~/.ssh/id_rsa.pub dest=/tmp/pub-keys/replicas/{{ansible_hostname}}/id_rsa.tmp flat=yes
      changed_when: False

    - name: check if primary is already a known host
      shell: ssh-keygen -H -F {{ postgresql_primary }}
      register: know_host
      ignore_errors: true
      changed_when: False

    - name: make sure the ssh folder exists
      file: name=~/.ssh state=directory

    - name: make sure known_hosts exists
      file: name=~/.ssh/known_hosts state=touch

    - name: Make primary a known host
      shell: ssh-keyscan -H {{ postgresql_primary }} >> ~/.ssh/known_hosts
      when: know_host|failed
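If you have a whole group of primaries rather than a single one, the last task can loop over the built-in groups dictionary instead. A minimal sketch of that variant, assuming the servers live in the primary-db inventory group and that their inventory hostnames are resolvable from the replicas (the existence-check task would need the same loop and a per-item register):

    - name: Make every primary a known host
      # groups['primary-db'] expands to each hostname in that inventory group
      shell: ssh-keyscan -H {{ item }} >> ~/.ssh/known_hosts
      with_items: groups['primary-db']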

Play 3

Play 3 is almost identical, but now we’re creating keys for our primary database server. The process is the same: create the keys, fetch them to a temporary folder, and add the other servers as known hosts.

- name: create keys for primary
  hosts: primary-db
  user: "{{config_user}}"
  sudo: yes
  sudo_user: postgres

  tasks:
  - name: Generating postgres user and ssh key
    user: name=postgres group=postgres generate_ssh_key=yes
    sudo: yes
    sudo_user: root

  - name: Downloading pub key
    fetch: src=~/.ssh/id_rsa.pub dest=/tmp/pub-keys/primary/{{ansible_hostname}}/id_rsa.tmp flat=yes
    changed_when: False

  - name: check if backup is already a known host
    shell: ssh-keygen -H -F {{ postgresql_backup }}
    register: know_host
    ignore_errors: true
    changed_when: False

  - name: make sure the ssh folder exists
    file: name=~/.ssh state=directory

  - name: make sure known_hosts exists
    file: name=~/.ssh/known_hosts state=touch

  - name: Make backup a known host
    shell: ssh-keyscan -H {{ postgresql_backup }} >> ~/.ssh/known_hosts
    when: know_host|failed

Play 4

Play 4 is where we start to use those saved public keys. On the primary database we make sure our trusted user exists, and then we loop through the local temporary directory and add all of those keys as authorized keys for that user. Then we delete the local copies of the public keys.
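For reference, once plays 2 and 3 have run, the temporary directory on the machine running ansible-playbook looks something like this (the hostnames are illustrative):

/tmp/pub-keys/replicas/replica-01/id_rsa.tmp
/tmp/pub-keys/replicas/replica-02/id_rsa.tmp
/tmp/pub-keys/primary/primary-01/id_rsa.tmp

The with_fileglob loop below matches every replica key in that tree. Like all lookups, with_fileglob reads from the local machine, which is also why the cleanup step is a local_action.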

- name: add keys to primary
  hosts: primary-db
  user: "{{config_user}}"
  sudo: yes
  sudo_user: root

  tasks:
    - name: Create postgres user
      user: name=postgres group=postgres

    - name: add keys
      authorized_key: user=postgres key="{{ lookup('file', item) }}"
      with_fileglob:
        - /tmp/pub-keys/replicas/*/id_rsa.tmp

    - name: Deleting public key files
      local_action: file path=/tmp/pub-keys/replicas state=absent
      changed_when: False
      sudo: no

Play 5

Play 5 is just the reverse of play 4. It copies the other set of public keys from your local temporary folder up to the backup servers as authorized keys.

- name: add keys to backup
  hosts: backup-dbs
  user: "{{config_user}}"
  sudo: yes
  sudo_user: root

  tasks:
    - name: ensure replication user
      user: name=replication group=replication

    - name: add keys
      authorized_key: user=replication key="{{ lookup('file', item) }}"
      with_fileglob:
        - /tmp/pub-keys/primary/*/id_rsa.tmp

    - name: Deleting public key files
      local_action: file path=/tmp/pub-keys/primary state=absent
      changed_when: False
      sudo: no

Wrapping up

Here’s the full playbook, written out:

- hosts: db
  name: gather facts about all dbs
  tasks:
    - fail: msg=""
      when: false

- name: create keys for replicas
  hosts: replicas
  user: "{{ config_user }}"
  sudo: yes
  sudo_user: postgres

  tasks:
    - name: Generating postgres user and ssh key
      user: name=postgres group=postgres generate_ssh_key=yes
      sudo: yes
      sudo_user: root

    - name: Downloading pub key
      fetch: src=~/.ssh/id_rsa.pub dest=/tmp/pub-keys/replicas/{{ansible_hostname}}/id_rsa.tmp flat=yes
      changed_when: False

    - name: check if primary is already a known host
      shell: ssh-keygen -H -F {{ postgresql_primary }}
      register: know_host
      ignore_errors: true
      changed_when: False

    - name: make sure the ssh folder exists
      file: name=~/.ssh state=directory

    - name: make sure known_hosts exists
      file: name=~/.ssh/known_hosts state=touch

    - name: Make primary a known host
      shell: ssh-keyscan -H {{ postgresql_primary }} >> ~/.ssh/known_hosts
      when: know_host|failed

- name: create keys for primary
  hosts: primary-db
  user: "{{config_user}}"
  sudo: yes
  sudo_user: postgres

  tasks:
  - name: Generating postgres user and ssh key
    user: name=postgres group=postgres generate_ssh_key=yes
    sudo: yes
    sudo_user: root

  - name: Downloading pub key
    fetch: src=~/.ssh/id_rsa.pub dest=/tmp/pub-keys/primary/{{ansible_hostname}}/id_rsa.tmp flat=yes
    changed_when: False

  - name: check if backup is already a known host
    shell: ssh-keygen -H -F {{ postgresql_backup }}
    register: know_host
    ignore_errors: true
    changed_when: False

  - name: make sure the ssh folder exists
    file: name=~/.ssh state=directory

  - name: make sure known_hosts exists
    file: name=~/.ssh/known_hosts state=touch

  - name: Make backup a known host
    shell: ssh-keyscan -H {{ postgresql_backup }} >> ~/.ssh/known_hosts
    when: know_host|failed

- name: add keys to primary
  hosts: primary-db
  user: "{{config_user}}"
  sudo: yes
  sudo_user: root

  tasks:
    - name: Create postgres user
      user: name=postgres group=postgres

    - name: add keys
      authorized_key: user=postgres key="{{ lookup('file', item) }}"
      with_fileglob:
        - /tmp/pub-keys/replicas/*/id_rsa.tmp

    - name: Deleting public key files
      local_action: file path=/tmp/pub-keys/replicas state=absent
      changed_when: False
      sudo: no

- name: add keys to backup
  hosts: backup-dbs
  user: "{{config_user}}"
  sudo: yes
  sudo_user: root

  tasks:
    - name: ensure replication user
      user: name=replication group=replication

    - name: add keys
      authorized_key: user=replication key="{{ lookup('file', item) }}"
      with_fileglob:
        - /tmp/pub-keys/primary/*/id_rsa.tmp

    - name: Deleting public key files
      local_action: file path=/tmp/pub-keys/primary state=absent
      changed_when: False
      sudo: no

Save this in a .yml file and you can run it with ansible-playbook. You can also include it in other playbooks when you need to ensure both groups of servers can ssh to each other before the rest of your configuration begins.
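As a minimal sketch, assuming you save the file as db-trust.yml (a hypothetical name), a wrapper playbook can pull it in with a playbook-level include before its own plays; the replication role here is just a stand-in:

# site.yml -- hypothetical wrapper playbook
- include: db-trust.yml

- name: configure replication
  hosts: db
  roles:
    - replication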
