Ansible collection to deploy the components of TDP
- hadoop: deploys the Hadoop TDP Release (HDFS + YARN + MapReduce)
- hbase: deploys the HBase TDP Release (HBase Master + HBase RegionServer), Phoenix and Phoenix Query Server
- hive: deploys the Hive TDP Release (Hiveserver2 + Tez)
- knox: deploys the Knox TDP Release (Knox Gateway)
- ranger: deploys the Ranger TDP Release (Ranger Admin + Ranger plugins)
- spark: deploys the Spark TDP Release (Spark Client + Spark History Server)
- zookeeper: deploys the Apache ZooKeeper Release
The best way to get started with TDP and the Ansible roles is to go through the Getting Started repository.
Ansible 2.9 cannot install a collection from a Git repository with ansible-galaxy. Instead, clone the repository into the correct folder manually.
For example, set the collections_paths property in your ansible.cfg:
```ini
[defaults]
collections_paths=collections
```

Then create the folder structure and clone the repository:
```bash
mkdir -p collections/ansible_collections/tosit
git clone https://github.com/TOSIT-FR/ansible-tdp-roles collections/ansible_collections/tosit/tdp
```
The project structure should look like this:
```
.
├── ansible.cfg
├── collections
│   └── ansible_collections
│       └── tosit
│           └── tdp
│               ├── galaxy.yml
│               ├── README.md
│               └── roles
│                   ├── hadoop
│                   ├── hive
│                   ├── ranger
│                   ├── spark
│                   ├── ...
│                   └── zookeeper
├── roles
└── test.yml
```
Note that the top-level roles folder does not contain the roles from this collection, but any other roles the project may have. The collections folder is the one configured in ansible.cfg.
The collection is compatible with Mitogen 0.2.
In order to activate Mitogen, follow the Mitogen installation guide.
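Following the Mitogen documentation, activation typically amounts to pointing Ansible at the Mitogen strategy plugin in ansible.cfg. A minimal sketch, assuming Mitogen was extracted locally (the exact path is an assumption):

```ini
[defaults]
# Path to the extracted Mitogen 0.2 release (assumed location, adjust to yours)
strategy_plugins = /path/to/mitogen-0.2.x/ansible_mitogen/plugins/strategy
strategy = mitogen_linear
```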
Note: some of our custom plugins are incompatible with Mitogen. For this reason, strategy: linear is set in some of our playbooks (e.g. hbase_hdfs_init.yml) to avoid issues in Mitogen-configured Ansible environments, as illustrated below.
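For illustration, a play can opt out of Mitogen even when it is enabled globally. A minimal sketch (the play content itself is hypothetical):

```yaml
- name: Example play that bypasses Mitogen
  hosts: hdfs_nn
  # Force the built-in linear strategy so Mitogen's strategy plugin is not used
  strategy: linear
  tasks:
    - name: Placeholder task
      debug:
        msg: "This play runs with the linear strategy even if strategy=mitogen_linear is set globally."
```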
Using ansible-galaxy: TBD
hdfs_file module: file and directory handling in HDFS
Example usage:
```yaml
- name: Add directory for spark logs
  delegate_to: "{{ groups['hdfs_nn'][0] }}"
  tosit.tdp.hdfs_file:
    hdfs_conf: "{{ hadoop_conf_dir }}"
    path: "{{ item.path }}"
    state: "{{ item.state | default(omit) }}"
    owner: "{{ item.owner | default(omit) }}"
    group: "{{ item.group | default(omit) }}"
    mode: "{{ item.mode | default(omit) }}"
  become: yes
  become_user: "{{ hdfs_user }}"
  loop:
    - path: /spark3-logs
      state: directory
      owner: "{{ spark_user }}"
      group: "{{ hadoop_group }}"
      mode: '777'
```

access_fqdn filter plugin: returns access_fqdn, or access_sn + domain, or inventory_hostname + domain (checking if the variables exist for the host in this order)
Example usage:
```yaml
- debug:
    msg: "{{ groups['hdfs_nn'][0] | access_fqdn(hostvars) }}"

- debug:
    msg: "{{ groups['hdfs_jn'] | map('access_fqdn', hostvars) | list }}"
```

The best way to use the roles from the collection is to call the relevant file from the playbooks directory inside another playbook.
Examples:
```yaml
- name: Deploy ZooKeeper
  ansible.builtin.import_playbook: ansible_roles/collections/ansible_collections/tosit/tdp/playbooks/zookeeper.yml

- name: Deploy Hadoop
  ansible.builtin.import_playbook: ansible_roles/collections/ansible_collections/tosit/tdp/playbooks/hadoop.yml

- name: Deploy Hive
  ansible.builtin.import_playbook: ansible_roles/collections/ansible_collections/tosit/tdp/playbooks/hive.yml
```
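Such a wrapper playbook is then run like any other playbook. A minimal sketch (the inventory path and playbook file name are assumptions):

```bash
# Run the wrapper playbook that imports the collection's playbooks
ansible-playbook -i inventory deploy.yml
```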
- Python >= 3.6 with the virtual env package (e.g. python3-venv)
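For instance, an environment matching this requirement can be set up as follows. A minimal sketch, assuming a Debian-like system with python3-venv installed (the directory name is an assumption; the version pin matches the Ansible 2.9 target mentioned above):

```bash
# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install Ansible 2.9, the version this collection targets
pip install 'ansible>=2.9,<2.10'
```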
Please follow the contributing guidelines and respect the code of conduct.