Secure Configuration Management Overview
Automation is a beautiful thing. It can reduce the overall workload of a task by allowing us to work smarter, not harder. When it comes to security, automation is sometimes required to provide a high level of assurance that controls are being implemented or monitored properly. Secure configuration baselines, such as the Red Hat STIG, are a prime example. A baseline is composed of dozens of specific configuration settings aimed at improving security. Applying them manually is incredibly time-consuming and prone to error, which is why organizations look for ways to automate the work.
Where Windows hosts are concerned, Group Policy Objects can be used to push standardized settings to like hosts, but tools in the *nix world that might achieve the same thing tend to be distribution focused. So, running RHEL and Ubuntu Server means having two different tools for the same purpose. Many organizations have already automated the process of maintaining secure configuration baselines, but even those organizations face challenges in managing this key process.
For the last two years we’ve been working in an AWS-based environment that began using Ansible for build and deploy tasks, and, seeing the power it provided, we’ve adopted it for this important security function. The rest of this article covers some of the ways that Ansible can help you overcome the typical challenges of implementing and managing secure configuration baselines. We'll start with the common approaches to the problem, which include:
- Manually applying the settings after install;
- Managing a library of machine image “gold disks;” or
- Using a tool to push the settings some time after the server is online and able to be managed.
After that, we cover how automated STIG compliance with Ansible is different.
Manual Implementation
At the most elementary level, these settings can be applied manually on a build-by-build basis. Organizations that work this way are typically staffed by at least one sysadmin who is talented and who eschews tedious, manual tasks, so there is a good chance that the organization has a loose collection of shell scripts and ad hoc processes that gets it into some degree of compliance with a secure configuration baseline.
Challenges
It sounds horrible and it is, but I have seen organizations manually walk through STIG-ing servers before. Auditing these organizations was equally horrid: you literally pulled out a copy of the baseline and told the sysadmin “now please show me /etc/fstab ...”, or you walked them through running a shell script that collected information and prompted with questions as necessary. This situation is as bad as you think it is, or possibly worse. You can guarantee that there are servers missing settings, either because they were never applied or because someone made a change to get things running. No one can be sure what the level of compliance within the organization is, because measuring it is usually not possible either. We won’t bother diving deeper into this scenario, but everyone involved should assume that the servers in this environment are not hardened consistently or in full compliance with secure baselines.
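To give a sense of what one of those collection scripts actually checks, here is a minimal Python sketch of a single check against fstab-formatted text. It is an illustration only: the required mount options and the sample file contents are assumptions made up for this example, not taken from any particular baseline.

```python
# Hypothetical requirement for illustration: mount point -> options it must carry.
REQUIRED_OPTIONS = {"/tmp": {"nodev", "nosuid", "noexec"}}

# Made-up fstab contents standing in for a real /etc/fstab.
SAMPLE_FSTAB = """
# device     mount  type  options          dump pass
/dev/sda1    /      xfs   defaults         0 0
/dev/sda2    /tmp   xfs   defaults,nodev   0 0
"""


def parse_fstab(text):
    """Return {mount_point: set_of_options} from fstab-formatted text."""
    mounts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mounts[fields[1]] = set(fields[3].split(","))
    return mounts


def missing_options(text, required=REQUIRED_OPTIONS):
    """Return {mount_point: sorted list of required options that are absent}."""
    mounts = parse_fstab(text)
    findings = {}
    for mount_point, needed in required.items():
        missing = needed - mounts.get(mount_point, set())
        if missing:
            findings[mount_point] = sorted(missing)
    return findings


if __name__ == "__main__":
    print(missing_options(SAMPLE_FSTAB))
```

Multiply this by every setting in the baseline and every server in the environment, and it becomes clear why the manual approach does not scale.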
Gold Disks
For many organizations, managing the implementation of secure configuration baselines means maintaining a "Gold Disk." Part of the usual IT process at these organizations is creating base images for the OSes they use, with secure configuration baselines already applied, and making sure that these images are updated every 6 months or so. This can certainly expedite deployment in an environment where VMs are being deployed, or where imaging tools like Ghost are being used to apply images to bare-metal systems.
Challenges
However, there are a few problems. First, this requires that a library of disk images be kept. Storage is not as expensive as it once was, but keeping a library of sizable disk images is still not the most efficient use of space, nor is it highly portable in deployment scenarios involving disconnected networks or other unique circumstances. Second, the moment an image is created it begins to drift out of date with patches. In many organizations these images are updated only once every 6 months, meaning that, on average, systems are deployed with roughly three months of critical security updates missing. Third, these images are only good for building systems from scratch. They serve no purpose for systems already in production. This can be an issue in two different cases:
- When there is a periodic update to the configuration baseline, and the existing systems should have the deltas between version 1.0 and version 2.0 of the baseline configuration applied.
- When changes have been made to a host so that it is out of compliance and the deltas between the running config and the baseline need to be “trued up.”
The Gold Disk approach does not help here. In many cases, I believe these organizations simply end up running an environment with some systems built on v1 and others on v2, and in the cases of deviations, they try to remediate them manually or they use another type of tool (as we’ll see next).
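The two cases above are really two different comparisons: one between two versions of the baseline, and one between a running host and the current baseline. Here is a minimal Python sketch of both, treating a baseline as a simple mapping of setting names to values. The baseline contents and host values are hypothetical and chosen only for illustration.

```python
# Hypothetical baseline versions and host state, for illustration only.
BASELINE_V1 = {"PermitRootLogin": "no", "ClientAliveInterval": "900"}
BASELINE_V2 = {
    "PermitRootLogin": "no",
    "ClientAliveInterval": "600",
    "Banner": "/etc/issue",
}
RUNNING_HOST = {"PermitRootLogin": "yes", "ClientAliveInterval": "600"}


def baseline_delta(old, new):
    """Case 1: what has to change on a host built from `old` to reach `new`.

    Returns (settings added or changed in `new`, settings dropped from `old`).
    """
    changed = {key: value for key, value in new.items() if old.get(key) != value}
    removed = sorted(key for key in old if key not in new)
    return changed, removed


def drift(running, baseline):
    """Case 2: settings where a running host differs from the baseline.

    Returns {setting: (actual_value, expected_value)}; actual is None if unset.
    """
    return {
        key: (running.get(key), expected)
        for key, expected in baseline.items()
        if running.get(key) != expected
    }


if __name__ == "__main__":
    print(baseline_delta(BASELINE_V1, BASELINE_V2))
    print(drift(RUNNING_HOST, BASELINE_V2))
```

A disk image can answer neither question for a host that is already running; something has to inspect the live system.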
Management Tool
Other organizations have implemented some sort of management tool to apply baselines. This might be something like BigFix or Tripwire. These tools can obviously be a great improvement over manual implementation, and they may also be an improvement over the Gold Disk approach, although many organizations use both in conjunction: they deploy from the Gold Disk images, and then use the tool to track compliance and enforce settings. This has the obvious advantage of addressing the possibility of "drift" in settings from the baseline. These tools can typically be used either to simply measure compliance or to enforce the settings, which allows the organization to keep systems in line with approved baseline settings on a continual basis.
Challenges
There are still a few issues here. First, this does not address the fact that the system will initially come online with several weeks or months of security updates missing. In addition, while these tools can enforce settings, organizations often do not enable this capability, because in many cases they view it as a potentially destructive operation. If settings have been manually changed from the approved baseline, it is possible that the sysadmin team made the change in order to get something working. So what then? Well, in most cases your security team will see the deviation the next time they review the tool's results, and they'll want to know why. They'll probably talk to the sysadmins, who will hopefully provide an answer, and the setting can be flagged as an approved deviation. If the setting was changed arbitrarily, or as an attempted fix that ended up not being needed, then the setting will usually be changed back manually. In general, any operations team's prime directive is to keep things running, and so there is a fear of leaving it to a tool to make automatic changes on the network.
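The difference between measuring and enforcing, and the role of approved deviations, can be sketched in a few lines of Python. This is not how any particular product works; the function, the setting names, and the values are all hypothetical and exist only to illustrate the workflow described above.

```python
# Hypothetical data for illustration only.
BASELINE = {"PermitRootLogin": "no", "X11Forwarding": "no", "MaxAuthTries": "4"}
RUNNING = {"PermitRootLogin": "yes", "X11Forwarding": "yes", "MaxAuthTries": "4"}
APPROVED = {"X11Forwarding"}  # deviation the security team has signed off on


def review(running, baseline, approved_deviations=frozenset(), enforce=False):
    """Compare a host to the baseline and optionally reset deviations.

    Returns (resulting_settings, findings). In measure-only mode the settings
    are returned unchanged; in enforce mode unapproved deviations are reset.
    Approved deviations are reported but never overwritten.
    """
    result = dict(running)
    findings = []
    for key, expected in baseline.items():
        if running.get(key) == expected:
            continue  # compliant, nothing to report
        if key in approved_deviations:
            findings.append((key, "approved deviation"))
        elif enforce:
            result[key] = expected
            findings.append((key, "reset to baseline"))
        else:
            findings.append((key, "needs review"))
    return result, findings


if __name__ == "__main__":
    print(review(RUNNING, BASELINE, APPROVED))                # measure only
    print(review(RUNNING, BASELINE, APPROVED, enforce=True))  # enforce
```

The operations team's fear lives entirely in the enforce branch: a setting that was changed to keep something running, but never recorded as an approved deviation, gets overwritten without anyone being asked.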
Ansible
Some organizations, like the one we are supporting, are using Ansible to implement secure configuration baselines. This allows the security team to manage the library of baselines in the same way that the operations team manages the deployment process. Also, Ansible can provision an instance using the latest installation material, and the deployment playbook can include a task to apply whatever updates (if any) are available. In general, the result is that by the time the system is online and reachable via SSH, it is already running a mostly up-to-date OS, and as soon as it becomes accessible, the tasks that apply the remaining recent updates and the secure configuration baseline can be run.
The key point of this type of deployment is that a different approach becomes possible when the security and operations teams converge on a single tool instead of each using their own. It may seem minor, but when the security team's "tool" for applying a baseline is the same tool the operations team uses to deploy the system, and applying the baseline happens in line with setting up Apache, copying code onto the server, and so on, then the baseline is no longer an external entity that may steamroll a server configuration and destroy things. In fact, it can be thought of as an integral part of the as-built system.
The reason for this is that security has moved away from using a single-purpose tool, like those mentioned above, that is focused on assessing and potentially enforcing compliance. It has moved instead towards becoming an integral part of the deployment process. When the Ansible playbook for deploying a system, or an update to a system, has been completed, it can be run through build testing to ensure that the system works as intended, both functionally and in terms of security. Once tests pass and the system is put into operation, any deviation from the security baseline is treated much like a deviation from the approved code baseline would be: as a potential risk to keeping the system running properly. In this environment, re-applying the baseline using Ansible will tend to be viewed as more constructive than destructive.
So, once you have a role defined that applies the baseline, and you integrate that role (among others) into the playbook for deploying your systems, you are hopefully keeping those playbooks under version control. Because the baseline role is part of that codebase, a deviation becomes a deviation from "our approved codebase," and the typical response to that is generally going to be different from the response to a deviation from "the approved secure configuration baseline." Security automation in this manner is a goal, not simply because we're lazy, but because it's capable of greatly improving the security posture of systems.
Our goal has been to achieve this integration with the deployment process on the project we're working on, and, through our partnership with Ansible to develop secure configuration baseline roles, we're trying to allow others to achieve it as well.