Container-Optimized OS: A Pragmatic Approach to Running Containers on Google Cloud
As cloud computing continues to evolve, so does the need for efficient, secure, and scalable ways to run containerized applications. In this essay, we'll delve into what makes Container-Optimized OS (COS) stand out: its security advantages, particularly root filesystem immutability and seamless updates, and how it can be used effectively with regional managed instance groups for scaling. We'll also explore the role of startup scripts and other practical considerations for engineering teams.
A Brief Overview of Container-Optimized OS
Container-Optimized OS is a lightweight, secure operating system image designed by Google specifically for running containers on GCP. Based on the open-source Chromium OS project, COS is tailored to offer a minimal footprint, reducing potential attack surfaces and simplifying maintenance. It comes pre-installed with essential tools like Docker and containerd, enabling teams to deploy containers out of the box without additional setup.
One of the key benefits of COS is its tight integration with GCP services. It's the default node OS for Google Kubernetes Engine (GKE) and is optimized for Google's infrastructure, providing automatic updates and security patches directly from Google. For teams already invested in GCP, COS offers an easy path to deploying and managing containerized workloads efficiently.
Security Review: RootFS Immutability and Easy Updates
Security is often a paramount concern when running applications in the cloud. COS addresses this head-on with several built-in features, notably root filesystem immutability and automatic updates.
Root Filesystem Immutability
The root filesystem in COS is mounted as read-only and is immutable. This design choice significantly enhances security by preventing unauthorized or accidental modifications to the core operating system files. The kernel computes checksums of the root filesystem at build time and verifies them on each boot, ensuring the integrity of the system. This approach minimizes the risk of persistent attacks, as any changes to the filesystem do not persist across reboots.
For additional protection, you could also enable Shielded VM features, which add Secure Boot, a virtual TPM, and integrity monitoring on top of the verified boot that COS already provides.
Furthermore, COS employs a stateless configuration for directories like `/etc/`, which are writable but do not retain changes after a reboot. This means that every time a COS instance restarts, it starts from a clean state, reducing the chances of configuration drift and ensuring consistency across instances.
If you wish to decrease overall deployment complexity, anything you need to persist should go to a Cloud Storage bucket or an attached persistent disk, which can be mounted under `/mnt/disks/`.
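As a rough sketch, mounting an attached persistent disk on a COS instance looks like this (the device name `/dev/sdb` and the `data` mount point are assumptions; verify the device with `lsblk` first):

```shell
# Format the disk on first use only; this erases its contents.
sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb

# Create the mount point and mount the disk.
sudo mkdir -p /mnt/disks/data
sudo mount -o discard,defaults /dev/sdb /mnt/disks/data
```

Since `/etc/fstab` does not persist across reboots either, the mount should be re-established at boot, typically from the startup script.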
Automatic and Seamless Updates
COS is configured to automatically download weekly updates in the background. These updates include security patches and performance improvements, which are applied upon reboot. The automatic update mechanism is designed to be non-intrusive, allowing workloads to continue running without interruption until a reboot is scheduled.
For organizations, this means less overhead in managing updates and patches. Since the updates are provided and maintained by Google, teams can rely on timely patches for vulnerabilities without manual intervention. This is particularly advantageous in large-scale deployments where manually updating each instance would be impractical.
This behavior can be controlled with the instance metadata key `cos-update-strategy`, set to either `update_enabled` or `update_disabled`.
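For example, to opt an existing instance out of automatic updates (the instance name and zone are placeholders):

```shell
# Disable automatic COS updates on a running instance.
gcloud compute instances add-metadata my-cos-instance \
  --zone=europe-west1-b \
  --metadata=cos-update-strategy=update_disabled
```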
Scalable use case with Regional Managed Instances
While Kubernetes is often the go-to solution for scaling containerized applications, there are scenarios where a simpler setup might suffice. COS can be effectively used with regional managed instance groups (MIGs) to handle scaling without the complexity of Kubernetes.
Scaling with Managed Instance Groups
Managed Instance Groups allow you to deploy a group of identical instances that you can control as a single entity. By using COS as the base image for instances in a MIG, you can leverage its fast boot times and security features while managing scaling policies at the instance group level.
For example, if you have a stateless application packaged in a container that doesn’t require the orchestration features of Kubernetes, deploying it on a COS-based MIG can simplify your architecture. You can set up autoscaling policies based on CPU usage, HTTP load balancing, or custom metrics, allowing your application to scale out and in based on demand.
Regional Distribution for High Availability
By deploying your MIG across multiple zones within a region, you enhance the availability of your application. COS’s quick startup times mean that new instances can be brought online rapidly in response to scaling events or in the case of zone failures.
This approach provides a balance between simplicity and scalability. You get the benefits of automated scaling and high availability without the overhead of managing a full Kubernetes cluster.
Example of a Regional MIG
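A minimal sketch of what this can look like with `gcloud` (the names, machine type, and region are assumptions):

```shell
# Instance template based on the latest stable COS image.
gcloud compute instance-templates create my-app-template \
  --machine-type=e2-small \
  --image-family=cos-stable \
  --image-project=cos-cloud \
  --metadata-from-file=startup-script=startup.sh

# Regional MIG distributed across the region's zones.
gcloud compute instance-groups managed create my-app-mig \
  --region=europe-west1 \
  --template=my-app-template \
  --size=2

# Scale out and in based on average CPU utilization.
gcloud compute instance-groups managed set-autoscaling my-app-mig \
  --region=europe-west1 \
  --min-num-replicas=2 \
  --max-num-replicas=10 \
  --target-cpu-utilization=0.6
```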
Startup Scripts Explained
Startup scripts are a powerful feature in GCP that allow you to run commands when an instance boots. With COS’s immutable filesystem and stateless design, startup scripts become essential for configuring instances at runtime.
Using Startup Scripts with COS
Since you cannot install software packages directly onto the COS instance due to the lack of a package manager and the immutable root filesystem, startup scripts are used to set up the necessary environment for your containers.
For instance, you might use a startup script to:
- Pull the latest version of your container image from a registry.
- Configure environment variables or secrets needed by your application.
- Set up system configurations that are required at runtime.
Example of a Startup Script
Here’s a simplified example of a startup script that pulls a container image from Artifact Registry and runs it:
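Something along these lines (the region, repository, and image path are placeholders):

```shell
#!/bin/bash
# Authenticate Docker against Artifact Registry using the
# instance's service account credentials.
docker-credential-gcr configure-docker --registries=europe-west1-docker.pkg.dev

# Pull and run the application container.
docker run -d --restart=always --name=my-app \
  -p 80:8080 \
  europe-west1-docker.pkg.dev/my-project/my-repo/my-app:latest
```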
This script configures Docker to authenticate with Artifact Registry and then runs your container image. By including this script in the instance metadata, you ensure that every instance in your MIG starts with the correct configuration.
Persistent Configuration with cloud-init
COS supports cloud-init, allowing you to define your startup configuration in the cloud-config format. This is particularly useful for more complex configurations, or when you need to write files, define systemd services, or perform other initialization tasks.
Better way to define a Startup Script
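A sketch of such a cloud-config (the secret name, image, and service details are assumptions):

```yaml
#cloud-config

write_files:
- path: /etc/systemd/system/my-app.service
  permissions: "0644"
  content: |
    [Unit]
    Description=My application container
    After=docker.service
    Requires=docker.service

    [Service]
    # Fetch the ENVIRONMENT value from instance metadata at start time.
    ExecStartPre=/bin/sh -c 'curl -s -H "Metadata-Flavor: Google" \
      "http://metadata.google.internal/computeMetadata/v1/instance/attributes/ENVIRONMENT" \
      > /var/run/environment'
    # The cloud-sdk container inherits the instance's service account,
    # so it can read from Secret Manager without extra credentials.
    ExecStartPre=/bin/sh -c 'docker run --rm google/cloud-sdk:slim \
      gcloud secrets versions access latest --secret=my-app-secret \
      > /var/run/my-app-secret'
    ExecStart=/usr/bin/docker run --rm --name=my-app my-image:latest
    Restart=always

    [Install]
    WantedBy=multi-user.target

runcmd:
- systemctl daemon-reload
- systemctl start my-app.service
```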
Two important things to note here:
- The script fetches the `ENVIRONMENT` variable from the instance metadata, which can be set when creating the instance.
- Execution of the `cloud-sdk` container will, by default, inherit the instance's service account, allowing it to access Secret Manager.
Additional Considerations
Beyond the core features, there are several other aspects of COS that are worth understanding.
Monitoring and Logging with Node Problem Detector
COS includes the Node Problem Detector (NPD) agent, which monitors the system’s health and reports metrics to Cloud Monitoring. NPD can help you detect issues like disk pressure, memory leaks, or kernel problems. While NPD doesn’t monitor individual containers, it provides valuable insights into the underlying VM’s health, which can be critical for diagnosing issues in production environments.
Securing Containers with AppArmor
Security profiles are essential for enforcing least privilege and preventing containers from performing unauthorized actions. COS supports AppArmor, a Linux kernel security module that restricts the capabilities of processes. You can apply the default Docker AppArmor profile or define custom profiles to tailor the security settings for your containers.
For example, you might create a custom AppArmor profile that prevents a container from accessing raw network sockets, enhancing security for sensitive applications.
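A sketch of that idea (the profile name and rules are assumptions; since the filesystem is stateless, the profile must be reloaded on each boot, e.g. from the startup script):

```shell
# Write a custom AppArmor profile that denies raw socket access
# while still allowing ordinary TCP/UDP networking.
cat <<'EOF' > /tmp/no-raw-net
profile no-raw-net flags=(attach_disconnected) {
  file,
  network inet tcp,
  network inet udp,
  deny network raw,
}
EOF

# Load the profile into the kernel, then run a container under it.
sudo apparmor_parser -r /tmp/no-raw-net
docker run --rm --security-opt apparmor=no-raw-net my-image:latest
```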
Immutable Infrastructure and Deployment Strategies
COS’s design aligns well with immutable infrastructure principles. Since the root filesystem is read-only and instances start fresh on each boot, you can be confident that the environment is consistent across deployments. This reduces the “it works on my machine” problem and simplifies troubleshooting.
For deployment strategies, this means you can adopt patterns like blue-green deployments or rolling updates with greater confidence. By updating your container image and redeploying instances, you ensure that all instances are running the same code in the same environment.
Things do go south on occasion, and for those moments COS ships a `toolbox` utility: a debugging container with access to the host where you can install the usual diagnostic tools. My personal experience, though, is that it's still not production grade, nor really helpful in most cases.
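For completeness, invoking it is straightforward (the host filesystem is typically exposed inside the toolbox under `/media/root`):

```shell
# Start the debugging container on the COS instance.
sudo toolbox

# Inside the toolbox, install common debugging tools.
apt-get update && apt-get install -y strace tcpdump
```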
When to Choose Container-Optimized OS
While COS offers many benefits, it’s important to assess whether it’s the right choice for your specific needs.
Ideal Use Cases
- Avoiding Kubernetes: You want to run containers in a scalable and secure manner without introducing Kubernetes.
- Containerized Applications: If your workloads are already containerized and you don’t require additional software installations on the host OS.
- Security-Conscious Deployments: Environments where security is a top priority, and the immutable filesystem and automatic updates are beneficial.
- Simplified Management: Teams that prefer a managed OS experience with minimal maintenance overhead.
Limitations to Consider
- No Package Manager: You cannot install additional software directly on the OS, which may be a limitation if your containers depend on host-level software.
- Limited Customization: The locked-down nature of COS means less flexibility in modifying the OS environment.
- Not Suitable for Non-Containerized Workloads: If your applications are not containerized, COS is not the appropriate choice.
Example of a COS Instance
This one can easily be turned into a Terraform module.
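A sketch in Terraform (the names, project, service account, and machine type are assumptions):

```hcl
resource "google_compute_instance" "cos_instance" {
  name         = "my-cos-instance"
  machine_type = "e2-small"
  zone         = "europe-west1-b"

  boot_disk {
    initialize_params {
      image = "cos-cloud/cos-stable"
    }
  }

  network_interface {
    network = "default"
    # Grants an ephemeral external IP; remove this block
    # if the instance should be internal-only.
    access_config {}
  }

  metadata = {
    startup-script      = file("startup.sh")
    cos-update-strategy = "update_enabled"
  }

  service_account {
    email  = "my-sa@my-project.iam.gserviceaccount.com"
    scopes = ["cloud-platform"]
  }
}
```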
⚠️ Do not forget to update `access_config {}`.
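For example, to attach a reserved static IP instead of an ephemeral one (the address resource name is an assumption):

```hcl
access_config {
  nat_ip = google_compute_address.static_ip.address
}
```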
And if you structure it in that manner, then it will be a breeze to change startup scripts or metadata depending on the instance:
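A sketch of instantiating such a module (the module path and variable names are assumptions):

```hcl
module "worker" {
  source         = "./modules/cos-instance"
  name           = "worker-1"
  startup_script = file("scripts/worker.sh")
  metadata = {
    ENVIRONMENT = "production"
  }
}

module "api" {
  source         = "./modules/cos-instance"
  name           = "api-1"
  startup_script = file("scripts/api.sh")
  metadata = {
    ENVIRONMENT = "staging"
  }
}
```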
Conclusion
Container-Optimized OS presents a robust, secure, and efficient platform for running containerized applications on Google Cloud. Its design principles of immutability, minimalism, and tight integration with GCP services make it a strong candidate for teams looking to simplify their infrastructure and focus on delivering value through their applications.
As with any technology choice, it’s crucial to evaluate how COS aligns with your application’s requirements and your team’s expertise. For many, it provides a pragmatic solution that balances security, performance, and simplicity—key considerations in today’s fast-paced engineering landscape (rarely do I meet a team that’s not on some Rapid Development cycle).