
Terraform Data Sources in Detail: Practical Patterns and Gotchas

April 7, 2026
Tags: AWS, Cloud, DevOps, Infrastructure as Code, Terraform

Most developers pick up Terraform by creating resources. You define an aws_instance, apply it, and move on. But things get interesting when your infrastructure isn’t fully owned by your Terraform code.

That’s where Terraform data sources come in. They let you query existing infrastructure instead of creating it. Think of them as read-only views into your cloud environment.

First, a quick mental model

If a resource says “create this”, a data source says “find me this”.

Here’s a simple example fetching an existing AWS VPC:

HCL
data "aws_vpc" "main" {
  filter {
    name   = "tag:Name"
    values = ["prod-vpc"]
  }
}

No infrastructure is created here. Terraform simply looks up a VPC matching the filter and exposes its attributes.

Why data sources matter in real projects

In isolation, Terraform is clean. In reality, teams share infrastructure, migrate systems, and integrate across environments.

Common scenarios where data sources shine:

  • Referencing existing VPCs, subnets, or security groups
  • Pulling AMI IDs dynamically
  • Reading outputs from another Terraform state
  • Integrating with manually created resources

Without data sources, you’d either hardcode values or duplicate infrastructure—both bad options.
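
For instance, a security group owned by another team can be looked up by tag rather than copied into your code. A minimal sketch; the tag value "shared-app-sg" is an assumption for illustration:

HCL
# Look up a security group created outside this configuration.
# "shared-app-sg" is a hypothetical tag value.
data "aws_security_group" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-app-sg"]
  }
}

Its attributes (id, arn, and so on) can then be referenced from your resources just like those of any managed object.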

Example: Dynamic AMI lookup

Hardcoding AMI IDs is a classic mistake. They change frequently, and your deployment becomes brittle.

Instead:

HCL
data "aws_ami" "latest_amazon_linux" {
  most_recent = true

  owners = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

Now use it in a resource:

HCL
resource "aws_instance" "app" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t3.micro"
}

This keeps your infrastructure automatically up to date without manual intervention.

Data sources vs resources (where people get confused)

A common mistake developers make is assuming data sources “import” infrastructure into Terraform management. They don’t.

Let’s be clear:

  • Resource: Terraform manages lifecycle (create, update, destroy)
  • Data source: Terraform only reads information

If you delete a resource block, Terraform destroys it. If you delete a data source block, nothing happens to the actual infrastructure.

Chaining data sources with resources

Things get powerful when you combine both.

Example: find a subnet and launch an instance inside it:

HCL
data "aws_subnet" "selected" {
  filter {
    name   = "tag:Name"
    values = ["public-subnet-1"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t3.micro"
  subnet_id     = data.aws_subnet.selected.id
}

This pattern is extremely common in production systems.

Working with remote state

This is where data sources start doing some heavier lifting.

You can use data sources to read outputs from another Terraform project:

HCL
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "my-terraform-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

Then reference outputs:

HCL
subnet_id = data.terraform_remote_state.network.outputs.public_subnet_id

This enables modular infrastructure where different teams manage separate stacks.
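
Note that this only works if the network project actually exports the value. A minimal sketch of the output block that stack would need (the resource name aws_subnet.public is an assumption):

HCL
# In the network project's root module (assumed layout):
output "public_subnet_id" {
  value = aws_subnet.public.id
}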

Filtering strategies that actually work

Filters are where most data source bugs come from.

A few practical tips:

  • Prefer tag-based filters over names
  • Ensure filters return exactly one result (or explicitly handle multiples)
  • Use most_recent = true carefully—a newly published AMI can change your plan without any code change

If your filter unexpectedly matches multiple resources, Terraform fails the plan and tells you to add constraints until exactly one result matches.
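
When multiple matches are legitimate, one approach is to fetch all matching IDs with the plural data source and select from them deliberately. A sketch using aws_subnets; the Tier tag is an assumption:

HCL
# Fetch every subnet tagged for the app tier, then choose explicitly.
data "aws_subnets" "app_tier" {
  filter {
    name   = "tag:Tier"
    values = ["app"]
  }
}

locals {
  # Sorting makes the selection deterministic across plans.
  app_subnet_ids = sort(data.aws_subnets.app_tier.ids)
}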

Performance considerations

Every data source call is an API request. In large configurations, this adds up.

Watch out for:

  • Repeated identical data sources (cache manually using locals)
  • Over-fetching data you don’t use
  • Complex filters hitting slow APIs

A small optimization:

HCL
locals {
  vpc_id = data.aws_vpc.main.id
}

Use local values instead of repeatedly referencing the data source.

Gotchas you’ll probably hit

1. Data source not found

If Terraform can’t find a match, it fails the plan. There’s no “optional” by default.
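
A common workaround is to gate the lookup behind a variable with count, so the data source only runs when you know the object exists. A sketch; the variable name is an assumption:

HCL
variable "existing_vpc_name" {
  type    = string
  default = "" # empty means "don't look anything up"
}

data "aws_vpc" "existing" {
  count = var.existing_vpc_name != "" ? 1 : 0

  filter {
    name   = "tag:Name"
    values = [var.existing_vpc_name]
  }
}

locals {
  # Falls back to null when no lookup was performed.
  vpc_id = length(data.aws_vpc.existing) > 0 ? data.aws_vpc.existing[0].id : null
}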

2. Timing issues

Data sources run during planning. If you’re trying to read something created in the same apply, it may not exist yet.

Solution: reference the resource directly instead of using a data source.
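
In other words, if the VPC is declared in the same configuration, skip the lookup entirely:

HCL
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "app" {
  # Reference the managed resource directly; no data source needed,
  # and Terraform orders creation correctly on the first apply.
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}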

3. Implicit dependencies

Terraform usually infers dependencies, but sometimes you need to be explicit:

HCL
data "aws_subnet" "example" {
  depends_on = [aws_vpc.main]

  filter {
    name   = "vpc-id"
    values = [aws_vpc.main.id]
  }
}

When NOT to use data sources

Data sources are powerful, but not always the right tool.

Avoid them when:

  • You fully control the infrastructure → use resources instead
  • You need lifecycle management
  • You’re introducing unnecessary external dependencies

Overusing data sources can make your configuration harder to reason about.

A simple rule of thumb

If Terraform should own it, use a resource. If Terraform should only read it, use a data source.

That distinction keeps your infrastructure predictable and maintainable.

Wrapping it up

Terraform data sources are the glue between managed and existing infrastructure. They help you avoid hardcoding, integrate across systems, and build flexible configurations.

Used well, they make your setup dynamic and reusable. Used poorly, they introduce hidden dependencies and fragile plans.

The difference usually comes down to one thing: being intentional about what Terraform owns—and what it simply observes.
