Most developers pick up Terraform by creating resources. You define an aws_instance, apply it, and move on. But things get interesting when your infrastructure isn’t fully owned by your Terraform code.
That’s where Terraform data sources come in. They let you query existing infrastructure instead of creating it. Think of them as read-only views into your cloud environment.
First, a quick mental model
If a resource says “create this”, a data source says “find me this”.
Here’s a simple example fetching an existing AWS VPC:
```hcl
data "aws_vpc" "main" {
  filter {
    name   = "tag:Name"
    values = ["prod-vpc"]
  }
}
```

No infrastructure is created here. Terraform simply looks up a VPC matching the filter and exposes its attributes.
Why data sources matter in real projects
In isolation, Terraform is clean. In reality, teams share infrastructure, migrate systems, and integrate across environments.
Common scenarios where data sources shine:
- Referencing existing VPCs, subnets, or security groups
- Pulling AMI IDs dynamically
- Reading outputs from another Terraform state
- Integrating with manually created resources
Without data sources, you’d either hardcode values or duplicate infrastructure—both bad options.
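As a concrete case of the first scenario, here is a hedged sketch of referencing a security group that was created outside Terraform. The tag value, resource names, and placeholder AMI ID are assumptions for illustration:

```hcl
# Hypothetical lookup of a security group created outside Terraform,
# assuming it carries the tag Name = "shared-sg".
data "aws_security_group" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-sg"]
  }
}

# The looked-up ID then feeds a managed resource.
resource "aws_instance" "api" {
  ami                    = "ami-0123456789abcdef0" # placeholder
  instance_type          = "t3.micro"
  vpc_security_group_ids = [data.aws_security_group.shared.id]
}
```

The security group stays unmanaged; Terraform only reads its ID at plan time.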
Example: Dynamic AMI lookup
Hardcoding AMI IDs is a classic mistake: they differ per region, and Amazon regularly publishes new images, so a pinned ID goes stale and your deployment becomes brittle.
Instead:
```hcl
data "aws_ami" "latest_amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}
```

Now use it in a resource:
```hcl
resource "aws_instance" "app" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t3.micro"
}
```

This keeps your AMI reference current without manual intervention.
Data sources vs resources (where people get confused)
A common mistake developers make is assuming data sources “import” infrastructure into Terraform management. They don’t.
Let’s be clear:
- Resource: Terraform manages lifecycle (create, update, destroy)
- Data source: Terraform only reads information
If you delete a resource block, Terraform destroys it. If you delete a data source block, nothing happens to the actual infrastructure.
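The distinction shows up directly in the configuration. A minimal side-by-side sketch (the names and tag value are hypothetical):

```hcl
# Managed: Terraform creates, updates, and can destroy this VPC.
resource "aws_vpc" "managed" {
  cidr_block = "10.0.0.0/16"
}

# Read-only: Terraform only looks this VPC up. Removing this block
# removes the lookup from state but leaves the VPC itself untouched.
data "aws_vpc" "observed" {
  filter {
    name   = "tag:Name"
    values = ["legacy-vpc"] # hypothetical tag
  }
}
```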
Chaining data sources with resources
Things get powerful when you combine both.
Example: find a subnet and launch an instance inside it:
```hcl
data "aws_subnet" "selected" {
  filter {
    name   = "tag:Name"
    values = ["public-subnet-1"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t3.micro"
  subnet_id     = data.aws_subnet.selected.id
}
```

This pattern is extremely common in production systems.
Working with remote state
Here’s where data sources really start to pay off.
You can use data sources to read outputs from another Terraform project:
```hcl
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "my-terraform-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}
```

Then reference its outputs:

```hcl
subnet_id = data.terraform_remote_state.network.outputs.public_subnet_id
```

This enables modular infrastructure where different teams manage separate stacks.
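For this to work, the network project has to export the value as an output. A sketch of the producer side, assuming the network project manages a subnet named `public` (both names are assumptions chosen to match the reference above):

```hcl
# In the network project's configuration:
output "public_subnet_id" {
  value       = aws_subnet.public.id # assumes a subnet resource named "public"
  description = "Exposed for consumers via terraform_remote_state"
}
```

Only values declared as outputs are visible to consumers; everything else in the state stays private to the producing project.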
Filtering strategies that actually work
Filters are where most data source bugs come from.
A few practical tips:
- Prefer tag-based filters over names
- Ensure filters return exactly one result (or explicitly handle multiples)
- Use most_recent = true carefully—it can introduce drift
If your filter matches multiple resources unexpectedly, Terraform will fail with a vague error.
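When multiple matches are legitimate, the plural data sources let you handle them explicitly instead of letting a singular lookup fail. A hedged sketch using `aws_subnets`; the tag scheme is an assumption:

```hcl
# Several subnets may match: use the plural data source and
# choose deliberately.
data "aws_subnets" "public" {
  filter {
    name   = "tag:Tier" # hypothetical tag scheme
    values = ["public"]
  }
}

locals {
  # e.g. pick one match, or spread resources across all of them
  first_public_subnet = data.aws_subnets.public.ids[0]
}
```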
Performance considerations
Every data source call is an API request. In large configurations, this adds up.
Watch out for:
- Repeated identical data sources (cache manually using locals)
- Over-fetching data you don’t use
- Complex filters hitting slow APIs
A small optimization:
```hcl
locals {
  vpc_id = data.aws_vpc.main.id
}
```

Use local values instead of repeatedly referencing the data source.
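With the local in place, downstream resources reference `local.vpc_id` rather than the data source attribute, which keeps the lookup declared in exactly one place. The security groups below are hypothetical consumers:

```hcl
# Hypothetical resources sharing the cached value; only the local
# knows where the VPC ID actually comes from.
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = local.vpc_id
}

resource "aws_security_group" "db" {
  name   = "db-sg"
  vpc_id = local.vpc_id
}
```

This also makes it easy to swap the lookup for a variable or a resource attribute later without touching every consumer.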
Gotchas you’ll probably hit
1. Data source not found
If Terraform can’t find a match, it fails the plan. There’s no “optional” by default.
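A common workaround is to gate the lookup with `count` (data sources accept `count` in Terraform 0.13+), so the plan can proceed when the resource is known to be absent. A sketch under that assumption; the variable name is made up:

```hcl
variable "use_existing_vpc" {
  type    = bool
  default = false
}

# Looked up only when the flag is set; otherwise the data block
# produces an empty list and no lookup is attempted.
data "aws_vpc" "existing" {
  count = var.use_existing_vpc ? 1 : 0

  filter {
    name   = "tag:Name"
    values = ["prod-vpc"]
  }
}

locals {
  # one() returns the single element, or null when the list is empty
  vpc_id = one(data.aws_vpc.existing[*].id)
}
```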
2. Timing issues
Data sources run during planning. If you’re trying to read something created in the same apply, it may not exist yet.
Solution: reference the resource directly instead of using a data source.
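In other words, when both sides live in the same configuration, skip the lookup entirely:

```hcl
# Created and consumed in the same apply: reference the resource
# attribute directly rather than looking it up with a data source.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "public" {
  vpc_id     = aws_vpc.main.id # direct reference, no lookup race
  cidr_block = "10.0.1.0/24"
}
```

The direct reference also gives Terraform the dependency ordering for free.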
3. Implicit dependencies
Terraform usually infers dependencies, but sometimes you need to be explicit:
```hcl
data "aws_subnet" "example" {
  depends_on = [aws_vpc.main]

  filter {
    name   = "vpc-id"
    values = [aws_vpc.main.id]
  }
}
```

When NOT to use data sources
Data sources are powerful, but not always the right tool.
Avoid them when:
- You fully control the infrastructure → use resources instead
- You need lifecycle management
- You’re introducing unnecessary external dependencies
Overusing data sources can make your configuration harder to reason about.
A simple rule of thumb
If Terraform should own it, use a resource. If Terraform should only read it, use a data source.
That distinction keeps your infrastructure predictable and maintainable.
Wrapping it up
Terraform data sources are the glue between managed and existing infrastructure. They help you avoid hardcoding, integrate across systems, and build flexible configurations.
Used well, they make your setup dynamic and reusable. Used poorly, they introduce hidden dependencies and fragile plans.
The difference usually comes down to one thing: being intentional about what Terraform owns—and what it simply observes.