Most developers pick up Terraform by creating resources. You define an aws_instance, apply it, and move on. But things get interesting when your infrastructure isn’t fully owned by your Terraform code.
That’s where Terraform data sources come in. They let you query existing infrastructure instead of creating it. Think of them as read-only views into your cloud environment.
First, a quick mental model
If a resource says “create this”, a data source says “find me this”.
Here’s a simple example fetching an existing AWS VPC:
```hcl
data "aws_vpc" "main" {
  filter {
    name   = "tag:Name"
    values = ["prod-vpc"]
  }
}
```

No infrastructure is created here. Terraform simply looks up a VPC matching the filter and exposes its attributes.
Why data sources matter in real projects
In isolation, Terraform is clean. In reality, teams share infrastructure, migrate systems, and integrate across environments.
Common scenarios where data sources shine:
- Referencing existing VPCs, subnets, or security groups
- Pulling AMI IDs dynamically
- Reading outputs from another Terraform state
- Integrating with manually created resources
Without data sources, you’d either hardcode values or duplicate infrastructure—both bad options.
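As a concrete case of the first scenario, here is a hedged sketch of referencing a security group that was created outside Terraform. The tag value, resource names, and placeholder AMI ID are assumptions for illustration:

```hcl
# Hypothetical lookup of a security group created outside Terraform,
# assuming it carries the tag Name = "shared-sg".
data "aws_security_group" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-sg"]
  }
}

# The looked-up ID then feeds a managed resource.
resource "aws_instance" "api" {
  ami                    = "ami-0123456789abcdef0" # placeholder
  instance_type          = "t3.micro"
  vpc_security_group_ids = [data.aws_security_group.shared.id]
}
```

The security group stays unmanaged; Terraform only reads its ID at plan time.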
Example: Dynamic AMI lookup
Hardcoding AMI IDs is a classic mistake: they differ per region, and Amazon regularly publishes new images, so a pinned ID goes stale and your deployment becomes brittle.
Instead:
```hcl
data "aws_ami" "latest_amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}
```

Now use it in a resource:
```hcl
resource "aws_instance" "app" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t3.micro"
}
```

This keeps your AMI reference current without manual intervention.
Data sources vs resources (where people get confused)
A common mistake developers make is assuming data sources “import” infrastructure into Terraform management. They don’t.
Let’s be clear:
- Resource: Terraform manages lifecycle (create, update, destroy)
- Data source: Terraform only reads information
If you delete a resource block, Terraform destroys it. If you delete a data source block, nothing happens to the actual infrastructure.
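The distinction shows up directly in the configuration. A minimal side-by-side sketch (the names and tag value are hypothetical):

```hcl
# Managed: Terraform creates, updates, and can destroy this VPC.
resource "aws_vpc" "managed" {
  cidr_block = "10.0.0.0/16"
}

# Read-only: Terraform only looks this VPC up. Removing this block
# removes the lookup from state but leaves the VPC itself untouched.
data "aws_vpc" "observed" {
  filter {
    name   = "tag:Name"
    values = ["legacy-vpc"] # hypothetical tag
  }
}
```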
Chaining data sources with resources
Things get powerful when you combine both.
Example: find a subnet and launch an instance inside it:
```hcl
data "aws_subnet" "selected" {
  filter {
    name   = "tag:Name"
    values = ["public-subnet-1"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t3.micro"
  subnet_id     = data.aws_subnet.selected.id
}
```

This pattern is extremely common in production systems.
Working with remote state
Here’s where data sources really start to pay off.
You can use data sources to read outputs from another Terraform project:
```hcl
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "my-terraform-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}
```

Then reference its outputs:

```hcl
subnet_id = data.terraform_remote_state.network.outputs.public_subnet_id
```

This enables modular infrastructure where different teams manage separate stacks.
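For this to work, the network project has to export the value as an output. A sketch of the producer side, assuming the network project manages a subnet named `public` (both names are assumptions chosen to match the reference above):

```hcl
# In the network project's configuration:
output "public_subnet_id" {
  value       = aws_subnet.public.id # assumes a subnet resource named "public"
  description = "Exposed for consumers via terraform_remote_state"
}
```

Only values declared as outputs are visible to consumers; everything else in the state stays private to the producing project.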
Filtering strategies that actually work
Filters are where most data source bugs come from.
A few practical tips:
- Prefer tag-based filters over names
- Ensure filters return exactly one result (or explicitly handle multiples)
- Use most_recent = true carefully—it can introduce drift
If your filter matches multiple resources unexpectedly, Terraform will fail with a vague error.
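When multiple matches are legitimate, the plural data sources let you handle them explicitly instead of letting a singular lookup fail. A hedged sketch using `aws_subnets`; the tag scheme is an assumption:

```hcl
# Several subnets may match: use the plural data source and
# choose deliberately.
data "aws_subnets" "public" {
  filter {
    name   = "tag:Tier" # hypothetical tag scheme
    values = ["public"]
  }
}

locals {
  # e.g. pick one match, or spread resources across all of them
  first_public_subnet = data.aws_subnets.public.ids[0]
}
```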
Performance considerations
Every data source call is an API request. In large configurations, this adds up.
Watch out for:
- Repeated identical data sources (cache manually using locals)
- Over-fetching data you don’t use
- Complex filters hitting slow APIs
A small optimization:
```hcl
locals {
  vpc_id = data.aws_vpc.main.id
}
```

Use local values instead of repeatedly referencing the data source.
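With the local in place, downstream resources reference `local.vpc_id` rather than the data source attribute, which keeps the lookup declared in exactly one place. The security groups below are hypothetical consumers:

```hcl
# Hypothetical resources sharing the cached value; only the local
# knows where the VPC ID actually comes from.
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = local.vpc_id
}

resource "aws_security_group" "db" {
  name   = "db-sg"
  vpc_id = local.vpc_id
}
```

This also makes it easy to swap the lookup for a variable or a resource attribute later without touching every consumer.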
Gotchas you’ll probably hit
1. Data source not found
If Terraform can’t find a match, it fails the plan. There’s no “optional” by default.
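A common workaround is to gate the lookup with `count` (data sources accept `count` in Terraform 0.13+), so the plan can proceed when the resource is known to be absent. A sketch under that assumption; the variable name is made up:

```hcl
variable "use_existing_vpc" {
  type    = bool
  default = false
}

# Looked up only when the flag is set; otherwise the data block
# produces an empty list and no lookup is attempted.
data "aws_vpc" "existing" {
  count = var.use_existing_vpc ? 1 : 0

  filter {
    name   = "tag:Name"
    values = ["prod-vpc"]
  }
}

locals {
  # one() returns the single element, or null when the list is empty
  vpc_id = one(data.aws_vpc.existing[*].id)
}
```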
2. Timing issues
Data sources run during planning. If you’re trying to read something created in the same apply, it may not exist yet.
Solution: reference the resource directly instead of using a data source.
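In other words, when both sides live in the same configuration, skip the lookup entirely:

```hcl
# Created and consumed in the same apply: reference the resource
# attribute directly rather than looking it up with a data source.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "public" {
  vpc_id     = aws_vpc.main.id # direct reference, no lookup race
  cidr_block = "10.0.1.0/24"
}
```

The direct reference also gives Terraform the dependency ordering for free.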
3. Implicit dependencies
Terraform usually infers dependencies, but sometimes you need to be explicit:
```hcl
data "aws_subnet" "example" {
  depends_on = [aws_vpc.main]

  filter {
    name   = "vpc-id"
    values = [aws_vpc.main.id]
  }
}
```

When NOT to use data sources
Data sources are powerful, but not always the right tool.
Avoid them when:
- You fully control the infrastructure → use resources instead
- You need lifecycle management
- You’re introducing unnecessary external dependencies
Overusing data sources can make your configuration harder to reason about.
A simple rule of thumb
If Terraform should own it, use a resource. If Terraform should only read it, use a data source.
That distinction keeps your infrastructure predictable and maintainable.
Wrapping it up
Terraform data sources are the glue between managed and existing infrastructure. They help you avoid hardcoding, integrate across systems, and build flexible configurations.
Used well, they make your setup dynamic and reusable. Used poorly, they introduce hidden dependencies and fragile plans.
The difference usually comes down to one thing: being intentional about what Terraform owns—and what it simply observes.