
Chapter 13: Workflows

The Problem

You've built segments, journeys, and queries. But someone still needs to run the daily data refresh. Someone needs to execute the weekly report. Someone needs to coordinate the data pipeline that feeds your segments.

That someone shouldn't be you clicking buttons every morning.

The Key Idea

Core concept

Workflows automate recurring tasks. AI writes workflow definitions that run on schedule without human intervention.

Think of workflows as the invisible machinery that keeps your marketing operations running—data flowing, tables updating, reports generating.

What Workflows Do

Common marketing workflows:

| Task | Frequency | What Happens |
| --- | --- | --- |
| Update customer metrics | Daily | Run SQL to calculate latest values |
| Prepare segment data | Daily | Transform raw data into segment-ready tables |
| Generate reports | Weekly | Calculate metrics, create summary tables |
| Archive old data | Monthly | Move historical records |
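
To make the last row concrete, here is a minimal sketch of a monthly archive workflow; the queries/archive_old_data.sql file and orders_archive table are hypothetical, and cron scheduling is covered under Scheduling Options below:

yaml
timezone: America/New_York

schedule:
  cron>: 0 2 1 * *  # 2:00 AM on the 1st of each month

+archive_old_data:
  td>: queries/archive_old_data.sql  # hypothetical SQL selecting records past the cutoff
  insert_into: orders_archive        # hypothetical destination table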

The Workflow Pattern

Just like segments and journeys, workflows follow pull → edit → push:

bash
# Pull existing workflow project
tdx wf pull myproject

# Edit locally, then push changes
tdx wf push myproject

# Run the workflow manually
tdx wf run myproject

Creating Your First Workflow

Tell AI what you need automated:

> "Create a workflow that updates customer metrics every morning at 6am"

AI generates a workflow file (workflows/daily-metrics/daily-metrics.dig):

yaml
timezone: America/New_York

schedule:
  daily>: 06:00:00

_export:
  td:
    database: marketing
    engine: presto

+update_customer_metrics:
  td>: queries/update_customer_metrics.sql
  create_table: customer_metrics

Understanding the Workflow

yaml
timezone: America/New_York

All times interpreted in this timezone.

yaml
schedule:
  daily>: 06:00:00

Run every day at 6:00 AM.

yaml
_export:
  td:
    database: marketing
    engine: presto

Default settings for all td> tasks in this workflow.
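
Individual tasks can override these defaults; the td> operator accepts its own database: and engine: settings. A minimal sketch (the analytics database and queries/heavy_rollup.sql file are hypothetical):

yaml
+heavy_rollup:
  td>: queries/heavy_rollup.sql
  database: analytics  # overrides the _export default for this task only
  engine: hive         # run just this one query on Hive instead of Presto
  create_table: rollup_results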

yaml
+update_customer_metrics:
  td>: queries/update_customer_metrics.sql
  create_table: customer_metrics

A task that runs a SQL file and creates/replaces a table with the results.

The td> Operator

The td> operator runs SQL queries. It's the core of TD workflows:

yaml
# Run SQL from a file
+from_file:
  td>: queries/analysis.sql
  create_table: results

# Run inline SQL
+inline_query:
  td>:
    query: |
      SELECT
        customer_id,
        SUM(amount) as total_spent
      FROM orders
      WHERE TD_TIME_RANGE(time, '${session_date}', NULL)
      GROUP BY 1
  insert_into: daily_totals

Output options:

  • create_table: — Replace table with results
  • insert_into: — Append results to existing table
  • store_last_results: true — Store results for conditional logic (see the sketch below)
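
store_last_results: true exposes the first row of the query output to later tasks as ${td.last_results.*}. A minimal sketch, using Digdag's echo> operator to print the stored value:

yaml
+count_orders:
  td>:
    query: SELECT COUNT(*) AS cnt FROM orders
  store_last_results: true

+report_count:
  echo>: "Orders counted: ${td.last_results.cnt}"

The Conditional Execution section below builds on this same mechanism.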

Task Sequences

Tasks run in order. Later tasks wait for earlier ones:

yaml
+step_1_extract:
  td>: queries/extract_orders.sql
  create_table: orders_staging

+step_2_transform:
  td>: queries/calculate_metrics.sql
  create_table: customer_metrics

+step_3_aggregate:
  td>: queries/build_summary.sql
  create_table: daily_summary

Step 2 waits for step 1. Step 3 waits for step 2.

Parallel Tasks

Run independent tasks simultaneously:

yaml
+calculate_metrics:
  _parallel: true

  +revenue_metrics:
    td>: queries/revenue_metrics.sql
    create_table: revenue_metrics

  +engagement_metrics:
    td>: queries/engagement_metrics.sql
    create_table: engagement_metrics

  +product_metrics:
    td>: queries/product_metrics.sql
    create_table: product_metrics

+combine_results:
  td>: queries/combine_metrics.sql
  create_table: all_metrics

All three metrics queries run at the same time; combine_results starts only after every task in the group has finished.
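
If running everything at once would queue too many jobs, recent Digdag releases also let _parallel take a concurrency limit. A minimal sketch, assuming your Digdag version supports the limit form:

yaml
+calculate_metrics:
  _parallel:
    limit: 2  # run at most two child tasks at a time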

Scheduling Options

Daily:

yaml
schedule:
  daily>: 06:00:00

Hourly:

yaml
schedule:
  hourly>: 00:00

Weekly:

yaml
schedule:
  weekly>: Mon,06:00:00

Multiple times per day:

yaml
schedule:
  cron>: 0 6,12,18 * * *
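  # fields: minute hour day-of-month month day-of-week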

Specific days:

yaml
schedule:
  cron>: 0 9 1,15 * *  # 9:00 AM on the 1st and 15th of each month

Session Variables

Built-in variables for dates:

yaml
+daily_report:
  td>:
    query: |
      SELECT *
      FROM events
      WHERE TD_TIME_RANGE(time, '${session_date}', TD_TIME_ADD('${session_date}', '1d'))
  create_table: daily_events_${session_date_compact}

| Variable | Example Value | Use Case |
| --- | --- | --- |
| ${session_date} | 2024-12-15 | Date filtering |
| ${session_time} | 2024-12-15T06:00:00+00:00 | Exact timestamp |
| ${session_date_compact} | 20241215 | Table naming |
| ${last_session_date} | 2024-12-14 | Previous run date |
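
${last_session_date} pairs naturally with TD_TIME_RANGE for incremental processing: each run picks up only the rows that arrived since the previous run. A minimal sketch (the new_events and events_processed tables are hypothetical):

yaml
+incremental_load:
  td>:
    query: |
      SELECT *
      FROM new_events
      WHERE TD_TIME_RANGE(time, '${last_session_date}', '${session_date}')
  insert_into: events_processed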

Variables and Parameters

Make workflows reusable:

yaml
_export:
  td:
    database: marketing
  target_days: 30

+recent_customers:
  td>:
    query: |
      SELECT *
      FROM customers
      WHERE last_purchase_date >= DATE_ADD('day', -${target_days}, CURRENT_DATE)
  create_table: recent_customers

Run with different parameters:

bash
tdx wf run daily-metrics --param target_days=90

Error Handling

What happens when something fails?

yaml
+critical_update:
  td>: queries/update_metrics.sql
  create_table: customer_metrics
  _retry: 3

  _error:
    +log_failure:
      td>:
        query: |
          INSERT INTO workflow_errors
          VALUES ('${session_id}', 'critical_update failed', CURRENT_TIMESTAMP)

If critical_update fails after 3 retries, the _error block runs.

Retry Logic

Automatically retry transient failures:

yaml
+api_dependent_query:
  _retry:
    limit: 3
    interval: 60  # seconds between attempts

  td>: queries/external_data.sql
  create_table: external_metrics

Retry up to 3 times after the initial failure, waiting 60 seconds between attempts.

Exponential backoff:

yaml
+flaky_task:
  _retry:
    limit: 5
    interval: 30
    interval_type: exponential
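    # doubles each retry: waits 30, 60, 120, 240, 480 seconds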

  td>: queries/large_query.sql

Conditional Execution

Run tasks based on query results:

yaml
+check_data:
  td>:
    query: |
      SELECT COUNT(*) as cnt FROM new_orders WHERE date = '${session_date}'
  store_last_results: true

+process_if_data:
  if>: ${td.last_results.cnt > 0}
  _do:
    +process_orders:
      td>: queries/process_orders.sql
      create_table: processed_orders
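
Recent Digdag versions also support an _else_do: branch on if>, which runs when the condition is false. A sketch, assuming your version supports it (the echo> message is illustrative):

yaml
+process_if_data:
  if>: ${td.last_results.cnt > 0}
  _do:
    +process_orders:
      td>: queries/process_orders.sql
      create_table: processed_orders
  _else_do:
    +note_empty_day:
      echo>: "No new orders for ${session_date}; skipping processing"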

A Complete Marketing Workflow

Here's a realistic daily workflow:

yaml
timezone: America/New_York

schedule:
  daily>: 05:00:00

_export:
  td:
    database: marketing
    engine: presto

# Step 1: Update base metrics
+update_metrics:
  td>: queries/update_customer_metrics.sql
  create_table: customer_metrics

# Step 2: Prepare segment data (parallel)
+prepare_segments:
  _parallel: true

  +high_value_data:
    td>: queries/high_value_customers.sql
    create_table: segment_high_value

  +recent_buyers_data:
    td>: queries/recent_buyers.sql
    create_table: segment_recent_buyers

  +at_risk_data:
    td>: queries/at_risk_customers.sql
    create_table: segment_at_risk

# Step 3: Build summary report
+daily_summary:
  td>:
    query: |
      SELECT
        '${session_date}' as report_date,
        (SELECT COUNT(*) FROM segment_high_value) as high_value_count,
        (SELECT COUNT(*) FROM segment_recent_buyers) as recent_buyers_count,
        (SELECT COUNT(*) FROM segment_at_risk) as at_risk_count,
        CURRENT_TIMESTAMP as generated_at
  insert_into: daily_reports

# Global error handler
_error:
  +log_error:
    td>:
      query: |
        INSERT INTO workflow_errors
        VALUES ('daily-marketing', '${session_id}', '${task_name}', CURRENT_TIMESTAMP)

Managing Workflows

List workflow projects:

bash
tdx wf projects

Pull a project to edit locally:

bash
tdx wf pull daily-metrics
# Creates: workflows/daily-metrics/

Push changes back:

bash
tdx wf push daily-metrics

View sessions and task status:

bash
tdx wf sessions daily-metrics
tdx wf attempt <attempt-id> tasks

View logs for a specific task:

bash
tdx wf attempt <attempt-id> logs +update_metrics

Run manually:

bash
tdx wf run daily-metrics

Mental Model: Assembly Line

Think of workflows as an assembly line:

[Raw Data] → [Extract] → [Transform] → [Load] → [Ready for Segments]

Each station does one job. The line runs automatically on schedule. If a station breaks, the line stops and logs the error.

Pitfalls

"The workflow didn't run."

Check the schedule and timezone:

> "When is daily-metrics scheduled to run next?"

"A task is taking too long."

Large queries might need optimization. Check execution time:

bash
tdx wf attempt <attempt-id> tasks

"The query failed with a timeout."

Add retry logic, and consider breaking the work into smaller queries (see the sketch after this example):

yaml
+large_query:
  _retry: 2
  td>: queries/optimized_query.sql
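
Splitting the work might look like the sketch below: stage an intermediate table first, then aggregate from it, so each query does less (both SQL files are hypothetical):

yaml
+stage_raw_orders:
  _retry: 2
  td>: queries/stage_raw_orders.sql
  create_table: orders_staged

+aggregate_staged:
  _retry: 2
  td>: queries/aggregate_from_staged.sql
  create_table: order_metrics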

What You've Learned

  • Workflows automate recurring SQL tasks
  • The td> operator runs queries and creates tables
  • Tasks run sequentially or in parallel with _parallel: true
  • Schedules can be daily, hourly, weekly, or custom cron
  • Session variables like ${session_date} make queries dynamic
  • Error handling with _error blocks prevents silent failures
  • Retry logic handles transient issues

Next Step

Your data pipelines run automatically. Chapter 14 shows you how to build AI agents—giving your team members AI-powered assistants that can answer questions about customer data without needing Claude Code.


Your workflows are automated. Next, you'll build AI agents for your team.