
Chapter 13: Workflows

The Problem

You've built segments, journeys, and queries. But someone still needs to run the daily data refresh. Someone needs to execute the weekly report. Someone needs to coordinate the data pipeline that feeds your segments.

That someone shouldn't be you clicking buttons every morning.

The Key Idea

Core concept

Workflows automate recurring tasks. AI writes workflow definitions that run on schedule without human intervention.

Think of workflows as the invisible machinery that keeps your marketing operations running—data flowing, tables updating, reports generating.

What Workflows Do

Common marketing workflows:

| Task | Frequency | What Happens |
| --- | --- | --- |
| Update customer metrics | Daily | Run SQL to calculate latest values |
| Prepare segment data | Daily | Transform raw data into segment-ready tables |
| Generate reports | Weekly | Calculate metrics, create summary tables |
| Archive old data | Monthly | Move historical records |
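
To make the last row concrete, here is a minimal sketch of a monthly archive workflow; the queries/archive_old_data.sql file and orders_archive table are hypothetical, and cron scheduling is covered under Scheduling Options below:

yaml
timezone: America/New_York

schedule:
  cron>: 0 2 1 * *  # 2:00 AM on the 1st of each month

+archive_old_data:
  td>: queries/archive_old_data.sql  # hypothetical SQL selecting records past the cutoff
  insert_into: orders_archive        # hypothetical destination table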

The Workflow Pattern

Just like segments and journeys, workflows follow pull → edit → push:

bash
# Pull existing workflow project
tdx wf pull myproject

# Edit locally, then push changes
tdx wf push myproject

# Run the workflow manually
tdx wf run myproject

Creating Your First Workflow

Tell AI what you need automated:

> "Create a workflow that updates customer metrics every morning at 6am"

AI generates a workflow file (workflows/daily-metrics/daily-metrics.dig):

yaml
timezone: America/New_York

schedule:
  daily>: 06:00:00

_export:
  td:
    database: marketing
    engine: presto

+update_customer_metrics:
  td>: queries/update_customer_metrics.sql
  create_table: customer_metrics

Understanding the Workflow

yaml
timezone: America/New_York

All times interpreted in this timezone.

yaml
schedule:
  daily>: 06:00:00

Run every day at 6:00 AM.

yaml
_export:
  td:
    database: marketing
    engine: presto

Default settings for all td> tasks in this workflow.
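
Individual tasks can override these defaults; the td> operator accepts its own database: and engine: settings. A minimal sketch (the analytics database and queries/heavy_rollup.sql file are hypothetical):

yaml
+heavy_rollup:
  td>: queries/heavy_rollup.sql
  database: analytics  # overrides the _export default for this task only
  engine: hive         # run just this one query on Hive instead of Presto
  create_table: rollup_results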

yaml
+update_customer_metrics:
  td>: queries/update_customer_metrics.sql
  create_table: customer_metrics

A task that runs a SQL file and creates/replaces a table with the results.

The td> Operator

The td> operator runs SQL queries. It's the core of TD workflows:

yaml
# Run SQL from a file
+from_file:
  td>: queries/analysis.sql
  create_table: results

# Run inline SQL
+inline_query:
  td>:
    query: |
      SELECT
        customer_id,
        SUM(amount) as total_spent
      FROM orders
      WHERE TD_TIME_RANGE(time, '${session_date}', NULL)
      GROUP BY 1
  insert_into: daily_totals

Output options:

  • create_table: — Replace table with results
  • insert_into: — Append results to existing table
  • store_last_results: true — Store results for conditional logic (see the sketch below)
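
store_last_results: true exposes the first row of the query output to later tasks as ${td.last_results.*}. A minimal sketch, using Digdag's echo> operator to print the stored value:

yaml
+count_orders:
  td>:
    query: SELECT COUNT(*) AS cnt FROM orders
  store_last_results: true

+report_count:
  echo>: "Orders counted: ${td.last_results.cnt}"

The Conditional Execution section below builds on this same mechanism.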

Task Sequences

Tasks run in order. Later tasks wait for earlier ones:

yaml
+step_1_extract:
  td>: queries/extract_orders.sql
  create_table: orders_staging

+step_2_transform:
  td>: queries/calculate_metrics.sql
  create_table: customer_metrics

+step_3_aggregate:
  td>: queries/build_summary.sql
  create_table: daily_summary

Step 2 waits for step 1. Step 3 waits for step 2.

Parallel Tasks

Run independent tasks simultaneously:

yaml
+calculate_metrics:
  _parallel: true

  +revenue_metrics:
    td>: queries/revenue_metrics.sql
    create_table: revenue_metrics

  +engagement_metrics:
    td>: queries/engagement_metrics.sql
    create_table: engagement_metrics

  +product_metrics:
    td>: queries/product_metrics.sql
    create_table: product_metrics

+combine_results:
  td>: queries/combine_metrics.sql
  create_table: all_metrics

All three metrics queries run at the same time; combine_results starts only after every task in the group has finished.
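
If running everything at once would queue too many jobs, recent Digdag releases also let _parallel take a concurrency limit. A minimal sketch, assuming your Digdag version supports the limit form:

yaml
+calculate_metrics:
  _parallel:
    limit: 2  # run at most two child tasks at a time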

Scheduling Options

Daily:

yaml
schedule:
  daily>: 06:00:00

Hourly:

yaml
schedule:
  hourly>: 00:00

Weekly:

yaml
schedule:
  weekly>: Mon,06:00:00

Multiple times per day:

yaml
schedule:
  cron>: 0 6,12,18 * * *
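  # fields: minute hour day-of-month month day-of-week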

Specific days:

yaml
schedule:
  cron>: 0 9 1,15 * *  # 9:00 AM on the 1st and 15th of each month

Session Variables

Built-in variables for dates:

yaml
+daily_report:
  td>:
    query: |
      SELECT *
      FROM events
      WHERE TD_TIME_RANGE(time, '${session_date}', TD_TIME_ADD('${session_date}', '1d'))
  create_table: daily_events_${session_date_compact}

| Variable | Example Value | Use Case |
| --- | --- | --- |
| ${session_date} | 2024-12-15 | Date filtering |
| ${session_time} | 2024-12-15T06:00:00+00:00 | Exact timestamp |
| ${session_date_compact} | 20241215 | Table naming |
| ${last_session_date} | 2024-12-14 | Previous run date |
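
${last_session_date} pairs naturally with TD_TIME_RANGE for incremental processing: each run picks up only the rows that arrived since the previous run. A minimal sketch (the new_events and events_processed tables are hypothetical):

yaml
+incremental_load:
  td>:
    query: |
      SELECT *
      FROM new_events
      WHERE TD_TIME_RANGE(time, '${last_session_date}', '${session_date}')
  insert_into: events_processed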

Variables and Parameters

Make workflows reusable:

yaml
_export:
  td:
    database: marketing
  target_days: 30

+recent_customers:
  td>:
    query: |
      SELECT *
      FROM customers
      WHERE last_purchase_date >= DATE_ADD('day', -${target_days}, CURRENT_DATE)
  create_table: recent_customers

Run with different parameters:

bash
tdx wf run daily-metrics --param target_days=90

Error Handling

What happens when something fails?

yaml
+critical_update:
  td>: queries/update_metrics.sql
  create_table: customer_metrics
  _retry: 3

  _error:
    +log_failure:
      td>:
        query: |
          INSERT INTO workflow_errors
          VALUES ('${session_id}', 'critical_update failed', CURRENT_TIMESTAMP)

If critical_update fails after 3 retries, the _error block runs.

Retry Logic

Automatically retry transient failures:

yaml
+api_dependent_query:
  _retry:
    limit: 3
    interval: 60  # seconds between attempts

  td>: queries/external_data.sql
  create_table: external_metrics

Retry up to 3 times after the initial failure, waiting 60 seconds between attempts.

Exponential backoff:

yaml
+flaky_task:
  _retry:
    limit: 5
    interval: 30
    interval_type: exponential
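    # doubles each retry: waits 30, 60, 120, 240, 480 seconds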

  td>: queries/large_query.sql

Conditional Execution

Run tasks based on query results:

yaml
+check_data:
  td>:
    query: |
      SELECT COUNT(*) as cnt FROM new_orders WHERE date = '${session_date}'
  store_last_results: true

+process_if_data:
  if>: ${td.last_results.cnt > 0}
  _do:
    +process_orders:
      td>: queries/process_orders.sql
      create_table: processed_orders
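
Recent Digdag versions also support an _else_do: branch on if>, which runs when the condition is false. A sketch, assuming your version supports it (the echo> message is illustrative):

yaml
+process_if_data:
  if>: ${td.last_results.cnt > 0}
  _do:
    +process_orders:
      td>: queries/process_orders.sql
      create_table: processed_orders
  _else_do:
    +note_empty_day:
      echo>: "No new orders for ${session_date}; skipping processing"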

A Complete Marketing Workflow

Here's a realistic daily workflow:

yaml
timezone: America/New_York

schedule:
  daily>: 05:00:00

_export:
  td:
    database: marketing
    engine: presto

# Step 1: Update base metrics
+update_metrics:
  td>: queries/update_customer_metrics.sql
  create_table: customer_metrics

# Step 2: Prepare segment data (parallel)
+prepare_segments:
  _parallel: true

  +high_value_data:
    td>: queries/high_value_customers.sql
    create_table: segment_high_value

  +recent_buyers_data:
    td>: queries/recent_buyers.sql
    create_table: segment_recent_buyers

  +at_risk_data:
    td>: queries/at_risk_customers.sql
    create_table: segment_at_risk

# Step 3: Build summary report
+daily_summary:
  td>:
    query: |
      SELECT
        '${session_date}' as report_date,
        (SELECT COUNT(*) FROM segment_high_value) as high_value_count,
        (SELECT COUNT(*) FROM segment_recent_buyers) as recent_buyers_count,
        (SELECT COUNT(*) FROM segment_at_risk) as at_risk_count,
        CURRENT_TIMESTAMP as generated_at
  insert_into: daily_reports

# Global error handler
_error:
  +log_error:
    td>:
      query: |
        INSERT INTO workflow_errors
        VALUES ('daily-marketing', '${session_id}', '${task_name}', CURRENT_TIMESTAMP)

Managing Workflows

List workflow projects:

bash
tdx wf projects

Pull a project to edit locally:

bash
tdx wf pull daily-metrics
# Creates: workflows/daily-metrics/

Push changes back:

bash
tdx wf push daily-metrics

View sessions and task status:

bash
tdx wf sessions daily-metrics
tdx wf attempt <attempt-id> tasks

View logs for a specific task:

bash
tdx wf attempt <attempt-id> logs +update_metrics

Run manually:

bash
tdx wf run daily-metrics

Mental Model: Assembly Line

Think of workflows as an assembly line:

[Raw Data] → [Extract] → [Transform] → [Load] → [Ready for Segments]

Each station does one job. The line runs automatically on schedule. If a station breaks, the line stops and logs the error.

Pitfalls

"The workflow didn't run."

Check the schedule and timezone:

> "When is daily-metrics scheduled to run next?"

"A task is taking too long."

Large queries might need optimization. Check execution time:

bash
tdx wf attempt <attempt-id> tasks

"The query failed with a timeout."

Add retry logic, and consider breaking the work into smaller queries (see the sketch after this example):

yaml
+large_query:
  _retry: 2
  td>: queries/optimized_query.sql
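
Splitting the work might look like the sketch below: stage an intermediate table first, then aggregate from it, so each query does less (both SQL files are hypothetical):

yaml
+stage_raw_orders:
  _retry: 2
  td>: queries/stage_raw_orders.sql
  create_table: orders_staged

+aggregate_staged:
  _retry: 2
  td>: queries/aggregate_from_staged.sql
  create_table: order_metrics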

What You've Learned

  • Workflows automate recurring SQL tasks
  • The td> operator runs queries and creates tables
  • Tasks run sequentially or in parallel with _parallel: true
  • Schedules can be daily, hourly, weekly, or custom cron
  • Session variables like ${session_date} make queries dynamic
  • Error handling with _error blocks prevents silent failures
  • Retry logic handles transient issues

Next Step

Your data pipelines run automatically. Chapter 14 shows you how to build AI agents—giving your team members AI-powered assistants that can answer questions about customer data without needing Claude Code.


Your workflows are automated. Next, you'll build AI agents for your team.