# Chapter 13: Workflows
## The Problem
You've built segments, journeys, and queries. But someone still needs to run the daily data refresh. Someone needs to execute the weekly report. Someone needs to coordinate the data pipeline that feeds your segments.
That someone shouldn't be you clicking buttons every morning.
## The Key Idea

**Core concept:** Workflows automate recurring tasks. AI writes workflow definitions that run on schedule without human intervention.
Think of workflows as the invisible machinery that keeps your marketing operations running—data flowing, tables updating, reports generating.
## What Workflows Do
Common marketing workflows:
| Task | Frequency | What Happens |
|---|---|---|
| Update customer metrics | Daily | Run SQL to calculate latest values |
| Prepare segment data | Daily | Transform raw data into segment-ready tables |
| Generate reports | Weekly | Calculate metrics, create summary tables |
| Archive old data | Monthly | Move historical records |
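A task like the last row of the table can be a one-task workflow of its own. A sketch, with hypothetical query and table names:

```yaml
timezone: America/New_York

schedule:
  monthly>: 1,02:00:00   # 2 AM on the 1st of each month

+archive_old_data:
  td>: queries/archive_old_orders.sql   # selects rows older than 12 months
  insert_into: orders_archive
```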
## The Workflow Pattern
Just like segments and journeys, workflows follow pull → edit → push:
```bash
# Pull existing workflow project
tdx wf pull myproject

# Edit locally, then push changes
tdx wf push

# Run the workflow manually
tdx wf run myproject
```

## Creating Your First Workflow
Tell AI what you need automated:
> "Create a workflow that updates customer metrics every morning at 6am"AI generates a workflow file (workflows/daily-metrics/daily-metrics.dig):
```yaml
timezone: America/New_York

schedule:
  daily>: 06:00:00

_export:
  td:
    database: marketing
    engine: presto

+update_customer_metrics:
  td>: queries/update_customer_metrics.sql
  create_table: customer_metrics
```

## Understanding the Workflow
```yaml
timezone: America/New_York
```

All times are interpreted in this timezone.

```yaml
schedule:
  daily>: 06:00:00
```

Run every day at 6:00 AM.

```yaml
_export:
  td:
    database: marketing
    engine: presto
```

Default settings for all `td>` tasks in this workflow.

```yaml
+update_customer_metrics:
  td>: queries/update_customer_metrics.sql
  create_table: customer_metrics
```

A task that runs a SQL file and creates (or replaces) a table with the results.
## The `td>` Operator

The `td>` operator runs SQL queries. It's the core of TD workflows:
```yaml
# Run SQL from a file
+from_file:
  td>: queries/analysis.sql
  create_table: results

# Run inline SQL
+inline_query:
  td>:
  query: |
    SELECT
      customer_id,
      SUM(amount) as total_spent
    FROM orders
    WHERE TD_TIME_RANGE(time, '${session_date}', NULL)
    GROUP BY 1
  insert_into: daily_totals
```

Output options:

- `create_table:` replaces the table with the results
- `insert_into:` appends the results to an existing table
- `store_last_results: true` stores the results for conditional logic
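The choice between the first two matters when a workflow re-runs. A common pattern, sketched here with hypothetical query and table names, rebuilds snapshot tables and appends to history tables:

```yaml
# Snapshot table: safe to re-run, always replaced with the latest state
+refresh_snapshot:
  td>: queries/customer_snapshot.sql
  create_table: customer_snapshot

# History table: appends one batch per run; scope the query to
# ${session_date} so each run contributes a distinct day's rows
+append_history:
  td>: queries/daily_history.sql
  insert_into: customer_history
```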
## Task Sequences
Tasks run in order. Later tasks wait for earlier ones:
```yaml
+step_1_extract:
  td>: queries/extract_orders.sql
  create_table: orders_staging

+step_2_transform:
  td>: queries/calculate_metrics.sql
  create_table: customer_metrics

+step_3_aggregate:
  td>: queries/build_summary.sql
  create_table: daily_summary
```

Step 2 waits for step 1. Step 3 waits for step 2.
## Parallel Tasks
Run independent tasks simultaneously:
```yaml
+calculate_metrics:
  _parallel: true

  +revenue_metrics:
    td>: queries/revenue_metrics.sql
    create_table: revenue_metrics

  +engagement_metrics:
    td>: queries/engagement_metrics.sql
    create_table: engagement_metrics

  +product_metrics:
    td>: queries/product_metrics.sql
    create_table: product_metrics

+combine_results:
  td>: queries/combine_metrics.sql
  create_table: all_metrics
```

All three metrics queries run at the same time; `combine_results` runs after they all finish.
## Scheduling Options
Daily:

```yaml
schedule:
  daily>: 06:00:00
```

Hourly:

```yaml
schedule:
  hourly>: 00:00
```

Weekly:

```yaml
schedule:
  weekly>: Mon,06:00:00
```

Multiple times per day:

```yaml
schedule:
  cron>: 0 6,12,18 * * *
```

Specific days:
```yaml
schedule:
  cron>: 0 9 1,15 * *   # 9 AM on the 1st and 15th of each month
```
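Standard five-field cron syntax also covers patterns the shortcut operators can't express, such as weekday-only runs:

```yaml
schedule:
  cron>: 0 7 * * 1-5   # 7 AM Monday through Friday
```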
## Session Variables

Built-in variables for dates:
```yaml
+daily_report:
  td>:
  query: |
    SELECT *
    FROM events
    WHERE TD_TIME_RANGE(time, '${session_date}', TD_TIME_ADD('${session_date}', '1d'))
  create_table: daily_events_${session_date_compact}
```

| Variable | Example Value | Use Case |
|---|---|---|
| `${session_date}` | 2024-12-15 | Date filtering |
| `${session_time}` | 2024-12-15T06:00:00+00:00 | Exact timestamp |
| `${session_date_compact}` | 20241215 | Table naming |
| `${last_session_date}` | 2024-12-14 | Previous run date |
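These variables make incremental loads straightforward. A sketch that processes only the window since the previous run, assuming a hypothetical `events_incremental` table:

```yaml
+incremental_load:
  td>:
  query: |
    SELECT *
    FROM events
    -- rows from the previous run's date up to (not including) this run's date
    WHERE TD_TIME_RANGE(time, '${last_session_date}', '${session_date}')
  insert_into: events_incremental
```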
## Variables and Parameters
Make workflows reusable:
```yaml
_export:
  td:
    database: marketing
  target_days: 30

+recent_customers:
  td>:
  query: |
    SELECT *
    FROM customers
    WHERE last_purchase_date >= DATE_ADD('day', -${target_days}, CURRENT_DATE)
  create_table: recent_customers
```

Run with different parameters:

```bash
tdx wf run daily-metrics --param target_days=90
```

## Error Handling
What happens when something fails?
```yaml
+critical_update:
  td>: queries/update_metrics.sql
  create_table: customer_metrics
  _retry: 3

_error:
  +log_failure:
    td>:
    query: |
      INSERT INTO workflow_errors
      VALUES ('${session_id}', 'critical_update failed', CURRENT_TIMESTAMP)
```

If `critical_update` fails after 3 retries, the `_error` block runs.
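Logging is a start; failures usually also warrant a notification. Digdag ships a `mail>` operator, sketched below assuming SMTP settings are configured on the server, with a hypothetical body template and address:

```yaml
_error:
  +notify_team:
    mail>: templates/failure_notice.txt   # hypothetical body template file
    subject: "daily-metrics workflow failed"
    to: [marketing-ops@example.com]       # hypothetical address
```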
## Retry Logic
Automatically retry transient failures:
```yaml
+api_dependent_query:
  _retry:
    limit: 3
    interval: 60
  td>: queries/external_data.sql
  create_table: external_metrics
```

Try up to 3 times, waiting 60 seconds between attempts.
Exponential backoff:
```yaml
+flaky_task:
  _retry:
    limit: 5
    interval: 30
    interval_type: exponential
  td>: queries/large_query.sql
```

With exponential backoff, the wait doubles after each failure: 30 seconds, then 60, then 120, and so on.

## Conditional Execution
Run tasks based on query results:
```yaml
+check_data:
  td>:
  query: |
    SELECT COUNT(*) as cnt FROM new_orders WHERE date = '${session_date}'
  store_last_results: true

+process_if_data:
  if>: ${td.last_results.cnt > 0}
  _do:
    +process_orders:
      td>: queries/process_orders.sql
      create_table: processed_orders
```
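Recent Digdag versions also support an `_else_do:` branch, useful for making the empty case visible instead of silently skipping. A sketch, with the `workflow_events` logging table assumed:

```yaml
+process_if_data:
  if>: ${td.last_results.cnt > 0}
  _do:
    +process_orders:
      td>: queries/process_orders.sql
      create_table: processed_orders
  _else_do:
    +log_no_data:
      td>:
      query: |
        INSERT INTO workflow_events
        VALUES ('${session_date}', 'no new orders', CURRENT_TIMESTAMP)
```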
## A Complete Marketing Workflow

Here's a realistic daily workflow:
```yaml
timezone: America/New_York

schedule:
  daily>: 05:00:00

_export:
  td:
    database: marketing
    engine: presto

# Step 1: Update base metrics
+update_metrics:
  td>: queries/update_customer_metrics.sql
  create_table: customer_metrics

# Step 2: Prepare segment data (parallel)
+prepare_segments:
  _parallel: true

  +high_value_data:
    td>: queries/high_value_customers.sql
    create_table: segment_high_value

  +recent_buyers_data:
    td>: queries/recent_buyers.sql
    create_table: segment_recent_buyers

  +at_risk_data:
    td>: queries/at_risk_customers.sql
    create_table: segment_at_risk

# Step 3: Build summary report
+daily_summary:
  td>:
  query: |
    SELECT
      '${session_date}' as report_date,
      (SELECT COUNT(*) FROM segment_high_value) as high_value_count,
      (SELECT COUNT(*) FROM segment_recent_buyers) as recent_buyers_count,
      (SELECT COUNT(*) FROM segment_at_risk) as at_risk_count,
      CURRENT_TIMESTAMP as generated_at
  insert_into: daily_reports

# Global error handler
_error:
  +log_error:
    td>:
    query: |
      INSERT INTO workflow_errors
      VALUES ('daily-marketing', '${session_id}', '${task_name}', CURRENT_TIMESTAMP)
```

## Managing Workflows
List workflow projects:

```bash
tdx wf projects
```

Pull a project to edit locally:

```bash
tdx wf pull daily-metrics
# Creates: workflows/daily-metrics/
```

Push changes back:

```bash
tdx wf push daily-metrics
```

View sessions and task status:

```bash
tdx wf sessions daily-metrics
tdx wf attempt <attempt-id> tasks
```

View logs for a specific task:

```bash
tdx wf attempt <attempt-id> logs +update_metrics
```

Run manually:

```bash
tdx wf run daily-metrics
```

## Mental Model: Assembly Line
Think of workflows as an assembly line:
```
[Raw Data] → [Extract] → [Transform] → [Load] → [Ready for Segments]
```

Each station does one job. The line runs automatically on schedule. If a station breaks, the line stops and logs the error.
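In workflow terms, each station is one task. A minimal skeleton of that line, with hypothetical query files and table names:

```yaml
+extract:
  td>: queries/extract_raw_events.sql     # pull raw data into staging
  create_table: staging_events

+transform:
  td>: queries/clean_and_enrich.sql       # clean and reshape staging data
  create_table: enriched_events

+load:
  td>: queries/build_segment_inputs.sql   # produce segment-ready tables
  create_table: segment_inputs
```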
## Pitfalls

**"The workflow didn't run."**

Check the schedule and timezone:

> "When is daily-metrics scheduled to run next?"

**"A task is taking too long."**

Large queries might need optimization. Check execution time:

```bash
tdx wf attempt <attempt-id> tasks
```

**"The query failed with a timeout."**

Add retry logic and consider breaking the work into smaller queries:

```yaml
+large_query:
  _retry: 2
  td>: queries/optimized_query.sql
```
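One way to break up a heavy query, sketched with hypothetical staging tables: filter first, then aggregate the much smaller result.

```yaml
# Stage 1: cheap filter narrows the scan to one day of raw data
+stage_filter:
  td>:
  query: |
    SELECT * FROM events
    WHERE TD_TIME_RANGE(time, '${session_date}', TD_TIME_ADD('${session_date}', '1d'))
  create_table: events_one_day

# Stage 2: the expensive aggregation now runs over a small staging table
+stage_aggregate:
  _retry: 2
  td>: queries/aggregate_one_day.sql
  create_table: event_aggregates
```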
## What You've Learned

- Workflows automate recurring SQL tasks
- The `td>` operator runs queries and creates tables
- Tasks run sequentially, or in parallel with `_parallel: true`
- Schedules can be daily, hourly, weekly, or custom cron
- Session variables like `${session_date}` make queries dynamic
- Error handling with `_error` blocks prevents silent failures
- Retry logic handles transient issues
## Next Step
Your data pipelines run automatically. Chapter 14 shows you how to build AI agents—giving your team members AI-powered assistants that can answer questions about customer data without needing Claude Code.
Your workflows are automated. Next, you'll build AI agents for your team.