Airflow XComs: Difference between revisions
Line 14: | Line 14: | ||
An XCom is identified by a <code>key</code>, which is the XCom's name, as well as the <code>task_id</code> and <code>dag_id</code> it came from. The XCom can have any serializable value, however it must be relatively small. If there is need to exchange large amounts of data, this is usually done uploading and downloading large files from a storage service. | An XCom is identified by a <code>key</code>, which is the XCom's name, as well as the <code>task_id</code> and <code>dag_id</code> it came from. The XCom can have any serializable value, however it must be relatively small. If there is need to exchange large amounts of data, this is usually done uploading and downloading large files from a storage service. | ||
XComs are explicitly “pushed” and “pulled” to/from their storage using the <code>xcom_push()</code> and <code>xcom_pull()</code> methods on [[Airflow_Concepts#Task_Instance|task instances]]. For more details see [[#Ingesting_Input_into_a_Task_with_xcom_pull.28.29|Ingesting Input into a Task with <code>xcom_pull()</code>]] and [[]] below. | XComs are explicitly “pushed” and “pulled” to/from their storage using the <code>xcom_push()</code> and <code>xcom_pull()</code> methods on [[Airflow_Concepts#Task_Instance|task instances]]. For more details see [[#Ingesting_Input_into_a_Task_with_xcom_pull.28.29|Ingesting Input into a Task with <code>xcom_pull()</code>]] and [[#Exposing_Task_Output_with_xcom_push.28.29_or_via_the_Return_Value|Exposing Task Output with <code>xcom_push()</code> or via the Return Value]] below. The XComs are stored in the xcom table and they need to be explicitly deleted after use, otherwise they'll leak in the table. | ||
The XComs are stored in the xcom table and they need to be explicitly deleted after use, otherwise they'll leak in the table. | |||
[[Airflow_Concepts#Variable|Variables]] are an alternative mechanism for tasks to share data. However, variables are global and should be used for overall configuration that covers the entire installation. To pass data to and from tasks, XComs are preferable. | [[Airflow_Concepts#Variable|Variables]] are an alternative mechanism for tasks to share data. However, variables are global and should be used for overall configuration that covers the entire installation. To pass data to and from tasks, XComs are preferable. |
Revision as of 03:05, 16 July 2022
External
- https://airflow.apache.org/docs/apache-airflow/stable/concepts/xcoms.html
- https://airflow.apache.org/docs/apache-airflow/stable/concepts/taskflow.html
- https://airflow.apache.org/docs/apache-airflow/2.0.0/concepts.html#xcoms
Internal
Overview
XComs is one of the methods tasks use to exchange data.
Concepts
Tasks communicate using inputs and outputs, and the XComs ("cross-communications") mechanism is an implementation of this pattern. By default, tasks are entirely isolated and may be running on entirely different machines so when they exchange data, the data must be serializable.
An XCom is identified by a key
, which is the XCom's name, as well as the task_id
and dag_id
it came from. The XCom can have any serializable value, however it must be relatively small. If there is need to exchange large amounts of data, this is usually done uploading and downloading large files from a storage service.
XComs are explicitly “pushed” and “pulled” to/from their storage using the xcom_push()
and xcom_pull()
methods on task instances. For more details see Ingesting Input into a Task with xcom_pull()
and Exposing Task Output with xcom_push()
or via the Return Value below. The XComs are stored in the xcom table and they need to be explicitly deleted after use, otherwise they'll leak in the table.
Variables are an alternative mechanism for tasks to share data. However, variables are global and should be used for overall configuration that covers the entire installation. To pass data to and from tasks, XComs are preferable.
Also see DAG run execution context.
When you call a TaskFlow function in the DAG file, rather than executing it, you will get an object representing the XCom for the result (an XComArg
, that you can use as inputs to downstream tasks and operators.
Programming Model
Also see:
@task
def task_a():
print("executing task A")
Other examples:
Ingesting Input into a Task with xcom_pull()
Data externally generated can be ingested by a task with xcom_pull
.