Airflow Concepts: Difference between revisions

From NovaOrdis Knowledge Base

Revision as of 02:17, 11 July 2022

External

Internal

Workflow

DAG

https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html
Graph Concepts | Directed Acyclic Graph

The edges can be labeled in the UI.

SubDAG

https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#concepts-subdags

A DAG is made up of tasks connected by dependency relations. The DAG is not concerned with what happens inside the tasks; it is only concerned with how to run them: order, retries, timeouts, etc.
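The separation between what a task does and how it is run can be illustrated with a small sketch. This is a toy model in plain Python, not Airflow code: `run_with_retries` and `flaky_task` are hypothetical names, and the `retries` parameter is assumed to mean "additional attempts after the first failure".

```python
def run_with_retries(task_fn, retries=2):
    """Run `task_fn`, retrying up to `retries` more times if it raises.

    The runner knows nothing about what the task does internally;
    it only decides how many times to attempt it.
    Returns a (result, attempts) tuple.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task_fn(), attempts
        except Exception:
            # Give up once the first attempt plus all retries have failed.
            if attempts > retries:
                raise


# Hypothetical task body: fails twice, then succeeds.
state = {"calls": 0}


def flaky_task():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient failure")
    return "done"


result, attempts = run_with_retries(flaky_task, retries=2)
print(result, attempts)
```

The task body and the retry policy are independent here: the same runner works for any callable, which mirrors the idea that run policy belongs to the orchestration layer rather than to the task.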

Declaring a DAG

Control Flow

https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#control-flow

Dynamic DAG

https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#dynamic-dags

A DAG can be purely declarative, or it can be generated in Python code by adding tasks dynamically.

Task

https://airflow.apache.org/docs/apache-airflow/stable/concepts/tasks.html

A Task is the basic unit of execution in Airflow. Every task must be assigned to a DAG in order to run. Tasks can depend on each other: if B depends on A (A → B), then A is an upstream dependency of B.

Task Dependencies

If a task B has a dependency on task A (A → B), it is said that A is upstream of B and B is downstream of A. The dependencies are the directed edges of the directed acyclic graph.
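The upstream/downstream relation can be modeled with the standard library alone. The sketch below is not Airflow code; it uses Python's `graphlib` (Python 3.9+) with hypothetical task names to show how the directed edges determine a valid run order. In an Airflow DAG file, the same relation is typically declared with the `>>` operator, as in `a >> b`.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks).
# Task names are hypothetical.
upstream = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def downstream_of(task, upstream_map):
    """Return the tasks that list `task` as a direct upstream dependency."""
    return sorted(t for t, deps in upstream_map.items() if task in deps)


# A valid execution order places every task after all of its upstream tasks.
order = list(TopologicalSorter(upstream).static_order())

print(order)
print(downstream_of("transform", upstream))
```

Because the graph is acyclic, at least one such order always exists; `TopologicalSorter` raises `graphlib.CycleError` if a cycle is introduced.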

Task Types

Airflow has three types of tasks: Operator; Sensor, which is a subclass of Operator; and TaskFlow-decorated Task.

Operator

https://airflow.apache.org/docs/apache-airflow/stable/concepts/operators.html

An Operator is a predefined task template.

Sensor

https://airflow.apache.org/docs/apache-airflow/stable/concepts/sensors.html

A Sensor is a subclass of Operator that waits for an external event to happen.
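The waiting behavior can be sketched as a polling loop. This is a plain-Python illustration, not Airflow's sensor implementation: `wait_for` and `file_has_arrived` are hypothetical names, and the parameter names `poke_interval` and `timeout` are only loosely modeled on the vocabulary used for Airflow sensors.

```python
import time


def wait_for(poke, poke_interval=0.01, timeout=1.0):
    """Call `poke()` repeatedly until it returns True or `timeout` seconds elapse.

    Returns True if the event was observed, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if poke():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


# Hypothetical external event: becomes true on the third check.
checks = {"count": 0}


def file_has_arrived():
    checks["count"] += 1
    return checks["count"] >= 3


arrived = wait_for(file_has_arrived)
never = wait_for(lambda: False, timeout=0.05)
print(arrived, never)
```

The first call succeeds once the condition holds; the second gives up after the timeout, which is the failure mode a sensor needs when the external event never occurs.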

TaskFlow-decorated Task

https://airflow.apache.org/docs/apache-airflow/stable/concepts/taskflow.html

A custom Python function, decorated with @task, packaged up as a Task.

Task Assignment to DAG

https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#dag-assignment

Passing Data between Tasks

Tasks pass data among each other using:

  • XComs, when the amount of data to be exchanged is small.
  • A storage service, by uploading and downloading files, when the data is large.
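The two options are often combined: the large payload goes to storage, and only a small reference to it is exchanged between tasks. The sketch below models this in plain Python and is not Airflow code. The dictionary `xcom_store` is a toy stand-in for the XCom backend, a temporary directory stands in for a storage service, and the helper functions `xcom_push` and `xcom_pull` are hypothetical simplifications that do not match the signatures of Airflow's methods of the same name.

```python
import json
import tempfile
from pathlib import Path

# Toy stand-in for the XCom store: small values keyed by (task_id, key).
xcom_store = {}


def xcom_push(task_id, key, value):
    # Serialize to JSON to stress that values should be small and serializable.
    xcom_store[(task_id, key)] = json.dumps(value)


def xcom_pull(task_id, key):
    return json.loads(xcom_store[(task_id, key)])


# Temporary directory standing in for a remote storage service.
storage_dir = Path(tempfile.mkdtemp())


def produce():
    # The large payload is written to storage, not to the XCom store.
    path = storage_dir / "rows.json"
    path.write_text(json.dumps(list(range(10000))))
    # Only small metadata is exchanged: the location and a row count.
    xcom_push("produce", "rows_path", str(path))
    xcom_push("produce", "row_count", 10000)


def consume():
    # The downstream task pulls the reference, then downloads the payload.
    path = Path(xcom_pull("produce", "rows_path"))
    rows = json.loads(path.read_text())
    return sum(rows)


produce()
total = consume()
print(total)
```

Keeping only the path and the row count in the store keeps the exchanged values small regardless of how large the file grows.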

TaskGroup

https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#concepts-taskgroups

A TaskGroup is purely a UI grouping concept.

XComs

https://airflow.apache.org/docs/apache-airflow/stable/concepts/taskflow.html

Short for "cross-communications".

Workload

Scheduler

https://airflow.apache.org/docs/apache-airflow/stable/concepts/scheduler.html

Executor

https://airflow.apache.org/docs/apache-airflow/stable/executor/index.html

Worker

Metadata Database

Connections & Hooks

https://airflow.apache.org/docs/apache-airflow/stable/concepts/connections.html

Pool

https://airflow.apache.org/docs/apache-airflow/stable/concepts/pools.html