Airflow Sensor
Latest revision as of 23:26, 17 July 2022
External
- https://airflow.apache.org/docs/apache-airflow/stable/concepts/sensors.html
- https://airflow.apache.org/docs/apache-airflow/stable/concepts/smart-sensors.html
- https://airflow.apache.org/docs/apache-airflow/2.0.0/concepts.html#sensors
- https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/sensors/index.html
Internal
- Airflow Concepts
- Deferrable Operators
Overview
A Sensor is a subclass of Operator. Sensors poll (wait and then periodically check) for an external event to happen. When the event a sensor is waiting for occurs, its task succeeds, so its downstream tasks can run. Because sensors are idle most of the time, they have three run modes that allow executing them with varying degrees of efficiency: poke, reschedule and smart sensors.
Also see Deferrable Operators and Triggers.
Sensor Types
Poke
poke is the default run mode for a sensor. The sensor takes up a worker slot for its entire runtime and sleeps between "pokes". Something that is checking every second should be in poke mode.
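The poke loop described above can be sketched in plain Python. This is only an illustration of the behavior, not Airflow's implementation: run_in_poke_mode, SensorTimeout and the injectable sleep and clock arguments are hypothetical names introduced here, and the caller stands in for the worker slot that stays occupied for the whole call.

```python
import time
from typing import Callable


class SensorTimeout(Exception):
    """Raised when the condition is not met in time (hypothetical name)."""


def run_in_poke_mode(
    check: Callable[[], bool],
    poke_interval: float,
    timeout: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll `check` until it returns True and return the number of pokes.

    The call blocks from the first poke to the last one, sleeps included,
    which models a worker slot held for the sensor's entire runtime.
    """
    started = clock()
    pokes = 0
    while True:
        pokes += 1
        if check():
            return pokes
        # Give up once the overall time budget is spent.
        if clock() - started >= timeout:
            raise SensorTimeout(f"condition not met after {pokes} pokes")
        # The "slot" is still occupied while sleeping between pokes.
        sleep(poke_interval)


if __name__ == "__main__":
    # The simulated external event becomes visible on the third check.
    answers = iter([False, False, True])
    print(run_in_poke_mode(lambda: next(answers), poke_interval=0.01, timeout=1.0))
```

Run as a script, the example makes three checks before the condition holds. Passing sleep and clock as arguments is only a convenience for trying the loop without real waiting.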
Reschedule
The sensor takes up a worker slot only while it is checking. It then frees the slot, sleeps for a set duration, and is rescheduled onto a worker slot for the next check. reschedule trades latency for resources: a check may start a little later than planned, but the slot is free between checks. Something that is checking every minute should be in reschedule mode.
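To make the resource side of this trade-off concrete, here is a back-of-the-envelope model in Python. It is a sketch under simplifying assumptions, not a measurement of Airflow: slot_seconds and its parameters are hypothetical, every check is assumed to take the same time, and scheduling overhead in reschedule mode is ignored.

```python
def slot_seconds(mode: str, checks: int, poke_interval: float, check_seconds: float) -> float:
    """Estimate how long a worker slot is occupied until the sensor succeeds.

    checks        -- number of checks made, the last one being the successful one
    poke_interval -- sleep between two consecutive checks, in seconds
    check_seconds -- assumed duration of a single check, in seconds
    """
    if checks < 1:
        raise ValueError("at least one check is needed")
    busy = checks * check_seconds
    if mode == "reschedule":
        # The slot is released between checks, so only the checks count.
        return busy
    if mode == "poke":
        # The slot is also held during the sleeps between checks.
        return busy + (checks - 1) * poke_interval
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # 60 checks, one per minute, each assumed to take one second.
    print(slot_seconds("poke", 60, 60.0, 1.0))        # 3600.0
    print(slot_seconds("reschedule", 60, 60.0, 1.0))  # 60.0
```

Under these assumptions a once-a-minute sensor that succeeds on its 60th check holds a slot for an hour in poke mode, but only for a minute in total in reschedule mode.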
The reschedule mode can be configured when the sensor is instantiated.
S3KeySensor(task_id='something', mode='reschedule', ...)
Smart Sensor
In this mode, a single centralized version of the sensor batches all executions of it, instead of each sensor task polling on its own.
⚠️ Smart sensors are a deprecated early-access feature that will be removed in Airflow 2.4.0. They are superseded by deferrable operators, which offer a more flexible way to run long-running sensors efficiently and also let ordinary operators achieve similar efficiency gains.