Airflow Sensor: Difference between revisions

From NovaOrdis Knowledge Base
=External=
* https://airflow.apache.org/docs/apache-airflow/stable/concepts/smart-sensors.html
* https://airflow.apache.org/docs/apache-airflow/2.0.0/concepts.html#sensors
* https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/sensors/index.html
=Internal=
* [[Airflow_Concepts#Sensor|Airflow Concepts]]
* [[Airflow_Deferrable_Operators#Overview|Deferrable Operators]]
=Overview=
A Sensor is a subclass of [[Airflow_Concepts#Operator|Operator]]. Sensors poll for an external event: they wait, then periodically check whether it has happened. When the event occurs, the sensor task succeeds, so its downstream tasks can run. Because sensors are idle most of the time, they have three run modes that allow them to execute with varying degrees of efficiency: [[#Poke|poke]], [[#Reschedule|reschedule]] and [[#Smart_Sensor|smart sensors]].
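
The polling behavior described above can be illustrated with a minimal plain-Python sketch. This is not Airflow's implementation: the function name <code>wait_for</code> and the injectable <code>sleep</code> and <code>clock</code> parameters are hypothetical, and <code>poke_interval</code> and <code>timeout</code> only mirror the sensor arguments of the same names.

```python
import time
from typing import Callable


def wait_for(
    condition: Callable[[], bool],
    poke_interval: float,
    timeout: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Check `condition` until it returns True and return the number of checks.

    Hypothetical helper for illustration only, not an Airflow API.
    Raises TimeoutError once `timeout` seconds have passed without success.
    """
    started = clock()
    checks = 0
    while True:
        checks += 1
        if condition():
            # The awaited event has happened: the "sensor" succeeds.
            return checks
        if clock() - started >= timeout:
            raise TimeoutError(f"condition not met after {checks} checks")
        # Idle until the next check; this is where a sensor spends most of its time.
        sleep(poke_interval)
```

Almost all of the elapsed time is spent in the <code>sleep</code> call, which is why the run modes below differ mainly in what happens to the worker slot during that idle period.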


Also see [[Airflow_Concepts#Deferrable_Operators_and_Triggers|Deferrable Operators and Triggers]].
=Sensor Types=
==Poke==
<code>poke</code> is the default run mode for a sensor. The sensor occupies a [[Airflow_Concepts#Worker|worker]] slot for its entire runtime and sleeps between "pokes". A sensor that checks frequently, for example every second, should use <code>poke</code> mode.


==Reschedule==
In <code>reschedule</code> mode, the sensor takes up a worker slot only while it is checking. Between checks it frees the slot, sleeps for a set duration, and is then rescheduled onto a worker slot. <code>reschedule</code> trades off latency for resources: a check may start later than planned, but the slot is available to other tasks in the meantime. A sensor that checks infrequently, for example every minute, should use <code>reschedule</code> mode.
 
The <code>reschedule</code> mode can be configured when the sensor is instantiated.
 
<syntaxhighlight lang='py'>
S3KeySensor(task_id='something', mode='reschedule', ...)
</syntaxhighlight>
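
The difference between the two modes can be made concrete with a toy simulation. This is only a sketch under stated assumptions, not Airflow code: <code>simulate_sensor</code>, <code>SensorRun</code>, <code>check_cost</code> and <code>reschedule_delay</code> are hypothetical names, time is simulated rather than measured, and the scheduling delay is modeled as a fixed number.

```python
from dataclasses import dataclass


@dataclass
class SensorRun:
    checks: int          # how many times the condition was evaluated
    finished_at: float   # simulated time at which the sensor succeeded
    slot_seconds: float  # simulated time during which a worker slot was held


def simulate_sensor(mode, event_time, poke_interval,
                    check_cost=1.0, reschedule_delay=0.0):
    """Toy model of a sensor waiting for an event that occurs at `event_time`.

    Hypothetical illustration only. `check_cost` is the assumed duration of one
    check; `reschedule_delay` is the assumed extra wait before a rescheduled
    sensor gets a worker slot again.
    """
    if mode not in ("poke", "reschedule"):
        raise ValueError(f"unknown mode: {mode!r}")
    now = 0.0
    checks = 0
    slot_seconds = 0.0
    while True:
        # Running a check occupies a worker slot in both modes.
        now += check_cost
        slot_seconds += check_cost
        checks += 1
        if now >= event_time:
            return SensorRun(checks, now, slot_seconds)
        if mode == "poke":
            # poke: sleep between checks while still holding the slot.
            now += poke_interval
            slot_seconds += poke_interval
        else:
            # reschedule: release the slot, then wait to be scheduled again.
            now += poke_interval + reschedule_delay
```

Under these assumptions, with <code>event_time=100</code> and <code>poke_interval=10</code>, the <code>poke</code> run holds a slot for all 100 simulated seconds. The <code>reschedule</code> run with <code>reschedule_delay=2</code> holds a slot for only 9 seconds but finishes at 105 instead of 100, which is the latency-for-resources trade described above.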


==Smart Sensor==
{{External|https://airflow.apache.org/docs/apache-airflow/stable/concepts/smart-sensors.html}}
In this mode there is a single centralized version of the sensor, which batches all of its executions.
⚠️ Smart sensors are a deprecated early-access feature that will be removed in Airflow 2.4.0. They are superseded by [[Airflow Deferrable Operators#Overview|deferrable operators]], which offer a more flexible way to run long-waiting sensors efficiently and also let ordinary operators achieve similar efficiency gains.

Latest revision as of 23:26, 17 July 2022