Apache Airflow not scheduling tasks
I have installed apache-airflow (version 1.9.0) with Python 2.7. To test whether it's installed properly, I tried to trigger the tutorial DAG from the web UI in the browser. The interface shows that the DAG is running, but the scheduler doesn't show any activity.
Below are the steps I tried:
- Install airflow
pip install apache-airflow
- Install the crypto extra to enable encryption with a Fernet key
pip install apache-airflow[crypto]
- Generate a fernet_key and add it to the airflow.cfg file:
from cryptography.fernet import Fernet
fernet_key = Fernet.generate_key()
print(fernet_key)
- Initialise the Airflow SQLite db
airflow initdb
- Start the airflow webserver
airflow webserver -p 8080
- Start the airflow scheduler in a different window
airflow scheduler
- Trigger the tutorial DAG from the Airflow UI (the webserver started on port 8080)
After following these steps, I don't see any movement in my scheduler window; it just keeps printing:
INFO - Heartbeating the process manager
INFO - Heartbeating the executor
I have tried running it both in my local environment and in a virtual environment. The task is not triggered even when I trigger it from the terminal with
airflow trigger_dag tutorial
I am working on macOS High Sierra, version 10.13.3.
The toggle next to each DAG is set to "Off" by default. Turning it on was all it took for me.
In my case, this happened when the catchup_by_default parameter was set to False in the airflow.cfg file. This parameter tells Airflow to ignore past execution times and start scheduling from now rather than backfilling missed intervals. To confirm, I checked with change management whether anything had recently changed in this environment.
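For reference, the relevant setting lives in the [scheduler] section of airflow.cfg (the DAG-level catchup argument, if set, overrides this global default). A sketch of what to look for:

```ini
# airflow.cfg
[scheduler]
# When False, the scheduler skips past (missed) schedule intervals
# instead of backfilling them, so runs only start from "now" onward.
catchup_by_default = False
```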
By default, all DAGs are paused when first created. You have to unpause a DAG before triggering it.
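You can unpause either with the UI toggle or with `airflow unpause tutorial` on the command line. Under the hood, both simply flip the is_paused flag on the DAG's row in the metadata database. The sketch below illustrates that effect with a throwaway in-memory SQLite database (the real `dag` table has many more columns; this is not Airflow's actual code):

```python
import sqlite3

# Illustration only: Airflow's metadata DB has a `dag` table with an
# `is_paused` flag; unpausing flips it to 0. We mimic that here with a
# minimal stand-in schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER)")
conn.execute("INSERT INTO dag VALUES ('tutorial', 1)")  # paused by default

# What "unpause" amounts to:
conn.execute("UPDATE dag SET is_paused = 0 WHERE dag_id = 'tutorial'")
paused, = conn.execute(
    "SELECT is_paused FROM dag WHERE dag_id = 'tutorial'").fetchone()
print("paused" if paused else "unpaused")  # -> unpaused
```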
From the Scheduling & Triggers documentation: the Airflow scheduler monitors all tasks and all DAGs, and triggers the task instances whose dependencies have been met. Each DAG may or may not have a schedule, which informs how DAG Runs are created. Behind the scenes, the scheduler spins up a subprocess which monitors and stays in sync with a folder of DAG objects, and periodically (every minute or so) collects DAG parsing results and inspects active tasks to see whether they can be triggered.
After triggering your DAG you also have to turn it on, since it is off by default. You can turn it on from the Airflow UI.
From the Airflow FAQ: there are very many reasons why your task might not be getting scheduled. A common first check: does your script "compile", i.e. can the Airflow scheduler parse it at all?
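The usual way to check this is to run your DAG file directly with Python and see whether it raises. A generic sketch of that parse check, using a temporary stand-in file (point it at your real DAG file under ~/airflow/dags instead):

```python
import os
import py_compile
import tempfile

# Stand-in for a DAG file; replace with the path to your own DAG module.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("print('dag module imported')\n")
    path = f.name

try:
    # Raises py_compile.PyCompileError on a syntax error, i.e. the same
    # class of problem that makes the scheduler silently skip a DAG.
    py_compile.compile(path, doraise=True)
    ok = True
    print("compiles OK")
finally:
    os.remove(path)
```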
[AIRFLOW-203] describes a related problem: the scheduler fails to reliably schedule tasks when many DAG runs are triggered (setup: Airflow with Celery, RabbitMQ, and a Postgres backend); the tasks do not show up with QUEUED status in the UI and don't actually run. For background on the states involved: if a task has a state of NONE, it will be set to SCHEDULED if the scheduler determines that it needs to run. Tasks in the SCHEDULED state are sent to the executor, at which point they are put into the QUEUED state until they actually run.
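The progression described above can be sketched as a tiny state machine (a simplification for illustration, not Airflow's actual scheduler code):

```python
# Simplified task-state progression: NONE -> SCHEDULED -> QUEUED -> RUNNING.
NEXT_STATE = {
    None: "scheduled",      # scheduler decides the task needs to run
    "scheduled": "queued",  # task instance is handed to the executor
    "queued": "running",    # an executor slot picks it up
}

def advance(state):
    """Move a task instance one step along the lifecycle."""
    return NEXT_STATE.get(state, state)

state = None
for _ in range(3):
    state = advance(state)
print(state)  # -> running
```

If tasks are stuck, seeing which of these states they sit in (NONE vs. SCHEDULED vs. QUEUED) tells you whether the scheduler or the executor is the bottleneck.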
Also check your concurrency settings. Concurrency is defined on your Airflow DAG; if you do not set it, the scheduler uses the default value from the dag_concurrency entry in airflow.cfg. task_concurrency controls the number of concurrent running task instances across dag_runs per task, and max_active_runs means the scheduler will run no more than that many DagRuns of your DAG at a given time.
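The scheduler-side defaults these fall back on live in the [core] section of airflow.cfg; the values below are the stock 1.x defaults, shown as a sketch of where to look:

```ini
# airflow.cfg
[core]
# Max task instances running across the whole installation.
parallelism = 32
# Max running task instances per DAG (used when the DAG sets no concurrency).
dag_concurrency = 16
# Max concurrent DagRuns per DAG (used when the DAG sets no max_active_runs).
max_active_runs_per_dag = 16
```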