Airflow: Log file isn't local, Unsupported remote log location

I am not able to see the logs attached to the tasks from the Airflow UI.

Log-related settings in the airflow.cfg file are:

  • remote_base_log_folder =
  • base_log_folder = /home/my_projects/ksaprice_project/airflow/logs
  • worker_log_server_port = 8793
  • child_process_log_directory = /home/my_projects/ksaprice_project/airflow/logs/scheduler

Although I am setting remote_base_log_folder, it is trying to fetch the log from http://:8793/log/tutorial/print_date/2017-08-02T00:00:00 (note the empty hostname) - I don't understand this behavior. According to the settings, the workers should store the logs at /home/my_projects/ksaprice_project/airflow/logs and they should be fetched from that same location instead of remotely.

Update the task_instance table content:

I also faced the same problem.

Setting the variables below in airflow.cfg worked for me. Use the machine's FQDN for {hostname} instead of localhost.

endpoint_url = http://{hostname}:8080

base_url = http://{hostname}:8080
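
For reference, in a stock 1.x airflow.cfg these two options live in different sections; the section names below come from the default config template (worth double-checking against your own file), with {hostname} again standing in for the machine's FQDN:

[webserver]
# used by the UI to build links dynamically, including the log-fetch links
base_url = http://{hostname}:8080

[cli]
# used by the command line client only
endpoint_url = http://{hostname}:8080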

Best of luck!

"Failed to fetch log file from worker" when running LocalExecutor , The assumption by airflow must be that these two processes are executing on the same host. In a containerized env, this is not the case. Airflow’s logging system requires a custom.py file to be located in the PYTHONPATH, so that it’s importable from Airflow. Start by creating a directory to store the config file, $AIRFLOW_HOME/config is recommended. Create empty files called $AIRFLOW_HOME/config/log_config.py and $AIRFLOW_HOME/config/__init__.py.

As you can see in image 1, there is a timestamp; make sure that in your logs you have a folder/file with that timestamp as its name.

You are looking at the UI, so first make sure you have log files created in the directory. In my case my log folder looks like:

(AIRFLOW-ENV) [cloudera@quickstart dags]$ ll /home/cloudera/workspace/python/airflow_home/logs/my_test_dag/my_sensor_task 
total 8
-rw-rw-rw- 1 cloudera cloudera 3215 Nov 14 08:45 2017-11-12T12:00:00
-rw-rw-rw- 1 cloudera cloudera 2694 Nov 14 08:45 2017-11-14T08:36:06.920727
(AIRFLOW-ENV) [cloudera@quickstart dags]$ 

So my log URL is

http://localhost:8080/admin/airflow/log?task_id=my_sensor_task&dag_id=my_test_dag&execution_date=2017-11-14T08:36:06.920727

When you go to your DAG and select the Graph View, you can see a dropdown next to "Run". Select the appropriate run, then in the graph view below select the appropriate task/operator and choose View Log.

Writing Logs — Airflow Documentation: users can specify the directory to place log files in airflow.cfg using base_log_folder. By default, logs are placed in the AIRFLOW_HOME directory. Airflow provides a ton of flexibility in configuring its logging system. All of the logging in Airflow is implemented through Python's standard logging library. Logs can be piped to remote storage, including Google Cloud Storage and Amazon S3 buckets, and most recently in Airflow 1.10, Elasticsearch.
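
For the remote storage mentioned in that excerpt, here is a rough airflow.cfg sketch for shipping task logs to S3, using Airflow 1.10-era option names; the bucket path and connection id below are placeholders, not values from the original question:

[core]
# write task logs to remote storage in addition to the local folder
remote_logging = True
remote_base_log_folder = s3://my-log-bucket/airflow/logs
remote_log_conn_id = my_s3_conn
# local logs are still written under base_log_folder
base_log_folder = /home/my_projects/ksaprice_project/airflow/logs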

I ran into this as well, and had to unpause the tasks. I also set new DAGs to default to unpaused in my airflow.cfg:

dags_are_paused_at_creation = False

"Log file does not exist" - Astronomer Cloud, Is it a problem with my task, or astronomer cloud, or airflow? *** Log file does not exist: /usr/local/airflow/logs/[TASKNAME]s/run/2019-07-11T11:� Airflow’s logging system requires a custom.py file to be located in the PYTHONPATH, so that it’s importable from Airflow. Start by creating a directory to store the config file. $AIRFLOW_HOME/config is recommended. Create empty files called $AIRFLOW_HOME/config/log_config.py and $AIRFLOW_HOME/config/__init__.py.

Writing Logs — Airflow Documentation - Apache Airflow (same $AIRFLOW_HOME/config/log_config.py instructions as above). My logrotate file looks like this:

/var/log/airflow/*/*.log {
    # rotate log files weekly
    weekly
    # keep 1 week worth of backlogs
    rotate 1
    # remove rotated logs older than 7 days
    maxage 7
    missingok
}

[#AIRFLOW-3909] Can't read log file for previous tries with multiple Celery workers. As an example: *** Log file does not exist: /usr/local/airflow/logs/… while the "real" log lives on a different worker, which causes the UI to show the "does not exist" error. (From a separate log-processing example: the first step in the workflow is to download all the log files from the server. Airflow supports concurrency of running tasks, so we create one downloading task per log file, all the tasks can run in parallel, and we add all the tasks into one list.)

Writing Logs — Airflow Documentation (same $AIRFLOW_HOME/config/log_config.py instructions as above). When Airflow interprets a file to look for any valid DAGs, it first runs all code at the top level (i.e. outside of operators) immediately. Even if the operator itself only gets executed at execution time, everything called outside of an operator is called every heartbeat, which can be quite taxing.

Comments
  • What mode are you running Airflow in - Local, Celery? Try checking out the following URL, as there is an elaborate discussion on the topic there: github.com/puckel/docker-airflow/issues/44
  • using CeleryExecutor
  • Could you check the configured DB, table task_instance? This table has a column named 'hostname', from which the log URL is built and sourced. Ideally this value is the same as what you get when running the 'hostname' command on your worker node. (See the config sketch after these comments for how that hostname gets resolved.)
  • hostname column is empty string: ''
  • I see most of your task instances are in the queued state, hence having the hostname empty is reasonable. Did the only 'success' task instance give you the desired output? Can you try running some basic operations like a BashOperator and see if they are received by the worker instance?
  • The base_url is certainly important; many of the pages in the UI use it to build links dynamically. The endpoint_url appears to be used by the CLI only, so I doubt it helps with this issue.
  • See github.com/apache/incubator-airflow/blob/master/airflow/… for an example of where the log filepath is generated, and the following method log_url, which uses the base_url config value.
  • Thanks! This fixed the issue for me.
  • I am not able to open the URL. Can you elaborate on the fix?
  • I think I have this problem, but I don't know why the log file with the correct timestamp is not generated. Anything obvious that must be happening?
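
Following up on the hostname discussion above: the host recorded in the task_instance.hostname column (and therefore the host used in the log-fetch URL) is whatever the worker resolves for itself. Newer Airflow releases (1.10+, which is an assumption relative to this 2017 question) expose this as a hostname_callable option; a sketch with the 1.10-style value, worth verifying against your version's default config:

[core]
# callable used by workers to determine their own hostname;
# the default returns the fully qualified domain name
hostname_callable = socket:getfqdn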