Question
The planner log shows partition recovery entries every minute. The entries indicate that recovery is failing and that there is a partition recovery queue.
proc stderr: E1004 18:59:05.176966 238 PartitionRecoverer.java:175] Recovering partitions for: db1.partitioned_tb_1
proc stderr: E1004 18:59:05.218933 238 PartitionRecoverer.java:151] Ignoring table db1.partitioned_tb_1 as it is already queued.
proc stderr: E1004 18:59:05.289180 238 PartitionRecoverer.java:183] Recovery for db1.partitioned_tb_1 failed in 112ms
proc stderr: E1004 18:59:05.289254 238 PartitionRecoverer.java:198] Remaining queue length: 5
proc stderr: E1004 18:59:35.289513 238 PartitionRecoverer.java:175] Recovering partitions for: db2.partitioned_tb_a
proc stderr: E1004 18:59:35.333499 238 PartitionRecoverer.java:151] Ignoring table db2.partitioned_tb_a as it is already queued.
proc stderr: E1004 18:59:35.372874 238 PartitionRecoverer.java:183] Recovery for db2.partitioned_tb_a failed in 83ms
proc stderr: E1004 18:59:35.373013 238 PartitionRecoverer.java:198] Remaining queue length: 5
.
.
.
- Regarding the message "Ignoring table db1.partitioned_tb_1 as it is already queued", where is this queue?
- Other tables show successful partition recovery, yet this log snippet looks like the process is stuck. How do we get it to move past the "stuck" partitions?
Answer
When Auto Partition Recovery is enabled for the cluster, any access to a partitioned table (getTable, loadTable, etc.) adds the table to a "partition recovery" queue. A continuously running thread picks up tables from this queue and runs ALTER TABLE <table> RECOVER PARTITIONS on each one. There is no fixed schedule: recovery is queued only when the table is accessed through a SELECT or DDL operation. If the table is not accessed, no recovery is launched, so it is normal to see gaps between recoveries of the same table.
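The queueing behavior described above, including the deduplication behind the "Ignoring table ... as it is already queued" message in the question's log, can be modeled with a small Java sketch. This is not Okera's PartitionRecoverer: the class RecoveryQueueSketch, its methods, and the choice to keep a table marked as pending until its recovery finishes are assumptions made for illustration, and the recoverPartitions callback merely stands in for running ALTER TABLE <table> RECOVER PARTITIONS.

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Predicate;

/**
 * Hypothetical model of a deduplicating partition-recovery queue.
 * NOT Okera's PartitionRecoverer; names and behavior are illustrative assumptions.
 */
class RecoveryQueueSketch {
    private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Tables that are queued or currently being recovered; a repeat access is ignored.
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    // Stand-in for ALTER TABLE <table> RECOVER PARTITIONS; returns true on success.
    private final Predicate<String> recoverPartitions;

    RecoveryQueueSketch(Predicate<String> recoverPartitions) {
        this.recoverPartitions = recoverPartitions;
    }

    /** Called on every access to a partitioned table. Returns false if already pending. */
    boolean onTableAccess(String table) {
        if (!pending.add(table)) {
            System.out.println("Ignoring table " + table + " as it is already queued.");
            return false;
        }
        queue.add(table);
        return true;
    }

    /** Recovers one queued table without blocking; empty if nothing is queued. */
    Optional<Boolean> processNext() {
        String table = queue.poll();
        return table == null ? Optional.empty() : Optional.of(recover(table));
    }

    /** Starts a daemon thread that keeps draining the queue, blocking while it is empty. */
    Thread startWorker() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    recover(queue.take()); // take() waits until a table is queued
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop quietly when interrupted
            }
        }, "partition-recoverer");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }

    int size() {
        return queue.size();
    }

    private boolean recover(String table) {
        System.out.println("Recovering partitions for: " + table);
        long start = System.nanoTime();
        boolean ok;
        try {
            ok = recoverPartitions.test(table);
        } catch (RuntimeException e) {
            ok = false; // an exception counts as a failed recovery
        } finally {
            // Assumption: the table stays pending until recovery ends, then may be re-queued.
            pending.remove(table);
        }
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Recovery for " + table + (ok ? " succeeded" : " failed") + " in " + ms + "ms");
        System.out.println("Remaining queue length: " + queue.size());
        return ok;
    }
}
```

Under this assumed design, a second access to a table that is still waiting or still being recovered prints the "Ignoring table" message instead of adding a duplicate entry, so repeated requests for a popular table do not make the queue grow.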
Recovery is queued on the first access and on every subsequent access. It also runs when you add a new partition. One way to keep the catalog entry for a table with a large number of partitions and files in sync with S3 is to touch the table, for example by querying a view that references it. That access causes auto recovery to kick in.
Note that Auto Partition Recovery does not run when a table is first created. As a best practice, Okera strongly recommends running ALTER TABLE <table> RECOVER PARTITIONS immediately after the CREATE TABLE statement.
The "Ignoring table" log message means the table is already in the queue, so the new request is skipped. This message appears most often for frequently accessed tables. Auto Partition Recovery is a continuous background process whose job is to keep partition information up to date. Disabling auto recovery would hurt queries when new partitioned data is added to S3, because the partition information would become stale.
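To tell tables that fail on every attempt apart from those that succeed, one could tally the "Recovery for ... failed in ...ms" lines from the planner log, using the line format shown in the question. The following Java sketch does that; the class name RecoveryLogTally is hypothetical, and the regular expression assumes the failure lines always follow exactly the format seen in the excerpt.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that counts failed recoveries per table in planner log lines. */
class RecoveryLogTally {
    // Assumed format, e.g. "... Recovery for db1.partitioned_tb_1 failed in 112ms"
    private static final Pattern FAILED =
        Pattern.compile("Recovery for (\\S+) failed in \\d+ms");

    /** Returns failure counts keyed by fully qualified table name, sorted by name. */
    static Map<String, Integer> countFailures(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = FAILED.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum); // increment this table's count
            }
        }
        return counts;
    }
}
```

For example, `RecoveryLogTally.countFailures(Files.readAllLines(Path.of("planner.log")))` (the file name is a placeholder) returns a map in which tables that fail repeatedly stand out from those that fail only occasionally.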
Auto Partition Recovery is a catalog maintenance operation that ensures each partitioned dataset automatically picks up new partitions. Unless the datasets contain a very large number of files and are deeply nested, Okera does not recommend disabling this feature.