Weird behavior optimizing query indices (MariaDB + InnoDB)

mysql query optimization techniques
query optimization in mysql with example
speed up mysql queries large tables
mysql millions of rows performance
how to optimize count query in mysql
how to reduce query execution time in mysql
mysql index optimization
mysql explain

I'm currently trying to optimize the indices for a quite large table of a project and experiencing a very counter intuitive behavior between the explain result and the actual query runtime.

The server is running MariaDB version 10.1.26-MariaDB-0+deb9u1 with the following configuration options:

key_buffer_size         = 5G
max_allowed_packet      = 16M
thread_stack            = 192K
thread_cache_size       = 8

myisam_sort_buffer_size = 512M
read_buffer_size        = 2M
read_rnd_buffer_size    = 1M

query_cache_type = 0
query_cache_limit = 256K
query_cache_min_res_unit = 2k
query_cache_size = 0M

join_buffer_size = 8M
sort_buffer_size = 8M
tmp_table_size = 64M
max_heap_table_size = 64M
table_open_cache = 4K
performance_schema = ON
innodb_buffer_pool_size = 30G
innodb_log_buffer_size = 4MB
innodb_log_file_size = 1G
innodb_buffer_pool_instances = 10

The table looks contains about 6.8 million rows summing up to 12.1GB and looks like this:

CREATE TABLE `ad_master_test` (
    `ID_AD_MASTER` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
    /* Some more attribute fields (mainly integers) ... */
    `FK_KAT` BIGINT(20) UNSIGNED NOT NULL,
    /* Some more content fields (mainly varchars/integers) ... */
    `STAMP_START` DATETIME NULL DEFAULT NULL,
    `STAMP_END` DATETIME NULL DEFAULT NULL,
    PRIMARY KEY (`ID_AD_MASTER`),
    INDEX `TEST1` (`STAMP_START`, `FK_KAT`),
    INDEX `TEST2` (`FK_KAT`, `STAMP_START`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
ROW_FORMAT=DYNAMIC
AUTO_INCREMENT=14149037;

I already simplyfied the query as far as possible to better illustrate the Problem. I'm using FORCE INDEX to illustrate my issue here.

This first index is optimized using the explain statement and looks pretty promising (regarding the explain output):

SELECT * 
FROM `ad_master_test`
FORCE INDEX (TEST1)
WHERE FK_KAT IN
    (94169,94163,94164,94165,94166,94167,94168,94170,94171,94172,
     94173,94174,94175,94176,94177,94162,99606,94179,94180,94181,
     94182,94183,94184,94185,94186,94187,94188,94189,94190,94191,
     94192,94193,94194,94195,94196,94197,94198,94199,94200,94201,
     94202,94203,94204,94205,94206,94207,94208,94209,94210,94211,
     94212,94213,94214,94215,94216,94217,94218,94219,94220,94221,
     94222,94223,94224,94225,94226,94227,94228,94229,94230,94231,
     94232,94233,94234,94235,94236,94237,94238,94239,94240,94241,
     94178,94161)

ORDER BY STAMP_START DESC
LIMIT 24

Results in this explain:

id     select_type   table            type     possible_keys   key     key_len    ref        rows      Extra
1      SIMPLE        ad_master_test   index    (NULL)          TEST1   14         (NULL)     24        Using where

And this profile:

Status                  Duration
starting                0.000180
checking permissions    0.000015
Opening tables          0.000041
After opening tables    0.000013
System lock             0.000011
Table lock              0.000013
init                    0.000115
optimizing              0.000044
statistics              0.000050
preparing               0.000039
executing               0.000009
Sorting result          0.000016
Sending data            4.827512
end                     0.000023
query end               0.000008
closing tables          0.000004
Unlocking tables        0.000014
freeing items           0.000011
updating status         0.000132
cleaning up             0.000021

The second index is just the fields reversed (the way I understood it here: https://dev.mysql.com/doc/refman/8.0/en/order-by-optimization.html ) which looks pretty horrible (regarding the explain output):

SELECT * 
FROM `ad_master_test`
FORCE INDEX (TEST2)
WHERE FK_KAT IN (94169,94163,94164,94165,94166,94167,94168,94170,94171,94172,94173,94174,94175,94176,94177,94162,99606,94179,94180,94181,94182,94183,94184,94185,94186,94187,94188,94189,94190,94191,94192,94193,94194,94195,94196,94197,94198,94199,94200,94201,94202,94203,94204,94205,94206,94207,94208,94209,94210,94211,94212,94213,94214,94215,94216,94217,94218,94219,94220,94221,94222,94223,94224,94225,94226,94227,94228,94229,94230,94231,94232,94233,94234,94235,94236,94237,94238,94239,94240,94241,94178,94161)
ORDER BY STAMP_START DESC
LIMIT 24

Results in this explain:

id     select_type   table            type     possible_keys   key     key_len    ref        rows      Extra
1      SIMPLE        ad_master_test   range    TEST2           TEST2   8          (NULL)     497.766   Using index condition; Using filesort

And this profile:

Status                 Duration
starting               0.000087
checking permissions   0.000007
Opening tables         0.000021
After opening tables   0.000007
System lock            0.000006
Table lock             0.000005
init                   0.000058
optimizing             0.000023
statistics             0.000654
preparing              0.000480
executing              0.000008
Sorting result         0.433607
Sending data           0.001681
end                    0.000010
query end              0.000007
closing tables         0.000003
Unlocking tables       0.000011
freeing items          0.000010
updating status        0.000158
cleaning up            0.000021

Edit: When not using force index the explain changes as following:

id     select_type   table            type     possible_keys   key     key_len    ref        rows      Extra
1      SIMPLE        ad_master_test   index    TEST2           TEST1   14         (NULL)     345       Using where

The profile and runtime stays (as expected) the same it was when using FORCE INDEX on the TEST1 index.

/Edit

I honestly can't wrap my head around this. Why does the explain and the actual query performance differ that extremely. What does the server do while the 5 seconds "Sending data"?

It looks like there are some TEXT or BLOB or even large VARCHAR columns?? 12.1GB/6.8M = 1.8KB. If you don't need them, don't fetch them; this may speed up any such query. How much RAM do you have?

The two indexes do seem to take a different time (4.8s vs 0.4s).

(STAMP_START, FK_KAT)

This avoids the "filesort" by scanning the index BTree in the desired order. It has to check every entry for a matching fk_kat. I think it will stop after 24 (see LIMIT) matching rows, but that could be the first 24 (fast), the last 24 (very slow), or something in between.

(FK_KAT, STAMP_START)

This show go directly to all 82 ids, scan each (assuming not unique), collecting perhaps hundreds of rows. Then do a "filesort". (Note: this will be a disk sort if any TEXT columns are being fetched.) Then deliver the first 24. (Oops; I don't think MariaDB 10.1 has that feature.)

Even though this takes more steps, by avoiding the full index scan it turns out to be faster.

Other Notes

key_buffer_size = 20G - Don't use MyISAM. But if you do, change this to 10% of RAM. If you don't, change it to 30M and give 70% of RAM to innodb_buffer_pool_size.

If you want to discuss further, please provide EXPLAIN FORMAT=JSON SELECT ... for each query. This will have the "cost" analysis, which should explain why it picked the worse index.

Another experiment

Instead of SELECT *, run the timings and EXPLAINs with just SELECT ID_AD_MASTER. If that proves to be "fast", then reformulate the query thus:

SELECT b.*   -- (or selected columns from `b`)
    FROM ( SELECT ID_AD_MASTER FROM ... ) AS a
    JOIN ad_master_test AS b  USING(ad_master_test)
    ORDER BY STAMP_START DESC ;   -- (yes, repeat the ORDER BY)

4. Query Performance Optimization, Query optimization, index optimization, and schema optimization go hand in hand. The default behavior is generally to fetch the whole result and buffer it in memory. Locks that are implemented by the storage engine, such as InnoDB's row We agree, a movie without actors is strange, but the Sakila sample  Server is MariaDB 5.5 Slow query (4-5 seconds) is: SELECT VisitDate AS LASTDATE FROM Vis Stack Exchange Network Stack Exchange network consists of 175 Q&A communities including Stack Overflow , the largest, most trusted online community for developers to learn, share their knowledge, and build their careers.

Suggestions to consider for your my.cnf [mysqld] section (RPS is RatePerSecond)

thread_handling=pool-of-threads  # from one-thread-per-connection see refman
max_connections=100  # from 151 because max_used_connections < 60
read_rnd_buffer_size=256K  # from 1M to reduce RAM used, < handler_read_rnd_next RPS
aria_pagecache_division_limit=50  # from 100 for WARM cache for < aria_pagecache_reads RPS
key_cache_division_limit=50  # from 100 for WARM cache for < key_reads
key_buffer_size=2G  # from 5G  Mysqltuner reports 1G used (this could be WRONG-test it)
innodb_io_capacity=30000  # from 200 since you have SSD
innodb_buffer_pool_instances=8  # from 16 for your volume of data
innodb_lru_scan_depth=128  # from 1024 to conserve CPU every SECOND see refman
innodb_buffer_pool_size=36G  # from 30G for effective size of 32G when
innodb_change_buffer_pool_size=10  # from 25% set aside for Del,Ins,Upd activities

for additional suggestions, view profile, Network profile, for contact info including my Skype ID. There are additional opportunities available to improve your configuration.

Remember the advice ONLY one change per day, monitor, if positive results proceed to next suggestion. Otherwise let me know any seriously adverse result and which change seemed to cause the problem, please.

4. Indexes - High Performance MySQL [Book], In fact, MySQL will only ever use one index per table per query—except for UNION s. store the indexes in descending order and are optimized for reading them in that order. InnoDB and BDB tables require primary keys for every table. or accidental server shutdown), MySQL may begin to exhibit very strange behavior. / Optimization / Optimizing for InnoDB Tables / Optimizing InnoDB Disk I/O 8.5.8 Optimizing InnoDB Disk I/O If you follow best practices for database design and tuning techniques for SQL operations, but your database is still slow due to heavy disk I/O activity, consider these disk I/O optimizations.

OPTIMIZE TABLE, It can either be used to defragment tables, or to update the InnoDB fulltext index. MariaDB starting with  SELECT TABLE_NAME, ENGINE FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'test'; In this case, after the server has scanned the database directory to determine the names of the tables in the database, those names become available with no further file system lookups. Thus, TABLE_NAME requires no files to be opened.

Why MySQL Could Be Slow With Large Tables?, In fact, even MySQL optimizer currently does not take it into account. There are also clustered keys in Innodb which combine index access with So if you're dealing with large data sets and complex queries here and valuable MySQL, MariaDB, PostgreSQL and MongoDB-related Strange behavior… OPTIMIZE TABLE for an InnoDB table is mapped to an ALTER TABLE operation to rebuild the table and update index statistics and free unused space in the clustered index. This operation does not use fast index creation.

Innodb vs MySQL index counts, I had a customer recently who a few strange errors in their mysqld.err log: to do with Fast index creation in Innodb, and that it had been corrected Advanced Query Tuning in MySQL 5.6 and MySQL 5.7 Webinar: and valuable MySQL, MariaDB, PostgreSQL and MongoDB-related posts from the blog. Optimization. Language Structure. Controlling Transactional Behavior of the InnoDB memcached Plugin. This section covers topics related to InnoDB indexes.

[PDF] Query Optimization, Originally: Query Optimization from 0 to 10 by Jaime MariaDB [nlwiktionary]> ALTER TABLE page ADD INDEX RTREEs in MyISAM and InnoDB Strange game. EXPLAIN EXTENDED is now the default behavior. Engine-independent table statistics lift these limitations. Statistics are stored in regular tables in the mysql database. it is possible for a DBA to read and update the values. More data is collected/used. Statistics are stored in three tables, mysql.table_stats, mysql.column_stats and mysql.index_stats.

Comments
  • Thanks at first for your reply. I don't understand how scanning even a full index of 2 fields can take almost 5 seconds. The second one does not have 82 results, but 497.766. Thats why I'm so confused how that can be faster.
  • @JensN - Scanning an index of, say, 400MB takes time, especially if it is not cached in RAM. Without knowing your disk speed, caching situation, etc, I can't say whether 5 seconds is "high" or "low".
  • @JensN - 497.766 is just an estimate based on statistics. How many rows were there actually? EXPLAIN is usually off by some; sometimes it is off by a lot.
  • The actual number currently is 256.146. The index is around 150-200MB. The database is located on an SSD and the settings are in the original question. I actually completely removed the MyISAM tables that were still present on that server and updated the settings accordingly (I edited the question to reflect my changes). The server has more than enough ram to fit in the whole table and all related indices at once.
  • @WilsonHauck A) The server has 64GB of RAM and the mysqld process currently utilizes around 33GB. B) pastebin.com/uWWN6EGC C) pastebin.com/mKNgQpqq D) pastebin.com/1cKajEsz More: The server is a fresh installation with currently only one project on it. It permanently has a load from around 0.5 - 1.0 or lower and no obvious resource bottleneck of any kind.