How to parallelize requests with MySQL+PHP+nginx?

Recent Discovery

Among everything else I tried, I replaced my JMeter profile with a custom JavaScript script that hit each of my API endpoints in turn in an infinite loop, then ran this script in parallel in different browsers (one Firefox, one Chrome, one Safari) to rule out issues caused by all of my connections coming from the same source (same user agent, same cookies, same session ID, etc.).

When I did this, I noticed that all of my issues went away. The queries ran in parallel and the app was a lot more responsive than JMeter would have you believe.
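For anyone who wants to run the same experiment without juggling three browsers: a parallel smoke test is easy to write with PHP's curl_multi API. This is a minimal sketch (the URLs are placeholders for my real endpoints), and note that since curl sends no session cookie here, each request gets its own PHP session:

<?php
// Minimal parallel smoke test using curl_multi (sketch; placeholder URLs).
$urls = [
    "http://localhost/api/endpoint1",
    "http://localhost/api/endpoint2",
    "http://localhost/api/endpoint3",
];

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Drive all transfers concurrently until every handle finishes
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

// If requests truly run in parallel, each total time should be close to
// that endpoint's own latency, not the sum of all of them
foreach ($handles as $url => $ch) {
    printf("%s: %.0fms\n", $url, curl_getinfo($ch, CURLINFO_TOTAL_TIME) * 1000);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);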

It seems impossible to me that JMeter would be serializing the requests, since it's a de facto standard for load testing. So I started trying to reproduce the behavior.

In an attempt to re-create the JMeter results, I created the following two PHP scripts, which (hopefully) simulate my Yii application:

slow.php

<?php

session_start();

$_SESSION['some'] = 'value';
// Yii is calling session_write_close() almost immediately after
// the session is initialized, but to try and exacerbate issues,
// I've commented it out:
// session_write_close();

$dsn = "mysql:host=localhost;dbname=platypus;unix_socket=/tmp/mysql.sock";
$pdo = new PDO($dsn, "user", "password");
// Yii was using whatever the default persistence behavior was,
// but to try and exacerbate issues I set this flag:
$pdo->setAttribute(PDO::ATTR_PERSISTENT, true);
// Simulate a query running for 1 second by issuing a 1-second sleep
$pdo->query("DO SLEEP(1)");

echo "Done";

fast.php

<?php

session_start();

$_SESSION['some'] = 'value';

$dsn = "mysql:host=localhost;dbname=platypus;unix_socket=/tmp/mysql.sock";
$pdo = new PDO($dsn, "user", "password");
$pdo->setAttribute(PDO::ATTR_PERSISTENT, true);
// Simulate a query running for 0.1 seconds
$pdo->query("DO SLEEP(0.1)");

echo "Done";

Running JMeter against these two new endpoints, there was no serialization of requests. Everything ran in parallel: fast.php always returned in 100-150ms and slow.php always returned in 1000-1050ms, even as I scaled up to 3, 4, and 5 threads. I was able to watch things collapse at 11 threads, but that's because I exceeded the number of PHP-FPM worker processes.

So to summarize:

  • The issue only shows up when profiling my API with JMeter; real parallel clients (separate browsers) are unaffected
  • It isn't simply a JMeter bug either, since JMeter against the bare PHP scripts above behaves fine; something about how JMeter interacts with my application or Yii 1.1 triggers it
  • I tried but could not come up with a minimal repro case

Despite the issue being non-existent when profiling with other tools, lots of people responded and gave lots of helpful information:

  • Avoid persistent connections in PHP (they could cause multiple requests to share a connection, though that's probably not the case here)
  • Avoid session locking by calling session_write_close() as early as possible (see the sketch after this list)
  • Ensure you have enough PHP worker threads to handle the number of simultaneous connections
  • MySQL fully supports parallel requests (if the hardware can handle it)
  • Be wary of table locking (any transaction with an UPDATE statement could potentially lock the tables)
  • MyISAM does table-level locking instead of row-level locking
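On the session locking point in particular (since, per the comments at the bottom, this turned out to matter): PHP's default file-based session handler holds an exclusive lock on the session file from session_start() until the script ends or session_write_close() is called, so two requests carrying the same session cookie are forced to run one after the other. A minimal sketch of releasing the lock early, with the slow query stubbed out as a SLEEP:

<?php
session_start();

// Read whatever session state the request needs up front...
$userId = $_SESSION['user_id'] ?? null;

// ...then release the session file lock immediately. Without this line,
// a second request with the same session cookie blocks inside
// session_start() until this script finishes.
session_write_close();

// Long-running work no longer serializes requests sharing a session.
// (Caveat: writes to $_SESSION after this point are not persisted.)
$pdo = new PDO("mysql:host=localhost;dbname=platypus", "user", "password");
$pdo->query("DO SLEEP(1)"); // stand-in for the real slow query

echo "Done";
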
Original Post

I inherited a web application and I'm trying to make sense of its performance profile so I can start optimizing it for speed.

One thing I noticed pretty early on is that requests to the server are being serialized. For instance, suppose I have three endpoints with response times like so:

/api/endpoint1 --> 50ms
/api/endpoint2 --> 1000ms
/api/endpoint3 --> 100ms

If I hit a single endpoint, I measure the appropriate response times. But when I set up a script to hit all 3 at once, I sometimes see something like the following:

endpoint1: 50ms
endpoint2: 1050ms
endpoint3: 1150ms

Clearly the call to endpoint3 was queued and waiting for the call to endpoint2 to finish before it got a response.

My first thought was that this should be trivially solved with multithreading, so I took a look at the server configuration. PHP-FPM's process manager was set to "dynamic" with "start_servers" of 1, "max_children" of 5, and "max_spare_servers" of 2. For testing purposes I swapped this to "static" so that 5 PHP processes would stay alive to handle connections in parallel (more than the 3 endpoints I was hitting, so they should be able to process simultaneously).
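For reference, those knobs live in the PHP-FPM pool configuration (commonly /etc/php-fpm.d/www.conf or a file under pool.d, depending on the distribution). The static setup I tested with looks roughly like this:

pm = static
pm.max_children = 5

; the dynamic settings this replaced:
; pm = dynamic
; pm.start_servers = 1
; pm.max_spare_servers = 2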

This had no impact on performance, so I looked at my nginx config. "worker_processes" was set to 1 with "worker_connections" set to 1024. I know that nginx uses an event loop model, so it shouldn't be blocking while it waits for a response from PHP-FPM. But just in case, I bumped "worker_processes" up to 5.
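The corresponding nginx settings, for completeness (values as described; each nginx worker multiplexes many connections through its event loop, so worker_processes mainly matters for CPU-bound work):

worker_processes 5;    # bumped up from 1

events {
    worker_connections 1024;
}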

Still, no effect. So next I looked at the database. All 3 endpoints hit the database, and I know for a fact that the 1000ms response time is mostly spent waiting on a long-running database query. I tried setting "thread_pool_size" to 5, and within the MySQL REPL I also set "innodb_parallel_read_threads" and "mysqlx_min_worker_threads" to 5.

Still, my requests were getting serialized. When I log into the MySQL REPL and run show processlist; while my script is running (a while-true loop repeatedly hitting those 3 API endpoints), I noticed that there was only ever one connection for the web application's user.
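For anyone retracing this, the MySQL-side checks are all standard statements:

SHOW FULL PROCESSLIST;                   -- one row per client connection
SHOW VARIABLES LIKE 'max_connections';   -- server-side connection cap
SHOW STATUS LIKE 'Threads_connected';    -- connections currently open
SHOW STATUS LIKE 'Threads_running';      -- connections actively executing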

Unfortunately I'm not sure if my issue lies with the database (not allowing more than one connection), with PHP-FPM (not processing more than one request at a time), or with nginx (not forwarding more than one request at a time to PHP-FPM). I'm also not sure how to figure out which one is acting as the bottleneck

Update

Looking around some more, I found this SO post which seems to suggest that MySQL doesn't support parallel queries from the same user (e.g. the web application's user).

Is this true? Surely such a ubiquitous database engine wouldn't have such a performance flaw, especially given how commonly it's used with AWS for massively scaled applications. I understand that for simple "read from disk" queries, parallelizing wouldn't improve performance, since they'd just sit in a queue waiting on disk I/O; but modern databases have in-memory caches, and most of the really slow operations like filesort happen in memory. There's no reason a disk-bound query couldn't run in parallel (issue a request to disk and start waiting on I/O) while a CPU-bound query is busy sorting a table in RAM. The context switching might slightly slow down the CPU-bound queries, but if slowing those from 1000ms to 1200ms means my 5ms query can run in 5ms, I think that's worth it.
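For what it's worth, the accurate version of that claim is narrower: MySQL executes one statement at a time per connection, but separate connections (even when logged in as the same user) run concurrently. This is easy to verify by hand with two mysql clients open side by side:

-- Client A:
SELECT SLEEP(5);   -- occupies this connection for 5 seconds

-- Client B, same user, separate connection, started while A sleeps:
SELECT 1;          -- returns immediately; B is not blocked by A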

My queries

Here are the queries for my 3 endpoints. Note that the timings listed are response times for the full HTTP pipeline (from browser request to response), so they include overhead from nginx and PHP, plus any post-processing of the query results done in PHP. That said, the query in endpoint 2 makes up 99% of its runtime, and appears to lock the database so that endpoints 1 and 3 queue up instead of returning quickly.

endpoint1 (50ms)

SELECT * FROM Widget WHERE id = 1 LIMIT 1

(Note that 50ms is the full response time for the endpoint, not the query's execution time. The query itself is clearly on the order of microseconds.)

endpoint2 (1000ms)

USE platypus;
SELECT `t`.`(49 fields)` AS `t0_cX`,
       `site`.`(29 fields)` AS `t2_cX`,
       `customer`.`(26 fields)` AS `t4_cX`,
       `domain`.`(20 fields)` AS `t6_c0`,
       `domain-general_settings`.`(18 fields)` AS `t8_cX`,
       `domain-access_settings`.`(17 fields)` AS `t9_cX`,
       `customer-general_settings`.`(18 fields)` AS `t10_cX`,
       `customer-access_settings`.`(17 fields)` AS `t11_cX`,
       `site-general_settings`.`(18 fields)` AS `t12_cX`,
       `site-access_settings`.`(17 fields)` AS `t13_cX`,
       `backup_broadcast`.`(49 fields)` AS `t14_cX`,
       `playlists`.`(11 fields)` AS `t16_cX`,
       `section`.`(10 fields)` AS `t17_cX`,
       `video`.`(16 fields)` AS `t18_cX`,
       `general_settings`.`(18 fields)` AS `t19_cX`,
       `access_settings`.`(17 fields)` AS `t20_cX`
FROM   `broadcast` `t`
       LEFT OUTER JOIN `site` `site`
                    ON ( `t`.`site_id` = `site`.`id` )
       LEFT OUTER JOIN `customer` `customer`
                    ON ( `site`.`customer_id` = `customer`.`id` )
       LEFT OUTER JOIN `domain` `domain`
                    ON ( `customer`.`domain_id` = `domain`.`id` )
       LEFT OUTER JOIN `generalsettings` `domain-general_settings`
                    ON ( `domain`.`general_settings_id` =
                         `domain-general_settings`.`id` )
       LEFT OUTER JOIN `accesssettings` `domain-access_settings`
                    ON
       ( `domain`.`access_settings_id` = `domain-access_settings`.`id` )
       LEFT OUTER JOIN `generalsettings` `customer-general_settings`
                    ON ( `customer`.`general_settings_id` =
                         `customer-general_settings`.`id` )
       LEFT OUTER JOIN `accesssettings` `customer-access_settings`
                    ON ( `customer`.`access_settings_id` =
                         `customer-access_settings`.`id` )
       LEFT OUTER JOIN `generalsettings` `site-general_settings`
                    ON ( `site`.`general_settings_id` =
                         `site-general_settings`.`id` )
       LEFT OUTER JOIN `accesssettings` `site-access_settings`
                    ON ( `site`.`access_settings_id` =
                         `site-access_settings`.`id` )
       LEFT OUTER JOIN `broadcast` `backup_broadcast`
                    ON ( `t`.`backup_broadcast_id` = `backup_broadcast`.`id` )
                       AND ( backup_broadcast.deletion IS NULL )
       LEFT OUTER JOIN `playlist_broadcast` `playlists_playlists`
                    ON ( `t`.`id` = `playlists_playlists`.`broadcast_id` )
       LEFT OUTER JOIN `playlist` `playlists`
                    ON
       ( `playlists`.`id` = `playlists_playlists`.`playlist_id` )
       LEFT OUTER JOIN `section` `section`
                    ON ( `t`.`section_id` = `section`.`id` )
       LEFT OUTER JOIN `video` `video`
                    ON ( `t`.`video_id` = `video`.`id` )
                       AND ( video.deletion IS NULL )
       LEFT OUTER JOIN `generalsettings` `general_settings`
                    ON ( `t`.`general_settings_id` = `general_settings`.`id` )
       LEFT OUTER JOIN `accesssettings` `access_settings`
                    ON ( `t`.`access_settings_id` = `access_settings`.`id` )
WHERE
(
    (
        t.id IN (
    SELECT `broadcast`.id FROM broadcast
       LEFT JOIN `mediashare` `shares`
              ON ( `shares`.`media_id` = `broadcast`.`id` )
                 AND `shares`.media_type = 'Broadcast'
    WHERE 
    (
        (
            broadcast.site_id IN(
                '489', '488', '253', '1083', '407'
            )
            OR
            shares.site_id IN(
                '489', '488', '253', '1083', '407'
            )
        )
    )
        )
    )
    AND
    (
        (
            (
                (t.deletion IS NULL)
            )
        )
        AND
        (
            IF(
                t.backup_mode IS NULL,
                t.status,
                IF(
                    t.backup_mode = 'broadcast',
                    backup_broadcast.status,
                    IF(
                        t.backup_mode = 'embed',
                        IF(
                            t.backup_embed_status,
                            t.backup_embed_status,
                            IF(
                                '2020-01-08 16:34:52' < t.date,
                                1,
                                IF(
                                    t.date > Date_sub(
                                        '2020-01-08 16:34:52',
                                        INTERVAL IF(t.expected_duration IS NULL, 10800, t.expected_duration) second
                                    ),
                                    10,
                                    12
                                )
                            )
                        ),
                        t.status
                    )
                )
            ) != 0
        )
    )
)
LIMIT  10;

This query takes roughly 1000ms to run, but the PHP for the endpoint is extremely simple (run the query, return the results as JSON) and only adds a couple of milliseconds of overhead.
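The handler itself is essentially just this (a simplified sketch, not the actual Yii controller; $sql stands in for the ORM-generated query above):

<?php
// Simplified stand-in for the endpoint 2 controller action.
$dsn = "mysql:host=localhost;dbname=platypus;unix_socket=/tmp/mysql.sock";
$pdo = new PDO($dsn, "user", "password");
$stmt = $pdo->query($sql); // the ~1000ms query shown above
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

header("Content-Type: application/json");
echo json_encode($rows);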

endpoint3 (100ms)

SELECT * FROM platypus.Broadcast
    WHERE deletion IS NULL
    AND site_id IN (SELECT id FROM platypus.Site
               WHERE deletion IS NULL
                 AND customer_id = 7);

There's additional validation on the PHP side here which makes this endpoint take 100ms. The SQL, as you can see, is still fairly simple.

Create Table Statements

As there is a post length limit on Stack Overflow, I cannot show the CREATE TABLE for every table touched by endpoint 2, but I can show at least one. The others use the same engine.

CREATE TABLE `Widget` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `widget_name` varchar(255) NOT NULL,
  `widget_description` varchar(255) NOT NULL,
  `status` varchar(255) NOT NULL,
  `date_created` datetime NOT NULL,
  `date_modified` datetime NOT NULL,
  `auto_play` varchar(255) NOT NULL,
  `on_load_show` varchar(255) NOT NULL,
  `widget_content_source` varchar(255) NOT NULL,
  `associated_sites` text NOT NULL,
  `author_id` int NOT NULL,
  `associated_sections` text,
  `after_date` datetime DEFAULT NULL,
  `before_date` datetime DEFAULT NULL,
  `show_playlists` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `is_classic` tinyint(1) NOT NULL,
  `default_site` int unsigned DEFAULT NULL,
  `auth_code_url` varchar(255) DEFAULT NULL,
  `widget_layout_id` int unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `fk_Widget_widget_layout_id_WidgetLayout_id` (`widget_layout_id`),
  CONSTRAINT `fk_Widget_widget_layout_id_WidgetLayout_id` FOREIGN KEY (`widget_layout_id`) REFERENCES `WidgetLayout` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1412 DEFAULT CHARSET=utf8;

Note

Notice that endpoint 2 doesn't even touch the Widget table, yet endpoint 1 (which ONLY touches the Widget table) is also queued up. This rules out table locking.

When observing the process list in MySQL, only one connection is ever being made to the database from the application user. The issue may therefore lie in my PHP configuration.

Explain for query 2

Attached is the output of EXPLAIN SELECT ... for the endpoint 2 query

Simpler experiments

To try and determine where the parallel pipeline was falling apart, I created two simple scripts:

sleep.php

<?php
sleep(5);
echo "Done sleeping";

return.php

<?php
echo "That's all";

Doing this (sleeping in PHP) and running my script against these two endpoints with 3 threads, I saw no issues: return.php always came back in ~11ms, even though sleep.php averaged ~5066ms. I then tried doing the sleeping in MySQL instead:

sleep.php

<?php
$pdo = new PDO("...", "user", "pass");
$pdo->query("DO SLEEP(5)");
echo "Done sleeping";

This, again, had no issues. The sleeping endpoint did not block the non-sleeping one.

This means the issue does not exist at the nginx, PHP, or PDO level; there must be some kind of table or row locking going on. I'm going to re-enable the general query log and scan through every query being performed to see if I can figure out what's happening.
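For reference, the general query log can be toggled at runtime, and logging to a table makes it easy to inspect from the REPL (standard MySQL variables):

SET GLOBAL log_output = 'TABLE';  -- write to the mysql.general_log table
SET GLOBAL general_log = 'ON';

-- ...reproduce the problem, then:
SET GLOBAL general_log = 'OFF';
SELECT event_time, command_type, argument
FROM mysql.general_log
ORDER BY event_time DESC
LIMIT 50;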

Final Update

If you scroll up to "Recent Discovery" at the top of this post, you'll notice that I've modified my understanding of the problem.

I was not having an issue with parallelization, but with JMeter. I have failed to create a simple repro case, but I know now that the issue does not lie with my application but rather with how I'm profiling it.

Answers

MySQL can handle a lot of parallel queries, but you can't run more than one query at a time on a single connection. The way PHP is usually set up, each request goes to a different process, and each process opens its own connection to MySQL, so the problem you mention is avoided. Unless you use persistent connections in PHP, in which case you might end up reusing the same connection for every request. If that's the case, it should be easy to disable persistence and go back to the standard one-database-connection-per-request model.
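To make the persistence point concrete: PDO only honors persistence when it is requested in the constructor, so checking or disabling it is a one-line change (a sketch):

<?php
$dsn = "mysql:host=localhost;dbname=platypus";

// Default: non-persistent. The connection is closed when the object is
// destroyed, and every request opens its own connection.
$pdo = new PDO($dsn, "user", "password");

// Persistent: PHP caches the connection per worker process and reuses it
// across requests. This must be passed as a constructor option; calling
// setAttribute(PDO::ATTR_PERSISTENT, true) afterwards has no effect.
$pdoPersistent = new PDO($dsn, "user", "password", [
    PDO::ATTR_PERSISTENT => true,
]);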

My first guess is that endpoint 2 triggers some locking in the database, and that's why endpoint 3's query is queued until endpoint 2's query finishes. This can be fixed by changing the logic in the code (avoiding or minimizing database locks), or by changing the database configuration or the table engines to better suit the application's needs. For example, InnoDB does row-level locking, but MyISAM locks the whole table.
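Checking which engine each table uses takes one information_schema query:

SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = 'platypus';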

Profiling will be really helpful if you don't mind configuring it. I suggest having a look at Blackfire.io, New Relic, or Xdebug profiling if you go this route. You'll be able to find the bottlenecks faster that way.

Hmm... too long for a comment.

A little simplified: every engine has a queue where it gathers queries to be computed, and depending on the hardware it uses 2, 3, or even more threads to compute each query. The more threads are running, the more time every query needs because of locks; for example, inserting a new row into a table with an auto-increment column can lock the entire table (a quick search will turn up many examples of locks). And of course every query needs memory and other resources, which it shares with all the other software running on the server.

With clusters you pay the price of the overhead of managing multiple SQL servers.

So from the SQL server's side everything is parallel, but you need the hardware to support many threads/many engines (and those should be scaled up carefully).

You can of course have many users in SQL, but for convenience's sake you usually have one per app, or sometimes even one per server. The same user can access the database simultaneously, though you can disable that.

Your PHP also runs in parallel, because web servers are built to serve parallel requests; it doesn't matter whether they run PHP, Python (Django), or JavaScript (Node.js), or whether the server is Apache, IIS, or nginx. Every technology has its perks, and the more modules you bolt onto an engine, the slower it gets.

So everything is parallel to a certain degree, and you can increase the capacity of such systems, as you see with cloud providers, virtual servers, and so on.

You only notice the limits at moments like the launch of Pokemon Go, or new games where even the huge cloud providers collapse, or the ObamaCare rollout, where nothing was tested at that scale.

Parallelizing such tasks is difficult, because while the web server and the SQL server both have caches where they keep frequently requested data, usually every request needs its own data.

In reality everything is much more complicated, starting with CPUs with multiple pipelines, multiple cores, and shared memory (which caused Meltdown and its siblings), and going all the way to tables or databases that reside entirely in memory for high performance, or web servers that run entirely out of CPU cache, which is much faster than main memory or hard drives.


Comments
  • What version of MySQL? I don't think innodb_parallel_read_threads existed until 8.0.14 and has limited utility. mysqlx_min_worker_threads seems unrelated.
  • "while my script is running" - does it wait before moving on??
  • "MySQL doesn't support parallel queries from the same user" -- False.
  • Show us the 1000ms query.
  • Major update! I figured out how to reproduce the session lock! Reading through the Yii app deeper, I found that it was using a custom session_name. My JMeter config wasn't getting tripped up on the session lock in my minimum repro case because it was using the wrong cookie name. When I updated my JMeter profile to use the default session name, or when I used session_name(...), I reproduced the session lock and saw the same serial behavior! (A repro sketch follows after these comments.)
  • Speeding up the SQL is definitely an ultimate goal (and great improvements have already been made, bringing it down from 4 seconds to 1 second), but it's a complex query being generated by an ORM on top of a poorly designed schema - and it's mostly legacy stuff that we'll have trouble adjusting. I'll update my original post with the queries shortly
  • I've updated my post with one of the create table statements so you can see the engine and basic database setup (it's not super complex). I've also included the queries being run. Table locking is not occurring, but there is only one connection being made to the MySQL database at a time from PHP, despite 3 separate PHP worker processes running.
  • @stevendesu - I don't think it is possible for 3 "separate worker processes" to share a single connection. Do SHOW PROCESSLIST; to see how fast the "pids" are growing.
  • @stevendesu - (I tweaked the index recommendations again.)
  • Observing the pid, it does increase, but not very quickly. About half the time when I run show processlist; the "Command" comes back as Sleep, and the pid will remain unchanged for 1-2 seconds at a time, then suddenly jump by 4 or 5. As for changing the query, I have limited ability to do that. The query is generated by a rather old and massive Yii 1.1 application with lots of scopes and filters, so instead of editing the query directly I have to modify the PHP code that generates it, which can be tedious to trace.
  • I've attempted to use xdebug for profiling, but I wasn't able to garner a lot of useful information. I ended up with a massive graph and thousands of lines of logs and metrics that were nearly impossible to sift through. After a whole day of reading I concluded "we're waiting on I/O from the database". This led me to check things out from the database side, where I found there's only one connection being made at a time despite having 3 PHP worker processes running
  • How can I verify if PHP is using persistent connections? I have connection pooling disabled for PDO in php.ini, but I don't know if that's the only setting I need to tweak or if it's being overridden somehow
  • Search for PDO::ATTR_PERSISTENT, or see your ORM docs (since you said you are using an ORM).
  • The ORM docs specify that persistence can be enabled (using Yii::app->db->setPersistent(true)) but don't specify the default behavior. I experimented and found that when I enabled persistence I started seeing more than one connection to the database in show processlist;, but only one connection was utilized at a time and requests were still serialized. So the issue seems to lie not with MySQL, but with PHP. It almost seems like one worker thread is handling all requests. Perhaps an FPM setting?
  • MySQL specifically has one "thread" or one "process" (depending on OS) per connection. Hence, separate logins (connections) are running virtually independently.
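To flesh out the "Major update" comment above: PHP's session lock is keyed by the session cookie, so a load tester that sends its session ID under the wrong cookie name silently gets a fresh, uncontended session on every request and never hits the lock. A minimal repro sketch (the custom cookie name is illustrative, not the app's real one):

<?php
// Yii was configured with a custom session cookie name. A client that
// keeps sending its ID under the default name ("PHPSESSID") starts a
// brand-new session each request and never contends for the lock.
session_name("app_session"); // illustrative custom name

session_start();
$_SESSION['some'] = 'value';
// No session_write_close(): the session file lock is held for the whole
// request, so concurrent requests sharing this cookie serialize here.

$pdo = new PDO("mysql:host=localhost;dbname=platypus", "user", "password");
$pdo->query("DO SLEEP(1)");
echo "Done";

Hitting this script concurrently with a client that reuses the app_session cookie reproduces the serialized timings; a client sending its cookie under the wrong name does not.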