Improve performance of complex SQL query

sql query performance tuning tips
improve sql server query performance on large tables
how to optimize sql query with multiple joins
query optimization in sql server with example
sql performance tuning tutorial
how to make select query faster in sql server
how to check query performance in sql server
how to search millions of record in sql table faster?

Since a few month by now I am collecting data of football matches from open sources. To do so I grab the data of a specific website URL, that presents the data for a specific football match, with PHP via XPath. Then I do some data editing so they fit the way I need them. The next and last step is to transfer them into my MySQL database in several tables.

With a fast growing database I slowly run into severe performance issues. Because I do everything locally on my computer and that one isn't a monster of machine, it already takes some time to process a match. To get a feeling how fast that is: In the first days of datamining a match took around 24 seconds. However, now the average went beyond a 60 seconds threshold.

Up until now I occasionally went into the PHP code and tried to improve it where possible, because I thought the main issue is in not so clean code snippets. While it helped a bit, the average time grew further after several days and lately I started to realize there must be another time consuming issue. So I made a test PHP script that performes some kind of logging while running the main code.

It shows that some SQL queries I do to insert the data in my database tables take a lot of time on average (I analyzed 100 matches here):

  • starting squad in DB: 6.44 seconds
  • benched substitutes in DB: 8.49 seconds

Checking the queries again I realized they are quite complex.

Those are the tables involved:

tblStartingSquad

+----+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+--------+  
| id | matchID | player1ID | player2ID | player3ID | player4ID | player5ID | player6ID | player7ID | player8ID | player9ID | player10ID | player11ID | clubID |
+----+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+--------+
| 1  |    1    |     1     |     2     |     3     |     4     |     5     |     6     |     7     |     8     |     9     |     10     |     11     |   1    |
| 2  |    1    |    12     |    13     |    14     |    15     |    16     |    17     |    16     |    17     |    18     |     19     |     20     |   2    |
| 3  |    2    |     1     |     2     |     3     |     4     |     5     |     6     |     7     |     8     |     9     |     10     |     11     |   1    |
| 4  |    2    |    21     |    22     |    23     |    24     |    25     |    26     |    27     |    28     |    29     |     30     |     31     |   3    |
+----+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+------------+--------+

tblSubstitutes

+----+---------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+--------+  
| id | matchID | player12ID | player13ID | player14ID | player15ID | player16ID | player17ID | player18ID | player19ID | player20ID | player21ID | player22ID | player23ID | clubID |
+----+---------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+--------+
| 1  |    1    |     32     |     33     |     34     |     35     |     36     |     37     |     38     |     39     |     40     |     41     |     42     |     43     |   1    |
| 2  |    1    |     44     |     45     |     46     |     47     |     48     |     49     |     50     |     51     |     52     |     53     |     54     |     55     |   2    |
| 3  |    2    |     32     |     33     |     34     |     35     |     36     |     37     |     38     |     39     |     40     |     41     |     42     |     43     |   1    |
| 4  |    2    |     56     |     57     |     58     |     59     |     60     |     61     |     61     |     62     |     63     |     64     |     65     |     66     |   3    |
+----+---------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+--------+

tblMatch

+---------+---------------------+-------------+------------------+
| matchID |         date        |    coach1   |      coach2      |
+---------+---------------------+-------------+------------------+
|    1    | 2006-08-19 22:00:00 | Piotr Nowak | Fernando Clavijo |
|    2    | 2006-08-15 21:00:00 | Piotr Nowak |   Mustafa Ugur   |
+---------+---------------------+-------------+------------------+

tblPlayer

+----------+------------------------+------------------+
| playerID |       namePlayer       |      short       |
+----------+------------------------+------------------+
|     1    |       Enis Ulusan      |    enis-ulusan   |
|     2    |   Grant Robert Murray  |   grant-murray   |
|     3    |     Evgeniy Shpedt     |  evgeniy-shpedt  |
|     4    | Mihai Alexandru Costea |   mihai-costea   |
|     5    |       Jan Zolna        |     jan-zolna    |
|     6    |    Adrian Gheorghiu    | adrian-gheorghiu |
|     7    | Marius Marian Croitoru | marius-croitoru  |
|     8    |    Jacov Nachtailer    | jacov-nachtailer |
|    ...   |          ...           |        ...       |
+----------+------------------------+------------------+

tblClub

+--------+-----------------+
| clubID |    nameClub     |
+--------+-----------------+
|    1   |   D.C. United   |
|    2   | Colorado Rapids |
|    3   | Caykur Rizespor |
+--------+-----------------+

And those are the queries involved:

SQL Query starting squad

$tblstarting_squad = 'INSERT INTO tblStartingSquad (matchID, player1ID, player2ID, player3ID, player4ID, player5ID, player6ID, player7ID, player8ID, player9ID, player10ID, player11ID, clubID) 
                    SELECT
                        (SELECT matchID FROM tblMatch WHERE date = "' . $match_date . '" AND coach1 = "' . $match_coach_home . '" AND coach2 = "' . $match_coach_away . '"), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[0] . '" AND short = "' . $player_short[0] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[1] . '" AND short = "' . $player_short[1] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[2] . '" AND short = "' . $player_short[2] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[3] . '" AND short = "' . $player_short[3] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[4] . '" AND short = "' . $player_short[4] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[5] . '" AND short = "' . $player_short[5] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[6] . '" AND short = "' . $player_short[6] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[7] . '" AND short = "' . $player_short[7] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[8] . '" AND short = "' . $player_short[8] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[9] . '" AND short = "' . $player_short[9] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[10] . '" AND short = "' . $player_short[10] . '" LIMIT 1), 
                        (SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '" LIMIT 1)
                    WHERE NOT EXISTS (
                        SELECT e.matchID 
                        FROM tblStartingSquad As e
                        INNER JOIN tblMatch As m
                            ON e.matchID = m.matchID
                        WHERE m.date = "' . $match_date . '" AND m.coach1 = "' . $match_coach_home . '" AND m.coach2 = "' . $match_coach_away . '" AND e.clubID = (SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '")
                    );';

if (!mysqli_query($db_connection, $tblstarting_squad)) {
                    echo("Error description $tblstarting_squad: " . mysqli_error($db_connection) . "<br />");
                }

SQL Query benched substitutes

$tblsubstitutes = 'INSERT INTO tblSubstitutes (matchID, player12ID, player13ID, player14ID, player15ID, player16ID, player17ID, player18ID, player19ID, player20ID, player21ID, player22ID, player23ID, clubID) 
                    SELECT
                        (SELECT matchID FROM tblMatch WHERE date = "' . $match_date . '" AND coach1 = "' . $match_coach_home . '" AND coach2 = "' . $match_coach_away . '"), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[11] . '" AND short = "' . $player_short[11] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[12] . '" AND short = "' . $player_short[12] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[13] . '" AND short = "' . $player_short[13] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[14] . '" AND short = "' . $player_short[14] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[15] . '" AND short = "' . $player_short[15] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[16] . '" AND short = "' . $player_short[16] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[17] . '" AND short = "' . $player_short[17] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[18] . '" AND short = "' . $player_short[18] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[19] . '" AND short = "' . $player_short[19] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[20] . '" AND short = "' . $player_short[20] . '" LIMIT 1), 
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[21] . '" AND short = "' . $player_short[21] . '" LIMIT 1),
                        (SELECT playerID FROM tblPlayer WHERE namePlayer = "' . $player_name[22] . '" AND short = "' . $player_short[22] . '" LIMIT 1), 
                        (SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '" LIMIT 1)
                    WHERE NOT EXISTS (
                        SELECT e.matchID 
                        FROM tblSubstitutes As e
                        INNER JOIN tblMatch As m
                            ON e.matchID = m.matchID
                        WHERE m.date = "' . $match_date . '" AND m.coach1 = "' . $match_coach_home . '" AND m.coach2 = "' . $match_coach_away . '" AND e.clubID = (SELECT clubID FROM tblClub WHERE nameClub = "' . $match_club[1] . '")
                    );';

if (!mysqli_query($db_connection, $tblsubstitutes)) {
                    echo("Error description $tblsubstitutes: " . mysqli_error($db_connection) . "<br />");
                }

Both queries are practically identical. They insert the playerID of 11 (respectively 12) players into the tblStartingSquad (respectively tblSubstitutes) if there's no other entry with identical data. The playerID has to be checked beforehand in the database as the raw data doesn't have the individual ID. That happenes by selecting it via namePlayer and short from the table tblPlayer.

The tables tblStartingSquad and tblSubstitutes itself contain 110,000 rows (for 55,000 matches) currently, tblPlayer is at 100,000 rows.

I've googled a bit for solutions but wasn't quite able to find anything that would improve the overall speed. What I understand as an issue is that I have to check every player individually, so I get 11 and 12 subqueries. That's not really elegant, but I really don't have an idea how to improve that. Maybe someone out here on StackOverflow has a suggestion?


Reconsider your wide table design for a long table design. Numbered suffixed columns are never ideal data storage. Rows are cheap. Columns are expensive. Joins, aggregation, searching, indexing, etc. are much easier in long format. Otherwise, your queries will be complex as you show with 12 subqueries or even self-joins!

Interestingly, your tblClub and tblPlayer are in long format but not tblStartingSquad and tblSubstitutes! Simply, remove all extraneous player columns into one where rows indicate different players:

tblStartingSquad

 ID   MatchID   PlayerID    ClubID  
  1         1          5         1
  2         1          8         1
  3         1          9         1
...

tblSubstitutes

 ID   MatchID   PlayerID    ClubID
  1         1          2         1
  2         1         16         1
  3         1          7         1
...

tblMatch (with renamed coach columns for clarity)

ID                    Date      HomeCoach          AwayCoach
 1     2006-08-19 22:00:00    Piotr Nowak   Fernando Clavijo
 2     2006-08-15 21:00:00    Piotr Nowak       Mustafa Ugur

PHP

From this database design, you can then run a simpler PHP parameterized query call and even easier with PDO, instead of mysqli for binding many parameters from arrays.

// OPEN CONNECTION
$dbconn = new PDO("mysql:host=$servername;dbname=$dbname", $username, $password);
// SET PDO ERROR MODE TO EXCEPTION
$dbconn -> setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// PREPARED STATEMENT
$sql = "INSERT INTO tblStartingSquad (`Match`, `PlayerID`, `ClubID`)
        SELECT m.MatchID, p1.PlayerID, c.ClubID
        FROM 
           (SELECT p.PlayerID
            FROM tblPlayer p
            WHERE p.namePlayer IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
           ) p1
        INNER JOIN 
           (SELECT p.PlayerID
            FROM tblPlayer p
            WHERE p.short IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
           ) p2 ON p1.PlayerID = p2.PlayerID
        CROSS JOIN
            (SELECT MatchID 
             FROM tblMatch
             WHERE `date` = ? AND HomeCoach = ? AND AwayCoach = ?) m
        CROSS JOIN
            (SELECT ClubID 
             FROM tblClub
             WHERE nameClub = ?) c
        WHERE NOT EXISTS
             (SELECT 1 FROM tblStartingSquad As e
              WHERE e.MatchID = m.matchID)"

try {
     // INITIALIZE STATEMENT
     $stmt = $dbconn->prepare($sql);

     $params = array($player_name[0], $player_name[1], $player_name[2], 
                     $player_name[3], $player_name[4], $player_name[5], 
                     $player_name[6], $player_name[7], $player_name[8], 
                     $player_name[9], $player_name[10],  
                     $player_short[0], $player_short[1], $player_short[2],
                     $player_short[3], $player_short[4], $player_short[5], 
                     $player_short[6], $player_short[7], $player_short[8], 
                     $player_short[9], $player_short[10],         
                     $match_date, $match_coach_home, $match_coach_away, $match_club);

     // ITERATIVELY BIND PARAMS
     foreach($params as $key => $val) {
        $stmt->bindParam($key+1, $val, PDO::PARAM_STR);
     }

     // EXECUTE ACTION
     $stmt->execute();

} catch (PDOException $e) {
     echo "Error: " . $e->getMessage();
}

Do a similar call for tblSubstitutes, adjusting append query's destination and WHERE clause and parameter values.

What is Complex SQL?, frequently involve heavy use of AND and OR clauses. 10 Ways to Improve SQL Query Performance. Improve SQL Query Performance. Avoid Multiple Joins in a Single Query. Eliminate Cursors from the Query. Avoid Use of Non-correlated Scalar Sub Query. Avoid Multi-statement Table Valued Functions (TVFs)


As already stated by O.Jones in the comments, it would be helpful to see more (php) code in order to judge on performance issues. Despite redesigning your database, another quick suggestion would in fact be to run a loop and use a prepared statement for the individual querys for the playerIDs. That might give you a slight performance boost.

It does seem logically to me, to do more in PHP instead of outsourcing the logics of data fetching to SQL.

Performance Tuning SQL Queries, This lesson of the SQL tutorial for data analysis covers how to conduct SQL performance tuning by Joins: If your query joins two tables in a way that substantially increases the row count of the result set, your Making joins less complicated. 25 tips to Improve SQL Query Performance. 1. Use EXISTS instead of IN to check existence of data. 2. Avoid * in SELECT statement. Give the name of columns which you require. 3. Choose appropriate Data Type. E.g. To store strings use varchar in place of text data type. Use text data type, whenever


The statements are not complex. They simply contain a lot of lookups. So make sure lookups are fast. You need the following indexes. Add them to your database if you don't have them already.

create index idx_find_player on tblplayer (nameplayer, short, playerid);
create index idx_find_club on tblclub (nameclub, clubid)
create index idx_find_match on tblmatch (date, coach1, coach2, matchid)

create index idx_find_squad1 on tblstartingsquad (matchid, clubid)
create index idx_find_squad2 on tblSsartingsquad (clubid, matchid)

create index idx_find_subs1 on tblsubstitutes (matchid, clubid)
create index idx_find_subs2 on tblsubstitutes (clubid, matchid)

I'm not sure which squad index is more likely to be used, so create both and see which the DBMS picks. You can then drop the other. Same for the substitutes Indexes.

8 Ways to Fine-Tune Your SQL Queries (for Production Databases , It's vital you optimize your queries for minimum impact on database performance. 1. Define business requirements first. We've covered best  There are different ways to improve query performance in SQL Server such as re-writing the SQL query, proper management of statistics, creation and use of indexes, etc. Hence, it is important to optimize queries to improve SQL performance.


for an accurate answer we have to see your execution plan in SQL, send it here, so I can help you with your problem

before that, I think you are doing all wrong, instead of writing one select for the single row you can simply define user-defined-table in your database and pass your values. by doing that you could have better performance in your SQL and in your server-side code and I guarantee you, you have the best performance in the first level and after that as I said before need execution-plan

  1. declare your type (don't forget to change datatype with your own)

CREATE TYPE [dbo].[com_ListOfGuid] AS TABLE(NamePlayer NVARCHAR(256) NOT NULL,Short NVARCHAR(256) NOT NULL)

  1. instead of creating select statement in your code just declare a datatable and fill data into it and pass it just like other parameters (don't forget to set you a name of column exactly like your user-defined-table-type ) 3.in your SQL code just join user-defined-table-type with your table and insert them into your target table (i give you an example and you change with your need)

    CREATE PROCEDURE Procedure_Your_Name @ObjectTable Type_Your_Name READONLY AS INSERT INTO TARGET_Table () SELECT * FROM first_Table F INNER JOIN @ObjectTable T ON F.NamePlayer = t.NamePlayer and t.Short = f.Short

More efficient SQL with query planning and optimization (article , any way you can improve the performance of your queries. Many times, especially for complex queries, there are indeed ways you can help optimize a query,  Performance is a big deal. No matter if we’re talking about applications in which users click buttons to display data or if we’re writing a query directly into let’s say SQL Server Management Studio (SSMS). Nobody likes to click a button, go get a coffee and hope the results are ready. As computers get faster […]


Query optimization techniques in SQL Server: tips and tricks, Fixing bad queries and resolving performance problems can involve SQL Server can efficiently filter a data set using indexes via the The execution plan got significantly more complex, as we are querying each table twice now, us know that there is a potentially easy fix to improve query performance. Tools for improving query performance Several performance analysis tools can help you improve SQL performance. Query and application performance analysis Performance analysis for queries and applications begins by answering some basic questions. Using EXPLAIN to understand the access path


Break Down SQL Server Queries to Improve Performance, Breaking down complex queries this way offers several advantages: Better plans – Query performance is improved because the query optimizer is able to create  Online. On Demand. Learn at your own pace by completing interactive exercises.


6 Simple Performance Tips for SQL SELECT Statements, 20% effort is likely give you an 80% performance improvement. algebra and a relational calculus to express extremely complex queries.