Optimize SELECT query for working with large database

This is a part of my database:

ID  EmployeeID    Status    EffectiveDate
 1  110545        Active    2011-08-01
 2  110700        Active    2012-01-05
 3  110060        Active    2012-01-05  
 4  110222        Active    2012-06-30
 5  110545        Resigned  2012-07-01
 6  110545        Active    2013-02-12

I want to select the Active employees, producing these records:

ID  EmployeeID    Status  EffectiveDate
 2  110700        Active  2012-01-05
 3  110060        Active  2012-01-05
 4  110222        Active  2012-06-30

So, I tried this query:

SELECT *
FROM Employee AS E
WHERE E.Status = 'Active'
  AND E.EffectiveDate BETWEEN '2011-08-01' AND '2012-07-02'
  AND NOT EXISTS (SELECT *
                  FROM Employee AS E2
                  WHERE E2.EmployeeID = E.EmployeeID
                    AND E2.Status = 'Resigned'
                    AND E2.EffectiveDate BETWEEN '2011-08-01' AND '2012-07-02');

It works with a small amount of data, but I get a timeout error on a large database.

Can you help me optimize this?


This is how I read your request: You want to show active employees. For this to happen, you look at their latest entry, which is either 'Active' or 'Resigned'.

You want to restrict this to a certain time range. That probably means you want to find all employees that became active without becoming immediately inactive again within that time frame.

So, get the latest date per employee within the range first, then keep only those rows whose status is 'Active'.

select *
from employee
where (employeeid, effectivedate) in
(
  select employeeid, max(effectivedate)
  from employee
  where effectivedate between date '2011-08-01' and date '2012-07-02'
  group by employeeid
)
and status = 'Active'
order by employeeid;
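
To check the logic against the sample data from the question, here is a minimal sketch that runs the same query in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in here (an assumption, not the asker's DBMS): the dates are stored as ISO text and compared as strings instead of using date literals, and the row-value IN needs SQLite 3.15 or later.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER, employeeid INTEGER, status TEXT, effectivedate TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, 110545, "Active", "2011-08-01"),
        (2, 110700, "Active", "2012-01-05"),
        (3, 110060, "Active", "2012-01-05"),
        (4, 110222, "Active", "2012-06-30"),
        (5, 110545, "Resigned", "2012-07-01"),
        (6, 110545, "Active", "2013-02-12"),
    ],
)

# Latest row per employee inside the range, kept only if it is 'Active'.
# ISO-formatted date strings compare correctly as plain text.
latest_active = conn.execute(
    """
    SELECT id, employeeid, status, effectivedate
    FROM employee
    WHERE (employeeid, effectivedate) IN (
        SELECT employeeid, MAX(effectivedate)
        FROM employee
        WHERE effectivedate BETWEEN '2011-08-01' AND '2012-07-02'
        GROUP BY employeeid
    )
    AND status = 'Active'
    ORDER BY employeeid
    """
).fetchall()

for row in latest_active:
    print(row)
```

On this data the query returns the rows with ID 3, 4 and 2 (in employeeid order), which are the three records requested in the question. Employee 110545 drops out because their latest row in the range is 'Resigned'.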

The subquery first narrows the rows down to the time range and then finds each employee's latest date within it. I'd offer the DBMS this index:

create index idx on employee (effectivedate, employeeid);

The main query then finds that row again by employeeid and effectivedate and looks up the status. The above index can be used for this, too. We could even add the status to the index instead, to make that lookup cheaper:

create index idx on employee (effectivedate, employeeid, status);

The DBMS may use this index or not. That's up to the DBMS to decide. I find it likely that it will, for it can be used for all steps in the execution of the query and even contains all columns the query works with, so the table itself wouldn't even have to be read.
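
Whether the index is actually chosen can be checked by asking the DBMS for its query plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. SQLite is only a stand-in (an assumption); other systems expose their plans through their own tools, and their optimizers may decide differently, so the output here says nothing about the plan on the asker's database.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER, employeeid INTEGER, status TEXT, effectivedate TEXT)"
)
# The three-column index suggested above.
conn.execute(
    "CREATE INDEX idx ON employee (effectivedate, employeeid, status)"
)

# Ask SQLite how it would execute the query, without running it.
plan = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT employeeid, effectivedate, status
    FROM employee
    WHERE (employeeid, effectivedate) IN (
        SELECT employeeid, MAX(effectivedate)
        FROM employee
        WHERE effectivedate BETWEEN '2011-08-01' AND '2012-07-02'
        GROUP BY employeeid
    )
    AND status = 'Active'
    """
).fetchall()

# The last column of each plan row is a human-readable description
# of one step, including any index the step uses.
for step in plan:
    print(step[-1])
```

If the printed steps mention idx, the optimizer picked the index for that step. If they report a plain table scan instead, the index was not considered useful for this query and data.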



I have tried to achieve the above result set using a CASE expression. Hope this helps.

CREATE TABLE employee_test
(rec NUMBER,
employee_id NUMBER,
status VARCHAR2(100),
effectivedate DATE);


INSERT INTO employee_test VALUES(1,110545,'Active',TO_DATE('01-08-2011','DD-MM-YYYY'));
INSERT INTO employee_test VALUES(2,110700,'Active',TO_DATE('05-01-2012','DD-MM-YYYY'));
INSERT INTO employee_test VALUES(3,110060,'Active',TO_DATE('05-01-2012','DD-MM-YYYY'));
INSERT INTO employee_test VALUES(4,110222,'Active',TO_DATE('30-06-2012','DD-MM-YYYY'));
INSERT INTO employee_test VALUES(5,110545,'Resigned',TO_DATE('01-07-2012','DD-MM-YYYY'));
INSERT INTO employee_test VALUES(6,110545,'Active',TO_DATE('12-02-2013','DD-MM-YYYY'));
COMMIT;


SELECT *
FROM (SELECT e.*,
             CASE
               WHEN effectivedate BETWEEN TO_DATE('2011-08-01','YYYY-MM-DD')
                                      AND TO_DATE('2012-07-02','YYYY-MM-DD')
                    AND status = 'Active'
               THEN 'Y'
               ELSE 'N'
             END AS flag
      FROM employee_test e)
WHERE flag = 'Y';



I'm adding another answer with another interpretation of the request. Just in case :-)

The table shows statuses per employee. An employee can become active, then resign, then become active again. But they cannot become active twice in a row without resigning in between, of course.

We are looking at a time range and want to find all employees that became active but never resigned within it, no matter whether they became active again after resigning in that period.

This makes it easy. We are looking for employees that have exactly one row in that time range, and that row is active. One way to do this (since 'Resigned' sorts after 'Active', max(status) is 'Active' only when the range contains no 'Resigned' row):

select employeeid, any_value(effectivedate), max(status)
from employee
where effectivedate between date '2011-08-01' and date '2012-07-02'
group by employeeid
having max(status) = 'Active'
order by employeeid;

As in my other answer, an appropriate index would be

create index idx on employee (effectivedate, employeeid, status);

as we want to look into the date range and look up the statuses per employee.
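
As a quick check of this second interpretation, the sketch below runs the grouped query on the sample data from the question in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in (an assumption): it has no any_value, so min(effectivedate) is swapped in, which picks the same date whenever an employee has a single row in the range, and the dates are compared as ISO text.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER, employeeid INTEGER, status TEXT, effectivedate TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, 110545, "Active", "2011-08-01"),
        (2, 110700, "Active", "2012-01-05"),
        (3, 110060, "Active", "2012-01-05"),
        (4, 110222, "Active", "2012-06-30"),
        (5, 110545, "Resigned", "2012-07-01"),
        (6, 110545, "Active", "2013-02-12"),
    ],
)

# One group per employee inside the range. 'Resigned' sorts after
# 'Active', so MAX(status) = 'Active' means no 'Resigned' row exists
# for that employee in the range.
# MIN(effectivedate) replaces any_value, which SQLite does not provide.
never_resigned = conn.execute(
    """
    SELECT employeeid,
           MIN(effectivedate) AS effectivedate,
           MAX(status) AS status
    FROM employee
    WHERE effectivedate BETWEEN '2011-08-01' AND '2012-07-02'
    GROUP BY employeeid
    HAVING MAX(status) = 'Active'
    ORDER BY employeeid
    """
).fetchall()

for row in never_resigned:
    print(row)
```

On this data the result is employees 110060, 110222 and 110700. Employee 110545 is excluded because the 'Resigned' row of 2012-07-01 falls inside the range.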
