How to SELECT the newest four items per category?

I have a database of items. Each item is categorized with a category ID from a category table. I am trying to create a page that lists every category, and underneath each category I want to show the 4 newest items in that category.

For example:

Pet Supplies

img1
img2
img3
img4

Pet Food

img1
img2
img3
img4

I know that I could easily solve this problem by querying the database for each category like so:

SELECT id FROM category

Then iterating over that data and querying the database for each category to grab the newest items:

SELECT image FROM item where category_id = :category_id 
ORDER BY date_listed DESC LIMIT 4

What I'm trying to figure out is whether I can grab all of that data with a single query. I have 33 categories, so I thought it might help reduce the number of calls to the database.

Does anyone know if this is possible? Or is 33 calls not that big a deal, in which case I should just do it the easy way?

This is the greatest-n-per-group problem, and it's a very common SQL question.

Here's how I solve it with outer joins:

SELECT i1.*
FROM item i1
LEFT OUTER JOIN item i2
  ON (i1.category_id = i2.category_id AND i1.item_id < i2.item_id)
GROUP BY i1.item_id
HAVING COUNT(*) < 4
ORDER BY category_id, date_listed;

I'm assuming the primary key of the item table is item_id, and that it's a monotonically increasing pseudokey. That is, a greater value in item_id corresponds to a newer row in item.

Here's how it works: for each item, there are some number of other items that are newer. For example, there are three items newer than the fourth newest item. There are zero items newer than the very newest item. So we want to compare each item (i1) to the set of items (i2) that are newer and have the same category as i1. If the number of those newer items is less than four, i1 is one of those we include. Otherwise, don't include it.

The beauty of this solution is that it works no matter how many categories you have, and continues working if you change the categories. It also works even if the number of items in some categories is fewer than four.
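To make the counting logic concrete, here is a minimal sketch that runs the same self-join against an in-memory SQLite database from Python. The sample rows and the extra newer_items column are made up for illustration, and SQLite stands in for MySQL here, so treat it as a demonstration of the idea rather than a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (item_id INTEGER PRIMARY KEY, category_id INTEGER, image TEXT)"
)
# Hypothetical data: six items in category 1, two in category 2.
# A higher item_id means a newer row, as assumed in the answer.
rows = [(i, 1, f"a{i}.png") for i in range(1, 7)] + [(7, 2, "b7.png"), (8, 2, "b8.png")]
conn.executemany("INSERT INTO item VALUES (?, ?, ?)", rows)

top4 = conn.execute("""
    SELECT i1.category_id, i1.item_id, COUNT(i2.item_id) AS newer_items
    FROM item i1
    LEFT OUTER JOIN item i2
      ON i1.category_id = i2.category_id AND i1.item_id < i2.item_id
    GROUP BY i1.category_id, i1.item_id
    HAVING COUNT(*) < 4
    ORDER BY i1.category_id, i1.item_id DESC
""").fetchall()

for category_id, item_id, newer_items in top4:
    print(category_id, item_id, newer_items)
```

Category 1 keeps only the four items with the highest item_id, while category 2 keeps both of its items. Note that COUNT(*) is 1 for the newest item (the single unmatched row of the outer join) and also 1 for the second newest, which is why the threshold of 4 admits exactly four rows per category.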


Another solution that works but relies on the MySQL user-variables feature:

SELECT *
FROM (
    SELECT i.*, @r := IF(@g = category_id, @r+1, 1) AS rownum, @g := category_id
    FROM (SELECT @g:=null, @r:=0) AS _init
    CROSS JOIN item i
    ORDER BY i.category_id, i.date_listed DESC
) AS t
WHERE t.rownum <= 4;

MySQL 8.0.3 introduced support for SQL standard window functions. Now we can solve this sort of problem the way other RDBMS do:

WITH numbered_item AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY item_id DESC) AS rownum
  FROM item
)
SELECT * FROM numbered_item WHERE rownum <= 4;
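As a quick way to try the window-function approach without a MySQL 8 server, the following sketch runs an equivalent query in SQLite from Python. It assumes the SQLite library bundled with your Python supports window functions (SQLite 3.25 or later), and the sample rows are invented. Ordering by item_id in descending order makes rownum 1 the newest row in each category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (item_id INTEGER PRIMARY KEY, category_id INTEGER)")
# Hypothetical data: items 1-6 in category 1, items 7-8 in category 2.
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [(i, 1) for i in range(1, 7)] + [(7, 2), (8, 2)],
)

newest = conn.execute("""
    WITH numbered_item AS (
      SELECT item_id, category_id,
             ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY item_id DESC) AS rownum
      FROM item
    )
    SELECT category_id, item_id, rownum
    FROM numbered_item
    WHERE rownum <= 4
    ORDER BY category_id, rownum
""").fetchall()

for row in newest:
    print(row)
```

Changing the number of rows per category only requires changing the constant in the outer WHERE clause.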

This solution is an adaptation from another SO solution, thank you RageZ for locating this related/similar question.

NOTE

This solution seems satisfactory for Justin's use case. Depending on your use case you may want to check Bill Karwin or David Andres' solutions in this posting. Bill's solution has my vote! See why, as I put both queries next to one another ;-)

The benefit of my solution is that it returns one record per category_id (the info from the item table is "rolled-up"). The main drawback of my solution is its lack of readability and its growing complexity as the number of desired rows grows (say, to have 6 rows per category rather than 4). It may also be slightly slower as the number of rows in the item table grows. (Regardless, all solutions will perform better with a smaller number of eligible rows in the item table, so it is advisable to periodically delete or move older items and/or to introduce a flag that helps SQL filter out rows early.)

First try (didn't work!!!)...

The problem with this approach was that the subquery would [rightfully but bad for us] produce very many rows, based on the cartesian products defined by the self joins...

SELECT id, CategoryName(?), tblFourImages.*
FROM category
JOIN (
    SELECT i1.category_id, i1.image as Image1, i2.image AS Image2, i3.image AS Image3, i4.image AS Image4
    FROM item AS i1
    LEFT JOIN item AS i2 ON i1.category_id = i2.category_id AND i1.date_listed > i2.date_listed
    LEFT JOIN item AS i3 ON i2.category_id = i3.category_id AND i2.date_listed > i3.date_listed
    LEFT JOIN item AS i4 ON i3.category_id = i4.category_id AND i3.date_listed > i4.date_listed
) AS tblFourImages ON tblFourImages.category_id = category.id
-- WHERE here_some_additional_criteria if needed
ORDER BY id ASC;

Second try. (works ok!)

A WHERE clause is added for the subquery, forcing the date listed to be the latest, second latest, third latest, etc. for i1, i2, i3, etc. respectively (and also allowing for the null cases when there are fewer than 4 items for a given category id). Also added were unrelated filter clauses to prevent showing entries that are "sold" or entries that do not have an image (added requirements).

This logic makes the assumption that there are no duplicate date listed values (for a given category_id). Such cases would otherwise create duplicate rows. Effectively this use of the date listed is that of a monotonically incremented primary key as defined/required in Bill's solution.

SELECT id, CategoryName, tblFourImages.*
FROM category
JOIN (
    SELECT i1.category_id, i1.image as Image1, i2.image AS Image2, i3.image AS Image3, i4.image AS Image4, i4.date_listed
    FROM item AS i1
    LEFT JOIN item AS i2 ON i1.category_id = i2.category_id AND i1.date_listed > i2.date_listed AND i2.sold = FALSE AND i2.image IS NOT NULL
          AND i1.sold = FALSE AND i1.image IS NOT NULL
    LEFT JOIN item AS i3 ON i2.category_id = i3.category_id AND i2.date_listed > i3.date_listed AND i3.sold = FALSE AND i3.image IS NOT NULL
    LEFT JOIN item AS i4 ON i3.category_id = i4.category_id AND i3.date_listed > i4.date_listed AND i4.sold = FALSE AND i4.image IS NOT NULL
    WHERE NOT EXISTS (SELECT * FROM item WHERE category_id = i1.category_id AND date_listed > i1.date_listed)
      AND (i2.image IS NULL OR (NOT EXISTS (SELECT * FROM item WHERE category_id = i1.category_id AND date_listed > i2.date_listed AND date_listed <> i1.date_listed)))
      AND (i3.image IS NULL OR (NOT EXISTS (SELECT * FROM item WHERE category_id = i1.category_id AND date_listed > i3.date_listed AND date_listed <> i1.date_listed AND date_listed <> i2.date_listed)))
      AND (i4.image IS NULL OR (NOT EXISTS (SELECT * FROM item WHERE category_id = i1.category_id AND date_listed > i4.date_listed AND date_listed <> i1.date_listed AND date_listed <> i2.date_listed AND date_listed <> i3.date_listed)))
) AS tblFourImages ON tblFourImages.category_id = category.id
-- WHERE  --
ORDER BY id ASC;
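The assumption about duplicate date_listed values can be illustrated with a small sketch. It uses the simpler counting self-join rather than the four-way join above, runs on SQLite from Python, and the sample rows and the newest_ids helper are hypothetical. When several items share a date, comparing on date_listed alone lets all of the tied rows through. Adding item_id as a tie-breaker restores a strict ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (item_id INTEGER PRIMARY KEY, category_id INTEGER, date_listed TEXT)"
)
# Hypothetical data: items 3-7 were all listed on the same day.
conn.executemany("INSERT INTO item VALUES (?, ?, ?)", [
    (1, 1, "2024-01-01"),
    (2, 1, "2024-01-02"),
    (3, 1, "2024-01-03"),
    (4, 1, "2024-01-03"),
    (5, 1, "2024-01-03"),
    (6, 1, "2024-01-03"),
    (7, 1, "2024-01-03"),
])

def newest_ids(newer_than):
    """Return ids of items with fewer than four 'newer' items in their category."""
    sql = f"""
        SELECT i1.item_id
        FROM item i1
        LEFT JOIN item i2
          ON i1.category_id = i2.category_id AND ({newer_than})
        GROUP BY i1.item_id
        HAVING COUNT(i2.item_id) < 4
        ORDER BY i1.item_id DESC
    """
    return [row[0] for row in conn.execute(sql)]

# Dates alone: the five tied rows all have zero newer items, so all five pass.
by_date_only = newest_ids("i1.date_listed < i2.date_listed")

# Dates with item_id as a tie-breaker: exactly four rows pass.
with_tie_break = newest_ids(
    "i1.date_listed < i2.date_listed"
    " OR (i1.date_listed = i2.date_listed AND i1.item_id < i2.item_id)"
)

print(by_date_only, with_tie_break)
```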

Now... compare the following where I introduce an item_id key and use Bill's solution to provide the list of these to the "outside" query. You can see why Bill's approach is better...

SELECT id, CategoryName, image, date_listed, item_id
FROM item I
LEFT OUTER JOIN category C ON C.id = I.category_id
WHERE I.item_id IN 
(
SELECT i1.item_id
FROM item i1
LEFT OUTER JOIN item i2
  ON (i1.category_id = i2.category_id AND i1.item_id < i2.item_id
      AND i2.sold = 'N'
      AND i2.image <> ''
      )
WHERE i1.sold = 'N'
  AND i1.image <> ''
GROUP BY i1.item_id
HAVING COUNT(*) < 4
)
ORDER BY category_id, item_id DESC

Not very pretty, but:

SELECT image 
FROM item 
WHERE date_listed IN (SELECT date_listed 
                      FROM item 
                      ORDER BY date_listed DESC LIMIT 4)

Depending on how constant your categories are, the following is the simplest route

SELECT C.CategoryName, R.Image, R.date_listed
FROM
(
    SELECT CategoryId, Image, date_listed
    FROM 
    (
      SELECT CategoryId, Image, date_listed
      FROM item
      WHERE Category = 'Pet Supplies'
      ORDER BY date_listed DESC LIMIT 4
    ) T

    UNION ALL

    SELECT CategoryId, Image, date_listed
    FROM
    (        
      SELECT CategoryId, Image, date_listed
      FROM item
      WHERE Category = 'Pet Food'
      ORDER BY date_listed DESC LIMIT 4
    ) T
) AS R
INNER JOIN Categories C ON C.CategoryId = R.CategoryId
ORDER BY C.CategoryName, R.Image, R.date_listed
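With 33 categories, writing the UNION ALL by hand gets tedious, so one option is to generate it from the list of category ids. The sketch below is only an illustration under a few assumptions: it uses the category_id column from the question rather than the CategoryId and Category names above, the category_ids list and sample rows are made up, and it runs on SQLite from Python instead of MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (item_id INTEGER PRIMARY KEY, category_id INTEGER,"
    " image TEXT, date_listed TEXT)"
)
# Hypothetical data: items 1-6 in category 1, items 7-8 in category 2,
# each listed one day after the previous one.
items = [(n, 1 if n <= 6 else 2, f"img{n}.png", f"2024-01-{n:02d}") for n in range(1, 9)]
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?)", items)

category_ids = [1, 2]  # hypothetical static list of category ids

# One branch per category; the ORDER BY ... LIMIT lives inside a derived table.
branch = """
    SELECT * FROM (
        SELECT category_id, image, date_listed
        FROM item
        WHERE category_id = ?
        ORDER BY date_listed DESC
        LIMIT 4
    )
"""
sql = " UNION ALL ".join(branch for _ in category_ids)
sql += " ORDER BY category_id, date_listed DESC"

recent = conn.execute(sql, category_ids).fetchall()
for row in recent:
    print(row)
```

The category ids are passed as bound parameters, so only the number of branches depends on the list, not the SQL text of each branch.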

Comments
  • How "static" are your categories? Is it a list that changes every now and then or is it constant?
  • the categories are very static (rarely will change). They won't ever really change unless I add a category which I don't think will happen or will be very rare
  • @justinl: if they're static, you're best off with a simple UNION statement. See my answer for an example.
  • @justinl suggested title for question: "MySql, A JOIN B: how to limit to N rows from B, for each PK from A ?"
  • FYI: If you want to constrain against other table columns you have to do so in the ON brackets, and using a WHERE just above the GROUP BY eg: ON (i2.active = TRUE) WHERE i1.active = TRUE
  • @drake, you're right about that. But for finding the top 1 per group, there is another query style that's even more efficient, because it can do the task without using GROUP BY at all. See for example my answer in stackoverflow.com/questions/121387/…
  • @drake, in my experience, any difference is very slight. You can benchmark it yourself to be sure. In general, you should use COUNT(column) for the logical reason - when you want the count to skip rows where the column is NULL. Whereas COUNT(*) counts all rows, whether the column is null or not.
  • @Davos: dev.mysql.com/doc/refman/8.0/en/…
  • @RaymondNijland, Yes, MySQL's AUTO_INCREMENT is a monotonically increasing pseudokey. Other SQL implementations use terms like SEQUENCE, IDENTITY, etc.
  • Now I get: #1054 - Unknown column 'date_listed' in 'order clause' If I remove the date_listed from the ORDER clause it does work, but it seems to not iterate over the different categories, but instead just lists out the same category over and over again
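
The comment above about constraining against other table columns can be sketched as follows. This is a minimal illustration on SQLite from Python with invented rows. The active column comes from that comment and is not part of the original schema. The condition on i2 goes in the ON clause so that only active rows count as "newer", and the condition on i1 goes in a WHERE clause so that only active rows can be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (item_id INTEGER PRIMARY KEY, category_id INTEGER, active INTEGER)"
)
# Hypothetical data: items 1-6 in one category; item 6 (the newest) is inactive.
conn.executemany(
    "INSERT INTO item VALUES (?, 1, ?)",
    [(i, 0 if i == 6 else 1) for i in range(1, 7)],
)

ids = [row[0] for row in conn.execute("""
    SELECT i1.item_id
    FROM item i1
    LEFT OUTER JOIN item i2
      ON (i1.category_id = i2.category_id AND i1.item_id < i2.item_id
          AND i2.active = 1)      -- only active rows count as "newer"
    WHERE i1.active = 1           -- only active rows can be returned
    GROUP BY i1.item_id
    HAVING COUNT(*) < 4
    ORDER BY i1.item_id DESC
""")]

print(ids)
```

The inactive item 6 is neither returned nor counted against the others, so the four newest active items come back.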