DISTINCT and Eliminating Duplicates
Let’s say that you want to list the ID and names of only those people who have sold an antique. Obviously, you want a list where each seller is only listed once–you don’t want to know how many antiques a person sold, just the fact that this person sold one (for counts, see the Aggregate Function section below). This means that you will need to tell SQL to eliminate duplicate sales rows, and just list each person only once. To do this, use the DISTINCT keyword.
First, we will need an equijoin to the AntiqueOwners table to get the detail data of the person’s LastName and FirstName. However, keep in mind that since the SellerID column in the Antiques table is a foreign key to the AntiqueOwners table, a seller will only be listed if there is a row in the AntiqueOwners table listing the ID and names. We also want to eliminate multiple occurrences of the SellerID in our listing, so we use DISTINCT on the column where the repeats may occur (however, it is generally not necessary to strictly put the Distinct in front of the column name).
To throw in one more twist, we will also want the list alphabetized by LastName, then by FirstName (on a LastName tie). Thus, we will use the ORDER BY clause:
SELECT DISTINCT SELLERID, OWNERLASTNAME, OWNERFIRSTNAME FROM ANTIQUES, ANTIQUEOWNERS WHERE SELLERID = OWNERID ORDER BY OWNERLASTNAME, OWNERFIRSTNAME;
In this example, since everyone has sold an item, we will get a listing of all of the owners, in alphabetical order by last name. For future reference (and in case anyone asks), this type of join is considered to be in the category of inner joins.
Aliases & In Subqueries
In this section, we will talk about Aliases, In and the use of subqueries, and how these can be used in a 3-table example. First, look at this query which prints the last name of those owners who have placed an order and what the order is, only listing those orders which can be filled (that is, there is a buyer who owns that ordered item):
SELECT OWN.OWNERLASTNAME Last Name, ORD.ITEMDESIRED Item Ordered FROM ORDERS ORD, ANTIQUEOWNERS OWN WHERE ORD.OWNERID = OWN.OWNERID AND ORD.ITEMDESIRED IN (SELECT ITEM FROM ANTIQUES);
Last Name Item Ordered
There are several things to note about this query:
- First, the "Last Name" and "Item Ordered" in the Select lines gives the headers on the report.
- The OWN & ORD are aliases; these are new names for the two tables listed in the FROM clause that are used as prefixes for all dot notations of column names in the query (see above). This eliminates ambiguity, especially in the equijoin WHERE clause where both tables have the column named OwnerID, and the dot notation tells SQL that we are talking about two different OwnerID’s from the two different tables.
- Note that the Orders table is listed first in the FROM clause; this makes sure listing is done off of that table, and the AntiqueOwners table is only used for the detail information (Last Name).
- Most importantly, the AND in the WHERE clause forces the In Subquery to be invoked ("= ANY" or "= SOME" are two equivalent uses of IN). What this does is, the subquery is performed, returning all of the Items owned from the Antiques table, as there is no WHERE clause. Then, for a row from the Orders table to be listed, the ItemDesired must be in that returned list of Items owned from the Antiques table, thus listing an item only if the order can be filled from another owner. You can think of it this way: the subquery returns a set of Items from which each ItemDesired in the Orders table is compared; the In condition is true only if the ItemDesired is in that returned set from the Antiques table.
- Also notice, that in this case, that there happened to be an antique available for each one desired…obviously, that won’t always be the case. In addition, notice that when the IN, "= ANY", or "= SOME" is used, that these keywords refer to any possible row matches, not column matches…that is, you cannot put multiple columns in the subquery Select clause, in an attempt to match the column in the outer Where clause to one of multiple possible column values in the subquery; only one column can be listed in the subquery, and the possible match comes from multiple row values in that onecolumn, not vice-versa.
Whew! That’s enough on the topic of complex SELECT queries for now. Now on to other SQL statements.
I will discuss five important aggregate functions: SUM, AVG, MAX, MIN, and COUNT. They are called aggregate functions because they summarize the results of a query, rather than listing all of the rows.
- SUM () gives the total of all the rows, satisfying any conditions, of the given column, where the given column is numeric.
- AVG () gives the average of the given column.
- MAX () gives the largest figure in the given column.
- MIN () gives the smallest figure in the given column.
- COUNT(*) gives the number of rows satisfying the conditions.
Looking at the tables at the top of the document, let’s look at three examples:
SELECT SUM(SALARY), AVG(SALARY) FROM EMPLOYEESTATISTICSTABLE;
This query shows the total of all salaries in the table, and the average salary of all of the entries in the table.
SELECT MIN(BENEFITS) FROM EMPLOYEESTATISTICSTABLE WHERE POSITION = 'Manager';
This query gives the smallest figure of the Benefits column, of the employees who are Managers, which is 12500.
SELECT COUNT(*) FROM EMPLOYEESTATISTICSTABLE WHERE POSITION = 'Staff';
This query tells you how many employees have Staff status (3).
In SQL, you might (check your DBA) have access to create views for yourself. What a view does is to allow you to assign the results of a query to a new, personal table, that you can use in other queries, where this new table is given the view name in your FROM clause. When you access a view, the query that is defined in your view creation statement is performed (generally), and the results of that query look just like another table in the query that you wrote invoking the view. For example, to create a view:
CREATE VIEW ANTVIEW AS SELECT ITEMDESIRED FROM ORDERS;
Now, write a query using this view as a table, where the table is just a listing of all Items Desired from the Orders table:
SELECT SELLERID FROM ANTIQUES, ANTVIEW WHERE ITEMDESIRED = ITEM;
This query shows all SellerID’s from the Antiques table where the Item in that table happens to appear in the Antview view, which is just all of the Items Desired in the Orders table. The listing is generated by going through the Antique Items one-by-one until there’s a match with the Antview view. Views can be used to restrict database access, as well as, in this case, simplify a complex query.
GROUP BY & HAVING
One special use of GROUP BY is to associate an aggregate function (especially COUNT; counting the number of rows in each group) with groups of rows. First, assume that the Antiques table has the Price column, and each row has a value for that column. We want to see the price of the most expensive item bought by each owner. We have to tell SQL to group each owner’s purchases, and tell us the maximum purchase price:
SELECT BUYERID, MAX(PRICE) FROM ANTIQUES GROUP BY BUYERID;
Now, say we only want to see the maximum purchase price if the purchase is over $1000, so we use the HAVING clause:
SELECT BUYERID, MAX(PRICE) FROM ANTIQUES GROUP BY BUYERID HAVING PRICE > 1000;