Weblog entry #88 for dkg

Error messages are your friend (postgres is good)
Posted by dkg on Tue 4 Dec 2012 at 20:32
Here is a bit of simple (yet subtly-flawed) sql, which produces different answers on different database engines:
0 dkg@pip:~$ cat test.sql
drop table if exists foo;
create table foo (x int, y int);
insert into foo VALUES (1,3);
insert into foo VALUES (1,5);
select y from foo group by x;
0 dkg@pip:~$ sqlite3 < test.sql
5
0 dkg@pip:~$ mysql -N dkg < test.sql
3
0 dkg@pip:~$ psql -qtA dkg < test.sql
ERROR:  column "foo.y" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select y from foo group by x;
               ^
0 dkg@pip:~$ 
  • Clear error reporting and
  • an insistence on explicit disambiguation
are two of the many reasons postgresql is my database engine of choice.

 

Comments on this Entry

Posted by Anonymous (85.253.xx.xx) on Tue 4 Dec 2012 at 21:56
Firebird is also ok: "Invalid expression in the select list (not contained in either an aggregate function or the GROUP BY clause)". But Postgresql error is clearer.

[ Parent | Reply to this comment ]

Posted by Anonymous (62.24.xx.xx) on Wed 5 Dec 2012 at 07:20

The sqlite version built into Android seems to check this. I am not able to find option to enable the check for upstream release though.

On the other hand, there is a case that I happen to hit rather often where the select is well defined even without aggregating the variable. It is like:

create table foo (x integer primary key, y integer);
create table bar (x integer references foo (x), z integer);
select y, min(z) from foo natural join bar group by x;

Here y does not appear in the group by and is not aggregated and still the result is well defined, because there is only one y for each x. I don't have Postgresql around to check whether it accepts this.

[ Parent | Reply to this comment ]

Posted by Anonymous (87.213.xx.xx) on Wed 5 Dec 2012 at 08:18
No database that I've worked with considers uniqueness of columns with respect to group-by columns. All databases that I've worked with require an explicit group by x,y even when y is guaranteed to be unique with respect to x.

Annoying, maybe, but I gather that determining uniqueness of column values can be quite resource-intensive in complex queries. If you really want to avoid many columns in your group-by clause, a nested select or inner join would be the way to go.

[ Parent | Reply to this comment ]

Posted by Anonymous (46.165.xx.xx) on Fri 14 Dec 2012 at 13:31
Oracle and MS SQL also complain about that.

[ Parent | Reply to this comment ]