{"id":3217,"date":"2016-03-04T17:31:08","date_gmt":"2016-03-04T17:31:08","guid":{"rendered":"http:\/\/www.garysieling.com\/blog\/?p=3217"},"modified":"2016-03-04T17:31:08","modified_gmt":"2016-03-04T17:31:08","slug":"creating-a-pivot-table-in-postgres","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/creating-a-pivot-table-in-postgres\/","title":{"rendered":"Postgres Pivot Tables"},"content":{"rendered":"<p>Postgres doesn&#8217;t have a built-in pivot table feature, but they are relatively easy to construct.<\/p>\n<p>Let&#8217;s say you have a table of users, and a table of events, and want to make a pivot table that shows how often each event occurred.<\/p>\n<pre lang=\"sql\">\ncreate table users (name varchar);\n\ninsert into users ('gary');\ninsert into users ('melissa');\ninsert into users ('ella');\ninsert into users ('chloe');\n\ncreate table events (user_name varchar, event varchar);\ninsert \ninto events (user_name, event) \nvalues ('gary', 'event_1');\n\ninsert \ninto events (user_name, event) \nvalues ('gary', 'event_1');\n\ninsert \ninto events (user_name, event) \nvalues ('gary', 'event_2');\n\ninsert \ninto events (user_name, event) \nvalues ('melissa', 'event_2');\n\ninsert \ninto events (user_name, event) \nvalues ('ella', 'event_1');\n<\/pre>\n<p>The pivot table output should look like this:<\/p>\n<pre>\nuser     event_1  event 2\ngary     2        1\nmelissa  0        2\nella     1        0\nziti     0        0\n<\/pre>\n<p>You can get counts easily enough, but the &#8220;0&#8221; records will drop out, which is a bit of a pain:<\/p>\n<pre lang=\"sql\">\nselect \n  user_name, \n  event, \n  count(*) event_count\nfrom events\ngroup by 1, 2\n<\/pre>\n<p>What we really want is a list of all combinations of users and events, combined with their counts. Normally you don&#8217;t do things like this, but this can be obtained with a cartesian join.<\/p>\n<p>You&#8217;ll want to make sure to itemize the names (don&#8217;t use select *) &#8211; this will help with the next step.<\/p>\n<pre lang=\"sql\">\nselect users.name, event_list.event\nfrom users\njoin ( \n  select distinct event \n  from events\n) as event_list on 1=1\n<\/pre>\n<p>The above query gives us the following data:<\/p>\n<pre>\ngary\tevent_2\ngary\tevent_1\nmelissa\tevent_2\nmelissa\tevent_1\nella\tevent_2\nella\tevent_1\nchloe\tevent_2\nchloe\tevent_1\n<\/pre>\n<p>If you have a static list of events, you can also join to it directly:<\/p>\n<pre lang=\"sql\">\nselect users.name, event_list.event\nfrom users\njoin ( \n  select unnest(array['event_1', 'event_2']) event\n) as event_list on 1=1\n<\/pre>\n<p>Now that we have this, we can join it back to the count data:<\/p>\n<pre lang=\"sql\">\nselect \n  users.name, \n  event_list.event, \n  coalesce(event_count, 0) event_count\nfrom users\njoin (\n   select distinct event \n   from events\n  ) as event_list on 1=1\nleft join (\n  select \n    user_name, \n    event, \n    count(*) event_count\n  from events\n  group by 1, 2\n) counts on counts.user_name = users.name\n        and counts.event = event_list.event\n<\/pre>\n<p>This new dataset is much better, as it includes every possible cell that we would want in the pivot table:<\/p>\n<pre>\nname    event     event_count\ngary\tevent_1\t  2\ngary\tevent_2\t  1\nmelissa\tevent_1\t  0\nmelissa\tevent_2\t  1\nella\tevent_1\t  1\nella\tevent_2\t  0\nchloe\tevent_1\t  0\nchloe\tevent_2\t  0\n<\/pre>\n<p>To get these all into one row, we can wrap this in array aggregation. Note that the values in the arrays need to be sorted by the event type, or else each record will get counts in a different order.<\/p>\n<pre lang=\"sql\">\nselect \n  name, \n  array_agg(event order by event) events, \n  array_agg(event_count order by event) counts\nfrom (\n  ...\n) aggregated_date\ngroup by name\n<\/pre>\n<p>This returns:<\/p>\n<pre>\nname    events                  counts\nchloe\t{\"event 1\",\"event 2\"}\t{0,0}\nella\t{\"event 1\",\"event 2\"}\t{1,0}\ngary\t{\"event 1\",\"event 2\"}\t{2,1}\nmelissa\t{\"event 1\",\"event 2\"}\t{0,1}\n<\/pre>\n<p>Now getting pivot table data is easy through array indexing. Note that Postgres arrays start at 1 (!), and obviously this solution won&#8217;t help you if you have an indeterminite number of events.<\/p>\n<pre lang=\"sql\">\nselect \n  name, \n  counts[1] as event_1, \n  counts[2] as event_2\nfrom (\n  ...\n) pivot_data\n<\/pre>\n<pre>\nname    event_1   event_2\nchloe\t0 \t  0\nella\t1\t  0\ngary\t2\t  1\nmelissa\t0\t  1\n<\/pre>\n<p>If you&#8217;re following along at home, here is the final result:<\/p>\n<pre lang=\"sql\">\n -- Create pivot table\nselect\n  name, \n  counts[1] as event_1, \n  counts[2] as event_2\nfrom (\n  -- Combine all the counts into one row per user\n  select\n    name, \n    array_agg(event order by event) events, \n    array_agg(event_count order by event) counts\n  from (\n    -- Create counts of \"0\" for events that haven't occurred\n    select \n      users.name, \n      event_list.event, \n      coalesce(event_count, 0) event_count\n    from users\n    join (\n       -- Compute the full list of events\n       select distinct event \n       from events\n    ) as event_list on 1 = 1\n    left join (\n      -- Compute how often each event happened per user\n      select \n        user_name, \n        event, \n        count(*) as event_count\n      from events\n      group by 1, 2\n    ) counts\n       on counts.user_name = users.name\n       and counts.event = event_list.event\n  ) aggregated_date\n  group by name\n) pivot_data\norder by name\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Postgres doesn&#8217;t have a built-in pivot table feature, but they are relatively easy to construct. Let&#8217;s say you have a table of users, and a table of events, and want to make a pivot table that shows how often each event occurred. create table users (name varchar); insert into users (&#8216;gary&#8217;); insert into users (&#8216;melissa&#8217;); &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.garysieling.com\/blog\/creating-a-pivot-table-in-postgres\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Postgres Pivot Tables&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[4],"tags":[160,437,461,523],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/3217"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=3217"}],"version-history":[{"count":0,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/3217\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=3217"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=3217"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=3217"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}