postgresql - How do I filter jsonb with multiple criteria? -
i have following table structure:
create table mytable ( id serial primary key, data jsonb );
and following data (partial brevity...notice randomness of years , sales/expense yrs don't align each other):
insert mytable (data) values ('{"employee": "jim romo", "sales": [{"value": 10, "yr": "2012"}, {"value": 5, "yr": "2013"}, {"value": 40, "yr": "2014"}], "expenses": [{"value": 2, "yr": "2007"}, {"value": 1, "yr": "2013"}, {"value": 3, "yr": "2014"}], "product": "tv", "customer": "1", "updated": "20150501" }'), ('{"employee": "jim romo", "sales": [{"value": 10, "yr": "2012"}, {"value": 5, "yr": "2013"}, {"value": 41, "yr": "2014"}], "expenses": [{"value": 2, "yr": "2009"}, {"value": 3, "yr": "2013"}, {"value": 3, "yr": "2014"}], "product": "tv", "customer": "2", "updated": "20150312" }'), ('{"employee": "jim romo", "sales": [{"value": 20, "yr": "2012"}, {"value": 25, "yr": "2013"}, {"value": 33, "yr": "2014"}], "expenses": [{"value": 8, "yr": "2012"}, {"value": 12, "yr": "2014"}, {"value": 5, "yr": "2009"}], "product": "radio", "customer": "2", "updated": "20150311" }'), ('{"employee": "bill baker", "sales": [{"value": 1, "yr": "2010"}, {"value": 2, "yr": "2009"}, {"value": 3, "yr": "2014"}], "expenses": [{"value": 3, "yr": "2011"}, {"value": 1, "yr": "2012"}, {"value": 7, "yr": "2013"}], "product": "tv", "customer": "1", "updated": "20150205" }'), ('{"employee": "bill baker", "sales": [{"value": 10, "yr": "2010"}, {"value": 12, "yr": "2011"}, {"value": 3, "yr": "2014"}], "expenses": [{"value": 4, "yr": "2011"}, {"value": 7, "yr": "2009"}, {"value": 4, "yr": "2013"}], "product": "radio", "customer": "1", "updated": "20150204" }'), ('{"employee": "jim romo", "sales": [{"value": 22, "yr": "2009"}, {"value": 17, "yr": "2013"}, {"value": 35, "yr": "2014"}], "expenses": [{"value": 14, "yr": "2011"}, {"value": 13, "yr": "2014"}, {"value": 8, "yr": "2013"}], "product": "tv", "customer": "3", "updated": "20150118" }')
for each employee need evaluate most-recently updated row , find employees 2014 tv sales greater 30. there need further filter employees average tv expenses of less 5. average need take of tv expenses , not latest row.
my expected output 1 row:
employee | customer | 2014 tv sales | 2013 avg tv expenses ------------+----------+-----------------+---------------------- jim romo | 1 | 40 | 4
i can (kindof) 1 or other not both:
a. 2014 sales > 30 (but unable recent "tv" sales ;(
select * mytable (select (a->>'value')::float (select jsonb_array_elements(data->'sales') a) b @> json_object(array['yr', '2014'])::jsonb) > 30
b. avg 2013 expense (this needs avg tv expenses)
select avg((a->>'value')::numeric) (select jsonb_array_elements(data->'expenses') mytable) b @> json_object(array['yr', '2013'])::jsonb
edit: potentially large table commentary on performance , indexing needs appreciated new both postgresql , jsonb.
edit #2: i've tried both answers , neither seem efficient large table ;(
this (rather lengthy) answer problem. comments inside query should explain different pieces. basic ideas followed are: 1) keep simple each operation, try first produce correct result, optimize; 2) trasform as possible (but not much) json structure in more "relational like" structure, since relations have more powerful operators json data in postgres. of corse there space simplify query , produce more efficient version, @ least starting point.
with mytable1 -- transform table in more "relational-like" structure (just clarity) (select id, data->>'employee' employee, data->>'product' product, (data->>'updated')::integer updated, (data->>'customer')::integer customer, data->'sales' sales, data->'expenses' expenses mytable), avg_exp_for_2013_tv -- find average expenses tv in 2013 each employee (select employee, avg(expenses.value) avg2013_expenses mytable1 , jsonb_to_recordset(expenses) expenses(yr text, value float) product = 'tv' , expenses.yr = '2013' group employee), most_recent_updates_employees -- find recent updates each employee (select employee, max(updated) updated mytable1 t1 group employee), most_recent_updated_rows -- find rows recent updates (select t1.* mytable1 t1, most_recent_updates_employees m t1.employee = m.employee , t1.updated = m.updated), employees_with_2014_tv_sales_gt_30 (select employee, customer, sales.value sales_value most_recent_updated_rows m, jsonb_to_recordset(m.sales) sales(yr text, value float) yr = '2014' , value > 30) select e1.employee, e1.customer, e1.sales_value "2014 tv sales", e2.avg2013_expenses "2013 avg tv expenses" employees_with_2014_tv_sales_gt_30 e1, avg_exp_for_2013_tv e2 e1.employee = e2.employee , avg2013_expenses < 5
Comments
Post a Comment