sql - Cache Dynamic Query PHP + BigQuery


I have a database in Google BigQuery with millions of rows (more than 2 million new rows every day) containing user activities.

I built a PHP program on top of this database that runs many queries to show statistics of the data per day, per hour, and more.

I have two cases with two problems:

  1. I try to find user activity data for dates between 2016-11-01 and 2016-11-10, and then I need to break down the 2016-11-05 data (a subset of the first query's result). The data needs to be classified per day, per hour, per user type, and so on. Right now I use many queries to group the data in the database, plus a lot of data manipulation. For example: "SELECT date, hour, COUNT(*) AS total FROM user_activities WHERE date >= '2016-11-01' AND date <= '2016-11-10' GROUP BY date, hour", and when I need to break down the 2016-11-05 data I re-run the query as: "SELECT date, hour, COUNT(*) AS total FROM user_activities WHERE date = '2016-11-05' GROUP BY date, hour".

  2. Or I need to query the data with a different parameter, for example user activities between 2016-11-01 and 2016-11-10 that contain activity "a", and then I need to change it to activity "b". I have a column that identifies the type of activity a user performs. Right now I run the query "SELECT * FROM user_activities WHERE activities = 'a'", and when the activity type changes I run a new query: "SELECT * FROM user_activities WHERE activities = 'b'".

So my question is: because the data in the database is big, and because the insight queries the PHP program runs against it are high-frequency, the cost of data management and processing has become high. For cases 1 and 2, is there an alternative such as PHP caching (sketched below) that would make the database requests fewer?
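(To make the idea concrete: a minimal sketch of the kind of PHP-side cache meant here. It assumes a hypothetical runBigQuery() helper that executes a query and returns the rows as an array, and it caches results as JSON files keyed by a hash of the SQL.)

    <?php
    // Minimal file-based query cache. runBigQuery() is a hypothetical helper
    // that sends the SQL to BigQuery and returns the rows as an array.
    function cachedQuery(string $sql, int $ttlSeconds = 3600): array
    {
        $file = sys_get_temp_dir() . '/bqcache_' . md5($sql) . '.json';

        // Serve from the cache while the file is fresh enough.
        if (is_file($file) && time() - filemtime($file) < $ttlSeconds) {
            return json_decode(file_get_contents($file), true);
        }

        $rows = runBigQuery($sql);                    // hits BigQuery once
        file_put_contents($file, json_encode($rows)); // refresh the cache
        return $rows;
    }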

Within 1-2 days the BigQuery data requests can add up to terabytes of data. I'm afraid that is not efficient in terms of database management cost.

So far I have tried these solutions:

  1. I take the raw data from the database, cache it in PHP, and run the data manipulation manually. For example, I run "SELECT * FROM user_activities WHERE date >= '2016-11-01' AND date <= '2016-11-10'" and then try to group it by hour, by user type, or by user activity manually and sequentially in PHP functions. Because the data contains millions of rows, the process becomes long and inefficient.
  2. I take the raw data from the database, insert it into a temporary table, and manipulate the data by querying the temporary table. This also turned out to be inefficient, because inserting millions of rows takes a long time.

Do you have a suggestion for how I can optimize this problem?

  1. Implement partitioned tables, as has been recommended to you.
    If you have one single big table with 5 TB of data and no partitioning, your costs will be high.
    With partitioned tables you store the data per day, so a query reads only the days it asks for, not the whole table: a fraction of it, 10 GB or smaller, and you pay only for that.
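    For illustration, a sketch of what the case 1 queries could look like against a day-partitioned table (this assumes the table is partitioned by day; _PARTITIONTIME is the pseudo-column BigQuery exposes on partitioned tables, and the column names are taken from the question):

        <?php
        // Scans only the ten daily partitions in the range, not the whole table.
        $query = "
            SELECT date, hour, user_type, COUNT(*) AS total
            FROM user_activities
            WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2016-11-01')
                                     AND TIMESTAMP('2016-11-10')
            GROUP BY date, hour, user_type";

        // Breaking down a single day touches exactly one partition.
        $drilldown = "
            SELECT date, hour, user_type, COUNT(*) AS total
            FROM user_activities
            WHERE _PARTITIONTIME = TIMESTAMP('2016-11-05')
            GROUP BY date, hour, user_type";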

  2. You can save a query result directly to a table instead of re-importing it as you describe, and then query that smaller table for further aggregation (example below).
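    For example (a sketch; stats.daily_summary is a hypothetical dataset and table name, and CREATE TABLE ... AS SELECT requires a BigQuery version with DDL support; on older versions the same effect is achieved by setting a destination table on the query job):

        <?php
        // Materialize the aggregation once...
        $materialize = "
            CREATE TABLE stats.daily_summary AS
            SELECT date, hour, user_type, activities, COUNT(*) AS total
            FROM user_activities
            WHERE date BETWEEN '2016-11-01' AND '2016-11-10'
            GROUP BY date, hour, user_type, activities";

        // ...then run every further break-down against the small summary table.
        $drilldown = "SELECT * FROM stats.daily_summary WHERE date = '2016-11-05'";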

  3. Try not to use "SELECT *"; select only the columns you must have in the output, e.g. "SELECT date, hour, user_type".
  4. If the data is small enough and you run lots of small queries on it, you may want to take it out of BigQuery and store it in Elasticsearch or MySQL, and run your queries there (a sketch follows).
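A sketch of that last option, assuming the aggregated rows have already been exported into a local MySQL table named daily_summary (a hypothetical name), using PHP's PDO:

    <?php
    // Small, frequent break-down queries hit local MySQL instead of BigQuery.
    $pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'password');

    $stmt = $pdo->prepare(
        'SELECT hour, user_type, total FROM daily_summary WHERE date = ?');
    $stmt->execute(['2016-11-05']);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);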
