Add One Line of SQL to Optimise Your BigQuery Tables | by Matt Chapman | Dec, 2023


Clustering: A easy method to group comparable rows and forestall pointless knowledge processing

In my earlier article, I defined learn how to optimise SQL queries utilizing partitioning:

Now, I’m writing the sequel! (Dad joke, anybody?)

This text will have a look at clustering: one other highly effective optimisation approach you should utilize in BigQuery. Like partitioning, clustering will help you write extra performant queries which might be faster and cheaper to run. If you wish to develop your SQL toolkit and construct these higher-level Information Science abilities, this can be a excellent place to start out.

In BigQuery, a clustered desk is a desk that retains comparable rows grouped collectively in bodily “blocks”.

For instance, image a desk known as user_signups that retains monitor of all of the folks registering an account on a fictitious web site. It is received 4 columns:

  • registration_date: the date on which the person created an account
  • nation: the nation the place the person is predicated
  • tier: the person’s plan (“Free” or “Paid”)
  • username: the person’s username

If we wished, we might cluster the desk by nation in order that customers from the identical nation are saved close by one another within the desk:

Leave a Reply

Your email address will not be published. Required fields are marked *