In OneLake, you can partition data so that queries skip the data they do not need, which improves performance.

Consider a large volume of sales data. You could partition it by year, so that each partition is stored in a subfolder named “year=2021”, “year=2022”, and so on. If a report needs only the sales data for 2024, the partitions for all other years can be skipped, which improves read performance.
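
To make the folder layout and the skipping concrete, here is a minimal sketch in plain Python. It is only an illustration and not how OneLake or Spark implement partitioning: it uses small CSV files in a temporary directory, and the helper names write_partitioned and read_partition, the file name part-0.csv, and the sample sales rows are all made up for this example.

```python
import csv
import tempfile
from pathlib import Path


def write_partitioned(root, rows, key):
    """Write rows into subfolders named like root/year=2024/part-0.csv."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    for value, group in groups.items():
        folder = Path(root) / f"{key}={value}"
        folder.mkdir(parents=True, exist_ok=True)
        # The partition column is encoded in the folder name, not in the file.
        fields = [name for name in group[0] if name != key]
        with open(folder / "part-0.csv", "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)


def read_partition(root, key, value):
    """Read a single partition; return (rows, number of folders opened)."""
    rows = []
    opened = 0
    for folder in sorted(Path(root).iterdir()):
        name, _, folder_value = folder.name.partition("=")
        if name != key or folder_value != str(value):
            continue  # skipped: no file inside this folder is opened
        opened += 1
        for file in sorted(folder.glob("*.csv")):
            with open(file, newline="") as fh:
                for row in csv.DictReader(fh):
                    row[key] = folder_value  # restore the partition column
                    rows.append(row)
    return rows, opened


# Hypothetical sample data, for illustration only.
sales = [
    {"year": 2021, "amount": 100},
    {"year": 2022, "amount": 200},
    {"year": 2024, "amount": 300},
    {"year": 2024, "amount": 50},
]

with tempfile.TemporaryDirectory() as root:
    write_partitioned(root, sales, "year")
    folders = sorted(p.name for p in Path(root).iterdir())
    rows_2024, opened = read_partition(root, "year", 2024)
```

Here the reader decides from the folder name alone whether a partition is relevant, so the files for 2021 and 2022 are never opened. A query engine applies the same idea when a filter is placed on the partition column.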

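To partition a table, add a PARTITIONED BY clause when you create it. The following example partitions a products table by category: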
CREATE TABLE partitioned_products (
    ProductID INTEGER,
    ProductName STRING,
    Category STRING,
    ListPrice DOUBLE
)
PARTITIONED BY (Category);
