This summary is produced by the author, and not by AI.
Semantic models are the backbone of your Fabric data landscape. In essence, they are databases that serve data to a variety of applications: Power BI reports, Fabric assets like notebooks and explorations, and also Excel workbooks and other applications through API endpoints. The data they serve must therefore be accurate and relevant.
In a previous article we discussed storage modes for Power BI semantic models and mentioned that the preferred default is import storage mode, where source data is periodically refreshed into memory. Those imports must then be kept up to date to ensure they are accurate and relevant. This article focuses on refreshing the data stored in the semantic model. It also touches on other stored data, such as cached query results, and on the concept of framing in Direct Lake storage mode, since framing relates to loading data in batches, much like import storage mode does. Real-time data processing deserves its own article, which we may write in the future.
We will discuss the different ways data can be refreshed at an overview level rather than in technical detail, and link to helpful resources where relevant.
For import semantic models without special refresh needs, the refresh schedules configured in the settings of published semantic models are the way to go. Scheduled refresh is a simple yet flexible mechanism that’s as close to set-and-forget as it gets for keeping imported data updated.
When to use?
Where to configure?
What are the limits?
The ‘automatic’ refresh type will recalculate tables and columns, and refresh tables and partitions based on the state of partitions. The other refresh types are clear, full, calculate, data only and defragment. They can be performed with Tabular Editor and other tools supporting XMLA or TMSL commands, such as SQL Server Management Studio.
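As an illustration of the refresh types listed above, the following minimal sketch builds a TMSL refresh command as JSON. The helper name `build_tmsl_refresh` and the database and table names are hypothetical; `clearValues` and `dataOnly` are the TMSL identifiers for what the text calls "clear" and "data only". The sketch only produces the command text, which would then be run through an XMLA-capable tool.

```python
import json

# Refresh types mentioned in the text, using their TMSL identifiers.
REFRESH_TYPES = {
    "automatic", "full", "clearValues", "calculate", "dataOnly", "defragment",
}


def build_tmsl_refresh(database, refresh_type="automatic", tables=None):
    """Return a TMSL refresh command as a dict (hypothetical helper)."""
    if refresh_type not in REFRESH_TYPES:
        raise ValueError(f"Unsupported refresh type: {refresh_type}")
    if tables:
        # Target individual tables inside the database.
        objects = [{"database": database, "table": name} for name in tables]
    else:
        # Target the whole database (the semantic model).
        objects = [{"database": database}]
    return {"refresh": {"type": refresh_type, "objects": objects}}


# "SalesModel" and "Sales" are placeholder names for this example.
command = build_tmsl_refresh("SalesModel", "dataOnly", ["Sales"])
print(json.dumps(command, indent=2))
```

A data-only refresh like this one loads data without recalculating, so it would typically be followed by a command of type `calculate`.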
What to look out for?
The feature ‘OneDrive refresh’ should not be confused with scheduled refresh, as it synchronizes the file from OneDrive but doesn't reload data from underlying sources. To refresh data from sources, scheduled or on-demand refresh is still required. An on-demand refresh will trigger both a file sync and a data refresh.
As its name suggests, a manual refresh starts whenever it’s requested by manually clicking the ‘Refresh’ button in the user interface. In Microsoft documentation, this is often referred to as on-demand refresh. On-demand refreshes can be requested manually or programmatically. These request methods have different use cases and customization options, so we consider them separately in this article.
When to use?
Where to configure?
Anyone with ‘Write’ permission to the semantic model can request an on-demand refresh on the model details page.
What are the limits?
What to look out for?
A manual refresh can’t be requested when a refresh is already ongoing. The ongoing refresh must first be cancelled before a new refresh can be requested.
Sometimes your semantic model should only refresh when a process has finished, such as when upstream ETL loads or data quality checks have completed successfully. A fixed schedule could interfere with those processes if they take longer than usual, and a manual refresh is just not scalable. In those cases, an on-demand refresh requested by an automated process through the API or XMLA endpoints ensures synchronization. These endpoints allow enhanced refresh with more customization options than scheduled or manual refreshes.
When to use?
Where to configure?
What are the limits?
What to look out for?
A refresh can’t be requested when a refresh is already ongoing. The ongoing refresh must first be cancelled before a new refresh can be requested. A request to cancel can be made through the API endpoint.
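To make the process-driven approach more concrete, here is a minimal sketch of how an automated process might prepare an enhanced refresh request and a cancel request for the Power BI REST API. The function names, the placeholder IDs and the chosen option values are assumptions for illustration, and acquiring the access token is out of scope. The requests are only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.powerbi.com/v1.0/myorg"


def build_refresh_request(group_id, dataset_id, token, tables=None):
    """Prepare a POST request that asks for an enhanced refresh."""
    body = {
        "type": "full",                 # refresh type to perform
        "commitMode": "transactional",  # commit all objects together
        "retryCount": 2,                # example value, tune to your needs
    }
    if tables:
        # Limit the refresh to specific tables instead of the whole model.
        body["objects"] = [{"table": name} for name in tables]
    return urllib.request.Request(
        url=f"{BASE_URL}/groups/{group_id}/datasets/{dataset_id}/refreshes",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_cancel_request(group_id, dataset_id, refresh_id, token):
    """Prepare a DELETE request that cancels an ongoing enhanced refresh."""
    return urllib.request.Request(
        url=f"{BASE_URL}/groups/{group_id}/datasets/{dataset_id}/refreshes/{refresh_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )


# Placeholder identifiers; replace with real workspace and model IDs.
request = build_refresh_request("workspace-id", "dataset-id", "access-token", ["Sales"])
print(request.get_method(), request.full_url)
# Sending is left out on purpose: urllib.request.urlopen(request) would submit it.
```

An orchestration tool would typically send the refresh request as the last step of the upstream load, so the model only refreshes once the data it depends on is ready.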
Source data doesn’t always change often, so it can make sense to reload only what needs reloading with a partial refresh. Incremental refresh can significantly reduce the amount of data to refresh by reloading only new or updated data. An incremental refresh policy can be configured for a table, and on refresh that table will be split into partitions organized by date granularities (i.e. years, quarters, months or days) according to the policy. Each partition can then refresh separately. On every refresh, partitions will be managed automatically, creating and loading new partitions, reloading partitions when source data has been updated, and merging historical partitions.
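The rolling-window idea behind an incremental refresh policy can be sketched as follows. This is a simplified illustration, not how the service implements it: it assumes monthly partitions only, ignores merging into quarters or years and change detection, and the function name, parameters and partition names are hypothetical.

```python
from datetime import date


def month_partitions(today, store_months, refresh_months):
    """Return (partition_name, action) pairs, oldest partition first."""
    # Count months on a single axis so crossing a year boundary is simple.
    current = today.year * 12 + (today.month - 1)
    plan = []
    for index in range(current - store_months + 1, current + 1):
        year, month = divmod(index, 12)
        name = f"{year}-{month + 1:02d}"
        # Only the most recent partitions are reloaded; older ones are kept.
        action = "refresh" if index > current - refresh_months else "keep"
        plan.append((name, action))
    return plan


# Keep six months of data and reload the two most recent months.
for name, action in month_partitions(date(2025, 2, 15), store_months=6, refresh_months=2):
    print(name, action)
```

With this policy, four of the six monthly partitions are left untouched on each refresh, which is where the reduction in refreshed data comes from.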
Partitioning tables by non-date columns is also an option. Custom partitioning allows more flexibility than the partitions automatically created by incremental refresh policies, at the cost of some more overhead to manage.
When to use?
Where to configure?
What are the limits?
What to look out for?
Query caching stores the results of queries for report landing pages so visuals can be rendered faster. This feature is available to import semantic models in Premium or Embedded capacities, and the caches will be refreshed after a refresh requested through the interface (scheduled or manual on-demand). Refreshes requested through the API or XMLA endpoints will not automatically refresh query caches.
In Direct Lake storage mode, data is loaded into the semantic model's VertiPaq engine. Just like Import mode, queries run against in-memory columns. How data gets into the VertiPaq engine is different, though: instead of the import and refresh process, Direct Lake uses column transcoding to load Parquet columns directly from OneLake into VertiPaq. Direct Lake is a deep topic we’ve covered before, but what’s relevant to the topic of data refresh is the concept of framing. Framing controls which data version the semantic model sees. A frame is essentially a pointer to a specific point-in-time snapshot based on the Delta Table transaction log, similar to how the import storage mode captures data as it existed at a specific point in time.
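The idea of a frame as a pointer to a table version can be illustrated with a toy model. This is a conceptual sketch only: the classes are hypothetical, each version stores a full copy of the rows for simplicity, and a real Delta table transaction log works differently.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaTable:
    """Toy stand-in for a Delta table: each commit adds a new version."""
    versions: list = field(default_factory=list)

    def commit(self, rows):
        self.versions.append(list(rows))
        return len(self.versions) - 1  # version number of the new commit


@dataclass
class DirectLakeModel:
    """Toy model that only sees the version its frame points to."""
    table: DeltaTable
    framed_version: int = -1  # -1 means the model has not been framed yet

    def reframe(self):
        # Move the pointer to the latest version; no data is copied.
        self.framed_version = len(self.table.versions) - 1

    def visible_rows(self):
        if self.framed_version < 0:
            return []
        return self.table.versions[self.framed_version]


table = DeltaTable()
table.commit(["order-1"])
model = DirectLakeModel(table)
model.reframe()

table.commit(["order-1", "order-2"])  # new data lands in the lake
before = model.visible_rows()         # still the earlier snapshot
model.reframe()
after = model.visible_rows()          # now the latest snapshot
print(before, after)
```

Until the frame is moved, queries keep seeing the earlier snapshot even though newer data already exists in the lake.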
There are different ways to keep data imported to Power BI up to date. Understanding them will help you choose the right approach, and in doing so help keep the data landscape efficient and fresh.
Scheduled refresh is a convenient option if there are no special requirements. If there are, process-driven refresh offers a myriad of customization options, and partial refresh can cut down on the amount of data to refresh, at the cost of some more overhead to manage. Remember that cached queries need to be refreshed too, and make use of framing in Direct Lake to select which frame of recent data is best suited to consumption.