This summary is produced by the author, and not by AI.
In Power BI, you need to make a semantic model (also called a data model) to support analyses with reports or dashboards, pivot tables, and AI. One of the first decisions that you make when you connect to data in Power BI is how you will store it. By default, Power BI will import that data in memory. This means that Power BI loads a copy of the data that you query from report visuals (which generate DAX queries behind-the-scenes). This import storage mode is the simplest and most convenient way to connect to data, but it isn't the only option. Choosing between storage mode types is one of the earliest and most important decisions that you might make in a Power BI project, and it can have significant effects on your model design or functionality.
There are several different storage modes for Power BI, which are summarized broadly and in terms of capabilities in the Microsoft documentation. Since these storage modes are determined by table, you can also combine them in a single model in some scenarios, called a composite model. We discuss composite models later in this article.
The following diagram gives you a brief overview of the different storage mode types:
In this article, we describe each of these storage modes in brief and provide some tips or examples on how to best choose between and use them in a Power BI project. In short:
It is important that you consider the storage mode carefully; do not make the decision about what storage mode to use frivolously, or it will cause you a lot of pain (and potentially a lot of money) down the road. The following sections go into some more detail.
Import is the most common and simplest storage mode, preferred for the majority of scenarios. It’s supported by all data sources and file types. With an import storage mode, you get the best performance, because data is stored in memory. However, when you want to update your data model to include the latest data, then you need to perform a refresh. For larger data volumes or complexity, this refresh can become a tax for some teams, where refreshes become long or put strain on source systems. However, in many cases, this can be avoided by following recommended practices, such as:
Import storage mode is appropriate in scenarios like the following:
In general, Import should be considered the default and preferred way to connect to and store data in Power BI. Other approaches should be considered when you have an explicit reason to use them, and a benefit (over cost) for doing so.
DirectQuery is an alternate storage mode where you query data directly in a supported data source (such as an Azure SQL database, Snowflake, or Databricks). With import, visuals query in-memory data with DAX. With DirectQuery, there’s an additional step where that DAX is translated into SQL and sent to the source system.
This means that when a user first views a report that’s connected to a semantic model in DirectQuery mode, they see relatively recent data as of the last page refresh or interaction. In contrast, if the semantic model were in import, then the data would only reflect the situation as of the latest semantic model refresh. As such, DirectQuery might be appropriate for some situations where you need near-real time reporting, or higher demands for data freshness.
However, since the data isn't stored in memory, it means that queries usually take longer to complete when you use DirectQuery. That means that your reports take longer to show data, and you have a higher indirect cost of optimizing data models and calculations. You have to consider special configuration and optimizations that pertain only to DirectQuery models, some of which are exclusive to PPU, Premium, or Fabric capacities, like automatic aggregations and user-defined aggregations. Furthermore, querying your data source system can have a direct cost, so higher usage of reports can lead to higher bills. An example of this is would be higher costs from a consumption-based data warehouse such as Snowflake or Databricks.
Direct Query might be appropriate in scenarios like the following:
In general, Direct Query is considered an expensive and poorly-performing approach that is rarely needed over import. If you have higher demands for data freshness, then you might want to just consider more frequent refreshes of your data model. If you have Microsoft Fabric, then you should consider Direct Lake storage mode as an alternative to Direct Query, which provides comparable data freshness, but superior performance.
If you have Microsoft Fabric, then you can configure a semantic model to use Direct Lake storage mode. This is a more complex and technical storage mode to explain; we will only cover the basics, here. Direct Lake only works with data that you’ve stored in tables (or in table shortcuts) in OneLake, the storage layer for your data in Fabric. Specifically, if you have tables in a Fabric lakehouse, data warehouse, SQL database, or mirrored database, then you can create a Direct Lake semantic model that can query these tables directly without loading them into the semantic model.
What makes Direct Lake unique is that you experience better performance than DirectQuery, while also supporting demands for higher data freshness. Refreshes with Direct Lake involves a metadata operation called framing which updates the point-in-time reference the Delta log. Then, column data is only loaded into memory (or transcoded) as it’s needed. Again, we’re simplifying the process for this overview article; the key benefit is that you don’t deal with semantic model refresh times. Instead, all data transformation for tables happens upstream with tools like notebooks or pipelines that can do these things more efficiently.
A key difference between import, DirectQuery, and Direct Lake is that Direct Lake is only available on workspaces that use a Fabric capacity license mode. If you don’t have Microsoft Fabric, then you can’t use Direct Lake.
You will notice in the Microsoft Documentation that they refer to multiple different types of Direct Lake:
A key consideration for Direct Lake is the heightened complexity when compared to import or even DirectQuery storage modes (compounded by the existence of two different types of Direct Lake). With Direct Lake, you need to consider advanced technical concepts like the partitioning and management of Delta Tables in OneLake, which must be managed programmatically using a notebook. Query performance on Direct Lake semantic models involves many specific considerations that don’t apply to other storage modes. Furthermore, adding columns or tables requires changes to the underlying delta tables, which is more complex for the average Power BI user to do. As such, Direct Lake might be inappropriate for non-technical teams (like business teams that perform decentralized analytics) or teams early in their Power BI implementation journey that lack the technical skills to manage a Direct Lake model (and therefore the underlying Delta tables) effectively.
For instance, you should not switch to DirectQuery or Direct Lake just because you are having difficulties with the refresh of your import semantic model. First, you should try to diagnose and optimize this refresh, and understand why it’s slow. For many models, slow refreshes can be made faster with a few optimization tweaks, which is a lot less time, effort, and cost than migrating to a different (and more complex) storage mode.
However, over time, it is likely that Microsoft will introduce tools and user interface changes that streamline or reduce complexity for the average Power BI user, making Direct Lake more accessible and easier to use.
Storage modes are determined at the level of table partitions. So, it’s possible that you might want to configure some tables to be imported, while other tables from different data sources are queried directly. For instance, you could have a model that has Sales data as well as Product and Customer dimensions imported, but then has an InventorySnapshots table from a different database that uses Direct Query storage mode. A model that contains data from different data sources and which have different storage modes is called a composite model.
There is some confusion about the term “composite model”. Many people refer to composite model only as one that only combines Import and DirectQuery tables, or that only combines tables from a published Power BI model with another data source. “Composite” as a term simply just means that there are multiple storage modes in the same data model. Both examples are composite models, but a composite model could also combine Import and Direct Lake tables, DirectQuery and Direct Lake tables… irrespective of what the data source is (given certain source- or scenario-specific constraints). Composite models might be suitable in scenarios like the following:
In general, composite models are a specific solution to specific problems. Using a composite model adds complexity and creates additional considerations for model design, functionality, and performance that you wouldn’t otherwise have. Always try to use the simplest solution feasible for your scenario. For instance, you might not want to use a composite model in the scenarios like the following:
If you do plan to use composite models, prepare that you will need to spend extra time and effort to educate yourself about the consequences for how you build, manage, and distribute your model. Regarding performance, SQLBI also has a specific white paper and other free training material about this topic.
The following two sections describe some configurations that are specific to tables, and not data models as a whole.
When you change a DirectQuery table to import in a composite model, you can end up with limited relationships. Limited relationships negatively affect model performance, because joins can’t propagate to the source system. To elaborate, detailed data must be transferred from the DirectQuery source to the VertiPaq engine, then the join is done in memory on data without an index (which is slow). To avoid this, Power BI lets you configure the storage mode to dual for the partitions of certain related tables.
Note that Power BI Desktop exposes this (and storage modes in general) at the table level. Technically, however, storage modes are properties of partitions. Except for hybrid tables, all partitions in a table should share the same storage mode.
A scenario when this happens could be a composite model that has fact tables with mixed storage modes (like a Sales table that is imported, and a HistoricalSales table that is DirectQuery). In this scenario, you configure dual storage mode for your dimensions. In a nutshell, dual storage mode means that the table will use import, Direct Lake, or DirectQuery depending on the query.
In summary, you just should be aware of dual table storage mode when you’ll use composite models. Depending on your model design, they might be necessary.
Earlier, we discussed the concept of incremental refresh in an import model. With incremental refresh, you can configure a table to only refresh new or changing rows. This can speed up refresh times quite dramatically. How this works is the table is broken up into “chunks” called partitions (usually based on a date field) and only certain partitions (usually the most recent) are refreshed.
However, it is also possible to configure a table such that some partitions are imported, while the most recent one is configured to use DirectQuery. This hybrid table configuration supports scenarios where you have demands for higher data freshness, but that only pertains to the most recent period (like today or this week), while other periods are refreshed (like this month) and the other historical data are static; archived. In this way, we mitigate performance problems inherent to DirectQuery by only querying a very small amount of data that is relevant in a particular freshness window.
Realistically, you should only consider hybrid tables over traditional incremental refresh if this data freshness requirement is very real and explicit. For the vast majority of scenarios, this isn't necessary; standard incremental refresh will be sufficient.
A final note is that hybrid tables are a premium feature. This means that you cannot use hybrid tables in semantic models published to a Pro workspace. You can only use them if the workspace has Premium-Per User, Premium capacity, or Fabric capacity license modes.
Choosing a storage mode is a very important decision that you make early-on in a Power BI project. Making the wrong decision can have disastrous consequences, and it isn't uncommon to see teams and projects that chose wrongly because of assumptions or misunderstandings in how these storage modes work and what their benefits are. Realistically, import storage mode is the preferred option for the vast majority of scenarios. In some situations, you might want (or need) to use DirectQuery, Direct Lake, or composite models, but doing so means that you must spend some extra time and effort to adequately manage the potential complexity or limitations that might come with these choices.
Before you decide, it is worthwhile to conduct a small test yourself and identify what the consequences are in terms of function, performance, and cost.
Take your semantic models further with Tabular Editor.
Give Tabular Editor a spin