Modes of Connectivity in Power BI
The core functionality of any business intelligence (BI) tool is to gather data from various sources and correlate it to generate quality reports and dashboards.
Power BI offers two modes of connectivity for data: Import and Direct Query.
Import Mode: In this mode, Power BI connects to the data source and downloads the entire data into the Power BI model. The data is stored within the Power BI model, and a fresh copy can be obtained by refreshing the model. The PBIX file internally stores the model data in a compressed format. When published to the Power BI Service, the dataset model is stored on the Common Data Model, which is essentially an Azure Managed SQL Server instance in the backend.
Over the past 20 years, Information Technology has played a dominant role in business processes. The volume of data has significantly increased, leading businesses to adopt enterprise database platforms such as SQL Server, Oracle, and others. These platforms are capable of efficiently storing, retrieving, and processing large amounts of data.
2.Direct Query Mode: In this mode, Power BI connects to the data source but does not download the entire data. Instead, it generates SQL queries or their equivalent to fetch data specifically for visualization purposes. The optimization features of the underlying data source engine are utilized to retrieve the data.
Here are the key differences between Import Mode and Direct Query Mode:
Import Mode:
Storage: Data is stored within the Power BI model.
Volume: Can store up to a maximum of 1 GB of data in the model.
Performance: Fast with smaller volumes of data, but performance may degrade as the data volume increases.
Compatibility: Compatible with any type of data source.
Row Level Security (RLS): Supports RLS.
DAX & Transformations: Supports all Power Query transformations and DAX calculations.
Refresh Schedule: Users can set automatic refresh intervals based on subscriptions.
Availability: If data refresh fails, the last persisted data in the model is used to prepare visuals.
Direct Query Mode:
Storage: Does not store data within the model.
Volume: No restriction on data volume.
Performance: Fast if proper indexes are created on the database; otherwise, it may underperform.
Compatibility: Supported for Database Server type of data sources.
Row Level Security (RLS): Supports RLS, but care must be taken when using Service Accounts to connect to the database, as it impersonates the identity of the Power BI user.
DAX & Transformations: Supports only those transformations and DAX calculations for which Power BI can generate equivalent SQL queries.
Refresh Schedule: Not applicable since data is not stored in the model; data is fetched from the server when the user opens the report.
Availability: If data refresh fails due to the unavailability of the data source, the entire report goes blank as the data is not stored in the model, and there is no option to revert to the previous state.
In most scenarios, Direct Query mode outperforms Import mode due to the following reasons:
Power BI's visuals require various operations such as merging, grouping, aggregating, and filtering to prepare the underlying tabular data. Both DAX (powered by SQL Server Analysis Service) and M Script (powered by Power Query) perform ETL (Extract, Transform, Load) and visualization-related calculations. However, compared to database engines, they underperform in terms of optimized calculations. Database engines are better suited for handling data with indexing capabilities, which are lacking in DAX and Power Query. As the data size increases, the performance difference becomes more noticeable.
Import mode may outperform Direct Query mode in the following scenarios:
When the indexing feature of the database is not utilized, causing the database server to take longer to read data from the disk. In such cases, in-memory calculations performed by Power BI may work faster.
Running the database server with limited resources (e.g., low CPU or RAM) can cause the server to struggle with fetching data. Although Power BI Service runs on shared Azure resources, it may outperform the database server in this scenario.
When multiple concurrent users query the database, Power BI fires multiple SQL queries for different visuals. If there are locks on the table, it may result in longer wait times and freeze Power BI visuals.
When there is a long connection time to an on-premises server over a high-latency VPN. If the data volume is not substantial, it would be more efficient to import the data rather than using direct query. Import mode ensures a better experience with the Azure resource in the Power BI Service backend.
It's important to note that Direct Query is a live connection to the data source, but it is not real-time in the sense of a continuous streaming of data like a security or stock market ticker. When a report is opened, a fresh connection is established to the database server to fetch the latest data. However, the visualizations do not automatically update in real-time. They become static after the initial screen is loaded. Refreshing the report or reopening it will load the latest data.