[[About|Andrew Boerger]] | April 20, 2025
## Introduction
Despite the introduction of Direct Lake models, Import mode remains a popular and reliable option for Power BI semantic modeling. Given that Import models aren’t going anywhere, how can we connect to our Delta tables sitting in OneLake? This post shares three options to import Lakehouse data, each with advantages, limitations, and implications for performance and flexibility.
## Option 1: `Sql.Database`
This is the default method when connecting to a Lakehouse's SQL endpoint from the OneLake catalog in Power BI Desktop. **Note:** Connecting to OneLake directly creates a Direct Lake model. If you're importing, click the chevron next to "Connect" and choose the SQL endpoint:
![[Pasted image 20250419203543.png]]
![[Pasted image 20250419203709.png]]
```m
let
    // Connect through the Lakehouse's SQL endpoint and pick a table or view
    Source = Sql.Database("<YourSQLEndpoint>", "<YourLakehouseName>"),
    Table = Source{[Schema="<YourSchema>", Item="<YourTableOrView>"]}[Data]
in
    Table
```
- Connects to the Lakehouse's **SQL endpoint**
- Select tables and views
- Vulnerable to the dreaded SQL endpoint sync delay, particularly if you're loading the model immediately after an ETL process in the Lakehouse completes.
## Option 2: `AzureStorage.DataLake` + `DeltaLake.Table`
These next two options don't have an easy GUI in Power BI Desktop. The second option comes from [Chris Webb's blog post](https://blog.crossjoin.co.uk/2023/11/29/read-data-from-delta-lake-tables-with-the-deltalake-table-m-function/) and lets us bypass the SQL endpoint entirely, querying Delta tables directly from storage.
```m
let
    // Point at the Delta table's folder in OneLake storage (no SQL endpoint involved)
    Source = AzureStorage.DataLake("https://onelake.dfs.fabric.microsoft.com/<YourWorkspaceGUID>/<YourLakehouseGUID>/Tables/<YourTableName>/", [HierarchicalNavigation=true]),
    // Interpret the folder contents as a Delta table
    ToDelta = DeltaLake.Table(Source)
in
    ToDelta
```
- **No SQL endpoint required** - we connect to storage
- Targets the Delta format tables stored in OneLake
- Requires workspace and lakehouse GUIDs
## Option 3: `Lakehouse.Contents`
This is the default method when you connect to a Lakehouse using a Dataflow Gen2. This connector provides a browsing experience through all your accessible Lakehouses, as well as all objects within them - files and tables alike.
![[Pasted image 20250420121404.png]]
```m
let
    // Browse OneLake: workspace -> lakehouse -> table
    Source = Lakehouse.Contents([]),
    Workspace = Source{[workspaceId="<YourWorkspaceGUID>"]}[Data],
    Lakehouse = Workspace{[lakehouseId="<YourLakehouseGUID>"]}[Data],
    Table = Lakehouse{[Id="<YourTableName>", ItemKind="Table"]}[Data]
in
    Table
```
You can also choose whether to use the SQL endpoint! On the Microsoft Fabric subreddit, u/CurtHagenlocher noted [an undocumented option](https://www.reddit.com/r/MicrosoftFabric/comments/1ji1f3q/comment/mji59t9/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button) for the `Lakehouse.Contents` connector: not only is this a hybrid connector, but we can control how Delta tables are imported by passing `Lakehouse.Contents([EnableFolding=false])`. With folding disabled, the SQL endpoint is not used and the Delta table is loaded directly from OneLake storage, similar to option #2.
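As a quick illustration, here's the Option 3 query from above with folding disabled (the GUIDs and table name are placeholders):
```m
let
    // EnableFolding=false bypasses the SQL endpoint and reads the Delta table from OneLake storage
    Source = Lakehouse.Contents([EnableFolding=false]),
    Workspace = Source{[workspaceId="<YourWorkspaceGUID>"]}[Data],
    Lakehouse = Workspace{[lakehouseId="<YourLakehouseGUID>"]}[Data],
    Table = Lakehouse{[Id="<YourTableName>", ItemKind="Table"]}[Data]
in
    Table
```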
- **Browse lakehouses, tables, and files**
- **Use or bypass SQL endpoint** depending on `EnableFolding`
- Most flexible method
## Performance Comparison
I ran a quick test to compare performance across the methods. This isn't a formal benchmark, just a check for relative differences. Each method imported the same table: 18.8 million rows of Power BI audit logs.
Absolute import duration will vary dramatically according to the shape, size, and cardinality of your tables.
| Method | SQL Endpoint | Rows/Size | Refresh Time | Lineage View |
| ------------------------------------------- | ------------ | ---------- | ------------ | ------------ |
| `Sql.Database` | Yes | 18.8M rows | 12:30 min | Good |
| `AzureStorage.DataLake` + `DeltaLake.Table` | No | 761MB | 8:46 min | Poor |
| `Lakehouse.Contents` | Yes | 18.8M rows | 13:54 min | Good |
| `Lakehouse.Contents([EnableFolding=false])` | No | 761MB | 8:38 min | Good |
As for capacity units, further testing is needed, but in general bypassing the SQL endpoint results in lower CU usage. In the Fabric Capacity Metrics app, the SQL-based imports incur extra compute from "SQL Endpoint Query" operations, on top of the OneLake Read operations that all methods incur.
## Other considerations
For higher-volume data, incremental refresh becomes important. When you aren't folding through the SQL endpoint, it's trickier to get a performance improvement from incremental refresh. Chris Webb has [another great blog post](https://blog.crossjoin.co.uk/2023/12/17/incremental-refresh-on-delta-tables-in-power-bi/) showing how Delta Lake partitioning can help here, at least for truly massive tables where partitioning is actually recommended.
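For reference, the usual incremental refresh setup still applies: filter the table on the `RangeStart`/`RangeEnd` datetime parameters and define the refresh policy as normal. A minimal sketch, assuming a hypothetical `CreationTime` datetime column in the audit log table; with folding enabled the filter can be pushed down to the SQL endpoint, while with `EnableFolding=false` it's applied only after the Delta table has been read:
```m
let
    Source = Lakehouse.Contents([]),
    Workspace = Source{[workspaceId="<YourWorkspaceGUID>"]}[Data],
    Lakehouse = Workspace{[lakehouseId="<YourLakehouseGUID>"]}[Data],
    AuditLog = Lakehouse{[Id="<YourTableName>", ItemKind="Table"]}[Data],
    // CreationTime is a hypothetical datetime column; RangeStart/RangeEnd are the
    // datetime parameters Power BI requires for incremental refresh
    Filtered = Table.SelectRows(AuditLog, each [CreationTime] >= RangeStart and [CreationTime] < RangeEnd)
in
    Filtered
```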
How about lineage view? Let's publish a model containing all three import methods and see what we get:
![[Pasted image 20250420113706.png]]
The `Sql.Database` option shows lineage from the SQL endpoint of the lakehouse, and the `Lakehouse.Contents` option shows lineage from the lakehouse itself. The `AzureStorage.DataLake` + `DeltaLake.Table` option doesn't provide great lineage, but that's no surprise since it connects to our lakehouse as a generic data lake storage URL.
What about [auto-binding](https://learn.microsoft.com/en-us/fabric/cicd/deployment-pipelines/understand-the-deployment-process?tabs=new-ui#autobinding)? Where supported, auto-binding is a convenient experience when using Fabric deployment pipelines. If I promote my import model from Dev to Test, the pipeline should recognize if there is a corresponding Test lakehouse and bind my import model to it automatically. This is the experience we currently have with Dataflow Gen1. Do any of these import methods auto-bind as I promote the import model through a deployment pipeline? Unfortunately not:
![[Pasted image 20250420114247.png]]
After we promote to Test, our import model stays connected to our Dev lakehouse, despite a corresponding Test lakehouse existing in the same workspace.
Giving up on auto-binding for now, we instead set up deployment rules to swap out the workspace GUID, lakehouse GUID, and SQL endpoint between environments:
![[Pasted image 20250420115508.png]]
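A related option: deployment pipelines also support parameter rules for semantic models, so the GUIDs can be surfaced as query parameters and swapped per stage that way. A minimal sketch, assuming hypothetical text parameters named `WorkspaceId` and `LakehouseId` created via Manage Parameters:
```m
let
    Source = Lakehouse.Contents([]),
    // WorkspaceId and LakehouseId are hypothetical text parameters whose values a
    // deployment pipeline parameter rule can override in Test and Prod
    Workspace = Source{[workspaceId=WorkspaceId]}[Data],
    Lakehouse = Workspace{[lakehouseId=LakehouseId]}[Data],
    Table = Lakehouse{[Id="<YourTableName>", ItemKind="Table"]}[Data]
in
    Table
```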
After applying these deployment rules, let's check our import model's lineage in Test:
![[Pasted image 20250420115836.png]]
Now we have a good result.
**Note:** Based on my testing, the `Lakehouse.Contents` option will show lineage from the Lakehouse only, even when your queries have folding enabled and are using the SQL endpoint.
## Conclusion
Of the three, **`Lakehouse.Contents`** stands out as the most flexible and powerful approach. We get a GUI browser, good lineage, and with the `EnableFolding` option we can use or bypass the SQL endpoint as needed. It's become my go-to method for incorporating lakehouse data in my import semantic models.