<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Bradley Schacht]]></title><description><![CDATA[Bradley Schacht is a Principal Program Manager on the Microsoft Fabric product team based in Saint Augustine, FL.]]></description><link>https://bradleyschacht.com</link><generator>RSS for Node</generator><lastBuildDate>Tue, 21 Apr 2026 17:11:55 GMT</lastBuildDate><atom:link href="https://bradleyschacht.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Azure Data Studio is dead...now what?]]></title><description><![CDATA[It was a cold, wet, Tuesday morning. In hindsight I should have known the weather was simply foreshadowing for the harsh reality that would soon confront me. I logged on, opened my web browser, and read the headline that shook me to my core.
Azure Da...]]></description><link>https://bradleyschacht.com/azure-data-studio-is-dead-now-what</link><guid isPermaLink="true">https://bradleyschacht.com/azure-data-studio-is-dead-now-what</guid><category><![CDATA[SQL]]></category><category><![CDATA[development]]></category><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Tue, 12 Aug 2025 11:00:35 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755089583956/c596987c-201a-42d1-abcf-22b28b04731c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It was a cold, wet, Tuesday morning. In hindsight I should have known the weather was simply foreshadowing for the harsh reality that would soon confront me. I logged on, opened my web browser, and read the headline that shook me to my core.</p>
<p><b><i>Azure Data Studio is dead!</i></b></p>

<p>I resisted the move to Azure Data Studio for so long. Why would I need a different tool when I have <a target="_blank" href="https://aka.ms/ssms">SQL Server Management Studio</a>? SSMS did everything I needed. It was proven. It was solid. But the future, I was told, was Azure Data Studio. After a couple of years I finally decided to jump on the ADS (Azure Data Studio) train during my annual computer rebuild. I made the decision that I would not install SSMS and would force myself to use ADS.</p>
<p>I typed up about half a blog post, realized it was really more of a complaint post about Azure Data Studio, and now I'm starting over. Where were we? Ah yes, I've been using Azure Data Studio as my primary development tool for SQL for several years now. I don't even install SSMS on my computers anymore. It's just not necessary for what I need on a daily basis. When I heard ADS was being retired in favor of the MSSQL extension for VS Code, I was sad. I have used the MSSQL extension before and it brought back many of the feelings I had in my early days of switching from SSMS to ADS. It felt unfinished, half-baked, and had more rough edges than smooth. It just wasn't ready for prime time. </p>
<p>Over the last year though there have been quite a few major updates to the extension, and I am happy to say I am at the point where VS Code is my daily driver rather than Azure Data Studio. I do feel like there are things that need some help. The result grid, for example, just feels off. It's a strange mix of missing functionality, clunky rendering, and fonts that don't match the rest of VS Code. But the core of what I need at this point checks two important boxes: available and functional. </p>
<p>Now, on to the main point....</p>
<p>How do I configure VS Code's MSSQL extension to best fit my workflow? What settings am I changing to make the environment more functional vs. the defaults? And how can I make VS Code match what I had in Azure Data Studio?</p>
<p>First, you'll need to add the <a target="_blank" href="https://learn.microsoft.com/en-us/sql/tools/visual-studio-code-extensions/mssql/mssql-extension-visual-studio-code?view=sql-server-ver17">MSSQL extension</a> to VS Code if you haven't already. Next, be aware that not every setting from Azure Data Studio has an equivalent on the VS Code side. </p>
<p>From there, let's explore my settings file.</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"editor.acceptSuggestionOnEnter"</span>: <span class="hljs-string">"off"</span>,
    <span class="hljs-attr">"editor.fontSize"</span>: <span class="hljs-number">14</span>,
    <span class="hljs-attr">"editor.minimap.enabled"</span>: <span class="hljs-literal">false</span>,
    <span class="hljs-attr">"editor.mouseWheelZoom"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">"markdown.preview.scrollEditorWithPreview"</span>: <span class="hljs-literal">false</span>,
    <span class="hljs-attr">"markdown.preview.scrollPreviewWithEditor"</span>: <span class="hljs-literal">false</span>,
    <span class="hljs-attr">"mssql.copyRemoveNewLine"</span>: <span class="hljs-literal">false</span>,
    <span class="hljs-attr">"mssql.format.alignColumnDefinitionsInColumns"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">"mssql.format.datatypeCasing"</span>: <span class="hljs-string">"uppercase"</span>,
    <span class="hljs-attr">"mssql.format.keywordCasing"</span>: <span class="hljs-string">"uppercase"</span>,
    <span class="hljs-attr">"mssql.openQueryResultsInTabByDefaultDoNotShowPrompt"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">"mssql.persistQueryResultTabs"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">"powershell.integratedConsole.focusConsoleOnExecute"</span>: <span class="hljs-literal">false</span>,
    <span class="hljs-attr">"security.workspace.trust.untrustedFiles"</span>: <span class="hljs-string">"open"</span>,
    <span class="hljs-attr">"workbench.colorTheme"</span>: <span class="hljs-string">"Default Light Modern"</span>,
    <span class="hljs-attr">"workbench.editor.pinnedTabsOnSeparateRow"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">"workbench.editor.untitled.labelFormat"</span>: <span class="hljs-string">"name"</span>,
    <span class="hljs-attr">"workbench.startupEditor"</span>: <span class="hljs-string">"none"</span>,
    <span class="hljs-attr">"workbench.tree.indent"</span>: <span class="hljs-number">20</span>,
    <span class="hljs-attr">"workbench.tree.renderIndentGuides"</span>: <span class="hljs-string">"onHover"</span>,
    <span class="hljs-attr">"mssql.connectionGroups"</span>: [
        <span class="hljs-comment">// I have root, Azure SQL Database, and Microsoft Fabric</span>
    ],
    <span class="hljs-attr">"mssql.connections"</span>: [
        <span class="hljs-comment">// You know I can't give you this information!</span>
    ]
}
</code></pre>
<p>Of course, your settings will vary from these, but the thing I like about Azure Data Studio and VS Code is that everyone can shape the editor into nearly their ideal environment. For me, I don't really care for IntelliSense, so turning off accept suggestions on Enter lets me see suggestions when I want them, but doesn't drop random items into the editor when I really just want to go to a new line in my code. Here are some thoughts on these settings. </p>
<ul>
<li>Accept Suggestions on Enter: See above.</li>
<li>Font Size: 14 works for me well across all my computers as I sync these settings between my Surface Laptop and my desktop computer. </li>
<li>Minimap: I don't really find a lot of value in this. You can't really see much of anything in it. It's kind of nice when you are searching to see the highlights a little easier, but generally speaking I find it gets in my way.</li>
<li>Mouse Wheel Zoom: I like to zoom with my mouse wheel. Enough said. </li>
<li>Markdown Preview Scroll Editor/Preview: I write my blog posts, GitHub pages, training docs, and Microsoft docs contributions in markdown. I just don't find the scroll sync works very well; it becomes more of a distraction than a help.</li>
<li>SQL Copy Remove New Line: I want the line feeds to stay in the text I copy.</li>
<li>SQL Format Options: Just some preference for how I like my SQL code to be formatted. </li>
<li>SQL Persist Query Results Tab: When I switch between queries and come back, I want to pick up right where I left off rather than needing to scroll back.</li>
<li>PowerShell focus console on execute: I run a script and I want to be able to make changes. Stop moving me someplace else!</li>
<li>Untrusted files: Just let me open my files!</li>
<li>Color Theme: I like light mode. Don't talk to me about it. Get out of here with your dark mode...unless it's late at night and I'm in my office where it's dark and the light hurts my eyes. But that's temporary, LIGHT MODE FOREVER!!</li>
<li>Pinned tabs on separate row: Just makes it easier to organize the tabs. I go back and forth on this one. </li>
<li>Tab label format: I like to keep a tidy tab. Just the name please.</li>
<li>Startup editor: Nothing. Give me a blank canvas.</li>
<li>Tree indent: I like a little more separation between my indentation levels.</li>
<li>Tree indent guides: Like my tabs, I like to keep my tree clean. Just show me the guides when I move the mouse over and need to do some navigating. </li>
</ul>
<p>I've played around with keyboard shortcuts as well; perhaps I'll go into those details in another post. The short version is there are some keyboard shortcuts I can't live without (CTRL + E to run a query), but there are also habits I'm trying to rewire given that I use VS Code for more than just SQL development (replacing CTRL + R to hide the results with CTRL + J to toggle the panel).</p>
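<p>For reference, rebinding the run-query shortcut happens in keybindings.json (open it from the Command Palette with "Preferences: Open Keyboard Shortcuts (JSON)"). Here is a minimal sketch; the command ID below is my assumption of the extension's execute command, so confirm the exact ID in the Keyboard Shortcuts editor before relying on it.</p>
<pre><code class="lang-json">[
    {
        // Run the current query with CTRL + E (command ID assumed; verify in the Keyboard Shortcuts editor)
        "key": "ctrl+e",
        "command": "mssql.runQuery"
    }
]
</code></pre>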
<p>There are a few more settings and keyboard shortcuts that I had in Azure Data Studio that don't seem to have an equal setting in VS Code or may not be relevant because of slight differences in how VS Code is set up. But that's how I have things set up in a nutshell. </p>
<p>What are you doing for development these days? </p>
<p>Are you firmly in camp Azure Data Studio for as long as it will run, SSMS forever, already loving VS Code, forcing yourself down a path to get with the times, or perhaps running a little bit of everything at different times?</p>
]]></content:encoded></item><item><title><![CDATA[Working with Fabric shortcuts in PySpark]]></title><description><![CDATA[Shortcuts are an important part of the Fabric story. They allow you to access data in storage (OneLake, ADLS Gen2, AWS S3, etc.) without having to copy the data into Fabric. They have many uses including:

Creating a hub and spoke architecture (OneLa...]]></description><link>https://bradleyschacht.com/working-with-fabric-shortcuts-in-pyspark</link><guid isPermaLink="true">https://bradleyschacht.com/working-with-fabric-shortcuts-in-pyspark</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Wed, 19 Jun 2024 11:30:55 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1718743366309/6bfa35e4-57bc-47e6-9a49-4dd439146d3e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts">Shortcuts</a> are an important part of the Fabric story. They allow you to access data in storage (OneLake, ADLS Gen2, AWS S3, etc.) without having to copy the data into Fabric. They have many uses including:</p>
<ul>
<li><p>Creating a hub and spoke architecture (OneLake shortcut)</p>
</li>
<li><p>Leveraging an existing data lake (ADLS Gen2 shortcut)</p>
</li>
<li><p>Accessing data prepared in another compute engine (ADLS Gen2 shortcut)</p>
</li>
<li><p>Using data from other clouds for analysis (AWS S3 shortcut)</p>
</li>
<li><p>Connecting to on-prem data (S3 compatible shortcut with the on-prem data gateway)</p>
</li>
</ul>
<p>There are two ways to create shortcuts:</p>
<ul>
<li><p>The Fabric UX</p>
</li>
<li><p>Fabric APIs</p>
</li>
</ul>
<p>Today, the focus is on the APIs.</p>
<p>I often build and tear down demos or testing environments on a regular basis, which means I like to have code-based configuration options. Many customers want to do similar things or need to create dozens of shortcuts and want to reduce the click tax of the UX.</p>
<p>Like many things in Python, this is something I have to go back and look up how to do multiple times. I'm a SQL person after all. When I can build a way to avoid writing extra code or going back to the docs, I'm going to do it. That's exactly what I've done with the shortcut maintenance.</p>
<p>Here is the function I've built. It is just a wrapper for the Get, Create, and Delete APIs that lets me pass the relevant items without rebuilding the request URLs each time, and it returns the status and any errors so I don't have to capture those each time. We'll discuss the parameters for this function shortly.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> requests <span class="hljs-keyword">import</span> status_codes
<span class="hljs-keyword">import</span> sempy.fabric <span class="hljs-keyword">as</span> fabric
<span class="hljs-keyword">import</span> time

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fn_shortcut</span>(<span class="hljs-params">action, shortcut_path, shortcut_name, target = None</span>):</span>

    request_headers = {
        <span class="hljs-string">"Authorization"</span>: <span class="hljs-string">"Bearer "</span> + mssparkutils.credentials.getToken(<span class="hljs-string">"pbi"</span>),
        <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span>
    }

    request_body = {
        <span class="hljs-string">"path"</span>: shortcut_path,
        <span class="hljs-string">"name"</span>: shortcut_name,
        <span class="hljs-string">"target"</span>: target
    }

    lakehouse_id = fabric.get_lakehouse_id()
    workspace_id = fabric.get_workspace_id()

    <span class="hljs-comment"># Get a shortcut</span>
    <span class="hljs-keyword">if</span> action == <span class="hljs-string">'Get'</span>:
        response = requests.request(method = <span class="hljs-string">"GET"</span>, url = <span class="hljs-string">f'https://api.fabric.microsoft.com/v1/workspaces/<span class="hljs-subst">{workspace_id}</span>/items/<span class="hljs-subst">{lakehouse_id}</span>/shortcuts/<span class="hljs-subst">{shortcut_path}</span>/<span class="hljs-subst">{shortcut_name}</span>'</span>, headers = request_headers)

    <span class="hljs-comment"># Create a shortcut</span>
    <span class="hljs-keyword">if</span> action == <span class="hljs-string">'Create'</span>:
        response = requests.request(method = <span class="hljs-string">"POST"</span>, url = <span class="hljs-string">f'https://api.fabric.microsoft.com/v1/workspaces/<span class="hljs-subst">{workspace_id}</span>/items/<span class="hljs-subst">{lakehouse_id}</span>/shortcuts?shortcutConflictPolicy=Abort'</span>, headers = request_headers, json = request_body)

    <span class="hljs-comment"># Delete a shortcut</span>
    <span class="hljs-keyword">if</span> action == <span class="hljs-string">'Delete'</span>:
        response = requests.request(method = <span class="hljs-string">"DELETE"</span>, url = <span class="hljs-string">f'https://api.fabric.microsoft.com/v1/workspaces/<span class="hljs-subst">{workspace_id}</span>/items/<span class="hljs-subst">{lakehouse_id}</span>/shortcuts/<span class="hljs-subst">{shortcut_path}</span>/<span class="hljs-subst">{shortcut_name}</span>'</span>, headers = request_headers)

        <span class="hljs-keyword">if</span> response.status_code == <span class="hljs-number">200</span>:
            <span class="hljs-comment"># Wait for the delete operation to fully propogate</span>
            <span class="hljs-keyword">while</span> mssparkutils.fs.exists(<span class="hljs-string">f'<span class="hljs-subst">{shortcut_path}</span>/<span class="hljs-subst">{shortcut_name}</span>'</span>):
                time.sleep(<span class="hljs-number">5</span>)

    <span class="hljs-comment"># Build the return payload for a success response</span>
    <span class="hljs-keyword">if</span> (response.status_code &gt;= <span class="hljs-number">200</span> <span class="hljs-keyword">and</span> response.status_code &lt;= <span class="hljs-number">299</span>):
        response_content = {
            <span class="hljs-string">"request_url"</span>           : response.url,
            <span class="hljs-string">"response_content"</span>      : {} <span class="hljs-keyword">if</span> response.text == <span class="hljs-string">''</span> <span class="hljs-keyword">else</span> json.loads(response.text),
            <span class="hljs-string">"status"</span>                : <span class="hljs-string">"success"</span>,
            <span class="hljs-string">"status_code"</span>           : response.status_code,
            <span class="hljs-string">"status_description"</span>    : status_codes._codes[response.status_code][<span class="hljs-number">0</span>]
            }

    <span class="hljs-comment"># Build the return payload for a failure response</span>
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> (response.status_code &gt;= <span class="hljs-number">200</span> <span class="hljs-keyword">and</span> response.status_code &lt;= <span class="hljs-number">299</span>):
        response_content = {
            <span class="hljs-string">"request_body"</span>          : request_body,
            <span class="hljs-string">"request_headers"</span>       : request_headers,
            <span class="hljs-string">"request_url"</span>           : response.url,
            <span class="hljs-string">"response_text"</span>         : json.loads(response.text),
            <span class="hljs-string">"status"</span>                : <span class="hljs-string">"error"</span>,
            <span class="hljs-string">"status_code"</span>           : response.status_code,
            <span class="hljs-string">"status_description"</span>    : status_codes._codes[response.status_code][<span class="hljs-number">0</span>]
        }

    <span class="hljs-keyword">return</span> response_content
</code></pre>
<p>Before we look at how I use this function, let's understand the properties for each API call so you have a reference point for the function's parameters. You will need to attach the notebook to a lakehouse and gather a little bit of information about your shortcut depending on what you are trying to accomplish (Get, Create, or Delete).</p>
<h1 id="heading-information-for-all-api-calls">Information for all API calls</h1>
<p>All the API calls need a few pieces of information for assembling the various URIs and for authentication, while Create Shortcut needs a few additional pieces of information. For each API call you will need:</p>
<ul>
<li><p>Lakehouse ID (The script will get this for you. Remember to attach a lakehouse.)</p>
</li>
<li><p>Workspace ID (The script will get this for you.)</p>
</li>
<li><p>Header which passes the bearer token (The script will get this for you.)</p>
</li>
</ul>
<h2 id="heading-get-shortcut">Get shortcut</h2>
<p>The <a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/core/onelake-shortcuts/get-shortcut?tabs=HTTP">Get Shortcut API</a> returns the properties of the shortcut. Different values may be returned for OneLake vs. external shortcuts. You need:</p>
<ul>
<li><p>Shortcut Path (The relative location in the lakehouse like Files/Folder)</p>
</li>
<li><p>Shortcut Name (The name of the shortcut like MyShortcut in the path Files/Folders/MyShortcut)</p>
</li>
</ul>
<h2 id="heading-delete-shortcut">Delete shortcut</h2>
<p>The <a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/core/onelake-shortcuts/delete-shortcut?tabs=HTTP">Delete Shortcut API</a> deletes an existing shortcut. You probably could have guessed that. For this you need the same properties as Get and, just like using Get, the Lakehouse and Workspace IDs will be captured for you.</p>
<h2 id="heading-create-shortcut">Create shortcut</h2>
<p>The <a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/core/onelake-shortcuts/create-shortcut?tabs=HTTP">Create Shortcut API</a> is where things get a little more complicated because we need to pass a request body which is slightly different for each type of shortcut that is created. The reference for those can be found in the <a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/core/onelake-shortcuts/create-shortcut?tabs=HTTP#target">Create Shortcut - Target</a> docs. Because I mainly use ADLS Gen2 and OneLake shortcuts, I will include those here. Check out the docs for the others.</p>
<h3 id="heading-adls-gen2">ADLS Gen2</h3>
<p>There are three pieces of information to replace in the code below:</p>
<ul>
<li><p>Location is the storage account URI</p>
</li>
<li><p>Subpath is the directory to which the shortcut will point, including the container name</p>
</li>
<li><p>Connection ID is the Fabric connection ID to the storage account which includes how you will be authenticating to the storage account (Click the <strong>gear icon</strong> in the top right corner of the screen. Select <strong>Manage connections and gateways.</strong> View the settings for the connection from the list. Locate the <strong>Connection ID</strong>.)</p>
</li>
</ul>
<pre><code class="lang-python">target = {
    <span class="hljs-string">"adlsGen2"</span>: {
        <span class="hljs-string">"location"</span>: <span class="hljs-string">"https://StorageAccountNameHere.dfs.core.windows.net"</span>,
        <span class="hljs-string">"subpath"</span>: <span class="hljs-string">"/container/folder"</span>,
        <span class="hljs-string">"connectionId"</span>: <span class="hljs-string">"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"</span>
    }
}
</code></pre>
<h3 id="heading-onelake">OneLake</h3>
<p>OneLake is a bit more straightforward because there is no connection involved. Just provide the workspace id, lakehouse id, and folder path where the data resides. The path can be to the Files or Tables section.</p>
<pre><code class="lang-python">target = {
    <span class="hljs-string">"OneLake"</span>: {
        <span class="hljs-string">"workspaceId"</span>: workspace_id,
        <span class="hljs-string">"itemId"</span>: lakehouse_id,
        <span class="hljs-string">"path"</span>: <span class="hljs-string">"Files/SomeFolder/SubFolder'
            }
        }</span>
</code></pre>
<h1 id="heading-bringing-it-all-together">Bringing it all together</h1>
<p>Now that we know what we need to provide for getting, deleting, and creating shortcuts, let's circle back to our Python function. For my example, I am using a lakehouse that contains some IMDB data.</p>
<p>You will recall that our function is doing a bit of work for us, so we don't need to provide every single piece of information. The function has three required parameters and one optional parameter.</p>
<ul>
<li><p><strong>action</strong> is required and accepts Get, Create, and Delete to indicate the operation you want to complete.</p>
</li>
<li><p><strong>shortcut_path</strong> is required and should be the path up to, but not including the name of the shortcut. Where we have a shortcut at Files/MyFolder/MyShortcut this parameter would be <strong>Files/MyFolder</strong>.</p>
</li>
<li><p><strong>shortcut_name</strong> is required and should be the name of the shortcut itself. Where we have a shortcut at Files/MyFolder/MyShortcut this parameter would be <strong>MyShortcut</strong>.</p>
</li>
<li><p><strong>target</strong> is optional and only needs to be passed when creating a shortcut. This is the payload that describes where the shortcut points and how to authenticate as shown in the prior two sections for ADLS Gen2 and OneLake.</p>
</li>
</ul>
<p>Let's show the function in action!</p>
<h2 id="heading-get-shortcut-1">Get shortcut</h2>
<p>Getting shortcut information is super simple. Just pass the Get action, path, and name.</p>
<pre><code class="lang-python">fn_shortcut(<span class="hljs-string">"Get"</span>, <span class="hljs-string">"Files"</span>, <span class="hljs-string">"IMDB"</span>)
<span class="hljs-comment"># Or make the output pretty with this command</span>
<span class="hljs-comment"># print(json.dumps(fn_shortcut("Get", "Files", "IMDB"), indent = 4))</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718743282875/0f2f4dde-a541-4bf1-8237-7ff2c502107b.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-delete-shortcut-1">Delete shortcut</h2>
<p>Deleting a shortcut is just as easy. This command does take a little bit longer to run because I have some code that checks to make sure the background cleanup operations complete before returning. You can remove that from the function if you don't want it.</p>
<pre><code class="lang-python">fn_shortcut(<span class="hljs-string">"Delete"</span>, <span class="hljs-string">"Files"</span>, <span class="hljs-string">"IMDB"</span>)
<span class="hljs-comment"># Or make the output pretty with this command</span>
<span class="hljs-comment"># print(json.dumps(fn_shortcut("Delete", "Files", "IMDB"), indent = 4))</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718743255020/2b5d1f06-d6b1-4f8c-9ade-91315f3ea3fc.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-create-shortcut-1">Create shortcut</h2>
<p>For creating shortcuts, don't forget about the additional parameter that defines the shortcut's target. I will include two examples, one for ADLS Gen2 and one for OneLake.</p>
<pre><code class="lang-python">target = {
    <span class="hljs-string">"adlsGen2"</span>: {
        <span class="hljs-string">"location"</span>: <span class="hljs-string">f'https://scbradlstorage01.dfs.core.windows.net'</span>,
        <span class="hljs-string">"subpath"</span>: <span class="hljs-string">"/sampledata/IMDB"</span>,
        <span class="hljs-string">"connectionId"</span>: <span class="hljs-string">'dd8ae71c-dcd7-4550-88cd-caba775f786b'</span>
    }
}

fn_shortcut(action = <span class="hljs-string">'Create'</span>, shortcut_path =<span class="hljs-string">'Files'</span>, shortcut_name = <span class="hljs-string">'IMDB'</span>, target = target)
<span class="hljs-comment"># Or make the output pretty with this command</span>
<span class="hljs-comment"># print(json.dumps(fn_shortcut(action = 'Create', shortcut_path ='Files', shortcut_name = 'IMDB', target = target), indent = 4))</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718743244289/d81ec1db-e7a6-48e9-961b-a06c3e0ac2be.png" alt class="image--center mx-auto" /></p>
<pre><code class="lang-python">target = {
    <span class="hljs-string">"OneLake"</span>: {
        <span class="hljs-string">"workspaceId"</span>: <span class="hljs-string">"fdddc13a-f994-4b62-85ed-bbf7633ede24"</span>,
        <span class="hljs-string">"itemId"</span>: <span class="hljs-string">"c57d0e82-b84a-4844-884c-1d6a629adc2a"</span>,
        <span class="hljs-string">"path"</span>: <span class="hljs-string">"Files/IMDB/CastCrew"</span>
    }
}

fn_shortcut(action = <span class="hljs-string">'Create'</span>, shortcut_path =<span class="hljs-string">'Tables'</span>, shortcut_name = <span class="hljs-string">'CastCrew'</span>, target = target)
<span class="hljs-comment"># Or make the output pretty with this command</span>
<span class="hljs-comment"># print(json.dumps(fn_shortcut(action = 'Create', shortcut_path ='Tables', shortcut_name = 'CastCrew', target = target), indent = 4))</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718743230673/ebcfe685-547a-41ad-ac02-74680da45a36.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-error-handling">Error handling</h2>
<p>There is a little bit of error handling in the function as well. If a shortcut exists and you try to create it, or if you attempt to delete or get a shortcut that doesn't exist, the errors will be captured and returned with all the relevant information. For example, if I rerun the previous command it will try to create a shortcut that already exists. Parsing out the error message, we can see that Copy, Rename, or Update of shortcuts is not supported by OneLake, which indicates the shortcut is already there, so I can't modify it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718743216890/81d25732-f43e-4729-aee9-34aa78805d90.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-wrapping-it-up">Wrapping it up</h1>
<p>There you have it. A single Python function that will allow you to reuse the Fabric Shortcut APIs.</p>
<p>What do you think?</p>
]]></content:encoded></item><item><title><![CDATA[Gathering useful notebook and environment details at runtime]]></title><description><![CDATA[I was recently working on some tests where I needed to gather some Spark configuration information at runtime to validate that the correct environment was being used (number of executors in my case) and to log some execution details so I could look u...]]></description><link>https://bradleyschacht.com/gathering-useful-notebook-and-environment-details-at-runtime</link><guid isPermaLink="true">https://bradleyschacht.com/gathering-useful-notebook-and-environment-details-at-runtime</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Tue, 11 Jun 2024 04:00:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1718137871785/81abef99-66d7-4e79-82f3-fa2c014a3dfe.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I was recently working on some tests where I needed to gather some Spark configuration information at runtime to validate that the correct environment was being used (number of executors in my case) and to log some execution details so I could look up usage in the Fabric Capacity Metrics app (the spark app name is key).</p>
<p>After extensive digging I was able to isolate a few different items that I felt were useful for a variety of use cases. Some of these use the <a target="_blank" href="https://learn.microsoft.com/en-us/python/api/semantic-link/overview-semantic-link?view=semantic-link-python">Semantic Link</a> package but the most important ones for what I needed just look at the Spark configuration and require no additional packages.</p>
<p>Here is what I collected from the Spark configuration:</p>
<ul>
<li><p>Executor Cores</p>
</li>
<li><p>Executor Memory</p>
</li>
<li><p>Minimum Number of Executors</p>
</li>
<li><p>Maximum Number of Executors</p>
</li>
<li><p>Number of Nodes in Spark Pool</p>
</li>
<li><p>Spark App Name</p>
</li>
</ul>
<p>Here is what I collected from Semantic Link:</p>
<ul>
<li><p>Default Lakehouse ID</p>
</li>
<li><p>Default Lakehouse Name</p>
</li>
<li><p>Notebook ID</p>
</li>
<li><p>Notebook Name</p>
</li>
<li><p>Workspace ID</p>
</li>
<li><p>Workspace Name</p>
</li>
</ul>
<pre><code class="lang-python">%pip install semantic-link

default_lakehouse_id    = <span class="hljs-string">'No default lakehouse'</span> <span class="hljs-keyword">if</span> fabric.get_lakehouse_id() == <span class="hljs-string">''</span> <span class="hljs-keyword">else</span> fabric.get_lakehouse_id()
default_lakehouse_name  = <span class="hljs-string">'No default lakehouse'</span> <span class="hljs-keyword">if</span> fabric.get_lakehouse_id() == <span class="hljs-string">''</span> <span class="hljs-keyword">else</span> fabric.resolve_item_name(default_lakehouse_id)
notebook_item_id        = fabric.get_artifact_id()
notebook_item_name      = fabric.resolve_item_name(notebook_item_id)
pool_executor_cores     = spark.sparkContext.getConf().get(<span class="hljs-string">"spark.executor.cores"</span>)
pool_executor_memory    = spark.sparkContext.getConf().get(<span class="hljs-string">"spark.executor.memory"</span>)
pool_min_executors      = spark.sparkContext.getConf().get(<span class="hljs-string">"spark.dynamicAllocation.minExecutors"</span>)
pool_max_executors      = spark.sparkContext.getConf().get(<span class="hljs-string">"spark.dynamicAllocation.maxExecutors"</span>)
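<span class="hljs-comment"># Node count: the executor memory status comes back as a Scala Set(...) string, so strip the wrapper and count the entries.</span>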
pool_number_of_nodes    = len(str(sc._jsc.sc().getExecutorMemoryStatus().keys()).replace(<span class="hljs-string">"Set("</span>,<span class="hljs-string">""</span>).replace(<span class="hljs-string">")"</span>,<span class="hljs-string">""</span>).split(<span class="hljs-string">", "</span>))
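<span class="hljs-comment"># App name: keep only the text after the last underscore by reversing the string, splitting once, and reversing back.</span>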
spark_app_name          = spark.sparkContext.getConf().get(<span class="hljs-string">"spark.app.name"</span>)[::<span class="hljs-number">-1</span>].split(<span class="hljs-string">"_"</span>,<span class="hljs-number">1</span>)[<span class="hljs-number">0</span>][::<span class="hljs-number">-1</span>]
workspace_id            = fabric.get_notebook_workspace_id()
workspace_name          = fabric.resolve_workspace_name(workspace_id)

print(<span class="hljs-string">f'default_lakehouse_id:   <span class="hljs-subst">{default_lakehouse_id}</span>'</span>)
print(<span class="hljs-string">f'default_lakehouse_name: <span class="hljs-subst">{default_lakehouse_name}</span>'</span>)
print(<span class="hljs-string">f'notebook_item_id:       <span class="hljs-subst">{notebook_item_id}</span>'</span>)
print(<span class="hljs-string">f'notebook_item_name:     <span class="hljs-subst">{notebook_item_name}</span>'</span>)
print(<span class="hljs-string">f'spark_app_name:         <span class="hljs-subst">{spark_app_name}</span>'</span>)
print(<span class="hljs-string">f'pool_executor_cores:    <span class="hljs-subst">{pool_executor_cores}</span>'</span>)
print(<span class="hljs-string">f'pool_executor_memory:   <span class="hljs-subst">{pool_executor_memory}</span>'</span>)
print(<span class="hljs-string">f'pool_min_executors:     <span class="hljs-subst">{pool_min_executors}</span>'</span>)
print(<span class="hljs-string">f'pool_max_executors:     <span class="hljs-subst">{pool_max_executors}</span>'</span>)
print(<span class="hljs-string">f'pool_number_of_nodes:   <span class="hljs-subst">{pool_number_of_nodes}</span>'</span>)
print(<span class="hljs-string">f'workspace_id:           <span class="hljs-subst">{workspace_id}</span>'</span>)
print(<span class="hljs-string">f'workspace_name:         <span class="hljs-subst">{workspace_name}</span>'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1718047326007/b9cc37a4-815b-432b-a53a-2ce38bd35684.png" alt class="image--center mx-auto" /></p>
<p>Now I can use this in a variety of ways. For my use case, I logged all this information to a Delta table so I could track stats across each execution and tie each record back to a cost using Capacity Metrics (more on that in another post maybe).</p>
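<p>If you want to do something similar, here is a rough sketch of that logging step: collect the values into a single-row Spark DataFrame and append it to a Delta table. The table name <strong>execution_log</strong> is hypothetical, and this assumes it runs in the same Fabric notebook session where <strong>spark</strong> and the variables above are already defined.</p>
<pre><code class="lang-python">from datetime import datetime, timezone

# Assemble one log record from the values gathered above.
log_record = [{
    "run_timestamp": datetime.now(timezone.utc).isoformat(),
    "workspace_name": workspace_name,
    "notebook_item_name": notebook_item_name,
    "spark_app_name": spark_app_name,
    "pool_executor_cores": pool_executor_cores,
    "pool_executor_memory": pool_executor_memory,
    "pool_min_executors": pool_min_executors,
    "pool_max_executors": pool_max_executors,
    "pool_number_of_nodes": pool_number_of_nodes
}]

# Append the record to a Delta table in the attached lakehouse (hypothetical table name).
spark.createDataFrame(log_record).write.mode("append").saveAsTable("execution_log")
</code></pre>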
]]></content:encoded></item><item><title><![CDATA[Using sempy to get SQL query CU cost from the Fabric Capacity Metrics app]]></title><description><![CDATA[Download the notebook used in this post from my GitHub account here:Get SQL query CUs from Capacity Metrics.ipynb
A frequent inquiry from customers goes something like this: "I ran a query on the [data warehouse or SQL analytics endpoint] and now I w...]]></description><link>https://bradleyschacht.com/using-sempy-to-get-sql-query-cu-cost-from-the-fabric-capacity-metrics-app</link><guid isPermaLink="true">https://bradleyschacht.com/using-sempy-to-get-sql-query-cu-cost-from-the-fabric-capacity-metrics-app</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Wed, 15 May 2024 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1718137892897/d04d1f6f-cba9-4c5f-80f1-a20795c21f63.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Download the notebook used in this post from my GitHub account here:</strong><br /><a target="_blank" href="https://github.com/bradleyschacht/resources/blob/main/Fabric%20Capacity%20Metrics/Get%20SQL%20query%20CUs%20from%20Capacity%20Metrics.ipynb">Get SQL query CUs from Capacity Metrics.ipynb</a></p>
<p>A frequent inquiry from customers goes something like this: "I ran a query on the [data warehouse or SQL analytics endpoint] and now I want to know how much that query cost me."</p>
<p>On the surface that seems easy enough. After all, the Capacity Metrics app tracks the usage for every operation that occurs on your Fabric capacity, right down to the individual T-SQL query.</p>
<p>Before getting into how to get the capacity usage, let's outline the assumptions this post is going to make:</p>
<ol>
<li><p>You understand the association between workspaces and capacities.</p>
</li>
<li><p>You understand the concept of smoothing as it relates to capacity usage.</p>
</li>
</ol>
<h2 id="heading-how-would-i-do-this-manually">How would I do this manually?</h2>
<p>However, when you try to use the app, you soon realize it's far more cumbersome than you'd like. This is the process you need to go through:</p>
<ol>
<li><p>Run a query using the SQL endpoint (web browser, SSMS, ADS, etc.).</p>
</li>
<li><p>Gather the distributed_statement_id and query execution date and time from Query Insights (or capture it through another method of your choosing).</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523912390/b010c431-643e-4d27-82a8-e23d0e778acf.png" alt /></p>
<ol start="3">
<li>In the capacity metrics app, navigate to a time point that is at least 15 minutes after the query finished running.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523913534/e56d54d0-9944-4de2-ae5b-e9d346eed0eb.png" alt /></p>
<ol start="4">
<li>Filter the background operations table to just the operations you want to see (Operation Id = distributed_statement_id).</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523914797/01928ccb-bc33-46c0-b345-eff682d6e6b3.png" alt /></p>
<p>It's steps 3 and 4 that cause the most pain, and they are what we will address programmatically here today. Step 3 is confusing for people because you need to wait for the query to show up in the report: you have to pick a timepoint sometime after the query finishes running, usually 10-15 minutes later, and find the query there. Step 4 is the bigger challenge because you need to take the list of IDs from step 2, search for one of them in the filter, check the box, go back to step 2 to copy the next ID, search for it, and repeat that for each query's distributed_statement_id.</p>
<p>The process is cumbersome for one query, will drain your evening if you need to do this for a lot of queries, and is just impossible at any real scale.</p>
<h2 id="heading-how-would-i-do-this-programmatically">How would I do this programmatically?</h2>
<p>The overall process is similar: Run a query (or queries), get the distributed_statement_id list, look them up in the capacity metrics app.</p>
<p>To accomplish this programmatically we will need to make a couple of changes to the manual process above. Let's do an overview then we'll dive into details.</p>
<ol>
<li><p>Run a query using the SQL endpoint (web browser, SSMS, ADS, etc.).</p>
</li>
<li><p>Gather the distributed_statement_id and query execution date and time from Query Insights. Turn it into a string in the format of "distributed_statement_id_01", "distributed_statement_id_02", "distributed_statement_id_NN"</p>
</li>
<li><p>Enter parameters into data engineering notebook.</p>
</li>
<li><p>Run the notebook.</p>
</li>
</ol>
<p>Before you try to use this method, you will need to download the notebook <a target="_blank" href="https://github.com/bradleyschacht/resources/blob/main/Fabric/Capacity%20Metrics/Get%20SQL%20query%20CUs%20from%20Capacity%20Metrics.ipynb">Get SQL query CUs from Capacity Metrics</a> which is hosted on my GitHub account. Then, upload the notebook to a Fabric workspace. The workspace will need capacity to run the notebook. The notebook can be in any Fabric workspace; it does not need to be collocated with the Capacity Metrics app or with the workspace where your lakehouses/warehouses live. In fact, this notebook doesn't even require a lakehouse to be in the workspace to run.</p>
<p>Let's look at a walkthrough of this.</p>
<h3 id="heading-run-queries">Run queries</h3>
<p>I think this is self-explanatory. Run your queries however you'd like. Keep in mind that it takes several minutes for the queries to show up in Query Insights and a few more minutes to show up in the Capacity Metrics data. Just assume that you need to wait at least 15 minutes after running a query for this whole process to work (the same is true of the manual process above as we are using the same dataset).</p>
<h3 id="heading-gather-the-distributedstatementids">Gather the distributed_statement_id(s)</h3>
<p>Here is where things deviate from the manual process. While we are still going to be using Query Insights, we want to combine all the distributed statement IDs into a single string, grouped by the date the query was run. This is what makes it possible to get the CUs in bulk. Consider the following query to be a guide on how to do this, but by no means the only way. The key is the format: double quotes around each distributed_statement_id, forming a comma-separated list.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span>
    <span class="hljs-keyword">CONVERT</span>(<span class="hljs-built_in">DATE</span>, start_time) ExecutionDate,
    <span class="hljs-keyword">COUNT</span>(*) <span class="hljs-keyword">AS</span> CountOfQueries,
    <span class="hljs-string">'"'</span> + STRING_AGG(<span class="hljs-keyword">CONVERT</span>(<span class="hljs-built_in">VARCHAR</span>(<span class="hljs-keyword">MAX</span>), distributed_statement_id), <span class="hljs-string">'", "'</span>) + <span class="hljs-string">'"'</span> <span class="hljs-keyword">AS</span> OperationIDList
<span class="hljs-keyword">FROM</span>
    (
        <span class="hljs-keyword">SELECT</span>
            *
        <span class="hljs-keyword">FROM</span> queryinsights.exec_requests_history
        <span class="hljs-keyword">WHERE</span>
            command <span class="hljs-keyword">LIKE</span> <span class="hljs-string">'%%'</span>
            <span class="hljs-keyword">AND</span> start_time &gt;= <span class="hljs-string">'2024-05-08 20:53:38.000000'</span>
            <span class="hljs-keyword">AND</span> start_time &lt;= <span class="hljs-string">'2024-05-13 20:25:53.000000'</span>
    ) <span class="hljs-keyword">AS</span> queryinsights_requests
<span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span>
    <span class="hljs-keyword">CONVERT</span>(<span class="hljs-built_in">DATE</span>, start_time)
</code></pre>
<p>The output from this query will provide the date the queries were executed, how many matched the subquery criteria, and the complete string of query ids. You will want to copy the ExecutionDate and the corresponding OperationIDList for the next step.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523915793/c1414470-8298-43a2-a927-806a9f8ad3d9.png" alt /></p>
<p>For the screenshots later in this post, I chose the 14 queries that were run on May 13th.</p>
<h3 id="heading-enter-parameters-in-the-notebook">Enter parameters in the notebook</h3>
<p>The big departure from the manual process is that we will use a notebook to query the Capacity Metrics dataset to get the data we need. You will need to import the notebook to any Fabric workspace (it requires capacity to use Spark) then fill in the 5 parameters in the first code cell.</p>
<ul>
<li><p>capacity_metrics_workspace</p>
<ul>
<li>The name of the workspace that hosts the Capacity Metrics app.</li>
</ul>
</li>
<li><p>capacity_metrics_dataset</p>
<ul>
<li>The name of the dataset used by the Capacity Metrics app.</li>
</ul>
</li>
<li><p>capacity_id</p>
<ul>
<li>The capacity that hosts the workspace where the user queries were run.</li>
</ul>
</li>
<li><p>date_of_operations</p>
<ul>
<li><p>This is the <strong>ExecutionDate</strong> column from the prior step.</p>
</li>
<li><p>The date the operations were run.</p>
</li>
</ul>
</li>
<li><p>operation_id_list</p>
<ul>
<li><p>This is the <strong>OperationIDList</strong> column from the prior step.</p>
</li>
<li><p>The list of operations of which you want to collect the capacity unit usage.</p>
</li>
<li><p>This should be a comma-separated list, and each operation ID should be enclosed in double quotes.</p>
</li>
<li><p>For example: "AAAAAAAA-BBBB-CCCC-DDDD-1234567890AB", "EEEEEEEE-FFFF-GGGG-HHHH-1234567890AB"</p>
</li>
</ul>
</li>
</ul>
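<p>To make that list concrete, the parameter cell ends up looking something like this. Every value below is a placeholder for illustration; swap in your own workspace, dataset, capacity, date, and operation IDs.</p>
<pre><code class="lang-python">capacity_metrics_workspace = "Microsoft Fabric Capacity Metrics"    # workspace hosting the Capacity Metrics app (placeholder)
capacity_metrics_dataset   = "Fabric Capacity Metrics"              # semantic model used by the app (placeholder)
capacity_id                = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" # capacity that ran the queries
date_of_operations         = "2024-05-13"                           # ExecutionDate from the query above
operation_id_list          = '"AAAAAAAA-BBBB-CCCC-DDDD-1234567890AB", "EEEEEEEE-FFFF-GGGG-HHHH-1234567890AB"'
</code></pre>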
<h3 id="heading-run-the-notebook">Run the notebook</h3>
<p>That's it. Run it!</p>
<p>After the notebook runs, scroll down to the very last cell where the dataframe will display each distributed_statement_id (this is the OperationID field in Capacity Metrics), the total CUs consumed by that query, and some additional information.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523916924/47c030d2-bb59-4048-81a2-def5f03aa17a.png" alt /></p>
<p>Here you can see the total CUs for each of the 14 queries from step 2.</p>
<p>Did you also notice that we didn't have to deal with a timepoint? We only had to know what date the SQL query ran; you didn't even have to know the time it ran. The amount of time I used to spend waiting for a timepoint to show up in the capacity metrics report just so I could search for my data is wild. And it's all gone!</p>
<p>The magic that makes this happen is <a target="_blank" href="https://learn.microsoft.com/en-us/fabric/data-science/semantic-link-overview">Semantic Link</a> which provides a bridge between the data engineering and Power BI experiences in Fabric. It contains a variety of functionality including refreshing semantic models, reading tables from a semantic model, listing workspaces, getting workspace information, and executing DAX statements to return a dataset from a semantic model which is what this notebook uses.</p>
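<p>Under the hood, that bridge boils down to a call like the one below. Treat this as a sketch only: the table and column names in the DAX query are stand-ins, since the Capacity Metrics model's internals aren't documented here, and it assumes the parameter variables from the earlier cell are in scope.</p>
<pre><code class="lang-python">import sempy.fabric as fabric

# Illustrative only: replace the table and column names with the ones in the Capacity Metrics model.
df = fabric.evaluate_dax(
    dataset    = capacity_metrics_dataset,
    workspace  = capacity_metrics_workspace,
    dax_string = """
        EVALUATE
        FILTER(
            'Background operations',
            'Background operations'[OperationId] IN { "AAAAAAAA-BBBB-CCCC-DDDD-1234567890AB" }
        )
    """
)
display(df)
</code></pre>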
<h2 id="heading-wrapping-it-up">Wrapping it up</h2>
<p>There you have it! Now instead of looking up each query individually, you can just run a simple notebook to look up the CU usage in bulk! Let's wrap up with some key things to be aware of.</p>
<ol>
<li><p>Here is a link to get a copy of the notebook (<a target="_blank" href="https://github.com/bradleyschacht/resources/blob/main/Fabric%20Capacity%20Metrics/Get%20SQL%20query%20CUs%20from%20Capacity%20Metrics.ipynb">Get SQL CUs from Capacity Metrics</a>) from my GitHub account.</p>
</li>
<li><p>You still need Capacity Metrics. Make note of the workspace name and semantic model name. Those are needed.</p>
</li>
<li><p>This notebook assumes that your Capacity Metrics semantic model is refreshed. Perhaps I will extend this in the future to add model refresh.</p>
</li>
<li><p>You will need the ID for the capacity that runs your workspace, not just the capacity name. This is the one part that's a little more difficult than the manual approach.</p>
</li>
<li><p>You do need to run this notebook for each date individually. I wanted to make this as easy as possible to use, so I started simple. It's still far faster than looking up individual queries manually. The nice thing though, is you can take this notebook and wrap it in another process that can be much more flexible.</p>
<p> For me, I have another version of this project where I persist the data from queryinsights.exec_requests_history into a user table, run this notebook to get the CU information, and then update the user table with the CU usage. The next time the process runs it only looks up the queries from query insights that do NOT have any CU usage already.</p>
<p> You could also do this across many workspaces to centralize the data. You could parameterize this with a pipeline instead of calling it from another notebook. The possibilities are endless! Would love to hear what you have built with it though!</p>
</li>
<li><p>I believe there is a limit of 10,000 or 100,000 records that can be returned at any one time. Yes, those numbers are very different, and I don't know the exact figure for a fact, but I'm fairly certain a limit is in place. That means if you have a heavily used system, you may need to break the query list into chunks and feed them through (see the sketch after this list). Another great reason to parameterize this and build a framework around it.</p>
</li>
</ol>
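<p>If you do run into that ceiling, batching the operation IDs is straightforward. A minimal sketch, with an arbitrary batch size, might look like this:</p>
<pre><code class="lang-python">def chunk_operation_ids(operation_ids, batch_size = 5000):
    """Yield comma-separated, double-quoted batches of operation IDs."""
    for i in range(0, len(operation_ids), batch_size):
        batch = operation_ids[i:i + batch_size]
        yield ", ".join(f'"{operation_id}"' for operation_id in batch)
</code></pre>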
<p>What do you think? Time saver? Did I waste my time building this?</p>
<p>Let me know in the comments what you think!</p>
]]></content:encoded></item><item><title><![CDATA[List Parameters for a Python Function]]></title><description><![CDATA[As someone who is newer to the world of Python I have a lot of questions. I mean a LOT. I know my way around SQL really well, but when I come to write some PySpark code I often need to dissect things to better understand what is going on and how to u...]]></description><link>https://bradleyschacht.com/list-parameters-for-a-python-function</link><guid isPermaLink="true">https://bradleyschacht.com/list-parameters-for-a-python-function</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Tue, 07 May 2024 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As someone who is newer to the world of Python I have a lot of questions. I mean a LOT. I know my way around SQL really well, but when I come to write some PySpark code I often need to dissect things to better understand what is going on and how to use different functions.</p>
<p>Which leads me to today's question...how do I get a list of all the parameters for a Python function? In today's case: mssparkutils.credentials.getSecret.</p>
<p>You could go to the documentation, but that's often not complete (I'm looking at you, my fellow Microsoft employees on the documentation team!). Usually, I just want to know what my possibilities are so I can at least throw some options in there to see what happens. Maybe I'll get lucky. That's where inspect.signature comes into play.</p>
<h2 id="heading-help">help</h2>
<p>The first thing I could try is using Python's built-in "help" method. By simply passing a module, method, function, keyword, or other object, you get the help information from PyDoc.</p>
<pre><code class="lang-python">help(mssparkutils.credentials.getSecret)
</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523906051/8075b37c-10d9-4d7f-9960-466403a3921d.png" alt /></p>
<p>Easy enough. What if I want a little more detail?</p>
<h2 id="heading-inspect">inspect</h2>
<p>The inspect module can also help here. Using my new friend help, I can see it gives me a way to examine live Python objects. Here we will look at two functions: inspect.signature() and inspect.getfullargspec().</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523907124/6c4247b8-9bb9-4408-b34a-4b3bc58010e1.png" alt /></p>
<p>To begin, we will import the inspect module.</p>
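<p>For reference, the two calls covered in the next sections look like this. This assumes a Fabric notebook, where mssparkutils is already available without an import.</p>
<pre><code class="lang-python">import inspect

# A compact summary of the callable's parameters and defaults.
inspect.signature(mssparkutils.credentials.getSecret)

# A more detailed breakdown of arguments, defaults, and annotations.
inspect.getfullargspec(mssparkutils.credentials.getSecret)
</code></pre>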
<h3 id="heading-inspectsignature">inspect.signature()</h3>
<p>As you can see, running inspect.signature(mssparkutils.credentials.getSecret) returns what I need in a simple output which is very similar to what we saw with help(mssparkutils.credentials.getSecret) earlier. It gives me the list of inputs: akvName, secret, and linkedService, the last of which is optional as it has a default value assigned to it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523907960/d5213d94-9aa2-4d4f-a478-53d4d6ea5ad4.png" alt /></p>
<p>But I am a simple person. I need things spelled out a bit more explicitly sometimes.</p>
<h3 id="heading-inspectgetfullargspec">inspect.getfullargspec()</h3>
<p>When I run inspect.getfullargspec(mssparkutils.credentials.getSecret) I am given a much more detailed breakdown of the output. While help and signature() will usually get the job done, it is nice knowing that I can get a full breakdown of the inputs and outputs.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523908839/c6df61b4-be4c-407b-8ab1-29bb7a7c0018.png" alt /></p>
<p>I am sure there are reasons to use signature over getfullargspec and reasons to use getfullargspec over signature. I saw something about a decorated function being handled properly by signature but not getfullargspec. Since the only possible time I can think of to use a decorator function is when I'm putting up Christmas decor and it is currently May, I think that is a deep dive topic for future Brad.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523909872/ba930c6b-4504-436e-8280-c6719826bc93.png" alt /></p>
]]></content:encoded></item><item><title><![CDATA[Building The Big Demo]]></title><description><![CDATA[In the coming weeks I'm going to start building out a brand new demo environment and I'm going to blog the whole thing...probably...and I plan to blog the whole thing!
As you may (or may not) know, I work on the Fabric product team at Microsoft. I am...]]></description><link>https://bradleyschacht.com/building-the-big-demo</link><guid isPermaLink="true">https://bradleyschacht.com/building-the-big-demo</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Mon, 08 Apr 2024 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the coming weeks I'm going to start building out a brand new demo environment and I'm going to blog the whole thing...probably...and I <em>plan</em> to blog the whole thing!</p>
<p>As you may (or may not) know, I work on the Fabric product team at Microsoft. I am a principal program manager on the Fabric CAT (Customer Advisory Team) team. I always feel strange saying CAT team because then it's team team. It's like trying to explain Microsoft Teams to people. "Just post it on our Teams...uh...team?" or "I need to go to the ATM machine to get cash." You see cash is what people used to use to pay for things before...ok, we are wayyyy off topic. Where was I?</p>
<p>Ah yes, I work on the Fabric CAT team, and I often go talk to people about Fabric as a whole. However, most of the demos I do are just touching on one piece of the puzzle. What I'm setting out to do with this series is show the process of building an end-to-end solution in Microsoft Fabric from data source to visualization, showing various options along the way. My goal is to be able to walk into any room and say "let me pull up a workspace and show you what this looks like" for most of the major Fabric functionality.</p>
<p>Let's cover some of the basics...</p>
<h2 id="heading-what-this-will-not-be">What this will not be</h2>
<p>Let's get this out of the way first. What will this demo not be, or what purpose will it not be designed to serve.</p>
<ul>
<li><p>This will not be comprehensive of every single piece of functionality in Microsoft Fabric. We are sticking to major themes, the Heading 1 topics if you will.</p>
</li>
<li><p>This will not include every experience, just the ones I deal with on a regular basis. Think of data science or private endpoints as examples of what won't make the cut. Maybe it will expand in the future, but we'll see where it goes. The idea is big picture; keep the concepts simple.</p>
</li>
<li><p>This will not be complicated where it can be simple. Again, let's not boil the ocean. We aren't hitting every security feature. I'll leave that for individual posts or other people to cover.</p>
</li>
<li><p>This will not be smoke and mirrors. I will only use what everyone has access to, and I won't hide any code from you to make things look easier than they are or fake any data along the way.</p>
</li>
<li><p>This will not show every architecture option. I honestly don't know if I am going to show loading data using Dataflows Gen2 since I can do everything I need in a pipeline copy activity. Again, we shall see what happens. Maybe I'll have secondary posts that cover alternative methods or design patterns.</p>
</li>
</ul>
<h2 id="heading-what-this-will-be">What this will be</h2>
<p>Now that we have some ground rules for what this project will not be, let's go over what it will be.</p>
<ul>
<li><p>This will walk you through every step of building an end-to-end analytics solution on Microsoft Fabric.</p>
</li>
<li><p>This will be something you can build in your environment as well. I'll make everything available in one form or another, be it screenshots, step-by-step clicks instructions, code on GitHub, or videos on YouTube.</p>
</li>
<li><p>This will be using the Microsoft Wide World Importers dataset. It's easy to understand (retail sales), has all the translation code to go from the OLTP to DW, and is easily available to everyone.</p>
</li>
<li><p>This will be as representative of the real world as possible while still keeping the concepts simple. For example, we will use an Azure SQL DB for the source data, but we aren't going to be simulating transactions on the source system and building slowly changing dimension logic. It's easy enough to extend the solution to include those items if you really want to.</p>
</li>
<li><p>This will be built out over the course of N weeks. That is not a typo, that's not a placeholder I forgot to update, that is the truth. I don't know how long this is going to take. I've thought about doing one post a week, breaking it up into approximately 30-minute building sessions, covering one step at a time no matter how long it takes. This may change as we go along; I just don't know how long it's going to take. I know I can build everything from scratch in a few hours, but teaching/blogging time is a different animal.</p>
</li>
<li><p>This will be somewhat sanitized but, as I mentioned above, I won't hide things from you. What I mean by that is that to convert Wide World Importers into Wide World Importers DW there are a series of views and stored procedures that need to be built/converted. I'm not going to cover every step of that code building/conversion process because it's frankly a bit boring and doesn't serve the purpose of this series, but I will give you all the real code. That means some of the steps you would have to do in a real-world project will be happening behind the scenes or will already be done for you. Again, our focus is Fabric, not project management or BI development.</p>
</li>
</ul>
<h2 id="heading-what-the-architecture-will-look-like">What the architecture will look like</h2>
<p>This is still to be defined, but here's what I'm thinking:</p>
<p><strong>Azure SQL Database</strong> --- Data Factory Pipeline ---&gt; <strong>Lakehouse Files</strong> --- Notebook ---&gt; <strong>Lakehouse Tables</strong> --- T-SQL ---&gt; <strong>Data Warehouse</strong> --- DirectLake ---&gt; <strong>Power BI report</strong></p>
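<p>To give the T-SQL step from the lakehouse to the warehouse a little more shape, here is a minimal sketch of the kind of load statement I have in mind. The names are hypothetical (a lakehouse called WWI_Lakehouse and a warehouse table called dbo.fact_sale); the real tables and transformation logic will come later in the series. It leans on the fact that a Fabric warehouse can read lakehouse tables in the same workspace through three-part names.</p>
<pre><code>-- Hypothetical sketch: load a warehouse fact table from a staged lakehouse table.
-- WWI_Lakehouse is a made-up lakehouse name in the same workspace as the warehouse.
INSERT INTO dbo.fact_sale (SaleKey, CityKey, CustomerKey, Quantity, TotalExcludingTax)
SELECT
    s.SaleKey,
    s.CityKey,
    s.CustomerKey,
    s.Quantity,
    s.TotalExcludingTax
FROM WWI_Lakehouse.dbo.sale AS s;   -- full load for simplicity; incremental logic can come later
</code></pre>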
<p>Be on the lookout for the first post in the series, where we will set up the source database, coming soon!</p>
]]></content:encoded></item><item><title><![CDATA[Upcoming Presentation – Jacksonville SQL Server User Group]]></title><description><![CDATA[I’m excited to announce that I will be speaking at the February meeting of the Jacksonville SQL Server User Group (JSSUG). It’s always fun to spend some time with local SQL community. I’ll be presenting a session recapping all the exciting features a...]]></description><link>https://bradleyschacht.com/upcoming-presentation-jacksonville-sql-server-user-group-1</link><guid isPermaLink="true">https://bradleyschacht.com/upcoming-presentation-jacksonville-sql-server-user-group-1</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Wed, 17 Jan 2024 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I’m excited to announce that I will be speaking at the February meeting of the <a target="_blank" href="https://www.meetup.com/jaxssug/">Jacksonville SQL Server User</a> Group (JSSUG). It’s always fun to spend some time with local SQL community. I’ll be presenting a session recapping all the exciting features added to <a target="_blank" href="https://www.microsoft.com/en-us/microsoft-fabric">Microsoft Fabric</a> in 2023 and previewing what's ahead for 2024. If you’re local, come on out and join us for an evening of learning, networking, and of course food. The event is free, but please register at the link below to help the organizers with logistics and planning.</p>
<p>I hope to see you there!</p>
<p><strong>Date:</strong> Wednesday July 19, 2023<br /><strong>Time:</strong> 6:00 - 8:00 PM Eastern<br /><strong>Where:</strong> <a target="_blank" href="https://goo.gl/maps/E6jnYZ1BBsNzjcT87">Keiser University (6430 Southpoint Parkway Suite #100, Jacksonville, FL 32216)</a><br /><strong>Cost:</strong> Free!<br /><strong>Registration:</strong> <a target="_blank" href="https://www.meetup.com/jaxssug/events/">Meetup</a></p>
<h3 id="heading-microsoft-fabric-2023-year-in-review-and-2024-roadmap">Microsoft Fabric 2023 Year in Review and 2024 Roadmap</h3>
<p>Less than one year ago Microsoft Fabric was announced at the Build 2023 conference. Since then the team has been hard at work rolling out new features and functionality each month. From small improvements like the ability to convert entire folders to Delta with a right-click rather than going file by file, to background improvements like faster CSV loading, to marquee functionality like the launch of multiple Copilots, the announcement of general availability, and the new database mirroring, 2023 was a huge year for Fabric. We will run down the list of some of the biggest announcements and show demos along the way so you can see the exciting functionality that is being delivered each and every month. Then, we will take a look at the year ahead with a sneak peek at the 2024 roadmap. If you're using Fabric or considering using Fabric for a project, you'll want to join this session to see what all the buzz is about!</p>
]]></content:encoded></item><item><title><![CDATA[The Year Ahead: 2024]]></title><description><![CDATA[My favorite time of year is now behind us: Christmas. The weather cools down (I live in Florida, so that's a little iffy), the neighborhood lights up with decorations, people are a little more friendly, church services are more full than normal, Nich...]]></description><link>https://bradleyschacht.com/the-year-ahead-2024</link><guid isPermaLink="true">https://bradleyschacht.com/the-year-ahead-2024</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Wed, 03 Jan 2024 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>My favorite time of year is now behind us: Christmas. The weather cools down (I live in Florida, so that's a little iffy), the neighborhood lights up with decorations, people are a little more friendly, church services are more full than normal, Nichole and I get to watch cheesy holiday movies, and we get to work through all our favorite traditions with our boys.</p>
<p>As the Christmas season closes and we move deeper into winter there is now a new year to look forward to which means it's time for new year's resolutions.</p>
<blockquote>
<p><strong><em>Resolution</em></strong></p>
<p>A firm decision to do or not to do something.</p>
</blockquote>
<p>I'm not a big fan of resolutions. They are often lofty, unattainable (realistically anyway), and leave little room for error, which means people abandon them as soon as things start to go off the rails even a little. Saying "I resolve to go to the gym 3 times a week" psychologically gives you an out as soon as you miss that third day. It creates the feeling of "I said I was going to do this, I messed it up, now my streak is done, and I don't need to keep trying." In fact, almost 25% of people give up on their resolutions in the first week, and more than 2/3 give up before the end of January.</p>
<blockquote>
<p><strong><em>Goal</em></strong></p>
<p>The object of a person's ambition or effort; an aim or desired result.</p>
</blockquote>
<p>Instead, I like to set goals. Goals give you something to aim for but allow for the inevitable error and adjustment that comes along with being human and needing to respond to life events. Someone with the goal of exercising 3 times a week who has a lot of travel coming up would have the freedom to swap the gym for a jog around the park or a workout in their hotel room before an early morning flight. And if they don't hit the goal that week, then they can evaluate what went wrong, figure out how they can do better, and get back to it the next week. Or if they set a goal of 5 days at the gym, they may have a baby and realize that's no longer viable and adjust to 2 gym days with walks on the other days.</p>
<p>Are those two things really all that different? No, of course not. But the mindset that comes with resolutions vs. goals tends to be very different.</p>
<p>What then are my goals for the new year? Some of these are aimed at being healthier so I can live a long, fulfilling life for my family. Some of these are centered around being more intentional with my time. And likely the most important are about deepening my relationship with Jesus Christ. Some of these I already do and want to make sure I don't lose focus, some are things I don't do well and want to improve, and others are areas I haven't started yet.</p>
<ol>
<li><p>Read through the entire Bible this year (<a target="_blank" href="https://www.thebiblerecap.com/start">The Bible Recap</a>)</p>
</li>
<li><p>Pray every day with the family and on my own</p>
</li>
<li><p>Listen to 1 audiobook each month (I'm not a big reader, but I'll listen!)</p>
</li>
<li><p>Dedicate as much time between 5:00 PM and the boys' bedtime to my family as possible (less screen time/work and more play time with the kids and time talking with Nichole)</p>
</li>
<li><p>Spend more quality time with Nichole after the boys go to bed (puzzles, crafts, just chatting)</p>
</li>
<li><p>Get more sleep (fewer work until 2:00 - 3:00 AM nights)</p>
</li>
<li><p>Daily walk and/or bike ride with the family</p>
</li>
<li><p>Do something outside with the family every weekend (bike trail, kayak, beach, park trip, etc.)</p>
</li>
<li><p>Eat better (less Dr Pepper, more meals at home, fewer carbs, more veggies, all that good stuff)</p>
</li>
<li><p>Reduce and shift the kind of content I consume (more history, Bible commentary, educational material and fewer TV shows/movies)</p>
</li>
<li><p>Create more content than I did last year (more blogs, more videos)</p>
</li>
</ol>
<p>What do you think?<br />Am I nitpicking about setting resolutions vs. goals?<br />What are you resolving to do this year?<br />What goals are you setting for yourself?</p>
]]></content:encoded></item><item><title><![CDATA[Upcoming Presentation - Boston Business Intelligence User Group]]></title><description><![CDATA[I'm excited to announce that I will be speaking at the July meeting of the virtual Boston Business Intelligence User Group. I'll be presenting an introductory session on Microsoft Fabric. The session is hosted online so everyone is welcome to join fo...]]></description><link>https://bradleyschacht.com/upcoming-presentation-boston-business-intelligence-user-group</link><guid isPermaLink="true">https://bradleyschacht.com/upcoming-presentation-boston-business-intelligence-user-group</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Thu, 08 Jun 2023 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I'm excited to announce that I will be speaking at the July meeting of the virtual <a target="_blank" href="https://www.meetup.com/boston_bi/">Boston Business Intelligence User Group</a>. I'll be presenting an introductory session on <a target="_blank" href="https://www.microsoft.com/en-us/microsoft-fabric">Microsoft Fabric</a>. The session is hosted online so everyone is welcome to join for an evening of learning and networking. The event is free, but please register at the link below to help the organizers with logistics and planning.</p>
<p>I hope to see you there!</p>
<p><strong>Date:</strong> Thursday July 20, 2023<br /><strong>Time:</strong> 7:00 - 9:00 PM Eastern<br /><strong>Where:</strong> Online (Register to receive the link)<br /><strong>Cost:</strong> Free!<br /><strong>Registration:</strong> <a target="_blank" href="https://www.meetup.com/boston_bi/events/293862841/">Meetup</a></p>
<h3 id="heading-introducing-microsoft-fabric-the-all-in-one-analytics-solution">Introducing Microsoft Fabric: The All-in-One Analytics Solution</h3>
<p>The world is awash with data and it is up to data professionals to make sense of it all. Just a few weeks ago Microsoft announced the next generation of analytics at Build: Microsoft Fabric!</p>
<p>Fabric is a complete end-to-end analytics platform bringing together Azure Data Factory, Azure Synapse, and Power BI in a single location, more deeply integrated than ever. It’s built on a SaaS platform, which helps deliver innovation more quickly and allows users to get up and running in seconds. At the core, Fabric provides a lake-centric and open data hub using the popular Delta format with built-in security, governance, and compliance throughout. At the edge, Fabric delivers flexibility for data scientists, data warehouse developers, and Power BI users to build and analyze using their unique skill sets.</p>
<p>During this session we will discuss the guiding principles and architecture behind Fabric and show a lot of demos! Whether you’re a data engineer, data scientist, data warehouse developer, or an existing Power BI user, Fabric has something for everyone. You won’t want to miss this session!</p>
]]></content:encoded></item><item><title><![CDATA[What is Microsoft Fabric]]></title><description><![CDATA[This year at the annual Build developer conference, Microsoft announced the public preview of Microsoft Fabric, an all-in-one analytics platform that will drive the future of analytics. To say that this release is huge would be an understatement.
On ...]]></description><link>https://bradleyschacht.com/what-is-microsoft-fabric</link><guid isPermaLink="true">https://bradleyschacht.com/what-is-microsoft-fabric</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Thu, 08 Jun 2023 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This year at the annual Build developer conference, Microsoft announced the public preview of <a target="_blank" href="https://www.microsoft.com/en-us/microsoft-fabric">Microsoft Fabric</a>, an all-in-one analytics platform that will drive the future of analytics. To say that this release is huge would be an understatement.</p>
<p>On the surface Fabric looks like an iteration on the work that was done when <a target="_blank" href="https://azure.microsoft.com/en-us/products/synapse-analytics/?ef_id=_k_CjwKCAjw1YCkBhAOEiwA5aN4AdhInV-VS2IR5UGSbPGFpIcwr1zMHHwbvkp86woEQ4htL8BlZ72oXBoC_lMQAvD_BwE_k_&amp;OCID=AIDcmm5edswduu_SEM__k_CjwKCAjw1YCkBhAOEiwA5aN4AdhInV-VS2IR5UGSbPGFpIcwr1zMHHwbvkp86woEQ4htL8BlZ72oXBoC_lMQAvD_BwE_k_&amp;gad=1&amp;gclid=CjwKCAjw1YCkBhAOEiwA5aN4AdhInV-VS2IR5UGSbPGFpIcwr1zMHHwbvkp86woEQ4htL8BlZ72oXBoC_lMQAvD_BwE">Azure Synapse Analytics</a> was launched a few years ago. The reality is that this is not just a step forward but a leap. Many of the components of Fabric do look similar to Synapse, including carrying forward the brand in several of the workloads like Synapse Data Warehousing. However, the architecture, integration, and delivery are vastly different.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523897497/32b445f3-b012-4d25-9083-267d39d05034.png" alt /></p>
<h2 id="heading-onelake">OneLake</h2>
<p>We have to start with <a target="_blank" href="https://learn.microsoft.com/en-us/fabric/onelake/onelake-overview">OneLake</a> because that is the thread that holds this Fabric together. Everyone is going to write, Tweet, or say that so I'm just going to get it out of the way. Please don't hold it against me. I know, low hanging fruit and all, but it is what it is. Moving on... At the core, OneLake is Azure Data Lake Storage but it's so much more than that. Gone are the days of provisioning storage accounts, setting up private endpoints between storage and compute, copying data so different compute engines can access it, managing security in multiple locations, duplicating data when you want to create development environments, and so much more.</p>
<p>You don't create OneLake. You just have OneLake. It's described as OneDrive for your data, which now that I think about it, I already store data in OneDrive...I'll figure that all out another time. The point is, you interact with OneLake just as easily as you interact with OneDrive. It's driven by your Azure AD account by default. It's easy to share with others. You don't have to worry about how to tune to get the necessary IOPS. It's just there.</p>
<p>The best part is that all of the compute engines in Fabric store their data in OneLake, which means all the compute engines can access each other's data. All the architects out there are probably thinking "Does this mean I don't have to do all that data copying to give my SQL people access to the data in my lake? And does that mean I don't have to copy the data from my warehouse to a dataset in Power BI to be able to report on it?" That is absolutely correct. Along with OneLake comes "One Copy", which means every engine can see and interact with all the data (with proper security of course).</p>
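<p>To put a rough shape on the "One Copy" idea, here is a minimal, purely illustrative sketch from the warehouse side. The lakehouse and table names (SalesLakehouse, dbo.orders, dbo.dim_customer) are made up; the point is simply that the warehouse's SQL engine can read the lakehouse's Delta tables in place using three-part names, with no copy required.</p>
<pre><code>-- Illustrative only: join a warehouse table to a lakehouse table without copying data.
SELECT TOP 10
    c.CustomerName,
    SUM(o.OrderTotal) AS TotalSales
FROM SalesLakehouse.dbo.orders AS o     -- Delta table living in the lakehouse
INNER JOIN dbo.dim_customer AS c        -- table in this warehouse
    ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerName
ORDER BY TotalSales DESC;
</code></pre>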
<p>The other best part...or maybe the equally best part, however you'd say that...is that OneLake acts just like Azure Data Lake Storage. That means my data is not locked inside OneLake, never to be seen again by any of the other Azure services. Therefore, all the Databricks users out there can read and write data to OneLake, meaning you can continue to leverage all of those investments and integrations with other services that you've spent years building.</p>
<p>The other, other best part (this is getting out of hand) is that if you have an existing Azure Data Lake Storage account you can create a shortcut and bring that data right into OneLake without even having to move it! So everything in your data estate can then be seen and analyzed through OneLake. More on all of this in a future post. As you can see, there is a huge amount of integration that OneLake unlocks and it's arguably the most important piece of the entire puzzle.</p>
<h2 id="heading-one-format">One format</h2>
<p>In addition to OneLake there is one other key change that enables all the functionality in Fabric: standardizing on <a target="_blank" href="https://learn.microsoft.com/en-us/fabric/data-engineering/lakehouse-and-delta-tables">Delta</a>. Every compute engine in Fabric now reads and writes Delta format. This is how we prevent the need to copy data from the lakehouse into the warehouse or into the lakehouse from Kusto. Why Delta? At this point, it's the industry standard. Sure, there is some <a target="_blank" href="https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order?tabs=sparksql">secret sauce</a> going on under the covers that enables all kinds of nifty functionality but that's for another time...</p>
<h2 id="heading-workloads">Workloads</h2>
<p>Fabric supports a variety of workloads that are more deeply integrated than ever before. We will go into details of each of these workloads in separate posts, but for now let's talk about what is built into the platform.</p>
<ul>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/fabric/data-factory/data-factory-overview">Data Factory</a> - Easily integrate and transform data from any source. Provides no-code and low-code transformation experiences and continues to be the orchestration hub of the data solution.</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/fabric/data-engineering/data-engineering-overview">Synapse Data Engineering</a> - Provides data engineers a familiar, notebook-based experience for transforming data at scale using Spark.</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/fabric/data-science/data-science-overview">Synapse Data Science</a> - The place to go for managing, training, and deploying machine learning models.</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/fabric/data-warehouse/data-warehousing">Synapse Data Warehousing</a> - A relational data warehouse with truly independent compute and storage that provides industry-leading performance. This is NOT Synapse dedicated SQL pools and it is NOT Synapse serverless SQL pools.</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/fabric/real-time-analytics/overview">Synapse Real Time Analytics</a> - Dive into large volumes of data from apps, websites, IoT devices and any other time series data you can get your hands on.</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/power-bi/fundamentals/power-bi-overview">Power BI</a> - I feel like this doesn't even need an introduction. Create interactive reports with stunning visuals (I'm not that creative, but I know some people that are) to gain insights from data across your organization.</p>
</li>
<li><p>Data Activator - Coming soon! - A detection system that provides alerting and monitoring, so you know exactly when important information is changing.</p>
</li>
</ul>
<h2 id="heading-experiences">Experiences</h2>
<p>The experiences in Fabric line up with the workloads to reduce the noise and quickly surface the most relevant information for what you need to get done.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523898629/0eb66f8d-19d6-461f-a4db-4f80a3e3ebfc.png" alt /></p>
<p>Selecting an experience, such as Synapse Data Warehouse, will bring you to a screen that shows common tasks and resources that a user working on a data warehouse would find useful. In this case, the ability to create a new warehouse, create a new data pipeline, or link directly to the getting started documentation.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523899699/5ce68f92-4b7b-4400-a5b8-2adeaff75c53.png" alt /></p>
<h2 id="heading-software-as-a-service">Software-as-a-Service</h2>
<p>Fabric will be delivered like no analytics platform at Microsoft before. Fabric is leveraging a <a target="_blank" href="https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview#saas-foundation">Software as a Service (SaaS)</a> model. This doesn't mean that you have any less functionality; you will still have complete control over your data and experience, just like you have in all Azure PaaS services. However, it does mean some things are changing for the better, including:</p>
<ul>
<li><p>It just works.</p>
</li>
<li><p>One capacity that will be shared between all the workloads. No more paying for each individual dedicated SQL pool and worrying about paying for dedicated capacity in multiple databases that may sit unused.</p>
</li>
<li><p>Updates delivered weekly! That means the platform can be updated with the latest and greatest functionality all the time rather than waiting for less-frequent "big bang" releases.</p>
</li>
<li><p>Trials to get started in seconds if you just want to kick the tires.</p>
</li>
<li><p>Fast provisioning and automatic scaling. A data warehouse takes about 10-20 seconds to spin up rather than the 10+ minutes that it takes today. Spark pools come online in less than 15 seconds rather than 5+ minutes today.</p>
</li>
<li><p>Success by default, meaning fewer knobs to tune because the best practices are implemented automatically (for SQL think things like stats <strong><em>always</em></strong> being up to date) and the workloads are all integrated seamlessly.</p>
</li>
</ul>
<h2 id="heading-wrapping-up">Wrapping up</h2>
<p>This is the biggest advance in analytics at Microsoft since I started building data warehouses on SQL Server back in 2009. Full disclosure: I may be a little biased since I work on the Fabric product team; as of the time of writing this post I am a member of Fabric CAT (Customer Advisory Team). Sure, all the core components of this platform existed in one shape or another before, but the level of integration, ease of use, and new functionality that is unlocked with Fabric is quite amazing.</p>
<p>If you want to hear directly from the leadership and feature PM teams about all the different workloads, you can head over to the <a target="_blank" href="https://powerbi.microsoft.com/en-in/blog/microsoft-digital-event-may-24-25/">Microsoft Fabric Launch Event</a> page to find a list of sessions that were presented on May 24 and 25 showcasing all the functionality that was announced. If you want to just jump right in and watch the full 6+ hours of content, you can catch the on-demand versions for <a target="_blank" href="https://www.youtube.com/watch?v=1o_QDFq6gzE">Day 1</a> and <a target="_blank" href="https://www.youtube.com/watch?v=_Y-XyCRE6ec">Day 2</a> over on YouTube.</p>
<p>In a livestream I did earlier this week someone said Fabric was just Synapse repackaged. I can assure you that is not the case. If you have any doubts, just go <a target="_blank" href="https://learn.microsoft.com/en-us/fabric/get-started/fabric-trial">try the 60-day free trial</a> and see for yourself.</p>
]]></content:encoded></item><item><title><![CDATA[Upcoming Presentation - Jacksonville SQL Server User Group]]></title><description><![CDATA[I'm excited to announce that I will be speaking at the July meeting of the Jacksonville SQL Server User Group (JSSUG). It's always fun to spend some time with local SQL community. I'll be presenting an introductory session on Microsoft Fabric. If you...]]></description><link>https://bradleyschacht.com/upcoming-presentation-jacksonville-sql-server-user-group</link><guid isPermaLink="true">https://bradleyschacht.com/upcoming-presentation-jacksonville-sql-server-user-group</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Wed, 07 Jun 2023 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I'm excited to announce that I will be speaking at the July meeting of the <a target="_blank" href="https://www.meetup.com/jaxssug/">Jacksonville SQL Server User</a> Group (JSSUG). It's always fun to spend some time with local SQL community. I'll be presenting an introductory session on <a target="_blank" href="https://www.microsoft.com/en-us/microsoft-fabric">Microsoft Fabric</a>. If you're local, come on out and join us for an evening of learning, networking, and of course food. The event is free, but please register at the link below to help the organizers with logistics and planning.</p>
<p>I hope to see you there!</p>
<p><strong>Date:</strong> Wednesday July 19, 2023<br /><strong>Time:</strong> 6:00 - 8:00 PM Eastern<br /><strong>Where:</strong> <a target="_blank" href="https://goo.gl/maps/E6jnYZ1BBsNzjcT87">Keiser University (6430 Southpoint Parkway Suite #100, Jacksonville, FL 32216)</a><br /><strong>Cost:</strong> Free!<br /><strong>Registration:</strong> <a target="_blank" href="https://www.meetup.com/jaxssug/events/294028133/">Meetup</a></p>
<h3 id="heading-introducing-microsoft-fabric-the-all-in-one-analytics-solution">Introducing Microsoft Fabric: The All-in-One Analytics Solution</h3>
<p>The world is awash with data and it is up to data professionals to make sense of it all. Just a few weeks ago Microsoft announced the next generation of analytics at Build: Microsoft Fabric!</p>
<p>Fabric is a complete end-to-end analytics platform bringing together Azure Data Factory, Azure Synapse, and Power BI in a single location, more deeply integrated than ever. It’s built on a SaaS platform, which helps deliver innovation more quickly and allows users to get up and running in seconds. At the core, Fabric provides a lake-centric and open data hub using the popular Delta format with built-in security, governance, and compliance throughout. At the edge, Fabric delivers flexibility for data scientists, data warehouse developers, and Power BI users to build and analyze using their unique skill sets.</p>
<p>During this session we will discuss the guiding principles and architecture behind Fabric and show a lot of demos! Whether you’re a data engineer, data scientist, data warehouse developer, or an existing Power BI user, Fabric has something for everyone. You won’t want to miss this session!</p>
]]></content:encoded></item><item><title><![CDATA[Upcoming Presentation - Toronto Data Professionals Community]]></title><description><![CDATA[I'm excited to announce that I will be speaking at the June meeting of the virtual Toronto Data Professionals Community. I'll be presenting an introductory session on Microsoft Fabric. The session is hosted online so everyone is welcome to join for a...]]></description><link>https://bradleyschacht.com/upcoming-presentation-toronto-data-professionals-community</link><guid isPermaLink="true">https://bradleyschacht.com/upcoming-presentation-toronto-data-professionals-community</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Wed, 07 Jun 2023 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I'm excited to announce that I will be speaking at the June meeting of the virtual <a target="_blank" href="https://www.meetup.com/toronto-data-professionals-meetup-group/">Toronto Data Professionals Community</a>. I'll be presenting an introductory session on <a target="_blank" href="https://www.microsoft.com/en-us/microsoft-fabric">Microsoft Fabric</a>. The session is hosted online so everyone is welcome to join for an evening of learning and networking. The event is free, but please register at the link below to help the organizers with logistics and planning.</p>
<p>I hope to see you there!</p>
<p><strong>Date:</strong> Thursday June 15, 2023<br /><strong>Time:</strong> 6:00 - 7:30 PM Eastern<br /><strong>Where:</strong> Online (Microsoft Teams, register to receive the link)<br /><strong>Cost:</strong> Free!<br /><strong>Registration:</strong> <a target="_blank" href="https://www.meetup.com/toronto-data-professionals-meetup-group/events/293805152">Meetup</a></p>
<h3 id="heading-introducing-microsoft-fabric-the-all-in-one-analytics-solution">Introducing Microsoft Fabric: The All-in-One Analytics Solution</h3>
<p>The world is awash with data and it is up to data professionals to make sense of it all. Just a few weeks ago Microsoft announced the next generation of analytics at Build: Microsoft Fabric!</p>
<p>Fabric is a complete end-to-end analytics platform bringing together Azure Data Factory, Azure Synapse, and Power BI in a single location, more deeply integrated than ever. It’s built on a SaaS platform, which helps deliver innovation more quickly and allows users to get up and running in seconds. At the core, Fabric provides a lake-centric and open data hub using the popular Delta format with built-in security, governance, and compliance throughout. At the edge, Fabric delivers flexibility for data scientists, data warehouse developers, and Power BI users to build and analyze using their unique skill sets.</p>
<p>During this session we will discuss the guiding principles and architecture behind Fabric and show a lot of demos! Whether you’re a data engineer, data scientist, data warehouse developer, or an existing Power BI user, Fabric has something for everyone. You won’t want to miss this session!</p>
]]></content:encoded></item><item><title><![CDATA[What is a DWU?]]></title><description><![CDATA[You're new to Azure Synapse Analytics but you know you want to build a data warehouse. What do you do? After a quick search of the internet, you discover you need a Synapse dedicated SQL pool. Log into Azure, click a few buttons, deploy a Synapse wor...]]></description><link>https://bradleyschacht.com/what-is-a-dwu</link><guid isPermaLink="true">https://bradleyschacht.com/what-is-a-dwu</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Tue, 14 Feb 2023 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You're new to Azure Synapse Analytics but you know you want to build a data warehouse. What do you do? After a quick search of the internet, you discover you need a Synapse dedicated SQL pool. Log into Azure, click a few buttons, deploy a Synapse workspace, and it's time to get started with the dedicated SQL pool but there is one problem...you have no idea what a DWU is.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523891194/67cebd54-539f-4fc9-9663-78bfd7fb9e5f.png" alt /></p>
<p>The default is 1,000 of them and they aren't cheap.</p>
<p>Do you need 1,000? 100? 8,278?</p>
<p>These are all good questions, but let's try to address the "What is a DWU?" question first.</p>
<h2 id="heading-what-is-a-dwu">What is a DWU?</h2>
<p>A DWU, or Data Warehouse Unit, is the Synapse dedicated SQL pool's blended representation of compute power. Similar to the early days of Azure SQL Database, where we only had DTUs, you won't find vCores or memory listed anywhere on the page or in the documentation.</p>
<p>This value drives a number of things that we will cover in separate posts at a later date, but at the core it combines CPU, memory, and IO. The used DWU percentage is therefore the maximum of those three metrics. For example, a workload that is very CPU intensive, using 60% of the available CPU for the currently selected DWU, but uses very little memory at 10% and a small amount of IO at 15%, would show as using 60% of the available DWU. Similarly, an IO intensive workload using 5% CPU, 15% memory, and 40% of the provisioned IO would show as using 40% of the available DWU.</p>
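<p>If it helps to see the "maximum of the three metrics" logic written down, here is a tiny, purely illustrative T-SQL snippet using the percentages from the CPU-heavy example above. The numbers are made up; in practice you would read these metrics from the Azure portal or Azure Monitor rather than calculating them yourself.</p>
<pre><code>-- Purely illustrative: the reported DWU used % is the highest of the three resource dimensions.
DECLARE @cpu_pct    decimal(5, 2) = 60.00;  -- hypothetical CPU usage
DECLARE @memory_pct decimal(5, 2) = 10.00;  -- hypothetical memory usage
DECLARE @io_pct     decimal(5, 2) = 15.00;  -- hypothetical IO usage

SELECT MAX(pct) AS DwuUsedPercent           -- returns 60.00
FROM (VALUES (@cpu_pct), (@memory_pct), (@io_pct)) AS metrics(pct);
</code></pre>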
<h2 id="heading-how-do-i-choose-the-right-number-of-dwus">How do I choose the right number of DWUs?</h2>
<p>To oversimplify things, if you need less overall performance then go with a lower DWU. If you need more overall performance, then go with a higher DWU.</p>
<p>I'm a former consultant, so I am well trained in saying "it depends". To that end, in reality, the number of required DWUs is very workload specific. Here are a few guiding principles to help land on the right setting:</p>
<ul>
<li><p>Do your initial development on a very small number of DWUs. Start with 100 DWUs, even if it's just to get the DDL deployed.</p>
</li>
<li><p>DW100 to DW500 are all a single compute node with varying levels of compute power. That means you are really running a single node server with the overhead of an MPP engine. Not until DW1000 do you get a second compute node.</p>
</li>
<li><p>When doing load tests, use at least DW1000. Anything lower shouldn't be considered for production and should only be used for development purposes.</p>
</li>
<li><p>Use a tool like JMeter to simulate a workload. Be sure to look at the whole workload.</p>
<ul>
<li><p>Build a simulation for your ETL.</p>
</li>
<li><p>Build another for ad hoc user queries.</p>
</li>
<li><p>Build another for reporting tool queries.</p>
</li>
<li><p>Run them in series or parallel depending on expected loading and query patterns.</p>
</li>
<li><p>Use realistic data sizes. Don't build the simulation on 100k records when you're going to have 100 billion records. It doesn't need to be a 100% match on data size, but it should be representative.</p>
</li>
</ul>
</li>
<li><p>Scale up or down based on the performance observed in your test.</p>
</li>
<li><p>Don't forget to account for bursts of activity and plan to scale appropriately. A monthly or quarterly load may mean you scale up a few hours in advance or at some logical point where you can take a few minutes of downtime for the scaling operation to complete (see the scaling sketch after this list).</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/memory-concurrency-limits#concurrency-maximums-for-workload-groups">DWUs drive concurrency</a>. Maximum concurrency, 128 queries, is not unlocked until DW6000c.</p>
</li>
<li><p>The more CCI rebuilds you want to run in parallel, the more memory you will need, and therefore the more DWUs you will need.</p>
</li>
<li><p>The moral of the story is this...test, test, test.</p>
</li>
</ul>
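<p>For the scaling called out in the list above, the change itself is a single T-SQL statement run against the logical server (the portal and PowerShell work too). A minimal sketch, assuming a hypothetical dedicated SQL pool named mySampleDataWarehouse:</p>
<pre><code>-- Scale the dedicated SQL pool up ahead of a heavy monthly load...
ALTER DATABASE mySampleDataWarehouse
MODIFY (SERVICE_OBJECTIVE = 'DW1000c');

-- ...and back down once the load completes to keep costs in check.
ALTER DATABASE mySampleDataWarehouse
MODIFY (SERVICE_OBJECTIVE = 'DW500c');
</code></pre>
<p>Keep in mind the scale operation briefly takes the pool offline, which is exactly why the bullet above suggests finding a logical point for a few minutes of downtime.</p>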
<h2 id="heading-how-do-i-know-what-dwu-im-running">How do I know what DWU I'm running?</h2>
<p>The current SQL pool service level will be shown in the Azure Portal, Synapse Studio, PowerShell, and through the T-SQL query below when connected to the master database.</p>
<pre><code>SELECT
    d.name AS DatabaseName,
    Edition AS DatabaseEdition,
    service_objective AS ServiceObjective
FROM sys.database_service_objectives AS dso
INNER JOIN sys.databases AS d
    ON dso.database_id = d.database_id
</code></pre><p>There you have it. Data warehouse units simplified!</p>
]]></content:encoded></item><item><title><![CDATA[PASS Data Community Summit 2022]]></title><description><![CDATA[We are back in business!
It's been a few years since there was an in-person PASS Summit. Some major changes and a couple of years later we are excited to be back in person for what is now the "PASS Data Community Summit".
If SQL Saturday and Data Sat...]]></description><link>https://bradleyschacht.com/pass-data-community-summit-2022</link><guid isPermaLink="true">https://bradleyschacht.com/pass-data-community-summit-2022</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Tue, 15 Nov 2022 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We are back in business!</p>
<p>It's been a few years since there was an in-person PASS Summit. Some major changes and a couple of years later we are excited to be back in person for what is now the "PASS Data Community Summit".</p>
<p>If SQL Saturday and Data Saturday event sizes are any indication, this conference will be significantly smaller than in past years when it was the PASS Summit, but with just as much great content and many great networking opportunities. I can't wait to make my way out to Seattle, meet some new people, and present on Synapse + Power BI.</p>
<p>This year I will be presenting a session called "<a target="_blank" href="https://passdatacommunitysummit.com/sessions/all-sessions/1855">Better Together: Power BI and Azure Synapse Analytics</a>". If you have ever attended a session of mine in the past, you will know I don't like slides. This session will walk through the adventure of creating a Power BI report on Synapse serverless, eventually hitting performance challenges as the dataset gets really big, tuning the report and Synapse layers, and ultimately making the end users happy again.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523887960/e089face-084e-4f3c-81aa-12662c96528f.png" alt /></p>
<p>If you're attending the Data Community Summit and are looking for a fun place to go at 2:30 PM, come on by! I'll also be around on Wednesday at the Microsoft booth and other places around the conference. If you have Synapse questions or just want to meet to talk, come find me!</p>
]]></content:encoded></item><item><title><![CDATA[Create Comma Delimited List in SQL]]></title><description><![CDATA[Previously, I wrote a blog about how to create a comma separated list in T-SQL. 12 years later...one moment I have to go check and see if I'm really so old that I can say I wrote a blog post 12 years ago...
Ok, I'm back. It is confirmed. I am in fact...]]></description><link>https://bradleyschacht.com/create-comma-delimited-list-in-sql</link><guid isPermaLink="true">https://bradleyschacht.com/create-comma-delimited-list-in-sql</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Tue, 02 Aug 2022 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Previously, I <a target="_blank" href="https://bradleyschacht.com/comma-delimited-list-with-coalesce/">wrote a blog</a> about how to create a comma separated list in T-SQL. 12 years later...one moment I have to go check and see if I'm really so old that I can say I wrote a blog post 12 years ago...</p>
<p>Ok, I'm back. It is confirmed. I am in fact officially old. I really did write that blog 12 years ago on June 23, 2010. Wow. Anyway...</p>
<p>12 years later that post has received dozens of views from me trying to remember how I did that. Well, I'm here to tell you there is a better way! I didn't discover this because I wanted to improve my code or because I was scanning the release notes for SQL Server to see what new T-SQL functionality has been released in the last few versions. No, I had to find a better way because my previous method which used a variable and the COALESCE function does not work on Azure Synapse Analytics.</p>
<p>The new method: <a target="_blank" href="https://docs.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-ver16">STRING_AGG()</a></p>
<p>Let's get started with some sample data:</p>
<pre><code>CREATE TABLE dbo.State
    (
        StateID [int],
        StateName [varchar](<span class="hljs-number">50</span>)
    )

INSERT INTO dbo.State
SELECT <span class="hljs-number">1</span>, <span class="hljs-string">'Florida'</span>
UNION ALL
SELECT <span class="hljs-number">2</span>, <span class="hljs-string">'Tennessee'</span>
UNION ALL
SELECT <span class="hljs-number">3</span>, <span class="hljs-string">'Georgia'</span>
UNION ALL
SELECT <span class="hljs-number">4</span>, NULL
UNION ALL
SELECT <span class="hljs-number">5</span>, <span class="hljs-string">'Texas'</span>

SELECT * FROM dbo.State
</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523884848/1bae99af-66c0-45ee-9427-0f5e4bf2657d.png" alt /></p>
<p>Now, a simple STRING_AGG(expression, separator) and we are good to go!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523885600/86e66044-3f1b-49ad-b442-fd03b692c86d.png" alt /></p>
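<p>For copy/paste purposes, a minimal version of that query against the sample table looks like the sketch below (the comma-plus-space separator is just my preference):</p>
<pre><code>-- Collapse the StateName values into a single comma separated string.
SELECT STRING_AGG(StateName, ', ') AS StateList
FROM dbo.State;
</code></pre>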
<p>Quick and simple!</p>
<p>A couple quick notes in closing.</p>
<ol>
<li>Notice that NULL values are ignored and the separator is not added. If you want to include NULL values you'll need to wrap the column name in this example with ISNULL so it would look something like SELECT STRING_AGG(ISNULL(StateName, 'No Name Provided'), ',') FROM dbo.State</li>
<li>There is an optional WITHIN GROUP (ORDER BY ...) clause that can be used to order the list (see the sketch after this list for an example combining this with the ISNULL approach from the previous note).</li>
<li>The data type is determined by the expression. That means if the column is a string it will retain the string properties. MAX fields will result in a MAX data type. Non-max fields will result in the largest possible non-MAX value (VARCHAR(8000) or NVARCHAR(4000)). All other data types result in an NVARCHAR(4000) result.</li>
<li>You aren't limited to a literal string like a comma or a pipe for the separator. That's just my most common request. CHAR(13), CHAR(9), and just about any other expression are all valid as well.</li>
</ol>
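<p>Putting notes 1 and 2 together, a sketch that keeps the NULL row and orders the list might look like this (the 'No Name Provided' placeholder is just an example value):</p>
<pre><code>-- Include NULLs via ISNULL and order the list with the optional WITHIN GROUP clause.
SELECT STRING_AGG(ISNULL(StateName, 'No Name Provided'), ', ')
           WITHIN GROUP (ORDER BY StateName) AS StateList
FROM dbo.State;
</code></pre>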
]]></content:encoded></item><item><title><![CDATA[Azure Synapse Analytics Release Notes]]></title><description><![CDATA[Each month the Azure Synapse Analytics team works hard to get new features, updates, and improvements out the door. Here you will find a running list of updates released each month and links to the corresponding blog posts from the product team. You ...]]></description><link>https://bradleyschacht.com/azure-synapse-analytics-release-notes</link><guid isPermaLink="true">https://bradleyschacht.com/azure-synapse-analytics-release-notes</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Tue, 24 May 2022 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Each month the Azure Synapse Analytics team works hard to get new features, updates, and improvements out the door. Here you will find a running list of updates released each month and links to the corresponding blog posts from the product team. You can always find the full updates from the product team and other great Synapse content over on the <a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/bg-p/AzureSynapseAnalyticsBlog">Azure Synapse Blog</a>.</p>
<h2 id="heading-may-2022">May 2022</h2>
<p>Read about the <a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970">May 2022</a> updates on the <a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/bg-p/AzureSynapseAnalyticsBlog">Azure Synapse Blog</a>.</p>
<ul>
<li>General<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_1">See Details</a>] Get connected with the Azure Synapse Influencer program</li>
</ul>
</li>
<li>SQL<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_3">See Details</a>] Data Warehouse Migration guide for Dedicated SQL Pools in Azure Synapse Analytics</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_4">See Details</a>] Specify character column lengths... Not anymore!</li>
</ul>
</li>
<li>Apache Spark for Synapse<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_6">See Details</a>] Azure Synapse Dedicated SQL Pool Connector for Apache Spark Now Available in Python</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_7">See Details</a>] Manage Azure Synapse Apache Spark configuration</li>
</ul>
</li>
<li>Synapse Data Explorer<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_9">See Details</a>] Synapse Data Explorer live query in Excel</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_11">See Details</a>] Use Managed Identities for External SQL Server Tables</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_12">See Details</a>] New KQL Learn module (2 out of 3) is live!</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_13">See Details</a>] Azure Synapse Data Explorer connector for Microsoft Power Automate, Logic Apps, and Power Apps [Gen...</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_15">See Details</a>] Dynamic events routing from event hub to multiple databases </li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_16">See Details</a>] Configure a database using a KQL inline script as part of JSON ARM deployment template</li>
</ul>
</li>
<li>Data Integration<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_18">See Details</a>] Export pipeline monitoring as a CSV</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_19">See Details</a>] Incremental data loading made easy for Synapse and Azure Database for PostgreSQL and MySQL</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_20">See Details</a>] User-Defined Functions for Mapping Data Flows [Public Preview]</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_21">See Details</a>] Assert Error Handling</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_22">See Details</a>] Mapping data flows projection editing</li>
</ul>
</li>
<li>Synapse Link<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-may-update-2022/ba-p/3430970#TOCREF_24">See Details</a>] Azure Synapse Link for SQL [Public Preview]</li>
</ul>
</li>
</ul>
<hr />
<h2 id="heading-april-2022">April 2022</h2>
<p>Read about the <a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-april-update-2022/ba-p/3295633">April 2022</a> updates on the <a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/bg-p/AzureSynapseAnalyticsBlog">Azure Synapse Blog</a>.</p>
<ul>
<li>SQL<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-april-update-2022/ba-p/3295633#TOCREF_2">See Details</a>] Cross-subscription restore for Azure Synapse SQL [Generally Available]</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-april-update-2022/ba-p/3295633#TOCREF_3">See Details</a>] Recover SQL pool from dropped server or workspace</li>
</ul>
</li>
<li>Synapse Database Templates &amp; Database Designer<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-april-update-2022/ba-p/3295633#TOCREF_5">See Details</a>] Revamped exploration experience</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-april-update-2022/ba-p/3295633#TOCREF_6">See Details</a>] Clone lake database</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-april-update-2022/ba-p/3295633#TOCREF_7">See Details</a>] Use wildcards to specify custom folder hierarchies</li>
</ul>
</li>
<li>Apache Spark for Synapse<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-april-update-2022/ba-p/3295633#TOCREF_9">See Details</a>] Apache Spark 3.2 [Public Preview]</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-april-update-2022/ba-p/3295633#TOCREF_10">See Details</a>] Parameterization for Spark job definition</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-april-update-2022/ba-p/3295633#TOCREF_11">See Details</a>] Notebook snapshot</li>
</ul>
</li>
<li>Security<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-april-update-2022/ba-p/3295633#TOCREF_12">See Details</a>] Synapse Monitoring Operator RBAC role [Generally Available]</li>
</ul>
</li>
<li>Data Integration<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-april-update-2022/ba-p/3295633#TOCREF_14">See Details</a>] Dataverse connector added to Synapse Data Flows</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-april-update-2022/ba-p/3295633#TOCREF_16">See Details</a>] Synapse Pipelines Web activity response timeout improvement</li>
</ul>
</li>
<li>Developer Experience<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-april-update-2022/ba-p/3295633#TOCREF_18">See Details</a>] Reference unpublished notebooks</li>
</ul>
</li>
</ul>
<hr />
<h2 id="heading-march-2022">March 2022</h2>
<p>Read about the <a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194">March 2022</a> updates on the <a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/bg-p/AzureSynapseAnalyticsBlog">Azure Synapse Blog</a>.</p>
<ul>
<li>Developer Experience<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_1">See Details</a>] Synapse notebooks: Code cells with exception to show standard output</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_1">See Details</a>] Synapse notebooks: Partial output is available for running notebook code cells</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_2">See Details</a>] Synapse notebooks: Dynamically control your Spark session configuration with pipeline parameters</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_3">See Details</a>] Synapse notebooks: Reuse and manage notebook sessions</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_4">See Details</a>] Synapse notebooks: Support for Python logging</li>
</ul>
</li>
<li>SQL<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_6">See Details</a>] Column Level Encryption for Azure Synapse dedicated SQL Pools [Generally Available]</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_7">See Details</a>] Better performance for CETAS and subsequent SELECTs</li>
</ul>
</li>
<li>Apache Spark for Synapse<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_9">See Details</a>] Synapse Spark Common Data Model (CDM) Connector [Generally Available]</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_10">See Details</a>] Spark Dedicated SQL Pool (DW) Connector: Performance Improvements</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_11">See Details</a>] Synapse Spark Dedicated SQL Pool (DW) Connector: Support for all Spark Dataframe SaveMode choices (...</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_12">See Details</a>] Accelerate Spark execution speed using the new Intelligent Cache feature [Public Preview]</li>
</ul>
</li>
<li>Security<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_14">See Details</a>] Azure Synapse Analytics now supports Azure Active Directory (Azure AD) only authentication</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_15">See Details</a>] API support to raise or lower Workspace Managed SQL Server Dedicated SQL Minimal TLS version</li>
</ul>
</li>
<li>Data Integration<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_17">See Details</a>] Flowlets and CDC Connectors [Generally Available]</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_18">See Details</a>] sFTP connector for Synapse data flows</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_19">See Details</a>] Data flow improvements to Data Preview</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-march-update-2022/ba-p/3269194#TOCREF_20">See Details</a>] Pipeline script activity</li>
</ul>
</li>
</ul>
<hr />
<h2 id="heading-february-2022">February 2022</h2>
<p>Read about the <a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-february-update-2022/ba-p/3221841">February 2022</a> updates on the <a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/bg-p/AzureSynapseAnalyticsBlog">Azure Synapse Blog</a>.</p>
<ul>
<li>SQL<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-february-update-2022/ba-p/3221841#TOCREF_2">See Details</a>] More consistent query execution times for Serverless SQL Pools</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-february-update-2022/ba-p/3221841#TOCREF_3">See Details</a>] The OPENJSON function makes it easy to get array element indexes</li>
</ul>
</li>
<li>Data Integration<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-february-update-2022/ba-p/3221841#TOCREF_5">See Details</a>] Upsert supported by Copy Activity</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-february-update-2022/ba-p/3221841#TOCREF_6">See Details</a>] Transform Dynamics Data Visually in Synapse Data Flows</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-february-update-2022/ba-p/3221841#TOCREF_7">See Details</a>] Connect to your SQL sources in data flows using Always Encrypted</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-february-update-2022/ba-p/3221841#TOCREF_8">See Details</a>] Capture descriptions from Asserts in Data Flows</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-february-update-2022/ba-p/3221841#TOCREF_9">See Details</a>] Easily define schemas for complex type fields</li>
</ul>
</li>
</ul>
<hr />
<h2 id="heading-january-2022">January 2022</h2>
<p>Read about the <a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-january-update-2022/ba-p/3071681">January 2022</a> updates on the <a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/bg-p/AzureSynapseAnalyticsBlog">Azure Synapse Blog</a>.</p>
<ul>
<li>Apache Spark for Synapse<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-january-update-2022/ba-p/3071681#TOCREF_2">See Details</a>] 4 New database templates [Public Preview]</li>
</ul>
</li>
<li>Machine Learning<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-january-update-2022/ba-p/3071681#TOCREF_3">See Details</a>] Improvements to the SynapseML library</li>
</ul>
</li>
<li>Security<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-january-update-2022/ba-p/3071681#TOCREF_5">See Details</a>] Azure Synapse Analytics security overview</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-january-update-2022/ba-p/3071681#TOCREF_6">See Details</a>] TLS 1.2 required for new workspaces</li>
</ul>
</li>
<li>Data Integration<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-january-update-2022/ba-p/3071681#TOCREF_8">See Details</a>] Data quality validation rules using Assert transformation</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-january-update-2022/ba-p/3071681#TOCREF_9">See Details</a>] Native data flow connector for Dynamics</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-january-update-2022/ba-p/3071681#TOCREF_10">See Details</a>] IntelliSense and auto-complete added to pipeline expressions</li>
</ul>
</li>
<li>Synapse SQL<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-january-update-2022/ba-p/3071681#TOCREF_12">See Details</a>] COPY schema discovery for complex data ingestion</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-january-update-2022/ba-p/3071681#TOCREF_13">See Details</a>] HASHBYTES easily generates hashes in Serverless SQL</li>
</ul>
</li>
</ul>
<hr />
<h2 id="heading-december-2021">December 2021</h2>
<p>Read about the <a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-december-2021-update/ba-p/3042904">December 2021</a> updates on the <a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/bg-p/AzureSynapseAnalyticsBlog">Azure Synapse Blog</a>.</p>
<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-december-2021-update/ba-p/3042904#REF1">See Details</a>] Finding Synapse Monthly Update blogs</li>
<li>Apache Spark in Synapse<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-december-2021-update/ba-p/3042904#REF3">See Details</a>] Additional notebook export formats</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-december-2021-update/ba-p/3042904#REF4">See Details</a>] Three new chart types in notebooks</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-december-2021-update/ba-p/3042904#REF5">See Details</a>] Reconnect notebooks to Spark sessions</li>
</ul>
</li>
<li>Data Integration<ul>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-december-2021-update/ba-p/3042904#REF7">See Details</a>] Map Data tool [Public Preview], a no-code guided ETL experience</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-december-2021-update/ba-p/3042904#REF7">See Details</a>] Quick Reuse of Spark clusters</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-december-2021-update/ba-p/3042904#REF9">See Details</a>] External call transformation</li>
<li>[<a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/azure-synapse-analytics-december-2021-update/ba-p/3042904#REF10">See Details</a>] Flowlets [Public Preview]</li>
</ul>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Log Analytics with Azure Synapse Analytics]]></title><description><![CDATA[There are a lot of services in Azure. Way more than a few. What is something you want to do with all your services and applications? You want to monitor them. How do you do that? By looking at the logs that are produced. How do you capture and make s...]]></description><link>https://bradleyschacht.com/log-analytics-with-azure-synapse-analytics</link><guid isPermaLink="true">https://bradleyschacht.com/log-analytics-with-azure-synapse-analytics</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Mon, 01 Nov 2021 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There are a lot of services in Azure. Way more than a few. What is something you want to do with all your services and applications? You want to monitor them. How do you do that? By looking at the logs that are produced. How do you capture and make sense of thousands of log entries from hundreds of services for this really awesome solution you’ve been working on? Azure Monitor.</p>
<p>This post isn’t about Azure Monitor though. Not directly anyway.</p>
<p>Azure Synapse is no different than any other application or service in your environment. It serves a purpose in the architecture and as a result needs special attention from people with a specific skillset, in this case DBAs and developers. That’s where Azure Monitor comes into the picture and what we will be discussing today: how to make sense of the integration between auditing in Azure Synapse and Azure Monitor. The focus will be on Azure Synapse Analytics Dedicated SQL Pools (Gen 2) specifically, but there is integration for monitoring serverless SQL pools, Spark pools, pipelines, and general workspace operations.</p>
<p>First, those with an eye for detail will notice that this post has Log Analytics in the title and not Azure Monitor. That’s true. A very brief explanation so everyone has things straight in their heads. I put Log Analytics in the title because when you set up auditing or diagnostics the option is labeled “Send to Log Analytics workspace” not “Send to Azure Monitor” and I wanted people to be able to find this post. Azure Monitor is a set of functionalities for collecting and analyzing logs. There are some prebuilt integrations and visualizations with some Azure services like Key Vault or Storage Accounts. Then there is Log Analytics, a functionality inside Azure Monitor for storing and querying all log data, which also happens to be where Synapse writes its logs. Queries are written in the Kusto Query Language, or KQL.</p>
<p>Second, let’s be sure a few very important details are clear. When referring to Azure Synapse there are two distinct but very similar services. They are both the same Microsoft cloud MPP database engine, both cover the same general T-SQL surface area, both scale in terms of DWUs, both talk about “Azure Synapse Analytics” in the documentation and description, but one is deployed to an Azure SQL Server, and one is deployed to a Workspace.</p>
<p>We need to be very clear about this because they both operate slightly differently. Again, this post is focused on Azure Synapse Analytics Dedicated SQL Pools (Gen 2). An alternate post is available for <a target="_blank" href="https://bradleyschacht.com/log-analytics-with-dedicated-sql-pools-formerly-sql-dw/">Dedicated SQL Pools (Formerly SQL DW)</a> if you need information about that deployment option.</p>
<p>Let’s get started.</p>
<h1 id="heading-what-are-my-options">What are my options?</h1>
<p>There are two types of logging that can be enabled: Auditing and Diagnostics. Auditing tracks a set of database events like queries and login attempts. Diagnostics will make copies of a specific set of system DMVs periodically which can be helpful because Synapse only stores a few thousand records depending on the DMV.</p>
<p>Everything from here assumes that you have a Log Analytics workspace created and a Synapse Workspace with a Gen 2 dedicated SQL pool deployed.</p>
<h1 id="heading-enabling-auditing">Enabling Auditing</h1>
<p>Auditing can be enabled at the workspace level, which will cover all databases on the workspace automatically, or at the individual database level.</p>
<p>First, navigate to your Synapse Analytics Workspace or dedicated SQL pool in the Azure Portal. My screenshots will show the configuration from the dedicated SQL pool, but the same setting can be found under the label of Azure SQL Auditing at the workspace level. From here, select Auditing from the Security section.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523866196/80a8fd99-9cc4-4044-a5b2-7b54b279f995.png" alt /></p>
<p>Next, toggle <strong>Enable Azure SQL Auditing</strong> to the on position. Then check the boxes for the locations where you would like the log to be written; in this example we are going to focus on Log Analytics. Select a Log Analytics workspace to which the data will be written. Click <strong>Save</strong> once complete.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523867207/40aa140e-44eb-48c9-bfcb-906d0ddad923.png" alt /></p>
<h1 id="heading-enabling-diagnostics">Enabling Diagnostics</h1>
<p>While Auditing can be turned on for a database by either enabling it at the workspace or database level, diagnostics must be enabled at the database level. There are diagnostics at the workspace level, but those are different metrics and do not apply to monitoring the dedicated SQL pool.</p>
<p>If you’re not there already, navigate in the Azure Portal to your dedicated SQL pool. Select <strong>Diagnostic Settings</strong> from the <strong>Monitoring</strong> section.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523868312/8cf5210f-34bd-47d4-862e-712a5fd91af7.png" alt /></p>
<p>Next, click the <strong>Add diagnostic setting</strong> button.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523869331/d1b65062-0731-49ec-96c3-7b0941c6bcd3.png" alt /></p>
<p>Provide a name for this diagnostic setting. Check the box next to <strong>Send to Log Analytics workspace</strong>. Select the subscription and workspace to which the DMVs will be written. Select which DMVs you want to log. Finally, click Save.</p>
<p>Remember, the diagnostic settings each correspond to a different DMV that will be copied to your Log Analytics workspace. If you need additional DMVs you’ll need to roll your own auditing solution at this time.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Diagnostic Setting Label</td><td>System DMV</td></tr>
</thead>
<tbody>
<tr>
<td>SqlRequests</td><td>sys.dm_pdw_sql_requests</td></tr>
<tr>
<td>RequestSteps</td><td>sys.dm_pdw_request_steps</td></tr>
<tr>
<td>ExecRequests</td><td>sys.dm_pdw_exec_requests</td></tr>
<tr>
<td>DmsWorkers</td><td>sys.dm_dms_workers</td></tr>
<tr>
<td>Waits</td><td>sys.dm_pdw_waits</td></tr>
</tbody>
</table>
</div><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523870283/084c5054-34cd-414e-b85d-ad7754710cb7.png" alt /></p>
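<p>If you have more than a couple of dedicated SQL pools to configure, the diagnostic setting can also be created with PowerShell. This is a rough sketch that assumes the older Set-AzDiagnosticSetting cmdlet from the Az.Monitor module (newer module versions replace it with New-AzDiagnosticSetting), and the resource IDs are placeholders.</p>
<pre><code class="lang-powershell"># Resource ID of the dedicated SQL pool inside the Synapse workspace (placeholder values).
$sqlPoolId = "/subscriptions/{subscription-id}/resourceGroups/Synapse-RG/providers/Microsoft.Synapse/workspaces/bschacht-synapse/sqlPools/DataWarehouse"

# Resource ID of the Log Analytics workspace that will receive the DMV snapshots.
$logAnalyticsId = "/subscriptions/{subscription-id}/resourceGroups/LogAnalytics-RG/providers/Microsoft.OperationalInsights/workspaces/bschacht-la"

# Enable the five DMV-backed log categories listed in the table above.
Set-AzDiagnosticSetting `
    -Name "SqlPoolDiagnostics" `
    -ResourceId $sqlPoolId `
    -WorkspaceId $logAnalyticsId `
    -Category SqlRequests, RequestSteps, ExecRequests, DmsWorkers, Waits `
    -Enabled $true
</code></pre>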
<h1 id="heading-reading-the-logs">Reading the Logs</h1>
<p>Data will take several minutes to show up in Log Analytics. I’ve had the initial population take up to 30 minutes. After that the data should show up much more quickly, within several minutes. Connect to your database and run a query or two so that some user queries show up in the logs.</p>
<p>For my example I have run the following query:</p>
<pre><code class="lang-sql">SELECT COUNT(*), GETDATE(), DB_NAME() FROM IMDB.stage_Person
</code></pre>
<p>To view the logs, you can navigate to your log analytics workspace, or select <strong>Logs</strong> from your dedicated SQL pool’s <strong>Monitoring</strong> section. This can be found directly beneath the Diagnostic Settings option used in the last section.</p>
<h2 id="heading-viewing-the-audit-log">Viewing the Audit Log</h2>
<p>Within the Log Analytics Workspace you will see a group of tables with the prefix Synapse* and at least one table prefixed with SQL*. More tables may appear if you are using auditing for Azure SQL Database. The audit setup in the first section of this post will be logged to the SQLSecurityAuditEvents table.</p>
<p>This sample query pulls a few of the relevant columns from the SQLSecurityAuditEvents table.</p>
<pre><code class="lang-sql">SQLSecurityAuditEvents
| project
    ResourceGroup,
    LogicalServerName,
    DatabaseName,
    EventTime,
    Category,
    ActionName,
    IsServerLevelAudit,
    Succeeded,
    tostring(SessionId),
    ClientIp,
    HostName,
    ServerPrincipalName,
    DurationMs,
    Statement,
    TenantId,
    _ResourceId
</code></pre>
<p>Here you can see the query I ran in the log with an action of BATCH COMPLETED.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523871260/d24e5814-2076-48f9-85f9-15cb9fcdc621.png" alt /></p>
<h2 id="heading-viewing-the-diagnostic-logs-dmvs">Viewing the Diagnostic Logs (DMVs)</h2>
<p>Where all audit events are logged to the same table, regardless of the event type, each of the DMVs enabled in Diagnostic settings gets logged to its own table. Expanding on the list of DMVs from earlier in the post we can see the Log Analytics table closely matches the diagnostic setting name with a prefix of SynapseSqlPool.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Diagnostic Setting Label</td><td>System DMV</td><td>Log Analytics Table</td></tr>
</thead>
<tbody>
<tr>
<td>SqlRequests</td><td>sys.dm_pdw_sql_requests</td><td>SynapseSqlPoolSqlRequests</td></tr>
<tr>
<td>RequestSteps</td><td>sys.dm_pdw_request_steps</td><td>SynapseSqlPoolRequestSteps</td></tr>
<tr>
<td>ExecRequests</td><td>sys.dm_pdw_exec_requests</td><td>SynapseSqlPoolExecRequests</td></tr>
<tr>
<td>DmsWorkers</td><td>sys.dm_dms_workers</td><td>SynapseSqlPoolDmsWorkers</td></tr>
<tr>
<td>Waits</td><td>sys.dm_pdw_waits</td><td>SynapseSqlPoolWaits</td></tr>
</tbody>
</table>
</div><p>This sample query pulls a few of the relevant columns from the SynapseSqlPoolExecRequests table.</p>
<pre><code class="lang-sql">SynapseSqlPoolExecRequests
| summarize
    TimeGenerated = max(TimeGenerated),
    ResourceGroup = any(ResourceGroup),
    LogicalServerName = any(LogicalServerName),
    DatabaseId = any(toreal(DatabaseId)),
    StartTime = max(StartTime),
    EndCompileTime = max(EndCompileTime),
    Category = any(Category),
    Status = min(Status),
    Command = any(Command)
    by RequestId
</code></pre>
<p>Here you can see the query I ran in the SynapseSqlPoolExecRequests table.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523872414/d1c33065-85e9-4699-a207-7fbb8db97036.png" alt /></p>
<h1 id="heading-wrapping-it-up">Wrapping it up</h1>
<p>There is a lot more that can be accomplished with KQL queries in Log Analytics. You may even take the queries found in the accompanying <a target="_blank" href="https://bradleyschacht.com/log-analytics-with-dedicated-sql-pools-formerly-sql-dw/">Dedicated SQL Pools (Formerly SQL DW)</a> post and combine them to give a single view of all your dedicated SQL pools.</p>
<p>The key learnings from this walkthrough are:</p>
<ol>
<li>Auditing and Diagnostics are different options and enabled in different locations.</li>
<li>Auditing writes all the data to a single table no matter the action.</li>
<li>Diagnostics pull a few key DMVs from Synapse and write them to the log. Each DMV is written to its own table in Log Analytics.</li>
<li>Something not discussed is that, on heavily used systems, there is the potential for some audit or diagnostic records to be missed.</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Log Analytics with Dedicated SQL Pools (Formerly SQL DW)]]></title><description><![CDATA[There are a lot of services in Azure. Way more than a few. What is something you want to do with all your services and applications? You want to monitor them. How do you do that? By looking at the logs that are produced. How do you capture and make s...]]></description><link>https://bradleyschacht.com/log-analytics-with-dedicated-sql-pools-formerly-sql-dw</link><guid isPermaLink="true">https://bradleyschacht.com/log-analytics-with-dedicated-sql-pools-formerly-sql-dw</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Mon, 01 Nov 2021 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There are a lot of services in Azure. Way more than a few. What is something you want to do with all your services and applications? You want to monitor them. How do you do that? By looking at the logs that are produced. How do you capture and make sense of thousands of log entries from hundreds of services for this really awesome solution you’ve been working on? Azure Monitor.</p>
<p>This post isn’t about Azure Monitor though. Not directly anyway.</p>
<p>Azure Synapse is no different than any other application or service in your environment. It serves a purpose in the architecture and as a result needs special attention from people with a specific skillset, in this case DBAs and developers. That’s where Azure Monitor comes into the picture and what we will be discussing today: how to make sense of the integration between auditing in Dedicated SQL Pools (Formerly SQL DW) and Azure Monitor. The focus will be on Dedicated SQL Pools (Formerly SQL DW) not Azure Synapse Analytics, but more on that in a moment.</p>
<p>First, those with an eye for detail will notice that this post has Log Analytics in the title and not Azure Monitor. That’s true. A very brief explanation so everyone has things straight in their heads. I put Log Analytics in the title because when you set up auditing or diagnostics the option is labeled “Send to Log Analytics workspace” not “Send to Azure Monitor” and I wanted people to be able to find this post. Azure Monitor is a set of functionalities for collecting and analyzing logs. There are some prebuilt integrations and visualizations with some Azure services like Key Vault or Storage Accounts. Then there is Log Analytics, a functionality inside Azure Monitor for storing and querying all log data, which also happens to be where Synapse writes its logs. Queries are written in the Kusto Query Language, or KQL.</p>
<p>Second, let’s be sure a few very important details are clear. When referring to Azure Synapse there are two distinct but very similar services. They are both the same Microsoft cloud MPP database engine, both cover the same general T-SQL surface area, both scale in terms of DWUs, both talk about “Azure Synapse Analytics” in the documentation and description, but one is deployed to an Azure SQL Server, and one is deployed to a Workspace.</p>
<p>We need to be very clear about this because they both operate slightly differently. Again, this post is focused on Dedicated SQL Pools (Formerly SQL DW). An alternate post is available for <a target="_blank" href="https://bradleyschacht.com/log-analytics-with-azure-synapse-analytics/">Azure Synapse Analytics Dedicated SQL Pools (Gen 2)</a> if you need information about that deployment option.</p>
<p>Let’s get started.</p>
<h1 id="heading-what-are-my-options">What are my options?</h1>
<p>There are two types of logging that can be enabled: Auditing and Diagnostics. Auditing tracks a set of database events like queries and login attempts. Diagnostics will make copies of a specific set of system DMVs periodically which can be helpful because Synapse only stores a few thousand records depending on the DMV.</p>
<p>Everything from here assumes that you have a Log Analytics workspace created and a Dedicated SQL Pool (Formerly SQL DW) deployed.</p>
<h1 id="heading-enabling-auditing">Enabling Auditing</h1>
<p>Auditing can be enabled at the logical server level, which will cover all databases on the logical server automatically, or at the individual database level.</p>
<p>First, navigate to your logical SQL server or dedicated SQL pool in the Azure Portal. My screenshots will show the configuration from the dedicated SQL pool, but the same setting can be found under the label of Auditing at the logical SQL server level. From here, select Auditing from the Security section.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523875348/5ac27352-7a4e-4b5d-bd59-22f8d9f6885b.png" alt /></p>
<p>Next, toggle <strong>Enable Azure SQL Auditing</strong> to the on position. Then check the boxes for the locations where you would like the log to be written; in this example we are going to focus on Log Analytics. Select a Log Analytics workspace to which the data will be written. Click <strong>Save</strong> once complete.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523876526/2696f7cf-0e1a-4867-8fe2-08af6942b134.png" alt /></p>
<h1 id="heading-enabling-diagnostics">Enabling Diagnostics</h1>
<p>While Auditing can be turned on for a database by either enabling it at the logical SQL server or database level, diagnostics must be enabled at the database level. There are no diagnostics at the logical SQL server level.</p>
<p>If you’re not there already, navigate in the Azure Portal to your dedicated SQL pool. Select <strong>Diagnostic Settings</strong> from the <strong>Monitoring</strong> section.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523877435/e64419bf-aaeb-4847-b410-4893f475026d.png" alt /></p>
<p>Next, click the <strong>Add diagnostic setting</strong> button.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523878538/85fa8856-a944-488e-9b80-784b5c18d4b2.png" alt /></p>
<p>Provide a name for this diagnostic setting. Check the box next to <strong>Send to Log Analytics workspace</strong>. Select the subscription and workspace to which the DMVs will be written. Select which DMVs you want to log. There is an additional SQLSecurityAuditEvents entry in my screenshot which is there because I already enabled the database audit. We'll ignore that for the time being. Finally, click Save.</p>
<p>Remember, the diagnostic settings each correspond to a different DMV that will be copied to your Log Analytics workspace. If you need additional DMVs you’ll need to roll your own auditing solution at this time.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Diagnostic Setting Label</td><td>System DMV</td></tr>
</thead>
<tbody>
<tr>
<td>SqlRequests</td><td>sys.dm_pdw_sql_requests</td></tr>
<tr>
<td>RequestSteps</td><td>sys.dm_pdw_request_steps</td></tr>
<tr>
<td>ExecRequests</td><td>sys.dm_pdw_exec_requests</td></tr>
<tr>
<td>DmsWorkers</td><td>sys.dm_dms_workers</td></tr>
<tr>
<td>Waits</td><td>sys.dm_pdw_waits</td></tr>
</tbody>
</table>
</div><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523879685/e0ecdf1d-7b24-4ca6-811d-7965b76e0539.png" alt /></p>
<h1 id="heading-reading-the-logs">Reading the Logs</h1>
<p>Data will take several minutes to show up in Log Analytics. I’ve had the initial population take up to 30 minutes. After that the data should show up much more quickly, within several minutes. Connect to your database and run a query or two so that some user queries show up in the logs.</p>
<p>For my example I have run the following query:</p>
<pre><code class="lang-sql">SELECT *, GETDATE(), DB_NAME() FROM dbo.MyTable
</code></pre>
<p>To view the logs, you can navigate to your log analytics workspace, or select <strong>Logs</strong> from your dedicated SQL pool’s <strong>Monitoring</strong> section. This can be found directly beneath the Diagnostic Settings option used in the last section.</p>
<h2 id="heading-viewing-the-audit-log">Viewing the Audit Log</h2>
<p>Within the Log Analytics Workspace you may see a group of tables with the prefix Synapse*. These tables correspond to diagnostic settings configured on the Synapse Analytics Workspaces. If you do not have any Synapse Workspaces these tables will likely not show up. More tables may appear if you are using auditing for Azure SQL Database. The audit setup in the first section of this post will be logged to the AzureDiagnostics table.</p>
<p>This sample query pulls a few of the relevant columns from the AzureDiagnostics table by narrowing down the data to the SQLSecurityAuditEvents category which is used for the Audit log. For Synapse Workspace Dedicated SQL Pools, as opposed to the "formerly SQL DW" SQL Pools that are the focus of this post, the data is written to a separate table called SQLSecurityAuditEvents rather than being a category of log within the AzureDiagnostics table.</p>
<pre><code class="lang-sql">AzureDiagnostics
| where Category == "SQLSecurityAuditEvents"
| project
        ResourceGroup,
        LogicalServerName_s,
        database_name_s,
        event_time_t,
        Category,
        action_name_s,
        tobool(is_server_level_audit_s),
        tobool(succeeded_s),
        tostring(session_id_d),
        client_ip_s,
        host_name_s,
        server_principal_name_s,
        tolong(duration_milliseconds_d),
        statement_s,
        TenantId,
        _ResourceId
| project-<span class="hljs-keyword">rename</span>
        ResourceGroup,
        LogicalServerName=LogicalServerName_s,
        DatabaseName=database_name_s,
        EventTime=event_time_t,
        <span class="hljs-keyword">Category</span>,
        ActionName=action_name_s,
        IsServerLevelAudit=is_server_level_audit_s,
        Succeeded=succeeded_s,
        SessionId=session_id_d,
        ClientIp=client_ip_s,
        HostName=host_name_s,
        ServerPrincipalName=server_principal_name_s,
        DurationMs=duration_milliseconds_d,
        <span class="hljs-keyword">Statement</span>=statement_s,
        TenantId,
        _ResourceId
</code></pre>
<p>Here you can see the query I ran in the log with an action of BATCH COMPLETED.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523880884/aaffd69d-e566-4c04-bdf6-b3223a12197e.png" alt /></p>
<h2 id="heading-viewing-the-diagnostic-logs-dmvs">Viewing the Diagnostic Logs (DMVs)</h2>
<p>Where Synapse Workspaces write DMV data to separate tables, for Dedicated SQL Pools (Formerly SQL DW) each of the DMVs enabled in Diagnostic settings gets logged to the same AzureDiagnostics table used in the Auditing section above, but with a category that corresponds to the diagnostic setting/System DMV.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Diagnostic Setting Label</td><td>System DMV</td><td>Log Analytics AzureDiagnostics Category</td></tr>
</thead>
<tbody>
<tr>
<td>SqlRequests</td><td>sys.dm_pdw_sql_requests</td><td>SqlRequests</td></tr>
<tr>
<td>RequestSteps</td><td>sys.dm_pdw_request_steps</td><td>RequestSteps</td></tr>
<tr>
<td>ExecRequests</td><td>sys.dm_pdw_exec_requests</td><td>ExecRequests</td></tr>
<tr>
<td>DmsWorkers</td><td>sys.dm_dms_workers</td><td>DmsWorkers</td></tr>
<tr>
<td>Waits</td><td>sys.dm_pdw_waits</td><td>Waits</td></tr>
</tbody>
</table>
</div><p>This sample query pulls a few of the relevant columns from the AzureDiagnostics table by narrowing the data down to the ExecRequests Category. This data corresponds to the system DMV sys.dm_pdw_exec_requests.</p>
<pre><code class="lang-sql">AzureDiagnostics
| where Category == "ExecRequests"
| project
    ResourceGroup,
    LogicalServerName_s,
    DatabaseId_d,
    StartTime_t,
    EndCompileTime_t,
    Category,
    Status_s,
    RequestId_s,
    Command_s
| summarize
    ResourceGroup = any(ResourceGroup),
    LogicalServerName = any(LogicalServerName_s),
    DatabaseId = any(DatabaseId_d),
    StartTime = max(StartTime_t),
    EndCompileTime = max(EndCompileTime_t),
    Category = any(Category),
    Status = min(Status_s),
    Command = any(Command_s),
    record_count = count()
    by RequestId_s
</code></pre>
<p>Here you can see the query I ran focused on the ExecRequests category.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523881910/3a5366df-4a1c-406c-ab2d-8936df9bed40.png" alt /></p>
<h1 id="heading-wrapping-it-up">Wrapping it up</h1>
<p>There is a lot more that can be accomplished with KQL queries in Log Analytics. You may even take the queries found in the accompanying <a target="_blank" href="https://bradleyschacht.com/log-analytics-with-azure-synapse-analytics/">Azure Synapse Analytics Dedicated SQL Pools (Gen 2)</a> post and combine them to give a single view of all your dedicated SQL pools.</p>
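<p>To make that concrete, here is a rough sketch (Az.OperationalInsights module, placeholder workspace ID) that unions the ExecRequests data from both flavors into one result set, assuming both resource types send their logs to the same Log Analytics workspace:</p>
<pre><code class="lang-powershell">$workspaceId = "00000000-0000-0000-0000-000000000000"   # Log Analytics workspace (Customer ID GUID), placeholder

# Combine ExecRequests from Synapse Workspace pools and "formerly SQL DW" pools into one view.
$query = @'
union
    (SynapseSqlPoolExecRequests
        | project TimeGenerated, Source = "Synapse Workspace", Server = LogicalServerName, Status, Command),
    (AzureDiagnostics
        | where Category == "ExecRequests"
        | project TimeGenerated, Source = "Formerly SQL DW", Server = LogicalServerName_s, Status = Status_s, Command = Command_s)
| order by TimeGenerated desc
'@

(Invoke-AzOperationalInsightsQuery -WorkspaceId $workspaceId -Query $query).Results | Format-Table
</code></pre>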
<p>The key learnings from this walkthrough are:</p>
<ol>
<li>Auditing and Diagnostics are different options and enabled in different locations.</li>
<li>Auditing writes all the data to a single table no matter the action.</li>
<li>Diagnostics pull a few key DMVs from Synapse and write them to the log. Each DMV is written to the same table in Log Analytics each with a different category.</li>
<li>Something not discussed is that, on heavily used systems, there is the potential for some audit or diagnostic records to be missed.</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[How to Resolve Remove-AzKeyVaultKey : Operation returned an invalid status code 'Forbidden' Error]]></title><description><![CDATA[I've come across another error message in my seemingly never-ending battle with Azure Key Vault. A while back I couldn't delete a resource group because of Key Vault soft-delete. Then I couldn't recreate a Key Vault with the same name again because o...]]></description><link>https://bradleyschacht.com/how-to-resolve-remove-azkeyvaultkey-operation-returned-an-invalid-status-code-forbidden-error</link><guid isPermaLink="true">https://bradleyschacht.com/how-to-resolve-remove-azkeyvaultkey-operation-returned-an-invalid-status-code-forbidden-error</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Tue, 28 Jan 2020 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I've come across another error message in my seemingly never-ending battle with Azure Key Vault. A while back I couldn't delete a resource group because of Key Vault soft-delete. Then I couldn't recreate a Key Vault with the same name again because of soft-delete. Which brings us to today's post where I had to go to PowerShell and purge a key that but ran into an issue because of soft-delete.</p>
<p>Maybe my issue isn't with Key Vault so much as it is with soft-delete.</p>
<p>While doing some testing for <a target="_blank" href="https://bradleyschacht.com/bring-your-own-key-to-azure-sql-database-managed-instance-tde">TDE protectors on Managed Instance</a> I had to remove a key and recreate it for part of my test setup. The key I deleted earlier in the day was named ManagedInstance. So when I went to create a new key called ManagedInstance in the Azure Portal I received an error message. No big deal, I had this one in the bag. Go to PowerShell, find the key, remove the key, go back to the portal and try again. Unfortunately, I hit a snag. Let's walk through the code and see where I went wrong.</p>
<p>To remove a soft-deleted key you must first locate the key's name or Id. To do this we will use the Get-AzKeyVaultKey cmdlet with the parameter -InRemovedState to show all the deleted, but not really deleted (aka soft-deleted) keys.</p>
<pre><code class="lang-powershell"><span class="hljs-built_in">Get-AzKeyVaultKey</span> <span class="hljs-literal">-VaultName</span> <span class="hljs-string">"bschacht-kv"</span> <span class="hljs-literal">-InRemovedState</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523860816/9ab6cdba-070e-4c71-8c27-1de240bd1fa9.png" alt /></p>
<p>Make note of the name or Id and move on to the next cmdlet and we will be done...or we should have been done anyway. We will use Remove-AzKeyVaultKey, specify the key name to be removed, make note that this key is in a removed state and run the cmdlet. Except here is where I ran into the somewhat unhelpful error message Remove-AzKeyVaultKey : Operation returned an invalid status code 'Forbidden'.</p>
<pre><code class="lang-powershell"><span class="hljs-built_in">Remove-AzKeyVaultKey</span> <span class="hljs-literal">-VaultName</span> <span class="hljs-string">"bschacht-kv"</span> <span class="hljs-literal">-Name</span> <span class="hljs-string">"ManagedInstance"</span> <span class="hljs-literal">-InRemovedState</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523861606/a82761d7-6b45-4e33-ab4a-9d9c6f3bb8dd.png" alt /></p>
<p>It turns out this error message stems from the fact that the user running the command (myself in this case) does not have the purge permission explicitly granted in the Key Vault. Running this cmdlet requires the purge permission; no inherited or explicitly granted RBAC role, no matter how high up the food chain (Contributor, Owner, etc.), will give you the ability to remove a soft-deleted key. You must go to the Azure Portal, navigate to your Key Vault, go to Access Policies, and grant the user running this cmdlet the Purge permission for keys, which is in a special section of the dropdown menu called "Privileged Key Operations".</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523862686/6d6affd0-851f-4968-94a2-eeb672a3c8e0.png" alt /></p>
<p>Now, let's run the Remove-AzKeyVaultKey cmdlet again, verify that we are sure we want to remove the key permanently, and no response means success!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523863559/e249e8a3-f56e-4466-a8ef-5981626011a8.png" alt /></p>
<p>There you have it. In the end it was a very simple fix. Unfortunately that error message seems to be less than well documented. Thankfully, if you go look at the <a target="_blank" href="https://docs.microsoft.com/en-us/powershell/module/az.keyvault/remove-azkeyvaultkey">documentation for the cmdlet</a> under example 3 (which, let's be honest, who does that unless they run into a problem and have to go looking for an answer) you will in fact see the purge permission noted. Too bad I don't read instructions. I just go run things and see what happens. It's a great way to learn...as long as you aren't doing it in production.</p>
]]></content:encoded></item><item><title><![CDATA[Bring Your Own Key to Azure SQL Database Managed Instance TDE]]></title><description><![CDATA[Last year Azure SQL Database Managed Instance saw the introduction of bring your own key (BYOK) functionality for transparent data encryption (TDE). This functionality has been in the singleton database version of Azure SQL Database for a while longe...]]></description><link>https://bradleyschacht.com/bring-your-own-key-to-azure-sql-database-managed-instance-tde</link><guid isPermaLink="true">https://bradleyschacht.com/bring-your-own-key-to-azure-sql-database-managed-instance-tde</guid><dc:creator><![CDATA[Bradley Schacht]]></dc:creator><pubDate>Fri, 24 Jan 2020 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716516409920/q_IcUXJVN.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last year Azure SQL Database Managed Instance saw the introduction of bring your own key (BYOK) functionality for transparent data encryption (TDE). This functionality has been in the singleton database version of Azure SQL Database for a while longer and you can <a target="_blank" href="https://bradleyschacht.com/bring-your-own-key-to-azure-sql-database-tde-new-ui/">read about how to use that here</a>. The experience between the two is very similar, but let's focus on the Managed Instance side of things today.</p>
<p>A few prerequisites for today:</p>
<ul>
<li>Azure SQL Managed Instance (any service tier, any size)</li>
<li>A database (any user database will do, even an empty one)</li>
<li>Azure Key Vault (with soft-delete enabled)</li>
<li>Storage Account (optional, if you want to test the backup/restore process)</li>
</ul>
<h2 id="heading-switching-to-bring-your-own-key">Switching to Bring Your Own Key</h2>
<p>Bringing your own key for TDE is actually not a requirement to use Managed Instance. By default, the service will manage its own TDE key. However, that comes with a few limitations. The main one is that you cannot create user-initiated backups when using the service-managed key. The built-in backup/restore functionality still works just fine with service-managed keys though.</p>
<p>The key settings can be found by navigating in the <a target="_blank" href="http://portal.azure.com">Azure Portal</a> to your Managed Instance and clicking on the <strong>Transparent Data Encryption</strong> option in the service navigation panel. Then change the key selection from <strong>Service-managed key</strong> to <strong>Customer-managed key</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523841965/85d70fdf-be16-4688-8ba1-9b9ebbd6f3f0.png" alt /></p>
<p>The next step is to identify the key that will be used for encrypting the databases. This can be done in one of two ways: choosing a Key Vault and Key from inside the tenant or specifying a key identifier URL for a particular key. The second option is particularly useful if the user doing the configuration does not have access to the list of Key Vaults/Keys but is given a key identifier that should be used. The two configuration options can be seen in the following screenshot.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523842958/a1816966-057e-4e8e-8334-c3107e8cf604.png" alt /></p>
<h4 id="heading-important-notes">Important Notes</h4>
<p>A few important pieces of information to note on from these settings.</p>
<p>First, the Managed Instance must be granted access through a Key Vault access policy. If that has not already been completed an attempt will be made to add the Managed Instance with the appropriate permissions (Get, Wrap Key, Unwrap Key). The access policy is defined and enforced at the vault level, not the individual key, secret, or certificate level. So an instance with Get access to one key in this vault will have Get access to all the keys in this vault.</p>
<p>The second piece of important information is the check box labeled <strong>Make the selected key the default TDE protector.</strong> By checking that box, all databases will be encrypted using the selected key. The situation may arise where you would like to simply restore a database protected with a particular key, but you do not wish to encrypt the databases on the server with that key. In that scenario, you will want to add the key to the Managed Instance, but NOT select the option to make it the default TDE protector.</p>
<p>Next, the Managed Instance and Key Vault must be in the same Active Directory tenant. At this time it is not supported to back up keys from one subscription and move them to another subscription; the keys must remain in the same subscription but can be moved to another vault elsewhere in the same geography (moving between Key Vaults in North America that reside in the same subscription is supported, while moving between a Key Vault in North America and one in Europe is not supported even if they are in the same subscription).</p>
<p>Finally, changing the default TDE protector does not encrypt existing backups with the new key. So it is important for business continuity that you retain the prior TDE key(s) in order to be able to restore backups. If you switch from customer-managed key back to service-managed key you must retain the copy of your key in Key Vault at least until the system's point-in-time restore window has rolled off for the customer-managed keys (i.e. retain the customer-managed key for 7-35 days depending on your configuration after switching back to service-managed key).</p>
<p>There is one and only one key that will protect the databases on an instance.</p>
<ul>
<li>Key marked as "Make the selected key the default TDE protector."<ul>
<li>Can restore databases encrypted with this key.</li>
<li>Will be the key used to encrypt the user databases on the instance.</li>
<li>A database restored with this key will also be encrypted with this key.</li>
</ul>
</li>
<li>Key NOT marked as "Make the selected key the default TDE protector."<ul>
<li>Can restore databases encrypted with this key.</li>
<li>A database restored with this key will be encrypted with the key marked as the protector after the restore process is completed.</li>
</ul>
</li>
</ul>
<h2 id="heading-see-it-in-action-changing-to-customer-managed-key">See it in Action - Changing to Customer-Managed Key</h2>
<p>As a starting point I have a Managed Instance with a single database, TDETesting, that is encrypted with the service-managed key.</p>
<p>We can use a simple T-SQL query to get the encryptor thumbprint. This same query will help us identify when the key has changed.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span>
    DB_NAME(database_id) <span class="hljs-keyword">AS</span> database_name,
    encryption_state,
    <span class="hljs-keyword">CASE</span> encryption_state
        <span class="hljs-keyword">WHEN</span> <span class="hljs-number">0</span> <span class="hljs-keyword">THEN</span> <span class="hljs-string">'No Encryption Key Present. Database Not Encrypted.'</span>
        <span class="hljs-keyword">WHEN</span> <span class="hljs-number">1</span> <span class="hljs-keyword">THEN</span> <span class="hljs-string">'Database Unencrypted'</span>
        <span class="hljs-keyword">WHEN</span> <span class="hljs-number">2</span> <span class="hljs-keyword">THEN</span> <span class="hljs-string">'Database Encryption in Progress'</span>
        <span class="hljs-keyword">WHEN</span> <span class="hljs-number">3</span> <span class="hljs-keyword">THEN</span> <span class="hljs-string">'Database Encrypted'</span>
        <span class="hljs-keyword">WHEN</span> <span class="hljs-number">4</span> <span class="hljs-keyword">THEN</span> <span class="hljs-string">'Encryption Key Change in Progress'</span>
        <span class="hljs-keyword">WHEN</span> <span class="hljs-number">5</span> <span class="hljs-keyword">THEN</span> <span class="hljs-string">'Database Decryption in Progress'</span>
        <span class="hljs-keyword">WHEN</span> <span class="hljs-number">6</span> <span class="hljs-keyword">THEN</span> <span class="hljs-string">'Certificate or Key Change in Progress'</span>
        <span class="hljs-keyword">ELSE</span> <span class="hljs-string">'Unknown Status'</span>
        <span class="hljs-keyword">END</span> <span class="hljs-keyword">AS</span> encryption_state_descriptoin,
    encryptor_thumbprint
<span class="hljs-keyword">FROM</span> sys.dm_database_encryption_keys
</code></pre>
<p>We can see that based on the query results the current thumbprint is 0x051082BA88D2882F551C530B74EB6F4380843029.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523843749/327c85f8-aa02-46c1-81a2-67337266040e.png" alt /><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523844532/113ab12e-5a78-4fa5-af5f-b11a3fdeaaf4.png" alt /></p>
<p>Before the switching to the customer-managed key, this is what the access policy settings look like on my Key Vault. The only user that has rights to do anything is myself.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523845506/7cb1042d-2cec-4f15-8a46-b9932afc57d6.png" alt /></p>
<p>From my Managed Instance I will switch to customer-managed key.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523846270/ae4700b1-29ae-438f-8f62-1a90f95ae2bf.png" alt /></p>
<p>Select my Key Vault.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523847203/e410baff-28e4-44b0-919a-71770f236859.png" alt /></p>
<p>And on the select a key blade, I will choose create a new key as I do not already have a key to use.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523848172/0cdc58c5-136d-4826-8e0e-bf44b567a8e7.png" alt /></p>
<p>I will keep the Options setting on Generate to create a new key and provide a name. I'm going to use ManagedInstance, but no specific name is required. Then click Create.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523849005/770ad06c-0770-4133-8881-f51f350ab03a.png" alt /></p>
<p>With all the options configured click Save.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523849971/c51bef7e-921d-40ef-9948-f0cf7394577c.png" alt /></p>
<p>After the settings are applied we can see the change in access policy on the Key Vault has added my Managed Instance with Get, Wrap, and Unwrap permissions.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523850939/f8d5c8f3-c44a-40fd-86c2-576c07d1a026.png" alt /></p>
<p>Additionally, running our T-SQL script we can see the encryptor thumbprint has now changed from 0x051082BA88D2882F551C530B74EB6F4380843029 to 0x71ABFFF1EAA10687BD43878C75E7F2D1744E285C.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523851933/614e1947-3338-41a7-8f90-df85d1fb6da7.png" alt /></p>
<p>With the customer-managed key in place, we can now create a user-initiated backup of the database.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">BACKUP</span> <span class="hljs-keyword">DATABASE</span> TDETesting <span class="hljs-keyword">TO</span> <span class="hljs-keyword">URL</span> = <span class="hljs-string">'https://bschachtstor.blob.core.windows.net/backup/TDETesting.bak'</span> <span class="hljs-keyword">WITH</span> COPY_ONLY
</code></pre>
<p>If we then look at the files in that backup, we will see that it is encrypted with the same thumbprint as our database showed after switching to customer-managed keys.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">RESTORE</span> FILELISTONLY <span class="hljs-keyword">FROM</span> <span class="hljs-keyword">URL</span> = <span class="hljs-string">'https://bschachtstor.blob.core.windows.net/backup/TDETesting.bak'</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523852562/ea4121d3-6bba-43f7-b016-34023491d693.png" alt /></p>
<h2 id="heading-see-it-in-action-restoring-a-backup">See it in Action - Restoring a Backup</h2>
<p>What happens when we need to restore a database that has been backed up under a different key? In this particular case I have a backup that was created under a different customer-managed key. This screenshot shows a different thumbprint than the system or current customer-managed key, 0x190256228610BBE409C0345597D49ABB5A40EFA6.</p>
<p>System key thumbprint: 0x051082BA88D2882F551C530B74EB6F4380843029<br />Current key thumbprint: 0x71ABFFF1EAA10687BD43878C75E7F2D1744E285C<br />Backup key thumbprint: 0x190256228610BBE409C0345597D49ABB5A40EFA6</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523853514/2b31ebd4-8a43-4209-8360-e82e2a4e592f.png" alt /></p>
<p>In the Azure Portal, I changed the key to the new ManagedInstance-AlternateKey that I added to my Key Vault. However, I did not choose the <strong>Make the selected key the default TDE protector</strong> option as I want to continue to encrypt my databases with the key used previously in this example but restore databases that are encrypted with the ManagedInstance-AlternateKey.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523854303/92fe2e0d-7b8f-4bae-ad40-4b59c56fda7d.png" alt /></p>
<p>After saving the change I can now run the restore of my database that is encrypted with the ManagedInstance-AlternateKey (thumbprint 0x190256228610BBE409C0345597D49ABB5A40EFA6). After the restore completes we can see that the database encryptor has been switched to the thumbprint 0x71ABFFF1EAA10687BD43878C75E7F2D1744E285C that is used on my existing database. As mentioned previously, the current implementation is such that all databases are encrypted with the same key. So while the key ending in EFA6 is required for the restore process the database is then encrypted with the TDE protector key ending in 285C.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523855264/3f65f9f6-139a-48c0-8cdb-da9c93d3009d.png" alt /></p>
<h2 id="heading-what-keys-are-associated-with-my-managed-instance">What Keys Are Associated with my Managed Instance?</h2>
<p>We have seen how to assign a customer-managed key to Managed Instance, add an additional key for restore purposes, see which key is encrypting the database, and restore backups encrypted with non-TDE Protector keys. There can be confusion as to what keys are available though because the Azure Portal only shows the last key assigned. In our demo walkthrough that wasn't even the TDE Protector key. How then do we find what keys are available to our managed instance for restore commands and what key is the current TDE Protector?</p>
<p>For that we go to PowerShell.</p>
<p>Get-AzSqlInstanceKeyVaultKey will give a list of all keys currently associated with your managed instance. In order for a backup file to be restored to your Managed Instance the key with which the backup was encrypted must be listed. In my example there is a service-managed key and two customer-managed keys with the thumbprints from earlier in this post.</p>
<pre><code class="lang-powershell"><span class="hljs-built_in">Get-AzSqlInstanceKeyVaultKey</span> <span class="hljs-literal">-ResourceGroupName</span> <span class="hljs-string">"ManagedInstance-RG"</span> <span class="hljs-literal">-InstanceName</span> <span class="hljs-string">"bschacht-sqlmi01"</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523856111/1101efb7-e2f1-418b-9ee9-3cf0ba3ea29e.png" alt /></p>
<p>However, there is no indication of the current TDE Protector. For that we will go to a different PowerShell cmdlet, Get-AzSqlInstanceTransparentDataEncryptionProtector.</p>
<pre><code class="lang-powershell"><span class="hljs-built_in">Get-AzSqlInstanceTransparentDataEncryptionProtector</span> <span class="hljs-literal">-ResourceGroupName</span> <span class="hljs-string">"ManagedInstance-RG"</span> <span class="hljs-literal">-InstanceName</span> <span class="hljs-string">"bschacht-sqlmi01"</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523857223/e5cde7d1-8401-4228-912a-5c6d45b82eee.png" alt /></p>
<p>Removing a key is also a simple operation with the cmdlet Remove-AzSqlInstanceKeyVaultKey.</p>
<pre><code class="lang-powershell"><span class="hljs-built_in">Remove-AzSqlInstanceKeyVaultKey</span> <span class="hljs-literal">-ResourceGroupName</span> <span class="hljs-string">"ManagedInstance-RG"</span> <span class="hljs-literal">-InstanceName</span> <span class="hljs-string">"bschacht-sqlmi01"</span> <span class="hljs-literal">-KeyId</span> https://bschacht<span class="hljs-literal">-kv</span>.vault.azure.net/keys/ManagedInstance<span class="hljs-literal">-AlternateKey</span>/<span class="hljs-number">2603353</span>efae04fbab840385dd43a590b
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716523857971/561ae0d6-9952-4727-9609-83430c2c3edb.png" alt /></p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>What other questions do you have about using TDE with Azure SQL Database Managed Instance? Let me know in the comments and I'll see if I can work through some examples with you.</p>
]]></content:encoded></item></channel></rss>