Data Warehouse
Discover the Future of Data Engineering with Microsoft Fabric for Technical Students & Entrepreneurs
Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, Real-Time Analytics, and business intelligence. It offers a comprehensive suite of services, including data lake, data engineering, and data integration, all in one place. This makes it an ideal platform for technical students and entrepreneurial developers looking to streamline their data engineering and analytics workflows.
Microsoft creates industry standards for datacenter hardware storage and security

Today we are announcing a next-generation specification for solid state device (SSD) storage, Project Denali. We're also discussing Project Cerberus, which provides a critical component for security protection that to date has been missing from server hardware: protection, detection, and recovery from attacks on platform firmware. Storage and security are the next frontiers for hardware innovation, and today we're highlighting the latest advancements across these key focus areas to further the industry in enabling the future of the cloud.

Storage paradigms have performed well on-premises, but they haven't driven the performance and cost-efficiency innovation needed for cloud-based models. For this reason, we're setting out to define a new standard for flash storage specifically targeted at cloud-based workloads, and I'm excited to reveal Project Denali, which we're establishing with CNEX Labs. Fundamentally, Project Denali standardizes the SSD firmware interfaces by disaggregating the functionality for software-defined data layout and media management. With Project Denali, customers can achieve greater levels of performance while leveraging the cost-reduction economics that come at cloud scale. Read about it in the Azure blog.
Enhance Azure SQL Data Warehouse performance with new monitoring functionality for Columnstore

Azure SQL Data Warehouse (SQL DW) is a SQL-based, petabyte-scale, massively parallel cloud solution for data warehousing. It is fully managed and highly elastic, enabling you to provision and scale capacity in minutes. You can scale compute and storage independently, covering everything from burst to archival scenarios. Azure SQL DW is powered by a Columnstore engine that provides very fast performance for analytic workloads — the same Columnstore engine built into Microsoft's industry-leading SQL Server database. To get full speed from Azure SQL DW, it is important to maximize Columnstore row group quality. A row group is the chunk of rows that are compressed together in the Columnstore. To enable easier monitoring and tuning of row group quality, we are now exposing a new Dynamic Management View (DMV). Read about it on the Azure blog.
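For readers who want to try this, a minimal sketch of a row-group quality check follows. It assumes the new DMV is sys.dm_pdw_nodes_db_column_store_row_group_physical_stats (the Azure SQL DW counterpart of SQL Server's sys.dm_db_column_store_row_group_physical_stats); verify the exact name against the linked blog post.

```sql
-- A sketch of row group quality monitoring, assuming the DMV named above.
-- Compressed row groups close to the 1,048,576-row maximum indicate good
-- quality; many small compressed groups suggest the table needs rebuilding.
SELECT state_desc,
       COUNT(*)        AS row_groups,
       AVG(total_rows) AS avg_rows_per_group,
       MIN(total_rows) AS min_rows_per_group
FROM sys.dm_pdw_nodes_db_column_store_row_group_physical_stats
GROUP BY state_desc;
```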
Understanding the Differences Between SWITCHOFFSET and AT TIME ZONE in SQL Server

When working with date and time data in SQL Server, handling different time zones can be a critical aspect, especially for applications with a global user base. SQL Server provides two constructs for time zone conversions: SWITCHOFFSET and AT TIME ZONE. Although they might seem similar at first glance, they differ in functionality and use cases. This article explains those differences and helps you decide which one to use based on your requirements.

SWITCHOFFSET

The SWITCHOFFSET function changes the time zone offset of a datetimeoffset value without changing the actual point in time that the value represents. Essentially, it shifts the displayed time by the specified offset.

Syntax

```sql
SWITCHOFFSET ( datetimeoffset_value, time_zone_offset )
```

- datetimeoffset_value: the date and time value whose time zone offset you want to change.
- time_zone_offset: the new time zone offset, in the format +HH:MM or -HH:MM.

Example

```sql
DECLARE @dt datetimeoffset = '2023-12-31 23:09:14.4600000 +01:00';
SELECT SWITCHOFFSET(@dt, '+00:00') AS UtcTime;
```

In this example, SWITCHOFFSET converts the time to UTC by applying the +00:00 offset.

AT TIME ZONE

AT TIME ZONE is more versatile than SWITCHOFFSET. It converts a datetime or datetime2 value to a datetimeoffset value by applying the time zone conversion rules of the specified time zone. It can also convert a datetimeoffset value to another time zone.

Syntax

```sql
datetime_value AT TIME ZONE time_zone
```

- datetime_value: the date and time value to be converted.
- time_zone: the target time zone name.

Example

```sql
DECLARE @dt datetimeoffset = '2023-12-31 23:09:14.4600000 +01:00';
SELECT @dt AT TIME ZONE 'UTC' AS UtcTime;
```

In this example, AT TIME ZONE converts the datetimeoffset to the UTC time zone.

Key Differences

Functionality:
- SWITCHOFFSET only shifts the time by the specified offset, without considering daylight saving rules or historical time zone changes.
- AT TIME ZONE applies the full time zone conversion rules, including daylight saving changes, making it more accurate for real-world applications (see the sketch below).

Input and Output:
- SWITCHOFFSET works with datetimeoffset values and outputs a datetimeoffset value.
- AT TIME ZONE works with datetime, datetime2, and datetimeoffset values and outputs a datetimeoffset value.

Use Cases:
- Use SWITCHOFFSET when you need a quick offset change without full time zone awareness.
- Use AT TIME ZONE when you need precise and accurate time zone conversions, especially when dealing with historical data and daylight saving time.
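A quick sketch of the daylight-saving difference described above. The time zone name is the Windows identifier; adjust it to a zone installed on your server.

```sql
-- For a July instant, AT TIME ZONE applies the +02:00 summer offset of
-- Central European time, while SWITCHOFFSET applies only the fixed offset
-- it is given. Both results still represent the same point in time.
DECLARE @utc datetimeoffset = '2023-07-01 12:00:00 +00:00';

SELECT SWITCHOFFSET(@utc, '+01:00')                       AS FixedOffset, -- 13:00 +01:00
       @utc AT TIME ZONE 'Central European Standard Time' AS RuleAware;   -- 14:00 +02:00
```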
Performance Considerations

When working with large datasets, performance is a crucial aspect to consider.

- SWITCHOFFSET: generally faster for simple offset changes, since it performs a straightforward arithmetic operation.
- AT TIME ZONE: may incur additional overhead due to the complexity of applying time zone rules, but it provides accurate results for real-world time zone conversions.

Example with a Large Dataset

Suppose you have a Users table with 200,000 records, each with a CreatedDate column holding datetimeoffset values in various time zones. Converting these to UTC with both methods illustrates the performance differences.

```sql
-- Using SWITCHOFFSET
SELECT COUNT(*) FROM Users
WHERE CAST(SWITCHOFFSET(CreatedDate, '+00:00') AS date) = '2024-01-01';

-- Using AT TIME ZONE
SELECT COUNT(*) FROM Users
WHERE CONVERT(date, CreatedDate AT TIME ZONE 'UTC') = '2024-01-01';
```

In scenarios like this, benchmark both methods on your specific dataset and SQL Server environment to understand the performance implications fully.

CPU Times vs Total Duration

Let's analyze the efficiency of the two alternatives against a table containing 200,000 records with varying offsets in the datetimeoffset field named CreatedDate.

Example Table Preparation

First, create the example Users table and insert 200,000 records with different offsets:

```sql
-- Create the example table
CREATE TABLE Users (
    UserID INT IDENTITY(1,1) PRIMARY KEY,
    CreatedDate DATETIMEOFFSET
);

-- Insert 200,000 records with varying offsets. Note the modulus of 12:
-- datetimeoffset offsets must stay within -14:00 to +14:00, so a modulus
-- of 24 would generate invalid offsets such as +15:00.
DECLARE @i INT = 1;
WHILE @i <= 200000
BEGIN
    INSERT INTO Users (CreatedDate)
    VALUES (DATEADD(MINUTE, @i,
            SWITCHOFFSET(SYSDATETIMEOFFSET(),
                CONCAT('+', RIGHT('0' + CAST((@i % 12) AS VARCHAR(2)), 2), ':00'))));
    SET @i = @i + 1;
END;
```

Measuring Efficiency

Now measure the two alternatives for converting the CreatedDate field to UTC and projecting it as a date.

Option 1: SWITCHOFFSET

```sql
SET STATISTICS TIME ON;
SELECT CAST(SWITCHOFFSET(CreatedDate, '+00:00') AS date) AS UTCDate FROM Users;
SET STATISTICS TIME OFF;
```

Option 2: AT TIME ZONE

```sql
SET STATISTICS TIME ON;
SELECT CONVERT(date, CreatedDate AT TIME ZONE 'UTC') AS UTCDate FROM Users;
SET STATISTICS TIME OFF;
```

Execution Plan and Timing Analysis

After running both queries, compare the CPU time and total elapsed time reported by SET STATISTICS TIME to evaluate efficiency.

Possible Efficiency Differences

- SWITCHOFFSET: likely more efficient in this scenario, because it applies a fixed arithmetic offset in a single step before the cast to date, which reduces overhead.
- AT TIME ZONE: may introduce slight overhead because it resolves time zone rules before converting to date. However, it is clearer and handles multiple time zones more explicitly.

Recommendation

Although real efficiency depends on the specific environment and the detailed execution plan, SWITCHOFFSET is generally expected to be more efficient for large datasets when all you need is to apply a fixed offset and project the date.

Comparing Results

- CPU Times: compare the CPU times reported by both queries.
- Total Duration: compare the total elapsed time of both queries.

Evaluating the time statistics will determine which option is more efficient for your specific case.

Additional Considerations

- Indexes: ensure the CreatedDate column is indexed if large volumes of data will be read; a persisted computed column can make these date predicates sargable, as the sketch below shows.
- Parallelism: SQL Server may process the query in parallel to improve performance, but parallelism settings can affect the results.
- Real-World Workload: conduct tests in an environment as close to production as possible to obtain more accurate results.
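Regarding the Indexes note above, here is a sketch worth considering: because SWITCHOFFSET with a literal offset is deterministic (AT TIME ZONE is not, since time zone rules can change), the UTC date can be persisted as a computed column and indexed, turning the scan into a seek. This assumes the Users table from the example.

```sql
-- Persist the UTC date once, instead of computing it per row on every query.
ALTER TABLE Users
    ADD CreatedDateUtc AS CAST(SWITCHOFFSET(CreatedDate, '+00:00') AS date) PERSISTED;

CREATE INDEX IX_Users_CreatedDateUtc ON Users (CreatedDateUtc);

-- The date filter can now use the index directly.
SELECT COUNT(*) FROM Users WHERE CreatedDateUtc = '2024-01-01';
```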
Conclusion

Choosing between SWITCHOFFSET and AT TIME ZONE depends on your specific needs:

- Use SWITCHOFFSET for simple, quick offset changes where historical accuracy and daylight saving adjustments are not critical.
- Use AT TIME ZONE for comprehensive and accurate time zone conversions, especially in applications that serve users across multiple time zones or need historical accuracy.

Understanding these differences will help you make informed decisions in your SQL Server applications, ensuring both performance and accuracy in your date and time handling.
Is the use of NVARCHAR(MAX) acceptable for this scenario?

Hi, I'm using Azure Data Factory (ADF) pipelines to ingest several external data sources into our domain. As you'll know, I need to create a target database table for each source table I'm looking to ingest. When creating the target tables for the first time, I can look at each source field's properties (data type and size) and set the same properties on the target tables.

My concern is that the source owners may change the source field properties in the future without notifying us (a strong possibility), in which case the ADF pipelines may fail and I will then investigate and resolve. There could be another scenario where source field A is 20 characters long, so I set the target field to 20 characters, but the source field is later widened to 50 characters without my knowledge. This may not break the ADF pipeline execution but could silently truncate the data.

One approach would be to set each target field to NVARCHAR(MAX), which would (should) avoid the situation above. I'm a one-person band on all things data at my company and am attempting to juggle many things as best I can. I feel the NVARCHAR(MAX) approach would reduce headaches and the time needed to investigate.

Questions:
1) I understand it isn't good practice to use NVARCHAR(MAX) unless needed, but would you consider it acceptable given my concerns?
2) I understand NVARCHAR(MAX) allows for up to 2 GB per value, but if we're not actually filling that 2 GB, does it cost more in Azure SQL? Our data volumes aren't large, but I'm still mindful of the costs involved.

TL;DR: Would setting target fields as NVARCHAR(MAX) increase Azure SQL costs even if the fields only contain 20 characters? Thanks in advance.
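One lightweight alternative to making everything NVARCHAR(MAX) is detecting the drift before it bites, by comparing declared column sizes on both sides. A sketch, with placeholder schema and table names:

```sql
-- Run against the source (and the target) and diff the output; any
-- CHARACTER_MAXIMUM_LENGTH that grew on the source side flags a column
-- whose target definition needs widening before data is truncated.
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'          -- placeholder schema
  AND TABLE_NAME   = 'SourceTable'  -- placeholder table
ORDER BY ORDINAL_POSITION;
```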
Help Needed - updating rows based on rows of different category

Hi all, long-time lurker, first-time poster. I've been working with SSMS for about four years now, designing and deploying a database to host info from a front-end PowerApp. I taught myself and it's probably a shitshow of a design, but it works!

I have a question. In the database, each person has their own unique row for each financial year. I need to update rows with the financial year 2024/25 with whatever is in the same rows for 2023/24. Just one column. So, as an example, I need to update column x in rows where year = 2024/25 with the corresponding data from rows where year = 2023/24. I could brute-force it with a clause of WHERE ID = '1' OR ID = '2' for all the relevant rows, but I was wondering if there is a more elegant way?
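One elegant option is an UPDATE over a self-join, matching each person's 2024/25 row to their 2023/24 row. A sketch, assuming placeholder names PersonYears(PersonID, FinYear, ColumnX) — rename to match the real schema:

```sql
-- Copy ColumnX from each person's 2023/24 row into their 2024/25 row.
UPDATE cur
SET cur.ColumnX = prev.ColumnX
FROM PersonYears AS cur
JOIN PersonYears AS prev
    ON prev.PersonID = cur.PersonID
   AND prev.FinYear  = '2023/24'
WHERE cur.FinYear = '2024/25';
```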
Need Help with sql code

Table1:

code | description
-----|---------------
A000 | fever
A001 | stomach virus

Table2:

MEMBER_NBR | code1 | code2
-----------|-------|------
12368      | A000  | NULL
2476       | NULL  | A001

I need to join these 2 tables. The final result will be in the form of:

Result:

code1 | description1 | code2 | description2
------|--------------|-------|--------------
A000  | fever        | NULL  | NULL
NULL  | NULL         | A001  | stomach virus
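A sketch of one way to produce that result, assuming the table and column names shown above. Because code1 and code2 are looked up independently, Table1 is joined twice, and LEFT JOINs keep the rows whose code is NULL:

```sql
SELECT t2.code1,
       d1.description AS description1,
       t2.code2,
       d2.description AS description2
FROM Table2 AS t2
LEFT JOIN Table1 AS d1 ON d1.code = t2.code1   -- lookup for code1
LEFT JOIN Table1 AS d2 ON d2.code = t2.code2;  -- lookup for code2
```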
Flashback 2016 – Highlights from Azure SQL Data Warehouse

As 2016 ends, here are some of the highlights from a memorable year!

- General availability across 23 regions worldwide
- Industry-leading performance for analytic queries
- Fast loading with ADF and PolyBase
- Enhanced migration, monitoring, and SQL tooling experience
- Accelerated lookup queries
- Easy integration with Azure Active Directory authentication and other Azure services
- HIPAA certification
- New partner products for an easier experience
- Exclusive free trial

Read more on the Azure Blog.
Azure SQL Data Warehouse now supports automatic creation of statistics

We are pleased to announce that Azure SQL Data Warehouse (Azure SQL DW) now supports automatic creation of column-level statistics. Azure SQL DW is a fast, flexible, and secure analytics platform for the enterprise. Modern systems such as Azure SQL DW rely on cost-based optimizers to generate quality execution plans for user queries. Even though Azure SQL DW implements a cost-based optimizer, the system has relied on developers and administrators to create statistics objects manually. When all queries are known in advance, determining which statistics objects to create is an achievable task. However, when the system faces ad-hoc and random queries, which is typical for data warehousing workloads, administrators may struggle to predict which statistics are needed, leading to potentially suboptimal query execution plans and longer query response times. One way to mitigate this problem is to create statistics objects on all table columns in advance. However, that approach comes with a penalty, as the statistics objects must be maintained during table loads, causing longer loading times. Read about it in the Azure blog.
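For reference, a minimal sketch of enabling the feature, assuming a data warehouse named MyDW (a placeholder):

```sql
-- Opt the data warehouse in to automatic column-level statistics creation.
-- Statistics are then created as the optimizer first needs them, so the
-- first query touching a cold column may compile more slowly.
ALTER DATABASE MyDW SET AUTO_CREATE_STATISTICS ON;
```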