Azure SQL Database
Lesson Learned #522: Troubleshooting TLS and Cipher Suites with Python connecting to Azure SQL DB
A few days ago, we were working on a service request where our customer was experiencing several issues connecting to Azure SQL Database due to TLS version and cipher suite mismatches when using Python and ODBC Driver 18. Although we were able to get that information through a network trace, I would like to share some things that I learned. The ssl library in Python allows you to establish a TLS/SSL context where you can control the TLS version and specify or inspect the cipher suite. Here's a small script that demonstrates how to connect to the Azure SQL Gateway over port 1433 and inspect the TLS configuration:

import ssl
import socket

# ServerName to connect (Only Gateway)
host = 'servername.database.windows.net'
port = 1433

# TLS context
context = ssl.create_default_context()
print("Python uses:", ssl.OPENSSL_VERSION)
context.minimum_version = ssl.TLSVersion.TLSv1_2
context.maximum_version = ssl.TLSVersion.TLSv1_2
context.check_hostname = True
context.verify_mode = ssl.CERT_REQUIRED
context.load_default_certs()

# Testing the connection.
with socket.create_connection((host, port)) as sock:
    with context.wrap_socket(sock, server_hostname=host) as ssock:
        print("TLS connection established.")
        print("TLS version:", ssock.version())
        print("Cipher suite:", ssock.cipher())

        # CN (Common Name)
        cert = ssock.getpeercert()
        try:
            cn = dict(x[0] for x in cert['subject'])['commonName']
            print(f"\n Certificate CN: {cn}")
        except Exception as e:
            print(" Error extracting CN:", e)

        print("Valid from :", cert.get('notBefore'))
        print("Valid until:", cert.get('notAfter'))

Using this script I was able to:

- Enforce a specific TLS version (for example, 1.2 or 1.3) by setting minimum_version and maximum_version.
- Retrieve the negotiated cipher suite.
- Inspect the details of the server certificate.

Enjoy!
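As a follow-up to the script above, the same TLS context can also be used to restrict or list the cipher suites the client offers, which helps when a cipher suite mismatch is suspected. This is only a sketch: the suite name passed to set_ciphers is an illustrative example and must be supported by your local OpenSSL build and by the server.

import ssl
import socket

host = 'servername.database.windows.net'  # placeholder server name
port = 1433

context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

# Restrict the cipher suites offered by the client (example value; adjust to your policy).
context.set_ciphers('ECDHE-RSA-AES256-GCM-SHA384')

# List the suites this context is willing to offer before connecting.
for suite in context.get_ciphers():
    print(suite['name'], suite['protocol'])

with socket.create_connection((host, port)) as sock:
    with context.wrap_socket(sock, server_hostname=host) as ssock:
        print("Negotiated cipher suite:", ssock.cipher())

If the handshake fails with an SSL error, the restricted suite is not accepted by the server, which reproduces the kind of mismatch described in the case.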

Lesson Learned #520: Troubleshooting Azure SQL Database Redirect Connection over Private Endpoint

A few days ago, we handled an interesting support case where a customer encountered the following connection error when using sqlcmd to connect to their Azure SQL Database:

"Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : Login timeout expired.
Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : TCP Provider: Error code 0x102.
Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : A network-related or instance-specific error has occurred while establishing a connection to servername.database.windows.net (Redirected: servername.database.windows.net\xxxx8165ccxxx,6188). Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online."

At first glance, what immediately caught our attention was the port number mentioned in the error: 6188. This is not the typical 1433 port that SQL Server usually listens on. Additionally, the message referenced a "Redirected" connection, which gave us the first strong clue.

We asked the customer to run the following diagnostic commands:

- ping servername.database.windows.net, to identify the IP address resolved for the Azure SQL Database server, which returned a private IP: 10.1.0.200.
- nslookup servername.database.windows.net, to confirm whether the resolution was happening through a custom DNS or public DNS.
- ifconfig -a, to determine the local IP address of the client, which was 10.1.0.10 (our customer is using a Linux environment - RedHat).

With all this information in hand, we asked the customer to open a terminal on their Linux machine and execute sudo tcpdump -i eth0 host 10.1.0.200 while attempting to connect from another terminal with sqlcmd. We observed that sqlcmd was:

- First making a request to port 1433, which is expected.
- Then immediately attempting a second connection to port 6188 on the same private IP.

It was during this second connection attempt that the timeout occurred. Based on the port and the message, we asked our customer what type of connection policy this server had, and they reported Redirect. We explained that in Azure SQL, when Redirect mode is enabled, the client:

- Connects to the gateway on port 1433.
- Receives a redirection response with a new target IP and dynamic port (e.g., 6188).
- Attempts a new connection to the private endpoint using this port.

We reviewed the connection configuration and confirmed that Redirect mode was enabled. After speaking with the customer's networking and security team, we discovered that their firewall rules were blocking outbound connections to dynamic ports like 6188. We proposed two alternative solutions:

Option 1: Adjust Firewall Rules. Allow outbound traffic from the client's IP (10.1.0.10) to the Private Endpoint IP (10.1.0.200) for the required range of ports used by Azure SQL in Redirect mode. This keeps the benefits of Redirect mode:

- Lower latency
- Direct database access via Private Link
- Reduced dependence on the Azure Gateway

Option 2: Switch to Proxy Mode. Change the Azure SQL server's connection policy to Proxy, which forces all traffic through port 1433 only. This is simpler for environments where security rules restrict dynamic port ranges, but it may introduce slightly higher latency.

In this case, the customer opted to update the VNet's NSG and outbound firewall rules to allow the necessary range of ports for the Private Endpoint.
This allowed them to retain the benefits of Redirect mode while maintaining secure, high-performance connectivity to their Azure SQL Database.
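As a side note, before (or after) adjusting NSG or firewall rules, a quick TCP reachability test from the client can confirm whether the redirected port is actually being blocked. A minimal Python sketch, reusing the private endpoint IP and the redirected port from this case (replace both with the values reported in your own error message):

import socket

# Values taken from the case described above (private endpoint IP, gateway port, redirected port).
targets = [
    ("10.1.0.200", 1433),  # initial connection to the gateway
    ("10.1.0.200", 6188),  # redirected port reported in the error message
]

for host, port in targets:
    try:
        with socket.create_connection((host, port), timeout=5):
            print(f"TCP connection to {host}:{port} succeeded")
    except OSError as e:
        print(f"TCP connection to {host}:{port} failed: {e}")

If the first target succeeds and the second one times out, the symptom matches the firewall behavior described in this case.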

Lesson Learned #479: Loading Data from Parquet to Azure SQL Database using C# and SqlBulkCopy

In the realm of big data and cloud computing, efficiently managing and transferring data between different platforms and formats is paramount. Azure SQL Database, a fully managed relational database service by Microsoft, offers robust capabilities for handling large volumes of data. However, when it comes to importing data from Parquet files, a popular columnar storage format, Azure SQL Database's native BULK INSERT command does not directly support this format. This article presents a practical solution using a C# console application to bridge this gap, leveraging the Microsoft.Data.SqlClient.SqlBulkCopy class for high-performance bulk data loading.
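For readers who prefer to stage the same kind of load from Python, a rough equivalent is to read the Parquet file with pandas/pyarrow and push the rows through pyodbc with fast_executemany enabled. This is only a sketch under the assumption that a destination table with matching columns already exists; the file name, server, credentials, and table are placeholders, and it does not replace the SqlBulkCopy approach described in the article.

import pandas as pd
import pyodbc

# Placeholders: adjust the file, server, database, credentials, and table to your environment.
df = pd.read_parquet("data.parquet")  # requires pyarrow or fastparquet

conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:servername.database.windows.net,1433;"
    "Database=db1;Uid=user1;Pwd=...;Encrypt=yes;"
)

insert_sql = "INSERT INTO dbo.TargetTable (Col1, Col2, Col3) VALUES (?, ?, ?)"

conn = pyodbc.connect(conn_str)
cursor = conn.cursor()
cursor.fast_executemany = True  # sends the parameterized inserts in batches
cursor.executemany(insert_sql, list(df.itertuples(index=False, name=None)))
conn.commit()
conn.close()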

Lesson Learned #519: Reusing Connections in Azure SQL DB: How Connection Pooling Caches Your Session

A few days ago, I was working on a case where a customer reported an unexpected behavior in their application: even after switching the connection policy from Proxy to Redirect, the connections were still using Proxy mode. After investigating, we found that the customer was using connection pooling, which caches connections for reuse. This meant that even after changing the connection policy, the existing connections continued using Proxy mode because they had already been established with that setting. The new policy would only apply to newly created connections, not the ones being reused from the pool.

To confirm this, we ran a test using .NET and Microsoft.Data.SqlClient to analyze how the connection pool behaves and whether connections actually switch to Redirect mode when the policy changes.

How Connection Pooling Works

Connection pooling is designed to reuse existing database connections instead of creating a new one for every request. This improves performance by reducing latency and avoiding unnecessary authentication handshakes. However, once a connection is established, it is cached with its original settings, including:

- Connection policy (Proxy or Redirect)
- Authentication mode
- Connection encryption settings

This means that if you change the connection policy but reuse a pooled connection, it will retain its original mode. The only way to apply the new policy is to create a new physical connection that does not come from the pool.

Testing Connection Pooling Behavior

To test the connection pooling behavior, I developed the small C# program below. It opens a connection, prints information about the port in use, and closes the connection, repeating this process 10,000 times. The idea was to track active connections and check whether the port and connection policy changed after modifying the connection policy. Initially, I attempted to use netstat -ano to track active connections and monitor the local port used by each session. Unfortunately, in Azure SQL Database, local port information is not reported, making it difficult to confirm whether a connection was truly being reused at the OS level. Despite this limitation, by analyzing the session behavior and connection reuse patterns, we were able to reach a clear conclusion.
using System;
using System.Data;
using System.Diagnostics;
using System.Text.RegularExpressions;
using System.Threading;
using Microsoft.Data.SqlClient;

namespace InfoConn
{
    class Program
    {
        static void Main()
        {
            string connectionStringProxy = "Server=tcp:servername.database.windows.net,1433;Database=db1;User Id=user1;Password=..;Pooling=True;";

            Console.WriteLine("Starting Connection Pooling Test");

            for (int i = 0; i < 10000; i++)
            {
                using (SqlConnection conn = new SqlConnection(connectionStringProxy))
                {
                    conn.Open();
                    ShowConnectionDetails(conn, i);
                }
                Thread.Sleep(5000);
            }

            Console.WriteLine("Test complete.");
        }

        static void ShowConnectionDetails(SqlConnection conn, int attempt)
        {
            string query = "SELECT session_id, client_net_address, local_net_address, auth_scheme FROM sys.dm_exec_connections WHERE session_id = @@SPID;";
            using (SqlCommand cmd = new SqlCommand(query, conn))
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine($"[Attempt {attempt + 1}] Session ID: {reader["session_id"]}");
                    Console.WriteLine($"[Attempt {attempt + 1}] Client IP: {reader["client_net_address"]}");
                    Console.WriteLine($"[Attempt {attempt + 1}] Local IP: {reader["local_net_address"]}");
                    Console.WriteLine($"[Attempt {attempt + 1}] Auth Scheme: {reader["auth_scheme"]}");
                }
            }

            RetrievePortInformation(attempt);
        }

        static void RetrievePortInformation(int attempt)
        {
            try
            {
                int currentProcessId = Process.GetCurrentProcess().Id;
                Console.WriteLine($"[Attempt {attempt + 1}] PID: {currentProcessId}");

                string netstatOutput = RunNetstatCommand();
                var match = Regex.Match(netstatOutput, $@"\s*TCP\s*(\S+):(\d+)\s*(\S+):(\d+)\s*ESTABLISHED\s*{currentProcessId}");
                if (match.Success)
                {
                    string localAddress = match.Groups[1].Value;
                    string localPort = match.Groups[2].Value;
                    string remoteAddress = match.Groups[3].Value;
                    string remotePort = match.Groups[4].Value;
                    Console.WriteLine($"[Attempt {attempt + 1}] Local IP: {localAddress}");
                    Console.WriteLine($"[Attempt {attempt + 1}] Local Port: {localPort}");
                    Console.WriteLine($"[Attempt {attempt + 1}] Remote IP: {remoteAddress}");
                    Console.WriteLine($"[Attempt {attempt + 1}] Remote Port: {remotePort}");
                }
                else
                {
                    Console.WriteLine($"[Attempt {attempt + 1}] No active TCP connection found in netstat.");
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine($"[Attempt {attempt + 1}] Error retrieving port info: {ex.Message}");
            }
        }

        static string RunNetstatCommand()
        {
            using (Process netstatProcess = new Process())
            {
                netstatProcess.StartInfo.FileName = "netstat";
                netstatProcess.StartInfo.Arguments = "-ano";
                netstatProcess.StartInfo.RedirectStandardOutput = true;
                netstatProcess.StartInfo.UseShellExecute = false;
                netstatProcess.StartInfo.CreateNoWindow = true;
                netstatProcess.Start();
                string output = netstatProcess.StandardOutput.ReadToEnd();
                netstatProcess.WaitForExit();
                return output;
            }
        }
    }
}
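A quick way to reproduce the same observation from Python is to disable ODBC connection pooling so that every connect opens a brand-new physical connection; a newly applied connection policy should then take effect on the next attempt. A minimal sketch, assuming pyodbc and ODBC Driver 18 (server, database, and credentials are placeholders):

import pyodbc

# Must be set before the first connection is opened; disables pyodbc's ODBC connection pooling
# so that each connect() creates a new physical connection instead of reusing a cached one.
pyodbc.pooling = False

conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:servername.database.windows.net,1433;"
    "Database=db1;Uid=user1;Pwd=...;Encrypt=yes;"
)

for attempt in range(3):
    conn = pyodbc.connect(conn_str)
    row = conn.execute(
        "SELECT session_id, client_net_address, local_net_address, auth_scheme "
        "FROM sys.dm_exec_connections WHERE session_id = @@SPID;"
    ).fetchone()
    print(attempt + 1, row.session_id, row.client_net_address, row.local_net_address, row.auth_scheme)
    conn.close()  # with pooling disabled, this really closes the physical connection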

Lesson Learned #497: Understanding the Ordering of uniqueidentifier in SQL Server

Today, I worked on a service request where our customer asked how SQL Server sorts the uniqueidentifier data type. We know that uniqueidentifier columns store globally unique identifiers (GUIDs). GUIDs are widely used for unique keys due to their extremely low probability of duplication. One common method to generate a GUID in SQL Server is by using the NEWID() function. However, the ordering of GUIDs, especially those generated by NEWID(), can appear non-intuitive. I would like to share my lessons learned about how to determine the sorting behavior of uniqueidentifier values generated with NEWID().
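One quick way to see this behavior for yourself is to generate a handful of GUIDs with NEWID() and let the engine sort them. A small sketch using pyodbc (server and credentials are placeholders); comparing the sorted output against the textual form of the GUIDs makes it clear that the sort is not a simple left-to-right string comparison:

import pyodbc

conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:servername.database.windows.net,1433;"
    "Database=db1;Uid=user1;Pwd=...;Encrypt=yes;"
)

sql = """
SET NOCOUNT ON;
DECLARE @t TABLE (id uniqueidentifier);
INSERT INTO @t SELECT NEWID() FROM sys.objects;   -- one random GUID per row in sys.objects
SELECT TOP (10) id FROM @t ORDER BY id;           -- let SQL Server decide the order
"""

conn = pyodbc.connect(conn_str)
for row in conn.execute(sql):
    print(row.id)
conn.close()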

Lesson Learned #518: Configuring Database Watcher

Database Watcher was released in November 2024 and is a managed monitoring solution for databases. It is currently supported only for Azure SQL Database and Azure SQL Managed Instance, and the setup process for both supported products is comparable. The training video below shows how to set up Database Watcher using SQL authentication with a system-assigned managed identity. The demo also shows how to set up a free Azure Data Explorer cluster and configure the necessary permissions to allow communication between Database Watcher and the ADX cluster.

The demo covers several prerequisites that need to be configured. Please refer to the "Create and configure a database watcher - Azure SQL Database & SQL Managed Instance" documentation on Microsoft Learn. Additionally, an Azure Key Vault is required when using SQL authentication, to store the local SQL user account and password. Please note that this step is not needed if Entra authentication is used.

Additionally, we have a new video recorded by Kinye Yu about configuring Database Watcher with Entra ID and a user-assigned managed identity. Thanks for these videos, Kinye Yu!

Lesson Learned #515: Recommendations for Troubleshooting - Login failed for user 'XXX'. (18456)

During a recent support case, a customer encountered the following error in their Python code:

pyodbc.InterfaceError: ('28000', "[28000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Login failed for user 'XXX'. (18456) (SQLDriverConnect); ")

Below, I would like to share my lessons learned while fixing this issue. Error code 18456 is typically caused by login issues, such as incorrect or missing credentials, rather than connectivity or networking problems. In our investigation, we identified the root cause and suggested recommendations to prevent and resolve similar issues.

Root Cause

The application was configured to retrieve the database server and host name from environment variables. However:

- Missing environment variables: one or more variables were not set.
- Default value misconfiguration: the code defaulted to a hardcoded value when variables were missing. For example, the server defaulted to "localhost", which was not the intended database server.

As a result, the application attempted to connect to an unintended server with incorrect or missing credentials, leading to the Login failed error.

Recommendations

1. Validate environment variables. Always validate critical environment variables like server, username, and password. If a required variable is missing or empty, the application should raise an explicit error or log a clear warning.
2. Avoid misleading defaults. Use placeholder values, such as "NOT_SET", as defaults for critical variables. This approach ensures that misconfigurations are immediately visible and do not silently fail.
3. Log connection details. Log critical details like the server and database being accessed, and make sure this information is included in application logs to make troubleshooting easier. Avoid logging sensitive information such as passwords.

Python Solution

I was thinking about how to improve the Python code by validating the environment variables, handling errors, and logging critical connection details:

import os

def get_env_variable(var_name, default_value=None, allow_empty=False):
    """
    Retrieves and validates an environment variable.

    :param var_name: The name of the environment variable.
    :param default_value: The default value if the variable is missing.
    :param allow_empty: If False, raises an error for empty variables.
    :return: The value of the environment variable or default_value.

    Example:
        server = get_env_variable("DB_SERVER", default_value="NOT_SET")
    """
    value = os.getenv(var_name, default_value)
    # Raise if the variable is missing, or if empty/placeholder values are not allowed.
    if value is None or (not allow_empty and value.strip() in ("", "NOT_SET")):
        raise ValueError(f"Environment variable '{var_name}' is required but not set.")
    return value
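To tie the validation back to the original connection failure, the helper above can be used to build the connection string, so a missing server or credential fails fast with a clear message instead of producing a misleading login error against a default such as "localhost". A short usage sketch reusing get_env_variable (the environment variable names and the driver string are assumptions for illustration):

import pyodbc

# Hypothetical variable names; each call fails fast if the variable is missing or empty.
server = get_env_variable("DB_SERVER")
database = get_env_variable("DB_NAME")
user = get_env_variable("DB_USER")
password = get_env_variable("DB_PASSWORD")

# Log where we are connecting to (never log the password).
print(f"Connecting to server={server}, database={database}, user={user}")

conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    f"Server=tcp:{server},1433;Database={database};"
    f"Uid={user};Pwd={password};Encrypt=yes;"
)
conn = pyodbc.connect(conn_str)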

Lesson Learned #514: Optimizing Bulk Insert Performance in Parallel Data Ingestion - Part1

While working on a support case, we encountered an issue where bulk inserts were taking longer than our customer's SLA allowed. During the troubleshooting, we identified three main factors affecting performance:

- High transaction log (TLOG) threshold: the transaction log was under significant pressure, impacting bulk insert speeds.
- Excessive indexes: the table had multiple indexes, adding overhead during each insert.
- High index fragmentation: index fragmentation was high, further slowing down data ingestion.

We found that the customer was using partitioning within a single table, and we began considering a restructuring approach that would partition data across multiple databases rather than within a single table. This approach provided several advantages:

- Set individual transaction log thresholds for each partition, reducing the log pressure.
- Disable indexes before each insert and rebuild them afterward. In a single table with partitioning, it is currently not possible to disable indexes per partition individually.
- Select specific Service Level Objectives (SLOs) for each database, optimizing resource allocation based on volume and SLA requirements.
- Indexed views for real-time data retrieval: partitioned databases allowed us to create indexed views that are updated together with the data inserts and update faster because each database holds a smaller volume, enhancing read performance and consistency.
- Reduced fragmentation and improved query performance: by rebuilding indexes after each bulk insert, we maintained an optimal index structure across all partitions, significantly improving read speeds and taking less time.

By implementing these changes, we were able to achieve a significant reduction in bulk insert times, helping our customer meet SLA targets. I would like to share the following code.

This code reads CSV files from a specified folder and performs parallel bulk inserts for each file. Each CSV file has a header, and the data is separated by the | character. During the reading process, the code reads the value in column 20, which is a date in the format YYYY-MM-DD. This date value is converted to the YYYY-MM format, which is used to determine the target database where the data should be inserted. For example, if the value in column 20 is 2023-08-15, it extracts 2023-08 and directs the record to the corresponding database for that period, for example, db202308. Once the batch size is reached for a partition (rows for a given YYYY-MM read from the CSV file), the application executes the SqlBulkCopy calls in parallel.
using System;
using System.Collections.Generic;
using System.Data;
using System.Diagnostics;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

namespace BulkInsert
{
    class Program
    {
        static string sqlConnectionStringTemplate = "Server=servername.database.windows.net;Database={0};Authentication=Active Directory Managed Identity;User Id=xxxx;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Pooling=true;Max Pool size=300;Min Pool Size=100;ConnectRetryCount=3;ConnectRetryInterval=10;Connection Lifetime=0;Application Name=ConnTest Check Jump;Packet Size=32767";
        static string localDirectory = @"C:\CsvFiles";
        static int batchSize = 1248000;
        static SemaphoreSlim semaphore = new SemaphoreSlim(10);

        static async Task Main(string[] args)
        {
            var files = Directory.GetFiles(localDirectory, "*.csv");
            Stopwatch totalStopwatch = Stopwatch.StartNew();

            foreach (var filePath in files)
            {
                Console.WriteLine($"Processing file: {filePath}");
                await LoadDataFromCsv(filePath);
            }

            totalStopwatch.Stop();
            Console.WriteLine($"Total processing finished in {totalStopwatch.Elapsed.TotalSeconds} seconds.");
        }

        static async Task LoadDataFromCsv(string filePath)
        {
            Stopwatch fileReadStopwatch = Stopwatch.StartNew();
            var dataTables = new Dictionary<string, DataTable>();
            var bulkCopyTasks = new List<Task>();

            using (StreamReader reader = new StreamReader(filePath, System.Text.Encoding.UTF8, true, 819200))
            {
                string line;
                string key;
                long lineCount = 0;
                long lShow = batchSize / 2;
                reader.ReadLine();
                fileReadStopwatch.Stop();
                Console.WriteLine($"File read initialization took {fileReadStopwatch.Elapsed.TotalSeconds} seconds.");

                Stopwatch processStopwatch = Stopwatch.StartNew();
                string[] values = new string[31];

                while (!reader.EndOfStream)
                {
                    line = await reader.ReadLineAsync();
                    lineCount++;
                    if (lineCount % lShow == 0)
                    {
                        Console.WriteLine($"Read {lineCount}");
                    }

                    values = line.Split('|');
                    key = DateTime.Parse(values[19]).ToString("yyyyMM");

                    if (!dataTables.ContainsKey(key))
                    {
                        dataTables[key] = CreateTableSchema();
                    }

                    var batchTable = dataTables[key];
                    DataRow row = batchTable.NewRow();
                    for (int i = 0; i < 31; i++)
                    {
                        row[i] = ParseValue(values[i], batchTable.Columns[i].DataType, i);
                    }
                    batchTable.Rows.Add(row);

                    if (batchTable.Rows.Count >= batchSize)
                    {
                        Console.WriteLine($"BatchSize processing {key} - {batchTable.Rows.Count}.");
                        Stopwatch insertStopwatch = Stopwatch.StartNew();
                        bulkCopyTasks.Add(ProcessBatchAsync(dataTables[key], key, insertStopwatch));
                        dataTables[key] = CreateTableSchema();
                    }
                }

                processStopwatch.Stop();
                Console.WriteLine($"File read and processing of {lineCount} lines completed in {processStopwatch.Elapsed.TotalSeconds} seconds.");
            }

            foreach (var key in dataTables.Keys)
            {
                if (dataTables[key].Rows.Count > 0)
                {
                    Stopwatch insertStopwatch = Stopwatch.StartNew();
                    bulkCopyTasks.Add(ProcessBatchAsync(dataTables[key], key, insertStopwatch));
                }
            }

            await Task.WhenAll(bulkCopyTasks);
        }

        static async Task ProcessBatchAsync(DataTable batchTable, string yearMonth, Stopwatch insertStopwatch)
        {
            await semaphore.WaitAsync();
            try
            {
                using (SqlConnection conn = new SqlConnection(string.Format(sqlConnectionStringTemplate, $"db{yearMonth}")))
                {
                    await conn.OpenAsync();
                    using (SqlTransaction transaction = conn.BeginTransaction())
                    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, transaction)
                    {
                        DestinationTableName = "dbo.dummyTable",
                        BulkCopyTimeout = 40000,
                        BatchSize = batchSize,
                        EnableStreaming = true
                    })
                    {
                        await bulkCopy.WriteToServerAsync(batchTable);
                        transaction.Commit();
                    }
                }

                insertStopwatch.Stop();
                Console.WriteLine($"Inserted batch of {batchTable.Rows.Count} rows to db{yearMonth} in {insertStopwatch.Elapsed.TotalSeconds} seconds.");
            }
            finally
            {
                semaphore.Release();
            }
        }

        static DataTable CreateTableSchema()
        {
            DataTable dataTable = new DataTable();
            dataTable.Columns.Add("Column1", typeof(long));
            dataTable.Columns.Add("Column2", typeof(long));
            dataTable.Columns.Add("Column3", typeof(long));
            dataTable.Columns.Add("Column4", typeof(long));
            dataTable.Columns.Add("Column5", typeof(long));
            dataTable.Columns.Add("Column6", typeof(long));
            dataTable.Columns.Add("Column7", typeof(long));
            dataTable.Columns.Add("Column8", typeof(long));
            dataTable.Columns.Add("Column9", typeof(long));
            dataTable.Columns.Add("Column10", typeof(long));
            dataTable.Columns.Add("Column11", typeof(long));
            dataTable.Columns.Add("Column12", typeof(long));
            dataTable.Columns.Add("Column13", typeof(long));
            dataTable.Columns.Add("Column14", typeof(DateTime));
            dataTable.Columns.Add("Column15", typeof(double));
            dataTable.Columns.Add("Column16", typeof(double));
            dataTable.Columns.Add("Column17", typeof(string));
            dataTable.Columns.Add("Column18", typeof(long));
            dataTable.Columns.Add("Column19", typeof(DateTime));
            dataTable.Columns.Add("Column20", typeof(DateTime));
            dataTable.Columns.Add("Column21", typeof(DateTime));
            dataTable.Columns.Add("Column22", typeof(string));
            dataTable.Columns.Add("Column23", typeof(long));
            dataTable.Columns.Add("Column24", typeof(double));
            dataTable.Columns.Add("Column25", typeof(short));
            dataTable.Columns.Add("Column26", typeof(short));
            dataTable.Columns.Add("Column27", typeof(short));
            dataTable.Columns.Add("Column28", typeof(short));
            dataTable.Columns.Add("Column29", typeof(short));
            dataTable.Columns.Add("Column30", typeof(short));
            dataTable.Columns.Add("Column31", typeof(short));
            dataTable.BeginLoadData();
            dataTable.MinimumCapacity = batchSize;
            return dataTable;
        }

        static object ParseValue(string value, Type targetType, int i)
        {
            if (string.IsNullOrWhiteSpace(value)) return DBNull.Value;
            if (long.TryParse(value, out long longVal)) return longVal;
            if (double.TryParse(value, out double doubleVal)) return doubleVal;
            if (DateTime.TryParse(value, out DateTime dateVal)) return dateVal;
            return value;
        }
    }
}
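One of the advantages listed above is disabling indexes on each monthly database before the load and rebuilding them afterward. That step is not part of the C# program, but as a rough illustration of how it could be wrapped around each load (the table name dbo.dummyTable is taken from the program above; the server, credentials, and database names are placeholders):

import pyodbc

# One monthly database per partition (db202308, db202309, ...), as in the C# program above.
conn_str_template = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:servername.database.windows.net,1433;"
    "Database={db};Uid=user1;Pwd=...;Encrypt=yes;"
)

def set_nonclustered_indexes(db_name, action):
    """action is 'DISABLE' before the bulk load or 'REBUILD' after it."""
    conn = pyodbc.connect(conn_str_template.format(db=db_name), autocommit=True)
    cursor = conn.cursor()
    cursor.execute("""
        SELECT name FROM sys.indexes
        WHERE object_id = OBJECT_ID('dbo.dummyTable') AND type_desc = 'NONCLUSTERED'
    """)
    for (index_name,) in cursor.fetchall():
        cursor.execute(f"ALTER INDEX [{index_name}] ON dbo.dummyTable {action};")
    conn.close()

# Example for one monthly partition database:
set_nonclustered_indexes("db202308", "DISABLE")
# ... run the bulk load for 2023-08 here ...
set_nonclustered_indexes("db202308", "REBUILD")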

Lesson Learned #513: Using SQL Copilot and Python Retry Logic to Resolve Deadlocks in Azure SQL DB

A few weeks ago, I worked on a service request where our customer had a Python application that was reporting a deadlock with the following error message:

INFO:root:Connected to the Database in dbname - Thread id 21132 - 0.0454 seconds ---
INFO:root:Connected to the Database in dbname - Thread id 39856 - 0.4066 seconds ---
ERROR:root:Transaction 2 failed: ('40001', '[40001] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Transaction (Process ID 84) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. (1205) (SQLExecDirectW)')
INFO:root:Transaction 1 completed successfully.

In many of our articles and public resources, like Analyze and prevent deadlocks in Azure SQL Database or Lesson Learned #19: How to obtain the deadlocks of your Azure SQL Database or Managed Instance?, we review ways to detect and resolve deadlocks by analyzing their details. This time, I would like to share my experience using Copilot in Azure SQL Database to pinpoint the deadlock.

After receiving the error, we started with several prompts, such as "Give me the deadlock that occurred in my database recently" or "Provide detailed information about the last deadlock in my database, including involved tables and queries", and we got the following information:

Based on the internal Azure SQL telemetry, the following deadlock query hashes have been identified in your database within the past 2 hours:

Deadlock Details
Query Hash 1: 0xD57AFD9584A2C053
Query Text 1: (@2 tinyint,@1 varchar(8000))UPDATE [dbo].[MS_TableA_MS] SET [ValueA] = @1 WHERE [ID]=@2;
Query Hash 2: 0xBEAB84B27CE1F534
Query Text 2: (@2 tinyint,@1 varchar(8000))UPDATE [dbo].[MS_TableB_MS] SET [ValueB] = @1 WHERE [ID]=@2;

At this point we had identified the queries involved in the deadlock, so we ran a second prompt: "Suggest options to prevent or mitigate the recent deadlock in my database". We got very useful information about multiple options to prevent or mitigate the deadlock, such as index optimization, forcing a query plan, rewriting transactions, and retry logic in the application or T-SQL code.

For the application retry logic, we traced the source code that triggered the issue and added a retry mechanism to handle the deadlock. Here's a simple example of what we implemented. Basically, this small piece of code creates two tables, runs two transactions in two different threads and, depending on the value of the retry parameter, retries the deadlocked operation or not.

import logging
import threading
import time

import pyodbc

# Note: ConnectToTheDB() is a helper from the original application that opens the
# connection to Azure SQL Database; it is not included in this excerpt.

# Constants for table names used in the deadlock scenario
TABLE_A_NAME = "dbo.MS_TableA_MS"
TABLE_B_NAME = "dbo.MS_TableB_MS"
TABLE_A_COLUMN = "ValueA"
TABLE_B_COLUMN = "ValueB"
ITEM_ID = 1
TABLE_A_INITIAL_VALUE = 'Value A1'
TABLE_B_INITIAL_VALUE = 'Value B1'


def simulate_deadlock(retry=False, retry_attempts=3):
    """
    Simulates a deadlock by running two transactions in parallel that lock resources.
    If retry is enabled, the transactions will attempt to retry up to a specified number of attempts.
    """
    setup_deadlock_tables()
    thread1 = threading.Thread(target=run_transaction_with_retry, args=(deadlock_transaction1, retry_attempts, retry))
    thread2 = threading.Thread(target=run_transaction_with_retry, args=(deadlock_transaction2, retry_attempts, retry))
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()


def run_transaction_with_retry(transaction_func, attempt_limit, retry):
    attempt = 0
    while attempt < attempt_limit:
        try:
            transaction_func(retry)
            break
        except pyodbc.Error as e:
            if 'deadlock' in str(e).lower() and retry:
                attempt += 1
                logging.warning(f"Deadlock detected. Retrying transaction... Attempt {attempt}/{attempt_limit}")
                time.sleep(1)
            else:
                logging.error(f"Transaction failed: {e}")
                break


def setup_deadlock_tables():
    """
    Sets up the tables required for the deadlock simulation, using constants for table and column names.
    """
    conn, dbNameReturn = ConnectToTheDB()
    if conn is None:
        logging.info('Error establishing connection to the database. Exiting setup.')
        return
    cursor = conn.cursor()
    try:
        cursor.execute(f"""
            IF OBJECT_ID('{TABLE_A_NAME}', 'U') IS NULL
            CREATE TABLE {TABLE_A_NAME} (
                ID INT PRIMARY KEY,
                {TABLE_A_COLUMN} VARCHAR(100)
            );
        """)
        cursor.execute(f"""
            IF OBJECT_ID('{TABLE_B_NAME}', 'U') IS NULL
            CREATE TABLE {TABLE_B_NAME} (
                ID INT PRIMARY KEY,
                {TABLE_B_COLUMN} VARCHAR(100)
            );
        """)
        cursor.execute(f"SELECT COUNT(*) FROM {TABLE_A_NAME}")
        if cursor.fetchone()[0] == 0:
            cursor.execute(f"INSERT INTO {TABLE_A_NAME} (ID, {TABLE_A_COLUMN}) VALUES ({ITEM_ID}, '{TABLE_A_INITIAL_VALUE}');")
        cursor.execute(f"SELECT COUNT(*) FROM {TABLE_B_NAME}")
        if cursor.fetchone()[0] == 0:
            cursor.execute(f"INSERT INTO {TABLE_B_NAME} (ID, {TABLE_B_COLUMN}) VALUES ({ITEM_ID}, '{TABLE_B_INITIAL_VALUE}');")
        conn.commit()
        logging.info("Tables prepared successfully.")
    except Exception as e:
        logging.error(f"An error occurred in setup_deadlock_tables: {e}")
        conn.rollback()
    finally:
        conn.close()


def deadlock_transaction1(retry=False):
    conn, dbNameReturn = ConnectToTheDB()
    if conn is None:
        logging.info('Error establishing connection to the database. Exiting transaction 1.')
        return
    cursor = conn.cursor()
    try:
        cursor.execute("BEGIN TRANSACTION;")
        cursor.execute(f"UPDATE {TABLE_A_NAME} SET {TABLE_A_COLUMN} = 'Transaction 1' WHERE ID = {ITEM_ID};")
        time.sleep(2)
        cursor.execute(f"UPDATE {TABLE_B_NAME} SET {TABLE_B_COLUMN} = 'Transaction 1' WHERE ID = {ITEM_ID};")
        conn.commit()
        logging.info("Transaction 1 completed successfully.")
    except pyodbc.Error as e:
        logging.error(f"Transaction 1 failed: {e}")
        conn.rollback()
        if 'deadlock' in str(e).lower() and retry:
            raise e  # Rethrow the exception to trigger retry in run_transaction_with_retry
    finally:
        conn.close()


def deadlock_transaction2(retry=False):
    conn, dbNameReturn = ConnectToTheDB()
    if conn is None:
        logging.info('Error establishing connection to the database. Exiting transaction 2.')
        return
    cursor = conn.cursor()
    try:
        cursor.execute("BEGIN TRANSACTION;")
        cursor.execute(f"UPDATE {TABLE_B_NAME} SET {TABLE_B_COLUMN} = 'Transaction 2' WHERE ID = {ITEM_ID};")
        time.sleep(2)
        cursor.execute(f"UPDATE {TABLE_A_NAME} SET {TABLE_A_COLUMN} = 'Transaction 2' WHERE ID = {ITEM_ID};")
        conn.commit()
        logging.info("Transaction 2 completed successfully.")
    except pyodbc.Error as e:
        logging.error(f"Transaction 2 failed: {e}")
        conn.rollback()
        if 'deadlock' in str(e).lower() and retry:
            raise e  # Rethrow the exception to trigger retry in run_transaction_with_retry
    finally:
        conn.close()

Lesson Learned #509: KeepAliveTime parameter in HikariCP

Today, I have been working on a service request where, at certain times, we observed that connections could be disconnected by external components, such as firewalls, due to inactivity policies. For this reason, I would like to share my experience using the KeepAliveTime parameter.