Latest Discussions
Partitioning in Azure Synapse
Hello, I'm currently working on an optimization project, which has led me down a rabbit hole of technical differences between regular MSSQL and the dedicated SQL pool that is Azure PDW. I noticed that when I create a table partitioned by, say, YEAR([datefield]), with range boundaries for each year ('20230101', '20240101', etc.), the sys.partitions view claims that all partitions hold an equal number of rows. Also, I cannot see any impact on the query plans, even though partition elimination should be the first move when querying with WHERE [datefield] = '20230505'. Any info and advice would be greatly appreciated.
AbuasRinroe · May 21, 2025
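To reason about which partition a given date should land in (and hence which partition elimination ought to target), here is a minimal Python sketch. It assumes RANGE RIGHT semantics (a boundary value belongs to the partition on its right) and yearly boundaries in the style of the post; the boundary list and function are illustrative, not taken from the actual table DDL.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical yearly boundaries, matching the '20230101', '20240101' style.
boundaries = [date(2022, 1, 1), date(2023, 1, 1), date(2024, 1, 1)]

def partition_number(d, bounds):
    """Return the 1-based partition a date row lands in, assuming
    RANGE RIGHT: rows with d >= bounds[i] fall to the right of boundary i."""
    # bisect_right counts boundaries <= d, so an exact boundary match
    # goes to the right-hand partition, as RANGE RIGHT requires.
    return bisect_right(bounds, d) + 1

# A row with datefield = 2023-05-05 should sit in partition 3
# ([2023-01-01, 2024-01-01)), which is where elimination should point.
```

If sys.partitions reports identical row counts across all partitions despite skewed data, comparing the output of a mapping like this against the actual partition metadata can help confirm whether the rows were ever placed where you expect.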
Access dedicated SQL pool from notebook
I have some notebooks where I use the com.microsoft.spark.sqlanalytics library to fetch data from the dedicated SQL pool. Everything was working fine until a couple of days ago, when we started getting errors that are not very helpful. The error looks like this:

Py4JJavaError: An error occurred while calling o4062.count. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 21) (vm-00321611 executor 2): java.lang.IllegalArgumentException: For input string: 'null'

The code was working without issues until a couple of days ago, and there were no new deployments prior to that. The error occurs when the data is being accessed. Here is an excerpt:

dfData = spark.read.option(Constants.DATABASE, "sql_db_pool1").synapsesql(query)
cnt = dfData.count()

The error is coming from deep inside the library, and there is no way to determine which argument is null. Has anybody run into an issue like this?
Regards
ljupche · Apr 02, 2025
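This doesn't identify the root cause, but since the connector fails deep inside on a literal 'null' string, one defensive step is to validate every option before handing it to the library. The `validate_options` helper below is a hypothetical sketch, not part of com.microsoft.spark.sqlanalytics, and the option names are illustrative:

```python
def validate_options(options):
    """Return the names of options that are missing, None, or the literal
    string 'null' -- the kind of value that can later surface as
    java.lang.IllegalArgumentException: For input string: 'null'."""
    bad = []
    for name, value in options.items():
        if value is None or str(value).strip().lower() in ("", "null", "none"):
            bad.append(name)
    return bad

# The DATABASE value from the post, plus a deliberately bad entry:
opts = {"database": "sql_db_pool1", "sql_endpoint": None}
problems = validate_options(opts)  # -> ["sql_endpoint"]
```

In the notebook you would run this over the dict of values you are about to pass via `.option(...)` before calling `synapsesql(query)`, so a silently-null setting fails fast with a readable name instead of a Py4J stack trace.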
Synapse workspace cost reduction
I have a Cosmos DB with one container that holds different documents. One main document has a related event document; the two are linked by a partition key, so there is one main document and multiple event documents sharing the same key. The main document has fields like date, country, and categories, which the event document does not have, while the event document has fields like event type, event datetime, etc. To filter how many events happened for a particular category on a particular day, we have to use the main document. Events can repeat within a single day.

My requirement is a Power BI report showing how many events happened on a particular day and in which country over the last 2 months (each event should be counted only once per category, per country, per day). I want to pull this data from Synapse and load it into Power BI for the last 2 months. I used a Synapse view and implemented incremental dataset refresh in Power BI. In Synapse, one view loads the data for the main document, and another view takes the partition keys from the main view and loads the data for the event document.

There are two dates in the main document: created date and change date. I cannot use the change date for incremental refresh because it creates duplicate records, so I use the created date and detect data changes over the last 30 days (the window during which the main document can change). It works well, but the query takes a long time to execute, which drives up data-processing cost in Synapse. Are there any suggestions for reducing Synapse cost as well as the query execution time / dataset refresh time in Power BI?
SynLover · Mar 11, 2025
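The "count each event once per category, per country, per day" rule can be prototyped in plain Python to check the logic before pushing it into a Synapse view; all field names below are assumptions, not the actual Cosmos DB schema:

```python
from datetime import date

# Illustrative rows as they might look after joining the main document
# (day, country, category) with its event documents.
events = [
    {"day": date(2025, 3, 1), "country": "US", "category": "A", "event_type": "click"},
    {"day": date(2025, 3, 1), "country": "US", "category": "A", "event_type": "click"},  # repeat
    {"day": date(2025, 3, 1), "country": "DE", "category": "A", "event_type": "click"},
]

def daily_event_count(rows):
    """Count each (day, country, category) combination at most once,
    mirroring the 'one event per category, per country, per day' rule."""
    return len({(r["day"], r["country"], r["category"]) for r in rows})

# daily_event_count(events) -> 2: the repeated US/A event is collapsed.
```

The set-of-tuples deduplication corresponds to a `COUNT(DISTINCT ...)` over the grouping columns in SQL; doing that collapse inside the Synapse view, before the data reaches Power BI, is usually cheaper than shipping the repeats across.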
IP whitelist for Synapse Spark Pools to reach external endpoint with firewall?
I am trying to reach an external vendor's SFTP site from my Synapse Spark notebook. The site is behind a firewall, and I want to give the vendor the IP range for all of our Spark pools so they can whitelist them. I am struggling to get a clear idea of that list; the closest I have found so far is "Azure Cloud East US", which is rather broad. Any advice or ideas on how to get a more refined list/range of IPs?
PeterDaniels · Feb 28, 2025
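One place to look is the weekly "Azure IP Ranges and Service Tags" JSON download, which is where ranges like "Azure Cloud East US" come from. The sketch below filters such a document for one tag's prefixes; whether any tag narrows things down beyond the broad regional AzureCloud block for Spark pools is exactly the open question here, and the sample document is a tiny stand-in, not real data:

```python
def prefixes_for_tag(service_tags, tag_name):
    """Pull addressPrefixes for one named tag out of a document shaped
    like the Azure 'IP Ranges and Service Tags' JSON download."""
    for entry in service_tags.get("values", []):
        if entry.get("name") == tag_name:
            return entry.get("properties", {}).get("addressPrefixes", [])
    return []

# Tiny stand-in for the real (large) download; prefixes are made up.
sample = {
    "values": [
        {"name": "AzureCloud.EastUS",
         "properties": {"addressPrefixes": ["20.42.0.0/17", "40.71.0.0/16"]}},
    ]
}
# prefixes_for_tag(sample, "AzureCloud.EastUS") -> ["20.42.0.0/17", "40.71.0.0/16"]
```

If even the filtered tag is too broad for the vendor, the usual alternative is to route the Spark pool's outbound traffic through a NAT gateway or firewall with a static public IP, so there is a single address to whitelist instead of a range.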
REST API call with multiple pages
Hello, I need to migrate an SSIS package that makes API calls whose results span multiple pages. The call goes to https://{{HOSTNAME}}/personnel/v1/person-details, and I can manually request https://{{HOSTNAME}}/personnel/v1/person-details?page=1&per_page=200. My issue is that the first call returns no metadata I could use to build the rules for the ForEach loop, and I am not sure how to go about it. I get something like this, with no metadata:

[
  {
    "additionalName1": null,
    "additionalName2": null,
    "nationality1": null,
    "nationality2": null,
    "nationality3": null
  },
  {
    "additionalName1": null,
    "additionalName2": null,
    "nationality1": null,
    "nationality2": null,
    "nationality3": null
  },
  {
    "additionalName1": null,
    "additionalName2": null,
    "nationality1": null,
    "nationality2": null,
    "nationality3": null
  }
]
AstridMalanka · Feb 13, 2025
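When an API returns a bare JSON array with no paging metadata, the usual pattern is to loop until a page comes back empty or shorter than `per_page`. A sketch of that loop, where `fetch_page` is a stand-in for the actual HTTP GET of `/personnel/v1/person-details?page=N&per_page=200`:

```python
def fetch_all(fetch_page, per_page=200, max_pages=1000):
    """Page through an API that returns a bare JSON array with no paging
    metadata: stop on an empty page or one shorter than per_page."""
    results = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, per_page)
        if not batch:
            break
        results.extend(batch)
        if len(batch) < per_page:  # short page means it was the last one
            break
    return results

# Fake fetcher standing in for the real HTTP call, serving 5 records:
def fake_fetch(page, per_page):
    data = [{"id": i} for i in range(1, 6)]
    start = (page - 1) * per_page
    return data[start:start + per_page]

# fetch_all(fake_fetch, per_page=2) walks 3 pages and returns all 5 records.
```

In a Synapse pipeline the same idea maps onto an Until activity: keep incrementing a `page` variable and calling the Web/Copy activity until the returned array is empty or short, rather than relying on response metadata that the API never sends.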
Accessing serverless SQL pool tables from dedicated SQL pool
I'm trying to access the tables available in the Synapse serverless SQL pool from the dedicated SQL pool. I'd like to create some simple stored procedures that import data from delta/parquet tables, which are mapped as external tables in the serverless SQL pool, and load them into dedicated SQL pool tables. Is there a simple way to do this without having to define external tables in the dedicated SQL pool as well? I tried that, and there seem to be many limitations (Delta not supported, etc.).
clayduvall · Feb 13, 2025
GitFlow possible with Synapse?
My team has been using Synapse for some time across our dev, UAT, and production environments. So far they have not used any CD to deploy to higher environments; instead they promote artifacts manually. The reason is that their UAT environment is rarely ready to go to production: features often sit in UAT for many months before being deployed to production, but they are required to be in UAT for testing against the full dataset. This seems to call for a branching strategy like GitFlow, with selective PRs using cherry-picking or git revert so that only ready features reach production. Has anyone faced this issue, or does anyone have tips on how to resolve this challenge? Unlike traditional app development, feature flags don't solve the issue here, as they really only work inside pipelines. Thanks!
SynapseFan · Jan 21, 2025
Azure Synapse: What’s Next?
With the recent introduction of Microsoft Fabric, which aims to unify various data and analytics workloads into a single platform, how will this impact the future of Azure Synapse Analytics? Specifically, will Azure Synapse Analytics become obsolete, or will it continue to play a significant role alongside Microsoft Fabric? Additionally, what are the recommended migration paths and considerations for organizations heavily invested in Azure Synapse Analytics?
saturn · Jan 20, 2025
Azure Synapse Issue
Hi, I have a question regarding the backend mechanism of Synapse Spark clusters when running pipelines. I have a notebook that installs packages using !pip, since %pip is disabled during pipeline execution. I understand that !pip is a shell command that installs packages at the driver level. I'm wondering whether this will impact other pipelines that are running concurrently but do not require the same packages. Thank you for your help.
zbensisi · Jan 20, 2025
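Since !pip only touches the driver node of the session that runs it, one small mitigation is to skip the install when the package is already importable, so concurrent pipelines that happen to share a driver image aren't churned by redundant installs. A generic sketch, not Synapse-specific; `ensure_package` is a hypothetical helper:

```python
import importlib.util
import subprocess
import sys

def ensure_package(name):
    """Install a package on the current (driver) interpreter only if it is
    not already importable. Returns True if an install was triggered.
    Note: like !pip, this affects the driver node, not the executors."""
    if importlib.util.find_spec(name) is not None:
        return False  # already available, skip the install
    subprocess.check_call([sys.executable, "-m", "pip", "install", name])
    return True

# A stdlib module is always importable, so no install is attempted:
# ensure_package("json") -> False
```

Because each Spark session gets its own driver process, packages installed this way normally live and die with the session; pipelines running in separate sessions should not see each other's !pip installs.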