Forum Discussion

martin_larsson_ellevio
Copper Contributor
May 12, 2025
Solved

Error in copy activity with Oracle 2.0

I am trying to migrate our copy activities to Oracle connector version 2.0. The destination is Parquet in an Azure Storage account, which works with the Oracle 1.0 connector.

Just switching to 2.0 on the linked service and adjusting the connection string (server) is straightforward, and a "test connection" is successful. But in a pipeline with a copy activity using the linked service, I get the following error message on some tables.

ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.lang.ArrayIndexOutOfBoundsException:255
total entry:1
com.microsoft.datatransfer.bridge.parquet.ParquetWriterBuilderBridge.addDecimalColumn(ParquetWriterBuilderBridge.java:107)
.,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'

As the error suggests, it is unable to convert a decimal value from Oracle to Parquet. To me it looks like a bug in the new connector. Has anybody seen this before and found a solution? The 1.0 connector is apparently being deprecated in the coming weeks.

Here is the code for the copy activity:

{
    "name": "Copy",
    "type": "Copy",
    "dependsOn": [],
    "policy": {
        "timeout": "1.00:00:00",
        "retry": 2,
        "retryIntervalInSeconds": 60,
        "secureOutput": false,
        "secureInput": false
    },
    "userProperties": [
        {
            "name": "Source",
            "value": "@{pipeline().parameters.schema}.@{pipeline().parameters.table}"
        },
        {
            "name": "Destination",
            "value": "raw/@{concat(pipeline().parameters.source, '/', pipeline().parameters.schema, '/', pipeline().parameters.table, '/', formatDateTime(pipeline().TriggerTime, 'yyyy/MM/dd'))}/"
        }
    ],
    "typeProperties": {
        "source": {
            "type": "OracleSource",
            "oracleReaderQuery": {
                "value": "SELECT @{coalesce(pipeline().parameters.columns, '*')}\nFROM \"@{pipeline().parameters.schema}\".\"@{pipeline().parameters.table}\"\n@{if(variables('incremental'), variables('where_clause'), '')}\n@{if(equals(pipeline().globalParameters.ENV, 'dev'),\n'FETCH FIRST 1000 ROWS ONLY'\n,''\n)}",
                "type": "Expression"
            },
            "partitionOption": "None",
            "convertDecimalToInteger": true,
            "queryTimeout": "02:00:00"
        },
        "sink": {
            "type": "ParquetSink",
            "storeSettings": {
                "type": "AzureBlobFSWriteSettings"
            },
            "formatSettings": {
                "type": "ParquetWriteSettings",
                "maxRowsPerFile": 1000000,
                "fileNamePrefix": {
                    "value": "@variables('file_name_prefix')",
                    "type": "Expression"
                }
            }
        },
        "enableStaging": false,
        "translator": {
            "type": "TabularTranslator",
            "typeConversion": true,
            "typeConversionSettings": {
                "allowDataTruncation": true,
                "treatBooleanAsNumber": false
            }
        }
    },
    "inputs": [
        {
            "referenceName": "Oracle",
            "type": "DatasetReference",
            "parameters": {
                "host": {
                    "value": "@pipeline().parameters.host",
                    "type": "Expression"
                },
                "port": {
                    "value": "@pipeline().parameters.port",
                    "type": "Expression"
                },
                "service_name": {
                    "value": "@pipeline().parameters.service_name",
                    "type": "Expression"
                },
                "username": {
                    "value": "@pipeline().parameters.username",
                    "type": "Expression"
                },
                "password_secret_name": {
                    "value": "@pipeline().parameters.password_secret_name",
                    "type": "Expression"
                },
                "schema": {
                    "value": "@pipeline().parameters.schema",
                    "type": "Expression"
                },
                "table": {
                    "value": "@pipeline().parameters.table",
                    "type": "Expression"
                }
            }
        }
    ],
    "outputs": [
        {
            "referenceName": "Lake_PARQUET_folder",
            "type": "DatasetReference",
            "parameters": {
                "source": {
                    "value": "@pipeline().parameters.source",
                    "type": "Expression"
                },
                "namespace": {
                    "value": "@pipeline().parameters.schema",
                    "type": "Expression"
                },
                "entity": {
                    "value": "@variables('sink_table_name')",
                    "type": "Expression"
                },
                "partition": {
                    "value": "@formatDateTime(pipeline().TriggerTime, 'yyyy/MM/dd')",
                    "type": "Expression"
                },
                "container": {
                    "value": "@variables('container')",
                    "type": "Expression"
                }
            }
        }
    ]
}
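
For reference, the version 2.0 linked service that the Oracle dataset points to looks roughly like the sketch below. It is simplified here with placeholder server and secret names (our real one is parameterized), so treat it as illustrative rather than exact:

{
    "name": "Oracle",
    "properties": {
        "type": "Oracle",
        "version": "2.0",
        "typeProperties": {
            "server": "myhost.example.com:1521/myservice",
            "authenticationType": "Basic",
            "username": "my_user",
            "password": {
                "type": "AzureKeyVaultSecretReference",
                "store": {
                    "referenceName": "KeyVault",
                    "type": "LinkedServiceReference"
                },
                "secretName": "oracle-password"
            }
        }
    }
}

The copy activity and datasets are unchanged from what works with the 1.0 connector; only the linked service version and the connection string (server) were adjusted.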

 

  • martin_larsson_ellevio
    May 19, 2025

    I need to add to this that the solution in my situation was to add the following line to the Parquet dataset JSON, at the same level as compressionCodec:

    "useParquetV2": true

    This is in combination with adaardor's suggestion to enable "Support V1 data types" on the linked service (see the replies below). Without it, the Parquet file could not be read by Databricks; it failed with an error saying that a decimal precision of 256 is not allowed (the maximum is 38).

    Full code for Parquet dataset:

    {
        "name": "Lake_PARQUET_folder",
        "properties": {
            "linkedServiceName": {
                "referenceName": "DataLake",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "source": {
                    "type": "string"
                },
                "namespace": {
                    "type": "string"
                },
                "entity": {
                    "type": "string"
                },
                "partition": {
                    "type": "string"
                },
                "container": {
                    "type": "string",
                    "defaultValue": "raw"
                }
            },
            "folder": {
                "name": "Data Lakes"
            },
            "annotations": [],
            "type": "Parquet",
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",
                    "folderPath": {
                        "value": "@concat(dataset().source, '/', dataset().namespace, '/', dataset().entity, '/', dataset().partition)",
                        "type": "Expression"
                    },
                    "fileSystem": {
                        "value": "@dataset().container",
                        "type": "Expression"
                    }
                },
                "compressionCodec": "gzip",
                "useParquetV2": true
            },
            "schema": []
        },
        "type": "Microsoft.DataFactory/factories/datasets"
    }

5 Replies

  • Nikhilver
    Copper Contributor

    Hi, can you please show precisely where you put the "Support V1 data types" setting mentioned above?

  • adaardor
    Copper Contributor

    In your linked service connection settings, have you tried the additional connection properties to support the v1 data types?
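
    If it helps, after enabling it the linked service typeProperties should contain the flag alongside the connection details, something like the sketch below. I believe the JSON property is named supportV1DataTypes, but the server value here is a placeholder and the exact property name is worth verifying against your own linked service JSON after ticking the option in the UI:

    "typeProperties": {
        "server": "myhost.example.com:1521/myservice",
        "authenticationType": "Basic",
        "username": "my_user",
        "supportV1DataTypes": true
    }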

     

     

