FIX: Queries against PolyBase external tables return duplicate rows in SQL Server 2016 and 2017

Applies to: SQL Server 2016 Developer, SQL Server 2016 Enterprise, SQL Server 2016 Enterprise Core

Symptoms


Assume that you create a PolyBase external table that uses PARQUET files as its data source in Microsoft SQL Server 2016 or SQL Server 2017. The PARQUET data is split into multiple files in Hadoop Distributed File System (HDFS), and each file is larger than the HDFS block size. In this situation, when you query the external table, duplicate rows may be returned.
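
The following T-SQL is a minimal sketch of the kind of setup described above, followed by a query that lists rows returned more than once. It assumes that PolyBase is installed and that Hadoop connectivity is already configured. All object names, the name node address, the HDFS path, and the column list are hypothetical placeholders and do not come from this article. Note that the final query also reports rows that are legitimately duplicated in the source files, so compare its output with the source data before you conclude that this issue is the cause.

```sql
-- Hypothetical names: HadoopCluster, ParquetFormat, dbo.SalesExternal,
-- the name node address, and the HDFS path are placeholders.
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://namenode.example.com:8020'
);

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (
    FORMAT_TYPE = PARQUET
);

-- LOCATION points to an HDFS folder that contains several PARQUET files,
-- each of them larger than the HDFS block size.
CREATE EXTERNAL TABLE dbo.SalesExternal (
    SaleId INT NOT NULL,
    Amount DECIMAL(18, 2) NULL
)
WITH (
    LOCATION = '/data/sales/',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = ParquetFormat
);

-- List every row that the external table returns more than once.
SELECT SaleId, Amount, COUNT(*) AS Occurrences
FROM dbo.SalesExternal
GROUP BY SaleId, Amount
HAVING COUNT(*) > 1;
```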

Resolution


This issue is fixed in the following cumulative updates for SQL Server:

       Cumulative Update 1 for SQL Server 2017

       Cumulative Update 6 for SQL Server 2016 RTM

       Cumulative Update 6 for SQL Server 2016 SP1
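
To check whether an instance already has one of these updates, you can query its build information. The following is a sketch that uses the built-in SERVERPROPERTY function. ProductUpdateLevel is expected to return a value such as CU6 on builds that report it and NULL otherwise, so treat the output as a hint and compare ProductVersion with the build number that is published for the relevant cumulative update.

```sql
-- Report the build, the servicing level (RTM or SPn), and the cumulative update level.
SELECT
    SERVERPROPERTY('ProductVersion')     AS ProductVersion,
    SERVERPROPERTY('ProductLevel')       AS ProductLevel,
    SERVERPROPERTY('ProductUpdateLevel') AS ProductUpdateLevel;
```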

Status


Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.
 

References


Learn about the terminology that Microsoft uses to describe software updates.