Symptoms

Assume that you create a PolyBase external table that uses a PARQUET file as data source in SQL Server 2017 and Microsoft SQL Server 2016. The PARQUET file is split into multiple files in Hadoop Distributed File System (HDFS), and each file is greater than the block size of HDFS. In this situation, when you query data from this external table, duplicate rows may be returned.

Resolution

This issue is fixed in the following cumulative updates for SQL Server:

       Cumulative Update 1 for SQL Server 2017

       Cumulative Update 6 for SQL Server 2016 RTM

       Cumulative Update 6 for SQL Server 2016 SP1

Each new cumulative update for SQL Server contains all the hotfixes and all the security fixes that were included with the previous cumulative update. Check out the latest cumulative updates for SQL Server:

Latest cumulative update for SQL Server 2017

Latest cumulative update for SQL Server 2016

Status

Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.
 

References

Learn about the terminologythat Microsoft uses to describe software updates.

Need more help?

Expand your skills
Explore Training
Get new features first
Join Microsoft Insiders

Was this information helpful?

What affected your experience?

Thank you for your feedback!

×