Symptoms
Assume that you create a PolyBase external table that uses a PARQUET file as its data source in Microsoft SQL Server 2016 or SQL Server 2017. The PARQUET file is split into multiple files in Hadoop Distributed File System (HDFS), and each file is larger than the HDFS block size. In this situation, when you query data from this external table, duplicate rows may be returned.
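The following T-SQL is a minimal sketch of the kind of setup described above, followed by one way to check whether a query returns duplicate rows. All object names, the HDFS address, the folder path, and the column list (HadoopCluster, ParquetFormat, dbo.SalesExternal, SaleId, Amount) are hypothetical and are not taken from this article. The duplicate check also assumes that SaleId is unique in the source data.

```sql
-- Hypothetical Hadoop data source; replace the address with your own name node.
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://namenode.example.com:8020'
);

-- File format that tells PolyBase the external files are PARQUET.
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (
    FORMAT_TYPE = PARQUET
);

-- External table over a folder that holds the split PARQUET files.
-- The path and columns are illustrative assumptions.
CREATE EXTERNAL TABLE dbo.SalesExternal
(
    SaleId INT NOT NULL,
    Amount DECIMAL(18, 2) NULL
)
WITH (
    LOCATION = '/data/sales/',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = ParquetFormat
);

-- Lists key values that are returned more than once.
-- Only meaningful if SaleId is unique in the source files.
SELECT SaleId, COUNT(*) AS Occurrences
FROM dbo.SalesExternal
GROUP BY SaleId
HAVING COUNT(*) > 1;
```

If the last query returns rows even though the source files contain each SaleId only once, the result is consistent with the symptom described in this article.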
Resolution
This issue is fixed in the following cumulative updates for SQL Server:
Cumulative Update 1 for SQL Server 2017
Each new cumulative update for SQL Server contains all the hotfixes and all the security fixes that were included with the previous cumulative update. We recommend that you check for the latest cumulative update for your version of SQL Server.
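To see which update level an instance is currently running, a query such as the following may help. This is a sketch that uses the built-in SERVERPROPERTY function; the column aliases are arbitrary, and ProductUpdateLevel can return NULL on builds that do not report a cumulative update level.

```sql
-- Reports the build number, servicing level, and cumulative update level
-- of the instance that the query runs against.
SELECT
    SERVERPROPERTY('ProductVersion')     AS ProductVersion,      -- build number
    SERVERPROPERTY('ProductLevel')       AS ProductLevel,        -- for example RTM or SPn
    SERVERPROPERTY('ProductUpdateLevel') AS ProductUpdateLevel;  -- for example CUn, or NULL
```

Compare the reported update level with the cumulative update listed in this section to decide whether the fix is already installed.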
Status
Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.
References
Learn about the terminology that Microsoft uses to describe software updates.