Assume that you create a PolyBase external table that uses a PARQUET file as its data source in Microsoft SQL Server 2016 or SQL Server 2017. The PARQUET data is split into multiple files in Hadoop Distributed File System (HDFS), and each file is larger than the HDFS block size. In this situation, when you query data from the external table, duplicate rows may be returned.
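The scenario above can be sketched in T-SQL as follows. This is a minimal illustration only, not the exact configuration in which the issue was reported. All object names, the column definitions, the name node address, and the HDFS path are hypothetical placeholders. The final query assumes that each row in the source data is unique, so any group that has a count greater than 1 would indicate duplicated rows.

```sql
-- Hypothetical names throughout: HadoopSource, ParquetFormat,
-- dbo.SalesExternal, the name node address, and the HDFS path.

-- External data source that points to the Hadoop cluster.
CREATE EXTERNAL DATA SOURCE HadoopSource
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://namenode.example.com:8020'
);

-- External file format that declares the data as PARQUET.
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

-- External table over a folder that contains multiple PARQUET files.
CREATE EXTERNAL TABLE dbo.SalesExternal (
    SaleId INT NOT NULL,
    Amount DECIMAL(18, 2) NULL
)
WITH (
    LOCATION = '/data/sales/',
    DATA_SOURCE = HadoopSource,
    FILE_FORMAT = ParquetFormat
);

-- Assumes rows are unique in the source files: any row returned here
-- appears more than once in the query result.
SELECT SaleId, Amount, COUNT(*) AS Copies
FROM dbo.SalesExternal
GROUP BY SaleId, Amount
HAVING COUNT(*) > 1;
```

Comparing the total row count that the external table returns against a count taken directly from the source files is another way to check whether rows are being duplicated.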
This issue is fixed in the following cumulative updates for SQL Server:
Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.
Learn about the terminology that Microsoft uses to describe software updates.