After a Spark groupBy, is all data with the same grouping field in the same partition?

Suppose an RDD has ten partitions. When you call groupBy on this RDD, you get a new RDD. Is all the data with the same value of the grouping field in the same partition?

My test results show that all data with the same value of the grouping field ends up in the same partition, and that data with other values of the grouping field can also exist in that partition.
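A check like the one described above can be sketched as follows. This is only a minimal sketch, assuming a local Spark installation; the object name GroupByPartitionCheck, the sample data and the grouping function are made up for illustration. It tags every group with the index of the partition it lives in.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object, not part of Spark.
object GroupByPartitionCheck {
  // Returns (groupingKey, partitionIndex) for every group produced by groupBy.
  def keyToPartition(sc: SparkContext): Array[(Int, Int)] = {
    // Ten input partitions, as in the question; the data itself is made up.
    val rdd = sc.parallelize(1 to 1000, numSlices = 10)
    val grouped = rdd.groupBy((n: Int) => n % 25) // RDD[(Int, Iterable[Int])]
    grouped
      .mapPartitionsWithIndex((idx, groups) => groups.map { case (key, _) => (key, idx) })
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("groupby-partition-check")
    val sc = new SparkContext(conf)
    try {
      val pairs = keyToPartition(sc)
      // After groupBy each key occurs exactly once, so it lives in exactly one partition.
      assert(pairs.map(_._1).distinct.length == pairs.length)
      // Print which keys share each partition.
      pairs.groupBy(_._2).toSeq.sortBy(_._1).foreach { case (partition, ks) =>
        println(s"partition $partition -> keys ${ks.map(_._1).sorted.mkString(", ")}")
      }
    } finally sc.stop()
  }
}
```

With more keys than partitions, the printed lines should list several keys per partition, which matches the observation that different grouping values can share a partition.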

Follow-up question:
(1) If all data with the same value of the grouping field is in the same partition, then groupByRdd.mapValues, called on the result of groupBy, receives all the values for that field at once. When the amount of data is large,
groupByRdd.mapValues(_.toList.sortBy(...)) will cause a memory overflow.
Is this understanding correct?
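If the goal of the mapValues call above is only to get the values of each key in sorted order, one commonly used alternative is a "secondary sort": move the value into a composite key, partition on the original key only, and let repartitionAndSortWithinPartitions sort inside the shuffle instead of building a list per key. The sketch below assumes that goal; the partitioner class KeyPartPartitioner, the object SecondarySortSketch and the sample records are hypothetical names and data, not anything from the question.

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Hypothetical partitioner: routes a composite (key, value) record by its key part only,
// so all records of one key still land in the same partition.
class KeyPartPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(composite: Any): Int = composite match {
    case (key, _) =>
      val mod = key.hashCode % numPartitions
      if (mod < 0) mod + numPartitions else mod // keep the index non-negative
  }
}

object SecondarySortSketch {
  // Returns the records with the values of each key in ascending order.
  def sortedWithinKeys(sc: SparkContext): Array[(String, Int)] = {
    // Made-up sample data.
    val records = sc.parallelize(Seq(("a", 3), ("b", 9), ("a", 1), ("b", 2), ("a", 2)), 3)
    records
      .map { case (k, v) => ((k, v), ()) } // composite key, empty payload
      .repartitionAndSortWithinPartitions(new KeyPartPartitioner(2))
      .map { case ((k, v), _) => (k, v) }  // back to plain (key, value) pairs
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("secondary-sort"))
    try sortedWithinKeys(sc).foreach(println)
    finally sc.stop()
  }
}
```

The records of one key then arrive as a sorted run inside a partition and can be consumed as an iterator (for example with mapPartitions) without first materializing a List per key.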

Mar.11,2021

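With groupByKey, all values for the same key end up in the same partition.
In theory, mapValues should overflow if the data for one key is too large, but I tested it with a very large data set and saw no overflow.
Judging from the documentation and the logs, my guess is that when a subsequent action triggers the mapValues operation, the data behind the mapValues RDD is serialized to disk, so it is probably not read into memory all at once but read and written a little at a time through disk. Note, however, that the Spark API documentation for groupByKey states that, as currently implemented, it must be able to hold all the key-value pairs for any single key in memory, so a key with too many values can still cause an OutOfMemoryError.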
