
Why did MPEG make two codecs for Point Cloud Compression?

In 2020, MPEG approved the FDIS (Final Draft International Standard, the last step in the preparation of a standard) for two Point Cloud Compression codecs: the first called V-PCC (Video-based Point Cloud Compression) and the second called G-PCC (Geometry-based Point Cloud Compression). A legitimate question is: why two codecs at the same time? Are these codecs in competition with each other, or are they designed for different purposes?

The short answer is yes, V-PCC and G-PCC are each designed for a different type of point cloud: V-PCC for dense point clouds such as those representing natural objects (reconstructed from a set of images) or graphics objects (designed by artists), and G-PCC for sparse point clouds such as those obtained by LIDARs when acquiring large spaces. Of course, one may use V-PCC for sparse point clouds and G-PCC for dense ones, but this would be a misuse and lead to sub-optimal results.

Let’s take some time to explain the reasons behind this two-codec strategy. Point clouds are now used in a variety of applications with very different requirements. The figure below shows two examples:

Both images represent point clouds. The left one (by 8i Labs) was obtained by 3D reconstruction from a set of images captured by cameras positioned around the person; the right one (by MERL) was obtained with a LIDAR mounted on a car. Both point clouds are dynamic. Even a quick visual inspection shows that the first point cloud is dense and the second is sparse.
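
To make “dense” versus “sparse” less hand-wavy, a useful proxy is the average distance from each point to its nearest neighbour, relative to the extent of the scene. The sketch below is purely illustrative (the synthetic data, the SciPy dependency and the point counts are my assumptions, not anything defined by MPEG):

```python
# Illustrative density proxy: mean nearest-neighbour distance.
import numpy as np
from scipy.spatial import cKDTree

def mean_nn_distance(points: np.ndarray) -> float:
    """points: (N, 3) array of XYZ coordinates."""
    tree = cKDTree(points)
    # k=2 because the nearest hit for each query point is the point itself.
    distances, _ = tree.query(points, k=2)
    return float(distances[:, 1].mean())

# Two synthetic clouds occupying the same 10 x 10 x 10 volume,
# differing only in point count.
rng = np.random.default_rng(0)
dense = rng.random((100_000, 3)) * 10.0
sparse = rng.random((1_000, 3)) * 10.0

print(mean_nn_distance(dense))   # small spacing -> "dense"
print(mean_nn_distance(sparse))  # large spacing -> "sparse"
```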

When the (3D) point cloud is dense, its projection on 2D planes remains dense enough (no or few holes in the projected image); it is therefore possible to transform the dense 3D point cloud into a set of images (or videos) and to encode this 2D representation instead of the original 3D one. Applying the same method (3D-to-2D projection) to sparse point clouds is also possible, but it produces poor images (full of holes) that traditional image/video codecs encode badly. V-PCC is based on this mechanism: it projects the 3D point cloud into a set of 2D patches, combines them into a 2D video and encodes it with traditional video codecs. The first lesson here is: do not try to encode sparse point clouds with V-PCC; the result will be sub-optimal.
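
To see why projection favours dense clouds, consider the toy experiment below. This is not the actual V-PCC patch generation (which segments the cloud into patches using point normals before packing them into video frames); it simply projects a cloud orthographically onto the XY plane and measures which fraction of the resulting image is covered:

```python
# Toy 3D-to-2D projection: occupancy of the projected image.
import numpy as np

def projection_occupancy(points: np.ndarray, pixel_size: float = 1.0) -> float:
    """Fraction of occupied pixels after projecting onto the XY plane."""
    xy = np.floor(points[:, :2] / pixel_size).astype(int)
    xy -= xy.min(axis=0)                 # shift indices to start at (0, 0)
    width, height = xy.max(axis=0) + 1
    image = np.zeros((width, height), dtype=bool)
    image[xy[:, 0], xy[:, 1]] = True     # mark every pixel hit by a point
    return float(image.mean())           # 1.0 means no holes at all

rng = np.random.default_rng(0)
dense = rng.random((100_000, 3)) * 100.0
sparse = rng.random((1_000, 3)) * 100.0

print(projection_occupancy(dense))   # close to 1.0: video codecs cope well
print(projection_occupancy(sparse))  # around 0.1: an image full of holes
```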

When the (3D) point cloud is sparse, another representation space is needed to obtain good compression. Many methods have been proposed in the literature, and all of them try to exploit the 3D geometry to construct a predictable space. One method, generic enough, is the decomposition of the 3D space into an octree – a hierarchical segmentation of the volume whose benefit is to quickly discard large empty regions and refine the partitioning only where information exists (non-empty blocks). Such an approach is used in G-PCC and explains its name, Geometry-based Point Cloud Compression. The second lesson here is: G-PCC is designed for sparse point clouds. The question is: can G-PCC be used for dense point clouds as well? The answer is yes but, be aware, this usage is sub-optimal.
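
A minimal sketch of the octree idea is shown below. It is my own illustration, not G-PCC itself: the real coder adds context modelling and entropy coding on top of these occupancy codes, so treat this as an illustration of the data structure, not of the standard.

```python
# Octree subdivision of a cubic volume: one 8-bit occupancy code per
# non-empty node, with empty octants pruned immediately. Large empty
# regions therefore cost almost nothing, which is why this suits
# sparse point clouds.
import numpy as np

def octree_occupancy(points, origin, size, depth):
    """Yield one occupancy byte per non-empty node (depth-first)."""
    if depth == 0 or len(points) == 0:
        return
    half = size / 2.0
    # Three half-space tests give each point a 3-bit octant index.
    octant = ((points - origin) >= half).astype(int)
    codes = octant[:, 0] * 4 + octant[:, 1] * 2 + octant[:, 2]
    occupancy, children = 0, []
    for c in range(8):
        subset = points[codes == c]
        if len(subset) > 0:
            occupancy |= 1 << c          # set the bit of the occupied child
            offset = np.array([(c >> 2) & 1, (c >> 1) & 1, c & 1]) * half
            children.append((subset, origin + offset))
    yield occupancy                      # one byte describes all 8 children
    for subset, child_origin in children:
        yield from octree_occupancy(subset, child_origin, half, depth - 1)

rng = np.random.default_rng(0)
pts = rng.random((500, 3)) * 1024.0      # a sparse cloud in a 1024^3 box
stream = list(octree_occupancy(pts, np.zeros(3), 1024.0, 10))
print(len(stream), "occupancy bytes for", len(pts), "points")
```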

Bottom line: V-PCC is designed (and optimal) for dense point clouds, G-PCC is designed (and optimal) for sparse point clouds. Comparing V-PCC with G-PCC on the same content will systematically show that V-PCC is better (if the content is dense) and that G-PCC is better (if the content is sparse).


Reference Software in MPEG

There are more and more research articles on point cloud compression. Some propose new methods or improvements on top of V-PCC and G-PCC. Others analyse and discuss the various compression approaches and tools in V-PCC and G-PCC. And some present compression results. A common characteristic of many of these articles is that they use the MPEG Reference Software to generate the V-PCC and G-PCC bitstreams. In general, using this Reference Software is fine; however, authors should understand what Reference Software means, how it should be used and how it should not be used.

The purpose of this article is to clarify the role of the MPEG PCC Reference Software.

The main objective of MPEG PCC is to standardize the bitstream syntax of an encoded representation of point cloud data. The direct implication is that the decoder behaviour is standardized (what the decoder should do when reading a bit from the compressed data). A less direct implication is that the encoder is NOT standardized. Any encoder able to output a conformant bitstream is acceptable, by whatever means it uses to create it. By doing so, MPEG allows competition in the market: different organisations can invent new mechanisms and tools to better exploit signal properties and therefore obtain better compression performance. Such a strategy also gives a longer life to MPEG standards, the same version of the decoder being able to decode bitstreams from less or more performant encoders (several generations of Encoders can be built on the same Decoder).

During the incremental production of the standard (the textual specification), MPEG also produces what is called a Test Model (TM), a pair of Encoder and Decoder software implementations used to evaluate the various tools proposed by the entities participating in the standardisation. All tools producing benefits are important, and those that have a normative impact (meaning an impact on the bitstream syntax) are included in the TM for the next iteration. Sometimes tools with a non-normative impact may be included in the encoder, but this is not the general rule. In general, tools with a non-normative impact are kept by companies for use in their own products, thereby obtaining a competitive advantage in the market.

At the end of the standardisation process, the TM containing only the tools retained in the standard becomes the Reference Software. It consists of a Decoder, able to decode the conformant bitstreams, and (sometimes) an Encoder, able to produce them. In the case of V-PCC and G-PCC, there are two TMs and each of them has both the Encoder and the Decoder. The Reference Software is freely distributed by ISO, and many companies use it as a basis for their products, substantially improving it (especially the Encoder) before putting it on the market.

OK, you may say, so what?! This is an old story everybody knows; what is the point here?

As I mentioned at the beginning of this post, many papers use the V-PCC and G-PCC Reference Software and present compression performances. The problem is that they forget to mention that what they compare are the performances of a non-optimized Encoder, whose roles are to help the development of the standard (validate or invalidate tools) during the two years of its creation and to lower the entry barrier for standard users. A compression performance analysis of the Reference Software is not representative of the full capability of the standard. As an example, for MPEG-4 Video, the third generation of commercial encoders (using the same bitstream syntax) was 3 times better than the Reference Software. The same is expected to happen for V-PCC and G-PCC, and then such articles will become obsolete, even before reaching their readers.