The DCT provides energy compaction because of the way its basis functions are formed. Just as cos(0) = 1 and cos(90°) = 0, the lowest-frequency basis function is a constant, so the average value of an image subblock is captured by the single DC coefficient. Since image subblocks are mostly smooth, their energy gets concentrated in the low-frequency coefficients, and the high-frequency coefficients have magnitudes tending towards zero.
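(By a similar argument, the DST is less useful here: sin(0) = 0 and sin(90°) = 1, so it has no constant basis function, and even a flat block spreads its energy over many coefficients. That does not help compression, because we would then have to carry many of the 64 coefficients through zigzag scanning and entropy coding.)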
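
As a rough numerical check of this argument, here is a minimal pure-Python sketch. It assumes the orthonormal DCT-II and DST-II applied to a single 8-sample row of smooth pixel values. The names dct2, dst2, energy_fraction and the sample row are made up for illustration and are not taken from any codec. It compares how much of the signal energy lands in the first two coefficients of each transform.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence (first basis function is constant)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)))
    return out

def dst2(x):
    """Orthonormal DST-II of a 1-D sequence (no constant basis function)."""
    n = len(x)
    out = []
    for k in range(1, n + 1):
        scale = math.sqrt(1.0 / n) if k == n else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.sin(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)))
    return out

def energy_fraction(coeffs, keep):
    """Fraction of total energy held by the first `keep` coefficients."""
    total = sum(c * c for c in coeffs)
    return sum(c * c for c in coeffs[:keep]) / total

# Hypothetical smooth row of pixel values (a gentle ramp on top of a large mean).
row = [100, 102, 104, 106, 108, 110, 112, 114]

dct_frac = energy_fraction(dct2(row), 2)
dst_frac = energy_fraction(dst2(row), 2)

print(f"DCT: {dct_frac:.4f} of the energy in the first 2 coefficients")
print(f"DST: {dst_frac:.4f} of the energy in the first 2 coefficients")
```

For this row the DCT keeps well over 99% of the energy in its first two coefficients, while the DST keeps only about 82%, with the rest spread over the higher-index coefficients. Both transforms are orthonormal here, so the total energy is the same in each case and only its distribution differs.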