The Sims™ Technical Aspects

CFP File Format


The following information is not based on any proprietary knowledge or restricted documentation—it was entirely derived from observation, experiment, and public information, thus it may be inaccurate or incomplete.

Analyzed by Greg Noel

A CFP (Compressed Floating Point) file contains only IEEE single-precision (32-bit) floating point numbers in a compressed format.  The game uses this format to describe an animation sequence: a series of positions ("poses") for the character, much like the frames of a film.  Each frame describes a slightly different position, so that the illusion of motion is achieved.

Unlike virtually all the other game formats, this format is not self-describing.  Nothing in the file says how many values there are or how the values are separated into different categories.  Instead, that information is stored in a skill in a CMX or BCF file.  The skill specifies how many frames (sets of values) of offset and rotation values there are, and which frames are associated with each bone being animated.

The CFP file format uses a simple compression algorithm to save space and reduce load time.  It encodes over 27.5 million floating point values into less than 16 megabytes.  Uncompressed, at four bytes per value, that would require over 110 megabytes, so that's about a 7-to-1 compression, which is pretty good for such a simple scheme.

The scheme is not lossless; an encoded value can have less precision than the true value.  Details of the compression algorithm are still being worked out, but it is believed that values retain at least three significant digits (which is enough for an animation sequence). 

This table describes how the values are organized for some number of offsets and rotations.  The first column gives how many values are in each run; the runs appear in the file in the order shown:

CFP Contents

Number       Value
offsets      Offset X values
offsets      Offset Y values
offsets      Offset Z values
rotations    Rotation W values
rotations    Rotation X values
rotations    Rotation Y values
rotations    Rotation Z values

The skill (in the CMX or BCF file) specifies how many total frames of offsets and how many total frames of rotations there are.  Offsets are a three-space position requiring three values (X, Y, and Z).  Rotations are a quaternion requiring four values (W, X, Y, and Z).
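As an illustration of this layout, here is a minimal Python sketch that slices an already-decoded flat list of values into the seven runs.  The function names and the dictionary keys are hypothetical, chosen only for this example; the counts are assumed to come from the skill.

```python
from typing import Dict, List, Sequence, Tuple


def split_channels(values: Sequence[float], offsets: int,
                   rotations: int) -> Dict[str, List[float]]:
    """Slice a flat list of decoded values into the seven runs of a CFP file."""
    expected = 3 * offsets + 4 * rotations
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    # Order of the runs as given in the "CFP Contents" table.
    layout = [
        ("offset_x", offsets), ("offset_y", offsets), ("offset_z", offsets),
        ("rotation_w", rotations), ("rotation_x", rotations),
        ("rotation_y", rotations), ("rotation_z", rotations),
    ]
    channels: Dict[str, List[float]] = {}
    pos = 0
    for name, count in layout:
        channels[name] = list(values[pos:pos + count])
        pos += count
    return channels


def offset_frames(channels: Dict[str, List[float]]) -> List[Tuple[float, float, float]]:
    """Regroup the three offset runs into one (x, y, z) tuple per frame."""
    return list(zip(channels["offset_x"], channels["offset_y"],
                    channels["offset_z"]))
```

Regrouping the rotation runs into (W, X, Y, Z) quaternions would work the same way with a four-way zip.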

Each value in the file is encoded with the method described below. 

Since the compression algorithm (see below) operates on the similarity between successive values, the best way to allocate the values is to have all the X values together, all the Y values together, and so on. 

(Don Hopkins points out that many of the graphics transformations are uniformly applied to all the elements.  Long vectors of values make the code simpler and are more cache-friendly, so they run faster.) 

CFP values are encoded using one of three techniques.  (It's possible there's a fourth way, but there are no known samples.)

Float Value

Offset  Size (bytes)  Value
0       1             0xFF
1       4             Float value

Individual values are encoded as a five-byte sequence.  The first byte is 0xFF and is followed by a four-byte IEEE floating point number encoded in little-endian order.  This method is used only for values that cannot be encoded by some other technique (below). 
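A minimal Python sketch of this five-byte sequence, using the standard struct module.  The function names and the LITERAL constant are hypothetical names for this example.

```python
import struct
from typing import Tuple

LITERAL = 0xFF  # marker byte that introduces a full four-byte float


def encode_literal(value: float) -> bytes:
    """Return the five-byte sequence: 0xFF, then a little-endian float."""
    return bytes([LITERAL]) + struct.pack("<f", value)


def decode_literal(data: bytes, pos: int = 0) -> Tuple[float, int]:
    """Read a literal at data[pos]; return (value, position after it)."""
    if data[pos] != LITERAL:
        raise ValueError(f"byte at {pos} is not 0xFF")
    (value,) = struct.unpack_from("<f", data, pos + 1)
    return value, pos + 5
```

For example, encode_literal(1.0) yields the bytes FF 00 00 80 3F, since 1.0 is 0x3F800000 as a single-precision float.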

Of all the values encoded this way, the smallest found was -15 and the largest was 16.1833.  Only about 120 of them are greater than ten in absolute value.  Given this restricted range, it might have been possible to encode the values even more densely, in four (or even three) bytes.

Repeat Previous

Offset  Size (bytes)  Value
0       1             0xFE
1       2             Repeat count

The previous value may be repeated by use of a three-byte sequence.  The first byte is 0xFE and the next two bytes are a zero-relative little-endian integer (that is, zero means that the value is repeated once for a total of two occurrences).  Repeat counts are surprisingly useful, as parts of the Sim's body may be motionless (or rigid for rotations) for extended portions of an animation; this allows those portions to be represented efficiently. 
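The zero-relative count is easy to get wrong by one, so here is a minimal Python sketch of how the three-byte sequence might be written and applied.  The function names are hypothetical, and the sketch assumes the reading given above: a stored count of n produces n + 1 additional copies of the previous value.

```python
import struct
from typing import List

REPEAT = 0xFE  # marker byte that introduces a repeat count


def encode_repeat(extra_copies: int) -> bytes:
    """Encode 'repeat the previous value extra_copies more times'."""
    if not 1 <= extra_copies <= 0x10000:
        raise ValueError("a single repeat code covers 1 to 65536 copies")
    # The stored count is zero-relative: 0 means one extra copy.
    return bytes([REPEAT]) + struct.pack("<H", extra_copies - 1)


def apply_repeat(values: List[float], data: bytes, pos: int = 0) -> int:
    """Expand the repeat code at data[pos] onto values; return the next position."""
    if data[pos] != REPEAT:
        raise ValueError(f"byte at {pos} is not 0xFE")
    if not values:
        raise ValueError("nothing to repeat")
    (stored,) = struct.unpack_from("<H", data, pos + 1)
    values.extend([values[-1]] * (stored + 1))
    return pos + 3
```

With values starting as [2.5], applying the bytes FE 00 00 leaves [2.5, 2.5]: one original plus one repeat, for two occurrences in total.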

Always using a two-byte count is not as efficient as it could be.  Although the largest count found is 3190, repeat counts of 256 and below outnumber larger counts by a factor of better than 20-to-1, so having another code with a one-byte count would increase the compression by an additional percent or two.  (It's possible that the otherwise unused code of 0xFD was intended for this use.)

Table Value

Offset  Size (bytes)  Value
0       1             Table code

In addition to providing a float value using a 0xFF code, values can be provided through a one-byte code.  This code causes the previous value to be changed by a small offset (a delta).  Each table code causes a different delta to be applied.

One-byte compression codes occur in two ranges: one range from 0 to 119 (120 values) and one range from 133 to 252 (also 120 values).  The first range applies a negative delta and the second range a positive delta. 

Although research can provide approximations for the code values, the exact values would be impossible to reverse-engineer.  Dave Baum fit the values to an equation; his suggested formula is this:

	f(x) = 3.9676e-10 * (x-126)^3 * abs(x-126)

Note that f(126) is zero, so a compression code of 126 should repeat the previous value.  One would expect that a value that is repeated once or twice would appear as one or two deltas of zero instead of the longer three-byte repeat sequence, yet this is not done. 
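Putting the three techniques together, a decoder might look like the following Python sketch.  It is an approximation under stated assumptions, not the game's actual routine: the deltas come from Dave Baum's fitted formula rather than the real table, the value count is supplied by the caller (from the skill), and the stream is assumed to start with a 0xFF literal because the starting "previous value" is not documented here.  The names table_delta and decode_cfp are hypothetical.

```python
import struct
from typing import List


def table_delta(code: int) -> float:
    """Approximate delta for a one-byte table code (Dave Baum's fit)."""
    d = code - 126
    return 3.9676e-10 * d ** 3 * abs(d)


def decode_cfp(data: bytes, count: int) -> List[float]:
    """Decode `count` values from a CFP byte stream."""
    values: List[float] = []
    pos = 0
    while len(values) < count:
        code = data[pos]
        if code == 0xFF:
            # Literal: four-byte little-endian float follows.
            (value,) = struct.unpack_from("<f", data, pos + 1)
            values.append(value)
            pos += 5
            continue
        if not values:
            # Assumption: a stream cannot start with a repeat or a delta.
            raise ValueError("stream does not start with a literal")
        if code == 0xFE:
            # Repeat: zero-relative count of extra copies.
            (stored,) = struct.unpack_from("<H", data, pos + 1)
            values.extend([values[-1]] * (stored + 1))
            pos += 3
        elif code <= 119 or 133 <= code <= 252:
            # Table value: previous value plus an approximate delta.
            values.append(values[-1] + table_delta(code))
            pos += 1
        else:
            raise ValueError(f"unknown code {code:#04x} at byte {pos}")
    return values
```

For a skill with some number of offsets and rotations, count would be 3 * offsets + 4 * rotations.  Because the formula is only a fit, values rebuilt from long chains of deltas can drift from what the game computes; the sketch also accumulates in Python's double precision, which may differ from the original arithmetic.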

Even if it is now possible to decode existing animations, it's not completely clear how to apply this compression scheme to new animations.  See the research notes for the latest details. 

In the worst case, it should still be possible to create new animations simply by encoding all float values using the five-byte sequence (plus repeats when possible) instead of using table values.  A new animation might be several times larger than it could be if it were fully encoded (and the delay to load it correspondingly longer), but it should still work compatibly in the game.
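A minimal Python sketch of that fallback encoder, using only 0xFF literals and 0xFE repeats.  The function names are hypothetical, and whether the game accepts such a file has not been verified here.  Values are compared after rounding to single precision, so a run is collapsed only when the stored bytes would be identical.

```python
import struct
from typing import Iterable, Optional


def _emit_run(out: bytearray, run: int) -> None:
    """Append repeat codes covering `run` extra copies of the previous value."""
    while run > 0:
        chunk = min(run, 0x10000)  # one code covers at most 65536 copies
        out.append(0xFE)
        out.extend(struct.pack("<H", chunk - 1))  # zero-relative count
        run -= chunk


def encode_uncompressed(values: Iterable[float]) -> bytes:
    """Encode values with literals and repeats only (no table codes)."""
    out = bytearray()
    prev: Optional[bytes] = None  # packed bytes of the previous value
    run = 0                       # pending extra copies of prev
    for value in values:
        packed = struct.pack("<f", value)
        if packed == prev:
            run += 1
            continue
        _emit_run(out, run)
        run = 0
        out.append(0xFF)
        out.extend(packed)
        prev = packed
    _emit_run(out, run)
    return bytes(out)
```

A repeat code costs three bytes against five for a literal, so the sketch uses one even for a single repeated value.  The values must be passed in file order (all offset X values, then Y, and so on) for the result to match the layout described earlier.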



Copyright © 2001-2008 Dave Baum and Greg Noel. All rights reserved.
The Sims™ is a trademark of Maxis and Electronic Arts.
This page was last modified Wednesday, 24-Mar-2004 13:06:59 UTC.