The Sims™ Technical Aspects

CMX/BCF File Format


The following information is not based on any proprietary knowledge or restricted documentation—it was entirely derived from observation, experiment, and public information, thus it may be inaccurate or incomplete.

Analyzed by Greg Noel.  Thanks go to Don Hopkins, who corrected the nomenclature and clarified a few points. 

The BCF format (.bcf) is a binary-encoded form of the CMX (.cmx) ASCII files, thus it usually occurs as a filename with the suffix .cmx.bcf.  It is not known what CMX stands for, but it's possible that BCF stands for Binary CMX Format. 

The BCF format is an exact match, field-for-field, with an equivalent CMX-format file.  Some values in the CMX file have more precision than is possible in the BCF file, suggesting that the .cmx file is the master source and .cmx.bcf files are normally generated from .cmx files. 

The CMX/BCF format has three parts, each describing a different type of data. 

In the description below, when discussing the binary BCF format:

In the description below, when discussing the text CMX format:

Some .cmx files have floating point values that are around 1e-45 in absolute value.  This is too small to fit in a short float (it's within the range of a long float) so the corresponding value in the .cmx.bcf file is the smallest possible, around 1e-38.  This means that it's not always possible to convert between file formats without losing information. 
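
The loss of information can be illustrated with a small Python sketch. It assumes that a "short float" is a 32-bit IEEE 754 single and that the conversion replaces any magnitude below the smallest normalized single with that smallest value, as the observed files suggest. The helper name clamp_to_single is hypothetical and is not the tool that produced the .cmx.bcf files.

```python
import struct

# Smallest positive normalized IEEE 754 single (bit pattern 0x00800000), about 1.18e-38.
SMALLEST_NORMAL = struct.unpack("<f", struct.pack("<I", 0x00800000))[0]

def clamp_to_single(value):
    """Hypothetical model of the CMX-to-BCF float conversion."""
    if value != 0.0 and abs(value) < SMALLEST_NORMAL:
        # Too small for a normalized single: substitute the smallest one.
        return SMALLEST_NORMAL if value > 0 else -SMALLEST_NORMAL
    # Otherwise round through a 4-byte float, as a binary file would.
    return struct.unpack("<f", struct.pack("<f", value))[0]

cmx_value = 1e-45
bcf_value = clamp_to_single(cmx_value)
print(cmx_value, bcf_value)   # the two values differ, so the round trip is lossy
```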

BCF Format
Offset Size Value
0 4 Count of skeletons
4 var Skeleton
var var . . .
var 4 Count of suits
var var Suit
var var . . .
var 4 Count of skills
var var Skill
var var . . .

The CMX format has two lines of identifying information at the beginning and then follows the same structure as the BCF format:

	// Character File. Copyright 1997, Maxis Inc.
	version 300
	integer-skeletons
	<skeleton> ...
	integer-suits
	<suit> ...
	integer-skills
	<skill> ...

Each part of the CMX/BCF format has a count followed by that many elements of the type.  The three types are skeletons, suits, and skills.  Each of these types is discussed in detail below. 
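
The count-then-elements layout of the text format could be read along the following lines. This is only a sketch: the function read_cmx_sections and the readers mapping are names invented for illustration, the caller is assumed to have stripped whitespace from each line, and the one-line stub reader stands in for the real skeleton, suit, and skill readers.

```python
def read_cmx_sections(lines, readers):
    """Read the three counted sections of a CMX text file.

    `lines` is an iterator over stripped text lines; `readers` maps a section
    name to a function that consumes exactly one element from `lines`.
    """
    comment = next(lines)   # the "// Character File. ..." line
    version = next(lines)   # the "version 300" line
    if not comment.startswith("//") or not version.startswith("version"):
        raise ValueError("not a CMX file")
    result = {}
    for section in ("skeletons", "suits", "skills"):
        count = int(next(lines))
        result[section] = [readers[section](lines) for _ in range(count)]
    return result

# Placeholder reader that pretends every element is a single line.
stub = lambda it: next(it)
sample = iter([
    "// Character File. Copyright 1997, Maxis Inc.",
    "version 300",
    "0",
    "1", "suit-stub",
    "0",
])
sections = read_cmx_sections(sample, {"skeletons": stub, "suits": stub, "skills": stub})
```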

No CMX/BCF file has been observed with more than one non-zero count.  That is, exactly one count is non-zero and the other two counts are zero.  This is not an intrinsic restriction; it would be possible to combine all of the CMX/BCF data into a single huge file.  Placing related content (for example, all of the animations for a particular interaction) into a single file was more convenient for the artists who created the work. 

BCF Property List (Props)
Offset Size Value
0 4 Sublist count
4 var Sublist 1

Property Sublist
Offset Size Value
0 4 Prop count
4 var Prop name 1
var var Prop value 1
var var Prop name 2
var var Prop value 2
var var . . .

A Property List is a set of attributes that may be attached to various points in the structure.  The same structure is used in each of the places, so we describe it here.  The contents of the property list are informally called props.

The CMX format follows the same structure as the BCF format:

	integer-sublist-count
	integer-prop-count
	string-prop-name
	string-prop-value
	... (repeated pairs of strings)

A property list is a strangely complex item.  On the one hand, it appears to be a two-dimensional ragged array, but on the other hand, it has never been observed to have more than one sublist, so it's not obvious why a simple vector was not chosen. 

The sublist count has only been observed to have the values zero and one.  Instead of being a count, Don Hopkins suggests it's a flag saying that a single sublist is attached.  If that is the case, however, it's harmless to treat it as a count.  (If it's a flag, note that it's redundant, since the count within the sublist must necessarily be non-zero.) 

A sublist is simply a count (the prop count) followed by the specified number of pairs of strings.  The first string is the name of the property and the second string is the value.  The name and value are arbitrary; their meaning is assigned by convention. 

The prop count is almost always one; in a few very rare cases it is two.  No other values have been observed. 
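
A reader for the binary property list might look like the sketch below. The description above does not say how BCF encodes its primitive values, so two assumptions are made here and should be checked against real files: integers are taken to be 4-byte little-endian, and strings are taken to be one length byte followed by the text. All function names are invented for the example.

```python
import struct

def read_u32(data, pos):
    """Read a 4-byte little-endian integer (the byte order is an assumption)."""
    return struct.unpack_from("<I", data, pos)[0], pos + 4

def read_string(data, pos):
    """Read a string stored as one length byte plus text (an assumption)."""
    length = data[pos]
    text = data[pos + 1:pos + 1 + length].decode("latin-1")
    return text, pos + 1 + length

def read_props(data, pos):
    """Return (list of sublists, new position); each sublist is a list of pairs."""
    sublist_count, pos = read_u32(data, pos)
    sublists = []
    for _ in range(sublist_count):
        prop_count, pos = read_u32(data, pos)
        pairs = []
        for _ in range(prop_count):
            name, pos = read_string(data, pos)
            value, pos = read_string(data, pos)
            pairs.append((name, value))
        sublists.append(pairs)
    return sublists, pos

# One sublist holding the single pair ("name", "adult").
blob = struct.pack("<II", 1, 1) + b"\x04name" + b"\x05adult"
props, end = read_props(blob, 0)
```

Treating the first field as a count rather than a flag costs nothing: a value of zero simply yields an empty list.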



About skeletons

(Dem bones, dem bones, dem dry bones.  The foot bone's connected to the ankle bone.  The ankle bone's connected to the shin bone.  The shin bone's connected to the knee bone.  The knee bone's connected to the thigh bone.  The thigh bone's connected to the hip bone.  The hip bone's connected to the back bone.  The back bone's connected to the neck bone.  The neck bone's connected to the jaw bone.  The jaw bone's connected to the head bone.   . . .)

BCF Skeleton
Offset Size Value
0 var Skeleton name
var 4 Number of bones (N)
var var Bone 1
var var Bone 2
var var Bone 3
var var . . .
var var Bone N

The CMX format follows the same structure as the BCF format, with each element on its own line:

	string-skeleton-name
	integer-number-of-bones
	<bone> ...

The skeleton CMX/BCF information begins with a string name.  For the four known skeleton files, the string value is either adult, child, dog, or kat.  It is not believed possible to have other than the four standard skeletons in the game, since the skeleton is chosen based upon a one-character code that matches the first letter of the skeleton name (a, c, d, or k, respectively). 

The number of bones is 29 for human skeletons, 37 for the dog skeleton, and 40 for the cat skeleton.  These numbers are not wired into the game, so it should be possible to have a modified skeleton as long as the existing bones remain in the same relationship. 

Each bone describes a connection between two names.  The skeleton describes a tree.  The tree is rooted at the name ROOT and the leaves are the head, fingers, and toes.  Don Hopkins informs us that this is a restriction of the modeling tool used; the game doesn't care if the skeleton is a more general structure, such as an acyclic graph or even a full network. 

Since the skeleton is a tree, each bone also introduces a new primary name.  That is, ROOT and each bone name occur exactly once as a primary name.  (NULL is not the name of a bone and does not occur as a primary name; it's only used as the parent for ROOT.) 
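
These properties are easy to check once the bone and parent names have been read. The following sketch builds a child list for every bone and rejects input that is not a tree rooted at ROOT. The function name build_tree is invented, and apart from the ROOT and PELVIS link the parent links in the sample fragment are illustrative guesses rather than values taken from a skeleton file.

```python
def build_tree(bones):
    """bones: list of (name, parent) pairs in file order.

    Returns a dict mapping each name to the list of its children, after
    checking that the bones form a tree rooted at ROOT.
    """
    names = [name for name, _ in bones]
    if len(set(names)) != len(names):
        raise ValueError("a bone name occurs more than once")
    if "ROOT" not in names:
        raise ValueError("no ROOT bone")
    children = {name: [] for name in names}
    for name, parent in bones:
        if parent == "NULL":
            if name != "ROOT":
                raise ValueError("only ROOT may have NULL as its parent")
        elif parent not in children:
            raise ValueError("unknown parent " + parent)
        else:
            children[parent].append(name)
    # Every bone must be reachable from ROOT, otherwise there is a cycle.
    seen, stack = set(), ["ROOT"]
    while stack:
        node = stack.pop()
        seen.add(node)
        stack.extend(children[node])
    if len(seen) != len(names):
        raise ValueError("some bones are not connected to ROOT")
    return children

fragment = [("ROOT", "NULL"), ("PELVIS", "ROOT"), ("SPINE", "PELVIS"), ("L_LEG", "PELVIS")]
tree = build_tree(fragment)
```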

When animating, the ROOT acts as the center of gravity and is moved in a smooth line.  The actual skeleton is animated relative to this center of gravity, giving a more natural motion. 

BCF Skeleton Bone
Offset Size Value
0 var Name of bone
var var Parent bone
var var Props
var 12 X,Y,Z position
var 16 Rotation quaternion
var 4 Can-translate flag
var 4 Can-rotate flag
var 4 Suits can blend
var 4 Wiggle value
var 4 Wiggle power

The CMX format follows the same structure as the BCF format:

	string-bone-name
	string-parent-bone
	<props>
	| float-x float-y float-z |
	| float-x float-y float-z float-w |
	flag-can-translate
	flag-can-rotate
	flag-can-blend
	integer-wiggle-value
	float-wiggle-power

Each skeleton bone is itself a variable-length structure.  It contains two strings naming body parts and a whole bunch of numbers. 
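
As a sketch of how the binary bone record could be decoded, the following reads the fields in the order given in the table. The Reader class and read_bone are invented names, and the low-level encoding (little-endian fields, strings as one length byte plus text, 4-byte floats) is an assumption that is not confirmed by the description. The sample bytes are made up to exercise the code and do not come from a real skeleton file.

```python
import struct

class Reader:
    """Cursor over BCF bytes; little-endian fields and length-prefixed
    strings are assumptions, not facts taken from the description."""
    def __init__(self, data):
        self.data, self.pos = data, 0
    def unpack(self, fmt):
        values = struct.unpack_from("<" + fmt, self.data, self.pos)
        self.pos += struct.calcsize("<" + fmt)
        return values
    def string(self):
        (length,) = self.unpack("B")
        text = self.data[self.pos:self.pos + length].decode("latin-1")
        self.pos += length
        return text
    def props(self):
        (sublists,) = self.unpack("I")
        result = []
        for _ in range(sublists):
            (count,) = self.unpack("I")
            result.append([(self.string(), self.string()) for _ in range(count)])
        return result

def read_bone(r):
    """Read one bone: two names, the props, then 48 bytes of numbers."""
    bone = {"name": r.string(), "parent": r.string(), "props": r.props()}
    bone["position"] = r.unpack("3f")    # X, Y, Z
    bone["rotation"] = r.unpack("4f")    # X, Y, Z, W
    (bone["can_translate"], bone["can_rotate"], bone["can_blend"],
     bone["wiggle_value"]) = r.unpack("4i")
    (bone["wiggle_power"],) = r.unpack("f")
    return bone

# Made-up sample: a ROOT bone with no props and an identity rotation.
blob = (b"\x04ROOT" + b"\x04NULL" + struct.pack("<I", 0)
        + struct.pack("<3f", 0.0, 0.0, 0.0)
        + struct.pack("<4f", 0.0, 0.0, 0.0, 1.0)
        + struct.pack("<4i", 1, 1, 0, 0) + struct.pack("<f", 0.0))
reader = Reader(blob)
bone = read_bone(reader)
```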

Different modeling tools treat skeleton components differently.  A component could be an unnamed bone between two named joints.  A component could be an unnamed joint between two named bones.  A component could be a joint, connected to another named joint.  Or a component could be a bone, connected to another named bone. 

Since the skeleton is describing a tree, these interpretations are equivalent for the most part; the major difference is whether the name refers to the joint or the bone.  Based on the names, we assume that each component describes a bone, so we will call them that and otherwise ignore the fine semantic distinctions. 

The supposition is that the component as a whole describes something about how the two bones are oriented in space and how they move relative to each other.  What is being specified must be reasonably complex, since there are an awful lot of values for each bone. 

There are two names that don't correspond to true bones that provide assistance in articulating the skeleton: NULL and ROOT.  NULL is used as the (nonexistent) parent for ROOT.  ROOT is the base of the tree and is only connected to the PELVIS.

The 28 names corresponding to bones in the human skeletons are PELVIS, SPINE, SPINE1, SPINE2, NECK, HEAD, R_ARM, R_ARM1, R_ARM2, R_HAND, R_FINGER0, R_LEG, R_LEG1, R_FOOT, R_TOE0, R_TOE01, R_TOE02, L_ARM, L_ARM1, L_ARM2, L_HAND, L_FINGER0, L_LEG, L_LEG1, L_FOOT, L_TOE0, L_TOE01, and L_TOE02.  The presence of the "finger" on each hand is probably to make implementation of a pointing finger easier.  It's not obvious why there are so many toe bones.  (Don Hopkins says the fingers and toes are vestigial, from the modeling tool they were using, and are never used.  However, some hands have been observed to bend, so it may be that only the toes are unused.)

Only one property has been observed.  The property is "name" and the value has been "adult" or "child," respectively.  It is attached to the PELVIS bone for the adult skeleton and to the ROOT bone for the child.  Neither the dog nor the cat skeletons have a property (although there is a strange empty suit in the same file).  Note that the name is the same as the skeleton as a whole so it's not obvious why this attribute is present.  It's most likely a glitch (Don Hopkins confirms this). 

The position is the direction (and implicitly, the length) of this bone.  For the text CMX format, these three values are on the same text line with a vertical bar at the beginning and end of the line.  If the orientation is the same as other graphic objects, the x direction is from side to side, the y direction is forward and back, and the z direction is up and down. 

The rotation quaternion is four floats (X, Y, Z, W) used to rotate a bone smoothly.  For the text CMX format, these four values are on the same text line with a vertical bar at the beginning and end of the line.  The rotation quaternion is always normalized.  The first three floats are the direction vector matching the corresponding floats in the position vector; the fourth float is the amount to rotate. 
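
The normalization claim can be tested, and the "amount to rotate" recovered, with a few lines of Python. The sketch assumes the usual unit-quaternion convention, in which W is the cosine of half the rotation angle and (X, Y, Z) is the rotation axis scaled by the sine of half the angle. Whether the game follows exactly this convention has not been verified, and both function names are invented.

```python
import math

def is_normalized(q, tolerance=1e-4):
    """True if the quaternion (x, y, z, w) has unit length."""
    return abs(math.sqrt(sum(c * c for c in q)) - 1.0) <= tolerance

def axis_and_angle(q):
    """Split a unit quaternion (x, y, z, w) into an axis and an angle in radians."""
    x, y, z, w = q
    w = max(-1.0, min(1.0, w))           # guard against rounding error
    angle = 2.0 * math.acos(w)
    s = math.sqrt(1.0 - w * w)           # equals sin(angle / 2)
    if s < 1e-6:
        return (1.0, 0.0, 0.0), 0.0      # no rotation: the axis is arbitrary
    return (x / s, y / s, z / s), angle

# A quarter turn about the z axis.
q = (0.0, 0.0, math.sin(math.pi / 4), math.cos(math.pi / 4))
axis, angle = axis_and_angle(q)
```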

The names of the three flags come from the blueprint website and have been confirmed by Don Hopkins.  Their exact meaning is unknown. 

The two wiggle quantities are either both zero or both non-zero.  When they are non-zero, the wiggle value has very large values ending in five zeros when displayed as a decimal integer (the largest observed value is 700000) and the wiggle power typically has values of a few hundredths (such as 0.02 or 0.05).  They are not exact reciprocals of each other, but they do seem to vary inversely.  Don Hopkins says they are left over from an attempt to use Perlin noise to introduce some randomness into the animations, so that an animation would look a little different each time it was run. 



About suits

Farewell, Monsieur Traveler, look you lisp, and wear strange suits.

BCF Suits
Offset Size Value
0 var Suit name
var 4 Suit type
var 4 zero (Props?)
var 4 Count of skins
var var Skin
var var . . .

The CMX format follows the same structure as the BCF format:

	string-suit-name
	integer-suit-type
	integer-zero
	integer-skins
	<skin> ...

The suit name almost always begins with "suit-" and has either the name of a character ("clown") or the name of an item ("flask") after that.  The remainder of the name is usually suggestive of some activity (thus, suit-clown-spin or suit-flask-drink). 

The suit type almost always has the value zero.  The exceptions are the two censor files (adult-censor and child-censor) where it has the value one.  At a guess, it is a flag that sets off the blur effect.  (The blur effect is more complex than this simple supposition, but this flag does seem to be related somehow.)  (Don Hopkins says zero is a "normal" suit and one is a censorship bounding box.  There may be other types as well, such as clipping optimizations.)

The unknown integer is always zero.  Its use is unknown.  (Don Hopkins has said that just about every element of a .cmx file can have props.  If so, this is the only candidate for this element.  This could be the first props field, which would always be zero if there are never any properties.  Some experimentation should resolve this.) 

The count of skins says how many skins there are.  Each skin specifies a file containing mesh and texture information.  Originally, skins were wrapped around a single bone, so to wrap the entire body, a suit would normally specify a list of skins covering most of the bones.  With the advent of deformable meshes, skins needed to attach to more than one bone, so bone names are now specified in the mesh itself.  (Each mesh is suspended in space relative to the bone(s) and is then draped with the texture.) 

Although the suit is still used to attach a whole-body skin, the suit is now primarily used to attach accessories.  Multiple accessories may be attached to the same bone (for example, adding both a clown hat and a clown nose to the head) or accessories may be attached to different bones (for example, adding a dust pan to one hand and a brush to the other).  Some special effects are done by attaching and detaching skins (the roach can may or may not have spray, but the skin for the can is unchanged).  Presumably, all of these can be mixed-and-matched in any combination. 

BCF Skin
Offset Size Value
0 var Bone name
var var Skin name
var 4 Censor flag bits
var 4 zero (Props?)

The CMX format follows the same structure as the BCF format:

	string-bone-name
	string-skin-name
	censor-flag-bits
	integer-zero

The bone name matches a name from the skeleton definition, e.g., L_HAND.  The primary bone used by the skin is redundantly named here for backward compatibility. 

The skin name is the basename of the file containing the default skin (a SKN or BMF format file) that is to be displayed relative to the bone.  (That is, the skin name is the filename without the path prefix and without the .skn or .bmf extension.)  It invariably begins "xskin-" with the rest being mnemonic of the use.  The skin name is case-insensitive, presumably because filenames are case-insensitive on the reference platform. 

Non-zero flags have only been observed in the censor files.  The bits are specified in Behavior.iff STR# resource 178.  At a guess, they are how the flag bits in the censorship field of the person data structure are mapped to the specific (name of the) bone to be blurred. 

The unknown integer is always zero.  It could be reserved for expansion, but the exact use is unknown.  (As above, if this element has props, this is the only candidate.  Some experimentation should also resolve this.) 
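
Putting the suit and skin layouts together, a suit in the text format could be read as sketched below. The function name read_suit is invented, the caller is assumed to supply stripped lines, and the two skin names in the sample are hypothetical (only the "xskin-" prefix and the suit name pattern come from the description above).

```python
def read_suit(lines):
    """Read one suit from an iterator of stripped CMX text lines."""
    suit = {
        "name": next(lines),
        "type": int(next(lines)),       # 0 = normal, 1 = censor
        "unknown": int(next(lines)),    # always zero; possibly props
        "skins": [],
    }
    for _ in range(int(next(lines))):
        suit["skins"].append({
            "bone": next(lines),
            "skin": next(lines),
            "censor_flags": int(next(lines)),
            "unknown": int(next(lines)),
        })
    return suit

# Hypothetical suit attaching two accessories to the same bone.
sample = iter([
    "suit-clown-spin", "0", "0", "2",
    "HEAD", "xskin-clown-hat", "0", "0",
    "HEAD", "xskin-clown-nose", "0", "0",
])
suit = read_suit(sample)
```

If the unknown integers turn out to be props, the two int(next(lines)) calls for "unknown" would have to be replaced by a property list reader.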



About skills

All in the golden afternoon
Full leisurely we glide,
For both our oars with little skill
By little arms are plied.

BCF Skill
Offset Size Value
0 var Skill name
var var Animation name
var 4 Duration?
var 4 Distance?
var 4 Moving flag?
var 4 Total position entries
var 4 Total rotation entries
var 4 Count of motions
var var Motion
var var . . .

The CMX format follows the same structure as the BCF format:

	string-skill-name
	string-animation-name
	float-duration
	float-distance
	flag-moving
	integer-total-positions
	integer-total-rotations
	integer-motions
	<motion> ...

The CMX/BCF skill entry is used in conjunction with an animation in a CFP file.  Unlike the other file types used by the game, a CFP file is not self-describing and it must be interpreted in conjunction with a skill.  The skill entry provides the information missing from the CFP file in order to use the animation. 

Interactions are always from a Sim (adult or child) to some other thing (adult, child, or object).  Thus, a skill name begins with a2a-, a2c-, a2o-, c2a-, c2c-, or c2o- and the rest of the name is a mnemonic for what it does.  For example, a2o-make-bed is the interaction for the adult to make the bed.  The skill name is the full interaction label, such as a2o-make-bed.  It is probably not case-sensitive. 
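
The naming convention can be decoded mechanically. The sketch below splits a skill name into the acting Sim, the target, and the mnemonic; split_skill_name and ROLES are invented names, and lower-casing the input reflects the guess that names are not case-sensitive.

```python
ROLES = {"a": "adult", "c": "child", "o": "object"}

def split_skill_name(name):
    """Return (actor, target, mnemonic) for a name such as a2o-make-bed."""
    prefix, _, action = name.lower().partition("-")
    if (len(prefix) != 3 or prefix[1] != "2"
            or prefix[0] not in "ac" or prefix[2] not in "aco"):
        raise ValueError("not an interaction name: " + name)
    return ROLES[prefix[0]], ROLES[prefix[2]], action

print(split_skill_name("a2o-make-bed"))
```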

The animation name is the basename of the CFP file containing the animation.  (That is, the animation name is the filename without the path prefix and without the .cfp extension.)  It invariably begins "xskill-" with the rest of the name being mnemonic of the action.  Often, the mnemonic is the same as the skill name ("xskill-a2o-make-bed") but sometimes it uses a related name ("xskill-adult-make-bed").  Animations can be used by more than one skill; a skill named a2o-coke-drink might use xskill-a2o-pepsi-drink for the animation sequence.  The animation name is case-insensitive, presumably because filenames are case-insensitive on the reference platform. 

The duration is a value that appears throughout the skill.  Its value has never been observed to be other than an integer, and usually a four-digit integer; why it's encoded as a float is a mystery.  The value changes from skill to skill, but values ending in 999 are more common.  It's correlated with the number of frames, so it's probably the duration of the animation (in thousandths of a second?). 

The distance is a float that is non-zero for activities that apparently require some sort of movement.  It is probably the distance traveled (in grid units?) during the animation.  The moving flag is zero except when the distance is non-zero, in which case it is one.  (Why both are required is yet another mystery.) 

The animation file contains a set of position entries and a set of rotation entries.  The total position entries field tells how many position entries there will be in the animation file.  The total rotation entries field tells how many rotation entries there will be in the animation file.  They are the sum of the respective frame counts within the motions. 

Each motion probably specifies the starting position for the animation, so that the game can drive the character to that position while loading the animation.  The count of motions tells how many bones are animated.  Not all bones are animated; in fact, it is common for an animation to affect only a part of the body (the upper body or the head). 

BCF Motions
Offset Size Value
0 var Bone name
var 4 Animation frames
var 4 Duration?
var 4 Positions-used flag
var 4 Rotations-used flag
var 4 Position offset
var 4 Rotation offset
var var Props
var 4 Count of time props
var var Time prop
var var . . .

The CMX format follows the same structure as the BCF format:

	string-bone-name
	integer-animation-frames
	float-duration
	flag-positions
	flag-rotations
	offset-to-positions
	offset-to-rotations
	<prop> ...
	number-of-time-props
	<time-prop> ...

The bone name is a name from the skeleton definition, e.g., L_HAND.

The animation frames field tells how many frames will be generated when animating this bone.  Although it is possible for bones to use different frame values (that is, bones would stop moving at different times), this could lead to strange animations.  In practice, all the bones for a given skill have the same number of animation frames. 

The duration from the top level repeats here, once for each bone.  It is not known why the value is repeated or what would happen if the value were not the same.  It's probably here for the same reason that the animation frames value is present in each motion, rather than being present once in the skill header, but this is just speculation. 

If the positions-used flag is set, then there will be one position entry in the animation file for each animation frame.  The first XYZ position value for this bone is the position offset field, which is a zero-relative index (that is, if the offset value is ten, the first value for this bone is the eleventh one.)  If the positions-used flag is zero, then the position offset is -1. 

Position offsets in successive bones increase by the number of frame entries.  That is, if the positions-used flag is set, the position offset for one bone will be the next available offset after the range used by the prior bone that has a positions-used flag set.  (The offset on the first bone is zero.) 

Similarly, if the rotations-used flag is set, then there will be one rotation entry in the animation file for each animation frame.  The first WXYZ rotation quaternion for this bone is the zero-relative rotation offset field.  Furthermore, rotation offsets in successive bones also increase by the number of frame entries. 

The rotations-used flag has never been observed to be zero.  In the unlikely event that the motion only contains positions, presumably the rotations-used flag would be zero and the rotation offset would be -1. 

It's interesting to note that the offset values and the totals in the header are completely redundant.  They could have been calculated from the number of animation frames and the two flags.  In fact, if rotations must be present (which makes sense, since a bone that always stayed at a fixed angle to its parent wouldn't look very reasonable), then only one flag would be needed. 
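
The redundancy can be made concrete with a sketch that recomputes the offsets and both totals from nothing but the frame counts and flags, following the rules described above. The function name layout_motions and the sample motions are invented for illustration.

```python
def layout_motions(motions):
    """motions: list of (frames, positions_used, rotations_used) tuples.

    Returns (offsets, total_positions, total_rotations), where offsets holds
    one (position_offset, rotation_offset) pair per motion and -1 marks
    an unused track.
    """
    offsets = []
    next_position = next_rotation = 0
    for frames, positions_used, rotations_used in motions:
        position_offset = rotation_offset = -1
        if positions_used:
            position_offset = next_position
            next_position += frames
        if rotations_used:
            rotation_offset = next_rotation
            next_rotation += frames
        offsets.append((position_offset, rotation_offset))
    return offsets, next_position, next_rotation

# Three bones of 30 frames each; only the first also carries positions.
offsets, total_positions, total_rotations = layout_motions(
    [(30, True, True), (30, False, True), (30, False, True)])
```

A reader could use the same calculation as a consistency check on the values stored in the file.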

Only two props have been observed in skills and those only occur in skills with "test" in their name.  (As far as can be determined, no skill used in the game has attributes.  Maybe they were intended for future flexibility or maybe they are something that is no longer used.)  One is "canblend" with a value of "0" (a zero) and the other is "name" with a value of "adult." They are always attached to the ROOT bone.  Even though these attributes are attached to a bone, the names suggest that they apply to the whole skeleton, so why they are attached to a bone is a mystery. 

The count of time props has only been observed to have the values zero and one, so this may be a flag indicating that a time prop is present.  However, in that case, treating it as a count is at worst harmless.  (If it's a flag, note that it's redundant, since the count within the time prop must necessarily be non-zero.) 

BCF Time prop
Offset Size Value
0 4 Count of times
4 var Time list
var var . . .

The CMX format follows the same structure as the BCF format:

	integer-time-lists
	<time-list> ...

The time prop is a very strange object.  It is a vector that is itself a member of a vector, and it has a substructure that is potentially an array, so it would be possible to construct quite a complex structure (essentially a three-dimensional ragged array).  Fortunately, there's never more than one time prop and it's rare that a contained time has more than one item, so actual occurrences tend to be pretty simple. 

The count of times is followed by that many time lists.  This is the only level of the time prop that varies very much.  Usually there are only a few time lists (it's rare to have more than ten) but there are observed cases of up to 153 time lists. 

BCF Time List
Offset Size Value
0 4 Time
4 var Event sublist

The CMX format follows the same structure as the BCF format:

	integer-time
	<event-sublist>

An event is an action that is triggered during an animation.  At the specified time, the events in the sublist are evaluated and the associated activity executed. 

The observed time values range from zero to 14866, but the vast majority of values are below 3000.  Virtually all of the values end in 00, 33, 34, 66, or 67 (and most of the exceptions are near one of those values; 04 and 95, for example).  Of the common values, every other one is skipped (i.e., the sequence starts 000, 067, 133, . . .), so this suggests the time is measured in thousandths of a second and actions are scheduled about every one-fifteenth of a second (but sometimes there's a need for very fine resolution).  The animation frame rate is believed to be fifteen frames per second, so this is in good agreement. 
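
The endings 00, 33, 34, 66, and 67 are what results from rounding multiples of one-thirtieth of a second to whole milliseconds, and the common values fall on every second such tick. The sketch below snaps a time value to the nearest tick and reports how far off it is; nearest_tick is an invented name, and the interpretation of the unit as milliseconds is the supposition made above, not a confirmed fact.

```python
def nearest_tick(time_ms, ticks_per_second=30):
    """Return (tick index, deviation in ms) for the closest 1/30 s tick."""
    tick = round(time_ms * ticks_per_second / 1000)
    return tick, time_ms - tick * 1000 / ticks_per_second

for t in (0, 67, 133, 200, 14866):
    tick, deviation = nearest_tick(t)
    # An even tick index corresponds to a frame boundary at fifteen frames per second.
    print(t, tick, tick % 2 == 0, round(deviation, 3))
```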

The event sublist (format is the same as the property sublist described above) contains the events that will be fired at the specified time.  Each such event has a name and a value; the name is the type of action to take and the value is passed to the action routine. 
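
A binary time prop could then be decoded as in the sketch below, which returns the schedule as a list of (time, events) pairs. As elsewhere, the encoding of integers (4-byte little-endian) and strings (one length byte plus text) is an assumption, all function names are invented, and the sample bytes are made up rather than taken from a game file.

```python
import struct

def read_i32(data, pos):
    """4-byte little-endian integer (the byte order is an assumption)."""
    return struct.unpack_from("<i", data, pos)[0], pos + 4

def read_string(data, pos):
    """One length byte followed by the text (an assumption)."""
    length = data[pos]
    return data[pos + 1:pos + 1 + length].decode("latin-1"), pos + 1 + length

def read_time_prop(data, pos):
    """Return ([(time, [(event name, value), ...]), ...], new position)."""
    schedule = []
    time_count, pos = read_i32(data, pos)
    for _ in range(time_count):
        time, pos = read_i32(data, pos)
        event_count, pos = read_i32(data, pos)
        events = []
        for _ in range(event_count):
            name, pos = read_string(data, pos)
            value, pos = read_string(data, pos)
            events.append((name, value))
        schedule.append((time, events))
    return schedule, pos

def pstr(text):
    """Encode a string the way read_string expects it (test helper)."""
    return bytes([len(text)]) + text.encode("latin-1")

blob = (struct.pack("<i", 2)
        + struct.pack("<ii", 0, 1) + pstr("footstep") + pstr("1")
        + struct.pack("<ii", 67, 1) + pstr("xevt") + pstr("100"))
schedule, end = read_time_prop(blob, 0)
```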

Although virtually all of these events are in lower case, apparently they are not case sensitive, as there are a few that are in upper case or mixed case.  (Of course, those may be errors; some of the events are clearly misspelled, so the auditing of events is not what it could be.) 

Some of the actions taken for the events are well understood, while some are not.  Some experimentation may help to resolve the functionality.  It is believed that some events are obsolete but have never been removed. 

By far the most common event is "xevt" with an integer value.  The values are in several disjoint ranges.  There's a range from 0 to 21, then a range from 100 to 109, then a range from 200 to 204, then a final range from 300 to 305. 

In the Sims virtual machine, the "animate" instruction is used to start an animation.  If the animation completes, the instruction returns success.  However, if an "xevt" event is encountered, the instruction returns false and places the event value where the code can examine it.  Once the code has handled the event, it goes back to the "animate" instruction to continue the animation.  These events are deeply embedded in the game; many special-purpose events that used to be "burned in" to the animation logic have now been replaced by these general-purpose events. 
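
The control flow described here resembles a resumable loop. The following toy model is not the game's virtual machine; it only illustrates the pattern of returning to the caller on each "xevt" and resuming afterwards. The function animate, the state dictionary, and the sample timeline are all invented, with event values picked from the observed ranges.

```python
def animate(timeline, state):
    """Advance the animation held in `state` until it ends or hits an "xevt".

    timeline: list of (time, [(event name, value), ...]) sorted by time.
    Returns (True, None) on completion, or (False, value) for an "xevt".
    """
    while state["index"] < len(timeline):
        _, events = timeline[state["index"]]
        while state["event"] < len(events):
            name, value = events[state["event"]]
            state["event"] += 1
            if name == "xevt":
                return False, int(value)
        state["index"] += 1
        state["event"] = 0
    return True, None

timeline = [(0, [("footstep", "1")]),
            (667, [("xevt", "100")]),
            (1333, [("xevt", "101")])]
state = {"index": 0, "event": 0}
handled = []
while True:
    done, value = animate(timeline, state)
    if done:
        break
    handled.append(value)   # the calling code reacts to the event here
```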

Another common event is "footstep" with an integer range of -2 to 2.  In the game, the sound level for footsteps is set by the "closeness" to the source and by the type of flooring, so this could be an adjustment to the volume.  The high value could represent running (heavy footsteps), while the low value could be when a character is tiptoeing. 

The last common event is "anchor" with an integer range of 0 to 2.  It's applied to feet, so it could be the "stickiness" of the foot relative to the ground (that is, how much "weight" the character has on it).  For example, a foot with an anchor value of zero could move freely, while a foot with an anchor value of two would move the character instead.  By alternating the anchor value while moving the feet, the character could be made to walk or run.  Not all animations that cause movement use this event, so it may well be obsolete. 

A less common event value is "interruptable" [sic, unless they are referring to a table of interrup values] with an integer range of 0 to 2.  It could be an indication of how hard it is to abort the animation (it's clear that some activities, once started, cannot be stopped). 

A possibly useful event is "sound."  The value usually ends in "vox," suggesting that these are sounds that are emitted during an activity.  This is probably a newly-implemented event since it only occurs in add-on objects downloaded from Maxis (the turkey and the Pepsi machine), so it may be used more in the future. 

Other potentially interesting events are "translation" (the values are numeric, so it's not obvious what is intended) and "pixelate" (which you would think controls the blur effect, but only occurs in one file, adult-toilet1-sittinggo, so it's probably obsolete and replaced by a general-purpose event). 

Oh, and the event "foo" (with a value of "bar") deserves an honorable mention; it's "a metasyntactic expression of angst" in c2o-idle-armscrossed with no obvious effect. 



Copyright © 2001-2008 Dave Baum and Greg Noel. All rights reserved.
The Sims™ is a trademark of Maxis and Electronic Arts.
This page was last modified Wednesday, 03-Mar-2004 18:22:22 UTC.