IO:Cross Platform IO Tips

From GPWiki

Cross-Platform I/O

If you are writing cross-platform code that reads or writes data from/to a file or over the network, then you have to be aware of issues related to language, compiler, and platform differences. The major issues are data alignment, padding, sizes, byte-order, and certain language-specific issues.

Alignment

On some machines, a value must be stored at an address that is a multiple of its alignment (often its size); for example, a 4-byte integer may need to sit on a 4-byte boundary. If the data is not aligned properly, the results can vary from poor performance to a crash. When you read data from an I/O source into memory, make sure it is aligned correctly.
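One way to stay safe, sketched below in C, is to copy the bytes out of the raw I/O buffer with memcpy instead of casting the buffer pointer to a wider type. The helper name load_u32_at is hypothetical and only serves the illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch a 32-bit value that starts at an arbitrary
   (possibly misaligned) offset inside a raw I/O buffer. */
uint32_t load_u32_at(const unsigned char *buf, size_t offset)
{
    uint32_t value;

    /* Risky alternative: *(const uint32_t *)(buf + offset) may crash or
       run slowly when buf + offset is not suitably aligned. */
    memcpy(&value, buf + offset, sizeof value); /* value itself is aligned */
    return value; /* bytes are interpreted in this machine's byte order */
}
```

The copy only deals with alignment; the result still depends on the byte order of the machine that runs it.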


Padding

"Padding" is space added between elements in aggregated data, usually to ensure that they are properly aligned. The amount of padding may differ between compilers and platforms.

C/C++

Do not assume that the size of a struct and the location of its members are the same for all compilers and platforms. Do not read or write an entire struct at once, because the program that writes the data might pad it differently than the program that reads it. This applies to fields, too.
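As a rough illustration of the alternative, the sketch below writes and reads a struct one member at a time into a byte buffer with a fixed layout. The struct Record, its members, and the choice of little-endian order for the stored bytes are all assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical record; its in-memory layout may contain padding. */
struct Record {
    uint8_t  kind;
    uint32_t score;
};

enum { RECORD_WIRE_SIZE = 5 }; /* 1 + 4 bytes, no padding in the stored form */

/* Write each member separately, least significant byte first. */
void record_write(const struct Record *r, unsigned char out[RECORD_WIRE_SIZE])
{
    out[0] = r->kind;
    out[1] = (unsigned char)(r->score & 0xFF);
    out[2] = (unsigned char)((r->score >> 8) & 0xFF);
    out[3] = (unsigned char)((r->score >> 16) & 0xFF);
    out[4] = (unsigned char)((r->score >> 24) & 0xFF);
}

/* Rebuild each member from the stored bytes. */
void record_read(struct Record *r, const unsigned char in[RECORD_WIRE_SIZE])
{
    r->kind  = in[0];
    r->score = (uint32_t)in[1]
             | ((uint32_t)in[2] << 8)
             | ((uint32_t)in[3] << 16)
             | ((uint32_t)in[4] << 24);
}
```

The buffer can then be passed to whatever file or socket call the program uses; the stored form no longer depends on how a particular compiler lays out struct Record.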

Structure Packing

Some compilers insert padding between struct members in some situations.

In Visual C++, you can use the pack pragma to eliminate this padding:

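#pragma pack(push, 1)   // save the current packing, then pack to 1-byte boundaries
struct WHATEVER {
    ...
};
#pragma pack(pop)       // restore the packing that was in effect before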

This forces the struct members to be 1-byte aligned, at the cost of some efficiency.

GCC can do a similar thing per declaration, without the lingering state of the pragma, using __attribute__((packed)).
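A minimal sketch of the GCC form follows. It assumes a compiler that accepts the packed attribute (GCC or Clang); the struct names Padded and Packed are made up for the comparison.

```c
#include <stdint.h>

/* Default layout: the compiler usually inserts padding after `tag`
   so that `value` lands on an aligned address. */
struct Padded {
    uint8_t  tag;
    uint32_t value;
};

/* GCC/Clang extension: the attribute affects this declaration only,
   so there is no state to restore afterwards. */
struct Packed {
    uint8_t  tag;
    uint32_t value;
} __attribute__((packed));
```

With 8-bit bytes, sizeof(struct Packed) is 5, while sizeof(struct Padded) is typically 8.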

Of course, the main reason that the padding would be different in the first place is that the type sizes are different, and if the type sizes are different, packing won't make the naïve way of doing I/O work.

Sizes of Types

The sizes of variable types can differ depending on the platform and compiler. In C/C++, the size of an intrinsic type is completely up to the compiler (within certain constraints). Do not read or write types whose sizes are not explicitly specified.

C/C++

The following relationships are guaranteed among C++'s types:

	1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)
	sizeof(float) <= sizeof(double) <= sizeof(long double)

The range of char is either the range of signed char or the range of unsigned char, but char is a type distinct from both.

	std::numeric_limits<unsigned char>::max() >= 0xFF
	std::numeric_limits<unsigned short>::max() >= 0xFFFF
	std::numeric_limits<unsigned int>::max() >= 0xFFFF
	std::numeric_limits<unsigned long>::max() >= 0xFFFFFFFF

Take those facts into account when choosing a data type.

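Because of byte ordering and the other issues described here, you should never write the in-memory representation of a type directly, so its exact size matters less than the range of values it must hold. If for some reason you need precisely-sized types, look up C99's stdint.h header (cstdint in C++11 and later) or Boost's boost/cstdint.hpp.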
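The sketch below shows one way the stdint.h types might be used: a value held in a long is range-checked before it is stored in a 32-bit field. The helper name long_to_i32 is hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: store a long in a 32-bit field, reporting failure
   when the value does not fit (long is 64 bits wide on some platforms).
   Returns 1 on success and 0 on failure. */
int long_to_i32(long value, int32_t *out)
{
    if (value < INT32_MIN || value > INT32_MAX)
        return 0;            /* would be truncated on this platform */
    *out = (int32_t)value;
    return 1;
}
```

On a platform where long is already 32 bits the check can never fail, which is exactly why the explicit type is the safer thing to put in a file format.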

The following table describes the usual sizes of types on the x86 architecture (32-bit x86 and 64-bit Windows; on 64-bit Unix-like systems long is 64 bits):

	Fixed-size equivalent       C/C++ types
	int8, uint8                 char, signed char, unsigned char, enum, bool
	int16, uint16               short, signed short, unsigned short, enum
	int32, uint32               int, signed int, unsigned int, long, signed long, unsigned long, enum
	int64, uint64               long long, signed long long, unsigned long long
	int128, uint128             (no standard type)
	float32                     float
	float64                     double

Remember, however, that it would be perfectly legal for char, short, int, and long to all be 42 bits wide.

Correction: In C++ the compiler chooses the size of an enum, but the type must be big enough to hold the largest enumerator. Using this knowledge, you can force an enum's size to be at least a specific size by adding an enumerator with a sufficiently large value.
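A small sketch of that trick, assuming int is at least 32 bits wide (true on x86); the enum and its enumerator names are invented for the example:

```c
#include <limits.h>

/* The large dummy enumerator obliges the compiler to pick a type that can
   represent 0x7FFFFFFF, so the enum cannot shrink to 8 or 16 bits. */
enum Color {
    COLOR_RED,
    COLOR_GREEN,
    COLOR_BLUE,
    COLOR_FORCE_32 = 0x7FFFFFFF /* hypothetical filler, never used as a value */
};
```

This only sets a minimum size. For I/O it is still safer to convert the enum to an integer type of known size before writing it.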

Byte-Order

The byte-order of a value is the order in which its bytes are stored in memory. Different processors store multi-byte numbers in memory in different orders. Little-endian processors store bytes in the order of least significant byte to most significant byte (in other words, backwards from the way numbers are written). Big-endian processors store bytes in the order of most significant byte to least significant byte (the same way numbers are written). If the byte-order of a value does not match the byte-order of the processor that is reading or writing it, then it must be converted. Also, as a way to standardize the order of bytes transmitted over a network, there is a network byte-order, which is big-endian. See Endian Operations for details on converting between the various byte-orders.
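To make the conversion concrete, here is a minimal C sketch that detects the host's byte order at run time and converts a 32-bit value to big-endian order. The function names are hypothetical, and the sketch assumes the host is either little-endian or big-endian, ignoring rarer mixed orders.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24)
         | ((v & 0x0000FF00u) << 8)
         | ((v & 0x00FF0000u) >> 8)
         | ((v & 0xFF000000u) >> 24);
}

/* Returns 1 on a little-endian host, 0 otherwise: look at which byte of
   a known value is stored first in memory. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Convert between host order and big-endian order. The same function
   works in both directions because swapping twice restores the value. */
uint32_t host_to_be32(uint32_t v)
{
    return host_is_little_endian() ? swap32(v) : v;
}
```

After host_to_be32(0x11223344), the first byte of the result in memory is 0x11 on either kind of host.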

Language-Specific Issues

C/C++

char - signed or unsigned?

A little-known fact is that char can be signed or unsigned by default; it is up to the compiler (though you can usually tell the compiler which to use). The result is that when you convert from char to another type (such as int), the result can differ depending on the compiler. For example:

char    x;
int     y;

read( fd, &x, 1 ); // read a byte with the value 0xff
y = x;             // y is either 255 or -1, depending on the compiler

Do not read a value into a generic char. Always specify signed or unsigned.
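The difference can be sketched as follows. The two helper names are hypothetical, and the signed result assumes a two's-complement machine with 8-bit bytes.

```c
/* Widen one byte, as read from a file or socket, to int. */

/* Always yields 0..255, whatever the compiler's default for char. */
int byte_as_unsigned(const void *p)
{
    return *(const unsigned char *)p;
}

/* Always yields -128..127 on two's-complement machines. */
int byte_as_signed(const void *p)
{
    return *(const signed char *)p;
}
```

For a byte with the value 0xff, byte_as_unsigned gives 255 and byte_as_signed gives -1 on every compiler, because the signedness is stated explicitly instead of being left to the default.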