If you're going to write image-processing code, or any other code that manipulates huge arrays of bytes, you'd better assume an endianness and read and write 32-bit or larger values, or your code will be dog slow. There's no way around this. If it has to be portable, you'll probably have to write two different versions, one for each endianness.
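To make that concrete, here's a rough sketch of what the "two versions" look like for one simple operation: swapping the R and B channels of an RGBA buffer. The function names and the choice of operation are mine, purely for illustration. The `__BYTE_ORDER__` macros are GCC/Clang predefines, so on other compilers you'd need a different check; the fallback to little-endian below is an assumption, not a guarantee. The loads go through `memcpy` to stay clear of alignment and aliasing trouble, and optimizing compilers generally turn that into a plain 32-bit load.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time reference: swap R and B in an RGBA buffer.
   Works on any endianness, but touches memory one byte at a time. */
void swap_rb_bytes(uint8_t *px, size_t n_pixels) {
    for (size_t i = 0; i < n_pixels; i++) {
        uint8_t r = px[4 * i];
        px[4 * i] = px[4 * i + 2];
        px[4 * i + 2] = r;
    }
}

/* Word-at-a-time version: one 32-bit load and store per pixel.
   The masks and shifts depend on how the four bytes land in the word,
   so there is one variant per endianness. */
void swap_rb_words(uint8_t *px, size_t n_pixels) {
    for (size_t i = 0; i < n_pixels; i++) {
        uint32_t w;
        memcpy(&w, px + 4 * i, sizeof w);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        /* Big-endian: bytes R,G,B,A load as 0xRRGGBBAA. */
        w = (w & 0x00FF00FFu)            /* keep G and A */
          | ((w & 0xFF000000u) >> 16)    /* R moves to B's slot */
          | ((w & 0x0000FF00u) << 16);   /* B moves to R's slot */
#else
        /* Assumed little-endian: bytes R,G,B,A load as 0xAABBGGRR. */
        w = (w & 0xFF00FF00u)            /* keep A and G */
          | ((w & 0x000000FFu) << 16)    /* R moves to B's slot */
          | ((w & 0x00FF0000u) >> 16);   /* B moves to R's slot */
#endif
        memcpy(px + 4 * i, &w, sizeof w);
    }
}
```

Both functions should leave the buffer in the same state; the only difference is how many bytes each memory access moves. Whether the word version is actually faster on your compiler and target is something to measure rather than take on faith.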