8 bytes that’ll drive you mad

Today I discovered a very strange bug in my software: after opening the PkgInfo file inside an app package with the internal viewer in Files, Files crashes at random and complains about memory corruption.
Something like this:
Thread 1: EXC_BAD_ACCESS (code=1, address=0x0)
or this:
Files(85331,0x7fff7e8cc310) malloc: *** error for object 0x600000018a70: Heap corruption detected, free list canary is damaged
*** set a breakpoint in malloc_error_break to debug

The full story follows.


It was odd, since this viewer easily handles gigabyte-scale files. So I blamed the latest refactoring and checked previous releases. They crashed too, with the same symptoms.
OK, I started investigating.
PkgInfo is an 8-byte file containing the text APPL????.
I copied this file to another location (some system-level locks, maybe?) and viewed it. Crash.
Renamed the file. Crash.
Created another 8-byte file with content 01234567 and tried it. Crash.
Created 7-byte and 9-byte files and tried them. Worked like a charm. Hmm…
It was definitely something related to alignment logic somewhere in the low-level modules.
My suspicions fell on the data analysis module, which answers questions like “what is the likely encoding of this chunk of data?” or “is this chunk of data binary or text?”.
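(For context, a crude version of the binary-vs-text question such a module answers could look like the sketch below; the real module is far more elaborate and, as it turned out, entirely innocent.)

#include <cstddef>

// Naive illustration only: call data "binary" if it contains a NUL byte or
// too many control characters. A real analyzer also guesses encodings,
// checks for BOMs, validates UTF-8 sequences and so on.
static bool LooksLikeBinary(const unsigned char *bytes, size_t size)
{
    size_t control = 0;
    for(size_t i = 0; i < size; ++i) {
        if(bytes[i] == 0)
            return true;                 // NUL almost never appears in text
        if(bytes[i] < 0x20 && bytes[i] != '\n' && bytes[i] != '\r' && bytes[i] != '\t')
            ++control;                   // bare control characters are a bad sign
    }
    return size != 0 && control * 10 > size; // more than ~10% control chars
}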
I turned this module off and tried the 8-byte file again. Crash.
Things got worse: sometimes the viewer crashed even before its own logic (layout, rendering, navigation etc.) got to work.
In desperation I started tracking everything down from the very beginning of the viewer’s initialization and came to my string decoders. This was at least a place where memory could get corrupted, so I switched the encoding from UTF-8 to Western (Mac OS Roman) to see if it made any difference. Worked like a charm. Hmm…
A few minutes later I finally stopped at a relatively old UTF-8 conversion function:
void InterpretUTF8BufferAsIndexedUniChar(
    const unsigned char* _input,
    size_t _input_size,
    unsigned short *_output_buf, // should be at least _input_size 16b words long
    uint32_t *_indexes_buf,      // should be at least _input_size 32b words long
    size_t *_output_sz,          // size of an output
    unsigned short _bad_symb     // something like '?' or U+FFFD
)

And after tracing the decoding of the 8 bytes “01234567” I finally stopped at a humble line:
    *_output_buf = 0;
}
That’s the zero-termination of the resulting UniChar buffer. But! -> // should be at least _input_size 16b words long
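To make the off-by-one easier to see, here is a minimal sketch of what the body of such a decoder might look like; it is only an illustration for plain ASCII input, not the real implementation (which handles multi-byte UTF-8 sequences), but it shows where the final write ends up:

#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not the real code: decodes ASCII only and stands in
// for everything else with _bad_symb.
void InterpretUTF8BufferAsIndexedUniChar(const unsigned char* _input,
                                         size_t _input_size,
                                         unsigned short *_output_buf,
                                         uint32_t *_indexes_buf,
                                         size_t *_output_sz,
                                         unsigned short _bad_symb)
{
    unsigned short *begin = _output_buf;
    for(size_t i = 0; i < _input_size; ++i) {
        *_indexes_buf++ = (uint32_t)i;              // byte offset of this character in _input
        *_output_buf++  = _input[i] < 0x80 ?
                              _input[i] :           // plain ASCII maps one-to-one
                              _bad_symb;            // stand-in for real multi-byte decoding
    }
    *_output_sz = (size_t)(_output_buf - begin);    // == _input_size in this simplified sketch
    *_output_buf = 0;   // the humble line: this writes element [_input_size],
                        // one past the documented minimum buffer size
}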
When I looked into the upper-level calling code, I found a precisely sized memory allocation for this buffer:
m_DecodeBuffer = (UniChar*) calloc(m_FileWindow->WindowSize(), sizeof(UniChar));
So when the file is 8 bytes long, the allocated buffer holds exactly 8 UniChars (16 bytes), with no slack left by malloc (when it was 7 or 9 bytes, the allocation size was rounded up to some boundary, so the stray zero terminator landed in unused padding and had no visible effect). Zeroing one element past the end (m_DecodeBuffer[8] = 0) corrupts the heap control structures that follow, which causes a crash later.
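The failure is easy to reproduce outside the viewer as well. Here is a standalone sketch (the names are mine, not the viewer’s) that performs the same out-of-bounds write; building it with AddressSanitizer (clang++ -fsanitize=address) should flag it immediately as a heap-buffer-overflow:

#include <cstddef>
#include <cstdlib>

typedef unsigned short UniChar;   // the same 16-bit character type

int main()
{
    const size_t file_size = 8;   // the "unlucky" length
    // exactly file_size elements, mirroring the viewer's allocation
    UniChar *decode_buffer = (UniChar *)calloc(file_size, sizeof(UniChar));

    // ...decoding of 8 ASCII bytes fills decode_buffer[0..7]...

    decode_buffer[file_size] = 0; // one element past the end: with 7- or 9-byte
                                  // files malloc's rounding leaves enough slack,
                                  // but here the write lands on whatever the
                                  // allocator keeps right after the block

    free(decode_buffer);
    return 0;
}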
Of course, this zero-termination was unnecessary, since none of the later functions operating on the UniChar buffer rely on a zero terminator.
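So the line can simply go away. Had the terminator actually been needed, the alternative would have been to reserve room for it in the caller, something like this (a sketch reusing the names from the allocation above):

// Hypothetical alternative fix: reserve one extra UniChar for the terminator.
m_DecodeBuffer = (UniChar*) calloc(m_FileWindow->WindowSize() + 1, sizeof(UniChar));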
The quest was complete.
