
A Field Guide to Your Files

A few of my friends occasionally run into problems caused by files whose extension doesn't match their actual format. This can happen when someone accidentally types the wrong extension when saving a file.

For example, my friend Beatrice, who does transcriptions, often receives audio files that are named "something.mp3", but the files are not actually in the MP3 format. They're something else, like AIFF or WAV. Other times, she may receive a file with no extension at all, and she can't figure out what type it is.

When this happens, the software trying to use the file may complain (while other software mysteriously works fine). You can often fix the problem by figuring out the file's actual format and changing its extension to match.

The way I usually spot this problem is by looking at the actual bytes of the file with a hex dump utility. This is the nerd equivalent of opening the hood of a car to investigate a problem. The Mac (and other BSD systems) has a built-in utility called hexdump, which you can run from the Terminal command line. Windows doesn't ship with a hexdump program, but PowerShell's Format-Hex cmdlet gives a similar view, and free hex viewers are easy to find online.

If you're on a Mac and you hate the command line, I suggest HexEdit, a nice graphical hex dumper/editor (and be sure to read the credits).

To use the Mac/BSD command-line hexdump, I typically type something like this:

hexdump -C filename | head

This prints a dump of the beginning of the file (head keeps only the first ten rows). For an ordinary text file, the output looks something like this:

00000000  48 69 20 65 76 65 72 79  6f 6e 65 2c 0a 0a 48 65  |Hi everyone,..He|
00000010  72 65 20 61 72 65 20 73  6f 6d 65 20 63 68 61 6e  |re are some chan|
00000020  67 65 73 20 74 68 61 74  20 77 69 6c 6c 20 68 61  |ges that will ha|

Each row of the display shows 16 bytes from the file. On the left is the offset of the row's first byte from the start of the file, in hexadecimal notation (base 16). Next come the values of those 16 bytes, also in hexadecimal, and finally a very useful display of the same bytes as ASCII characters. If any of the bytes contain readable text, as they do here, you'll be able to read it; bytes that aren't printable characters, such as the two newlines (0a 0a) after "Hi everyone,", show up as dots. With the Mac/BSD hexdump, you won't see the ASCII column unless you include the -C option.
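To make that row layout concrete, here is a minimal Python sketch that formats bytes the way hexdump -C does. The name hexdump_c is my own invention, and the sketch is simplified: the real tool also prints a final line containing the total length and collapses runs of identical rows into a single *, and this sketch does neither.

```python
def hexdump_c(data: bytes) -> list[str]:
    """Format bytes like `hexdump -C`: offset, 16 hex bytes, ASCII column."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        hex_bytes = [f"{b:02x}" for b in chunk]
        # Two groups of eight bytes, each padded so the ASCII column always lines up
        left = " ".join(hex_bytes[:8]).ljust(23)
        right = " ".join(hex_bytes[8:]).ljust(23)
        # Printable ASCII (0x20-0x7e) is shown as-is; everything else becomes "."
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {left}  {right}  |{text}|")
    return lines
```

To look at a real file, you could feed it the first 160 bytes (the ten rows that head shows), for example print("\n".join(hexdump_c(open("something.mp3", "rb").read(160)))), where something.mp3 stands in for whatever file you're curious about.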

I suggest doing this for a few of the common types of files you deal with every day: spreadsheets, images, music files, and so on. You'll notice that files of the same format tend to have a characteristic look, usually a distinctive run of bytes at the very beginning (often called a signature or "magic number"), which can help you identify them.
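Once you have dumps of a few files of the same type, the characteristic part is usually the run of bytes they all share at the start. Here is a small Python sketch that finds it. shared_prefix is a hypothetical helper name, and the sample data is the first two rows of the Adobe Illustrator and Adobe PDF dumps in the field guide below.

```python
def shared_prefix(headers: list[bytes]) -> bytes:
    """Return the leading bytes that every header has in common."""
    if not headers:
        return b""
    first = headers[0]
    shortest = min(len(h) for h in headers)
    for i in range(shortest):
        # Stop at the first position where any header disagrees with the first one
        if any(h[i] != first[i] for h in headers):
            return first[:i]
    return first[:shortest]

# First 32 bytes of the Illustrator (.ai) and PDF dumps from the field guide
ai_header = bytes.fromhex("255044462d312e340d25e2e3cfd30d0a"
                          "312030206f626a3c3c2f506167657320")
pdf_header = bytes.fromhex("255044462d312e340d25e2e3cfd30d0a"
                           "362030206f626a203c3c2f4c696e6561")

print(shared_prefix([ai_header, pdf_header]))  # prints b'%PDF-1.4\r%\xe2\xe3\xcf\xd3\r\n'
```

Notice that the shared run includes the version number "1.4". A file saved as a different PDF version would differ there, so treat the result as a starting point: the dependable part of this particular signature is %PDF-.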

Here are some sample dumps from files I commonly deal with, to serve as a kind of field guide. Most of the characteristic bytes to look for are in the first row of each dump. I hope you find this useful.

GRAPHICS FILES

Adobe Photoshop File (.psd)
00000000  38 42 50 53 00 01 00 00  00 00 00 00 00 04 00 00  |8BPS............|
00000010  0b 71 00 00 10 dd 00 08  00 03 00 00 00 00 00 00  |.q..............|
00000020  6f c4 38 42 49 4d 04 04  00 00 00 00 00 07 1c 02  |o.8BIM..........|

JPEG image (.jpg)
00000000  ff d8 ff e0 00 10 4a 46  49 46 00 01 01 01 00 48  |......JFIF.....H|
00000010  00 48 00 00 ff db 00 43  00 06 04 05 06 05 04 06  |.H.....C........|
00000020  06 05 06 07 07 06 08 0a  10 0a 0a 09 09 0a 14 0e  |................|

PNG image (.png)
00000000  89 50 4e 47 0d 0a 1a 0a  00 00 00 0d 49 48 44 52  |.PNG........IHDR|
00000010  00 00 03 20 00 00 02 58  08 06 00 00 00 9a 76 82  |... ...X......v.|
00000020  70 00 00 0c d9 69 43 43  50 69 63 63 00 00 78 da  |p....iCCPicc..x.|

GIF image (.gif)
00000000  47 49 46 38 39 61 10 00  10 00 b3 0d 00 3f 3f 3f  |GIF89a.......???|
00000010  bf bf bf 2a 2a 2a 55 55  55 7f 7f 7f 15 15 15 40  |...***UUU......@|
00000020  40 40 60 60 60 c0 c0 c0  2f 2f 2f 90 90 90 ff ff  |@@```...///.....|

Adobe Illustrator File (.ai)
00000000  25 50 44 46 2d 31 2e 34  0d 25 e2 e3 cf d3 0d 0a  |%PDF-1.4.%......|
00000010  31 20 30 20 6f 62 6a 3c  3c 2f 50 61 67 65 73 20  |1 0 obj<</Pages |
00000020  32 20 30 20 52 2f 54 79  70 65 2f 43 61 74 61 6c  |2 0 R/Type/Catal|
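The graphics signatures above can be checked with a simple prefix lookup. This Python sketch is only an illustration: the table and the name sniff_image are my own, and it knows only the five formats in this section. It checks just the first three bytes of a JPEG because the fourth varies (e0 for JFIF files like the one above, e1 for files carrying camera Exif data).

```python
# Leading-byte signatures for the graphics formats in the field guide (not exhaustive)
IMAGE_SIGNATURES = {
    b"8BPS": "Adobe Photoshop (.psd)",
    b"\xff\xd8\xff": "JPEG (.jpg)",
    b"\x89PNG\r\n\x1a\n": "PNG (.png)",
    b"GIF87a": "GIF (.gif)",
    b"GIF89a": "GIF (.gif)",
    b"%PDF-": "PDF or PDF-compatible Illustrator (.pdf/.ai)",
}

def sniff_image(header: bytes) -> str:
    """Return a format name if the header starts with a known signature."""
    for magic, name in IMAGE_SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"
```

A prefix table is about the simplest design possible; it works here because every one of these formats puts its signature at offset 0.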

MUSIC FILES

MP3 Music Track (.mp3)
00000000  49 44 33 03 00 00 00 00  00 6f 54 49 54 32 00 00  |ID3......oTIT2..|
00000010  00 0e 00 00 00 54 68 65  20 4f 74 68 65 72 20 4d  |.....The Other M|
00000020  61 6e 54 52 43 4b 00 00  00 02 00 00 00 33 54 50  |anTRCK.......3TP|

WAV file (.wav)
00000000  52 49 46 46 62 b7 01 00  57 41 56 45 66 6d 74 20  |RIFFb...WAVEfmt |
00000010  10 00 00 00 01 00 01 00  44 ac 00 00 88 58 01 00  |........D....X..|
00000020  02 00 10 00 64 61 74 61  3e b7 01 00 57 01 bd 01  |....data>...W...|

AIFF file (.aif)
00000000  46 4f 52 4d 00 2a ef cc  41 49 46 46 43 4f 4d 54  |FORM.*..AIFFCOMT|
00000010  00 00 01 c2 00 01 00 00  00 00 00 00 00 12 43 72  |..............Cr|
00000020  65 61 74 6f 72 3a 20 4c  6f 67 69 63 20 50 72 6f  |eator: Logic Pro|
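The WAV and AIFF files share a layout: a four-byte tag, a four-byte size, then a second tag at offset 8 (RIFF...WAVE and FORM...AIFF). The sizes are stored in opposite byte orders, little-endian in WAV and big-endian in AIFF. In an MP3 that starts with an ID3 tag, the byte right after ID3 is the tag's major version, so 03 means ID3v2.3. Here is a Python sketch of these checks; sniff_audio is a hypothetical name, and the last check (an MPEG audio frame header, for MP3s with no ID3 tag) is an assumption that can also match other MPEG audio such as AAC in ADTS framing, hence "probably".

```python
import struct

def sniff_audio(header: bytes) -> str:
    """Guess an audio format from the first 12 or more bytes (a sketch, not exhaustive)."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        # RIFF sizes are little-endian: 62 b7 01 00 -> 0x0001b762
        (size,) = struct.unpack("<I", header[4:8])
        return f"WAV (RIFF chunk size {size})"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        # AIFF sizes are big-endian: 00 2a ef cc -> 0x002aefcc
        (size,) = struct.unpack(">I", header[4:8])
        return f"AIFF (FORM chunk size {size})"
    if header[:3] == b"ID3" and len(header) >= 4:
        # The byte after "ID3" is the tag's major version: 03 -> ID3v2.3
        return f"MP3 with ID3v2.{header[3]} tag"
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        # No tag: an MPEG audio frame header starts with 11 "sync" bits set to 1
        return "MPEG audio frame (probably MP3)"
    return "unknown"
```

Run on the first 12 bytes of the WAV dump above, it reports a chunk size of 112482. The RIFF size doesn't count the 8-byte RIFF header itself, so that whole file should be 112490 bytes long.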

TEXT FILES

Text file (often .txt, but not always)
00000000  48 69 20 65 76 65 72 79  6f 6e 65 2c 0a 0a 48 65  |Hi everyone,..He|
00000010  72 65 20 61 72 65 20 73  6f 6d 65 20 63 68 61 6e  |re are some chan|
00000020  67 65 73 20 74 68 61 74  20 77 69 6c 6c 20 68 61  |ges that will ha|

Microsoft Word/Office (.doc, .xls)
00000000  d0 cf 11 e0 a1 b1 1a e1  00 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  3e 00 03 00 fe ff 09 00  |........>.......|
00000020  06 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  |................|

Adobe PDF (.pdf) - Very similar to Adobe Illustrator (modern .ai files are PDF-compatible) and other PostScript formats
00000000  25 50 44 46 2d 31 2e 34  0d 25 e2 e3 cf d3 0d 0a  |%PDF-1.4.%......|
00000010  36 20 30 20 6f 62 6a 20  3c 3c 2f 4c 69 6e 65 61  |6 0 obj <</Linea|
00000020  72 69 7a 65 64 20 31 2f  4c 20 34 34 30 36 38 2f  |rized 1/L 44068/|
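Here is a Python sketch of how you might tell these apart. The eight bytes d0 cf 11 e0 a1 b1 1a e1 are the signature of the older Microsoft Office compound-file format; newer .docx and .xlsx files are ZIP archives, which start with PK instead, so I've included that as well. Plain text has no signature at all, so looks_like_text is a heuristic I'm assuming for illustration: reject anything containing NUL bytes, and accept the sample if it decodes as UTF-8. Text in older single-byte encodings may come out as "unknown". Both function names are hypothetical.

```python
import codecs

OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # legacy .doc/.xls/.ppt container
ZIP_MAGIC = b"PK\x03\x04"                         # .docx/.xlsx are ZIP archives

def looks_like_text(sample: bytes) -> bool:
    """Heuristic: no NUL bytes, and the sample decodes as UTF-8 (ASCII included)."""
    if b"\x00" in sample:
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the end of the sample
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True

def sniff_document(header: bytes) -> str:
    if header.startswith(OLE2_MAGIC):
        return "legacy Microsoft Office (.doc/.xls/.ppt)"
    if header.startswith(ZIP_MAGIC):
        return "ZIP archive (possibly .docx/.xlsx)"
    if header.startswith(b"%PDF-"):
        return "PDF (or a PDF-compatible .ai)"
    if looks_like_text(header):
        return "text"
    return "unknown"
```

The order matters: the signature checks come first, because a PDF header is mostly readable text and would otherwise be reported as plain text.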

ANIMATION & VIDEO

Flash movie (.swf)
00000000  43 57 53 08 ac 43 00 00  78 9c ed 7a 77 58 93 c9  |CWS..C..x..zwX..|
00000010  d6 f8 49 25 f4 80 94 50  0d 45 4a 00 e9 45 b0 04  |..I%...P.EJ..E..|
00000020  44 44 45 e9 55 d0 80 44  01 11 10 11 01 75 0d bd  |DDE.U..D.....u..|

QuickTime Movie (.mov)
00000000  00 00 00 20 66 74 79 70  71 74 20 20 20 05 03 00  |... ftypqt   ...|
00000010  71 74 20 20 00 00 00 00  00 00 00 00 00 00 00 00  |qt  ............|
00000020  00 00 03 55 6d 6f 6f 76  00 00 00 6c 6d 76 68 64  |...Umoov...lmvhd|
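Both of these carry readable tags near the start. In a SWF file, the first three bytes are FWS (uncompressed), CWS (zlib-compressed, like the one above) or ZWS (LZMA-compressed), and the fourth byte is the Flash version; the 78 9c at offset 8 in the dump is the start of the zlib stream. QuickTime and MP4 files are built from atoms (boxes) that begin with a four-byte big-endian size followed by a four-byte type, so ftyp appears at offset 4, followed by a brand such as "qt  ". Older .mov files may begin with a different atom (moov, mdat, and so on), which this Python sketch does not try to handle; sniff_movie is a hypothetical name.

```python
import struct

SWF_KINDS = {b"FWS": "uncompressed", b"CWS": "zlib-compressed", b"ZWS": "LZMA-compressed"}

def sniff_movie(header: bytes) -> str:
    """Identify SWF and ftyp-style QuickTime/MP4 headers (a sketch, not exhaustive)."""
    kind = SWF_KINDS.get(header[:3])
    if kind and len(header) >= 4:
        # The fourth byte is the Flash version number
        return f"Flash SWF version {header[3]} ({kind})"
    if len(header) >= 12 and header[4:8] == b"ftyp":
        # Each atom starts with a 4-byte big-endian size, then its 4-byte type
        (box_size,) = struct.unpack(">I", header[:4])
        # The major brand follows the type; "qt  " is padded with spaces
        brand = header[8:12].decode("ascii", errors="replace").strip()
        return f"QuickTime/MP4-family file (brand {brand!r}, ftyp box of {box_size} bytes)"
    return "unknown"
```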

A handy shortcut

Now that you've gotten this far, you may be wondering whether there's a handy utility that will look for these signature bytes and identify the file for you. On most Linux and Mac systems, there is: a command called file. For example:

$ file *.mp3
JTrack.mp3:  MP3 file with ID3 version 2.3.0 tag
JTrack2.mp3: MP3 file with ID3 version 2.3.0 tag

However, I still think it's a good idea to get comfortable "opening the hood" on your files, and I hope you give it a try!
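If you'd like to automate Beatrice's problem, here is a rough Python sketch in the spirit of the file command that compares a file's extension with what its first bytes suggest. The signature table, the function names (expected_extensions, check_extension, check_file) and the choice to read only 16 bytes are my own assumptions, and the table covers only some of the formats in the field guide; the real file command relies on a much larger database of signatures.

```python
from pathlib import Path

# (offset, bytes) checks that must all match -> extensions that format normally uses.
# A small, hypothetical subset of the field guide, not a complete database.
SIGNATURES = [
    ([(0, b"8BPS")], {".psd"}),
    ([(0, b"\xff\xd8\xff")], {".jpg", ".jpeg"}),
    ([(0, b"\x89PNG\r\n\x1a\n")], {".png"}),
    ([(0, b"GIF8")], {".gif"}),
    ([(0, b"%PDF-")], {".pdf", ".ai"}),
    ([(0, b"ID3")], {".mp3"}),
    ([(0, b"RIFF"), (8, b"WAVE")], {".wav"}),
    ([(0, b"FORM"), (8, b"AIFF")], {".aif", ".aiff"}),
    ([(4, b"ftyp")], {".mov", ".mp4", ".m4a"}),
]

def expected_extensions(header: bytes):
    """Return the extensions matching the header's signature, or None if unknown."""
    for checks, exts in SIGNATURES:
        if all(header[off:off + len(magic)] == magic for off, magic in checks):
            return exts
    return None

def check_extension(name: str, header: bytes) -> str:
    """Compare a file name's extension with what its header bytes suggest."""
    exts = expected_extensions(header)
    suffix = Path(name).suffix.lower()
    if exts is None:
        return f"{name}: unrecognized signature"
    if suffix in exts:
        return f"{name}: OK"
    return f"{name}: looks like {'/'.join(sorted(exts))}, not {suffix or '(no extension)'}"

def check_file(path: str) -> str:
    """Read the first 16 bytes of a file on disk and check its extension."""
    with open(path, "rb") as f:
        return check_extension(Path(path).name, f.read(16))
```

For a file named something.mp3 that is really a WAV file, check_file would report "something.mp3: looks like .wav, not .mp3", which is exactly the kind of mismatch that trips up audio software.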


Copyright © 2022 by KrazyDad. All Rights Reserved.