Apple’s Siri – The Missing Documentation
Let's face it: Apple's Siri is an amazing evolution in voice recognition technology, but it was implemented in a very restrictive manner. Siri is exclusive to the iPhone 4S, and everyone who is geek enough is trying to set it free. Accordingly, this is my take on pushing those efforts forward: The Missing Siri Protocol Documentation.
Before we start, you must know some facts about the way Apple breaks some standards:
- HTTP Protocol: Apple breaks it by using a ~2 GB Content-Length header (2000000000 bytes) and omitting the Content-Encoding header. Apple also uses the Connection: Close header in the HTTP response to state that the Server is okay with opening a Siri connection.
- Deflate Algorithm: Apple uses the zlib implementation of the deflate algorithm with the Sync Flush option. The stream is flushed after each packet instead of being finished, so the compressed data carries no checksum and no decompressed data size. Only the first compressed packet carries the zlib Header; all the following packets continue the same stream without one.
- HTTP Streaming: Apple breaks it by using a single always-on HTTP connection between its Server and the Siri Client. Other than the initial headers, all the data is a compressed binary stream without any HTTP wrapping. In fact, the data is sent "pure", without any kind of wrapping whatsoever.
- Property List: Apple uses Plists as its main data container. Even the audio files are split across many Plists, each one storing a part of the audio under a key called Speech Packet with a Byte Array as its value. This Byte Array is one part of the audio file that's being transmitted.
The above points ensure one thing: any software that works at the HTTP layer will break when dealing with a Siri connection. The only way to deal with Siri is to use software that works at the TCP layer. Also, if you're considering Wireshark or any other packet-sniffing software, I'll tell you upfront: it's useless, because Apple uses SSL. The entire Siri connection is encrypted end-to-end, so all you're going to see is packets of digital noise!
Now, let's start:
Part 1: The Connection from Siri Client to Apple Server:
1.0 The iDevice initiates an HTTPS request to guzzoni.apple.com with four specific Headers encoded in ASCII:
ACE /ace HTTP/1.0
Host: guzzoni.apple.com
User-Agent: Assistant(iPhone/iPhone4,1; iPhone OS/5.0.1/9A405) Ace/1.0
Content-Length: 2000000000
X-Ace-Host: 2b861830-4146-11e1-b86c-0800200c9a66
\r\n (Empty Line)
1.1 The HTTP Method is ACE (Adaptive Communication Environment). This means it's a programmable HTTP connection and not one of our typical GET or POST Methods. It has no effect other than breaking HTTP-compliant software. Apple also uses it to ignore any non-ACE HTTP Requests.
1.2 The custom User-Agent identifies Siri as Assistant and states the requesting device's hardware and Operating System, with its version. Apple also uses it to ignore any hardware not approved for Siri, like iPods, iPads or iPhones before the 4S.
1.3 The insane Content-Length has no effect on Siri's inner workings; it is just here to break your HTTP-layer software. This specific header forces anyone trying to play with Siri to go down to the TCP layer and get their hands dirty with SSL/TLS.
1.4 X-Ace-Host is the most interesting header. It is a Unique Identifier generated by the Apple Server for each Siri-approved device. It changes daily: after a connection has been established with the old X-Ace-Host, the Apple Server sends the Siri Client a request carrying the new one. The Apple Server has the option to force the Siri Client to drop the connection made with the old X-Ace-Host and reconnect with the new one, but it usually accepts the connection as the last one made with the old X-Ace-Host.
1.5 The last line is an empty line. It is important because it marks the end of the Headers and the start of the binary-encoded data.
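To make steps 1.0 to 1.5 concrete, here is a minimal sketch that assembles the request head as raw bytes, ready to be written to a TLS socket. The function name build_ace_request and its parameters are my own (hypothetical); the header values are the ones from the sample above.

```python
def build_ace_request(
    ace_host: str,
    host: str = "guzzoni.apple.com",
    user_agent: str = "Assistant(iPhone/iPhone4,1; iPhone OS/5.0.1/9A405) Ace/1.0",
    content_length: int = 2000000000,
) -> bytes:
    """Return the ASCII request head, terminated by the empty line."""
    lines = [
        "ACE /ace HTTP/1.0",                  # custom ACE method instead of GET/POST
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        f"Content-Length: {content_length}",  # oversized on purpose
        f"X-Ace-Host: {ace_host}",
    ]
    # CRLF after every line, plus one empty line that ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


head = build_ace_request("2b861830-4146-11e1-b86c-0800200c9a66")
print(head.decode("ascii"))
```

Everything written to the socket after these bytes is the binary stream described in the next steps.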
2.0 The binary data starts with a four-byte Magic Code. To understand it, we will convert it to hexadecimal:
Raw: 10101010110011001110111000000010
Hex: AACCEE02
2.1 This code marks the start of the Compressed Binary data.
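A small sketch of how a reader of the stream could check for the Magic Code before handing the rest to the decompressor. The constant assumes the four bytes are AA CC EE 02, so adjust it if your capture differs. The helper name strip_magic is hypothetical.

```python
# Assumption: the four-byte Magic Code is AA CC EE 02.
ACE_MAGIC = bytes.fromhex("AACCEE02")


def strip_magic(body: bytes) -> bytes:
    """Verify the Magic Code and return the bytes that follow it."""
    if len(body) < len(ACE_MAGIC):
        raise ValueError("need at least four bytes to check the Magic Code")
    if body[:4] != ACE_MAGIC:
        raise ValueError("body does not start with the Magic Code")
    return body[4:]


# The same four bytes written out as 32 bits.
magic_bits = format(int.from_bytes(ACE_MAGIC, "big"), "032b")

# Example: a body made of the Magic Code followed by a zlib header.
rest = strip_magic(ACE_MAGIC + bytes.fromhex("78DA"))
```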
3.0 The compressed binary data in hexadecimal:
78DA62626060724E2AC8C92C2E3130B8C2C8C4CCC2CAC6CE119A9C93585C1C9A98 6159C7B2F6F7DB8A8F63C3750C28C85D3DAFE954E227257BA85927A5DFC45832F3 7F7DF97960C8CBD6E6164E3503BA9FEF25CDBE35B3D1CCE54291CDC79AC4CA32C8 981834190419C41964199418FC1862199A18AA191A18B611AC33A86AB0C7F20763 231422DE76740014CAC00000000FFFF626260608C232D04A516971465268322172 8A2E268606CEAA46B64626AA26B626162A06BE96C6AA8EB68E1E8626CE96A666C6 26E8E232D5DE4E48A2942326D092865C50B48A1AA4DCD4DCCCC510E4E4D2C4ACE8 61FC48C0A00000000FFFF020631C36458105F860431346821C10C0E6060183AB91 A993A3BBAB9EA9ABA9899EA9A189ABAE85A9A3999E91A9B5B98989BB8381BB9999 AC638E7009DE49C9F57925A5182234039F845A5156C3DCC58EEA1C00000000FFFF
3.1 The compressed data starts with 78DA, which is the Header of zlib-compressed data.
3.2 The compressed data ends with 0000FFFF, which is the marker zlib emits when the Sync Flush option is used while compressing the data.
3.3 The Sync Flush option means that we don't get the checksum or the size of the uncompressed data.
3.4 The compressed data is actually a binary stream, which uncompresses to another binary stream! There are no files to search for yet.
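The behaviour in 3.1 to 3.4 can be reproduced with Python's zlib module. This is only a sketch: the names compress_chunks and StreamInflater are mine, the messages are placeholders, and compression level 9 is an assumption chosen because it yields the same 78DA header as the sample. The key point is that one decompressor object must be kept alive for the whole connection, since later chunks have no header of their own.

```python
import zlib

SYNC_MARKER = b"\x00\x00\xff\xff"  # emitted at every Sync Flush point


def compress_chunks(messages, level=9):
    """Compress messages as ONE zlib stream, flushing after each message."""
    comp = zlib.compressobj(level)
    for message in messages:
        yield comp.compress(message) + comp.flush(zlib.Z_SYNC_FLUSH)


class StreamInflater:
    """Keeps a single decompressor alive across all chunks of a connection."""

    def __init__(self):
        self._inflater = zlib.decompressobj()

    def feed(self, chunk: bytes) -> bytes:
        # The stream is never finished, so no checksum is ever verified.
        return self._inflater.decompress(chunk)


messages = [b"first plist bytes", b"second plist bytes"]
chunks = list(compress_chunks(messages))

inflater = StreamInflater()
plain = [inflater.feed(chunk) for chunk in chunks]

for chunk in chunks:
    print(chunk.hex().upper())
```

Trying to decompress the second chunk with a fresh decompressor fails, which is why generic HTTP tooling that treats each packet on its own cannot read the stream.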
4.0 The uncompressed binary data in hexadecimal:
020000009862706C6973743030D301020304050655616365496455636C61737355 67726F75705F102441383841343443382D423135352D343946372D383643312D45 37464230374136353639455F100F5365745265737472696374696F6E735F101463 6F6D2E6170706C652E6163652E73797374656D080F151B21485A00000000000001 01000000000000000700000000000000000000000000000071020000009362706C 6973743030D301020304050655616365496455636C6173735567726F75705F1024 33443536313935322D344637432D344539452D393034372D464236353642383844 3836325C436C656172436F6E746578745F1014636F6D2E6170706C652E6163652E 73797374656D080F151B2148550000000000000101000000000000000700000000 00000000000000000000006C
4.1 The first 5 Bytes, written in hexadecimal, have the pattern 0200XXXXXX.
4.2 When we convert the hexadecimal number XXXXXX into a 32-bit Integer, it tells us the length in bytes of the binary data that follows, which is a binary Property List. In the sample above, the first header is 0200000098, so the first Plist is 0x98 = 152 bytes long.
4.3 Since we know in advance how much binary data to read, we can now easily deserialize the Plists. The sample data above holds two Plists. Exported as XML, a deserialized Plist looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>aceId</key>
  <string>B2ED6435-9F83-4E42-8E3F-03ABE52807D6</string>
  <key>class</key>
  <string>LoadAssistant</string>
  <key>group</key>
  <string>com.apple.ace.system</string>
  <key>properties</key>
  <dict>
    <key>assistantId</key>
    <string>34923801-1956-4911-a7d3-1cb5de46b5ba</string>
    <key>sessionValidationData</key>
    <data>
    AoRtzxP6LG/0lQrZXPOmqAtrBaBTRzxXeDg9sxIn+qByAAAA4AMAAABJAAAA
    gMrURnb01X7gHigsBfLGoVPXRZcozLMCG4CbLGR369bSPGeRm8BGwzUsPWzn
    3J2SlKmAtzc4ZG2F9EX+7pqcolyLCqA/0BCVXB4TguyxWkCIGGgwuxmiCO2l
    QMqox1MAYG+pJrh/5zMu0Q2HK1inqjimgAP2ubDmBEUy2hMmwL5kAAAAAAAA
    AE8BvASX9FH7UQCHAYj0TP5bBbxoRDIAAAA2BASNv6TFiWOxLWSIVAGcARRC
    Txf22u3q76yhiosQ8117o1Y9RPmT263ZENsNbapJ9+gRg5m/
    </data>
    <key>speechId</key>
    <string>d5da031e-edbc-43a9-aaa0-b33310ecf4d5</string>
  </dict>
</dict>
</plist>
4.4 The entire communication between the Siri Client and Apple is done in this form: a series of compressed binary Plists.
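The framing in 4.1 to 4.3 maps directly onto Python's struct and plistlib modules. The sketch below treats the header as one type byte (02) followed by a 32-bit big-endian length, which reads the same as the 0200XXXXXX pattern as long as the top length byte is zero. The names frame_plist and read_plists are hypothetical, and the frames are rebuilt with plistlib from illustrative dictionaries rather than copied from the dump above.

```python
import plistlib
import struct

PLIST_FRAME = 0x02  # first byte of the 5-byte header


def frame_plist(obj) -> bytes:
    """Serialize obj as a binary Plist and prepend the 5-byte header."""
    payload = plistlib.dumps(obj, fmt=plistlib.FMT_BINARY)
    return struct.pack(">BI", PLIST_FRAME, len(payload)) + payload


def read_plists(stream: bytes):
    """Return (decoded Plists, unread remainder) from uncompressed data."""
    objects = []
    offset = 0
    while offset + 5 <= len(stream) and stream[offset] == PLIST_FRAME:
        (length,) = struct.unpack_from(">I", stream, offset + 1)
        start, end = offset + 5, offset + 5 + length
        if end > len(stream):
            break  # incomplete frame: wait for more decompressed bytes
        objects.append(plistlib.loads(stream[start:end]))
        offset = end
    return objects, stream[offset:]


# The header of the first sample packet: type 2, length 0x98 = 152 bytes.
sample_header = struct.unpack(">BI", bytes.fromhex("0200000098"))

first = {"aceId": "A88A44C8-B155-49F7-86C1-E7FB07A6569E",
         "class": "SetRestrictions", "group": "com.apple.ace.system"}
second = {"aceId": "3D561952-4F7C-4E9E-9047-FB656B88D862",
          "class": "ClearContext", "group": "com.apple.ace.system"}

stream = frame_plist(first) + frame_plist(second) + bytes.fromhex("0300000001")
decoded, remainder = read_plists(stream)
```

Returning the unread remainder lets the caller keep partial frames in a buffer until the next decompressed chunk arrives.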
Note: I'm in the process of documenting the Plists exchanged between the Siri Client and the Apple Server. So far I have found more than 22 different kinds of Plists, which hold all kinds of data, from Commands to Contacts, GPS, Authorization data, etc.
5.0 After the last Plist, the next 5 Bytes will look like this in hexadecimal:
0300000001 0300000002 0300000003 0300000004 0300000005 0300000006
5.1 The new pattern is 0300XXXXXX. Following the same conversion as before, we find that this number is always incremented by 1.
5.2 This pattern is in fact the Ping that the Client sends to the Apple Server when it has finished sending some of the data and is waiting for a response.
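Here is a minimal sketch of reading and writing these Ping frames, using the same one-byte type plus 32-bit big-endian number layout as the Plist header. The helper names make_ping and parse_ping are hypothetical; the hexadecimal samples are the ones listed in 5.0.

```python
import struct

PING_FRAME = 0x03  # first byte of a Ping frame


def make_ping(sequence: int) -> bytes:
    """Build the 5-byte Ping frame for the given sequence number."""
    return struct.pack(">BI", PING_FRAME, sequence)


def parse_ping(frame: bytes) -> int:
    """Return the sequence number carried by a 5-byte Ping frame."""
    if len(frame) != 5:
        raise ValueError("a Ping frame is exactly 5 bytes long")
    kind, sequence = struct.unpack(">BI", frame)
    if kind != PING_FRAME:
        raise ValueError("not a Ping frame")
    return sequence


samples = "0300000001 0300000002 0300000003 0300000004 0300000005 0300000006".split()
sequences = [parse_ping(bytes.fromhex(sample)) for sample in samples]
```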
Note: I'll keep updating this post until I finish documenting the entire protocol. Sorry, it will take me days to finish, since my day job consumes a lot of my brain cycles.