When I started this blog, I didn't make a resolution that I would update it every week, or every month, or at any time at all, but I did intend that it would become a repository for all my random musings, as the title suggests...
However, this has been somewhat sidetracked by the Kickstarter project Zac & I started, which has not only taken up a lot more of my time than I was expecting, but also has created it's own little side-blog...
Since the subject matter is RFID, which I'm sure at least a few potential readers might be interested in, I felt it would be remiss of me not to include those entries here:
Sending messages to a tag
Baseband Basics - How all the different RFID modulation schemes work under the hood.
Higher level encoding - Modulation schemes layered above the baseband.
HITAG2 FTW! - Reading and writing Hitag2 tags.
and, most recently:
Fun with UIDs - WFT is a UID anyway? (and yes, I do know the difference between "your" and "you're".. it was a typo, OK? :)
Enjoy!
Adam
Obviously a Major Malfunction...
Random musings that would otherwise have no means of escape from my head.
Tuesday 25 February 2014
Friday 2 August 2013
RFIDler - An open source Software Defined RFID Reader/Writer/Emulator
I've said it before and I'll say it again: I don't understand how it works.
Not only that, but I don't want to understand, and I don't need to understand!
Well, that's not quite true - I need to understand enough to know which bits I don't need to understand, but then that's it! Stop! Enough already!!!
RFID is, as with a lot of these technologies, mysterious by nature. It relies on strange physical phenomena like "induction" and "electro-magnetism" and "near-fields", etc. Yes, what we Code Monkeys like to call "Magic Moonbeams". It's all very nasty and analoguey. I don't like it. Give me the nice binary digital, please!
So in my never ending quest to find tools that convert the scary analogue world into a nice friendly digital one, RFID is clearly a prime candidate. There are lots of RFID/NFC devices out there these days, and you've probably got one or two in your pocket right now - whether it's your car keys, alarm fob, door entry card, credit card, etc. Of course, there are endless varieties of RFID readers to access them with, but what I'd like is something that reads them all, and meets my standard criteria: small and cheap.
To be fair, there are plenty of readers out there that seem to meet this criteria. You can buy a simple RFID USB reader for as little as 10-15 quid, but you'll find that it's of limited use as it will almost certainly be dedicated to one 'standard', and you'd therefore need dozens of them to be able to read 'everything'. There are also tools like the Proxmark3 that are truly universal and can read pretty much anything, but, unfortunately, these are not cheap.
However, it is certainly worth looking at the PM3 as it really is quite an amazing bit of kit - often described as the 'Swiss Army Knife of RFID', it is versatile enough to read pretty much any tag in the standard LF/HF frequency ranges, so will at least be useful in giving us an idea as to what we're up against... We'll be using it later to look at some specific examples.
So, going right to the beginning, what does an RFID tag actually do?
Well, it depends. There are basically two functions, and the rest is 'details':
Firstly, pretty much every RFID tag will IDENTIFY itself. That is function one.
Secondly, some tags will store DATA. That is function two.
The 'details' revolve around how it does those two things - is it blindly spitting out an ID and/or DATA, or is there some security or other command structure built around it? That's the simple view, and if you want the longer, more detailed explanation, there are entire volumes written about it. The 'details' can run into hundreds of pages, so I'm not going to even start. My goal, here, is to talk about the low level fundamental communications that seperate the evil analogue underworld from the lovely, friendly digital fairy garden, where we all like to play.
And it all begins with our friend "induction". At the very low level, RFID/NFC relies on the fact that if you energise a coil and place another coil near it, the second coil will pick up some of that energy through induction. Moreover, the two coils become magically (or magnetically, depending which world you come from) 'coupled', so it's possible for the second coil to effect the voltage on the first, and it does this by shorting itself out. If it does so, there will be a drop in the voltage on the first coil, and this is called "DAMPING". That's it. In a nutshell, that's how RFID works: the coils 'talk' to each-other by either sending energy (from the READER), or causing an energy drop (from the TAG).
In a little more detail, what happens is this (and for the purpose of this section, we'll assume the TAG is the dumbest type that just spits out an ID): the READER energises its coil by powering it on and off repeatedly. For a standard LF system, this will be 125,000 times per second, or 125KHz. This is known as the 'CARRIER'. The TAG, when placed in this field, will scavenge some power from its now inductively-coupled coil and come to life. If the reader needs to send an extra 'wake up' (or any other) command, it can do so by simply switching it's CARRIER off altogether for short periods. The TAG stores enough energy that it can keep running for long enough to interpret these gap periods, even though it's temporarily lost its power source. The length of the CARRIER signal between the gaps will usually signify a 0 or a 1, so like this the READER can send binary messages.
In other words, they're talking 'ASK': Amplitude Shift Keying. Data is sent by shifting the amplitude of the signal. More accurately, they're talking 'OOK': On Off Keying. A message going from the READER to the TAG is signalled by the CARRIER being ON or OFF and the message coming back is either DAMPED or UN-DAMPED.
Things get a lot easier to understand if we visualise them, so here is a plain 125KHz CARRIER viewed from an oscilloscope:
And here is the READER sending a message to the TAG:
In this case it's using a long pulse to represent a '1' and a short a '0', so the message here is '11000', or 'START_AUTH' if you're a HITAG2.
As I mentioned, the TAG can also send messages back to the READER by shorting it's own coil and DAMPING the READER's coil. The result looks something like this:
It looks quite similar to when the READER sends a message, but you'll note that it can't reduce the CARRIER all the way down to nothing - instead it's either a 'DAMPED' or 'UN-DAMPED' wave. This is because it's not directly controlling the voltage on the READER's coil, only affecting it through induction. However, it is clearly still perfectly readable. In this case, if we take the damping action as a '1' and non-damped as a '0', we get '1010101010010110011010', which is, in fact, a MANCHESTER encoded bitstream.
So what has MANCHESTER got to do with anything? Well, this is where it starts to get interesting - if you look in details at the specs for these kind of tags, you'll find that they mention modulation schemes such as 'Manchester', 'Bi-Phase', 'FSK', 'PSK', 'NRZI', 'ASK'...
WTF? We've already established that the two devices can only do ASK, so where does all this FSK/PSK/Manchester malarky come from???
This is where I think a lot of the confusion lies. Once you start trying to do something other than what a particular manufacturer intended with an RFID tag, you quickly get lost in a mire of conflicting modulation schemes and other irrelevancies. This particularly applies to the readers as well - if I want to find a reader that gives me access to the raw data, just forget it. Most readers want to go that little bit further and fully de-modulate the signal for me. And to do that, they need to know exactly how the signal was modulated in the first place... And to know that, they need to know which standard tag type they're going to read, and so on, and so on...
It's easy to see how we've ended up with this situation. A lot of these devices were built back in the days when computing power didn't come cheap, so the designers tried to do as much as possible in the analogue world before handing over to the digital at the last moment. This meant building circuitry that not only handled the low-level ASK communication, but also the secondary modulation schemes that were layered on top. The RF world have been going through a revolution, moving to SDR, in which they're doing the same thing - handling only the very low level analogue stuff in circuitry and doing the rest in software on small powerful microcomputers, and it's time we did the same for RFID!
So when I asked our tame (well, nearly... he's mostly house-trained and has at least stopped biting the postman) Chip Monkey to build me an RFID reader that only gave me the lowest level data, he was somewhat surprised to be unable to find an existing reference circuit that exactly fit the bill. We both thought it would be simple, but no - every circuit was tied to a further demodulation scheme - FSK, PSK, Bi-Phase, etc., and some of them were horrendous! They look more like they're designed to take you to the moon, rather than read a few bits from a wibbly coil! For example, here is Microchip's 200 page document with separate example circuits for ASK, PSK and FSK.
However, after a bit more searching, we found a 'simple' design: 'The World's Simplest RFID Reader'. This led to an 'improved' version, although it's still described as a 'DIY FSK RFID Reader', so possibly still not quite what we're looking for.
But hang on a minute, surely it doesn't matter what type of tag they were using it to read. We've established that the low level communication is ALWAYS using ASK, so the final demodulation of FSK/PSK/Manchester/Whatever is going to be handled by the microprocessor, so we should be able to use this circuit to do any type of tag, not just FSK.
So this brings us back to the question of these 'extra' modulation schemes. Where do they come from, and how do we deal with them?
Let's use the PM3 to take a look at the raw data we get from each type of tag... The PM3 will act as a basic reader circuit and filter out the CARRIER, leaving only the effect of the DAMPING.
This is an ASK modulated tag:
Exactly what we would expect - a straightforward square wave created by the CARRIER either being DAMPED or UN-DAMPED.
Here is an FSK tag:
Note the two different pulse types - thin ones and fat ones, so we really are seeing different frequency pulses. Weird!
And the strangest yet, PSK:
Now that's just crazy. What the hell is that all about?
First, it helps to understand exactly what we're looking at. The green line is showing us the voltage on the READER coil, after the tag has done it's DAMPING (or not). We don't really care what the scale is, just that the bottom of the screen is 0 volts or fully DAMPED and the top of the screen is *some* volts or UN-DAMPED. The circuit that produces this is effectively looking for the presence of a 125KHz CARRIER and raising or lowering the output line accordingly. How it works is irrelevant. I just don't care. That is Chip Monkey's problem! :)
So now I know what the lines mean, the question is how they get to look like they do. The first one is simple: ASK/OOK modulation is either ON or OFF, so we get a line at the top of the screen when we're UN-DAMPED and a line at the bottom when we're DAMPED.
If we think of what the tag's doing as creating a mask which either lets the CARRIER through or not, this makes perfect sense. Here I've marked the image with a red blob whenever the tag is DAMPING its coil:
So far, so obvious. Let's look at the FSK signal in the same way:
So now, instead of signalling data directly by DAMPING for a 0 or a 1, we are creating a whole new CARRIER by DAMPING for different periods and allowing short or long pulses of the original CARRIER through. The fully DAMPED signal doesn't mean anything, it's the width and number of pulses of the UN-DAMPED signal that carries the information. In this case, 5 fat spikes means '0' and 7 thin spikes means '1' (or 12 thin spikes means '11'), so we've got '011010'. Neat!
OK, so what about the crazy PSK guy?
Well, this is interesting because what's actually happening here is that the coil is being DAMPED relative to the frequency of the original CARRIER itself. In this case, it's going at exactly half the speed, so it's effectively blocking half the CARRIER pulses, and most of the time producing a signal that isn't quite strong enough to reach the top of the screen, and doesn't have time to reach the bottom either, which gives us the chunky bits in the middle. However, whenever there is a phase change, the resulting half-bit repetition means that the DAMPING or UN-DAMPING now lasts for a full clock cycle of the original CARRIER, so we get a little burst of HIGH or LOW popping through, hence the spikes. In this images I've marked the mask in pink for where we're DAMPING 50% of the time, and the phase shifts are red or black as normal.
We can see that the fully DAMPED sections line up with the LOW spikes and the UN-DAMPED with the HIGH. It's harder to read these by eye, but basically whenever there is a phase change (i.e. a spike in either direction), the bit value changes. If there is no phase change, the bit value stays the same. The number of bits depends on how long the gap is between the spikes, so if one was to overlay a grid and you knew how long a bit period was, you could simply count off the periods between bit changes and you've got your bitstream.
If we assume we're starting with a 0, this decodes as: '01101010001111100111000100010110000111010011100101101100001'. Easy-peasy! :)
Great, so now we understand what's going on, let's see if the circuit we've found will do the job...
As it turned out, the answer was 'not quite' for two reasons:
1. It couldn't handle the 'down' spikes of PSK.
2. As soon as we started playing with it, we were hit with a flurry of feature-creep! Yes, we wanted it to do more...
For a start, why have only a reader when you could also have a writer? Unlike some technologies, there is really no difference in the RFID world between a reader and a writer. As long as the reader can send signals to the tag, it can send 'write' and 'data' commands, just as easily as reading the tag's emissions. Well, almost as easily - it just needs to be able to switch off it's coil as well as modulating it. Clearly that's not an issue.
And why have only a writer when you could have an emulator? Again, the only fundamental difference between a TAG and a READER/WRITER is that one energises it's coil and the other grounds it. Everything else works pretty much the same, so given the right hardware, we have the perfect platform for a Software Defined device...
So Chip Monkey built us one. And it was good. In fact, it was so good that we decided that instead of just publishing the schematics and blogging about it, we would try and give it life and free it into the world...
Accordingly, RFIDler was born, and is now live on Kickstarter:
http://www.kickstarter.com/projects/1708444109/rfidler-a-software-defined-rfid-reader-writer-emul
The full details and some videos of the prototype in action are be on the project page, but in short the aim is to produce a tool for LF RFID research, for under £30.00, and a cut-down version that be embedded into your own hardware projects for under £20.00.
I have started a firmware project based around the PIC32, which runs on the aforementioned UBW32 BitWhacker, to get the ball rolling, and will add ports to other platforms as we (or you) test them...
The firmware will live here:
https://github.com/ApertureLabsLtd/RFIDler
If you'd like to have one of these to play with, please go and sign up!
Not only that, but I don't want to understand, and I don't need to understand!
Well, that's not quite true - I need to understand enough to know which bits I don't need to understand, but then that's it! Stop! Enough already!!!
RFID is, as with a lot of these technologies, mysterious by nature. It relies on strange physical phenomena like "induction" and "electro-magnetism" and "near-fields", etc. Yes, what we Code Monkeys like to call "Magic Moonbeams". It's all very nasty and analoguey. I don't like it. Give me the nice binary digital, please!
So in my never ending quest to find tools that convert the scary analogue world into a nice friendly digital one, RFID is clearly a prime candidate. There are lots of RFID/NFC devices out there these days, and you've probably got one or two in your pocket right now - whether it's your car keys, alarm fob, door entry card, credit card, etc. Of course, there are endless varieties of RFID readers to access them with, but what I'd like is something that reads them all, and meets my standard criteria: small and cheap.
To be fair, there are plenty of readers out there that seem to meet this criteria. You can buy a simple RFID USB reader for as little as 10-15 quid, but you'll find that it's of limited use as it will almost certainly be dedicated to one 'standard', and you'd therefore need dozens of them to be able to read 'everything'. There are also tools like the Proxmark3 that are truly universal and can read pretty much anything, but, unfortunately, these are not cheap.
However, it is certainly worth looking at the PM3 as it really is quite an amazing bit of kit - often described as the 'Swiss Army Knife of RFID', it is versatile enough to read pretty much any tag in the standard LF/HF frequency ranges, so will at least be useful in giving us an idea as to what we're up against... We'll be using it later to look at some specific examples.
So, going right to the beginning, what does an RFID tag actually do?
Well, it depends. There are basically two functions, and the rest is 'details':
Firstly, pretty much every RFID tag will IDENTIFY itself. That is function one.
Secondly, some tags will store DATA. That is function two.
The 'details' revolve around how it does those two things - is it blindly spitting out an ID and/or DATA, or is there some security or other command structure built around it? That's the simple view, and if you want the longer, more detailed explanation, there are entire volumes written about it. The 'details' can run into hundreds of pages, so I'm not going to even start. My goal, here, is to talk about the low level fundamental communications that seperate the evil analogue underworld from the lovely, friendly digital fairy garden, where we all like to play.
And it all begins with our friend "induction". At the very low level, RFID/NFC relies on the fact that if you energise a coil and place another coil near it, the second coil will pick up some of that energy through induction. Moreover, the two coils become magically (or magnetically, depending which world you come from) 'coupled', so it's possible for the second coil to effect the voltage on the first, and it does this by shorting itself out. If it does so, there will be a drop in the voltage on the first coil, and this is called "DAMPING". That's it. In a nutshell, that's how RFID works: the coils 'talk' to each-other by either sending energy (from the READER), or causing an energy drop (from the TAG).
In a little more detail, what happens is this (and for the purpose of this section, we'll assume the TAG is the dumbest type that just spits out an ID): the READER energises its coil by powering it on and off repeatedly. For a standard LF system, this will be 125,000 times per second, or 125KHz. This is known as the 'CARRIER'. The TAG, when placed in this field, will scavenge some power from its now inductively-coupled coil and come to life. If the reader needs to send an extra 'wake up' (or any other) command, it can do so by simply switching it's CARRIER off altogether for short periods. The TAG stores enough energy that it can keep running for long enough to interpret these gap periods, even though it's temporarily lost its power source. The length of the CARRIER signal between the gaps will usually signify a 0 or a 1, so like this the READER can send binary messages.
In other words, they're talking 'ASK': Amplitude Shift Keying. Data is sent by shifting the amplitude of the signal. More accurately, they're talking 'OOK': On Off Keying. A message going from the READER to the TAG is signalled by the CARRIER being ON or OFF and the message coming back is either DAMPED or UN-DAMPED.
Things get a lot easier to understand if we visualise them, so here is a plain 125KHz CARRIER viewed from an oscilloscope:
And here is the READER sending a message to the TAG:
In this case it's using a long pulse to represent a '1' and a short a '0', so the message here is '11000', or 'START_AUTH' if you're a HITAG2.
As I mentioned, the TAG can also send messages back to the READER by shorting it's own coil and DAMPING the READER's coil. The result looks something like this:
It looks quite similar to when the READER sends a message, but you'll note that it can't reduce the CARRIER all the way down to nothing - instead it's either a 'DAMPED' or 'UN-DAMPED' wave. This is because it's not directly controlling the voltage on the READER's coil, only affecting it through induction. However, it is clearly still perfectly readable. In this case, if we take the damping action as a '1' and non-damped as a '0', we get '1010101010010110011010', which is, in fact, a MANCHESTER encoded bitstream.
So what has MANCHESTER got to do with anything? Well, this is where it starts to get interesting - if you look in details at the specs for these kind of tags, you'll find that they mention modulation schemes such as 'Manchester', 'Bi-Phase', 'FSK', 'PSK', 'NRZI', 'ASK'...
WTF? We've already established that the two devices can only do ASK, so where does all this FSK/PSK/Manchester malarky come from???
This is where I think a lot of the confusion lies. Once you start trying to do something other than what a particular manufacturer intended with an RFID tag, you quickly get lost in a mire of conflicting modulation schemes and other irrelevancies. This particularly applies to the readers as well - if I want to find a reader that gives me access to the raw data, just forget it. Most readers want to go that little bit further and fully de-modulate the signal for me. And to do that, they need to know exactly how the signal was modulated in the first place... And to know that, they need to know which standard tag type they're going to read, and so on, and so on...
It's easy to see how we've ended up with this situation. A lot of these devices were built back in the days when computing power didn't come cheap, so the designers tried to do as much as possible in the analogue world before handing over to the digital at the last moment. This meant building circuitry that not only handled the low-level ASK communication, but also the secondary modulation schemes that were layered on top. The RF world have been going through a revolution, moving to SDR, in which they're doing the same thing - handling only the very low level analogue stuff in circuitry and doing the rest in software on small powerful microcomputers, and it's time we did the same for RFID!
So when I asked our tame (well, nearly... he's mostly house-trained and has at least stopped biting the postman) Chip Monkey to build me an RFID reader that only gave me the lowest level data, he was somewhat surprised to be unable to find an existing reference circuit that exactly fit the bill. We both thought it would be simple, but no - every circuit was tied to a further demodulation scheme - FSK, PSK, Bi-Phase, etc., and some of them were horrendous! They look more like they're designed to take you to the moon, rather than read a few bits from a wibbly coil! For example, here is Microchip's 200 page document with separate example circuits for ASK, PSK and FSK.
However, after a bit more searching, we found a 'simple' design: 'The World's Simplest RFID Reader'. This led to an 'improved' version, although it's still described as a 'DIY FSK RFID Reader', so possibly still not quite what we're looking for.
But hang on a minute, surely it doesn't matter what type of tag they were using it to read. We've established that the low level communication is ALWAYS using ASK, so the final demodulation of FSK/PSK/Manchester/Whatever is going to be handled by the microprocessor, so we should be able to use this circuit to do any type of tag, not just FSK.
So this brings us back to the question of these 'extra' modulation schemes. Where do they come from, and how do we deal with them?
Let's use the PM3 to take a look at the raw data we get from each type of tag... The PM3 will act as a basic reader circuit and filter out the CARRIER, leaving only the effect of the DAMPING.
This is an ASK modulated tag:
Exactly what we would expect - a straightforward square wave created by the CARRIER either being DAMPED or UN-DAMPED.
Here is an FSK tag:
Note the two different pulse types - thin ones and fat ones, so we really are seeing different frequency pulses. Weird!
And the strangest yet, PSK:
Now that's just crazy. What the hell is that all about?
First, it helps to understand exactly what we're looking at. The green line is showing us the voltage on the READER coil, after the tag has done it's DAMPING (or not). We don't really care what the scale is, just that the bottom of the screen is 0 volts or fully DAMPED and the top of the screen is *some* volts or UN-DAMPED. The circuit that produces this is effectively looking for the presence of a 125KHz CARRIER and raising or lowering the output line accordingly. How it works is irrelevant. I just don't care. That is Chip Monkey's problem! :)
So now I know what the lines mean, the question is how they get to look like they do. The first one is simple: ASK/OOK modulation is either ON or OFF, so we get a line at the top of the screen when we're UN-DAMPED and a line at the bottom when we're DAMPED.
If we think of what the tag's doing as creating a mask which either lets the CARRIER through or not, this makes perfect sense. Here I've marked the image with a red blob whenever the tag is DAMPING its coil:
So far, so obvious. Let's look at the FSK signal in the same way:
So now, instead of signalling data directly by DAMPING for a 0 or a 1, we are creating a whole new CARRIER by DAMPING for different periods and allowing short or long pulses of the original CARRIER through. The fully DAMPED signal doesn't mean anything, it's the width and number of pulses of the UN-DAMPED signal that carries the information. In this case, 5 fat spikes means '0' and 7 thin spikes means '1' (or 12 thin spikes means '11'), so we've got '011010'. Neat!
OK, so what about the crazy PSK guy?
Well, this is interesting because what's actually happening here is that the coil is being DAMPED relative to the frequency of the original CARRIER itself. In this case, it's going at exactly half the speed, so it's effectively blocking half the CARRIER pulses, and most of the time producing a signal that isn't quite strong enough to reach the top of the screen, and doesn't have time to reach the bottom either, which gives us the chunky bits in the middle. However, whenever there is a phase change, the resulting half-bit repetition means that the DAMPING or UN-DAMPING now lasts for a full clock cycle of the original CARRIER, so we get a little burst of HIGH or LOW popping through, hence the spikes. In this images I've marked the mask in pink for where we're DAMPING 50% of the time, and the phase shifts are red or black as normal.
We can see that the fully DAMPED sections line up with the LOW spikes and the UN-DAMPED with the HIGH. It's harder to read these by eye, but basically whenever there is a phase change (i.e. a spike in either direction), the bit value changes. If there is no phase change, the bit value stays the same. The number of bits depends on how long the gap is between the spikes, so if one was to overlay a grid and you knew how long a bit period was, you could simply count off the periods between bit changes and you've got your bitstream.
If we assume we're starting with a 0, this decodes as: '01101010001111100111000100010110000111010011100101101100001'. Easy-peasy! :)
Great, so now we understand what's going on, let's see if the circuit we've found will do the job...
As it turned out, the answer was 'not quite' for two reasons:
1. It couldn't handle the 'down' spikes of PSK.
2. As soon as we started playing with it, we were hit with a flurry of feature-creep! Yes, we wanted it to do more...
For a start, why have only a reader when you could also have a writer? Unlike some technologies, there is really no difference in the RFID world between a reader and a writer. As long as the reader can send signals to the tag, it can send 'write' and 'data' commands, just as easily as reading the tag's emissions. Well, almost as easily - it just needs to be able to switch off it's coil as well as modulating it. Clearly that's not an issue.
And why have only a writer when you could have an emulator? Again, the only fundamental difference between a TAG and a READER/WRITER is that one energises it's coil and the other grounds it. Everything else works pretty much the same, so given the right hardware, we have the perfect platform for a Software Defined device...
So Chip Monkey built us one. And it was good. In fact, it was so good that we decided that instead of just publishing the schematics and blogging about it, we would try and give it life and free it into the world...
Accordingly, RFIDler was born, and is now live on Kickstarter:
http://www.kickstarter.com/projects/1708444109/rfidler-a-software-defined-rfid-reader-writer-emul
The full details and some videos of the prototype in action are be on the project page, but in short the aim is to produce a tool for LF RFID research, for under £30.00, and a cut-down version that be embedded into your own hardware projects for under £20.00.
I have started a firmware project based around the PIC32, which runs on the aforementioned UBW32 BitWhacker, to get the ball rolling, and will add ports to other platforms as we (or you) test them...
The firmware will live here:
https://github.com/ApertureLabsLtd/RFIDler
If you'd like to have one of these to play with, please go and sign up!
Tuesday 14 May 2013
Back to skule: One Pad, Two Pad, Me Pad, You Pad - Cryptanalysis for beginners
A couple of weeks ago, Kev Sheldrake from Head Hacking gave a fascinating talk on NLP and Social Engineering at London's DEFCON group, DC4420 (called "Social Engineering Lies!"). Afterwards, over drinks, he told me about a free cryptography course that Stanford was running, and how much fun he and his workmates were having competing with each other to solve the 'homework' problems that were set each week...
Now, Cryptography is a subject close to my heart, and I spend a large proportion of my working life mucking about with it, or, to put it another way, breaking it. However, I've never formally studied it, and, until now, I've firmly believed I didn't need to. In fact, I felt that not knowing what every cryptographer knows actually gives me a slight advantage as I won't fall into the trap of making certain assumptions when faced with a cryptographic problem. Since I'm usually attacking an implementation, and not the crypto algorithm itself, knowledge of how the underlying maths works is not necessarily beneficial, as long as I understand the mechanics of it's implementation.
But my curiosity was piqued, so I decided to take a look...
Spoiler alert! If you are taking or thinking of taking Dan Boneh's excellent class, then step away from the blog now! I will be revealing details of the problems he sets, and my own solutions to them, and I don't want to spoil anything for anyone.
Unfortunately, the six week course already started several weeks ago, and I'm too impatient to wait for a new session to begin, so I'm already doomed to fail before I've even begun, due to late filing penalties for the homework assignments, but I wouldn't let that put you off - it's not all about the score you achieve. There will be plenty of reward in the learning itself (at least that was true for me).
OK, so the first week concentrates on Stream Ciphers. I'm not going to go over everything here, just the specifics that I think are of particular interest. One of these was One Time Pads. In the lectures, Dan proves that these are 'perfectly secure' as they have a unique key that is the same size as the message and are therefore impervious to cryptanalysis. The cryptographic operation itself is very simple: XOR the message with the key and you have the ciphertext. He goes into a lot of detail to prove the security of a message enciphered in this way, and the mathematics can get pretty daunting if, like me, you left school more years ago than you care to mention, and it's easy to get lost in the terminology of the worked through algorithms. To be fair to Dan, he does say he's going to blast through it as it's all on video and you can simply pause and rewind if you're not keeping up. But let's forget about the maths and focus instead on the outcome of the XOR, which is the only thing that really matters. There are three things we need to know:
Firstly, if you XOR two values together, either value can be recovered. In other words, if XORing A with B gives you X, then XORing X with A or B will give you the reciprocal value. If we represent XOR with '^', it looks like this:
A ^ B = X
X ^ A = B
X ^ B = A
Secondly, if you XOR any value with itself, you get NULL:
A ^ A = NULL
B ^ B = NULL
X ^ X = NULL
Finally, if you XOR something with NULL, nothing happens:
A ^ NULL = A
B ^ NULL = B
X ^ NULL = X
NULL ^ NULL = NULL
That's it. The rest is just logic, and if we really do need to convince ourselves of any of the proofs, we can do so with a little bit of python instead of trying to understand the underlying maths. I'll give an example of that in a minute.
Right, so back to One Time Pads. The problem with such a pad comes if you re-use the key. We all know from best practice that you should never re-use these keys, and Dan stresses this several times during the lectures, but do we know exactly why? What is it about re-using the key that compromises it?
Well, the answer is pretty simple: since XORing anything with itself cancels out, XORing two messages that were originally XORed with the same key actually removes the key and leaves you with the XOR of the two messages. This is clear if we work it through:
If our first message is ABCD, and our key is 1234, our ciphertext will be:
A^1 B^2 C^3 D^4
and if our second message is WXYZ, our ciphertext is:
W^1 X^2 Y^3 Z^4
so now if we XOR pairs of bytes from the two messages, the keys cancel out:
(A^1) ^ (W^1) = (A^W) ^ (1^1) = (A^W) ^ NULL = A^W
and we end up with:
A^W B^X C^Y D^Z
OK, so we've now got a new ciphertext that is the product of the two messages. So what? At this point it's tempting to think we're close to an answer - we've got two messages so all we need to do is extract one and the other will reveal itself, but the truth is we still don't know anything about either of them. In fact, all we've done is swapped one secret key for another - message A is now acting as the key for message B and vice-versa.
Since the ciphertext bytes reveal nothing about themselves, we are no further forward. We can prove this if we actually perform the XOR: A^W gives the same result as B^T, and, indeed, 255 other possible combinations:
>>> count=0
... for x in range(256):
... for y in range(256):
... if x^y == ord('A')^ord('W'):
... count += 1
... print "There are %d combinations that are the same as A^W" % count
There are 256 combinations that are the same as A^W
So we can invent anything meaningful we like as message A, and we'll still get something out for message B.
Now what?
In the problem set, Dan says:
"Hint: XOR the ciphertexts together, and consider what happens when a space is XORed with a character in [a-zA-Z]."
And here's another hint: all the above would be true if we didn't know anything about the messages (i.e. they were comprised of completely arbitrary characters in the range 0-255), but in this case we do know something about them: they are text messages, comprised only of readable characters. This changes everything. Now we can apply some rules to the XOR outcomes to learn something about the messages. Removing the key from the equation has done something magical - now we are no longer dealing with any arbitrary component (as we can assume the key was entirely random), but only with bytes in a specific range.
So what does happen when you XOR a character in the range [a-zA-Z] with a space? Well, let's knock up some python and see:
>>> for x in range(26):
... print '%s' % chr((ord('a')+x) ^ ord(' ')),
...
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
>>> for x in range(26):
... print '%s' % chr((ord('A')+x) ^ ord(' ')),
...
a b c d e f g h i j k l m n o p q r s t u v w x y z
Well, look at that! They swap case. So if I XOR an 'A' with a space, I get an 'a', and vice-versa. Excellent. That means that wherever I see a character in the range [a-zA-Z] in my new ciphertext, it is likely to be the product of a space and another character (this is also true if we see a NULL, as that means both messages had a space at that location). This is what is known as a 'crib', and, once we have a crib, we can use it to reveal something about the key. In this case, as it's a simple XOR, our crib gives us direct access to the key byte: simply XOR a space with the corresponding byte in the original ciphertext and you will reveal the secret key byte.
However, we have a small problem. Although we now know that a space was used at a particular location in our newly created ciphertext, we don't know whether it was in message A or message B. But that's no big deal, right? If we try it against both message A and message B we'll get two key bytes, one of which will be correct. We can then simply try them both against all our messages and if they make sense in all cases then we know we've got the right one. True, but it's going to be hard work. The average word in English is five letters long, so we can expect to see a space every sixth letter. This means that if you take a correctly decoded message you'll reveal every sixth letter and it's going to look something like:
s_____t_____s_____i_____h_____s_____a_____e
and (for example) an incorrectly decoded one:
e_____t_____a_____d_____d_____m_____s_____.
On top of that, the correct and incorrect key bytes will be all mixed together, so some of the characters may be correct, and others not. So how do I tell them apart? How do I know which are the correct ones?
Well, once the messages start filling out and making a bit more sense, it will become pretty obvious when a character is wrong, but when we are this stage looking at single letters spaced so far apart it is a real problem. However, in the same way we know that the average word length in the English language is five, we also know which are the most common letters and should therefore appear the most times in our deciphered messages. We could, therefore, do some frequency analysis on the results of decrypting with each of our keys and work out which is likely to be the correct one. But on short messages or pads for which we only have a few samples, this is unlikely to bear a whole lot of fruit.
Another method is to 'expand' our crib. We are currently working with a single space, but what if we expand that to a common word and see if we get a good result for all the letters in that combination. If we have a crib that consists of two spaces three letters apart, it would be easy enough to try some common three letter words at that location, such as " and " or " man " or " day " etc. this will give you five consecutive key bytes instead of just one, so now our correctly decoded message will look like:
_______sing ___________
and an incorrect one will be obviously incorrect:
_______qW#'a____________
In this case we can immediately see what looks like a 'hit' on our crib, which decodes as "sing ", which could be the last four characters of several common words, so it looks like we've correctly recovered five of our key bytes. You can actually try something similar across the whole message, just moving your crib one character at a time along the length of the message and seeing if anything sensible pops out. This is known as 'crib dragging'. You can also try common two-letter combinations (Bigrams), such as "th" or "he" or "in" and do the same thing.
Fine. So we can use crib dragging to find key bytes hidden amongst our XORed messages, and the more cribs we try against the more XOR combinations of messages, the more key bytes we're likely to find. Great! Progress!
But hang on a minute. This all seems a bit imprecise, and an awful lot of work. I was expecting this to be an exact science, not, what appears to the non-cryptanalyst layman to be, erm, how shall I put it?... "guessing".
Surely there must be a better way?
At this point I had something of a minor moral dilemma... The course is conducted under an 'Honour Code', in which you agree that all your answers are 'all your own work'. I wanted to go and do some research and see if anyone else had come to the same conclusion, but didn't want to taint my 'work' with someone else's ideas. The crib-dragging stuff is all discussed in the video lectures so there's no problem with that, but what if I come across some other technique? In the end, I decided to spend a couple of hours working on a solution, and then only after submitting my results going and taking a look to see if anyone had done the same. I even came to the conclusion that once I'd decided on a specific methodology I could also go and search to see if someone had already coded it as it could be argued that I don't need to write every line of code as long as I independently came up with the concept (after all, I didn't write the python interpreter, and I don't feel guilty for using that bit of code). However, in the end I didn't find any examples of either discussion or implementation of the following method, so the question simply didn't arise. All the discussions I found pointed to 'crib dragging' as the solution, and I couldn't find any useful code at all for actually implementing a one time pad attack of any type (of course, this is not to say they're not out there. I just didn't find them).
I've previously discussed the trap of following a path that appears to be giving quick results and not bothering to look any further. This is a common thing, and I call it a trap because although it often leads to the desired outcome, it can mean you spend a lot more time and effort than was actually necessary, and which could have been avoided had you taken a step back and asked some more questions before diving in and going for the prize.
In this case, we got a quick answer by finding some possible key bytes through locating spaces in the messages, and it would be tempting to then dive in and start crib-dragging and/or guessing at the pairs of potential keys to see if we can start to recover all of the message.
I often find it helpful at this point to think about the original question that revealed something to us and ask the secondary question: "what didn't it reveal?". This will give us another goal, which, if we solve it, may make our life a whole lot easier.
So in this case, the thing that was revealed was that we could locate a space in message A or message B. The thing that wasn't revealed was which of the two messages the space was in. Was it message A or message B? Our goal, then, is to determine which message the space was in. If we can do that, we can go right back to extracting single key bytes without any guess work.
So how do we determine which message the space was in? Let's look at the process in detail. For this illustration, I don't care what the actual message is, just where the spaces are, so I'll represent a character with 'X' and a space with '_'. If we detect a space, we'll mark that with a 'Y'.
Say message A looks like this:
XXX_XXXXX_XX_XXXXXX
and message B looks like this:
XXXX_XXXX_XXXX_XXXX
then if we XOR them together and check for the presence of a space, we'll get:
A: XXX_XXXXX_XX_XXXXXX
B: XXXX_XXXX_XXXX_XXXX
S: ---YY----Y--Y-Y----
As expected we've found some spaces, but we don't know whether they were in A or B. So what if we now do the same thing again, only this time we XOR message A with message C:
A: XXX_XXXXX_XX_XXXXXX
C: X_X_XXX_XXX_XXXXXXX
S: -Y-Y---Y-Y-YY------
Now all we need to do is compare our 'hits', and since the only constant in both tests was message A, that should mean that any hits that are in the same location in both test results must have come from message A. We'll mark these with an 'M':
S: ---YY----Y--Y-Y----
S: -Y-Y---Y-Y-YY------
M: ---M-----M--M------
and if we compare our M line with our original message A:
A: XXX_XXXXX_XX_XXXXXX
M: ---M-----M--M------
we can see we were spot on. Bingo! We now have an accurate method for recovering every key byte wherever there was a space in message A. Of course, we can repeat the process for every message we've got, replacing message A with message B and so on, until we've tried all possible combinations, and then we will have recovered every key byte wherever there was a space in ANY message. Hopefully, that should be quite a lot of the key bytes.
This is all very well in theory, but what happens in practice? It's a very simple algorithm, so it was easy to knock up a test program in python (have I mentioned that python rocks? :) This program takes the ciphertexts as hex value arguments and then performs the above process using each message as the 'master' and building up the key. It then outputs all the messages as plaintext, decrypting only those bytes that it thinks it's recovered the key for. If we feed it the eleven ciphertexts in the course example, we get:
manypad.py 315c4eeaa8b5f8aaf9174145bf43e1784b8fa00dc71d885a804e5ee9fa40b16349c146fb778cdf2d3aff021dfff5b403b510d0d0455468aeb98622b137dae857553ccd8883a7bc37520e06e515d22c954eba5025b8cc57ee59418ce7dc6bc41556bdb36bbca3e8774301fbcaa3b83b220809560987815f65286764703de0f3d524400a19b159610b11ef3e 234c02ecbbfbafa3ed18510abd11fa724fcda2018a1a8342cf064bbde548b12b07df44ba7191d9606ef4081ffde5ad46a5069d9f7f543bedb9c861bf29c7e205132eda9382b0bc2c5c4b45f919cf3a9f1cb74151f6d551f4480c82b2cb24cc5b028aa76eb7b4ab24171ab3cdadb8356f 32510ba9a7b2bba9b8005d43a304b5714cc0bb0c8a34884dd91304b8ad40b62b07df44ba6e9d8a2368e51d04e0e7b207b70b9b8261112bacb6c866a232dfe257527dc29398f5f3251a0d47e503c66e935de81230b59b7afb5f41afa8d661cb 32510ba9aab2a8a4fd06414fb517b5605cc0aa0dc91a8908c2064ba8ad5ea06a029056f47a8ad3306ef5021eafe1ac01a81197847a5c68a1b78769a37bc8f4575432c198ccb4ef63590256e305cd3a9544ee4160ead45aef520489e7da7d835402bca670bda8eb775200b8dabbba246b130f040d8ec6447e2c767f3d30ed81ea2e4c1404e1315a1010e7229be6636aaa 3f561ba9adb4b6ebec54424ba317b564418fac0dd35f8c08d31a1fe9e24fe56808c213f17c81d9607cee021dafe1e001b21ade877a5e68bea88d61b93ac5ee0d562e8e9582f5ef375f0a4ae20ed86e935de81230b59b73fb4302cd95d770c65b40aaa065f2a5e33a5a0bb5dcaba43722130f042f8ec85b7c2070 32510bfbacfbb9befd54415da243e1695ecabd58c519cd4bd2061bbde24eb76a19d84aba34d8de287be84d07e7e9a30ee714979c7e1123a8bd9822a33ecaf512472e8e8f8db3f9635c1949e640c621854eba0d79eccf52ff111284b4cc61d11902aebc66f2b2e436434eacc0aba938220b084800c2ca4e693522643573b2c4ce35050b0cf774201f0fe52ac9f26d71b6cf61a711cc229f77ace7aa88a2f19983122b11be87a59c355d25f8e4 32510bfbacfbb9befd54415da243e1695ecabd58c519cd4bd90f1fa6ea5ba47b01c909ba7696cf606ef40c04afe1ac0aa8148dd066592ded9f8774b529c7ea125d298e8883f5e9305f4b44f915cb2bd05af51373fd9b4af511039fa2d96f83414aaaf261bda2e97b170fb5cce2a53e675c154c0d9681596934777e2275b381ce2e40582afe67650b13e72287ff2270abcf73bb028932836fbdecfecee0a3b894473c1bbeb6b4913a536ce4f9b13f1efff71ea313c8661dd9a4ce 315c4eeaa8b5f8bffd11155ea506b56041c6a00c8a08854dd21a4bbde54ce56801d943ba708b8a3574f40c00fff9e00fa1439fd0654327a3bfc860b92f89ee04132ecb9298f5fd2d5e4b45e40ecc3b9d59e9417df7c95bba410e9aa2ca24c5474da2f276baa3ac325918b2daada43d6712150441c2e04f6565517f317da9d3 271946f9bbb2aeadec111841a81abc300ecaa01bd8069d5cc91005e9fe4aad6e04d513e96d99de2569bc5e50eeeca709b50a8a987f4264edb6896fb537d0a716132ddc938fb0f836480e06ed0fcd6e9759f40462f9cf57f4564186a2c1778f1543efa270bda5e933421cbe88a4a52222190f471e9bd15f652b653b7071aec59a2705081ffe72651d08f822c9ed6d76e48b63ab15d0208573a7eef027 466d06ece998b7a2fb1d464fed2ced7641ddaa3cc31c9941cf110abbf409ed39598005b3399ccfafb61d0315fca0a314be138a9f32503bedac8067f03adbf3575c3b8edc9ba7f537530541ab0f9f3cd04ff50d66f1d559ba520e89a2cb2a83 32510ba9babebbbefd001547a810e67149caee11d945cd7fc81a05e9f85aac650e9052ba6a8cd8257bf14d13e6f0a803b54fde9e77472dbff89d71b57bddef121336cb85ccb8f3315f4b52e301d16e9f52f904
building masterkey:
000000000000D800000000000063000000AF000000000000A00000C90000000000000000000000000000000000000000000000F00000000000000200000000770000000000009C00000026000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000890000D800000000000063000000AFCE000000ED00A00000C90000000000000000000000000000000000000000000000F000000000D80002005B00007733000000EC009C00006B260000004E0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
66000089C900D80000000000CD63000000AFCE000000ED00A00000C90029000000000000000000000000000000800000000000F0120000CDD80002D05B0000773300AEFCEC009C00006B268B00BF4EF0000000000000009A0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
66390089C900D80000000000CD6300102EAFCE000000ED00A00000C90029000000000000000000000000007000800000000000F0120000CDD80002D05B0087773300AEFCEC009C00006B268B00BF4EF0000000000000009A006100000000000000CF000000000000000000A8000000020000000000000000000000500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
66390089C900D80000000000CD6300102EAFCE00AA00ED00A00000C900290000000000000000AA00000000700080C000000000F0120000CDD8E802D05BA987773300AEFCECD59C00006B268B00BF4EF0000000000000009A006100000000000000CFD20000000000000000A80000000200002400E2000000450000500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
66390089C9DBD80000740000CD6300102EAFCE78AA00ED00A00000C900290000000000000000AA00000000708F80C000000000F0120000CDD8E802D05BA987773300AEFCECD59C00006B268B00BF4EF00000000000BB009A316100000000000000CFD20000000000370000A8C20000027C002400E2A10000450000500000A10000007800000000000000000000020000EF000000A900000000000000C08300006700009E00000000004C00000000000000000000000000000000
66390089C9DBD80000740000CD6300102EAFCE78AA00ED00A00000C9002900000000000000F8AA00000000708F80C000C70000F0123100CDD8E802D05BA987773300AEFCECD59C43006B268B60BF4EF00000000000BB009A316100000000000022CFD200D2000000376E00A8C20000027C002400E2A10000450200500000A10000007800000000000000000000020000EF000000A900000000000000C08300006700009E00000000004C00000000000000000000000000000000
66390089C9DBD8CB00740000CD6300102EAFCE78AA00ED28A00000C9002900000000000000F8AA00000000708F80C000C70000F0123100CDD8E802D05BA987773300AEFCECD59C43006B268B60BF4EF00000000000BB009A3161ED000000000022CFD200D2000000376E00A8C20000027C002400E2A10000450200500000A10000007800000000000000000000020000EF000000A900000000000000C08300006700009E00000000004C00000000000000000000000000000000
66390089C9DBD8CB00740000CD6300102EAFCE78AA00ED28A00000C98D2900000000000000F8AA00000000708F80C000C70000F0123100CDD8E802D05BA987773300AEFCECD59C43006B268B60BF4EF00000000000BB009A3161EDC70000000022CFD200D2000000376E00A8C20000027C002400E2A10000450200500000A10000007800000000000000000000020000EF000000A900000000000000C08300006700009E00000000004C00000000000000000000000000000000
66390089C9DBD8CB98740000CD6300102EAFCE78AA00ED28A00000C98D2900000000000000F8AA00000000708F80C000C70000F0123100CDD8E802D05BA98777335DAEFCECD59C433A6B268B60BF4EF00000000000BB009A3161EDC70000000022CFD200D2000000376E00A8C20000027C002400E2A10000450200500000A10000007800000000000000000000020000EF000000A900000000000000C08300006700009E00000000004C00000000000000000000000000000000
66390089C9DBD8CB9874002ACD6300102EAFCE78AA00ED28A00000C98D2900000000000000F8AA00000000708F80C066C70000F0123100CDD8E802D05BA98777335DAEFCECD59C433A6B268B60BF4EF03C00000000BB009A3161EDC70000000022CFD200D2000000376E00A8C20000027C002400E2A10000450200500000A10000007800000000000000000000020000EF000000A900000000000000C08300006700009E00000000004C00000000000000000000000000000000
plaintexts:
We_can aac_or _he num_er __ wi_______tu____mputer__ We_can also factor the number____w_th a ____tra_n___to_ba__ t_r_e __me_ __R___r________
Eu_er whul_ pr_bably _njo__tha_______is____orem b__ome_ a corner stone of crypto ____n_nymou____ Eu_e___ t_eo__m
Th_ nicb t_ing_about _eey__q i_______e ____tograp__rs _an drive a lot of fancy ca____ _an Bo___
Th_ cipoer_ext_produc_d b__a w_______ry____n algo__thm_looks as good as ciphertex____o_uced ____ st_o___en_ry__io_ _lg__it_m__ ___l__________a__
Yo_ don t _ant_to buy_a s__ of_______ys____m a gu__who_specializes in stealing ca____ _arc R____ber_ ___me_ti__ o_ _li__er
Th_re aue _wo _ypes o_ cr__tog_______ t____which __ll _eep secrets safe from your____t_e sis____ an_ ___t _hi__ w_l_ k__p _e__e___s__________o__ ___e_______br__u__ _____i__
Th_re aue _wo _ypes o_ cy__ogr_______ne____t allo__ th_ Government to use brute f____ _o bre____he _o___ a_d __e _h_t __qu_r__ ___ __________ __ ___ _______ __ __ _____ ________________
We_can tee_the_point _her__the_______s ____ppy if__ wr_ng bit is sent and consume____r_ powe____om _h___nv_ro__en_ _ A__ S_a__r
A _privfte_key_ encr_pti__ sc_______at____ algor__hms_ namely a procedure for ge____t_ng ke____a p_o___ur_ f__ e_c_yp__ng_ __d___p__________o__d___y_______
T_e Coici_e O_fordDi_tio__ry _______de____es cry__o a_ the art of writing o r s____n_ code___
Th_ secuet_mes_age is_ Wh__ us_______tr____cipher__nev_r use the key more than on__
which, if I do say so myself, is not bad for a first pass. Of the 83 bytes in the final 'challenge' message, we've decoded 60. That's 72%. Sweet! :)
But it's not perfect. There are large chunks missing where there clearly would have been at least one space, and, indeed, there are smaller chunks with what was clearly a space, such as "We_can" that, according to our theory, should have been detected. There are also very obviously incorrect bytes: "privfte" is almost certainly actually "private", and "secuet" is probably "secret". So where did we go wrong? Well, it turns out that it's not just XORing with a space that will produce '[a-zA-Z]'. Some other combinations of punctuation and/or numerical characters will do the same. This means we get some false positives that are messing up our results. However, not to worry. We now have enough plaintext that we are no longer blindly guessing, but rather filling in the blanks and correcting the errors. To do this, all we need to do is pick an obviously wrong byte, XOR the original message with the correct value and that will reveal the correct key byte. We then re-process all the messages with the updated key byte and all of them will be one character more correct. Each time we do this, the remaining errors or omissions should become a little more obvious and the true texts will quickly reveal themselves. A bit like filling in a crossword, as opposed to, erm... guessing.
Although this bit is 'easy', it's also tedious, fiddly and time-consuming. It involves a lot of looking up of hex values, recalculating XORs and counting byte positions etc., so I looked for a tool that would help me do it more efficiently. Again, I didn't find one. Accordingly, I give you 'cribtastic'. A tool for manually tweaking cribs and generally playing with ciphers. It also implements the above automated attack and I hope will evolve to do any others I come across during the course of the class.
It works like this:
You invoke it in the same was as my test program above, with one additional argument: the key (or partial key) if you know it. If you don't, just pass an empty string. This will then pop up a GUI...
The upper section shows the key value, followed by the ciphertexts and the decrypted plaintext. If a key value is 'known', it will be highlighted in green, and if not, in red.
Since we're starting with a blank key, we can hit the 'OTP' button in the commands window and it will perform the 'manypad' attack as above. Discovered key bytes and plaintexts will then be populated:
However, we can see that some of the bytes have not decrypted, and some of them are incorrect, so we can now edit them. Just click on an obviously incorrect value and type in the corrected version. For example, the first plaintext starts "We_can" which should probably be "We can", so we can simply replace the unknown key value (byte 3 of the key is highlighted in red) with a space. The program will recalculate the key value based on this new entry, reprocess all the ciphertexts, and we can then see how the rest of the plaintexts look using this new value:
Similarly, if we edit the obviously wrong "privfte" to "private" in the ninth plaintext:
and so on... As discussed, the more we correct the easier the rest of the errors are to spot as there will always be at least one word or space that stands out.
There are two other features worth mentioning. As you can see, when we edited the plaintext values, the corresponding key bytes were automatically 'locked'. This simply protects them from being recalculated if you were to hit the 'OTP' button again. The other feature is the set of tick-boxes to the left of the Cipher and Plaintext rows. These allow you to include or exclude them from the attack calculations. By experimenting with which ciphers are XORed together during the attack we may find that we can remove some of the false positives and reveal more of the key bytes.
Here we can see two chunks that have not decrypted at all:
Through experimentation, I found that if I lock all the key values I'm happy with and exclude the 10th cipher from the attack profile, I get this when I repeat the OTP attack:
and we've revealed a further 8 bytes of the key.
The tool was pretty trivial (if a little time-consuming) to write - it's just a bunch of text boxes, after all - but was definitely worth it with regards to the time and effort it saved once working on the actual problem. When put to use, recovery of the challenge text took less than three minutes, and it was possible to recover all messages excluding the last 14 bytes of the longest one with as good as absolute certainty that all solutions were correct - i.e. we recovered all bytes where there was any overlap. Fun stuff!
For what it's worth, the code is published here. As always, it's very much a work in progress (i.e. a filthy hack), and may turn into something more elegant and well written if it proves useful (and I may even implement crib-dragging :)...
Now, Cryptography is a subject close to my heart, and I spend a large proportion of my working life mucking about with it, or, to put it another way, breaking it. However, I've never formally studied it, and, until now, I've firmly believed I didn't need to. In fact, I felt that not knowing what every cryptographer knows actually gives me a slight advantage as I won't fall into the trap of making certain assumptions when faced with a cryptographic problem. Since I'm usually attacking an implementation, and not the crypto algorithm itself, knowledge of how the underlying maths works is not necessarily beneficial, as long as I understand the mechanics of it's implementation.
But my curiosity was piqued, so I decided to take a look...
Spoiler alert! If you are taking or thinking of taking Dan Boneh's excellent class, then step away from the blog now! I will be revealing details of the problems he sets, and my own solutions to them, and I don't want to spoil anything for anyone.
Unfortunately, the six week course already started several weeks ago, and I'm too impatient to wait for a new session to begin, so I'm already doomed to fail before I've even begun, due to late filing penalties for the homework assignments, but I wouldn't let that put you off - it's not all about the score you achieve. There will be plenty of reward in the learning itself (at least that was true for me).
OK, so the first week concentrates on Stream Ciphers. I'm not going to go over everything here, just the specifics that I think are of particular interest. One of these was One Time Pads. In the lectures, Dan proves that these are 'perfectly secure' as they have a unique key that is the same size as the message and are therefore impervious to cryptanalysis. The cryptographic operation itself is very simple: XOR the message with the key and you have the ciphertext. He goes into a lot of detail to prove the security of a message enciphered in this way, and the mathematics can get pretty daunting if, like me, you left school more years ago than you care to mention, and it's easy to get lost in the terminology of the worked through algorithms. To be fair to Dan, he does say he's going to blast through it as it's all on video and you can simply pause and rewind if you're not keeping up. But let's forget about the maths and focus instead on the outcome of the XOR, which is the only thing that really matters. There are three things we need to know:
Firstly, if you XOR two values together, either value can be recovered. In other words, if XORing A with B gives you X, then XORing X with A or B will give you the reciprocal value. If we represent XOR with '^', it looks like this:
A ^ B = X
X ^ A = B
X ^ B = A
Secondly, if you XOR any value with itself, you get NULL:
A ^ A = NULL
B ^ B = NULL
X ^ X = NULL
Finally, if you XOR something with NULL, nothing happens:
A ^ NULL = A
B ^ NULL = B
X ^ NULL = X
NULL ^ NULL = NULL
That's it. The rest is just logic, and if we really do need to convince ourselves of any of the proofs, we can do so with a little bit of python instead of trying to understand the underlying maths. I'll give an example of that in a minute.
Right, so back to One Time Pads. The problem with such a pad comes if you re-use the key. We all know from best practice that you should never re-use these keys, and Dan stresses this several times during the lectures, but do we know exactly why? What is it about re-using the key that compromises it?
Well, the answer is pretty simple: since XORing anything with itself cancels out, XORing two messages that were originally XORed with the same key actually removes the key and leaves you with the XOR of the two messages. This is clear if we work it through:
If our first message is ABCD, and our key is 1234, our ciphertext will be:
A^1 B^2 C^3 D^4
and if our second message is WXYZ, our ciphertext is:
W^1 X^2 Y^3 Z^4
so now if we XOR pairs of bytes from the two messages, the keys cancel out:
(A^1) ^ (W^1) = (A^W) ^ (1^1) = (A^W) ^ NULL = A^W
and we end up with:
A^W B^X C^Y D^Z
OK, so we've now got a new ciphertext that is the product of the two messages. So what? At this point it's tempting to think we're close to an answer - we've got two messages so all we need to do is extract one and the other will reveal itself, but the truth is we still don't know anything about either of them. In fact, all we've done is swapped one secret key for another - message A is now acting as the key for message B and vice-versa.
Since the ciphertext bytes reveal nothing about themselves, we are no further forward. We can prove this if we actually perform the XOR: A^W gives the same result as B^T, and, indeed, 255 other possible combinations:
>>> count=0
... for x in range(256):
... for y in range(256):
... if x^y == ord('A')^ord('W'):
... count += 1
... print "There are %d combinations that are the same as A^W" % count
There are 256 combinations that are the same as A^W
So we can invent anything meaningful we like as message A, and we'll still get something out for message B.
Now what?
In the problem set, Dan says:
"Hint: XOR the ciphertexts together, and consider what happens when a space is XORed with a character in [a-zA-Z]."
And here's another hint: all the above would be true if we didn't know anything about the messages (i.e. they were comprised of completely arbitrary characters in the range 0-255), but in this case we do know something about them: they are text messages, comprised only of readable characters. This changes everything. Now we can apply some rules to the XOR outcomes to learn something about the messages. Removing the key from the equation has done something magical - now we are no longer dealing with any arbitrary component (as we can assume the key was entirely random), but only with bytes in a specific range.
So what does happen when you XOR a character in the range [a-zA-Z] with a space? Well, let's knock up some python and see:
>>> for x in range(26):
... print '%s' % chr((ord('a')+x) ^ ord(' ')),
...
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
>>> for x in range(26):
... print '%s' % chr((ord('A')+x) ^ ord(' ')),
...
a b c d e f g h i j k l m n o p q r s t u v w x y z
Well, look at that! They swap case. So if I XOR an 'A' with a space, I get an 'a', and vice-versa. Excellent. That means that wherever I see a character in the range [a-zA-Z] in my new ciphertext, it is likely to be the product of a space and another character (this is also true if we see a NULL, as that means both messages had a space at that location). This is what is known as a 'crib', and, once we have a crib, we can use it to reveal something about the key. In this case, as it's a simple XOR, our crib gives us direct access to the key byte: simply XOR a space with the corresponding byte in the original ciphertext and you will reveal the secret key byte.
However, we have a small problem. Although we now know that a space was used at a particular location in our newly created ciphertext, we don't know whether it was in message A or message B. But that's no big deal, right? If we try it against both message A and message B we'll get two key bytes, one of which will be correct. We can then simply try them both against all our messages and if they make sense in all cases then we know we've got the right one. True, but it's going to be hard work. The average word in English is five letters long, so we can expect to see a space every sixth letter. This means that if you take a correctly decoded message you'll reveal every sixth letter and it's going to look something like:
s_____t_____s_____i_____h_____s_____a_____e
and (for example) an incorrectly decoded one:
e_____t_____a_____d_____d_____m_____s_____.
On top of that, the correct and incorrect key bytes will be all mixed together, so some of the characters may be correct, and others not. So how do I tell them apart? How do I know which are the correct ones?
Well, once the messages start filling out and making a bit more sense, it will become pretty obvious when a character is wrong, but when we are this stage looking at single letters spaced so far apart it is a real problem. However, in the same way we know that the average word length in the English language is five, we also know which are the most common letters and should therefore appear the most times in our deciphered messages. We could, therefore, do some frequency analysis on the results of decrypting with each of our keys and work out which is likely to be the correct one. But on short messages or pads for which we only have a few samples, this is unlikely to bear a whole lot of fruit.
Another method is to 'expand' our crib. We are currently working with a single space, but what if we expand that to a common word and see if we get a good result for all the letters in that combination. If we have a crib that consists of two spaces three letters apart, it would be easy enough to try some common three letter words at that location, such as " and " or " man " or " day " etc. this will give you five consecutive key bytes instead of just one, so now our correctly decoded message will look like:
_______sing ___________
and an incorrect one will be obviously incorrect:
_______qW#'a____________
In this case we can immediately see what looks like a 'hit' on our crib, which decodes as "sing ", which could be the last four characters of several common words, so it looks like we've correctly recovered five of our key bytes. You can actually try something similar across the whole message, just moving your crib one character at a time along the length of the message and seeing if anything sensible pops out. This is known as 'crib dragging'. You can also try common two-letter combinations (Bigrams), such as "th" or "he" or "in" and do the same thing.
Fine. So we can use crib dragging to find key bytes hidden amongst our XORed messages, and the more cribs we try against the more XOR combinations of messages, the more key bytes we're likely to find. Great! Progress!
But hang on a minute. This all seems a bit imprecise, and an awful lot of work. I was expecting this to be an exact science, not, what appears to the non-cryptanalyst layman to be, erm, how shall I put it?... "guessing".
Surely there must be a better way?
At this point I had something of a minor moral dilemma... The course is conducted under an 'Honour Code', in which you agree that all your answers are 'all your own work'. I wanted to go and do some research and see if anyone else had come to the same conclusion, but didn't want to taint my 'work' with someone else's ideas. The crib-dragging stuff is all discussed in the video lectures so there's no problem with that, but what if I come across some other technique? In the end, I decided to spend a couple of hours working on a solution, and then only after submitting my results going and taking a look to see if anyone had done the same. I even came to the conclusion that once I'd decided on a specific methodology I could also go and search to see if someone had already coded it as it could be argued that I don't need to write every line of code as long as I independently came up with the concept (after all, I didn't write the python interpreter, and I don't feel guilty for using that bit of code). However, in the end I didn't find any examples of either discussion or implementation of the following method, so the question simply didn't arise. All the discussions I found pointed to 'crib dragging' as the solution, and I couldn't find any useful code at all for actually implementing a one time pad attack of any type (of course, this is not to say they're not out there. I just didn't find them).
I've previously discussed the trap of following a path that appears to be giving quick results and not bothering to look any further. This is a common thing, and I call it a trap because although it often leads to the desired outcome, it can mean you spend a lot more time and effort than was actually necessary, and which could have been avoided had you taken a step back and asked some more questions before diving in and going for the prize.
In this case, we got a quick answer by finding some possible key bytes through locating spaces in the messages, and it would be tempting to then dive in and start crib-dragging and/or guessing at the pairs of potential keys to see if we can start to recover all of the message.
I often find it helpful at this point to think about the original question that revealed something to us and ask the secondary question: "what didn't it reveal?". This will give us another goal, which, if we solve it, may make our life a whole lot easier.
So in this case, the thing that was revealed was that we could locate a space in message A or message B. The thing that wasn't revealed was which of the two messages the space was in. Was it message A or message B? Our goal, then, is to determine which message the space was in. If we can do that, we can go right back to extracting single key bytes without any guess work.
So how do we determine which message the space was in? Let's look at the process in detail. For this illustration, I don't care what the actual message is, just where the spaces are, so I'll represent a character with 'X' and a space with '_'. If we detect a space, we'll mark that with a 'Y'.
Say message A looks like this:
XXX_XXXXX_XX_XXXXXX
and message B looks like this:
XXXX_XXXX_XXXX_XXXX
then if we XOR them together and check for the presence of a space, we'll get:
A: XXX_XXXXX_XX_XXXXXX
B: XXXX_XXXX_XXXX_XXXX
S: ---YY----Y--Y-Y----
As expected we've found some spaces, but we don't know whether they were in A or B. So what if we now do the same thing again, only this time we XOR message A with message C:
A: XXX_XXXXX_XX_XXXXXX
C: X_X_XXX_XXX_XXXXXXX
S: -Y-Y---Y-Y-YY------
Now all we need to do is compare our 'hits', and since the only constant in both tests was message A, that should mean that any hits that are in the same location in both test results must have come from message A. We'll mark these with an 'M':
S: ---YY----Y--Y-Y----
S: -Y-Y---Y-Y-YY------
M: ---M-----M--M------
and if we compare our M line with our original message A:
A: XXX_XXXXX_XX_XXXXXX
M: ---M-----M--M------
we can see we were spot on. Bingo! We now have an accurate method for recovering every key byte wherever there was a space in message A. Of course, we can repeat the process for every message we've got, replacing message A with message B and so on, until we've tried all possible combinations, and then we will have recovered every key byte wherever there was a space in ANY message. Hopefully, that should be quite a lot of the key bytes.
This is all very well in theory, but what happens in practice? It's a very simple algorithm, so it was easy to knock up a test program in python (have I mentioned that python rocks? :) This program takes the ciphertexts as hex value arguments and then performs the above process using each message as the 'master' and building up the key. It then outputs all the messages as plaintext, decrypting only those bytes that it thinks it's recovered the key for. If we feed it the eleven ciphertexts in the course example, we get:
manypad.py 315c4eeaa8b5f8aaf9174145bf43e1784b8fa00dc71d885a804e5ee9fa40b16349c146fb778cdf2d3aff021dfff5b403b510d0d0455468aeb98622b137dae857553ccd8883a7bc37520e06e515d22c954eba5025b8cc57ee59418ce7dc6bc41556bdb36bbca3e8774301fbcaa3b83b220809560987815f65286764703de0f3d524400a19b159610b11ef3e 234c02ecbbfbafa3ed18510abd11fa724fcda2018a1a8342cf064bbde548b12b07df44ba7191d9606ef4081ffde5ad46a5069d9f7f543bedb9c861bf29c7e205132eda9382b0bc2c5c4b45f919cf3a9f1cb74151f6d551f4480c82b2cb24cc5b028aa76eb7b4ab24171ab3cdadb8356f 32510ba9a7b2bba9b8005d43a304b5714cc0bb0c8a34884dd91304b8ad40b62b07df44ba6e9d8a2368e51d04e0e7b207b70b9b8261112bacb6c866a232dfe257527dc29398f5f3251a0d47e503c66e935de81230b59b7afb5f41afa8d661cb 32510ba9aab2a8a4fd06414fb517b5605cc0aa0dc91a8908c2064ba8ad5ea06a029056f47a8ad3306ef5021eafe1ac01a81197847a5c68a1b78769a37bc8f4575432c198ccb4ef63590256e305cd3a9544ee4160ead45aef520489e7da7d835402bca670bda8eb775200b8dabbba246b130f040d8ec6447e2c767f3d30ed81ea2e4c1404e1315a1010e7229be6636aaa 3f561ba9adb4b6ebec54424ba317b564418fac0dd35f8c08d31a1fe9e24fe56808c213f17c81d9607cee021dafe1e001b21ade877a5e68bea88d61b93ac5ee0d562e8e9582f5ef375f0a4ae20ed86e935de81230b59b73fb4302cd95d770c65b40aaa065f2a5e33a5a0bb5dcaba43722130f042f8ec85b7c2070 32510bfbacfbb9befd54415da243e1695ecabd58c519cd4bd2061bbde24eb76a19d84aba34d8de287be84d07e7e9a30ee714979c7e1123a8bd9822a33ecaf512472e8e8f8db3f9635c1949e640c621854eba0d79eccf52ff111284b4cc61d11902aebc66f2b2e436434eacc0aba938220b084800c2ca4e693522643573b2c4ce35050b0cf774201f0fe52ac9f26d71b6cf61a711cc229f77ace7aa88a2f19983122b11be87a59c355d25f8e4 32510bfbacfbb9befd54415da243e1695ecabd58c519cd4bd90f1fa6ea5ba47b01c909ba7696cf606ef40c04afe1ac0aa8148dd066592ded9f8774b529c7ea125d298e8883f5e9305f4b44f915cb2bd05af51373fd9b4af511039fa2d96f83414aaaf261bda2e97b170fb5cce2a53e675c154c0d9681596934777e2275b381ce2e40582afe67650b13e72287ff2270abcf73bb028932836fbdecfecee0a3b894473c1bbeb6b4913a536ce4f9b13f1efff71ea313c8661dd9a4ce 315c4eeaa8b5f8bffd11155ea506b56041c6a00c8a08854dd21a4bbde54ce56801d943ba708b8a3574f40c00fff9e00fa1439fd0654327a3bfc860b92f89ee04132ecb9298f5fd2d5e4b45e40ecc3b9d59e9417df7c95bba410e9aa2ca24c5474da2f276baa3ac325918b2daada43d6712150441c2e04f6565517f317da9d3 271946f9bbb2aeadec111841a81abc300ecaa01bd8069d5cc91005e9fe4aad6e04d513e96d99de2569bc5e50eeeca709b50a8a987f4264edb6896fb537d0a716132ddc938fb0f836480e06ed0fcd6e9759f40462f9cf57f4564186a2c1778f1543efa270bda5e933421cbe88a4a52222190f471e9bd15f652b653b7071aec59a2705081ffe72651d08f822c9ed6d76e48b63ab15d0208573a7eef027 466d06ece998b7a2fb1d464fed2ced7641ddaa3cc31c9941cf110abbf409ed39598005b3399ccfafb61d0315fca0a314be138a9f32503bedac8067f03adbf3575c3b8edc9ba7f537530541ab0f9f3cd04ff50d66f1d559ba520e89a2cb2a83 32510ba9babebbbefd001547a810e67149caee11d945cd7fc81a05e9f85aac650e9052ba6a8cd8257bf14d13e6f0a803b54fde9e77472dbff89d71b57bddef121336cb85ccb8f3315f4b52e301d16e9f52f904
building masterkey:
000000000000D800000000000063000000AF000000000000A00000C90000000000000000000000000000000000000000000000F00000000000000200000000770000000000009C00000026000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000890000D800000000000063000000AFCE000000ED00A00000C90000000000000000000000000000000000000000000000F000000000D80002005B00007733000000EC009C00006B260000004E0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
66000089C900D80000000000CD63000000AFCE000000ED00A00000C90029000000000000000000000000000000800000000000F0120000CDD80002D05B0000773300AEFCEC009C00006B268B00BF4EF0000000000000009A0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
66390089C900D80000000000CD6300102EAFCE000000ED00A00000C90029000000000000000000000000007000800000000000F0120000CDD80002D05B0087773300AEFCEC009C00006B268B00BF4EF0000000000000009A006100000000000000CF000000000000000000A8000000020000000000000000000000500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
66390089C900D80000000000CD6300102EAFCE00AA00ED00A00000C900290000000000000000AA00000000700080C000000000F0120000CDD8E802D05BA987773300AEFCECD59C00006B268B00BF4EF0000000000000009A006100000000000000CFD20000000000000000A80000000200002400E2000000450000500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
66390089C9DBD80000740000CD6300102EAFCE78AA00ED00A00000C900290000000000000000AA00000000708F80C000000000F0120000CDD8E802D05BA987773300AEFCECD59C00006B268B00BF4EF00000000000BB009A316100000000000000CFD20000000000370000A8C20000027C002400E2A10000450000500000A10000007800000000000000000000020000EF000000A900000000000000C08300006700009E00000000004C00000000000000000000000000000000
66390089C9DBD80000740000CD6300102EAFCE78AA00ED00A00000C9002900000000000000F8AA00000000708F80C000C70000F0123100CDD8E802D05BA987773300AEFCECD59C43006B268B60BF4EF00000000000BB009A316100000000000022CFD200D2000000376E00A8C20000027C002400E2A10000450200500000A10000007800000000000000000000020000EF000000A900000000000000C08300006700009E00000000004C00000000000000000000000000000000
66390089C9DBD8CB00740000CD6300102EAFCE78AA00ED28A00000C9002900000000000000F8AA00000000708F80C000C70000F0123100CDD8E802D05BA987773300AEFCECD59C43006B268B60BF4EF00000000000BB009A3161ED000000000022CFD200D2000000376E00A8C20000027C002400E2A10000450200500000A10000007800000000000000000000020000EF000000A900000000000000C08300006700009E00000000004C00000000000000000000000000000000
66390089C9DBD8CB00740000CD6300102EAFCE78AA00ED28A00000C98D2900000000000000F8AA00000000708F80C000C70000F0123100CDD8E802D05BA987773300AEFCECD59C43006B268B60BF4EF00000000000BB009A3161EDC70000000022CFD200D2000000376E00A8C20000027C002400E2A10000450200500000A10000007800000000000000000000020000EF000000A900000000000000C08300006700009E00000000004C00000000000000000000000000000000
66390089C9DBD8CB98740000CD6300102EAFCE78AA00ED28A00000C98D2900000000000000F8AA00000000708F80C000C70000F0123100CDD8E802D05BA98777335DAEFCECD59C433A6B268B60BF4EF00000000000BB009A3161EDC70000000022CFD200D2000000376E00A8C20000027C002400E2A10000450200500000A10000007800000000000000000000020000EF000000A900000000000000C08300006700009E00000000004C00000000000000000000000000000000
66390089C9DBD8CB9874002ACD6300102EAFCE78AA00ED28A00000C98D2900000000000000F8AA00000000708F80C066C70000F0123100CDD8E802D05BA98777335DAEFCECD59C433A6B268B60BF4EF03C00000000BB009A3161EDC70000000022CFD200D2000000376E00A8C20000027C002400E2A10000450200500000A10000007800000000000000000000020000EF000000A900000000000000C08300006700009E00000000004C00000000000000000000000000000000
plaintexts:
We_can aac_or _he num_er __ wi_______tu____mputer__ We_can also factor the number____w_th a ____tra_n___to_ba__ t_r_e __me_ __R___r________
Eu_er whul_ pr_bably _njo__tha_______is____orem b__ome_ a corner stone of crypto ____n_nymou____ Eu_e___ t_eo__m
Th_ nicb t_ing_about _eey__q i_______e ____tograp__rs _an drive a lot of fancy ca____ _an Bo___
Th_ cipoer_ext_produc_d b__a w_______ry____n algo__thm_looks as good as ciphertex____o_uced ____ st_o___en_ry__io_ _lg__it_m__ ___l__________a__
Yo_ don t _ant_to buy_a s__ of_______ys____m a gu__who_specializes in stealing ca____ _arc R____ber_ ___me_ti__ o_ _li__er
Th_re aue _wo _ypes o_ cr__tog_______ t____which __ll _eep secrets safe from your____t_e sis____ an_ ___t _hi__ w_l_ k__p _e__e___s__________o__ ___e_______br__u__ _____i__
Th_re aue _wo _ypes o_ cy__ogr_______ne____t allo__ th_ Government to use brute f____ _o bre____he _o___ a_d __e _h_t __qu_r__ ___ __________ __ ___ _______ __ __ _____ ________________
We_can tee_the_point _her__the_______s ____ppy if__ wr_ng bit is sent and consume____r_ powe____om _h___nv_ro__en_ _ A__ S_a__r
A _privfte_key_ encr_pti__ sc_______at____ algor__hms_ namely a procedure for ge____t_ng ke____a p_o___ur_ f__ e_c_yp__ng_ __d___p__________o__d___y_______
T_e Coici_e O_fordDi_tio__ry _______de____es cry__o a_ the art of writing o r s____n_ code___
Th_ secuet_mes_age is_ Wh__ us_______tr____cipher__nev_r use the key more than on__
which, if I do say so myself, is not bad for a first pass. Of the 83 bytes in the final 'challenge' message, we've decoded 60. That's 72%. Sweet! :)
But it's not perfect. There are large chunks missing where there clearly would have been at least one space, and, indeed, there are smaller chunks with what was clearly a space, such as "We_can" that, according to our theory, should have been detected. There are also very obviously incorrect bytes: "privfte" is almost certainly actually "private", and "secuet" is probably "secret". So where did we go wrong? Well, it turns out that it's not just XORing with a space that will produce '[a-zA-Z]'. Some other combinations of punctuation and/or numerical characters will do the same. This means we get some false positives that are messing up our results. However, not to worry. We now have enough plaintext that we are no longer blindly guessing, but rather filling in the blanks and correcting the errors. To do this, all we need to do is pick an obviously wrong byte, XOR the original message with the correct value and that will reveal the correct key byte. We then re-process all the messages with the updated key byte and all of them will be one character more correct. Each time we do this, the remaining errors or omissions should become a little more obvious and the true texts will quickly reveal themselves. A bit like filling in a crossword, as opposed to, erm... guessing.
Although this bit is 'easy', it's also tedious, fiddly and time-consuming. It involves a lot of looking up of hex values, recalculating XORs and counting byte positions etc., so I looked for a tool that would help me do it more efficiently. Again, I didn't find one. Accordingly, I give you 'cribtastic'. A tool for manually tweaking cribs and generally playing with ciphers. It also implements the above automated attack and I hope will evolve to do any others I come across during the course of the class.
It works like this:
You invoke it in the same was as my test program above, with one additional argument: the key (or partial key) if you know it. If you don't, just pass an empty string. This will then pop up a GUI...
The upper section shows the key value, followed by the ciphertexts and the decrypted plaintext. If a key value is 'known', it will be highlighted in green, and if not, in red.
Since we're starting with a blank key, we can hit the 'OTP' button in the commands window and it will perform the 'manypad' attack as above. Discovered key bytes and plaintexts will then be populated:
However, we can see that some of the bytes have not decrypted, and some of them are incorrect, so we can now edit them. Just click on an obviously incorrect value and type in the corrected version. For example, the first plaintext starts "We_can" which should probably be "We can", so we can simply replace the unknown key value (byte 3 of the key is highlighted in red) with a space. The program will recalculate the key value based on this new entry, reprocess all the ciphertexts, and we can then see how the rest of the plaintexts look using this new value:
Similarly, if we edit the obviously wrong "privfte" to "private" in the ninth plaintext:
and so on... As discussed, the more we correct the easier the rest of the errors are to spot as there will always be at least one word or space that stands out.
There are two other features worth mentioning. As you can see, when we edited the plaintext values, the corresponding key bytes were automatically 'locked'. This simply protects them from being recalculated if you were to hit the 'OTP' button again. The other feature is the set of tick-boxes to the left of the Cipher and Plaintext rows. These allow you to include or exclude them from the attack calculations. By experimenting with which ciphers are XORed together during the attack we may find that we can remove some of the false positives and reveal more of the key bytes.
Here we can see two chunks that have not decrypted at all:
Through experimentation, I found that if I lock all the key values I'm happy with and exclude the 10th cipher from the attack profile, I get this when I repeat the OTP attack:
and we've revealed a further 8 bytes of the key.
The tool was pretty trivial (if a little time-consuming) to write - it's just a bunch of text boxes, after all - but was definitely worth it with regards to the time and effort it saved once working on the actual problem. When put to use, recovery of the challenge text took less than three minutes, and it was possible to recover all messages excluding the last 14 bytes of the longest one with as good as absolute certainty that all solutions were correct - i.e. we recovered all bytes where there was any overlap. Fun stuff!
For what it's worth, the code is published here. As always, it's very much a work in progress (i.e. a filthy hack), and may turn into something more elegant and well written if it proves useful (and I may even implement crib-dragging :)...
Tuesday 12 March 2013
You can ring my bell! Adventures in sub-GHz RF land...
Dammit! Now that song is stuck in my head and will be going around and around for the next three days... Thanks, Anita Ward! (and apologies if it's now stuck in yours too! :)
But she's right: you can ring my bell. And I can ring yours. And hers. What the hell - let's just all ring each-other's bells shall we? And dim your lights. And open your garage door. And let's do it from the comfort and safety of my car, whilst driving around...
Speaking of hell, what the hell am I talking about???
A little while ago I got involved in a project that needed some hardware security testing. It was a complex system that used just about every protocol under the sun, including RF.
Now RF, like other 'invisible' transport mechanisms, always gets me interested because, in my experience, once data becomes invisible, something magical happens: they forget about security. Nobody can see what's going on, so we don't need to worry about it, right? Wrong. Time and time again I've seen this... MagStripes, InfraRed, RFID, Bluetooth, Magic Moon Beams. You name it, they'll send data over it insecurely.
In this case the RF was mostly standard stuff like WiFi and Zigbee, but there was also something going on in the 400MHz band, so how to take a look at what was there?
The obvious answer is to use an SDR (Software Defined Radio), and from previous projects I have a USRP which fits the bill. However, as I travel a lot, I prefer something a little more portable, so I'm always on the lookout for smaller alternatives. As it happens, a friend gave me a nice Christmas gift (thanks CJ!) of a FunCube dongle:
This very cool device can receive on any frequency from 64MHz to 1.7GHz and fits in my laptop bag so is absolutely ideal. It also presents itself to the PC as a pseudo sound card, so is very easy to interface to. This was a fantastic bonus for me as I'm already comfortable with the idea of converting audio into data and have used the soundcard in my laptop for that purpose on many previous projects (e.g. magstripes).
Radio is, almost by definition, very mysterious. You can't see it and you can't hear it, so using a soundcard is actually a very good shortcut to helping understand this completely unknown source of data. It's not intuitively obvious that it should be that way, but the human brain is very good at recognising patterns, and the soundcard not only provides us with auditory data that our ears will immediately be able to latch onto, but also visual data in the form of an editable wave file. The bottom line is that I don't understand how radio works, and I don't particularly want to - all I want is to be able to capture whatever's being sent over it and convert into something I can deal with - i.e. bits and bytes. So how to do that?
The first task is to determine exactly what frequency our signal is on. There are several ways of doing this, and the simplest is to make a rough guess and just take a listen. If you're anywhere close you'll hear something when you activate the device, and you can then tune up or down until you've found the centre frequency and you're getting nice crisp clean signals. This is particularly important when trying to convert mysterious airy-fairy analogue signals back into nice reliable 0s and 1s, as any deviation can end up corrupting your data beyond all recognition.
Another way is to use a spectrum analyser. This is essentially another type of RF receiver, that listens on a very wide band and shows you any spikes or other discrepancies, one of which will be the signal you're looking for. This can be in the form of software using the FunCube itself, such as HDSDR (Windows) or QUISK (Linux), or a standalone hardware device like the RF Explorer. I actually use both. The RF Explorer to quickly find the signal, and then QUISK or HDSDR to fine tune.
So getting back to our examination, I can't talk about the actual device in question, but since I have a wireless doorbell, let's take a look at that instead...
Like most such devices, it helpfully tells you what frequency it's using on its R&TTE approval label. In this case, 433.92 MHz.
Putting that into HDSDR and hitting the button produces a nice 'hot' line right on the centre (the white and orange blob in the top window), so it looks like we're in the money... We can also hear what is obviously data.
OK, so now what? How do we get it from the sound card into nice friendly binary data?
Although we've decided against using the USRP, it's companion software, GNU Radio, is the obvious choice. It has a great helper tool called GNU Radio Companion which makes this kind of task an absolute doddle. There is a plugin for the FunCube which is now bundled with the main GNU Radio distribution, so no extra work is required to get it up and running.
Firing it up, we can build a simple setup that connects our funcube to our speakers:
and again, if we run it, we get some nice 'data' sounding output... So we can hear it, and it sounds like data, but we still can't do anything useful with it. Saving it to a wavefile is just as easy:
and now this is where the fun begins. We can edit that file with any audio editor. I used Audacity but pretty much anything will do.
We can clearly see our data bursts, and if we zoom in:
we can see some proper structure to it. This not only sounds like data but it looks like it too. What we appear to have is long pulses and short pulses, so we can imagine they may represent 0s and 1s just as they are - maybe a short pulse is a 0 and a long a 1...
Now I know I said I wasn't interested in understanding radio, but there is one little thing that will help to convert our data from it's current analogue form into proper digital, and that's modulation. There are many modulation schemes out there, but the two you're most likely to encounter at this level are FM (Frequency Modulation) and AM (Amplitude Modulation). FM is normally used for things that need reasonably high fidelity, like speech or music, but AM, although it can also be used for speech and music, is perfectly suited to binary data as all it needs to be is either 'ON' or 'OFF'. This is also known as OOK, or On-Off Keying, and as we can see from our sample, this is clearly what we are dealing with here. We have a flat line when we're 'OFF', which then becomes wavy when we're 'ON'.
Now we know we're dealing with AM, we can get GNU Radio to do one more job for us: demodulate the AM signal.
And our signal now looks like this:
Now, instead of bursts of wavy stuff, it's pretty much a straight line that goes high or low which is very clearly binary data. We have long pulses and short pulses, and the whole sample is simply this short pattern repeating. If we assume the short pulse represents a 0 and the long a 1, this decodes as:
0100000011110
Add some leading zeros to bring it up to a multiple of eight bits and we get 00001000 00011110, which is 08 1E in hex. Of course it may actually be interpreted differently - the 0 and 1 may be the other way around, and the bit order may be reversed, but for our purposes, at this stage, it really doesn't matter as long as it makes some kind of sense.
Great, so now what? I know my doorbell push-button is spitting out '081E', so therefore the bell itself must be listening for '081E' and ringing when it hears it. My neighbour's bell-push won't set it off as it's presumably sending out a different number. But how to test this?
Ideally, I'd like to transmit my own signal, from something other than the bell-push, and if the bell rings I know I've got it right. Unfortunately, as cool as it is, the Fun Cube is just a receiver, so we need something that can transmit as well... The easy option would be to go back to the USRP, but I've already discounted that as it won't fit in my laptop bag and I'd like to be able to do this on the move...
As I mentioned, the original device we were looking at was also using WiFi and Zigbee, so we were using an Ubertooth 2.4GHz dongle to poke around with that. I knew there were chips in the same device range that did sub-GHz frequencies, so I asked Mike Ossman, the Ubertooth's designer, if he knew of any projects utilising these. I was in luck: he did. Not only had he got some fun and interesting research of his own, but he pointed me at RFCat, a new project (at the time) designed to do exactly this kind of thing. Perfect! Not only would I be able to receive the signals from the bell-push, but I should be able to emulate them as well.
RFCat is based around a Texas Instruments SoC (System on Chip) called the CC1111. These are really very cool devices, which provide microprocessor and built-in RF transceiver all in one package. This one even has an AES capable crypto co-processor and built-in USB, so it is the ideal platform for this kind of tomfoolery...
Development kits in 433, 868 and 915 MHz bands are available off the shelf, and come in two forms: either as a standalone USB dongle (868/915 only):
or these nifty wristwatches:
RFCat is a replacement firmware package for the USB dongle part of the kit, and allows low level access to the radio functions via a simple USB command interface. Oh, and it's in python. Joy! :)
So one impatient wait for an overnight delivery later I'm in business and I've got my RFCat dongle up and running. It has a nice object oriented interface, so all you need to do is create an instance and start doing stuff with it (my commands are in bold)...
$ rfcat -r
'RfCat, the greatest thing since Frequency Hopping!'
Research Mode: enjoy the raw power of rflib
currently your environment has an object called "d" for dongle. this is how you interact with the rfcat dongle:
>>> d.ping()
>>> d.setFreq(433000000)
>>> d.setMdmModulation(MOD_ASK_OOK)
>>> d.makePktFLEN(250)
>>> d.RFxmit("HALLO")
>>> d.RFrecv()
>>> print d.reprRadioConfig()
Python 2.7.2+ (default, Jul 20 2012, 22:15:08)
Type "copyright", "credits" or "license" for more information.
IPython 0.10.2 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.
In [1]: d.ping()
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.002653 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.002528 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.004721 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.004821 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.004573 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.002605 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.002430 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.002678 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.002519 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.002820 seconds)
Out[1]: (10, 0, 0.0331571102142334)
In [2]: d.setFreq(433920000)
In [3]: d.RFxmit('\x08\x1E')
In [4]:
At this point, not surprisingly, my doorbell didn't ring. This is because our interpretation of the data, giving us HEX 081E, is a little bit simplistic. The RFCat dongle doesn't understand that we want to represent a 0 as a short pulse and a 1 as a long, so we have to do a bit more work to get it into a format that RFCat can deal with...
A traditional microprocessor controlled radio circuit would typically have a separate circuit or daughterboard for the radio portion, and the microprocessor would signal the data it wanted to send by toggling a pin HIGH/LOW. The microprocessor would be entirely responsible for making sure that the timing was correct - i.e. that it held the pin HIGH for as long as it wanted the RF to be 'ON', and LOW for the duration of the 'OFF' period. However, in these SoC devices, the radio part is all done for you and you simply need to tell it what modulation scheme you want, speed of transmission etc., feed it some data and it will do the rest.
As we already know the modulation scheme (AM/OOK), that bit's easy, so now we just have to think of our original signal in terms of OOK, and what our data would need to look like to produce the same signal.
Looking at the original trace, it's pretty 'readable', but If we want to really tidy it up we can turn it into a square wave and this will make visual checking of bit lengths much easier and more accurate. Since a .wav file is just a header with a bunch of values for each sample, it's really easy to manipulate. In this case we want to take any value that is below 0 and set it to absolute minimum, and anything above 0 we set to maximum, which is effectively what the original source was doing before the signal got converted and sent over RF - a 0 was a pin going LOW and a 1 was a pin going HIGH...
Accordingly, I wrote a little command line tool for tweaking wav files:
$ wav-cli.py /tmp/test1.wav square 0 out /tmp/ts.wav
Converting to square wave
Writing /tmp/ts.wav
and this is the result:
Now we can accurately convert the signal into true OOK binary. We take the smallest element as our single binary digit, and then represent the data with a 1 when we want the line to go high, and 0 when we want it to go low, taking into account the size of our pulse compared to the single binary digit. In this case we only have two different size pulses - short and long, so we can represent them with a single or double digit:
Again, we need to add some leading or trailing 0s to give ourselves an 8-bit multiple, so the final number we end up with is 00101100 10010010 01001001 01101101 10110010, or 2C 92 49 6D B2 in HEX. Since it's always the same, we don't really need to understand what this message 'means', only to be able to reproduce it.
So in theory, if I set up RFCat to work in OOK mode with the correct speed and modulation, I should be able to transmit 2C92496DB2 and my doorbell should ring (the speed I get by measuring a short pulse width in seconds) ...
In [1]: d.setFreq(433920000)
In [2]: d.setMdmDRate(int(1.0/0.000302))
In [3]: d.setMdmModulation(MOD_ASK_OOK)
In [4]: d.RFxmit('\x2C\x92\x49\x6D\xB2')
Hmm.... Nothing. However, my bell-push didn't just transmit the message once, it sent it dozens of times, so maybe I need to do the same:
In [5]: d.RFxmit('\x2C\x92\x49\x6D\xB2' * 60)
Nope, still nothing. Going back to my original trace I could see there was a gap between each data pulse, which we can easily simulate by adding some extra '0' bits, so:
In [6]: d.RFxmit('\x2C\x92\x49\x6D\xB2\x00\x00\x00' * 60)
Bingo! The doorbell rings and my dogs go crazy telling me there's someone at the door! Nice!!!
Well, this is very cool and all, but it's not very, erm... Bond, is it? I mean, Daniel Craig isn't going to get the girl, save the world and keep Dame Judi happy by saying... "Hang on Bad Guys, I've just got to get my laptop out... plug in this USB dongle... nearly got it... just a tick... Ouch, that hurt!"
No. Not really.
What we need is something much cooler, sexier, and, well.... shiny! Something Gucci that's always right there, ready to go at a moment's notice... .
But wait! What's that in the box of bits that came with my dev kit? A wristwatch? With a frikkin' transmitter built into it???? OK.... now that's what I'm talkin' 'bout!
Come to Papa...
And so, I give you the latest thing from my local toy store... It's called "radio":
It's basically a cut-down RFCat-like firmware package that allows you to use the watch to transmit arbitrary signals. You can set it up either from the watch itself, or via the original Chronos dongle with a python helper, and then the up/down buttons on the right of the watch do the transmitting.
The python helper looks like this:
$ chronic-cli.py
Usage: /usr/local/bin/chronic-cli.py <COMMAND> [ARG(s)] ... [<COMMAND> [ARG(s)] ... ]
Commands:
BAUD <RATE> Set RF modem baudrate
BYRON Configure for Byron doorbell emulation
DELAY <0-255> Delay in MS between each DATA transmission
DOWN <HEX> <HEX> <HEX> Set DATA for DOWN button - 3 * 63 bytes
EXIT Force sync mode EXIT on Chronos
FREQ <FREQUENCY> Set Frequency (e.g. 433920000)
FRIEDLAND Configure for Friedland doorbell emulation
MAN <'ON'|'OFF'> Set Manchester Encoding
MOD <FSK|GFSK|OOK|MSK> Modulation:
FSK - Frequency Shift Keying
GFSK - Gaussian Frequency Shift Keying
OOK - On-Off Keying (in ASK mode)
MSK - Multiple Frequency Shift Keying
REPEAT <0-255> Number of times to repeat DATA when button pressed
RUKU Configure for Ruku garage door emulation
SERIAL <BAUD> Set access point comms baudrate (default 115200)
PULSE <WIDTH> Set pulsewidth (baud rate = 1.0/pulsewidth)
TIME Synchronise time/date
UP <HEX> <HEX> <HEX> Set DATA for UP button - 3 * 63 bytes
Commands will be executed sequentially and must be combined as appropriate.
It is recommended to finish with an EXIT to help conserve battery.
Full instructions are in the README in the github repo, but here is an example of setting it up to ring my doorbell. Put the watch in 'SYNC' mode, and then:
$ chronic-cli.py freq 433920000 man off delay 0 repeat 60 pulse 0.000320 up 2C92496DB2000000 '' '' down 2C92496DB2000000 '' '' exit
Setting Frequency: 433920000 (OK)
Setting Manchester Encoding: OFF (OK)
Setting delay: 0 (OK)
Setting repeat: 60 (OK)
Setting pulsewidth: 0.00032 (3124.237061 Hz) (OK)
Setting UP Button: (OK)
Setting DOWN Button: (OK)
Sending EXIT command
Or you can take the shortcut:
$ chronic-cli.py byron
Setting up for Byron Doorbell
Setting Frequency: 433920000
Setting Manchester Encoding: OFF
Setting Delay: 0
Setting Repeat: 60
Setting PulseWidth: 0.000320 (3124.237061 Hz)
Setting UP button: 2C92496DB2000000
Setting DOWN button: 2C92496DB2000000
Or use the built-in menu option:
And there are plenty of other targets...
Discussions on the gnuradio mailing list back in 2006 show that the obvious one of a car key was being looked at. Matt Ettus says:
"After the Wired article today, I've received a couple of email from people who are concerned that the USRP could be used to clone their keyfob transmitters for car alarms and garage doors. I'm not concerned, since there are already many ways to do this (just check the back of pupular science magazine). However, I am curious about it. I know that we can capture and play back any rf signal. The question is whether that replayed signal would result in the door being unlocked. I was under the impression that most of those systems allow an unlock code to only be used once, but does anyone out there know for sure?"
Well, here's your answer:
Unlocking and re-locking my son's Beemer:
And the wife's Disco (note the pause and the second set of 'clunks' - this is because the first command only opens the driver's door, but because we have the option to send multiple sequences we can send another open command which then opens the rest of the doors):
Of course, opening car doors is a nice party trick, but because modern vehicles are secured by rolling codes, that's all it is - a party trick. You'll be able to do this once and once only with each 'hacked' sequence...
What's of more concern to me are devices like the 'Owl Plug':
These handy little devices allow you to control mains voltage appliances via RF. Clearly, this could have serious consequences if care is not taken when switching things on and off. What if it's an electric heater and it got shoved into a corner to vacuum the room? It gets switched back on and bingo, the curtains are on fire! Let's hope they've made the protocol nice and secure then!
Oh, dear. No rolling code. Same bit sequence every time:
And the only difference between the five buttons on the remote is a few bits. I suspect, therefore, that the only difference between my remote and my neighbour’s will also only be a few bits, so it's probably not much of an exercise to figure out which ones I need to brute force to be able to go around switching things on and off at random (I've ordered another one and will check, so watch this space...).
As usual, the code is available on the Aperture Labs tools page, but please bear in mind that while playing with your own RF devices is perfectly OK in any reasonable society, playing with other people's (without their permission) is most definitely not (and probably illegal)! Behave! :)
But she's right: you can ring my bell. And I can ring yours. And hers. What the hell - let's just all ring each-other's bells shall we? And dim your lights. And open your garage door. And let's do it from the comfort and safety of my car, whilst driving around...
Speaking of hell, what the hell am I talking about???
A little while ago I got involved in a project that needed some hardware security testing. It was a complex system that used just about every protocol under the sun, including RF.
Now RF, like other 'invisible' transport mechanisms, always gets me interested because, in my experience, once data becomes invisible, something magical happens: they forget about security. Nobody can see what's going on, so we don't need to worry about it, right? Wrong. Time and time again I've seen this... MagStripes, InfraRed, RFID, Bluetooth, Magic Moon Beams. You name it, they'll send data over it insecurely.
In this case the RF was mostly standard stuff like WiFi and Zigbee, but there was also something going on in the 400MHz band, so how to take a look at what was there?
The obvious answer is to use an SDR (Software Defined Radio), and from previous projects I have a USRP which fits the bill. However, as I travel a lot, I prefer something a little more portable, so I'm always on the lookout for smaller alternatives. As it happens, a friend gave me a nice Christmas gift (thanks CJ!) of a FunCube dongle:
This very cool device can receive on any frequency from 64MHz to 1.7GHz and fits in my laptop bag so is absolutely ideal. It also presents itself to the PC as a pseudo sound card, so is very easy to interface to. This was a fantastic bonus for me as I'm already comfortable with the idea of converting audio into data and have used the soundcard in my laptop for that purpose on many previous projects (e.g. magstripes).
Radio is, almost by definition, very mysterious. You can't see it and you can't hear it, so using a soundcard is actually a very good shortcut to helping understand this completely unknown source of data. It's not intuitively obvious that it should be that way, but the human brain is very good at recognising patterns, and the soundcard not only provides us with auditory data that our ears will immediately be able to latch onto, but also visual data in the form of an editable wave file. The bottom line is that I don't understand how radio works, and I don't particularly want to - all I want is to be able to capture whatever's being sent over it and convert into something I can deal with - i.e. bits and bytes. So how to do that?
The first task is to determine exactly what frequency our signal is on. There are several ways of doing this, and the simplest is to make a rough guess and just take a listen. If you're anywhere close you'll hear something when you activate the device, and you can then tune up or down until you've found the centre frequency and you're getting nice crisp clean signals. This is particularly important when trying to convert mysterious airy-fairy analogue signals back into nice reliable 0s and 1s, as any deviation can end up corrupting your data beyond all recognition.
Another way is to use a spectrum analyser. This is essentially another type of RF receiver, that listens on a very wide band and shows you any spikes or other discrepancies, one of which will be the signal you're looking for. This can be in the form of software using the FunCube itself, such as HDSDR (Windows) or QUISK (Linux), or a standalone hardware device like the RF Explorer. I actually use both. The RF Explorer to quickly find the signal, and then QUISK or HDSDR to fine tune.
So getting back to our examination, I can't talk about the actual device in question, but since I have a wireless doorbell, let's take a look at that instead...
Like most such devices, it helpfully tells you what frequency it's using on its R&TTE approval label. In this case, 433.92 MHz.
Putting that into HDSDR and hitting the button produces a nice 'hot' line right on the centre (the white and orange blob in the top window), so it looks like we're in the money... We can also hear what is obviously data.
OK, so now what? How do we get it from the sound card into nice friendly binary data?
Although we've decided against using the USRP, it's companion software, GNU Radio, is the obvious choice. It has a great helper tool called GNU Radio Companion which makes this kind of task an absolute doddle. There is a plugin for the FunCube which is now bundled with the main GNU Radio distribution, so no extra work is required to get it up and running.
Firing it up, we can build a simple setup that connects our funcube to our speakers:
and again, if we run it, we get some nice 'data' sounding output... So we can hear it, and it sounds like data, but we still can't do anything useful with it. Saving it to a wavefile is just as easy:
and now this is where the fun begins. We can edit that file with any audio editor. I used Audacity but pretty much anything will do.
We can clearly see our data bursts, and if we zoom in:
Now I know I said I wasn't interested in understanding radio, but there is one little thing that will help to convert our data from it's current analogue form into proper digital, and that's modulation. There are many modulation schemes out there, but the two you're most likely to encounter at this level are FM (Frequency Modulation) and AM (Amplitude Modulation). FM is normally used for things that need reasonably high fidelity, like speech or music, but AM, although it can also be used for speech and music, is perfectly suited to binary data as all it needs to be is either 'ON' or 'OFF'. This is also known as OOK, or On-Off Keying, and as we can see from our sample, this is clearly what we are dealing with here. We have a flat line when we're 'OFF', which then becomes wavy when we're 'ON'.
Now we know we're dealing with AM, we can get GNU Radio to do one more job for us: demodulate the AM signal.
And our signal now looks like this:
Now, instead of bursts of wavy stuff, it's pretty much a straight line that goes high or low which is very clearly binary data. We have long pulses and short pulses, and the whole sample is simply this short pattern repeating. If we assume the short pulse represents a 0 and the long a 1, this decodes as:
0100000011110
Add some leading zeros to bring it up to a multiple of eight bits and we get 00001000 00011110, which is 08 1E in hex. Of course it may actually be interpreted differently - the 0 and 1 may be the other way around, and the bit order may be reversed, but for our purposes, at this stage, it really doesn't matter as long as it makes some kind of sense.
Great, so now what? I know my doorbell push-button is spitting out '081E', so therefore the bell itself must be listening for '081E' and ringing when it hears it. My neighbour's bell-push won't set it off as it's presumably sending out a different number. But how to test this?
Ideally, I'd like to transmit my own signal, from something other than the bell-push, and if the bell rings I know I've got it right. Unfortunately, as cool as it is, the Fun Cube is just a receiver, so we need something that can transmit as well... The easy option would be to go back to the USRP, but I've already discounted that as it won't fit in my laptop bag and I'd like to be able to do this on the move...
As I mentioned, the original device we were looking at was also using WiFi and Zigbee, so we were using an Ubertooth 2.4GHz dongle to poke around with that. I knew there were chips in the same device range that did sub-GHz frequencies, so I asked Mike Ossman, the Ubertooth's designer, if he knew of any projects utilising these. I was in luck: he did. Not only had he got some fun and interesting research of his own, but he pointed me at RFCat, a new project (at the time) designed to do exactly this kind of thing. Perfect! Not only would I be able to receive the signals from the bell-push, but I should be able to emulate them as well.
RFCat is based around a Texas Instruments SoC (System on Chip) called the CC1111. These are really very cool devices, which provide microprocessor and built-in RF transceiver all in one package. This one even has an AES capable crypto co-processor and built-in USB, so it is the ideal platform for this kind of tomfoolery...
Development kits in 433, 868 and 915 MHz bands are available off the shelf, and come in two forms: either as a standalone USB dongle (868/915 only):
or these nifty wristwatches:
RFCat is a replacement firmware package for the USB dongle part of the kit, and allows low level access to the radio functions via a simple USB command interface. Oh, and it's in python. Joy! :)
So one impatient wait for an overnight delivery later I'm in business and I've got my RFCat dongle up and running. It has a nice object oriented interface, so all you need to do is create an instance and start doing stuff with it (my commands are in bold)...
$ rfcat -r
'RfCat, the greatest thing since Frequency Hopping!'
Research Mode: enjoy the raw power of rflib
currently your environment has an object called "d" for dongle. this is how you interact with the rfcat dongle:
>>> d.ping()
>>> d.setFreq(433000000)
>>> d.setMdmModulation(MOD_ASK_OOK)
>>> d.makePktFLEN(250)
>>> d.RFxmit("HALLO")
>>> d.RFrecv()
>>> print d.reprRadioConfig()
Python 2.7.2+ (default, Jul 20 2012, 22:15:08)
Type "copyright", "credits" or "license" for more information.
IPython 0.10.2 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.
In [1]: d.ping()
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.002653 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.002528 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.004721 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.004821 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.004573 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.002605 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.002430 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.002678 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.002519 seconds)
PING: 26 bytes transmitted, received: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (0.002820 seconds)
Out[1]: (10, 0, 0.0331571102142334)
In [2]: d.setFreq(433920000)
In [3]: d.RFxmit('\x08\x1E')
In [4]:
At this point, not surprisingly, my doorbell didn't ring. This is because our interpretation of the data, giving us HEX 081E, is a little bit simplistic. The RFCat dongle doesn't understand that we want to represent a 0 as a short pulse and a 1 as a long, so we have to do a bit more work to get it into a format that RFCat can deal with...
A traditional microprocessor controlled radio circuit would typically have a separate circuit or daughterboard for the radio portion, and the microprocessor would signal the data it wanted to send by toggling a pin HIGH/LOW. The microprocessor would be entirely responsible for making sure that the timing was correct - i.e. that it held the pin HIGH for as long as it wanted the RF to be 'ON', and LOW for the duration of the 'OFF' period. However, in these SoC devices, the radio part is all done for you and you simply need to tell it what modulation scheme you want, speed of transmission etc., feed it some data and it will do the rest.
As we already know the modulation scheme (AM/OOK), that bit's easy, so now we just have to think of our original signal in terms of OOK, and what our data would need to look like to produce the same signal.
Looking at the original trace, it's pretty 'readable', but If we want to really tidy it up we can turn it into a square wave and this will make visual checking of bit lengths much easier and more accurate. Since a .wav file is just a header with a bunch of values for each sample, it's really easy to manipulate. In this case we want to take any value that is below 0 and set it to absolute minimum, and anything above 0 we set to maximum, which is effectively what the original source was doing before the signal got converted and sent over RF - a 0 was a pin going LOW and a 1 was a pin going HIGH...
Accordingly, I wrote a little command line tool for tweaking wav files:
$ wav-cli.py /tmp/test1.wav square 0 out /tmp/ts.wav
Converting to square wave
Writing /tmp/ts.wav
and this is the result:
Now we can accurately convert the signal into true OOK binary. We take the smallest element as our single binary digit, and then represent the data with a 1 when we want the line to go high, and 0 when we want it to go low, taking into account the size of our pulse compared to the single binary digit. In this case we only have two different size pulses - short and long, so we can represent them with a single or double digit:
Again, we need to add some leading or trailing 0s to give ourselves an 8-bit multiple, so the final number we end up with is 00101100 10010010 01001001 01101101 10110010, or 2C 92 49 6D B2 in HEX. Since it's always the same, we don't really need to understand what this message 'means', only to be able to reproduce it.
So in theory, if I set up RFCat to work in OOK mode with the correct speed and modulation, I should be able to transmit 2C92496DB2 and my doorbell should ring (the speed I get by measuring a short pulse width in seconds) ...
In [1]: d.setFreq(433920000)
In [2]: d.setMdmDRate(int(1.0/0.000302))
In [3]: d.setMdmModulation(MOD_ASK_OOK)
In [4]: d.RFxmit('\x2C\x92\x49\x6D\xB2')
Hmm.... Nothing. However, my bell-push didn't just transmit the message once, it sent it dozens of times, so maybe I need to do the same:
In [5]: d.RFxmit('\x2C\x92\x49\x6D\xB2' * 60)
Nope, still nothing. Going back to my original trace I could see there was a gap between each data pulse, which we can easily simulate by adding some extra '0' bits, so:
In [6]: d.RFxmit('\x2C\x92\x49\x6D\xB2\x00\x00\x00' * 60)
Bingo! The doorbell rings and my dogs go crazy telling me there's someone at the door! Nice!!!
Well, this is very cool and all, but it's not very, erm... Bond, is it? I mean, Daniel Craig isn't going to get the girl, save the world and keep Dame Judi happy by saying... "Hang on Bad Guys, I've just got to get my laptop out... plug in this USB dongle... nearly got it... just a tick... Ouch, that hurt!"
No. Not really.
What we need is something much cooler, sexier, and, well.... shiny! Something Gucci that's always right there, ready to go at a moment's notice... .
But wait! What's that in the box of bits that came with my dev kit? A wristwatch? With a frikkin' transmitter built into it???? OK.... now that's what I'm talkin' 'bout!
Come to Papa...
And so, I give you the latest thing from my local toy store... It's called "radio":
Chronos Integrated Commander
Or ChronIC for short...It's basically a cut-down RFCat-like firmware package that allows you to use the watch to transmit arbitrary signals. You can set it up either from the watch itself, or via the original Chronos dongle with a python helper, and then the up/down buttons on the right of the watch do the transmitting.
The python helper looks like this:
$ chronic-cli.py
Usage: /usr/local/bin/chronic-cli.py <COMMAND> [ARG(s)] ... [<COMMAND> [ARG(s)] ... ]
Commands:
BAUD <RATE> Set RF modem baudrate
BYRON Configure for Byron doorbell emulation
DELAY <0-255> Delay in MS between each DATA transmission
DOWN <HEX> <HEX> <HEX> Set DATA for DOWN button - 3 * 63 bytes
EXIT Force sync mode EXIT on Chronos
FREQ <FREQUENCY> Set Frequency (e.g. 433920000)
FRIEDLAND Configure for Friedland doorbell emulation
MAN <'ON'|'OFF'> Set Manchester Encoding
MOD <FSK|GFSK|OOK|MSK> Modulation:
FSK - Frequency Shift Keying
GFSK - Gaussian Frequency Shift Keying
OOK - On-Off Keying (in ASK mode)
MSK - Multiple Frequency Shift Keying
REPEAT <0-255> Number of times to repeat DATA when button pressed
RUKU Configure for Ruku garage door emulation
SERIAL <BAUD> Set access point comms baudrate (default 115200)
PULSE <WIDTH> Set pulsewidth (baud rate = 1.0/pulsewidth)
TIME Synchronise time/date
UP <HEX> <HEX> <HEX> Set DATA for UP button - 3 * 63 bytes
Commands will be executed sequentially and must be combined as appropriate.
It is recommended to finish with an EXIT to help conserve battery.
Full instructions are in the README in the github repo, but here is an example of setting it up to ring my doorbell. Put the watch in 'SYNC' mode, and then:
$ chronic-cli.py freq 433920000 man off delay 0 repeat 60 pulse 0.000320 up 2C92496DB2000000 '' '' down 2C92496DB2000000 '' '' exit
Setting Frequency: 433920000 (OK)
Setting Manchester Encoding: OFF (OK)
Setting delay: 0 (OK)
Setting repeat: 60 (OK)
Setting pulsewidth: 0.00032 (3124.237061 Hz) (OK)
Setting UP Button: (OK)
Setting DOWN Button: (OK)
Sending EXIT command
Or you can take the shortcut:
$ chronic-cli.py byron
Setting up for Byron Doorbell
Setting Frequency: 433920000
Setting Manchester Encoding: OFF
Setting Delay: 0
Setting Repeat: 60
Setting PulseWidth: 0.000320 (3124.237061 Hz)
Setting UP button: 2C92496DB2000000
Setting DOWN button: 2C92496DB2000000
Or use the built-in menu option:
And there are plenty of other targets...
Discussions on the gnuradio mailing list back in 2006 show that the obvious one of a car key was being looked at. Matt Ettus says:
"After the Wired article today, I've received a couple of email from people who are concerned that the USRP could be used to clone their keyfob transmitters for car alarms and garage doors. I'm not concerned, since there are already many ways to do this (just check the back of pupular science magazine). However, I am curious about it. I know that we can capture and play back any rf signal. The question is whether that replayed signal would result in the door being unlocked. I was under the impression that most of those systems allow an unlock code to only be used once, but does anyone out there know for sure?"
Well, here's your answer:
Unlocking and re-locking my son's Beemer:
And the wife's Disco (note the pause and the second set of 'clunks' - this is because the first command only opens the driver's door, but because we have the option to send multiple sequences we can send another open command which then opens the rest of the doors):
Of course, opening car doors is a nice party trick, but because modern vehicles are secured by rolling codes, that's all it is - a party trick. You'll be able to do this once and once only with each 'hacked' sequence...
What's of more concern to me are devices like the 'Owl Plug':
These handy little devices allow you to control mains voltage appliances via RF. Clearly, this could have serious consequences if care is not taken when switching things on and off. What if it's an electric heater and it got shoved into a corner to vacuum the room? It gets switched back on and bingo, the curtains are on fire! Let's hope they've made the protocol nice and secure then!
Oh, dear. No rolling code. Same bit sequence every time:
And the only difference between the five buttons on the remote is a few bits. I suspect, therefore, that the only difference between my remote and my neighbour’s will also only be a few bits, so it's probably not much of an exercise to figure out which ones I need to brute force to be able to go around switching things on and off at random (I've ordered another one and will check, so watch this space...).
As usual, the code is available on the Aperture Labs tools page, but please bear in mind that while playing with your own RF devices is perfectly OK in any reasonable society, playing with other people's (without their permission) is most definitely not (and probably illegal)! Behave! :)
Subscribe to:
Posts (Atom)