Friday, December 12, 2014

Skipping a python unittest with a custom decorator

Example custom Python unittest skip decorator:

import functools
import glob
import os
import unittest

def skipIfNoYamlFiles(f):
  @functools.wraps(f)
  def wrapper(*args, **kwargs):
    yaml_glob = os.path.normpath(os.path.dirname(__file__) + "/../../definitions/*.yaml")
    if glob.glob(yaml_glob):
      return f(*args, **kwargs)
    else:
      # unittest.skip() returns a decorator, which won't skip anything from
      # inside the wrapper; raising SkipTest is what actually skips the test.
      raise unittest.SkipTest("No YAML found with %s, skipping this test." % yaml_glob)
  return wrapper
Use it like this (the decorator is assumed to live in a shared test_lib module):
class MyTests(unittest.TestCase):

  @test_lib.skipIfNoYamlFiles
  def testValidation(self):
    """Ensure all Yaml validates."""
    print "test code goes here"

Wednesday, December 10, 2014

Migrating Chrome Profiles to per-channel directories

If you run custom chrome profiles you might see a message like this in the terminal if/when you launch chrome from the commandline:
[9894:9894:1210/140235:WARNING:sxs_linux.cc(139)] User data dir migration is only possible when the profile is used with the same channel.
It seems chrome recently made a change to where user profiles are stored. Previously, different tracks (stable, beta, etc.) all used the same profile directory, like:
~/.config/google-chrome/myprofilename
Now however, these are split out by track like this:
~/.config/google-chrome/myprofilename
~/.config/google-chrome-beta/myprofilename
and if you run chrome beta with a profile that has also been used with chrome stable, chrome won't use it, and it does a half-assed job of warning you about the problem (i.e. it prints a warning to the terminal as above, which you'll only ever see if you're running it interactively). My reading of the bug was that this data wouldn't get migrated automatically, but you could migrate it yourself.

I did this for each profile and it seemed to work fine, all my settings and themes look correct:
cp -a ~/.config/google-chrome/myprofilename ~/.config/google-chrome-beta/

Friday, November 28, 2014

Make newly added files show up in git diff output

Sometimes it's handy to be able to show a normal diff for all the changes in a patch. By default "git diff" only shows unstaged changes, so a newly added (and staged) file's content doesn't show up at all, i.e. in this case the output of "git diff" is missing the contents of newfile:
$ git init
Initialized empty Git repository in /gittemp/.git/
$ echo "blah" > newfile
$ git add newfile 
$ git diff
$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached ..." to unstage)
#
# new file:   newfile
#
If you'd like to see the complete content of the new file in the diff output, use the "-N" (intent-to-add) option to git add:
$ echo "blahmore" > newfile2
$ git add -N newfile2
$ git diff
diff --git a/newfile2 b/newfile2
index e69de29..39b5650 100644
--- a/newfile2
+++ b/newfile2
@@ -0,0 +1 @@
+blahmore

Sunday, November 23, 2014

Thunderbolt DMA attacks on OS X

The current TL;DR on Thunderbolt DMA attacks is that the VT-d IOMMU is doing its job, and Ivy Bridge (2012 and later) Macs running OS X >= 10.8.2 are not vulnerable to the easy direct-write-to-memory-style attacks we saw popularized with firewire.

While Inception claims to work with Thunderbolt, it's really only a firewire attack, so you need a Thunderbolt to firewire converter and it's subject to the same limitations as normal firewire, as described at the end of this post.

There was a great 2013 Blackhat talk by Russ Sevinsky that covered lots of chip-level reverse engineering for Thunderbolt, but ultimately he didn't come up with an attack (excellent description of the reverse engineering process though). More recently snare described how to set up an attack on thunderbolt with an FPGA board connected to a mac via a Thunderbolt-PCIe device. But the IOMMU foiled his efforts on modern hardware. Snare says he's working on trying to bypass VT-d, so there may be interesting developments in the future.

For now you should probably still be more worried about snare's other work using PCIe option ROMs as a bootkit.

Sunday, November 9, 2014

SMTP email conversation threads with python: making gmail associate messages into a thread

I have some python software that sends emails, and I wanted gmail to group messages that were related to the same subject in a conversation. It's not immediately obvious how this works, and there's plenty of bad advice out there, including people stating that you just need to add "RE:" to the subject line, which is just wrong.

The way conversation threads are constructed is by using the "Message-ID" header as a reference to the original email in the "In-Reply-To" and "References" headers. RFC2822 has the details, explains some fairly complex multi-parent cases, and includes some good examples. My use case was very simple: just two messages I wanted associated. The first message is sent with a message ID; the second one references it (you'll need to store this ID somewhere if you want to send more messages in the same thread):
import email.utils
from email.mime.multipart import MIMEMultipart
import smtplib

# First message: generate an explicit Message-ID and remember it.
myid = email.utils.make_msgid()
msg = MIMEMultipart("alternative")
msg["Subject"] = "test"
msg["From"] = "myuser@mycompany.com"
msg["To"] = "myuser@mycompany.com"
msg.add_header("Message-ID", myid)
s = smtplib.SMTP("smtp.mycompany.com")
s.sendmail("myuser@mycompany.com", ["myuser@mycompany.com"], msg.as_string())

# Second message: reference the first message's ID so clients thread them.
msg = MIMEMultipart("alternative")
msg["Subject"] = "test"
msg["From"] = "myuser@mycompany.com"
msg["To"] = "myuser@mycompany.com"
msg.add_header("In-Reply-To", myid)
msg.add_header("References", myid)
s = smtplib.SMTP("smtp.mycompany.com")
s.sendmail("myuser@mycompany.com", ["myuser@mycompany.com"], msg.as_string())
Note that it is up to the host generating the message ID to guarantee it is unique. make_msgid gives you an RFC-compliant message ID built from a datestamp, the PID, and a random number. If you are sending lots of mail that may not be enough, so you can pass it extra data that will get appended to the ID:
In [3]: import email.utils

In [4]: email.utils.make_msgid()
Out[4]: '<20141110055935.21441.10732@myhost.mycompany.com>'

In [5]: email.utils.make_msgid('extrarandomsauce')
Out[5]: '<20141110060140.21441.5878.extrarandomsauce@myhost.mycompany.com>'
I wasn't particularly careful with my first test message ID and sent a non-compliant one :) Gmail recognizes this and politely fixes it for you:
Message-ID: <545c12bc.240ada0a.6dba.5421SMTPIN_ADDED_BROKEN@gmr-mx.google.com>
X-Google-Original-Message-ID: testafasdfasdfasdfasdf

Wednesday, November 5, 2014

Splitting an array into fixed-size chunks with python

If you're looking for a way to split data into fixed size chunks with python you're likely to run across this recipe from the itertools documentation:
from itertools import izip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
which certainly works, but why it works is less than obvious. In my case I was working with a small array where the data length was guaranteed to be a multiple of 4. I ended up using this, which is less sexy but more comprehensible:
[myarray[i:i+4] for i in xrange(0, len(myarray), 4)]
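For example, chunking a string into groups of 4 (any sliceable sequence works the same way):
In [1]: myarray = "ABCDEFGH"

In [2]: [myarray[i:i+4] for i in xrange(0, len(myarray), 4)]
Out[2]: ['ABCD', 'EFGH']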

Monday, October 27, 2014

Authenticode signing windows executables on linux

We had a particularly painful build and sign workflow that required multiple trips between linux and windows.  I looked around and found the following options for signing windows binaries on linux:
  • jsign, a java implementation
  • signcode from the Mono project, as suggested by Mozilla.  It's in the mono-devel ubuntu package.
  • osslsigncode, an Openssl-based implementation of authenticode signing that uses curl to make the timestamp requests.
The Mozilla instructions are good for getting your keys and certs into a format that will work with these tools. Some minor additions to those below:

openssl pkcs12 -in authenticode.pfx -nocerts -nodes -out key.pem
openssl rsa -in key.pem -outform PVK -pvk-strong -out authenticode.pvk
openssl pkcs12 -in authenticode.pfx -nokeys -nodes -out cert.pem
cat Thawte_Primary_Root_CA_Cross.cer >> cert.pem
openssl crl2pkcs7 -nocrl -certfile cert.pem -outform DER -out authenticode.spc
shred -u key.pem
Once you're done here you have authenticode.pvk with your encrypted private key, and authenticode.spc with your public certs. Appending the cross cert is necessary to make signature validation work with some tools. Without it, the windows GUI "Properties|Digital Signatures|Details" dialog will tell you "This digital signature is OK", but if you check with signtool verify on Windows, you'll find it isn't:
>"C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\signtool.exe" verify /v /kp my.exe

Verifying: my.exe

[snip]

SignTool Error: Signing Cert does not chain to a Microsoft Root Cert.

Number of files successfully Verified: 0
Number of warnings: 0
Number of errors: 1
I suspect the GUI uses the local cert store and/or APIs that automatically fetch the required cross cert, but signtool and 3rd-party signature verifiers do not. With the cross cert added to the spc as above it can be correctly verified and mentions the MS cross cert:
Z:\signing\windows>"C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\signtool.exe" verify /v /kp my.exe

Verifying: my.exe

[snip]

Cross Certificate Chain:
    Issued to: Microsoft Code Verification Root

[snip]

Successfully verified: my.exe
If you use Bit9 it's also worth checking that it will verify your binary using the dascli.exe tool:
>"C:\Program Files (x86)\Bit9\Parity Agent\DasCLI.exe" certinfo my.exe
File[C:my.exe]
Elapsed[630ms]
CertValidated[Y] Detached[N] Publisher[My Inc]
FileVerified[Y]

[snip]
So, back to signing on Linux. At first I tried installing mono and using "signcode". It claims to succeed:
$ signcode sign -spc authenticode.spc -v authenticode.pvk -a sha1 -$ commercial -n MyApp -t http://timestamp.verisign.com/scripts/timestamp.dll -tr 5 my.exe
Mono SignCode - version 3.2.8.0
Sign assemblies and PE files using Authenticode(tm).
Copyright 2002, 2003 Motus Technologies. Copyright 2004-2008 Novell. BSD licensed.

Enter password for authenticode.pvk: 
MY_GODDAM_PASSWORD_IN_CLEARTEXT
Success
And in the process echoes your password in cleartext!?! This is something I was prepared to fix with a "read -s -p 'Password'" wrapper script like this guy, but the signature was no good. I could see it appended in a hexeditor but Windows didn't give me a Digital Signature tab in the GUI and signtool couldn't find it either:
>"C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\signtool.exe" verify /v /kp my.exe

Verifying: my.exe
SignTool Error: No signature found.

Number of files successfully Verified: 0
Number of warnings: 0
Number of errors: 1
It's possible that there's something weird about our exe that caused this to fail. Someone else reported a similar problem but then later claimed it was due to a corrupted exe. In any case, not being particularly wedded to, or happy with, mono and signcode at this point I tried osslsigncode, which worked fine and produced a valid signature.
sudo apt-get install libcurl4-openssl-dev
./configure
make
sudo make install
osslsigncode sign -certs authenticode.spc -key authenticode.pvk -n "MyApp" -t http://timestamp.verisign.com/scripts/timstamp.dll -in my.exe -out my_signed.exe
Update: After coming across this mozilla post, I suspect my problem with mono's signcode was that signcode may not support 64 bit, but I didn't go back to check.

Monday, October 6, 2014

Python: add an element to an array only if a filter function returns True

This post was going to be about a fairly obscure feature of Python I found, but is now a minor cautionary tale about trying to be too clever :)

I was looking for an elegant solution to the problem of appending elements to a list only if a filter function returns true. The long (and in retrospect much more readable) way to write this is something like:
filter_result = self.FilterFunction(response)
if filter_result:
  processed_responses.append(filter_result)
There is in fact a one-liner that can do this for you, but since it's fairly obscure it makes the code much harder to understand.
processed_responses += filter(None, [self.FilterFunction(response)])
This works because when the first argument to filter is None, the effect is to remove all items from the sequence that evaluate to False. In this case that means if self.FilterFunction(response) returns something falsy you get an empty array, and appending the empty array has no effect on processed_responses. If it returns something truthy, you append a single element.

Obvious huh?

Mocking out python OS specific imports

Testing python code that needs to run on a different OS is painful. A major part of the difficulty is that even though you can (somewhat) easily mock out the API calls used, you can't import the code successfully because the modules only exist on the target OS. Let's take an example of code that imports and calls functions from the win32api module. How do you mock out the import so you can test it on linux? I know of two main approaches.

One is the proxy module. Basically you define a module to hide all of the OS-specific imports behind, and do a conditional import in that module. So instead of having code like this:
import win32api
win32api.GetLogicalDriveStrings()
you do
import windows_imports
windows_imports.win32api.GetLogicalDriveStrings()
and then in windows_imports/__init__.py:
import platform

if platform.system() == "Windows":
  import win32api
  import winerror
  import wmi
Then inside your tests you need to create stubs to replace your API calls, e.g. for windows_imports.win32api.GetLogicalDriveStrings. In theory that's just monkey-patching the proxy module, as in the sketch below.
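A minimal sketch of the stubbing (the GetLogicalDriveStrings return value is made up for illustration):
import mock

import windows_imports

windows_imports.win32api = mock.MagicMock()
windows_imports.win32api.GetLogicalDriveStrings.return_value = "C:\\\x00D:\\\x00"
Theoretically this should be fairly straightforward, but when I started down this path it got fairly complicated and I struggled to make it work. In the end I gave up and settled on the second approach, as below.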

The second approach, described here, is to delay the import of the OS specific code in your tests until after you modify sys.modules to stub out all the OS-specific modules. This has the distinct advantage of leaving your production code untouched, and having all the complexity in your test code. Using the mock library makes this much easier. Below is an example of mocking out a WMI call made from python.
import mock
import test_fixture
import unittest

class WindowsTests(unittest.TestCase):

  def setUp(self):
    self.wmimock = mock.MagicMock()
    self.win32com = mock.MagicMock()
    self.win32com.client = mock.MagicMock()
    modules = {
        "_winreg": mock.MagicMock(),
        "pythoncom": mock.MagicMock(),
        "pywintypes": mock.MagicMock(),
        "win32api": mock.MagicMock(),
        "win32com": self.win32com,
        "win32com.client": self.win32com.client,
        "win32file": mock.MagicMock(),
        "win32service": mock.MagicMock(),
        "win32serviceutil": mock.MagicMock(),
        "winerror": mock.MagicMock(),
        "wmi": self.wmimock
        }

    self.module_patcher = mock.patch.dict("sys.modules", modules)
    self.module_patcher.start()

    # Now we're ready to do the import
    from myrepo.actions import windows
    self.windows = windows

  def tearDown(self):
    self.module_patcher.stop()

  def testEnumerateInterfaces(self):

    # Stub out wmi.WMI().Win32_NetworkAdapterConfiguration(IPEnabled=1)
    wmi_object = self.wmimock.WMI.return_value
    wmi_object.Win32_NetworkAdapterConfiguration.return_value = [
        test_fixture.WMIWin32NetworkAdapterConfigurationMockResults()]

    enumif = self.windows.EnumerateInterfaces()
    interface_dict_list = list(enumif.RunWMIQuery())
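With the WMI call stubbed out, you can then assert on the query results, e.g. something like this (assuming the fixture represents a single adapter):

    self.assertEqual(len(interface_dict_list), 1)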

HOWTO change the extension of lots of files while keeping the original filename

With rename you get all the power of perl regex substitution in a simple interface, e.g. to back up all the .yaml files in a directory you could use:
rename --no-act 's/(.*)\.yaml$/$1.yaml.bak/' *.yaml
Remove the --no-act to actually make the changes.

The next logical progression is to want to do this more than once, so adding a timestamp to the backup is desirable, but I couldn't think of a way to make rename do this (you can't put backticks in the regex, for instance). So here's a workaround:
find . -name '*.yaml' -exec mv {} {}.bak.`date +%Y-%m-%dT%H:%M:%S%z` \;
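The same thing from python sidesteps the shell quoting issues entirely (a sketch; note shutil.move to match the find/mv above):
import glob
import shutil
import time

stamp = time.strftime("%Y-%m-%dT%H:%M:%S%z")
for path in glob.glob("*.yaml"):
  shutil.move(path, "%s.bak.%s" % (path, stamp))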

Friday, August 22, 2014

Powershell and WMI cheatsheet

I finally bit the bullet and spent a bit of time learning some powershell and WMI querying basics. Unfortunately lots of the tutorials and intros out there are missing key pieces that make it usable, like aliases. So here's my cheatsheet as an attempt to remedy that problem.

Listing Classes and Namespaces

First of all, figuring out what information is available is not that easy. It's in classes inside nested namespaces. There's good online documentation, but I'd prefer to get it from the tool itself where I can see the values on a live system. So, let's start with listing all the top-level WMI Namespaces.
PS C:\> Get-WmiObject -class __Namespace -namespace root | Format-Table name

name
----
subscription
DEFAULT
cimv2
Cli

[snip]
Note that namespaces can be nested; the following ones sit below root\cimv2, which is the default namespace for Get-WmiObject if you don't specify one. If you really want to see everything you're going to need to list them recursively.
PS C:\> Get-WmiObject -class __Namespace -namespace root\cimv2 | Format-Table name

name
----
Security
sms
power
ms_409
TerminalServices
Applications
To list all classes within a namespace (see here for a high-level description of the naming scheme for classes):
PS C:\> Get-WmiObject -List -namespace root\cimv2 | Select name
All classes with 'Net' in the name:
PS C:\> Get-WmiObject -List | Where-Object { $_.name -match 'net'} | Select Name

Retrieving Object Values

Retrieve the class values:
PS C:\> Get-WmiObject -class Win32_NetworkAdapterConfiguration

DHCPEnabled      : True
IPAddress        : {192.168.0.10, abcd::eff:fff:aaaa:abdc}
DefaultIPGateway : {192.168.0.1, abce::eff:feff:aaaa:265}
DNSDomain        : blah.com
ServiceName      : 
Description      : Intel(R) Gigabit Network Connection
Index            : 7

Aliases and Functions

Sick of typing 'Get-WmiObject' yet? Thankfully 'gwmi' is an alias for Get-WmiObject:
PS C:\> gwmi -class Win32_NetworkAdapterConfiguration
The list of built-in aliases can be retrieved with:
PS C:\> get-alias

CommandType     Name
-----------     ----
Alias           % -> ForEach-Object
Alias           ? -> Where-Object
Alias           ac -> Add-Content
Alias           asnp -> Add-PSSnapin
Alias           cat -> Get-Content
Alias           cd -> Set-Location
[snip]
Microsoft has some good doco on aliases here. You can define your own aliases for cmdlets. If you want to do something more complicated you'll need a function:
PS C:\> Function NetIFs {gwmi -class Win32_NetworkAdapter}
PS C:\> Set-Alias nif NetIFs

SQL-like querying, where "select *" doesn't mean everything

To query inside the class you can do this:
gwmi -query "Select * from Win32_NetworkAdapterConfiguration where IPEnabled=1"
There is a gotcha here however. The WMI query will return all the properties of the object, but powershell won't display them all. So you will be missing most of the properties and values listed in the documentation. To see all object properties and values, you can use the Select-Object cmdlet (alias select):
gwmi -query "Select * from Win32_NetworkAdapterConfiguration where IPEnabled=1" | select *
For advanced powershell examples, there's lots of good stuff on tasteofpowershell.blogspot.com.

Friday, May 30, 2014

Cross platform python method for calculating disk free space

I wanted a cross platform (Mac/Win/Linux) method for calculating disk free space. For my application ideally I'd like to: read it out of a file (or registry key) somewhere, failing that have a python library do it for me, and failing that shell out to something and parse the output. I couldn't find the free space in any files (including /proc), registry keys, or anywhere else file-like. So like this stackoverflow thread, I came to the conclusion that WMI for windows and os.statvfs for Mac+Linux were the best options.

First, Windows: it's fairly straightforward. The MS doco for this WMI call is here, and it also explains the DriveType codes.
PS > $query = "select * from win32_logicaldisk"

PS > Get-WmiObject -Query $query


DeviceID     : C:
DriveType    : 3
ProviderName : 
FreeSpace    : 190249115648
Size         : 249690058752
VolumeName   : 

DeviceID     : Z:
DriveType    : 4
ProviderName : \\share\homedir\username
FreeSpace    : 15784280064
Size         : 26843545600
VolumeName   : nethomes$
Those numbers are bytes, so windows is pretty easy. On to the mess that is statvfs. First, the statfs man page gives a little history:
The original Linux statfs() and fstatfs() system calls were not designed with extremely large file sizes in mind. Subsequently, Linux 2.6 added new statfs64() and fstatfs64() system calls that employ a new structure, statfs64. The new structure contains the same fields as the original statfs structure, but the sizes of various fields are increased, to accommodate large file sizes. The glibc statfs() and fstatfs() wrapper functions transparently deal with the kernel differences. Some systems only have <sys/vfs.h>, other systems also have <sys/statfs.h>, where the former includes the latter. So it seems including the former is the best choice. LSB has deprecated the library calls statfs() and fstatfs() and tells us to use statvfs(2) and fstatvfs(2) instead.
Sounds like we should use statvfs, and python has os.statvfs, so we should be good. Don't get fooled by this nasty deprecation notice: it's referring to the statvfs module, which just defined a few constants. That's deprecated, but the os.statvfs function is alive and well in recent Python versions.

But wait, there's chatter about statvfs being dangerous on glibc systems and the df code said not to use it at some stage. Basically if you have a network filesystem listed in /proc/mounts and it is unreachable (e.g. because there is no network), statvfs will hang on stat'ing the network directory, even if you called statvfs on a completely different directory. df works around this by continuing to use statfs on glibc systems. I tested this with strace and it's true on my Ubuntu linux machine:
$ strace df
[snip]
statfs("/usr/local/home/user", {f_type=0x65735546, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=1024, f_frsize=4096}) = 0
statfs("/nethome/user", {f_type="NFS_SUPER_MAGIC", f_bsize=8192, f_blocks=367001600, f_bfree=159547821, f_bavail=159547821, f_files=31876689, f_ffree=12707362, f_fsid={0, 0}, f_namelen=255, f_frsize=8192}) = 0
[snip]
We can see that python os.statvfs is doing the same (and so is "stat -f"). So we should be safe using python's os.statvfs.
$ strace python -c "import os;os.statvfs('/')" 2>&1 | grep statfs
statfs("/", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=9743394, f_bfree=5606442, f_bavail=5118199, f_files=2441216, f_ffree=2066663, f_fsid={1820746783, 1207614867}, f_namelen=255, f_frsize=4096}) = 0

# No statvfs calls
$ strace python -c "import os;os.statvfs('/')" 2>&1 | grep statvfs
execve("/usr/bin/python", ["python", "-c", "import os;os.statvfs('/')"], [/* 56 vars */]) = 0

# stat -f does the same
$ strace stat -f / 2>&1 | grep statfs
statfs("/", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=9743394, f_bfree=5606442, f_bavail=5118199, f_files=2441216, f_ffree=2066663, f_fsid={1820746783, 1207614867}, f_namelen=255, f_frsize=4096}) = 0
The next question is, how do you actually calculate the free space in bytes? Starting with: what is the block size? f_bsize is the "Preferred file system block size" and f_frsize is the "Fundamental file system block size" according to the python doco, and if you read the statfs man page it says "optimal transfer block size" and "fragment size (since Linux 2.6)" respectively. Confusing much?

On my linux machine they are the same:
In [6]: import os

In [7]: st = os.statvfs("/")

In [8]: st.f_bsize
Out[8]: 4096

In [9]: st.f_frsize
Out[9]: 4096

In [10]: !stat -f -c "Block size (for faster transfers): %s, Fundamental block size (for block counts): %S" /
Block size (for faster transfers): 4096, Fundamental block size (for block counts): 4096
On OS X they are not:
In [1]: import os

In [2]: st = os.statvfs("/")

In [3]: st.f_bsize
Out[3]: 1048576

In [4]: st.f_frsize
Out[4]: 4096
So on OS X f_bsize is 1MB, but that isn't actually the block size used by the filesystem, so using f_frsize looks like the best option for both platforms. The remaining sticking point is that pre-2.6-kernel linux machines don't have f_frsize, so we should check if it is zero and use f_bsize instead in that case.

OK so we have a blocksize, but what free size should we use? f_bfree is "free blocks in fs" and f_bavail is "free blocks available to unprivileged user". These can actually be quite different, e.g. mkfs.ext3 reserves 5% of the filesystem blocks for the super-user by default. Which one you care about probably depends on why you are measuring free disk space. In my case I chose f_bavail, (which is also what df reports).

The final product:
In [16]: def PrintFree(path):
   ....:     st = os.statvfs(path)
   ....:     if st.f_frsize:
   ....:         print "Free bytes: %s" % (st.f_frsize * st.f_bavail) 
   ....:     else:
   ....:         print "Free bytes: %s" % (st.f_bsize * st.f_bavail)
   ....:         

In [17]: PrintFree("/")
Free bytes: 127470809088

In [18]: !df -B 1
Filesystem                1B-blocks        Used    Available Use% Mounted on
/dev/sda1              153117560832 17845137408 127470809088  13% /
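Putting the two halves together, a rough cross-platform sketch (the windows branch assumes the third-party wmi package and a path containing a drive letter):
import os
import platform

def FreeBytes(path):
  """Free bytes available to unprivileged users on the filesystem at path."""
  if platform.system() == "Windows":
    import wmi
    drive = os.path.splitdrive(os.path.abspath(path))[0]  # e.g. "C:"
    for disk in wmi.WMI().Win32_LogicalDisk(DeviceID=drive):
      return int(disk.FreeSpace)
  else:
    st = os.statvfs(path)
    # f_frsize can be 0 on pre-2.6-kernel linux; fall back to f_bsize.
    return (st.f_frsize or st.f_bsize) * st.f_bavail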

Wednesday, May 28, 2014

Mach-O filetype identification

I wanted to write a quick and dirty file-type identifier for Mach-O; it turns out this is trickier than I expected. From /usr/share/file/magic/mach:
# $File: mach,v 1.9 2009/09/19 16:28:10 christos Exp $
# Mach has two magic numbers, 0xcafebabe and 0xfeedface.
# Unfortunately the first, cafebabe, is shared with
# Java ByteCode, so they are both handled in the file "cafebabe".
# The "feedface" ones are handled herein.
and from /usr/share/file/magic/cafebabe:
# Since Java bytecode and Mach-O universal binaries have the same magic number, the test
# must be performed in the same "magic" sequence to get both right.  The long
# at offset 4 in a mach-O universal binary tells the number of architectures; the short at
# offset 4 in a Java bytecode file is the JVM minor version and the
# short at offset 6 is the JVM major version.  Since there are only
# 18 labeled Mach-O architectures at current, and the first released
# Java class format was version 43.0, we can safely choose any number
# between 18 and 39 to test the number of architectures against
# (and use as a hack). Let's not use 18, because the Mach-O people
# might add another one or two as time goes by...
GAAAH! Unsurprisingly more than one engineer wanted to use the cutesy "cafebabe" for their magic string. I ended up using this regex, which will also match Java bytecode, but was good enough for my purpose:
^(cffaedfe|cefaedfe|feedface|feedfacf|cafebabe)
The full Mach-O filetype doco is here. The various magic byte strings are as follows:
  • cefaedfe: Mach-O Little Endian (32-bit)
  • cffaedfe: Mach-O Little Endian (64-bit)
  • feedface: Mach-O Big Endian (32-bit)
  • feedfacf: Mach-O Big Endian (64-bit)
  • cafebabe: Universal Binary Big Endian. These fat binaries are archives that can include binaries for multiple architectures, but typically contain PowerPC and Intel x86.
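A quick python version of the same check (a sketch; like the regex, it will happily misidentify Java bytecode as a fat binary):
import binascii

MACHO_MAGIC = ("cffaedfe", "cefaedfe", "feedface", "feedfacf", "cafebabe")

def LooksLikeMachO(path):
  with open(path, "rb") as f:
    # Compare the hex of the first 4 bytes against the magics above.
    return binascii.hexlify(f.read(4)) in MACHO_MAGIC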

Bash man page colours

There are many pages out there describing how to get coloured bash man pages. Tuxarena has one of the better ones that tries to explain what's going on, but unfortunately it is somewhat of a black art due to the obscure colour codes used. Here's the snippet from my bash_aliases that I use:
man() {
    env LESS_TERMCAP_mb=$'\E[01;31m' \
    LESS_TERMCAP_md=$'\E[01;38;5;74m' \
    LESS_TERMCAP_me=$'\E[0m' \
    LESS_TERMCAP_se=$'\E[0m' \
    LESS_TERMCAP_so=$'\E[01;41;33m' \
    LESS_TERMCAP_ue=$'\E[0m' \
    LESS_TERMCAP_us=$'\E[04;38;5;146m' \
    man "$@"
}
I only found one site that actually documented the color options available, and he basically had to reverse engineer it. I'll include the color codes below, since everyone likely has their own personal preference and wants to tweak things slightly.
0   = default colour
1   = bold
4   = underlined
5   = flashing text
7   = reverse field
31  = red
32  = green
33  = orange
34  = blue
35  = purple
36  = cyan
37  = grey
40  = black background
41  = red background
42  = green background
43  = orange background
44  = blue background
45  = purple background
46  = cyan background
47  = grey background
90  = dark grey
91  = light red
92  = light green
93  = yellow
94  = light blue
95  = light purple
96  = turquoise
100 = dark grey background
101 = light red background
102 = light green background
103 = yellow background
104 = light blue background
105 = light purple background
106 = turquoise background

Wednesday, May 14, 2014

Python Mix-ins

Python mix-ins are a handy way to augment functionality of a class.  LinuxJournal has a good detailed article about them.  To mix in a class dynamically you just need to modify the __bases__ class attribute:

class Base(object):pass
class BaseClass(Base):pass
class MixInClass(object):pass

BaseClass.__bases__ += (MixInClass,)

Note that you only seem to be able to do this if the base class doesn't inherit directly from object, hence the extra "Base" class above. This is what the failure looks like:
In [10]: class BaseClass(object):pass

In [11]: class MixInClass(object):pass

In [12]: BaseClass.__bases__ += (MixInClass,)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-...> in <module>()
----> 1 BaseClass.__bases__ += (MixInClass,)

TypeError: Cannot create a consistent method resolution
order (MRO) for bases MixInClass, object
Also note that unless both classes call super in their __init__ you will have problems with initialization.
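A minimal sketch of what that cooperative initialization looks like (the *_ready attributes are hypothetical, just to show that both __init__ methods run):
class Base(object):
  def __init__(self, *args, **kwargs):
    super(Base, self).__init__(*args, **kwargs)
    self.base_ready = True

class MixInClass(object):
  def __init__(self, *args, **kwargs):
    super(MixInClass, self).__init__(*args, **kwargs)
    self.mixin_ready = True
It is also possible to mix in a class to an object instance, instead of a class, as mentioned on Stack Overflow, like this: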
obj = BaseClass()
obj.__class__ = type('MixInClass',(BaseClass,MixInClass),{})

Monday, May 5, 2014

Don't use == for comparing secrets

TIL: You shouldn't use == to compare HMACs, or anything sensitive really. Doing so creates a timing side channel that can reveal the secret to an attacker. Instead you need to use a comparison function that takes a constant amount of time for all values, no matter how similar they are to the actual HMAC. The python example given in the article is:

def is_equal(a, b):
  if len(a) != len(b):
    return False

  result = 0
  for x, y in zip(a, b):
    # ord() makes this work on python 2 byte strings; accumulating into
    # result instead of returning early keeps the comparison time constant
    # no matter where the strings differ.
    result |= ord(x) ^ ord(y)
  return result == 0
This function is available in the standard library as of python 3.3 (and was backported to 2.7.7) as:
hmac.compare_digest(a, b)
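Typical usage when verifying a received HMAC looks something like this sketch (key, msg and received_hex are whatever your protocol defines):
import hashlib
import hmac

def VerifyHmac(key, msg, received_hex):
  expected_hex = hmac.new(key, msg, hashlib.sha256).hexdigest()
  # compare_digest does the constant-time comparison for you.
  return hmac.compare_digest(expected_hex, received_hex)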

Tuesday, February 11, 2014

Antennas for free-to-air broadcast TV in the US (i.e. how to watch the olympics for free)

Watching the winter and summer olympics is literally the only reason I have to watch TV these days, and since the content is so locked up in media deals, buying a cable TV package seems to be the only option to see it in the US.  Not so.

NBC (which carries the olympics) is broadcast free to air in HD, along with a few other major channels and a pile of crap that isn't fit for human consumption (like 24hr home shopping).  Getting at this free content is basically a case of picking the right antenna.

1. Check available channels

Put your zipcode into the station direction finder at antennaweb.org to find out what channels are available in your area and where the TV transmitters are.

2. Buy an antenna

For best results you should probably use a high-gain outdoor antenna.  But if you are renting, or want to do it on the cheap, you might want to check out Lifehacker's list of the best indoor antennas. In the SF bay area I can tune 82 channels, including NBC aka KNTV, with the un-amplified Mohu Leaf, which cost me $40.  The Leaf also happens to be the top recommendation from Lifehacker.

I have it mounted indoors about 9ft up the wall and I'm receiving all the channels listed by antennaweb.  I even get KTLN, which antennaweb says is broadcast from 57 miles away!  The antennaweb-recommended antenna for that kind of distance is a large directional model with a pre-amp, but the Mohu Leaf is killing it.  Amazing.

3. Tune channels

Follow your TV's instructions for tuning channels.  You may have to move your antenna around to find the best reception.