Sunday, February 28, 2010

Testing JSON django fixtures

My JSON testing fixture for django was bad, and django gave a really unhelpful error:
ValueError: No JSON object could be decoded

A line number would have been nice, jeez. To find out what the actual problem was, I used the json module in ipython:

import json
a=open("testing.json",'r').read()
json.loads(a)

Which will either work, or give you a line number for the problem :)
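If you would rather have this as a throwaway script than an ipython session, a minimal sketch looks like this (the script name and argument handling are just my own convention):

#!/usr/bin/env python
# validate_fixture.py - report where a JSON fixture is broken (minimal sketch)
import json
import sys

try:
    json.load(open(sys.argv[1]))
    print("JSON is valid")
except ValueError as e:
    # the exception message normally includes the offending line and column
    print("Bad JSON: %s" % e)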

Saturday, February 27, 2010

Accessing Australian public transport data and Google's General Transit Feed Specification (GTFS)

I've used some pretty ordinary public transport websites. To do something about the problem, I recently embarked on a mission to create a google maps overlay. The idea was to show a particular public transport provider how their information delivery could be improved. In the process I found out lots of interesting stuff about public transport information, so I thought I'd record that first.

The San Francisco Bay Area Rapid Transit (BART) website is probably the best example of delivering useful information to commuters. They have a public, open, real-time API for developers that has spawned some great tools such as iPhone apps that display real-time arrival times for your favourite stations, provide route-planning and much more.

BART also publishes a General Transit Feed Specification (GTFS) zip file. GTFS is a format developed by Google that lets google maps provide amazing public transport functionality. Whenever you plan a route with google maps you can click on 'by public transit' and it will give you a selection of routes, times, and turn-by-turn directions for any walking required. See this random example trip I made in San Francisco.

All this is well and good for the US, but what about Australia? I started looking around, and I was amazed by the TransPerth website.


Not only have they published a GTFS zip file so you can view stops and do fantastic trip planning via google maps, they also have a mobile website with real-time arrival times for bus stops and train stations, live timetables, and a 'stops near you' search. The trip planner on their website is also the best I have ever used, and rivals that of google maps (they may just use a maps api...). Not surprisingly there is a Perth iPhone app, which seems to be much more feature-complete than the apps for the other capital cities, thanks to the data provided by TransPerth - infinitely preferable to screen-scraping. Sadly, of all the cities currently providing GTFS files, Perth is the only Australian participant.

So, step 1 in giving commuters access to better public transport information is to publish a GTFS file. This at least means you have a database of all your stop coordinates, route names, timetable info etc. Updating the GTFS file is going to be easiest if you just use the GTFS format as your schema - it looks fairly sensible. I imagine there are lots of operators with giant excel spreadsheets holding route information, so it is probably already a step up. Next, take TransPerth as an example - they are kicking some goals.
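To get a feel for what is inside a GTFS file, here is a rough python sketch that lists stop names and coordinates. The stops.txt file and its column names come from the GTFS spec; the zip filename is just a placeholder for whatever the operator publishes:

# List stop names and coordinates from a GTFS feed (rough sketch)
import csv
import zipfile

feed = zipfile.ZipFile('google_transit.zip')  # e.g. the TransPerth GTFS file
for stop in csv.DictReader(feed.open('stops.txt')):
    print("%s: %s,%s" % (stop['stop_name'], stop['stop_lat'], stop['stop_lon']))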

PS. I was thinking - a cheap way to do GPS positioning for buses would be to stick an android phone in each one and hook it up to google latitude.

Using ssh forced-command with rsync for backups

I want to use ssh forced command to limit a backup user to just running rsync. The idea is to allow backups to be deposited without granting full shell access. The trickiest part of the problem is figuring out what command rsync will run on the server. The rsync man page gives a clue with this cryptic statement:
--server and --sender are used internally by rsync, and should never be typed by a user under normal circumstances. Some awareness of these options may be needed in certain scenarios, such as when setting up a login that can only run an rsync command. For instance, the support directory of the rsync distribution has an example script named rrsync (for restricted rsync) that can be used with a restricted ssh login.

As an aside, rrsync is a perl script that parses SSH_ORIGINAL_COMMAND and provides a way to limit rsync to certain directories. This is not a bad idea, but I always want to run the same command, so it is overkill.

I found a useful hint about the --server option, which solved the mystery of what command rsync runs on the server. Just run your regular command with '-v -v -n', and rsync will tell you. Neat!

rsync -rtz -v -v -n /home/mw/src backup@host:/home/backup/
opening connection using: ssh -l backup host rsync --server -vvntrze.iLs . /home/backup/

The actual command I will run uses one less 'v' and ditches the dry-run 'n'. So now my SSH forced command in ~/.ssh/authorized_keys looks like this:

command="rsync --server -vtrze.iLs . /home/backup/",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding ssh-dss AAAA....

Chuck it in a cron, and we are done.

Thursday, February 25, 2010

HOWTO set a proxy for apt updates with a local mirror and problems with http_proxy overriding apt settings

This has annoyed me for some time now. The http_proxy environment variable in Ubuntu overrides the APT proxy settings, despite this apparently being fixed in Debian in Jan 2009. The apt configuration is more specific, and should win out over the environment variable. I'll explain why this is a problem.

Here is how it is supposed to work.

The simplest case is to set a proxy for apt by editing "/etc/apt/apt.conf" and adding this line:

Acquire::http::Proxy "http://proxy.mydom:3128";

The problems start if you have a local mirror - I do this to save bandwidth due to a large number of ubuntu installs on the network. For this config, remove any proxy lines from /etc/apt/apt.conf and create /etc/apt/apt.conf.d/30proxy:

Acquire
{
  http {
    Proxy "http://proxy.example.com:8080/";
    Proxy::ubuntu.mydom.com "DIRECT";
  }
}

With the http_proxy environment variable unset this works fine, until you go to install something like flashplugin-nonfree, which downloads a tarball from adobe. Apt completely ignores your proxy configuration and tries to download it directly:

Connecting to archive.canonical.com|91.189.88.33|:80

Which obviously doesn't work. You can set the http_proxy environment variable, but then apt breaks in a different way: everything gets sent through the proxy, including requests to the local mirror (ubuntu.mydom.com) in /etc/apt/sources.list, which can't (and shouldn't) go through the proxy. Keeping the mirror traffic off the proxy is what the "DIRECT" line above is supposed to do.

The only way to actually make this work is described by Troy. You need to set the no_proxy environment variable:

export no_proxy="ubuntu.mydom.com"

Then make sure it actually gets kept by sudo. First get the list of variables sudo is currently preserving (look at those under "Environment variables to preserve"):

sudo sudo -V

Change /etc/sudoers with 'sudo visudo' and add:

Defaults env_keep="no_proxy http_proxy https_proxy ftp_proxy XAUTHORIZATION XAUTHORITY TZ PS2 PS1 PATH MAIL LS_COLORS KRB5CCNAME HOSTNAME HOME DISPLAY COLORS"

Check that it got kept:

sudo printenv | grep no_proxy

Chuck no_proxy and http_proxy in ~/.bashrc and you are good to go. Simple, right?

Wednesday, February 24, 2010

Copying a compressed disk image across the network using netcat and dd

Here is a handy command to copy a disk image (such as a VM) across the network, compressed. We also take a SHA1 hash to make sure it copied correctly. The idea is to read and write the data only once, to make it as quick as possible.

On the box you are copying the disk from:

mkfifo /tmp/disk.dat; sha1sum /tmp/disk.dat & dd bs=256k if=mydisk.dd | tee /tmp/disk.dat | gzip -1 | nc -q 2 10.1.1.1 8181

On the box you are copying the disk to:

nc -l 8181 | gunzip | tee image.dd | sha1sum | tee image.dd.sha1

The quick and dirty version with no hash checking is below. Note that if your source is OS X you want -w instead of -q in the netcat command. I've used this with two macs connected via thunderbolt/firewire, one in target disk mode (TDM), and one sending the image to a linux box:

dd bs=256k if=mydisk.dd | gzip -1 | nc -q 2 10.1.1.1 8181
nc -l 8181 | gunzip > image.dd

Tuesday, February 23, 2010

Django user profiles

While the Django doco is pretty good, it is a bit light-on for user profiles. User profiles are for when you want to extend the information stored per user on top of the Django defaults (first name, last name, email etc.). There is a good blog post that fills in the gaps, although be sure to read the comments, because there is a gotcha and a workaround. Basically this is what you want:

from django.contrib import admin
from django.contrib.auth.models import User
from django.contrib.auth.admin import UserAdmin as RealUserAdmin
from site.app.models import UserProfile

class UserProfileInline(admin.StackedInline):
    model = UserProfile

class UserAdmin(RealUserAdmin):
    inlines = [UserProfileInline]

admin.site.unregister(User)
admin.site.register(User, UserAdmin)
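For completeness, the UserProfile model itself is just a normal model in your app's models.py with a one-to-one link back to User. A minimal sketch (the extra fields are only placeholders):

# models.py - minimal example profile; the extra fields are placeholders
from django.db import models
from django.contrib.auth.models import User

class UserProfile(models.Model):
    user = models.OneToOneField(User)
    phone = models.CharField(max_length=32, blank=True)
    job_title = models.CharField(max_length=100, blank=True)

    def __unicode__(self):
        return self.user.username

If you also set AUTH_PROFILE_MODULE = 'app.UserProfile' in settings.py (substituting your real app name), user.get_profile() will hand back the profile instance.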

Time offsets in python

I always forget how to do time offsets (i.e. current time - 20 seconds) in python. Here's how:

from datetime import datetime,timedelta
datetime.now() - timedelta(seconds=20)
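The same pattern works with other units and in both directions, for example:

from datetime import datetime, timedelta

datetime.now() + timedelta(hours=2)             # two hours from now
datetime.now() - timedelta(days=1, minutes=30)  # a day and 30 minutes ago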

Sunday, February 21, 2010

Installing django and postgres on ubuntu

To install django with postgres on ubuntu:

sudo apt-get install python-django postgresql python-psycopg2

django-admin.py startproject mysite

Edit settings.py:

DATABASE_ENGINE = 'postgresql_psycopg2'
DATABASE_NAME = 'blahdb'
DATABASE_USER = 'blah'
DATABASE_PASSWORD = 'blah'
DATABASE_HOST = 'localhost'
DATABASE_PORT = ''

Use psql to create the user and database, granting all privs on the database. If you want to use django testing, your user also needs to be able to create a database. Use this syntax:

alter user django createdb;
\du django


Then syncdb and startapp.
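If syncdb complains about the connection, a quick way to sanity-check the credentials outside django is to connect with psycopg2 directly (the names below are the example values from settings.py above):

# Quick connection sanity check using the example settings
import psycopg2

conn = psycopg2.connect(host='localhost', database='blahdb',
                        user='blah', password='blah')
cur = conn.cursor()
cur.execute('SELECT version()')
print(cur.fetchone()[0])
conn.close()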

Thursday, February 4, 2010

dpkg basics - listing installed packages etc.

Here are some basic dpkg operations that come in handy.

List all installed packages:
dpkg -l

List the files that a package installs:
dpkg -L [packagename]

Find out which package a particular file was installed by:
dpkg --search [filename]

Tuesday, February 2, 2010

HOWTO allow multiple users write access to a directory without changing umask

I have often run into the problem of giving multiple users write access to a code repo. The main problem is what permissions are set on files which are added in new commits. The default umask is 022, so you get directories as 755 and files as 644, which obviously doesn't work.

The solution I have used in the past is to change the umask in /etc/profile and /etc/login.defs to 002. You have to do both, otherwise files added via ssh and other means don't get the right mask. The disadvantage is that now all files get created as 775/664, when you only really need it for one directory. There is a better way: enter filesystem ACLs.

First, change your /etc/fstab to include the 'acl' option for the mount point where your repo resides:

/dev/sda1 / ext3 defaults,acl 0 0

Do some of the regular prep to make sure your files are owned correctly, and directories have the setgid bit set so new files inherit the group:

chown -R user:group /code
chmod -R g+w /code
find /code -type d -exec chmod g+s {} \;

Use setfacl to set the default acls for new files and directories:

setfacl -R -m d:u::rwx,d:g::rwx,d:o:r-x /code

And check the result with 'getfacl'. Also when you use 'ls', you should see a '+' at the end of the usual permissions string that indicates there are more acls:

drwxrwsr-x+

Possibly the stupidest IT security comment I have ever read

From SANS news bites:
TOP OF THE NEWS
--High Stakes in Covert Cyber War
(January 26, 2010)
Christian Science Monitor Editor John Yemma points out that the recently disclosed long term cyber attacks against US oil companies could result in "lost jobs and higher energy prices." The attackers infiltrated the companies' networks and remained inside, quietly stealing valuable bid data, which could allow them to make bids on potentially valuable oil and gas tracts without having to invest the considerable research funds spent by the targeted companies. Evidence suggests that the attacks originated in China.
http://www.csmonitor.com/Commentary/editors-blog/2010/0126/Why-the-China-virus-hack-at-US-energy-companies-is-worrisome
(Northcutt): One sensible approach is pretty simple. We make people stand in long lines to clear customs, let's do the same thing for packets. Now before you flame me for being an idiot, I am not suggesting all packets; let's start with SMTP. If a mail message comes from a known site or country that is a major source of malicious traffic, or has a link back to such a place, force it through a series of gateways. Who pays for this? The entity that wants to deal with the US. We can call it a packet visa. Counterpoint 1: "It will never work because there are a million pathways between here and there." Ah, very true, but there are a finite number of targets, US Government including DoD, the industrial defense contractors, Fortune 500 companies, critical infrastructure, and resource brokers such as oil companies. It is the old 80/20 rule. I am betting a guy like Tom Liston can write the code in an afternoon, though it will take some DHS contractor sixty people to maintain and improve it.]


Northcutt, wtf? Does having long lines at Customs actually make your border more secure, or just slower? Presumably the security is in the checking that happens when you get to the counter, or beforehand when you book the flight. How does having a line make you more secure?

So what you would like to do is purposefully implement a DOS on SMTP? If you are so sure the sources are malicious, why not just block them instead of delivering the mail slowly? If you aren't sure enough to block them you are probably DOSing legitimate email. And what difference does it make to the attacker if the email is delivered slowly? The attack is still delivered.

I could go on, but I think this definitely wins the prize for stupidest IT security comment. I'll let you know when I read something worse.