Casualties from Nuking Tokyo Bay

I’ve never delved deeply into the history of the motivation for dropping the atomic bombs on Hiroshima and Nagasaki, but a recent Quartz article on the topic caught my attention. It argued that because of the catastrophic civilian consequences of the bombings, they were an atrocity (and likely a war crime). Furthermore, this civilian cost was fundamentally needless: the primary purpose of the bombings was to demonstrate that the United States had achieved a decisive strategic advantage, one which doomed the Japanese defense and made immediate, unconditional surrender the strictly dominant strategy. To perform this demonstration, the US selected inefficient targets, accumulating higher civilian costs with weaker signaling than an obvious alternative target: Tokyo Bay.

While I don’t have the historical background to argue meaningfully about the effects of differing signaling approaches on the Japanese Government, I do know that testing the assertion about civilian casualties is rather straightforward: just simulate the bombs using NukeMap.

If we drop Little Boy on Tokyo Bay, what will the detonation look like?

[Figure: NukeMap simulation of Little Boy detonated over Tokyo Bay]

So the fatal effects of the explosion wouldn’t come even remotely close to the densely populated areas around the bay. Interesting, and generally in sync with the claim made in the Quartz article. But this neglects a critically important consideration: fallout.

[Figure: NukeMap fallout pattern for Little Boy over Tokyo Bay]

NukeMap describes the color gradations on the fallout pattern as follows:

Yellow: Fallout contour for 1 rad per hour:
  • Maximum downwind cloud distance: 112 km
  • Maximum width: 10.8 km
  • Approximate area affected: 1,240 km²

Orange: Fallout contour for 10 rads per hour:
  • Maximum downwind cloud distance: 72.4 km
  • Maximum width: 6.53 km
  • Approximate area affected: 564 km²

Red: Fallout contour for 100 rads per hour:
  • Maximum downwind cloud distance: 32.5 km
  • Maximum width: 2.3 km
  • Approximate area affected: 148 km²

Now, this is actually a sizable problem. The population density in this part of Japan is quite high (estimated in 2009 to be in excess of 1,000 people per square kilometer). It’s not clear from NukeMap what kind of exposure to the cloud is likely fatal, but Wikipedia puts the LD50 at around 250 rads. Let’s put that on the map:

[Figure: NukeMap fallout map with the 250-rad contour added]

Now, 250 rads is the red section. Note how it is almost entirely over water. The 100-rad section is approximately 2/3 over land, so let’s estimate casualties based on that: say it constitutes approximately 100 populated square kilometers, each inhabited by about 1,000 people. The simulation assumes a breeze of about 24 km/h, meaning that the cloud disperses its 100-rad doses over the course of \(\frac{32.5\,\text{km}}{24\,\text{km/h}} \approx 80\,\text{min}\).
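To make the arithmetic explicit, here’s the same back-of-the-envelope calculation as R code (the contour area, land fraction, and density are the rough figures from above, not precise measurements):

# Rough exposure estimate from NukeMap's 100 rad/hr contour
contour_area_km2   <- 148     # approximate area of the contour
fraction_over_land <- 2 / 3   # eyeballed from the fallout map
density_per_km2    <- 1000    # approximate population density

exposed_km2    <- contour_area_km2 * fraction_over_land  # ~99 km^2
exposed_people <- exposed_km2 * density_per_km2          # ~99,000 people

# Time for the cloud to cover the contour's downwind extent
traverse_min <- 32.5 / 24 * 60                           # ~81 minutes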

Assuming a uniform population distribution of 1,000 people per square kilometer and a 24 km/h north-easterly wind, the cloud could expose roughly 100,000 people to radiation, though very few would experience lethal doses. Accordingly, this component of the Quartz piece checks out: nuking Tokyo Bay would have provided an awesome display of power, visible from the tactical intimacy of Japan’s own capital, with orders of magnitude fewer civilian casualties.

Hadleyverse: A Pointless R Package

I recently wrote a silly R package as a practical joke.  It’s called “Hadleyverse.” It just attaches Hadley Wickham’s packages.

Installation

# install.packages("devtools")
library("devtools")
install_github("aaboyles/hadleyverse")

Use

library("hadleyverse") 
# All of the Hadleyverse is now available in your environment
# No need to call library("plyr"), etc!

detach("package:hadleyverse")
# All of the Hadleyverse has been removed from your environment again

What Happens

When you install an R package, R checks the DESCRIPTION file for dependencies. If you have unmet dependencies, R tries to install them from CRAN. Then, whenever you load the package, R makes those dependencies available. This package just “depends on” everything Hadley Wickham has published to CRAN, despite the fact that it doesn’t do anything itself. Here are the packages it loads (a sketch of the DESCRIPTION mechanism follows the list):

  1. plyr
  2. ggplot2
  3. dplyr
  4. tidyr
  5. readr
  6. haven
  7. lubridate
  8. stringr
  9. readxl
  10. devtools
  11. xml2
  12. testthat
  13. assertthat

Detaching this package automatically detaches all of its dependencies. So after you detach the Hadleyverse, you’ll have to attach the packages you need again.

How to build a Geospatial Event Database

It occurred to me recently that it would be useful and interesting to be able to perform arbitrary geospatial querying on event data. The typical use case for querying event data geospatially is to filter on the Country (which has been a field in every event dataset with which I’ve worked), but this isn’t really “geospatial”. A true geospatial querying system enables you to specify arbitrary geographic constraints.

So, for example, let’s say I’m working with Phoenix and I’m interested in events that occurred within 50 miles of Atlanta, GA.  The easy way to do this is to filter events on LocationName.

#library("devtools")
#install_github("ahalterman/phoxy") #In case you don't have phoxy yet

library("dplyr")
library("phoxy")

download_phoenix("Phoenix")
events <- ingest_phoenix("Phoenix")

events %>% filter(LocationName=="Atlanta")

(Adapting the code for ICEWS follows a similar, albeit more memory-intensive, process. That is left as an exercise for the reader.)

The problem with this approach is that it only gives you events that the geocoder identified as being in Atlanta.  Luckily we have latitudes and longitudes, which offer a bit more precision. Latitudes and longitudes can be mapped to approximate distances directly by hand (although this approach gets increasingly inaccurate for longitudes as you move away from the equator). To convert between degrees and miles, we can do a little dimensional analysis. There are 1.15 statute miles in a nautical mile, and 60 nautical miles (one per minute of arc) in an equatorial degree, so,

\(\frac{50\ \text{miles}}{1} \times \frac{1\ \text{naut. mi}}{1.15\ \text{miles}} \times \frac{1\ \text{degree}}{60\ \text{naut. mi}} \approx 0.725\ \text{degrees}\)
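
As a quick sanity check, the same conversion as a one-liner in R:

50 / 1.15 / 60  # miles -> nautical miles -> degrees; = 0.7246377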

50 miles is approximately 0.725 equatorial degrees.  Now we can write a filter to check whether the latitude and longitude are within 0.725 degrees of Atlanta’s latitude and longitude (33.755, -84.39).

events %>% filter(abs(Lat-33.755)<.725, abs(Lon+84.39)<.725)

This gives you what’s called a bounding box.  If you plotted it on a map, it would look a little bit like this:

[Figure: Atlanta’s 50-mile bounding box]

The problem is that the bounding box contains more space than you’re actually interested in.  “Within 50 miles” implies a circle with its center at the center of Atlanta and a radius of 50 miles.  If you actually ran this query, you’d get results from Gainesville, GA (the northeast corner of the box), even though Gainesville is farther than 50 miles from Atlanta.  At this point, we could write a SQL function to calculate the Euclidean distance between an arbitrary (Lat, Lon) and Atlanta.  But we’re not going to do that, because A) a pretty good solution won’t assume the Earth is a flat, two-dimensional plane, or even a sphere, but an ellipsoid, B) there are a ton of mistakes we could make on the way to getting a pretty good solution, and C) this is a solved problem.

In particular, the PostGIS extension to PostgreSQL represents the best-in-class collection of algorithms for solving geospatial querying problems of this variety.  So let’s set one up!  We could do it locally, but what’s the fun in that?  Plus, if you want to build an app to show off to the world, you really don’t want to host it off your laptop.  So let’s do it on Amazon’s Relational Database Service.

The first step is to set up your Amazon AWS account. Next, create a new database using Amazon RDS.  Select “PostgreSQL.”  At Step 2, if you want to do this with your AWS Free Tier account (which I enthusiastically recommend), select “No”.  Step 3 is a little more complicated, but should be manageable for the adventurous.  Set the DB Engine version to the latest available (presently 9.4.1), DB Instance Class to “db.t2.micro” (necessary for Free tier), Multi AZ deployment to “No” (again, necessary for Free Tier), and then enter an identifier, username, and password.

[Screenshot: RDS instance specifications and settings]

You’re going to need to remember your credentials.  Be sure to copy those down somewhere.  The defaults for Step 4 are all fine, but you’ll need to give your database a name. Copy that down too.  At this point, you can click the big blue button at the bottom of the screen and watch the computer spin its loading spinner for a few minutes.  When it stops, your database will be provisioned and launched.  Just copy down your endpoint where you wrote down all your other credentials, and you’ll be set to go.

Now that we have launched our database, we need to enable the PostGIS extensions.  We can actually accomplish this inside of R.  To start, let’s create a file I’m calling “creds.R”. The reason I propose you save this file (as opposed to merely running it) is that we’ll use it in several places. Infosec wisdom here: the less often you type in your security credentials while you’re developing, the fewer files you’ll end up having to keep secret when you share your code or deploy your app. This R script will only do two things: first, load the RPostgreSQL package, and second, create a connection to your database in an object creatively named “con”. Copy and modify as appropriate:

library("RPostgreSQL")

con <- dbConnect(dbDriver("PostgreSQL"),
  host="your database's host",
  port="5432",
  user="your database's username",
  password="your database's password",
  dbname="your database's name")

Got that modified? Great. Save it to your R working directory. Now we can connect to the database with R.  From there, we can submit the (astonishingly few) queries needed to get PostGIS running:

source("creds.R")

dbGetQuery(con,"CREATE EXTENSION postgis;")
dbGetQuery(con,"CREATE EXTENSION postgis_topology;")
dbGetQuery(con,"CREATE EXTENSION fuzzystrmatch;")
dbGetQuery(con,"CREATE EXTENSION postgis_tiger_geocoder;")

The database is now set up and configured!  Now, to write the data into your database…

dbWriteTable(con,"phoenix",data.frame(events))

Notice how we wrap events in a data.frame. This is necessitated by a slight eccentricity of the RPostgreSQL package. Anyway, that will run for a little while, but once it returns, you’ll have all of your data in the database.  To see some of it:

dbGetQuery(con, "SELECT * FROM phoenix LIMIT 20;")

To actually use this data, we’re going to need to learn a little bit of PostGIS.  A full PostGIS tutorial is well outside my intent here, but the internet is full of helpful documentation and interesting how-tos.  To measure whether a thing is within a given distance of another thing, there’s a PostGIS function called “ST_DWithin”.  I say “thing” because there are generally four types of things in geospatial data: points (generally useful), linestrings (good for roads), polygons (good for geopolitical areas), and multi* (good for anything fragmented, like the borders of the US, which includes non-contiguous land areas).  Events are simple: every event is a point.  So we’ll also need a little PostGIS function to make a point object in the database. It’s helpfully called “ST_MakePoint”.  Here’s how we piece it all together:

# Note: ST_MakePoint takes (x, y), i.e. (longitude, latitude)
dbGetQuery(con, 'SELECT * FROM phoenix WHERE ST_DWithin(ST_MakePoint(-84.39, 33.755), ST_MakePoint("Lon", "Lat"), 0.725);')
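
If you only want the number of matching events rather than the rows themselves, the same predicate works with a COUNT:

dbGetQuery(con, 'SELECT COUNT(*) FROM phoenix WHERE ST_DWithin(ST_MakePoint(-84.39, 33.755), ST_MakePoint("Lon", "Lat"), 0.725);')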

And there you have it!  Phoenix presently contains approximately 400 events that happened within 50 miles of Atlanta. If you’ve never worked with geospatial querying before, you probably aren’t sold yet.  After all, it was sort of a long way around for this payoff.  This type of treatment for political event data isn’t unprecedented: Hammond and Weidmann put GDELT into a PostGIS database in order to tease out a capital bias in the geocoder.  (They didn’t control for centroid bias, but that’s about two papers’ worth of criticism and methodology.) If you want to really feel the power, I’m going to need a little time to dream up an interface to demonstrate it.  In the meanwhile, take a look at the PostGIS documentation and let me know if you have any interesting applications in mind.

How to Set up a Shiny Server

I recently embarked upon the strange, complicated journey of building a personal Shiny server.  This is how I did it:

Step 1: Amazon Web Services Account

Sign up for one.  AWS has a generous free tier: you get enough computing time on EC2 to run a small server continuously every month for a year.  After your year is up, they’ll start charging you for the server, but it isn’t expensive. (For comparison, I’m running a t2.micro, which costs about $9.50/month to run full-time.)

Step 2: Provision an EC2 Instance

Once you’ve set up your AWS account (complete with credit card information, uncomfortable as that may be), head over to the EC2 Launch Wizard.  Select the “Ubuntu Server 14.04 LTS (HVM), SSD Volume Type”.  Most of the default configuration settings will be good enough, so just click “Next” through steps 3, 4, and 5.  Step 6 (“Configure Security Groups”) is rather important, so let’s stop there a moment.  It will allow SSH connections by default, but this isn’t permissive enough: we need to be able to connect using HTTP (so people who visit the server on the web will be able to see things).  Click “Add Rule” and select “HTTP” from the dropdown on the left.  If you want to install RStudio, do this one more time, but leave the left dropdown alone, put 8787 in the “Port Range” box, and select “Anywhere” in the dropdown on the right.  Here’s what this configuration should look like:

[Screenshot: security group rules for SSH, HTTP, and port 8787]

Click “Review and Launch.”  Now, when you click “Launch” on the next screen, it doesn’t immediately launch your new virtual machine.  Instead, it brings up a VERY IMPORTANT dialogue.  Here’s what it looks like:

[Screenshot: the key pair dialogue]

Be absolutely certain that you create a new key pair, name it, and download it somewhere you’ll be able to find it.  Once you’ve done that, click “Launch Instances” and your instance will start booting up.  Go to the Instances screen to see your server.

Step 3: SSH to your new Instance

From the Instances screen, you can see a bunch of information about the server.  The only thing we’re interested in is the Public IP, which should be in one of the rightmost columns.  Copy that IP address so you can paste it into the command below.

Note: If you’re a Windows user, the typical approach to doing this involves PuTTY.

If you’re on an Apple or *nix system, just open a terminal emulator, cd to the directory where you downloaded that secret key, and enter this:

ssh -i RStudio.pem ubuntu@YOUR.PUBLIC.IP

…where YOUR.PUBLIC.IP is the IP address you copied from the beginning of this step. This should give you a welcome message and a command prompt controlling the server.

If you have any problems, you can right-click on your instance in the instances menu and select “Connect”, which will give you much more verbose instructions about how to do this.  Plus a handy Java Applet if you are overcome by despair!

Step 4: Install R

From your SSH connection, let’s install R.  Ubuntu provides a version of R, but they don’t keep it up to date, so we want to add a CRAN repository to our sources list.

# sudo doesn't apply to the >> redirection, so pipe through tee instead
echo 'deb http://cran.rstudio.com/bin/linux/ubuntu trusty/' | sudo tee -a /etc/apt/sources.list

Now, we need to add the security key for R, update the software list, and install R:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
sudo apt-get update
sudo apt-get -y install r-base

Step 5 (optional): Install RStudio

You don’t need RStudio to have or run Shiny Server, but it’s incredibly convenient to have both running on the same server.  That way, you have an R console that can edit the files in-place, and changes can be tested as quickly as you can change tabs and refresh.

RStudio maintains the step-by-step guide to installing RStudio, but here’s the short list of commands you need to run:

sudo apt-get -y install gdebi-core libapparmor1
wget http://download2.rstudio.org/rstudio-server-0.98.1103-amd64.deb
sudo gdebi rstudio-server-0.98.1103-amd64.deb

This gets RStudio on your system, but it doesn’t permit logins by default.  That’s because it uses a password for authentication, but the only eligible user on the system (ubuntu) is only configured to log in by private key (remember that RStudio.pem file?).  To create a password for yourself:

sudo passwd ubuntu
> Enter new UNIX password: 
> Retype new UNIX password: 
> passwd: password updated successfully

Remember that password: whenever you go to your RStudio instance, you’ll need it to log in. Which, by the way, you are now able to do.  Open your web browser, surf to http://YOUR.PUBLIC.IP:8787/, and enter username “ubuntu” and the password you just set.  You should now be looking at the glorious RStudio interface, right in your browser!  Take a moment to rejoice.

Step 6: Install Shiny Server

Once again, RStudio maintains the step-by-step guide to installing Shiny Server, but here’s the shortlist of requisite commands:

sudo su - -c "R -e \"install.packages(c('shiny','rmarkdown'), repos='http://cran.rstudio.com/')\""
sudo apt-get -y install gdebi-core
wget http://download3.rstudio.org/ubuntu-12.04/x86_64/shiny-server-1.3.0.403-amd64.deb
sudo gdebi shiny-server-1.3.0.403-amd64.deb

Step 7: Configure Shiny Server

Check out the Administrator’s guide for much more detailed instructions.  Here are the only two config changes I made:

Permissions

When it’s installed, Shiny Server creates a new user on the system.  This is all well and good, but it means you’re going to get weird permissions errors that you can only fix by prepending ‘sudo’ to most of your commands, which is generally bad practice.  Furthermore, R packages are local to users, meaning that any packages you install using RStudio (if you installed RStudio Server) won’t be accessible to your Shiny applications.  To fix all this, run these commands:

sudo sed -i 's/run_as shiny;/run_as ubuntu;/g' /etc/shiny-server/shiny-server.conf
sudo chown -R ubuntu /srv/shiny-server/
sudo chgrp -R ubuntu /srv/shiny-server/

Port

When you type in a URL, it doesn’t usually contain a port number.  By default, HTTP uses port 80, so http://aaboyles.com is (usually) equivalent to http://aaboyles.com:80.  Shiny Server, however, uses port 3838 by default.  If you want to give out an address without a port for the Shiny apps you host, you should change the port number in the Shiny configuration file.  To do that, run these commands:

sudo sed -i 's/3838/80/g' /etc/shiny-server/shiny-server.conf
sudo service shiny-server restart

You should be able to visit http://YOUR.PUBLIC.IP and see the Shiny Server landing page, complete with two demo apps.

Step 8: Load up your Shiny Apps

The easiest way to get Shiny apps is usually to clone them with git, so let’s install git, and cd into the server’s served directory:

sudo apt-get -y install git
cd /srv/shiny-server/

From here, you only need to git clone your Shiny apps, and they’ll be publicly available at http://YOUR.PUBLIC.IP/REPO.

Rejoice! Bask in the glory of your greatness! You have conquered Shiny Server!