James Bond Data Analysis

This Bloomberg article on James Bond is pretty much perfect. These are some of my favourite things: data analysis, data visualization, and cheese.

I like that the authors sat through all Bond films to classify such things as:

  • Time spent flirting
  • Number of double entendres
  • Aggregate time spent in tuxedos, with or (the morning after adventurous nights) without a jacket

I also like that the method of data gathering and delineation of time in/out of tuxedos (or shirts) is precisely specified. Accuracy is important in maintaining significance and respectability.

Also important is the attention to detail in Bond’s romantic escapades, and kudos must go to the authors for including “making eyes” in this category – an easily missed, but crucial and subtle part of any Bond film worth its salt.

Scoldings from M, Q and anyone else alphabetically named are all scientifically categorised. Gadgets are carefully distinguished from plain, unsophisticated weapons.

Poker, baccarat, and a kaleidoscope of cocktails do not escape the authors’ scrutiny, and most important of all is the recognition of Bond’s surname-first self-introductions.

What I like most of all are the deep, rewarding insights the authors have uncovered:

  • Sean Connery spent the most time in top-drawer dress, followed by Daniel Craig
  • Connery was also the most shirtless of all Bonds (possibly a reason to stick to the newer ones)
  • Daniel Craig managed to find something other than romance to do for over 95% of his movies. It must be hard resisting the temptations of being so tempting.
  • The safest Bond to hook up with is Timothy Dalton, the only Bond to have no love interests perish during his movies. Of course, Timothy Dalton may not be the most appealing Bond to choose from, and so perhaps some of the others are worth the risk.
  • If you want an Aston Martin to stay in good shape, give it to me, not James Bond. Please.
  • Brosnan’s Bond was the most prolific with puns and double entendres. Perhaps if he had been more focused on delivering value, his movies would have been more appealing.

Particularly worth attention, I think, is the in-depth analysis of correlation between flirting, love interest and mortality for Bond’s female companions. While most Bonds manage to endanger their love interests to the point of fatality in at least some of their movies, Craig, Connery and Moore all manage to lose two love interests in the same movie at least once.

I love the timelines of each movie, which reveal the typically short timeframe Bond takes to recover from the loss of a loved one before moving on to the next opportunity.




If only all scientific method were so exhaustive. Most recommended, A+++, would buy again.

Read the original article here.

Service health monitoring with Reactive Extensions

When integrating independent services to build larger systems, it’s often important for services to keep track of the status of the other services that they depend on. This matters especially in a microservices approach, where services should expect that their dependencies can be absent at any point in time. Services that cope with dependencies being unavailable make for a less flaky, more resilient, hands-off system.

Services can dynamically change their behaviour as the states of their dependencies change. If a dependency is offline, a service can decide to re-route work, buffer it until processing can resume, or explicitly reject requests. Doing this proactively by reacting to changes in dependencies’ statuses makes this much more fluid.
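As a rough illustration of the buffer-or-reject behaviour described above, here is a minimal sketch in plain Python. It does not use Reactive Extensions; the `BufferingService` class, the `Status` enum and the `max_buffer` limit are all hypothetical names made up for this example, and the status updates are simply pushed in by calling `on_status`.

```python
from collections import deque
from enum import Enum


class Status(Enum):
    ONLINE = "online"
    OFFLINE = "offline"


class BufferingService:
    """Hypothetical service that buffers work while its dependency is offline."""

    def __init__(self, send, max_buffer=3):
        self._send = send              # callable that forwards work to the dependency
        self._status = Status.OFFLINE  # assume the worst until told otherwise
        self._buffer = deque()
        self._max_buffer = max_buffer

    def on_status(self, status):
        """React to a status change; flush buffered work once the dependency is back."""
        self._status = status
        while self._status is Status.ONLINE and self._buffer:
            self._send(self._buffer.popleft())

    def submit(self, item):
        """Return 'sent', 'buffered' or 'rejected' depending on the dependency state."""
        if self._status is Status.ONLINE:
            self._send(item)
            return "sent"
        if len(self._buffer) < self._max_buffer:
            self._buffer.append(item)
            return "buffered"
        return "rejected"
```

The point of the sketch is only that the service changes what `submit` does when it is told about a status change, rather than discovering the outage through a failed call.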

This post will look at a simple way of using Reactive Extensions to keep track of dependency status.

Solving GCHQ’s Christmas nonogram in 0.07 seconds

GCHQ throws down the gauntlet

A while back I found the GCHQ Director’s Christmas card, which came in the form of a nonogram. GCHQ has a history of setting puzzles and even hiring people through them. The WWII codebreakers were recruited through crosswords and other puzzles in the newspaper, as featured in The Imitation Game.

I was new to nonograms, but quickly found out they’re a “paint by numbers” puzzle where hints for rows and columns give you a series of segments (of varying lengths) to colour in. Applying all the clues together logically lets you progressively work out whether each cell is filled in or empty, until you reach the final solution.
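To make the idea of applying a clue logically concrete, here is a small sketch for a single row or column. It is only an illustration of the deduction step, not the solver behind the 0.07 seconds in the title; the function names `placements` and `deduce` are made up for this example. It lists every layout that fits the clue and keeps the cells that have the same value in all of them.

```python
def placements(clue, length):
    """Yield every way to lay out the clue's runs in a line of the given length."""
    if not clue:
        yield [0] * length
        return
    run, rest = clue[0], clue[1:]
    # Space the remaining runs need: their lengths plus one gap before each.
    needed = sum(rest) + len(rest)
    for start in range(length - run - needed + 1):
        head = [0] * start + [1] * run
        if rest:
            head.append(0)  # mandatory gap before the next run
        for tail in placements(rest, length - len(head)):
            yield head + tail


def deduce(clue, known):
    """known holds 1 (filled), 0 (empty) or None (unknown); return the refined line."""
    options = [p for p in placements(clue, len(known))
               if all(k is None or k == c for k, c in zip(known, p))]
    if not options:
        raise ValueError("clue contradicts the known cells")
    # A cell is settled when every surviving layout agrees on it.
    return [col[0] if len(set(col)) == 1 else None for col in zip(*options)]
```

For example, `deduce([3], [None] * 5)` returns `[None, None, 1, None, None]`: a run of three in a line of five must cover the middle cell wherever it sits. Repeating this over rows and columns, feeding each result back in as `known`, is the progressive narrowing described above.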

Typically you then have a badly pixelated picture of something and a sense of accomplishment. With GCHQ’s puzzle, you end up with a QR code that leads you to the next puzzle. So the picture is not very pretty and the sense of accomplishment is short-lived. Here’s what GCHQ’s best Christmas wishes look like:


And I thought I was bad at Christmas cards.

Identifying Market Spoofing with Data Visualizations

I love data and data visualizations. They can give deep insight into problems and behaviours, and they can make you interested in something you previously thought dull.

I came across the article “How to Catch a Spoofer” from Bloomberg, by Matthew Leising, Mira Rojanasakul and Adam Pearce. The article gives a fascinating view into the trading activity on the Chicago futures exchange, and how to identify “spoofing” within that activity. The visualizations take a difficult concept and a large volume of data, and extract genuine, novel insights.

Roomba algorithms and visualization

I once had an interview question asking for an algorithm for a Roomba that ensures it covers every square of a room divided into grid cells, given that the room shape and location of obstacles are unknown. It’s similar to the idea of solving a maze, except that instead of getting to a specific point, you’re trying to visit every point in the room – to clean it!

It’s a pretty common problem, but I hadn’t seen it in the guise of a physical robot before. Running a depth-first search (DFS) covers every piece of floor easily enough, but casting it as a physical device that has to move implies a large cost to popping back up the stack that’s generated during DFS. There’s a lot of backtracking in a DFS-based approach for a Roomba, so it makes for a slower vacuuming job.
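The backtracking cost is easy to see in a small simulation. The sketch below is an assumption-laden toy: the `dfs_clean` function and the grid encoding (`#` for an obstacle, anything else for floor) are made up for this example, and the robot only ever asks whether a neighbouring cell is blocked, which stands in for bumping into things.

```python
def dfs_clean(grid, start):
    """grid is a list of strings, '#' marks an obstacle. Returns (cleaned cells, moves)."""
    rows, cols = len(grid), len(grid[0])
    cleaned = {start}
    moves = 0

    def visit(r, c):
        nonlocal moves
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in cleaned):
                cleaned.add((nr, nc))
                moves += 1  # step forward into the new cell
                visit(nr, nc)
                moves += 1  # physically drive back: the cost of popping the stack

    visit(*start)
    return cleaned, moves
```

Every newly cleaned cell costs one move in and one move back out, so a room with n reachable cells takes 2(n - 1) moves with this plain DFS, roughly twice the number of cells. That factor of two is the overhead a smarter strategy would try to reduce.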

It made me wonder whether there was some better approach than DFS that would be more efficient.

Solving Boggle boards at scale

Princeton’s Algorithms II course includes an assignment on finding Boggle words. Briefly, Boggle is a game where you have a two-dimensional grid of random letters and players try to find as many real words as they can from the board by stringing together neighbouring letters.

This post looks at how tweaking the initial implementation can give a 2x speedup, but picking the right data structure gives a 4,200x speedup.
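This excerpt doesn’t say which data structure makes the difference, so the following is only a sketch of the basic search under my own assumptions: a hypothetical `find_words` function that walks neighbouring letters and uses a set of all word prefixes to abandon dead-end paths early. It ignores real Boggle rules such as the minimum word length and the “Qu” tile.

```python
def find_words(board, dictionary):
    """board is a list of equal-length strings; dictionary is an iterable of words."""
    words = set(dictionary)
    # Every prefix of every word, so dead-end paths can be abandoned early.
    prefixes = {w[:i] for w in words for i in range(1, len(w) + 1)}
    rows, cols = len(board), len(board[0])
    found = set()

    def extend(r, c, path, used):
        path += board[r][c]
        if path not in prefixes:
            return  # no dictionary word starts this way
        if path in words:
            found.add(path)
        used.add((r, c))
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in used:
                    extend(nr, nc, path, used)
        used.discard((r, c))  # free the cell for other paths

    for r in range(rows):
        for c in range(cols):
            extend(r, c, "", set())
    return found
```

Without the prefix check, the search would explore every possible path on the board before consulting the dictionary, which is where most of the wasted work in a naive solver comes from.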
