Bladderworts – nature’s fastest plant

I collect outdoor UK carnivorous plants. They are interesting from an evolutionary perspective because they must have evolved after insects (which themselves evolved after plants). In my collection, I have several terrestrial Bladderworts (Utricularia).


These plants have one of the most sophisticated mechanisms in nature – a super-fast, sensitive trap. The bladders are vacuum-driven and are therefore under negative pressure in relation to the environment. When an insect brushes against the hairs connected to the trap door, the door opens and the insect is sucked into the bladder along with the surrounding water. Once the bladder is full of water, the trap door closes and the insect is trapped inside. The process takes about 10 thousandths of a second.

Image: Jakob Sturm’s “Deutschlands Flora in Abbildungen”, Stuttgart (1796) [wikipedia]

Top Ten (9th) finish in Kaggle Avito

The goal of the Avito competition was to predict whether 2 adverts were duplicates or not. Think of Avito as a Russian Ebay where users post their items for sale. The problem is that over-zealous users may post their advert several times, or even hold several user accounts and post the same advert as different users.

We were provided with the whole advert: Title, Description, Location, Category, Price and all the associated images. The adverts had either been manually labelled as duplicates or labelled by computer-generated methods. This made the problem slightly more complex, because the labelling method itself added noise.

The 2 main problems with this task were the number of images (over 10 million) and the fact that all the text is in Russian (not a problem if you’re Russian; by the way, a Russian did actually win the competition!). I am an image-phobe. I have a rubbish GPU (graphics processing unit) that is the car equivalent of the Robin Reliant from Only Fools and Horses. I decided to just use image hashes, computed with the Python Imaging Library (PIL), and then calculated the Hamming distance between all the images for the 2 sets of adverts. The data was aggregated in R to retrieve the min, max, mean, standard deviation etc. of the image similarities.
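To make the hashing idea concrete, here is a minimal Python sketch. It is not the code I ran in the competition: the choice of an "average hash", the function names and the tiny hand-written pixel grids are all assumptions for illustration. In practice each image would first be shrunk to a small greyscale grid with an imaging library before hashing.

```python
from itertools import product
from statistics import mean, pstdev


def average_hash(pixels):
    """Hash a greyscale grid: 1 where a pixel is brighter than the grid mean."""
    flat = [p for row in pixels for p in row]
    threshold = sum(flat) / len(flat)
    return tuple(1 if p > threshold else 0 for p in flat)


def hamming(h1, h2):
    """Count the positions at which two equal-length hashes differ."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(h1, h2))


def image_distance_features(hashes_a, hashes_b):
    """Aggregate every pairwise Hamming distance between two adverts' images."""
    distances = [hamming(a, b) for a, b in product(hashes_a, hashes_b)]
    if not distances:
        return None  # at least one advert has no images
    return {
        "min": min(distances),
        "max": max(distances),
        "mean": mean(distances),
        "sd": pstdev(distances),  # population standard deviation
    }


# Hypothetical 2x2 "images" standing in for resized greyscale pictures.
advert_a = [average_hash([[0, 255], [255, 0]])]
advert_b = [average_hash([[0, 255], [255, 0]]),
            average_hash([[0, 255], [0, 255]])]
features = image_distance_features(advert_a, advert_b)
```

A small minimum distance suggests that at least one image is shared between the two adverts, which is why the aggregates are more useful than any single pairwise distance.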

For the text, I removed Russian stop-words (common words like “the”, “and”, “in”) using the R package tm and calculated Cosine Similarity using the R package stringdist. I also used a pretrained Russian Word2vec model to assess Russian synonyms between the adverts.
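As an illustration of the text features, here is a rough Python sketch of stop-word removal followed by cosine similarity. It is not a port of my R code: the stop-word list is a tiny made-up sample rather than the one in tm, the n-grams are built from whole words rather than from whatever stringdist does internally, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list from tm).
STOP_WORDS = {"и", "в", "на", "с", "для"}


def tokens(text):
    """Lower-case the text, split it into words and drop stop-words."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def ngram_counts(words, n_max=2):
    """Count all word n-grams for n = 1..n_max."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


def cosine_similarity(text_a, text_b, n_max=2):
    """Cosine of the angle between the two n-gram count vectors."""
    a = ngram_counts(tokens(text_a), n_max)
    b = ngram_counts(tokens(text_b), n_max)
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # one of the texts is empty after stop-word removal
    return dot / (norm_a * norm_b)


# Same two words in a different order: unigrams match, bigrams do not.
score = cosine_similarity("продам велосипед", "велосипед продам")
```

Including 2-grams makes the score sensitive to word order, so two titles with the same words shuffled score lower than two identical titles.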

I created a total of 100 features which, when run through DataRobot, achieved a single XGB model score of 0.93, which took me to the top of the leaderboard (it didn’t last long).


The features I created fell into the following categories:

## Length, first word, last word, number of shared words in Title and Description and sorted Title and sorted Description

## Cosine similarity of 1- and 2-grams in Title, Description and JSON attributes

## How many items have the same images_array?

## Same region? Same location? How far apart are the locations?

## Popular price? Difference in price between adverts and difference between the prices and the median price for that category

## Differences in images at 16 pixels and 32 pixels

## How many synonyms do they share?
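Two of the categories above, location distance and price differences, are simple enough to sketch in Python. This is an illustration under assumptions rather than my actual feature code: it assumes each advert comes with a latitude and longitude, it uses the haversine great-circle formula for distance, and the function and key names are made up.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def price_features(price_a, price_b, category_prices):
    """Compare two advert prices with each other and with the category median."""
    category_median = median(category_prices)
    return {
        "abs_diff": abs(price_a - price_b),
        "a_vs_median": price_a - category_median,
        "b_vs_median": price_b - category_median,
    }


# Hypothetical values: two adverts priced 100 and 150 in a small category.
prices = price_features(100, 150, [50, 100, 200])
same_place = haversine_km(55.75, 37.62, 55.75, 37.62)
```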

These 100 features could get a score of about 0.933 on the Public Leaderboard.

A week before the competition ended, team DataMinders asked me and my teammate to join up. They were ahead of us on the leaderboard, but by merging teams we managed to give them some uplift in their score, and we remained in the top 10 for the rest of the week.

It was a really good munging competition, and my thanks go to NxGTR, Oleksii Renov, Inversion, DataGeek and David Shinn. Our team spanned 5 international time zones, so we did pretty well.

Parasitic bees are not cool!

A few years ago, I was extremely disappointed to find out that there are parasitic bees. Every year, I had Solitary Mining Bees returning to the garden and I used to watch them diligently dig their holes. They would spend each day flying to and from their holes with bags of coloured pollen on their legs. One day, I saw a red insect also going in and out of the holes, and I retrieved the computer to look it up. Unfortunately, the insect was a parasitic female Sweat Bee – they do not collect pollen, and they lay their eggs in the nests of other bees (their larvae eat the larvae of the other bee!).


To make the story worse, the following day I found an injured black and white Bumblebee. Once again, I retrieved the computer and found out that this too was a parasitic species of bee. I left it to its natural fate. These types of bees are called Cuckoo bees after their nest-stealing nature.




So, you’ve hacked into Nigel Farage’s Twitter account

Hypothetically speaking, of course, if you managed to hack into Nigel Farage’s Twitter account, what would you have written? There is a constraint though – you have covered your tracks well enough to be hard to find, but if you do something bad enough to result in warrants / subpoenas to internet providers then you’re going to get caught. So, you want to do something bad, but not so bad that $$$ lawyers are involved.

Once the initial excitement of hacking in wears off, you are left with the common problem of “how to intelligently frape”. “Nigel Farage likes bums”, “willies”, “he smells” just doesn’t cut the mustard. Maybe something nearly-racist that he could possibly say? Maybe something stupid about the economy? Something sly about other UKIP members? Then you realise that the man is just a fool and fools always have followers regardless.

Maybe causing him the slight 5-minute inconvenience of resetting the password will suffice. *Slowly backs away*