Friday 5 April 2013

10x developer talk is fundamentally misguided


I've been reading all this recent 10x developer blogstorm, and I must say I'm a bit puzzled that developers, rather than businesspeople, would ever slip into such talk. Speaking about productivity in such measurements is perhaps apt for a bricklayer or a factory worker, or in general for someone in a tightly standardised environment, where one unit of production is the same as any other.

Some may expect me to follow that intro up with talk of every developer being a beautiful and unique butterfly with their own strengths and weaknesses, but I actually want to make the point that differences can be much, much larger than 10x. Only in the narrowest of domains does the 10x metaphor hold, and even then it's only a few lines of Turing-complete code away from collapse. Developers deal with intricately complex systems that need to be comprehended and diagnosed. In that sense they're more akin to medical professionals.

My company has been running as a small agency for the past couple of years, helping entrepreneurs and businesses get their products off the ground, sometimes taking over from other teams. What we're seeing is that the difference between a good developer (or team) and a bad one is simply vast. A bad developer (or team) can cause bugs or, worse, pile hack upon hack to deliver, after delays, an unmaintainable mess that barely works and that no one else will touch without a rewrite. This last point is important. A bad developer, through incompetence or malice, will lock you in, since switching costs will be sky high. In business terms, this can be the difference between launching on time and on budget, and getting lost in development hell. In other words, life and death. A good developer, on the other hand, can help guide your development towards the more effective paths, and, while developing, identify and remove assumptions that open up new business pathways and new dimensions for a product. With better code and architecture, maintenance and change become easier, which again can translate to life or death for a business under competitive pressure. A good developer will let you know when awesome functionality is just one integration step away. A good developer will understand the business and get things right the first time.

Nobody talks about the 10x painter, singer, doctor, actor, philosopher, or leader. And they shouldn't talk about 10x developers either.

Sunday 13 May 2012

DH7: When it's your job to fix your critics' cases for them


Sometime after I came across Paul Graham's excellent disagreement hierarchy, I came across a little-known addition by a blogger known as Black Belt Bayesian:

DH7: To win, you must fight not only the creature you encounter; you must fight the most horrible thing that can be constructed from its corpse.

Paul's article revolved around civility in online forums, so I can see why he stopped at 6. Trying to construct a sound argument out of the jumbled mess that 'someone on the internet' makes is a hobby few may be interested in. Perhaps the purists, perhaps the philosophers.

Now, look again through the eyes of an entrepreneur. For many on Hacker News this requires no imagination whatsoever. You pitch your idea to many people every week. They may come back with a blurb about a potential problem, one that sounds like those you dismiss easily with a ready-made response. The other side will not push their case. They will not try to state it clearly, nor will they counter-argue. It is a social situation; they are only making conversation, and are only too happy to move on to the next topic, tell you about their startup, or network with someone else. But for you, DH7 is a matter of life and death:

To win, you must fight not only the creature you encounter; you must fight the most horrible thing that can be constructed from its corpse.

Only this time, winning is not about some argument on a forum. The stakes are much higher. If you fail to recognise a valid argument in the mumblings of an experienced but unmotivated interlocutor, you may hear it again, loud and clear, in the epitaph of your business venture. So you must inquire, open up, defeat the ugh field, and push through until you have found what was lurking behind the bushes, or you find out it was only the wind after all.

So open your ears, fellow entrepreneurs, and don't let that tiny note of discord get lost in the noise. Steamrolling objections with your well-practiced arguments (or non-arguments) is good fun, until you miss that one valuable insight.

Sunday 29 January 2012

Solving Causes' Levenshtein Distance challenge in Python, the Sequel


The 3 faithful readers of this blog have probably seen my previous attempt at cracking Causes' Levenshtein distance challenge. It all went well until Adam Derewecki of Causes commented with the following:
...Pretty good solution though, about 15s on our benchmark machine. Record is 11.3s if you're up to the challenge :)
At first I was like "Yeah right, mate, you're not roping -me- in with that one, I have a startup to run." But the predictable engineer's mind just couldn't let it go. How could someone have done about 30% better in Python? What was I missing? So I started hacking at the code again. Turns out (surprise!) I was missing quite a bit. Let's start by putting the original code up for you to see:

import string
w = set(open("00wordlist.txt").read().splitlines())
f, nf = set(), set(["causes"])

#from b, yield all unused words where levdist==1
def nextgen(b):
    for i in range(len(b)): #for each index in b
        for c in string.ascii_lowercase: #for letters [a..z]
            if c != b[i]:
                #substitute b[i] with c
                if b[:i] + c + b[i+1:] in w:
                    yield b[:i] + c + b[i+1:]
                #inject c before b[i]
                if b[:i] + c + b[i:] in w:
                    yield b[:i] + c + b[i:]
        #remove b[i]
        if b[:i] + b[i+1:] in w: yield b[:i] + b[i+1:]
    
    for c in string.ascii_lowercase: #for letters [a..z]
        if b + c in w: yield b + c #append c after b

while len(nf):
    cf = nf
    nf = set([j for i in cf for j in nextgen(i) if j not in f])
    w -= nf
    f |= nf

print len(f)

First, Adam's suggestion was very good by itself. Why write this:

nf = set([j for i in cf
        for j in nextgen(i) 
            if j not in f])

when you can omit the intermediate array and just write this:

nf = set(j for i in cf 
        for j in nextgen(i) 
            if j not in f)

But it gets better. Since I subtract nf from w, the set the candidate words are sourced from, nothing that has already been found can ever be yielded again. So why even check if j not in f? No reason. We end up with the much more palatable:

nf = set(j for i in cf for j in nextgen(i))

After improving that line, I noticed that I had a line above that was doing absolutely nothing whatsoever:

cf = nf

This line simply betrays my uncertainty about how Python's comprehensions work. Since the whole right-hand side of an assignment is evaluated before the name on the left is rebound, the loop can read from nf and assign to nf in the same statement. The next line can therefore be written as follows, with no need to ever declare cf at all:

nf = set(j for i in nf for j in nextgen(i))
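
A toy example makes the evaluation order visible (my illustration, not from the original post):

s = set([1, 2, 3])
s = set(x * 10 for x in s) #the RHS consumes the old s
print(s)                   #prints the new set: 10, 20, 30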

Next up, let's look at the little optimisation I had in line 9:

if c != b[i]:

Here I used a whole line to make sure I wasn't going to do any useless membership checks. Even though I was aiming for small code. Even though Python sets have O(1) membership testing. When I looked at the code again and doubted my own premature optimisation, the results were damning: the test cost more time than it saved. Removing that line yields a speed improvement.
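
If you want to put numbers on a hunch like this, a timeit micro-benchmark along these lines will do. This is a sketch of mine, not a measurement from the original post; the toy word set is made up, and the exact numbers will vary by machine:

import string, timeit

w = set(["causes", "caused", "pauses"]) #toy word set
b, i = "causes", 0

def guarded(): #with the extra c != b[i] comparison
    for c in string.ascii_lowercase:
        if c != b[i]:
            b[:i] + c + b[i+1:] in w

def unguarded(): #one redundant lookup, no comparison
    for c in string.ascii_lowercase:
        b[:i] + c + b[i+1:] in w

print(timeit.timeit(guarded, number=100000))
print(timeit.timeit(unguarded, number=100000))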

All these improvements were small. They saved 1-2 seconds of the 25 or so the program takes on my laptop. The big improvement came when I tried the technique seen in this stackoverflow answer: interrupting the running program a few times and looking at where it stopped pointed straight at the culprit. The constant use of the slicing operation was not doing me any favours. For every given letter and every position in the string I did operations like this:

if b[:i] + c + b[i+1:] in w:
    yield b[:i] + c + b[i+1:]

That's 4 slice operations, and this pattern appears twice (substitution and injection) for a total of 8 per letter. So I decided to do the slicing only once per position, assign the results to variables, and use those for each letter. That sped things up enormously: it brought the runtime from slightly under 23 seconds to well under 17.
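
As an aside, if interrupting the program by hand is not your thing, the standard library's profiler points at the same kind of culprit. Here solve.py is just my stand-in name for the script:

python -m cProfile -s cumulative solve.py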

UPDATE: After some impromptu after-work tinkering with my co-founder Pagan, we realised Python iterates over lists faster than over strings, which means that adding the line

letters = list(string.ascii_lowercase)

to the setup part of the code speeds things up by a cool 4%.
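
The difference is easy to check with timeit (again a sketch of mine; the size of the gap will vary with interpreter version and machine):

import timeit

print(timeit.timeit("for c in s: pass",
    setup="import string; s = string.ascii_lowercase"))
print(timeit.timeit("for c in l: pass",
    setup="import string; l = list(string.ascii_lowercase)"))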

All these improvements add up to about a third of the total running time. Since Adam said that my programme ran for 15 seconds on the benchmark machine, while the best Python they had ran at 11.3, I suspect this may be enough to beat the frontrunner. Now I just have to get Adam to test this one again.

Another change I made was to improve the horrible variable naming I had last time around, and to add a few more comments. I was also very strict about keeping lines under 65 characters in length. So here is the resulting program:

import string
words = set(open("00wordlist.txt").read().splitlines())
frnds, newfrnds = set(), set(["causes"])
letters = list(string.ascii_lowercase)

#from word wd, yield all unused words where levdist==1
def freefrnds(wd):
    for i in range(len(wd)): #for each index in wd
        wd_upto_i,wd_from_i,wd_after_i = wd[:i],wd[i:],wd[i+1:]
        for char in letters: #for letters [a..z]
            #substitute wd[i] with char
            if wd_upto_i + char + wd_after_i in words:
                yield wd_upto_i + char + wd_after_i
            #inject char before wd[i]
            if wd_upto_i + char + wd_from_i in words:
                yield wd_upto_i + char + wd_from_i
        #remove wd[i] from word
        if wd_upto_i + wd_after_i in words:
            yield wd_upto_i + wd_after_i

    for char in letters: #for letters [a..z]
        #append char after word
        if wd + char in words: yield wd + char

while len(newfrnds):
    newfrnds = set(j for i in newfrnds for j in freefrnds(i))
    frnds |= newfrnds #add newfrnds to the frnds set
    words -= newfrnds #remove list of newfrnds from words

print len(frnds)

Thursday 3 November 2011

A 3-rectangle 17x17 grid

I've made no secret of my obsession with the 17x17 challenge. I started working on it in November 2009 and went straight at it for six months. At that point I had to stop to code/write up my PhD but started working on it again as soon as I was done. This problem has given me a reason to learn so many amazing things in both math and programming, that I would be happy to have worked on it even if I never produced anything worthwhile. It's now a weekend project given that I run a startup, but, after almost 2 years, I have something to show the world: A grid with 3 rectangles.


To be clear, this is not a solution. The challenge asks for a 4-colouring of the 17x17 grid in which no four cells of the same colour form the corners of an axis-aligned rectangle, so a solution would need to have 0 rectangles. But it is the least broken solution I know of. Bill Gasarch posted, along with the problem, a 4-rectangle grid by Rohan Puttagunta, which has not been improved on since. Here is a leader board of the best grids known so far.


This solution is less impressive than that one in that it's missing two cells rather than one, but it does have fewer rectangles when extended to a full colouring, which is what makes it interesting. Without further ado, the solution, with the rectangles marked out:




If anyone has code they want to use to analyse this, here it is again in a machine-friendlier format:

4,2,1,3,1,2,3,4,4,1,2,4,2,1,4,3,3
2,4,2,1,3,2,1,4,1,4,1,2,4,3,3,4,3
1,2,4,1,3,1,3,4,3,1,4,2,3,4,2,2,4
3,1,1,4,3,1,4,3,4,2,3,2,2,4,1,4,2
1,3,3,3,3,3,4,2,2,4,1,4,2,1,1,2,4
2,2,1,1,3,4,4,1,2,3,4,3,4,2,4,3,1
3,1,3,4,4,4,3,4,1,1,2,3,1,2,3,2,2
4,4,4,3,2,1,4,3,3,3,1,2,1,2,3,1,1
4,1,3,4,2,2,1,3,2,4,4,1,3,1,2,3,1
1,4,1,2,4,3,1,3,4,1,4,3,2,2,2,1,3
2,1,4,3,1,4,2,1,4,4,3,3,3,3,2,1,2
4,2,2,2,4,3,3,2,1,3,3,1,4,4,1,1,2
2,4,3,2,2,4,1,1,3,2,3,4,1,4,1,2,3
1,3,4,4,1,2,2,2,1,2,3,4,4,2,3,3,1
4,3,2,1,1,4,3,3,2,2,2,1,1,3,2,4,4
3,4,2,4,2,3,2,1,3,1,1,1,2,3,4,3,4
3,3,4,2,4,1,2,1,1,3,2,2,3,1,4,4,3

In case anyone hasn't noticed, this solution is symmetric, i.e. colour(x,y) = colour(y,x) if you start numbering from the top left.
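
For anyone who would rather not write their own tooling, here is a small checker of mine (not part of any original challenge code) that counts the monochromatic rectangles and verifies the symmetry. It assumes the 17 rows above are saved verbatim in a file called grid.txt; the filename is my own invention:

from itertools import combinations

grid = [[int(c) for c in line.split(",")]
        for line in open("grid.txt").read().splitlines()]
n = len(grid)

rects = 0
for r1, r2 in combinations(range(n), 2): #row pairs
    for c1, c2 in combinations(range(n), 2): #column pairs
        #a rectangle: all four corners share a colour
        if (grid[r1][c1] == grid[r1][c2]
                == grid[r2][c1] == grid[r2][c2]):
            rects += 1

print(rects) #should print 3 for the grid above
print(all(grid[x][y] == grid[y][x] #the symmetry claim
          for x in range(n) for y in range(n)))

Brute force is fine here: 136 row pairs times 136 column pairs is under 20,000 corner checks.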

I have a lot more to write about the solution and method, and I'm not yet done with this approach, but if I start writing up I may never publish this, so I'll just stop here for the moment.


Sunday 2 October 2011

An accidental survey of the Hacker News ecosystem

So my little rant a week ago made a bit of a splash, briefly occupying the top of Hacker News. Looking at the inbound traffic, what surprised me was the variety in the HN ecosystem, particularly the alternative front ends that people use to access the HN stream.


Writing an HN front-end is an uncertain proposition: you only get to talk to your audience once, at launch. You don't have a viral loop, so most of your users will come from that initial spike, plus whatever you can get from word of mouth. There are a few online listings, but then the question is how many people will get to see those without a static link from HN. This is my attempt to bring a little more attention to these very cool but underappreciated projects.


What I saw in the logs is that at least 3% of the HN traffic came from alternative front ends. The number may seem small, but zooming into those visits reveals the surprising variety of ways people consume Hacker News. At this point I'd like to apologize to the coders of native apps: I can't see you in my logs, because there is no referrer in the traffic you sent my way. I know that 20%+ of the traffic I got is unaccounted for (and it damn sure wasn't 3600 people dropping in to see what's on my blog), but I can only guess where it came from. With that said, here's the situation, if my analytics dataset is to be trusted:


The most popular alternative HN front-end is hckrnews by @wvl, sending 69 visitors over. hckrnews competes with the default front end head-to-head by offering multiple convenient ways to access the submissions. It is also well integrated with the Hacker News plugins made by the same author for Chrome and Safari.



iHackerNews targets an unmet need, namely the lack of an official mobile website. iHackerNews is built by @ronnieroller and also offers something else much requested of HN: an API. 51 visitors were iHackerNewsers.



The hardest to measure is @cperciva's Hacker News Daily, an HN alternative of sorts. The idea is simple: a feed with each day's 10 most upvoted submissions. It covers the need to know that, even if you're gone for a day or two, you can still see the best posts when you come back. The reason it's hard to measure is that only a fraction of its readers go through the website, which sent 31 visits; I would expect most to consume this through the feed (as I do). Google Reader tells me that Hacker News Daily has 1061 subscribers. That's significantly lower than the 36,564 subscribers of the main HN feed, but I suspect its readers are much more likely to be engaged, since they get only one feed item per day.



Another mobile-friendly front end is hn.gethifi.com. As far as I can tell it's been made by @JoelSutherland, @KrisJordan, and the gethifi.com team. While not as feature-complete as iHackerNews, I do prefer its look and feel, which seems more tailored to a mobile device. 13 visitors came that way.





12 more visitors came through another mobile-optimized site, iCombinator.com. The selling points are its Instapaper integration and its optimization for iPhones using iUI.




For the fans of a more traditional reading experience there is Hacker Newspaper by Giles Bowkett. As you might expect, it renders the stories of the HN front page in a newspaper format. I imagine this would work quite well with a tablet. Hacker Newspaper sent 10 readers.



An interesting take on the HN frontpage is hnsort.com, which allows you to sort by points, comments, domain, submitter, and age, amongst other things. 4 hnsorters showed up.



Another 3 came via one of my favorites, Hacker News Reader, which is a serverless app: you download a static file that, when executed in your browser, fetches the HN frontpage, parses it, and presents it in a different way. No server is involved, which means that if login were implemented, your credentials would never need to touch the developer's server. HN Reader is also installable as a Chrome Web App. While feature-limited, it looks optimized for mobile and also rocks Instapaper integration. Definitely one to keep an eye on.


Then there is the long tail: the alternative front ends that barely registered, each sending a single user my way. Since there are so many projects in this list, I am sure there are as many, if not more, that I missed because they didn't send a user to my post. There's fuhn.tk, the only project that explains its mission with a rage comic.



Then there's HN overload, which nicely organizes each day's top links in a clean format. It scores by aggregating HN points, reddit karma, and number of retweets. Definitely tempted to use this one more.



There's Hackerslide.com, which organises hourly snapshots of the HN front page using an etherpad-style timeline. Genius.



hnvue.com promises to optimize your HN reading experience by letting you see the front page on the left, the article page on top, and the comments page at the bottom, all at once. A very interesting experiment.



Another very clean and mobile-friendly site can be found at yhack.net. I like the minimalist well spaced layout as well as the refreshing blue colour scheme in a sea of orange competition.



As is obvious, there is a lot of variety in the HN ecosystem. I hope at least one of these apps convinced you to give it a try, even if just for a day.


Besides the dedicated HN front-ends, I really should also mention the aggregators: websites that pull together an HN feed with feeds from other relevant sites. The main ones are jimmyr.com, popurls.com, and hackurls.com, in that order. Aggregators sent 94 user agents this way.


Another benefit of being able to dive into the referrer logs is seeing through which part of HN people came. If you're wondering what that means, read on. There's /best, where articles rise and fall much more slowly, giving you access to great articles you may have missed over a few days of absence. There's /classic, which applies a different sorting algorithm; I believe it has something to do with using only veteran users' votes. Then there's one I didn't know existed: /over. This allows you to set the threshold of points you want to see articles, well, over. Going to /over?points=100 will show you the latest(?) articles that have over 100 points. As for the two people who came with news.ycombinator.com/rss as the referrer: you, sirs, are gentlemen and hackers.


Since we're talking data, another lesson I learned is that those 'follow me on Twitter' links you see at the end of some blog posts? They get clicked. I added one to my previous post a few hours after the initial spike, and I estimate it brought about 1.5 new followers per 1000 reads. Not earth-shattering, but not too bad either.


What do those links look like? Well, something like this:


If you read this far, why not follow me on Twitter here.

Sunday 25 September 2011

Why Facebook's 'Frictionless Sharing' violates HTTP

Facebook has this new feature whereby the act of simply reading a web page, under certain conditions, gets it posted to your news feed for your friends to see. Here's how ReadWriteWeb puts it:
With these apps you're automatically sending anything you read into your Facebook news feed. No "read" button. No clicking a "like" or "recommend" button. As soon as you click through to an article you are deemed to have "read" it and all of your Facebook friends and subscribers will hear about it. That could potentially cause you embarrassment and it will certainly add greatly to the noise of your Facebook experience. 
Facebook calls this 'frictionless sharing'. It has raised all sorts of 'creepy' flags, and rightfully so. A big reason is that it breaks a fundamental contract of web interaction, in place since the beginnings of the web, that users have come to rely upon: merely browsing a webpage (executing a GET, in HTTP talk) should not cause effects that you, the visitor, are responsible for. Posting to your news feed is a direct side-effect of your reading the article. You take no extra step to authorize it.

This violates a convention that is not there by accident. The HTTP specification (RFC 2616) defines GET as a 'safe' method, with certain guarantees. This line has been skirted for a very long time, but never by a company of this size, so publicly, and so blatantly. This is what the spec has to say on the matter:
9.1.1 Safe Methods
Implementors should be aware that the software represents the user in their interactions over the Internet, and should be careful to allow the user to be aware of any actions they might take which may have an unexpected significance to themselves or others. In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". 
[…] Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them. (emphasis mine) 
I don't think it gets any clearer than that. It's as if the HTTP committee had looked into the future and was personally addressing Mr. Zuckerberg. Now, the HTTP spec has no teeth. There is no enforcement body that goes around and metes out fines and punishment to violators. It is a gentlemen's agreement, the contract that good citizens of the web should keep. As such, I think it merits at least a mention when large companies find new and 'frictionless' ways to undermine the foundation that they (and everyone else) are building on.
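
To make the convention concrete, here is a minimal sketch, written by me as an illustration rather than taken from anyone's real code (it uses Python 3's standard library): GET merely retrieves the article, while the side effect the user can be held accountable for happens only on an explicit POST.

from http.server import BaseHTTPRequestHandler, HTTPServer

shares = [] #stand-in for a news feed

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        #safe: reading changes nothing the user answers for
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"the article text\n")

    def do_POST(self):
        #unsafe by design: the user explicitly asked for it
        if self.path == "/share":
            shares.append("article shared")
        self.send_response(303) #redirect back after acting
        self.send_header("Location", "/")
        self.end_headers()

HTTPServer(("localhost", 8000), Handler).serve_forever()

The point is not the framework but the shape: the action lives behind a method that a browser will never fire just by following a link.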


Update: A number of people are pointing out that the user authorizes the side effects by installing the app on Facebook. However, I assume Facebook also agrees to the HTTP spec by implementing it. Does getting user authorization allow you to violate HTTP? I don't see any such language in the spec. I think the safeness of GET is one of those rights that you shouldn't be able to give away, even if you wanted to, as doing so undermines the web for everyone else.



If you read this far, consider following me on Twitter

Sunday 18 September 2011

Three times Google’s ‘strategy’ got in the way of success: Skype, GDrive, Google+

I just finished reading Sriram Krishnan's excellent post 'Don't be so f*king strategic' and couldn't stop thinking that the same disease must have infected Google as it had Microsoft.

Here are three times when the public learned of missteps by Google that were somehow related to a grand strategy of the company:

Skype

This one is documented in Steven Levy's book 'In The Plex', and the author has a more specific blog post on the issue. What it comes down to is this: someone at Google thought, and was able to convince the higher-ups, that peer-to-peer was old technology, inconsistent with their cloud model, so Skype was worthless to them. The fact that this someone was from a product group that would have had to compete with Skype internally goes unmentioned, but what is important is that Google in 2009 passed up the opportunity to buy Skype for a fraction of what Microsoft paid for it in 2011. Skype is now integrated with Facebook.

GDrive / Dropbox 

Drew Houston worried in his YCombinator application that Google would launch GDrive any day. It turns out Google didn't, and Dropbox is a billion-dollar company today. Why? In The Plex has this to say (http://googlesystem.blogspot.com/2011/05/how-google-docs-killed-gdrive.html): the Google Docs team was able to convince the higher-ups that files didn't make sense in the cloud. File systems were a thing of the past, and so GDrive was abandoned when it was almost complete, its engineers sent to work on Chrome. It turns out files aren't quite dead yet, and Google Docs itself now allows you to upload them. Recent rumours say that GDrive may have been resurrected and is getting launched, this time into a much more crowded space with credible independent competition.

Google+ 

The latest news is that Google+ is showing signs of decline. The causality here is not as strongly established, but the early demographic has not been happy with the real-names policy. When I found out they were enforcing this policy, I was in disbelief. Surely, when you're entering a new market, you want to be friendlier than the competition, welcoming to those who were disenfranchised by your competitor. Instead, Google proceeded to shoot itself in the foot: blocking other services the affected users were on, shutting out ethnic groups whose names do not follow a western structure, people known by a name other than their legal one, and those who preferred to remain anonymous for their own safety. The statements coming out of the GooglePlex were to the effect that 'Google+ is not for everyone' and that they can't fight all the battles all the time. This is bizarre behaviour on its face, and I've learned that when smart people behave in ways that appear outright incompetent, there are usually higher-level considerations at play. It turns out that Google sees Plus as an 'identity service', part of a grand strategy we're not privy to (but can make guesses about). To put it in plain terms: Google is jeopardizing their bet-the-company move against a competitor because of some masterplan that may or may not be what users want in the long run.

That makes three times, that we know of, when Google let their 'strategy' get in the way of success. I'm sure more are known to the insiders. I hope this documents what I've been seeing: one of my favourite companies displaying a fondness for footguns. Here's my unsolicited advice to Google: stop being so f*cking strategic and just focus on building the world's coolest technology, the thing people love you for, before you end up boring like Facebook.