As has been widely published, Microsoft has purchased Farecast for a reported 115 million USD. The acquisition (but not the price) is confirmed by a post on the Farecast blog.
“We’re excited to confirm that Farecast has been acquired by Microsoft! This acquisition creates tremendous opportunities for the Farecast team and our customers. We look forward to sharing more details in the weeks to come. On behalf of the Farecast team, thank you.”
Why this acquisition makes sense (to Microsoft)
To answer this question we first need to look at what the Farecast system is about:
- A system containing 13 billion airfares (mainly US, but has some US to Europe, Mexico, Caribbean and Canada routes)
- Helps consumers decide when to buy a flight, as Farecast predicts whether the flight price is likely to go up or down in the near future (so it nicely messes up airline yield management systems by using yield management on the consumer side – a poacher turned gamekeeper approach)
The approach that Farecast uses is based on predictive data mining. From the Farecast website:
How does Fare Guard Work?
Fare Guard is powered by Farecast’s patented and revolutionary predictive data mining technology. There are three main components to Farecast’s predictive technology: Data collection and persistence, predictive modeling and simulation testing.
Data Collection and Persistence: Each day, Farecast systematically collects huge volumes of round trip airfare prices for a window of 90 departures dates and 7 corresponding return dates (see figure 2. below). This collection process occurs on each of Farecast’s 2000+ supported markets (e.g. LAX-JFK).
This means that for each departure and return date combination, Farecast will have measured the cheapest prices 90 times by the time of the departure. Farecast stores this data for all time. In fact, Farecast has collected over 150 billion airfare pricing observations to date! This data is used to build powerful predictive models that can predict the future direction of the cheapest prices in a market for specific departure/return dates.
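To get a feel for the volumes involved, here is a rough sketch of the daily collection grid described above. The 90 departure dates, 7 return dates and 2000 markets come from the Farecast text; everything else (the function name, the idea that the window starts tomorrow, that the return dates are the 1 to 7 days after each departure, and the example date) is my own assumption for illustration, not Farecast's actual scheme.

```python
from datetime import date, timedelta

MARKETS = 2000  # "2000+ supported markets" in the Farecast text; lower bound only


def daily_itineraries(today, departure_window=90, return_options=7):
    """Yield (departure, return) date pairs to price on a given day.

    Assumption: departures are the next `departure_window` days and each
    departure is paired with returns 1..`return_options` days later.
    """
    for d in range(1, departure_window + 1):
        depart = today + timedelta(days=d)
        for r in range(1, return_options + 1):
            yield depart, depart + timedelta(days=r)


# Arbitrary example date, chosen only to make the sketch concrete.
pairs = list(daily_itineraries(date(2008, 4, 18)))
per_market = len(pairs)             # 90 * 7 date combinations per market
per_day = per_market * MARKETS      # combinations across all markets per day

print(per_market, per_day)
```

Under these assumptions that is 630 date combinations per market and at least 1.26 million per day, before counting the multiple fares that may be observed for each combination.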
Predictive Data Mining: Data Mining is the process of identifying novel, understandable, useful patterns in large, multivariate data stores. Using state-of-the-art techniques from the fields of machine learning and statistics, Farecast regularly builds predictive models that are created or trained using historical pricing data. Measurements of the raw historical data are engineered into features or attributes that are ultimately used to train a complex ensemble of predictive models.
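As an illustration of what "engineering raw data into features" might look like, here is a minimal sketch. Farecast does not publish its features, so the feature names and the function `engineer_features` are purely hypothetical; the only thing taken from the text is the idea of turning a history of cheapest-price observations into attributes a model can be trained on.

```python
from statistics import mean


def engineer_features(history, days_to_departure):
    """Turn a price history into model inputs (hypothetical feature set).

    history: daily cheapest fares for one departure/return pair,
             oldest first, with at least one observation.
    """
    current = history[-1]
    recent = history[-7:]  # up to the last 7 observations
    return {
        "current_price": current,
        "days_to_departure": days_to_departure,
        # Change since the previous observation (0 if there is none).
        "change_1d": current - history[-2] if len(history) > 1 else 0,
        # How the current fare compares with the recent average.
        "ratio_to_recent_mean": current / mean(recent),
        # How far the current fare sits above the lowest fare seen so far.
        "ratio_to_min": current / min(history),
    }


features = engineer_features([300, 280, 260, 270], days_to_departure=21)
print(features)
```

A real system would presumably use many more attributes (market, day of week, seasonality and so on), but the shape of the step is the same: raw observations in, a fixed set of numbers out.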
Simulation Testing: How does Farecast know how well their technology works? Farecast uses sophisticated simulation technology that creates groups of virtual passengers with different purchasing profiles. Leveraging their vast store of persisted historical data, these virtual passengers arrive in the “past” and follow recommendations from the Farecast predictive models. Because the operation is run on historical data, the correctness of the prediction and profit or loss the consumer experience can be calculated instantly and tallied.
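The simulation idea can be sketched in a few lines. This is only my reading of the description above, not Farecast's implementation: the price series is made up, and `naive_recommend` is a toy stand-in for the real predictive models. A virtual passenger arrives on a day in the "past", follows the buy/wait advice, and we record what they saved or lost compared with buying on arrival.

```python
def simulate_passenger(prices, arrival_day, recommend):
    """Replay one virtual passenger over a historical price series.

    prices[i] is the cheapest fare observed on day i; the final entry is
    the last chance to buy. `recommend` sees only the history up to the
    current day and returns "buy" or "wait".
    Returns the saving versus buying on arrival (negative means a loss).
    """
    baseline = prices[arrival_day]
    for day in range(arrival_day, len(prices)):
        last_day = day == len(prices) - 1
        if last_day or recommend(prices[: day + 1]) == "buy":
            return baseline - prices[day]


def naive_recommend(history):
    """Toy rule (an assumption): wait while the fare fell since yesterday."""
    if len(history) >= 2 and history[-1] < history[-2]:
        return "wait"
    return "buy"


# Made-up historical fares for one departure/return pair.
prices = [300, 280, 260, 270, 310]

# One virtual passenger per possible arrival day, tallied afterwards.
savings = [simulate_passenger(prices, day, naive_recommend)
           for day in range(len(prices) - 1)]
print(savings, sum(savings))
```

Because the whole thing runs on stored history, each passenger's outcome is known immediately, which is what makes it possible to tally results over many passengers and many markets.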
All in all, this is pretty impressive stuff. But why is it of interest to Microsoft? First, let’s look at another ecommerce area where data mining is important.
The Netflix prize
Netflix are an online DVD rental company. One of their business challenges is to improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences.
Netflix have built their own system for making these predictions, but they were curious to see if their system was as good as it could be. As a result they have put up a prize (1 million USD) for any individual who can develop a predictive system that is better than their own. This competition is being co-ordinated by the Netflix Prize website. So far they have received 25,000 submissions and they have a leaderboard. No one has won the grand prize (yet).
What can we learn from the Netflix prize?
One of the key findings so far, as reported by consulting assistant professor Anand Rajaraman from the computer science department at Stanford University, is that more (diverse, but related) data usually beats better algorithms.
That is, entries that have in some way incorporated data from the Internet Movie Database and other external sources tend to come out better than those which are based solely on algorithms and the shared, central dataset.
This is much like the real situation in travel ecommerce, as every large OTA has pretty much the same data nowadays. In the real world they are therefore in much the same position as entrants in the academically controlled environment that Netflix have set up.
Further discussion on Anand’s blog post on the subject
Back to Farecast / Microsoft
OK – so what Microsoft now has is data. Lots of it (and more than their direct “web search” competitors). What will they do with it?
I expect that somehow this data will be used to inform a very good travel vertical search engine (after all, Farecast is a great search mechanism already, from the consumer’s perspective). This search functionality will compete much better with the likes of Google, Yahoo etc. because, looking at the learnings from Netflix, we know that more data (rather than better algorithms) often makes for better search.
What will Google do? We know that Google has some nice extra data for travel already (they are not just an algorithm any more). (See my previous post about how Google may be categorising all travel advertisers, or my summary post about what Google is up to in travel, including how they now have flight and hotel data on their main search results.)
However Google don’t, to my knowledge, have any accurate, live, fare data (and I have no knowledge!). Maybe they will buy Expedia after all!

Consumer side yielding sounds interesting. However, I wonder how effective a consumer side ‘predictive’ system can be in advising when to buy to get the best price. It sounds like they are just looking at historic price data and projecting it forward. This may provide something useful for consumers, but is it a match for ‘business side yielding’?
Travel yielding will typically take into account not just historic price data, but pace data (what shape of booking curve develops up to a ‘consumption’ date), inventory availability and the size and type of the piece of demand. Is it a 3 night stay as opposed to 1 or a multi leg trip as opposed to single thus selling less easily sold inventory? Consumer side yield only has access to price data… presumably.
It’s easy to factor in that Easter is a different weekend this year, but how easy is it for them to pick up on a local event in a city that is on a different weekend this year? A trade show on a different weekend, a marathon on a different weekend.
It is certainly interesting, but I doubt the accuracy. If consumer side yielding actually moved demand to periods when better prices could be obtained, the travel yield system would almost instantaneously compensate.