It has been a long time since our last post, as we have been busy working on our two initial ventures, Variab.ly and Ment.at (more on those soon). We have been working with a group of outstanding individuals around the world on various projects, and one thing that keeps coming up in discussions is the huge divergence in outcomes for Silicon Valley startups versus the rest of the world. There was a great Quora thread on this recently:
As data science evolves, and Open Data initiatives, notably in the US and the UK, gain enough traction to feature in the headlines, we need to take a step back and rethink our paradigm. The "download, clean, analyse, report" paradigm is too minor a variation on old practices. We have non-linearised and high-dimensionalised our modelling tools, built huge industries around automated data preprocessing, switched from dull reports to dynamic infographics on iPads, and we can certainly handle more data than ever before. But we still largely think of data analysis as a procedure with a beginning (the data) and an end (the report).
And yet we know we can do better. We don't really need to look at web giants such as Facebook, Twitter and Google to appreciate the massive added value of constantly revised, up-to-date insights over that of occasional analyses. Far more traditional industries have been playing the "on-the-fly" game for a long time: adaptive monitoring and control of complex systems in real time is the norm in signal processing and industrial process control, and traders were employing "sliding windows" to view data in most-recent-chunk form long before machine learning and Hadoop came into being. Indeed, some basic ideas borrowed from such fields can take you a surprisingly long way here - but, naturally, it only gets interesting after the low-hanging fruit have been plucked.
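To make the "sliding window" idea concrete, here is a minimal Python sketch of a most-recent-chunk statistic: a fixed-size window over an incoming stream, with the summary recomputed as each new observation arrives. The window size and the toy price stream are assumptions for illustration, not drawn from any particular trading system.

```python
from collections import deque

def sliding_mean(stream, window_size=5):
    """Yield the mean of the most recent `window_size` observations."""
    window = deque(maxlen=window_size)  # oldest values fall off automatically
    for x in stream:
        window.append(x)
        yield sum(window) / len(window)

# Toy usage: a short price stream, window of 3 (illustrative values).
prices = [100.0, 101.5, 99.8, 102.3, 103.1, 101.0, 104.2]
for m in sliding_mean(prices, window_size=3):
    print(round(m, 2))
```

The point is that the analysis never restarts: each new observation updates the view, and everything older than the window simply drops out.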
I am enthusiastic about this field, and it's what my research is all about. But it presupposes constant access to up-to-date data - and that's a surprisingly rare animal. I don't need larger datasets; I need data feeds - much like upgrading from a puddle to a swimming pool doesn't do you much good if what you need is a river source. At high frequencies, data feeds are often referred to as data streams - but high frequency is less crucial than continual updates (even weekly updates can prove challenging if the data is big enough). To my mind, the criterion that differentiates a data feed from a dataset is operational: can I build an analytics tool around it that will operate continually, without my having to rebuild the thing after each data update? A naive representation of a data feed would be a file where each line is one observation, with one line added every day, in exactly the same format, continually. Naturally, real datasets do not look like this: formats inevitably change, variables are revised, new information is added and some information is aggregated. And yet I am sure that such modifications would pose far less hassle to the data scientist if implemented with care and consideration.
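As a sketch of that naive representation, the following Python snippet tails an append-only file of one-observation-per-line records and keeps a running mean current as new lines land, without ever reprocessing the full history. The file name, the polling interval, and the assumption that each line holds a single numeric value are all illustrative, not part of any real feed.

```python
import time

def follow(path, poll_seconds=1.0):
    """Yield lines as they are appended to the file at `path`."""
    with open(path) as f:
        f.seek(0, 2)  # jump to the end: only new observations matter
        while True:
            line = f.readline()
            if not line:
                time.sleep(poll_seconds)  # no update yet; wait and retry
                continue
            yield line.rstrip("\n")

# Keep a running mean current as observations arrive, one per line.
count, total = 0, 0.0
for line in follow("observations.txt"):  # hypothetical feed file
    count += 1
    total += float(line)
    print(f"running mean after {count} observations: {total / count:.3f}")
```

This is the operational criterion in miniature: the tool runs forever, and a new observation is absorbed rather than triggering a rebuild.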
So the Continual Data Blog is about the Data Feed revolution: the type of data sources we can rely on to build "always-on" analytics on top of, enabling computational intelligence to permeate our environment in the same seamless way that weather forecasts, news headlines and Twitter feeds have. For those of you who care to join me, our objectives will be to: a) identify existing data feeds (as opposed to one-off open data releases), and b) brainstorm on how existing practices in this field can be improved.
Reblogged from: http://www.continualdata.blogspot.com/
GPC Works started out as a regret-minimization project. Stubborn ideas over the years, plus the usual fleeting thoughts of "...why is this still not around...", "...there must be a better way to ...", or "... that is a good way to attack [xyz] problem ...", have recently culminated in the realization that a more structured approach would yield concrete answers rather than hunches.
We feel we are at the very start of a new wave of technological innovation, driven by disruptive business models, big-data-centric analytics, machine learning methodologies, and deepening technology adoption across both developed and emerging markets. Software has become the technological DNA of our increasingly connected society.
We believe that a team-oriented approach to experimental problem solving is the right way to go about this. We aim to bring together a world-class team of friends who are as enthusiastic as we are about attacking interesting problems in novel ways. We will be doing this in two ways:
The areas initially closest to our hearts are e-commerce, data science and healthcare. We are already working on some projects that we are very enthusiastic about.
Jeff Bezos said: "Long on vision, short on detail" - we couldn't have summed up our plan better ourselves.
Looking forward to the journey.