Friday Aug 27, 2010
What is data science?
"The future belongs to the companies and people that turn data into products," says Mike Loukides on O'Reilly Radar. His article "What is data science?" is an interesting read and can be found at http://oreil.ly/dknxJV.
I've used some of the tools mentioned, such as the Python programming language and the Beautiful Soup library for cleaning up HTML. They have allowed me to deliver analytics that combine data from multiple sources on the internet to make projections about the future. In one customer assignment, I combined Australian Federal Government data with local State-based population projections to build a rich data set on future market shares. This was all done with Python and Beautiful Soup on my MacBook Pro. I didn't need my own database or data warehouse, as I was working with thousands of summary data points that were readily available over the internet.
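To give a flavour of that kind of work, here is a minimal sketch rather than the code from the assignment. It uses Python's standard-library html.parser in place of Beautiful Soup so that it runs without extra installs. The two tables, their figures, and the names TableRows, to_series and project_market are all made up for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up stand-ins for two published pages (not real figures).
STATE_POPULATION_HTML = """
<table>
  <tr><th>Year</th><th>Projected population</th></tr>
  <tr><td>2015</td><td>1,000,000</td></tr>
  <tr><td>2020</td><td>1,100,000</td></tr>
</table>
"""
FEDERAL_RATE_HTML = """
<table>
  <tr><th>Year</th><th>Customers per 1,000 people</th></tr>
  <tr><td>2015</td><td>20</td></tr>
  <tr><td>2020</td><td>25</td></tr>
</table>
"""


def to_series(html):
    """Turn a two-column year/value table into {year: number}."""
    rows = parse_table(html)[1:]  # skip the header row
    return {int(year): float(value.replace(",", "")) for year, value in rows}


def project_market(population, rate_per_thousand):
    """Join the two series on year and estimate the market size."""
    years = sorted(population.keys() & rate_per_thousand.keys())
    return {y: population[y] * rate_per_thousand[y] / 1000 for y in years}


population = to_series(STATE_POPULATION_HTML)
rates = to_series(FEDERAL_RATE_HTML)
print(project_market(population, rates))
```

Everything lives in plain dictionaries keyed by year, which is about as much "database" as summary data on this scale needs.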
In other activities, I've used Apache Hadoop and its MapReduce framework to process Australian financial market statistics covering the full trading-day history of all 2000+ companies listed on the ASX. I've also recently investigated Apache Mahout and its machine learning capabilities, and I'm in the process of learning Apache Pig and Apache Hive to store and process data on top of Apache Hadoop.
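For anyone who hasn't met the map/reduce pattern, the idea can be sketched in plain Python. This is only an illustration that runs on one machine and does not call any Hadoop API. The record layout ("code,date,close,volume"), the company codes and the prices are assumptions made up for the example, as are the function names.

```python
from itertools import groupby
from operator import itemgetter

# Made-up records in an assumed "code,date,close,volume" layout.
TRADES = [
    "AAA,2010-08-25,1.50,1000",
    "BBB,2010-08-25,10.00,200",
    "AAA,2010-08-26,1.75,3000",
    "BBB,2010-08-26,9.50,400",
]


def mapper(line):
    """Map step: emit one (company code, closing price) pair per record."""
    code, _date, close, _volume = line.split(",")
    yield code, float(close)


def reducer(code, closes):
    """Reduce step: collapse all closes for one company to its highest."""
    return code, max(closes)


def run_job(lines):
    # Map every input record to key/value pairs.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle: sorting by key brings each company's values together,
    # the job a framework does between the two phases.
    pairs.sort(key=itemgetter(0))
    return dict(
        reducer(code, [close for _, close in group])
        for code, group in groupby(pairs, key=itemgetter(0))
    )


print(run_job(TRADES))
```

Because the mapper looks at one record at a time and the reducer at one company at a time, a framework such as Hadoop can spread both steps across many machines.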
All of this software is free and open source, and it scales to process large volumes of data on commodity infrastructure.
However, strong analysis and programming skills are required. I'm also working on advancing my knowledge of the statistics that are pertinent to these endeavours. In the past I've found the O'Reilly book Programming Collective Intelligence to be excellent.
I also agree with the Hal Varian quote mentioned in Mike's post: "The ability to take data -- to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it -- that's going to be a hugely important skill in the next decades."
Tags trends data visualization datascience analytics statistics
Saturday Apr 11, 2009
Coding and being productive
I still cut code! Yes, and I enjoy doing so. But many people have never learnt to, or have forgotten how, and they regard the mere thought of writing a computer program as an inane task that should be given to others.
When it's given to others, considerable effort is required to translate domain knowledge into a form that the developer can learn from and then use to write the application. Most people underestimate the effort involved and the complexity of writing down, I mean really spelling out in its simplest form, what needs to be achieved, and hence they do not put the effort into the required activities. This leads to a mismatch of expectations and to systems that don't meet the dreams and aspirations of the original requestors.
How do you fix it? The simplest way (forgetting politics and previous training) is to get the people with domain knowledge to be stronger participants in the construction of the system. Am I saying that they should cut code? Yes!
They will then understand what is feasible and realistic to achieve. Far too often I see a one-sentence requirement, one of many functional requirements, that on further investigation would need a system of its own to satisfy.
Today I saw an interesting post on 3quarksdaily about a US college that will also be training journalists in IT. That is, it is combining the skill sets so that journalists will be code savvy.
Where else does this issue exist? It actually exists in ICT itself, where Business Analysts in analytics-focused organisations have developed strong spreadsheet skills (e.g. Excel) but can't write programs to harness data from other sources (e.g. online data from government or other organisations that publish it). What happens when the volume of data grows to a terabyte or more? Are these guys going to be able to process the data in Excel? More importantly, are they going to be able to respond in a timely manner, so that the analytics they produce give you a competitive advantage?
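As a small illustration of what a program can do that a spreadsheet can't, here is a sketch that totals a column while reading a CSV file one row at a time, so memory use depends on the number of groups and not on the size of the file. The column names region and amount, the sample rows and the function name totals_by_region are assumptions made up for the example.

```python
import csv
import io
from collections import defaultdict


def totals_by_region(csv_file):
    """Sum the 'amount' column per 'region', reading one row at a time."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row["region"]] += float(row["amount"])
    return dict(totals)


# Small in-memory stand-in for a file; a real run would pass an open
# file object instead, e.g. open(path, newline="").
sample = io.StringIO("region,amount\nNSW,10.5\nVIC,4.0\nNSW,2.5\n")
print(totals_by_region(sample))
```

The same function works unchanged whether the file has three rows or three hundred million, because it never holds more than one row at a time.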
There is a wealth of new information being created that can be consumed through electronic means over the internet by writing a computer program. He who can leverage it in a timely manner has an advantage.
The person that can't write a computer program has no productivity advantage!
Tags productivitiy business coding skills analytics ict
