Data analysis is a big subject, so I’m going to approach it over a number of posts. Here we’re going to talk about what I think you need from a good package, and some of the downsides of the tools most commonly used for data analysis by people who aren’t professional scientists. Next time I’ll move on to what I think are (for most applications) the best tools for the job, and then in a third post I’ll give some alternatives that are just as good, along with a few reasons why you might want to use other or varied tools.
So, what do I think you need and why? Well, to begin with you need the ability to deal with a lot of data; anything that can’t handle a few thousand numbers is out straight away. You also need to be able to automate things. For the most part, if you need to run some analysis on one dataset you will need to run it on many datasets; this is pretty much the nature of the beast. If you are not crazy you will want some way of running the same process many times. That means automation, and automation means you need some type of scripting language.
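To make the point about automation concrete, here is a minimal sketch in Python using only its standard library. It is an illustration rather than a recommendation: the function names, the column name `value` and the folder layout are all assumptions I have made up for the example, and the "analysis" is just a count, mean and standard deviation.

```python
import csv
import statistics
from pathlib import Path


def summarise(csv_path, column="value"):
    """Return count, mean and sample standard deviation of one column in one CSV file."""
    # The column name "value" is a hypothetical default for this sketch.
    with open(csv_path, newline="") as handle:
        values = [float(row[column]) for row in csv.DictReader(handle)]
    return {
        "file": Path(csv_path).name,
        "n": len(values),
        "mean": statistics.mean(values),
        # A sample standard deviation needs at least two points.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


def summarise_all(folder, pattern="*.csv", column="value"):
    """Apply exactly the same analysis to every matching file in a folder."""
    return [summarise(path, column) for path in sorted(Path(folder).glob(pattern))]
```

Calling `summarise_all("data")` would then process every CSV file in a (hypothetical) `data` folder in one go. The thing to notice is that the analysis is written once, and going from one dataset to a hundred costs nothing extra.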
Along similar lines to the automation, if you are producing some kind of report, which everyone has to do from time to time, you will want all of your figures to look the same. So again you want to be able to write a script to automate your figure design. Not only will you want them to look the same, you will want them to look nice. There are some other features of figures which may not be immediately apparent to those approaching this kind of thing for the first time. Journals often require figures to look a specific way, so you need something that is versatile enough to allow this. You also have the added difficulty that if you are rejected from one journal, which is not uncommon, you need some easy method to redraw your figures using the design parameters defined by the next journal you are going to submit to. Again, scripting and automation are your friends here. The quality of your figure is the final thing to consider: it is probably most appropriate to use a scalable vector image format, which gives the best print quality.
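As a rough sketch of what scripted figure design buys you, the Python below keeps every design parameter in one preset per journal and writes the figure as SVG, a scalable vector format. The journal names, sizes and fonts are invented for the example (real journals publish their own requirements), and the SVG is written by hand only to keep the sketch free of dependencies; in practice a plotting package would do this part for you.

```python
from pathlib import Path
from xml.sax.saxutils import escape

# Hypothetical style presets, one per journal. All values are made up.
STYLES = {
    "journal_a": {"width_mm": 85, "height_mm": 60, "font": "Helvetica",
                  "font_pt": 8, "line_pt": 1.0},
    "journal_b": {"width_mm": 170, "height_mm": 90, "font": "Times New Roman",
                  "font_pt": 10, "line_pt": 0.75},
}

MM_PER_PT = 25.4 / 72  # one typographic point expressed in millimetres


def line_plot_svg(xs, ys, title, style):
    """Return an SVG document (as a string) plotting ys against xs in the given style."""
    w, h = style["width_mm"], style["height_mm"]
    margin = 10  # mm left clear around the plotting area
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1.0  # avoid dividing by zero for flat data
    y_span = (max(ys) - y_min) or 1.0

    def to_page(x, y):
        px = margin + (x - x_min) / x_span * (w - 2 * margin)
        py = h - margin - (y - y_min) / y_span * (h - 2 * margin)  # SVG y points down
        return f"{px:.2f},{py:.2f}"

    points = " ".join(to_page(x, y) for x, y in zip(xs, ys))
    stroke = style["line_pt"] * MM_PER_PT
    font_size = style["font_pt"] * MM_PER_PT
    font = style["font"]
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{w}mm" height="{h}mm" '
        f'viewBox="0 0 {w} {h}">\n'
        f'  <rect x="{margin}" y="{margin}" width="{w - 2 * margin}" '
        f'height="{h - 2 * margin}" fill="none" stroke="black" '
        f'stroke-width="{stroke:.3f}"/>\n'
        f'  <polyline points="{points}" fill="none" stroke="black" '
        f'stroke-width="{stroke:.3f}"/>\n'
        f'  <text x="{w / 2}" y="{margin - 2}" text-anchor="middle" '
        f'font-family="{font}" font-size="{font_size:.3f}">{escape(title)}</text>\n'
        '</svg>\n'
    )


def save_figure(xs, ys, title, journal, out_dir="figures"):
    """Draw one figure in the named journal's style and write it to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{title.lower().replace(' ', '_')}_{journal}.svg"
    path.write_text(line_plot_svg(xs, ys, title, STYLES[journal]))
    return path
```

If the paper bounces from the first journal, redrawing every figure for the second is a matter of changing `"journal_a"` to `"journal_b"` in the calls to `save_figure` and running the script again. The data and the plotting code stay untouched.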
So what packages do most non-scientists use for this kind of thing? Excel… or one of the similar products available in OpenOffice.org or LibreOffice. These are good applications for what they are designed for: spreadsheets. They are not normally the most suitable for the kind of analysis described above. Automation is difficult, scripting is irritating, and it is normally very difficult to apply these tools to many datasets one after the other. Figures are also often a big problem, as they tend to look rather unprofessional. There are other problems with these types of applications too: they are not good at high-level maths. Complex numbers, matrices and vectors are not handled particularly well or intuitively. You may think that I’m being overly harsh here but DO NOT USE A SPREADSHEET FOR DATA ANALYSIS! Don’t believe me? Use one of the many programs designed for this kind of thing and see how easy it is by comparison. I’ll run through a few of these in my next post…