Data analysis is a big subject, so I’m going to approach it through a number of posts. Here we’re going to talk about what I think you need from a good package, and some of the downsides of the tools most commonly used for data analysis outside the professional scientific world. Next time I’ll move on to what I think are (for most applications) the best tools for the job, and then in a third post I’ll give some alternatives that are just as good and provide a few reasons why you might want to use other or varied tools.
So, what do I think you need and why? Well, to begin with you need the ability to deal with a lot of data; anything that can’t handle a few thousand numbers is out straight away. You also need to be able to automate things. For the most part, if you need to run some analysis on one dataset you will need to run it on many datasets; this is pretty much the nature of the beast. If you are not crazy you will want some way of running the same process many times. For automation, you therefore need some type of scripting language.
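To make the automation point concrete, here is a minimal sketch of the idea in Python, using only the standard library. The choice of language, the column name `voltage`, the labels `run_a` and `run_b`, and the summary statistics are all my own assumptions for illustration; the point is only that the analysis is written once and then applied to as many datasets as you like.

```python
import csv
import io
import statistics


def analyse(values):
    """Summarise one dataset: count, mean and sample standard deviation."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def read_column(handle, column):
    """Read one named numeric column from an open CSV file."""
    return [float(row[column]) for row in csv.DictReader(handle)]


def analyse_many(datasets, column):
    """Run the same analysis on every dataset.

    `datasets` maps a label to an open CSV file (or file-like object).
    """
    return {
        label: analyse(read_column(handle, column))
        for label, handle in datasets.items()
    }


# Two tiny in-memory datasets stand in for real files on disk (hypothetical data).
datasets = {
    "run_a": io.StringIO("voltage\n1.0\n2.0\n3.0\n"),
    "run_b": io.StringIO("voltage\n2.0\n4.0\n6.0\n"),
}

results = analyse_many(datasets, "voltage")
for label, summary in results.items():
    print(label, summary)
```

With real data you would swap the in-memory strings for files opened from a folder; nothing in `analyse` has to change when the number of datasets goes from two to two hundred.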
Along similar lines to the automation, if you are producing some kind of report, which everyone will have to from time to time, you will want all of your figures to look the same. So again you want to be able to write a script to automate your figure design. Not only will you want them to look the same, you will want them to look nice. There are some other features of figures which may not be immediately apparent to those approaching this kind of thing for the first time. Journals often require figures to look a specific way, so you need something that is versatile enough to allow this. But you have an added difficulty: if you are rejected from one journal, which is not uncommon, then you need some easy method to redraw your figures using the design parameters defined by the next journal you are going to submit to. Again, scripting and automation are your friends here. The quality of your figure is the final thing to consider; it is probably most appropriate to use a scalable vector image format for the best print quality.
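As a rough illustration of what scripted figure design buys you, the sketch below keeps the design parameters for each journal in one place and redraws the same data for each of them as SVG, a scalable vector format. It is plain Python with no plotting library, so it only draws a bare line; the journal names, sizes and colours are made up for the example and are not any real journal’s requirements.

```python
# Hypothetical design parameters, one entry per journal (all values invented).
JOURNAL_STYLES = {
    "journal_a": {"width": 340, "height": 240, "padding": 20,
                  "colour": "black", "line_width": 1.0},
    "journal_b": {"width": 500, "height": 300, "padding": 30,
                  "colour": "navy", "line_width": 1.5},
}


def render_svg(points, style):
    """Draw (x, y) points as a single line in an SVG string using `style`."""
    width, height, pad = style["width"], style["height"], style["padding"]
    colour, line_width = style["colour"], style["line_width"]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Avoid dividing by zero if all x (or all y) values are identical.
    x_span = (max(xs) - min(xs)) or 1.0
    y_span = (max(ys) - min(ys)) or 1.0

    def to_pixels(x, y):
        px = pad + (x - min(xs)) / x_span * (width - 2 * pad)
        # SVG y coordinates run downwards, so flip the data axis.
        py = height - pad - (y - min(ys)) / y_span * (height - 2 * pad)
        return f"{px:.1f},{py:.1f}"

    path = " ".join(to_pixels(x, y) for x, y in points)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
        f'<polyline points="{path}" fill="none" '
        f'stroke="{colour}" stroke-width="{line_width}"/>'
        "</svg>"
    )


# The same data, redrawn once per journal style.
data = [(0, 0), (1, 1), (2, 4), (3, 9)]
figures = {name: render_svg(data, style) for name, style in JOURNAL_STYLES.items()}
for name, svg in figures.items():
    print(name, len(svg), "characters of SVG")
```

Being rejected by one journal then costs you a new entry in `JOURNAL_STYLES` rather than an afternoon of redrawing every figure by hand.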
So what packages do most non-scientists use for this kind of thing? Excel… or one of the similar products available in OpenOffice.org or LibreOffice. These are good applications for what they are designed for: spreadsheets. They are not normally the most suitable for the kind of analysis described above. Automation is difficult, scripting is irritating, and it is normally very difficult to apply these tools to many datasets one after the other. Figures are also often a big problem; they tend to look rather unprofessional. There are other problems with these types of applications: they are not good at high-level maths. Complex numbers are not handled particularly well or intuitively, and neither are matrices and vectors. You may think that I’m being overly harsh here but DO NOT USE A SPREADSHEET FOR DATA ANALYSIS! Don’t believe me? Use one of the many programs designed for this kind of thing and see how easy it is by comparison. I’ll run through a few of these in my next post…
During my time doing a PhD, and for some time before, I have used, come across and talked about a great many software packages. These range from simple little tools written for Linux to scarily complicated 3D modelling applications running on Windows and everything in between and roundabouts…
I’m going to highlight a few of them and talk a bit more generally about scientific software in a short series of posts. There will be both some recommendations and some general commentary.
Sorry about the lack of posts recently; hopefully the reasons for this will become clear over the next few blog posts.
I thought I’d restart my blogging with a brief discussion of some recent events. I’m sorry about the lack of science content here, but a big chunk of the physics is commercially sensitive so I cannot talk about it.
The following is a reworking of a previous post. For this I would apologise, but I am pretty sure you mostly do not care.
Let me start by talking a bit about The Big Bang Theory (TBBT) TV show. I like the show for the most part. It seems to work well, and I only normally feel slightly surprised when they do things like putting on safety goggles to use liquid nitrogen: safe, yes; often carried out, no. The characters have attributes such that I could easily point to many of my colleagues and say “they act like x”. So I would say I am neither a fan boy nor a critic of the show; if it’s on I’ll watch it, but I tend not to seek it out.
More generally now, there seems to be a perception among the non-scientists of the world that we are an elitist group who spend all of our time agreeing with each other. This is often visible to me in the form of online forum posts, but recently TBBT also alarmed me with the views of one of its physicists. I bet you can’t guess which?
This is very much not the case. Science is a hotbed of arguments and discussion. I have recently seen a public display of this involving a well-known (in the UK) public figure and a blogger I hold in high regard. I’ll go into this in a little more depth below. First I will explain some of the problems I have with the character from TBBT.
These are people who believe maths is not required for modern science and that they have overturned some well-known and well-understood theory with their own idea. This is normally done without any knowledge of the theory they are claiming to have overturned, or of the evidence supporting it, beyond what you might pick up by reading science articles in the Daily Mail newspaper…
These people often suffer from delusions of grandeur or believe that science is a big conspiracy. Here I shall ignore this and concentrate more on why it is flawed to think that way about any subject. Again though, I’m a physicist, so this will be coming from a physics perspective…
Or more to the point, avoiding having a thesis that never was.
I’m now a fourth-year PhD student, doing physics, coming to the end of my time, and clearly using that time well by writing a blog post. In the time leading up to starting, and in the first few years, many people gave me many bits of advice. Some as a joke, some serious.
I have done the same for those who have followed in my footsteps, and propose to discuss the most useful advice below. I’m a physicist, and this is written from my perspective, but I would hope that it can be stretched out wider into the other sciences and perhaps beyond…
Numerology, the art of putting numbers together to get out other numbers and being convinced you’ve discovered something brand new…
Or something like that.
If you are unsure what I’m talking about here, please take a quick look at the Wikipedia article on numerology, which seems to cover it quite well. But I shall explain how I most often come across it, and a possible root cause, or at least a push for those who are likely to fall this way.