Over the last year, I’ve had the chance to use several tools on the Amazon Web Services (AWS) platform for a number of different research projects. The latest of these actually focuses on using AWS to host a prototype for a business I’d like to start. Unfortunately for you readers, I’m still keeping very quiet about this work.
As I use the service to build my prototype, I keep thinking about how useful AWS could be for researchers. This blog focuses on social science research, and it’s a real shame how difficult it is to develop software for the social science community. Admittedly, AWS is nowhere near ready for most social science researchers, but if you have access to a research assistant or a computer science student, or already have a technical background, these services may prove very useful.
The first tool, and a relatively easy one to use, is Mechanical Turk. It is designed to let people “outsource” their work to a distributed network of workers: you submit a set of tasks or questions and offer to pay people to complete or answer them. The amazing thing is that you can pay people as little as $0.01 per answer, and hundreds still flock to do the work for you. There are many papers that illustrate how to use the tool for various research projects, many of which focus on building training sets for machine learning tasks. But with features like filtering workers by geographical area and pre-screening them for specific tasks, I think there’s a great opportunity to run other interesting experiments. For example, do people from different geographical regions answer political questions differently? What about including a colour-blindness test in the pre-screening questions and then running experiments on vision or perception?
The idea behind Mechanical Turk is called crowdsourcing, and you can also find more focused academic studies of the concept itself.
The second tool is Amazon’s Elastic Compute Cloud (EC2). If you keep reading about cloud computing in the news, this is part of the reason. For scientists who don’t have access to dedicated hardware for experiments or data analysis, whether a regular desktop computer or a supercomputer, this is a great tool. I currently pay about $2 per day for access to the equivalent of a fully dedicated web server, and it’s been great for building prototypes and testing ideas without depending on my university’s infrastructure. It is also much, much cheaper than setting up a server or buying a second computer.
All I can say at this stage is: check these tools out. If you’re unfamiliar with them or need help, feel free to e-mail me.