
<< Return to Summer of eResearch

19 October 2010


Discussing great projects with talented students and enthusiastic sponsors. Initial project allocations and confirmations have been made, and invitations to the coming NZ eResearch Symposium extended! More to come...


As Project Proposals are received, they will be posted here. Please check back regularly to see what projects are being proposed.

What shall I do now?

  1. Once you've reviewed the proposals, get in touch directly with the Sponsors for the project, suggesting your interest in collaborating with them.
  2. Fill in a student application


The following proposals are initial suggestions for projects, where the sponsors have identified a strong need and interest.

Open access databank and website for a Model Ecosystem

This project involves developing the first version of a software and database infrastructure for a large collaborative ecosystem research project. The research project aims to genetically and ecologically characterize an entire ecosystem and is only the second of its kind in the world. It will involve intensive, statistically rigorous genetic and environmental sampling of Little Barrier Island - an offshore island in the Hauraki Gulf, which retains one of the most intact primitive ecosystems in New Zealand and is a restricted-access reserve harbouring many threatened New Zealand species.

The software will provide the public face of the project as well as the point of contact for the research scientists involved, and will provide aggregation and collation functions for the data analysis. It will integrate Google Maps visualizations and link out to other public resources such as GenBank and various biodiversity and GIS databases.
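As a concrete sketch of the kind of aggregation such a databank might support, the following uses SQLite as a stand-in for the eventual SQL backend. Every table name, column name, and data value here is an illustrative assumption, not part of the proposal:

```python
import sqlite3

# Illustrative schema: field samples with GPS coordinates (for the Google
# Maps overlay) linked to GenBank accessions. All names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    id INTEGER PRIMARY KEY,
    species TEXT NOT NULL,
    latitude REAL NOT NULL,   -- decimal degrees, drives the map overlay
    longitude REAL NOT NULL,
    collected_on TEXT         -- ISO 8601 date
);
CREATE TABLE sequence (
    id INTEGER PRIMARY KEY,
    sample_id INTEGER REFERENCES sample(id),
    genbank_accession TEXT    -- link-out to the GenBank record
);
""")
conn.execute("INSERT INTO sample VALUES (1, 'Apteryx mantelli', -36.2, 175.08, '2010-11-15')")
conn.execute("INSERT INTO sequence VALUES (1, 1, 'XX000000')")  # placeholder accession

# One aggregation the web front end might serve: sequence counts per species.
rows = conn.execute("""
    SELECT s.species, COUNT(q.id)
    FROM sample s LEFT JOIN sequence q ON q.sample_id = s.id
    GROUP BY s.species
""").fetchall()
print(rows)  # [('Apteryx mantelli', 1)]
```

The map layer would read the latitude/longitude columns, and each accession would become a link out to the corresponding public record.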


  • Alexei Drummond, Associate Professor of Computational Biology
  • Auckland

skills required

  • The prospective programmer should have strong skills in Java-based web programming and SQL.
  • Knowledge of the Google Maps programming API and GenBank would also be beneficial, but is not required.
  • An interest in ecology or genomics is essential.

ONZE Miner integration with HPC

The NZILBB has developed a browser-based web application called ONZE Miner, which organises and searches collections of interview transcriptions that have been time-aligned to media (primarily audio) files - allowing researchers, for example, to easily identify and analyse occurrences of particular phoneme combinations within the corpora of recordings.
The system has so far been particularly useful for researchers in sociophonetics, but there are useful data processing tasks that are impractical with our current computing resources, for example:
  • complex searches that take too long to execute (i.e. querying existing data)
  • resource-hungry data-production processing, such as:
    • computing and storing formant tracks for all of our audio data (around 1000 hours), making this data readily available for search and analysis (i.e. processing a large amount of audio with tools like Praat and storing the results - currently 14 hours of speech takes 6 hours to process)
    • parsing the text of the whole database to produce and store syntax trees, which can then be searched and analysed (currently parsing is time-consuming and often fails due to insufficient memory)
We would like to develop a mechanism to submit a task like those above (or new tasks that researchers dream up) via the browser interface, and have the web server submit a job to an HPC system along with any data or data access required, monitor the job's progress, and handle the final results.
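The scale of the formant-tracking example can be estimated from the figures quoted above (14 hours of speech taking 6 hours to process, for a roughly 1000-hour corpus). A back-of-envelope sketch, assuming near-linear scaling and independently processable chunks (the 64-node figure is an invented example):

```python
# Back-of-envelope estimate for the formant-tracking task, assuming
# processing time scales linearly with audio duration and chunks can be
# processed independently on separate HPC nodes.
HOURS_AUDIO = 1000.0
RATE = 6.0 / 14.0          # processing-hours per hour of speech (from the text)

serial_hours = HOURS_AUDIO * RATE   # ~429 h on a single machine

def wall_clock(nodes):
    """Idealised wall-clock time when the corpus is split across `nodes`."""
    return serial_hours / nodes

print(round(serial_hours))        # 429
print(round(wall_clock(64), 1))   # 6.7 -- hours on 64 nodes
```

Even under these idealised assumptions, the task is clearly out of reach for a single workstation but very tractable on a modest cluster.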

ONZE Miner has allowed relatively quick progress on linguistic research questions that previously would have been unmanageably time-consuming to address. 
However, as automatic generation of word tags and word/phoneme alignments has improved, there is further research which seems tantalisingly close to being realisable, but long processing times (e.g. for identifying formant values for a large set of vowel tokens) mean that there are linguistic patterns that go unfound despite the data being available to identify them.
A mechanism for ONZE Miner to take advantage of HPC resources when available would enable these intensive computing jobs to be run, making available a wealth of data for research projects (both undreamt-of and dreamt-of-but-rejected-as-impractical).
The Institute website is:


  • Robert Fromont -
  • Jennifer Hay -
  • Christchurch



  • must: develop infrastructure and a UI within ONZE Miner for defining, submitting, monitoring, and finalising a generic HPC job, with a corresponding mechanism for the HPC system to receive and process the job.  The interface must allow for future specialisation to hitherto undreamt-of computing tasks, beyond the three examples mentioned above.
  • must: implement one of the mentioned candidate tasks (search, formant-track generation, text parsing)
  • should: implement another of the candidate tasks
  • could: implement the last candidate task
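The generic job mechanism in the first "must" item could be shaped roughly as follows. ONZE Miner itself is Java; this is a Python sketch of the interface shape only, and every class, state, and parameter name in it is an assumption:

```python
from dataclasses import dataclass, field
from enum import Enum

class JobState(Enum):
    DEFINED = "defined"
    SUBMITTED = "submitted"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"

@dataclass
class HpcJob:
    # Concrete tasks (search, formant tracking, parsing) specialise
    # task_type and parameters rather than the class itself, so future
    # undreamt-of tasks need no interface change.
    task_type: str
    parameters: dict = field(default_factory=dict)
    state: JobState = JobState.DEFINED

    def submit(self):
        # A real implementation would hand the job (plus any data or
        # data access it needs) to the HPC scheduler here; this sketch
        # only records the state transition for monitoring.
        self.state = JobState.SUBMITTED
        return self

job = HpcJob("formant-track", {"corpus": "ONZE", "hours": 1000}).submit()
print(job.state)  # JobState.SUBMITTED
```

Keeping the task description data-driven like this is what lets the browser UI and the HPC-side receiver stay generic across all three candidate tasks.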

skills required

  • Must know Java.
  • Some understanding of linguistics would be useful.


BLAST and mpiBLAST on BeSTGRID

BLAST is a tool for finding regions of local similarity between sequences, frequently used by bioinformatics researchers.  While the BLAST software supports parallelism only by threading (i.e., within a single address space), the mpiBLAST package extends BLAST with MPI-based parallelism, allowing it to scale well on a cluster or a massively parallel system - indeed, mpiBLAST is reported to scale well on BlueGene systems (both L and P).
BLAST and mpiBLAST can search against databases provided by the researcher, but they are very frequently used to search the databases provided by NCBI (the National Center for Biotechnology Information).  These datasets are frequently updated by NCBI, and local copies at a site need to be updated regularly.  For implementation reasons, mpiBLAST needs the NCBI dataset pre-formatted according to the maximum number of processors that will be used to run the job.  This needs to be done carefully so that jobs running at the time of the update are not affected.
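One way to update the pre-formatted databases without disturbing running jobs is to format the new dataset into a staging directory and then atomically repoint a symlink. This is a hedged sketch of that swap step only - the actual mpiBLAST fragment formatting into the staging directory is stubbed out, and all path names are invented:

```python
import os
import tempfile

def publish_formatted_db(staging_dir, live_link):
    """Atomically repoint `live_link` at `staging_dir`.

    Jobs that resolved the old link keep reading their old snapshot;
    new jobs see the freshly formatted database.  The actual formatting
    (e.g. mpiBLAST's per-processor-count fragment pre-formatting) would
    happen into `staging_dir` before this call.
    """
    tmp = live_link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(staging_dir, tmp)
    os.rename(tmp, live_link)   # rename(2) is atomic on POSIX

# Demonstration with throwaway directories:
base = tempfile.mkdtemp()
old = os.path.join(base, "nt.2010-10")
new = os.path.join(base, "nt.2010-11")
os.mkdir(old); os.mkdir(new)
link = os.path.join(base, "nt.current")
publish_formatted_db(old, link)
publish_formatted_db(new, link)   # swap while "jobs" may still be running
print(os.readlink(link))          # .../nt.2010-11
```

Old snapshots can then be garbage-collected once no running job still references them.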

The goal of this project is to make BLAST and mpiBLAST available to users of the BeSTGRID computational grid - in the form of a Grisu job template.

  • ARCS have already invested a lot of effort and built up experience - and would be willing to share (Simon Yin / Intersect).
  • Several researchers have already indicated strong interest in having BLAST available on BeSTGRID - and would be available to provide input into steering this project.
  • Anthony Poole (University of Canterbury School of Biological Sciences) will be available to provide input into this project.


  • Vladimir Mencl,
  • Tim David, BlueFern Director,
  • Christchurch


The particular milestones in this project are:

  • Becoming familiar with BLAST and running BLAST searches
  • Installing BLAST and mpiBLAST on a BeSTGRID cluster.
  • Installing mpiBLAST on the BlueFern BlueGene/L system.
  • Setting up a framework for updating (and formatting) the NCBI databases.
  • Designing a BLAST/mpiBLAST template for Grisu.

The Australian Research and Collaboration Services (ARCS) have already put some groundwork into making BLAST available via the grid.  This project will leverage the work done by ARCS, complete it where needed, and make BLAST and mpiBLAST available to the BeSTGRID community.

Relevant links:
ARCS documentation:

skills required

Should have:

  • good Linux/POSIX systems administration skills
  • shell scripting skills
  • reasonable understanding of C/C++

Would benefit from:

  • background in high performance computing
  • some understanding of bioinformatics

Invader genetics for New Zealand conservation

Every week, islands around New Zealand are subject to a barrage of alien invasions.  The aliens are small, furry, four-legged creatures with sharp teeth and an appetite for native birds and plants. They are introduced mammal pests, including rats, stoats, and mice.  The best chance for us to ensure long-term survival of native species such as kiwi and kokako is to create island sanctuaries that are free of mammal pests.  Unfortunately, the invaders have plenty of tricks up their furry sleeves: they are excellent (and eager) swimmers, hitch rides on boats, and often turn up unexpectedly and catastrophically on sanctuary islands where there are breeding populations of endangered native birds.  Millions of dollars are spent on pest management in New Zealand each year - a single rat on a sanctuary island can cost tens of thousands of dollars to track down and remove.

When a rat or stoat invader turns up on an island, conservation managers need to know where it came from so they can best target preventative measures for the future.  Genetic analysis is rapidly becoming a key tool for conservationists.  The genetic profile of an invader can be matched to its source population.  If the source populations are sufficiently distinctive, we can determine the source of the invader.
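The matching step can be illustrated with a toy assignment test: score the invader's multilocus genotype against each candidate source population's allele frequencies and pick the most likely source. All population names, loci, frequencies, and genotypes below are invented for the sketch, and the haploid single-allele-per-locus treatment is a simplification of real assignment methods:

```python
import math

# Invented allele frequencies per locus for two candidate source populations.
populations = {
    "Island A": {"locus1": {"a": 0.9, "b": 0.1}, "locus2": {"x": 0.7, "y": 0.3}},
    "Mainland": {"locus1": {"a": 0.2, "b": 0.8}, "locus2": {"x": 0.4, "y": 0.6}},
}

def log_likelihood(genotype, freqs, floor=1e-4):
    """Sum of log allele frequencies across loci (haploid simplification).
    `floor` avoids log(0) for alleles unseen in a reference population."""
    return sum(math.log(max(freqs[locus].get(allele, 0.0), floor))
               for locus, allele in genotype.items())

def assign(genotype):
    """Return the most likely source population and all scores."""
    scores = {pop: log_likelihood(genotype, freqs)
              for pop, freqs in populations.items()}
    return max(scores, key=scores.get), scores

invader = {"locus1": "a", "locus2": "x"}
source, scores = assign(invader)
print(source)  # "Island A" for this invented genotype
```

The proposal's point follows directly: the more distinctive the source populations' frequencies are, the larger the gap between the scores and the more confident the assignment.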

We are currently at a key stage for genetic research feeding into conservation applications, in New Zealand and worldwide.  The Department of Conservation, the Regional Councils, and numerous community groups have recognised the benefits of genetic analysis.  We now need coordination of resources, data-sharing, and a user-friendly genetic analysis package so that we can make maximum use of the opportunities.


Co-applicants (joint mentors) :

  • Steven Miller, University of Waikato:
  • James Russell, University of Auckland:


  • EcoGene, based at Landcare Research, Auckland: contact email Dianne Gleeson,


  • Auckland Regional Council: contact email Jonathan Boow,
  • Department of Conservation: Auckland and Northland conservancies
  • Motu Kaikoura Trust community group
  • Guardians of the Bay of Islands community group
  • Auckland


  • Create a national DNA database for conservation.  The more source populations in the database, the better the chance that we will be able to locate the source of an invader.
  • Create a map-linked Graphical User Interface (GUI) for ready analysis of genetic data by conservation managers.  This will include features from simple chart creation to sophisticated animations.
  • Please see
    for more information, example outputs, links to Powerpoint presentations, and the research group website.

skills required

  • A simple database will be established, so basic knowledge of databases is very useful
  • For one or more of the team, a statistical background and knowledge of R (or willingness to learn)
  • Ability to create a graphical user interface for PC

Geospatial Lattices with HEALPix

Many geospatial science models use latitude-longitude grids for their mesh. Converging meridians at the poles present mathematical problems, as does the increasing requirement to maintain both low and high resolution features and support moving fluidly between them. This project seeks to use Adaptive Mesh Refinement (AMR) techniques to solve these problems in the context of New Zealand geospatial datasets.

This project will focus on data storage and retrieval for newly encoded AMR geospatial datasets, seeking to compare different data storage strategies - such as implementing the above for CouchDB (NoSQL/JavaScript), PostGIS (SQL), Neo4j (NoSQL/Java), HDFS, ... - in a way that can be executed efficiently on a GRID data/compute resource. It is expected to produce abstract models and prototype implementations, rather than final production implementations.
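To illustrate the hierarchical indexing idea, here is a deliberately simplified lat/lon quadtree standing in for HEALPix's nested pixel numbering. A real implementation would use HEALPix itself, which avoids the polar distortion this toy version inherits; the dictionary stands in for whichever store (CouchDB, PostGIS, HDFS, ...) is under comparison:

```python
def cell_key(lat, lon, depth):
    """Hierarchical cell key for a point, by recursive quadrant splitting
    of the lat/lon box.  A child key extends its parent's key, so
    coarse-to-fine queries become key-prefix lookups -- the property
    HEALPix's nested numbering provides properly on the sphere.
    (Simplified stand-in only; not a HEALPix implementation.)"""
    lat0, lat1, lon0, lon1 = -90.0, 90.0, -180.0, 180.0
    key = ""
    for _ in range(depth):
        latm, lonm = (lat0 + lat1) / 2, (lon0 + lon1) / 2
        key += str((2 if lat >= latm else 0) + (1 if lon >= lonm else 0))
        lat0, lat1 = (latm, lat1) if lat >= latm else (lat0, latm)
        lon0, lon1 = (lonm, lon1) if lon >= lonm else (lon0, lonm)
    return key

# A plain dict stands in for the candidate key-value/document stores:
store = {}
store.setdefault(cell_key(-43.53, 172.63, 6), []).append("Christchurch obs")

# A coarse query is a key-prefix scan; finer cells refine it (AMR-style).
coarse = cell_key(-43.53, 172.63, 2)
hits = [v for k, v in store.items() if k.startswith(coarse)]
print(hits)  # [['Christchurch obs']]
```

The prefix property is what makes "move fluidly between resolutions" cheap: refining or coarsening a region only lengthens or shortens the key.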

An example from Atmospheric Sciences:


  • Auckland
  • Lincoln
  • Massey


Landcare Research maintains many of the geospatial research data sets for New Zealand, and is increasingly seeking to integrate analyses across these datasets. The informatics team has responsibility for this work, and this project is being sponsored by Informatics Science Leader Robert Gibb. For further information, see the Informatics research activity:

skills required

  • familiarity with databases such as CouchDB (NoSQL), PostGIS (SQL), and HDFS will be useful
  • experience using GRID / distributed data/compute resources
  • basic understanding of geospatial systems very useful

Data analysis of drug discovery datasets

Drug Discovery is moving out of the wet lab and becoming (at least partly) a computational science! Finding candidates for drug development involves identifying the sites within three-dimensional biological compounds where specific molecules can bind and hence change the potential for future activity to take place.

We've developed an initial Drug Discovery pipeline within BeSTGRID:
using commercial 3D docking software, and are now looking to run additional docking software in parallel and to add multi-parameter post-processing scoring functions to the pipeline. The aim is to significantly increase the accuracy with which candidates are identified for drug discovery.


  • Auckland


This approach will massively increase the data produced by the pipeline, and hence will require smart data analysis and clustering techniques to aid the researchers in interpreting the results. We want to:

  • use heuristics to review commercial packages to assess potential approaches to classify the outputs of scoring functions
  • implement several scoring functions and explore the data sets this produces
  • develop data clustering techniques for analysing the result sets
  • meet students interested in carrying on this work into postgraduate study
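As a minimal illustration of the clustering step above, the following runs a tiny one-dimensional k-means over invented docking scores. A real pipeline would cluster multi-parameter score vectors, and all numbers here are made up for the sketch:

```python
def kmeans_1d(values, k, iters=20):
    """Tiny k-means on scalar scores.  Centroids are seeded by spreading
    them evenly over the sorted values, keeping the demo deterministic.
    A real pipeline would cluster multi-parameter score vectors."""
    vs = sorted(values)
    centroids = [vs[i * (len(vs) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each score to its nearest centroid.
            clusters[min(range(k), key=lambda i: abs(v - centroids[i]))].append(v)
        # Recompute centroids; keep the old one if a cluster empties.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Invented docking scores: a tight "strong binder" group plus weaker hits.
scores = [-9.8, -9.5, -9.6, -5.1, -4.8, -4.9, -5.0, -2.1, -1.9]
centroids = kmeans_1d(scores, 3)
print([round(c, 2) for c in centroids])  # [-9.63, -4.95, -2.0]
```

Grouping results like this is what lets researchers inspect a handful of clusters rather than every individual docking result.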

skills required

  • programming skills in a variety of languages
  • working knowledge of linux systems
  • awareness of data analysis and clustering techniques beneficial
  • an interest in chemistry and biological processes at a molecular level

Web-Based Genetic Marker Design

We have developed BioPython code to allow bulk design of single-nucleotide polymorphism (SNP) assays from next-generation sequencing data, and used this to develop a large set of genetic markers for onion. We propose to enhance this code for bulk design of high-resolution melting (HRM) assays and make it web-accessible by deploying it on an instance of the Galaxy bioinformatics framework.
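As a sketch of the per-candidate screening such a tool performs, the following enumerates primer windows over a template and filters them by melting temperature using the simple Wallace (2+4) rule. Production code would use a nearest-neighbour model (e.g. BioPython's MeltingTemp module), and the template sequence and Tm band are invented for the example:

```python
def wallace_tm(primer):
    """Wallace rule: Tm = 2*(A+T) + 4*(G+C) degrees C -- a rough screen
    for short oligos; a nearest-neighbour model is more accurate."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

def candidate_primers(template, length=20):
    """All windows of `length` over the template, with their Tm."""
    return [(template[i:i + length], wallace_tm(template[i:i + length]))
            for i in range(len(template) - length + 1)]

# Invented 30 bp sequence; keep candidates in a typical Tm band.
template = "ATGCGTACGTTAGCATGCATGCGTACGATC"
picks = [(p, tm) for p, tm in candidate_primers(template) if 56 <= tm <= 62]
print(picks[0])  # ('ATGCGTACGTTAGCATGCAT', 58)
```

The bulk-design code applies this kind of filtering (plus many other constraints) across thousands of sequences at once, which is why a web deployment on Galaxy matters for low-technology labs.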


  • John McCallum, Plant and Food Research
  • Clare Churcher, Applied Computing Department, Lincoln
  • Walt Abell, Applied Computing Department, Lincoln
  • Vladimir Mencl, Canterbury
  • Christchurch, Lincoln


We have identified a lack of tools suitable for enabling web-based bulk design of genetic markers from the new generation of sequencing technologies. These tools will make it feasible for low-technology labs to exploit the much larger volumes of sequence data that can be economically generated for any organism. Exposing these tools on a public site will enable us to solicit input from other NZ researchers and identify potential for enhancement prior to full release and publication.

skills required

  • Basic knowledge of molecular biology useful, or a willingness to learn
  • Intermediate knowledge of Python required

Research Desktop

Researchers increasingly need computational analyses, work in collaborations spread across distances, and rely on shared tools and applications. Each system they use often has its own access methods (client installs, web applications, specific protocols), login names and passwords, and quirks of use. All of this makes working across applications difficult and tiresome!

Within BeSTGRID we are developing high performance computing (HPC) based computational services, large multi-Terabyte data storage services, and tools and applications to access and manage these services. We are potentially adding to the difficulty for researchers who need to focus on their research!


  • Alan McCulloch, AgResearch
    • Alan has been involved in many successful eResearch projects related to genomics, most recently the development of the open-source, Drupal-based Biocommons platform:
  • Nick Jones, Director BeSTGRID, University of Auckland
  • Auckland
  • Otago, Dunedin


This project aims to address these difficulties through the following approach:

  • work with several researchers to gather requirements for their research group and collaboration
  • develop a cross platform taskbar based monitor that integrates the HPC, data storage, network, and other systems into a Research Desktop. This will monitor computational jobs, provide an integration mechanism for complex security requirements, and integrate tools and applications
  • link the taskbar monitor to specific computational tools and services, data storage services, and to identity (user name and password) management services
  • build some exemplar demonstrations of how this will work for researchers
  • start an open source project, building on top of the Grisu platform (developed between BeSTGRID and its Australian counterpart), that will be open to other research developers across New Zealand and elsewhere
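The taskbar monitor's core loop might look like the following. The real client would be Java (e.g. SWT) talking to Grisu and other services; this Python sketch shows only the pluggable-backend shape, and every class name, method, and job name is an assumption:

```python
class FakeGrisuBackend:
    """Stand-in for a real service backend (e.g. Grisu for HPC jobs);
    a storage or identity backend would expose the same small surface."""
    def __init__(self):
        self._jobs = {"blast-run-1": "Running", "formant-3": "Done"}

    def job_status(self, name):
        return self._jobs[name]

class DesktopMonitor:
    """Aggregates status from several backends so the taskbar UI has a
    single poll target; real code would poll on a timer and raise
    notifications on state changes."""
    def __init__(self):
        self.backends = []

    def register(self, backend):
        self.backends.append(backend)

    def poll(self, job_names):
        status = {}
        for backend in self.backends:
            for name in job_names:
                try:
                    status[name] = backend.job_status(name)
                except KeyError:
                    pass  # this backend doesn't know the job
        return status

monitor = DesktopMonitor()
monitor.register(FakeGrisuBackend())
print(monitor.poll(["blast-run-1", "formant-3"]))
# {'blast-run-1': 'Running', 'formant-3': 'Done'}
```

Keeping each service behind the same tiny interface is what would let HPC, data storage, and identity services all surface in one taskbar view.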

skills required

  • Java, web services, and rich-client application development
  • familiarity with XML an advantage
  • familiarity with Java SWT or other portable native-platform user interface frameworks