Projects

On this page you can find all the projects available for Friday!

Problem Statement

This project is about using Python to visualise and analyse network data, focusing on social network data (e.g. Twitter, Facebook). The idea is to build networks that show how people in multidisciplinary fields are connected on social networks (i.e. who follows whom). This will allow us to identify the key individuals who connect the otherwise independent groups that make up a multidisciplinary subject (e.g. the biologists, geologists, marine experts and policy makers involved with climate change).

Ambassador network

Project goals

  • Load data from static files/an API
  • Build and visualise a network
  • Perform simple analysis on this network
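
As a minimal sketch of the build-and-analyse steps, the example below assumes the networkx library and a small, made-up list of "who follows whom" pairs (all account names are hypothetical). It uses betweenness centrality as one possible way to find the person who connects two otherwise separate groups; other measures may suit real data better.

```python
import networkx as nx

# Hypothetical mutual-follow pairs: two tight groups plus one connecting account.
mutual_follows = [
    ("bio_1", "bio_2"), ("bio_1", "bio_3"), ("bio_2", "bio_3"),
    ("geo_1", "geo_2"), ("geo_1", "geo_3"), ("geo_2", "geo_3"),
    ("bio_2", "policy_1"), ("policy_1", "geo_1"),
]

# A directed graph stores "follower -> followed"; add both directions
# because every pair in this toy example follows each other.
graph = nx.DiGraph()
graph.add_edges_from(mutual_follows)
graph.add_edges_from((b, a) for a, b in mutual_follows)

# Betweenness centrality is high for nodes that lie on many shortest paths.
centrality = nx.betweenness_centrality(graph)
key_person = max(centrality, key=centrality.get)
print(key_person, round(centrality[key_person], 3))
```

With matplotlib installed, `nx.draw(graph, with_labels=True)` gives a quick first picture of the network.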

Datasets

Publicly available social network data from Twitter/Facebook/etc.

Problem Statement

This project is about analysing the metadata of academic publications.

Project goals

  • Get the publications related to a journal or topic of interest
  • Perform some sort of metadata analysis on the publications. For instance, we could:
    • analyse keyword occurrence within a certain category/topic of interest
    • analyse keyword co-occurrence within a category
    • construct a graph to visualise the similarity between publications of a given category
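
The keyword analyses listed above can be sketched with the standard library alone. The keyword lists below are invented for illustration; in practice they would come from the publication metadata returned by a publisher's API.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, one per publication.
publications = [
    ["climate change", "ocean", "policy"],
    ["climate change", "ocean"],
    ["ocean", "biodiversity"],
]

# Occurrence: in how many publications does each keyword appear?
occurrence = Counter(kw for kws in publications for kw in set(kws))

# Co-occurrence: how often does each pair appear in the same publication?
# Sorting makes ("a", "b") and ("b", "a") count as the same pair.
co_occurrence = Counter(
    pair
    for kws in publications
    for pair in combinations(sorted(set(kws)), 2)
)

print(occurrence.most_common(2))
print(co_occurrence.most_common(2))
```

The pair counts could also serve as edge weights if the publications or keywords are later drawn as a graph.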

Datasets

This data is available via APIs from certain publishers (e.g. Elsevier).

Problem Statement

This project will focus on learning to perform convex optimisation in Python. We will mainly use the cvxopt library to solve problems defined by an objective function and a set of constraints. We could then use this to run some experiments on toy datasets.

Project goals

  • Learn to use the cvxopt library (or another) to perform convex optimisation in Python

Problem Statement

This project aims to analyse influenza outbreak profiles. Specifically, we will learn to use different data analysis and clustering methods to group countries based on their outbreak profiles. Given these groups, one of the challenges is then to visualise this high-dimensional data in Python.

Project goals

  • Load data from static files
  • Learn to perform clustering in Python
  • Visualise high-dimensional data
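
The sketch below shows one possible route through the clustering and visualisation goals, using only NumPy: a basic k-means implementation followed by a projection onto the first two principal components. The "outbreak profiles" are synthetic curves generated for illustration, not WHO data, and the function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Basic k-means with a deterministic farthest-point initialisation."""
    centres = [X[0]]
    while len(centres) < k:
        # Next centre: the row farthest from all centres chosen so far.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centres], axis=0)
        centres.append(X[d.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        # Distance from every row to every centre, shape (n_rows, k).
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

def project_2d(X):
    """Project rows onto the first two principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

# Synthetic weekly profiles for six hypothetical countries:
# three peak early in the year, three peak mid-year.
weeks = np.arange(52)
peaks = [4, 5, 6, 29, 30, 31]
profiles = np.array([np.exp(-((weeks - p) / 3.0) ** 2) for p in peaks])

labels, _ = kmeans(profiles, k=2)
coords = project_2d(profiles)
print(labels)
print(coords.round(2))
```

The two columns of `coords` can be passed to a scatter plot, coloured by `labels`, to view the 52-dimensional profiles in two dimensions.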

Dataset

WHO database of influenza outbreaks worldwide for the past 20 years

Problem Statement

Using a public database, build a classifier that can identify junk emails.

Project goals

  • Import data from SpamAssassin public corpus
  • Select features
  • Build a binary classifier to decide if an email is spam or not
  • Use numpy, regular expressions and nltk
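
As a rough sketch of the feature-selection and classification steps, the example below uses word counts as features and a naive Bayes classifier written with the standard library only. Naive Bayes is just one possible choice of classifier, and the four training messages are invented stand-ins for the SpamAssassin corpus.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())

def train(messages):
    """messages: iterable of (text, label) pairs, label True meaning spam."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocab

def classify(text, model):
    """Return True if the message looks more like spam than not."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                # Add-one smoothing avoids log(0) for unseen combinations.
                count = word_counts[label][token] + 1
                score += math.log(count / (n_words + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]

# Hypothetical training messages.
training = [
    ("Win a free prize now", True),
    ("Free money, click now", True),
    ("Meeting moved to Friday", False),
    ("Please review the project notes", False),
]
model = train(training)
print(classify("Free prize inside", model))
print(classify("Project meeting on Friday", model))
```

On real data, the tokenizer could be replaced by one from nltk and the counting could be done with numpy arrays.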

Datasets

SpamAssassin public corpus

Problem Statement

In this project, we will learn how to load data from text files into Python, create visualisations of the data and finally combine these images to create a video. In reaction engineering problems we can obtain text files that record the time and the number of molecules of each species. If we have 3 or 4 species in our system and give each a distinct colour, we can create videos showing how the amount of each species rises and falls over time. The most common way to describe the production and consumption of a population is to plot the number of molecules against time, but a video can describe the same process in a more engaging way.

Project goals

  • Load data from text files into Python
  • Find a way to visualise the data
  • Combine the visualisations from multiple text files to create a video
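
One way these steps could fit together is sketched below with NumPy and matplotlib's `FuncAnimation`. The file layout is an assumption (whitespace-separated columns, time first and one column per species), and the sample values and species names are made up.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

def make_animation(data, species_names, colours):
    """data: 2-D array; column 0 is time, the rest are molecule counts."""
    time, counts = data[:, 0], data[:, 1:]
    fig, ax = plt.subplots()
    bars = ax.bar(species_names, counts[0], color=colours)
    ax.set_ylim(0, counts.max() * 1.1)
    ax.set_ylabel("Number of molecules")

    def update(frame):
        # Redraw each bar at the height recorded for this time step.
        for bar, height in zip(bars, counts[frame]):
            bar.set_height(height)
        ax.set_title(f"t = {time[frame]:g}")
        return bars

    return FuncAnimation(fig, update, frames=len(time), interval=200)

# Stand-in for a text file on disk: time, then counts of species A, B and C.
sample_file = io.StringIO("0 100 0 0\n1 80 15 5\n2 60 25 15\n3 45 30 25\n")
data = np.loadtxt(sample_file)

anim = make_animation(data, ["A", "B", "C"], ["tab:red", "tab:green", "tab:blue"])
# anim.save("species.gif", writer="pillow")  # needs Pillow; "ffmpeg" for MP4
```

For several text files, the arrays could be concatenated before animating, or one animation could be saved per file.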

Problem Statement

I propose to create a proteomic interaction map using protein interaction data available online.

Project goals

  • Find a way to read this dataset with Python
  • Write a tool that can be used to visualise the interactions between individual proteins
PPI

Datasets

Available online.
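
As a starting point for the reading step, the sketch below parses protein pairs into an adjacency map using the standard library. The file format is an assumption (tab-separated, one interacting pair per line, `#` for comments), and the protein identifiers are placeholders; real interaction datasets will likely need different parsing.

```python
import csv
import io
from collections import defaultdict

def read_interactions(handle):
    """Build an undirected adjacency map from tab-separated protein pairs."""
    neighbours = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        a, b = row[0], row[1]
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours

# Stand-in for a downloaded file; P1..P4 are placeholder identifiers.
sample_file = io.StringIO(
    "#protein_a\tprotein_b\nP1\tP2\nP1\tP3\nP2\tP3\nP3\tP4\n"
)
network = read_interactions(sample_file)

# The protein with the most interaction partners in this toy example.
hub = max(network, key=lambda protein: len(network[protein]))
print(hub, sorted(network[hub]))
```

The resulting map can be handed to a graph library for drawing, or queried directly to list the partners of a single protein.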