With the increasing prevalence of the Web and related internet technologies, societies around the world are witnessing a shift towards empowering citizens with more information and resources. However, this has started to transform into a two-way relationship: as more data becomes readily available (such as digital government, news, and commerce) and the number of user-friendly Web platforms increases, citizens have started to provide their own data, often in the form of user-generated content on social networks and microblogging platforms. Moreover, it is the combination of these data sources that has the potential to support the growing needs of individuals and society.

Recognised as a global issue, supporting the wellbeing of citizens has received significant attention over the last decade. Citizen wellbeing, which spans areas from supporting the elderly to monitoring global health problems, has traditionally been studied and tackled with costly, often time-consuming offline methods. Surveys, global censuses, field studies, and longitudinal studies all provide rich insights; however, with the capabilities of new technologies and the growing level of digital citizen engagement, we are now in a position to take advantage of these technologies to support this global challenge.

In this project we want to explore the potential to harness different data sources in order to support the wellbeing of the global citizen. With access to a selection of demographic, Web, and sensor datasets, the aim of the project is to:

  1. Formulate a set of questions which this data could be used to answer, including the wider social, economic, and political implications (both benefits and issues) which may arise from using these data.
  2. Design and implement a proof-of-concept Web Observatory application which takes a selection (two or more) of datasets and provides insights which none of the datasets could provide on its own.

2 x Presentations

  • Initial, stating the problem and the planned approach
  • Final, including the findings, wider implications, and a walkthrough of the application and visualisation

Design Guidelines Documents

  • Short documentation of what was built, the thinking, and the methodology behind everything.

Working proof-of-concept

  • Code on Github or similar platforms
  • Working via the Web Observatory API
  • Uploaded to the Web Observatory API

To kick things off, you have been provided with a small prototype dashboard which takes a number of the datasets listed below and visualises them in a working dashboard. A demo of this can be found here:

Shenzhen Citizen-Dashboard

Demo Backend Code Frontend Code

NOTE: please use the openstreetmap branch, as Google Maps will not work behind the Chinese firewall.

  • Programming
    • Node.js, JavaScript, databases
  • Data/Analytics:
    • Transforming data from various formats into a consistent schema
    • Basic statistical analysis and understanding. Potential extension: regression analysis for forecasting pollution levels
  • Theory:
    • Understanding of the PM2.5 readings
      • Wider societal effects that emerge from air pollution
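
As an illustration of the "consistent schema" point above, the sketch below maps a tweet-like record and an air quality reading onto one shared shape. Every field name here (`created_at`, `coordinates`, `time_point`, `pm2_5`, and the target schema itself) is an assumption made for the example; the provided datasets may be structured differently.

```javascript
// Hypothetical target schema shared by every source:
//   { source, timestamp (ISO 8601 string), lat, lon, value, text }

// Convert a tweet-like object (field names assumed) into the common schema.
function normaliseTweet(tweet) {
  const coords = tweet.coordinates || null; // assumed to be a [lon, lat] pair or absent
  return {
    source: "twitter",
    timestamp: new Date(tweet.created_at).toISOString(),
    lat: coords ? coords[1] : null,
    lon: coords ? coords[0] : null,
    value: null, // a tweet carries no numeric reading by default
    text: tweet.text || "",
  };
}

// Convert an air quality reading (field names assumed) into the common schema.
function normaliseReading(reading) {
  return {
    source: "air-quality",
    timestamp: new Date(reading.time_point).toISOString(),
    lat: reading.lat ?? null,
    lon: reading.lon ?? null,
    value: Number(reading.pm2_5), // readings may arrive as strings
    text: "",
  };
}
```

Once every source yields records of the same shape, the dashboard can filter, sort, and plot them with one code path instead of one per dataset.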

Below is a set of tasks, ranging from simple to hard, which could be attempted with the datasets.

Tasks that could be achieved within the week:

  • Simple:
    • Develop the dashboard to include the mapping and visualisation of Shenzhen air pollution data. This could be a separate visualisation, or an overlay on the current social data.
    • Map social media data which is relevant to air pollution discussion. Can you make this interactive?
    • Use NLP on the air pollution dataset in order to extract PM2.5 readings which can be visualised geographically
  • Medium:
    • Implement a time slider to allow interaction with the historic social media and air pollution data.
    • Extend the dashboard to compare datasets (e.g., air pollution) between different cities.
      • Adelaide, Shenzhen, London, Irvine
    • In a similar approach to the historic view of the data, provide a real-time view of air pollution and social discussion within one or more cities.
  • Hard:
    • Prediction of how the air pollution will change based on the historical data
      • Can the social media data be used as a feature within this model?
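
For the prediction task, one possible baseline is an ordinary least-squares line fitted to a series of equally spaced readings and extrapolated forward. This is only a sketch of a simple starting point, not a recommended model: it assumes the readings are evenly spaced in time (for example hourly) and ignores seasonality, weather, and everything else. The function names `fitLine` and `forecast` are invented for this example.

```javascript
// Fit y = intercept + slope * x by ordinary least squares,
// where x is the index (0, 1, 2, ...) of each reading in the series.
function fitLine(values) {
  const n = values.length;
  if (n < 2) {
    throw new Error("need at least two readings to fit a line");
  }
  const meanX = (n - 1) / 2;
  const meanY = values.reduce((sum, v) => sum + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  values.forEach((y, x) => {
    sxy += (x - meanX) * (y - meanY);
    sxx += (x - meanX) ** 2;
  });
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}

// Extrapolate the fitted line stepsAhead intervals past the last reading.
function forecast(values, stepsAhead = 1) {
  const { slope, intercept } = fitLine(values);
  return intercept + slope * (values.length - 1 + stepsAhead);
}
```

Using social media data as a feature (for example, hourly counts of pollution-related tweets) would turn this into a regression with more than one input, which this sketch does not cover.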

Twitter Shenzhen Data

An ongoing collection of Twitter data containing tweets related to the area of Shenzhen. The datasets are specific to the geographic area of Shenzhen.

Total Records: > 50,000
From: 01/11/2015
Till: (ongoing)
Query []: “Shenzhen”

Twitter Air Pollution Data

An ongoing collection of Twitter data containing tweets related to air quality.

Total Records: > 100,000
From: 26/10/2015
Till: (ongoing)
Query []: #airpollution OR #pm25 OR #pm2.5 OR "Air Quality Forecast" OR #airquality
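
Some tweets in a collection like this may quote a PM2.5 value directly in their text. As a crude stand-in for full NLP, the sketch below uses a regular expression to pull the first number that follows a "PM2.5" or "pm25" mention. The tweet `text` field and the function names are assumptions for illustration, and the pattern is a heuristic: it will misread text where some other number (such as a date) follows the mention.

```javascript
// Matches "pm2.5", "PM 2.5" or "pm25", then up to 20 non-digit characters,
// then captures a number of one to three digits with optional decimals.
const PM25_PATTERN = /pm\s*2\.?5[^\d]{0,20}?(\d{1,3}(?:\.\d+)?)/i;

// Return the first PM2.5 value mentioned in the text, or null if none is found.
function extractPm25(text) {
  const match = PM25_PATTERN.exec(text);
  return match ? Number(match[1]) : null;
}

// Keep only the tweets that mention a value, attaching it as `pm25`.
function extractReadings(tweets) {
  return tweets
    .map((tweet) => ({ ...tweet, pm25: extractPm25(tweet.text || "") }))
    .filter((tweet) => tweet.pm25 !== null);
}
```

Tweets that also carry location information could then be plotted on the dashboard map alongside the sensor readings.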

Air Pollution Data (PM2.5)

From various countries around the world (including China, Singapore, USA, etc.). The data is an hourly stream; to access the endpoint, send a GET request to the query URL below.

Query: “http://www.pm25.in/api/querys/aqi_details.json?city=shenzhen&token=5j1znBVAsnSf5xQyNQyq”
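
A minimal sketch of issuing that GET request from Node is shown below. It assumes Node 18 or later, where `fetch` is available globally, and it assumes the endpoint responds with JSON, as the `.json` suffix suggests. The function names and the `PM25_TOKEN` environment variable are hypothetical; the response structure is not described here, so the parsed result is returned as-is.

```javascript
// Build the request URL for a given city. The token is passed in as an
// argument so that it does not have to be hard-coded in the source.
function buildAqiUrl(city, token) {
  const url = new URL("http://www.pm25.in/api/querys/aqi_details.json");
  url.searchParams.set("city", city);
  url.searchParams.set("token", token);
  return url.toString();
}

// Send the GET request and parse the body as JSON.
async function fetchAqi(city, token) {
  const response = await fetch(buildAqiUrl(city, token));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Usage (needs network access, so it is left commented out):
// fetchAqi("shenzhen", process.env.PM25_TOKEN)
//   .then((data) => console.log(data))
//   .catch((err) => console.error(err));
```

Since the stream is described as hourly, polling about once an hour and storing each response would be enough to build up a historical series.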

Air pollution

US air quality data. Historical datasets.