The internet today offers an overwhelming and still growing amount of resources such as websites, images, texts, and videos. The resulting Big Data problem is not only a matter of handling this immense volume of data: the data also needs to be processed, cleaned, and presented in a user-friendly, graphical way.

The VANDA project addresses the challenges summarized in the four V’s: Volume (huge amounts of data in the range of terabytes and petabytes), Velocity (the speed at which data is created, processed, and analysed), Variety (the heterogeneous data types, sources, and formats), and Veracity (the authenticity and validity of the data).

Big Data driven interfaces in the VANDA project combine suitable backend and frontend technologies with automatic and semi-automatic approaches to analyse data in various business contexts. An important aspect is human intervention in developing and training machine learning algorithms (human in the loop).


Glyphs are small, independent visual objects that map each data attribute to a graphical attribute, such as size, shape, color, or orientation. Their major strength is that patterns involving more than three dimensions can be perceived more readily, and subsets of dimensions can form composite visual features that are easy to recognize.
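A minimal sketch of such an attribute-to-glyph mapping is shown below in Python. It assumes data items with exactly four numeric attributes; which attribute drives size, shape, color, or orientation, the `SHAPES` vocabulary, and the pixel and angle ranges are hypothetical choices for illustration, not the mapping used in the VANDA project.

```python
from dataclasses import dataclass

# Hypothetical shape vocabulary; any small set of distinguishable shapes works.
SHAPES = ("circle", "square", "triangle")


@dataclass(frozen=True)
class Glyph:
    size: float         # marker size in pixels
    shape: str          # one of SHAPES
    color: str          # hex RGB string
    orientation: float  # rotation in degrees


def normalize(value, lo, hi):
    """Scale value into [0, 1]; a constant attribute (lo == hi) maps to 0."""
    if hi == lo:
        return 0.0
    return min(max((value - lo) / (hi - lo), 0.0), 1.0)


def to_glyph(item, ranges):
    """Map one data item with four numeric attributes to a glyph.

    item   -- sequence of four attribute values (hypothetical layout)
    ranges -- sequence of four (min, max) pairs, one per attribute
    """
    a, b, c, d = (normalize(v, lo, hi) for v, (lo, hi) in zip(item, ranges))
    size = 4.0 + a * 16.0                                        # attribute 0 -> size, 4..20 px
    shape = SHAPES[min(int(b * len(SHAPES)), len(SHAPES) - 1)]   # attribute 1 -> shape bin
    red = round(c * 255)
    color = f"#{red:02x}00{255 - red:02x}"                       # attribute 2 -> blue..red ramp
    orientation = d * 180.0                                      # attribute 3 -> angle, 0..180 deg
    return Glyph(size, shape, color, orientation)
```

For example, with `ranges = [(0, 10), (0, 1), (0, 100), (0, 1)]`, the item `[5, 0.9, 100, 0.5]` becomes a 12-pixel red triangle rotated by 90 degrees. Each attribute is normalized first so that attributes with very different units share the same visual scale.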

We propose applying this visualization technique, at different levels of detail, to the problem of analyzing the features involved in clustering algorithms. Our concept maps each data item to a color-coded pixel in a scatterplot whose layout is computed using Multidimensional Scaling (MDS). A data analyst first explores the resulting clusters and then selects subsets or small clusters for detailed inspection. Once the number of data items has been reduced, a second level of detail, in the form of glyphs, is presented for in-depth analysis.
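The two-level pipeline can be sketched in Python as below, assuming classical (Torgerson) MDS on Euclidean distances; the text does not state which MDS variant is used. The leading eigenvectors are found by power iteration so the sketch needs no external libraries. Coloring each pixel by its cluster label, the `PALETTE` colors, and the `max_glyphs` threshold that triggers the switch to glyphs are hypothetical choices for illustration.

```python
import math
import random

# Hypothetical cluster colors; the color-coding scheme is an assumption.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def classical_mds(points, dims=2, iters=500, seed=0):
    """Embed points in `dims` dimensions with classical MDS.

    Double-centers the squared distance matrix and extracts its leading
    eigenvectors by power iteration with deflation.
    """
    n = len(points)
    d2 = [[euclidean(p, q) ** 2 for q in points] for p in points]
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    # B = -1/2 * J D^2 J, written element-wise (D^2 is symmetric)
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged unit vector
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass converges to the next eigenvector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def pixel_view(points, labels):
    """Overview level: one color-coded pixel (x, y, color) per data item."""
    coords = classical_mds(points)
    return [(x, y, PALETTE[lab % len(PALETTE)])
            for (x, y), lab in zip(coords, labels)]


def detail_level(selected, max_glyphs=50):
    """Switch from pixels to glyphs once the selection is small enough."""
    return "glyph" if len(selected) <= max_glyphs else "pixel"
```

For example, `pixel_view([(0, 0), (3, 0), (0, 4), (3, 4)], [0, 0, 1, 1])` places the corners of a 3 by 4 rectangle so that their pairwise distances are preserved (up to rotation and reflection), with the second cluster drawn in `#ff7f0e`. The pure-Python solver costs O(n²) per iteration and is only meant for small inputs; at the data volumes discussed above, an optimized linear-algebra library would be needed.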


