A Dynamic Constrained Clustering Approach Designed for Partitioning Stream of Tweets
As part of the IID student seminars, a talk by Sophie Baillargeon, PhD student in statistics at Université Laval, on a dynamic constrained clustering approach designed for partitioning streams of tweets.
About the talk
Classical clustering methods form clusters of similar units when the whole dataset is available at once. The data we want to cluster are different: observations arrive continuously, and their partitioning needs to be updated periodically. These data are streams of tweets, which contain units that must be clustered together (retweets and replies).
We have developed a flexible approach to partition this type of data. The method involves tuning parameters relating to the periodicity of the processing, the similarity measure and the baseline clustering algorithm. Experiments were conducted to identify the best possible settings according to internal and external cluster validity indices (e.g. silhouette and F1 score from manual annotations). An incremental adaptation of complete linkage hierarchical agglomerative clustering with must-link constraints stood out.
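To make the last idea concrete, here is a minimal sketch of complete-linkage agglomerative clustering with must-link constraints, applied to a single batch of units. It is not the implementation presented in the talk and leaves out the incremental update between processing periods. The function names, the distance threshold used as a stopping rule, and the toy distances are all assumptions made for illustration.

```python
from itertools import combinations


def must_link_components(n, must_link):
    """Group indices 0..n-1 into connected components of the must-link graph."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_link:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def complete_linkage(dist, c1, c2):
    """Complete linkage: the largest pairwise distance between two clusters."""
    return max(dist[i][j] for i in c1 for j in c2)


def constrained_hac(dist, must_link, threshold):
    """Complete-linkage HAC seeded with must-link groups.

    `threshold` is a hypothetical stopping rule: merging stops once the
    closest pair of clusters is farther apart than this value.
    """
    # Must-link units (e.g. a tweet and its retweets) start in the same cluster.
    clusters = must_link_components(len(dist), must_link)
    while len(clusters) > 1:
        (a, b), d = min(
            (((a, b), complete_linkage(dist, clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy example: five units placed on a line; units 3 and 4 are must-linked.
positions = [0.0, 0.1, 5.0, 5.1, 9.0]
dist = [[abs(p - q) for q in positions] for p in positions]
labels = constrained_hac(dist, must_link=[(3, 4)], threshold=1.0)
```

In this toy example, units 3 and 4 share a cluster even though they are far apart, because the constraint forces them together. Unit 2 stays separate from unit 3 despite being close to it, because complete linkage also measures its distance to unit 4.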