Key Concepts
HyperGraphDis is a hypergraph-based method that captures social structure, user relationships, and semantic and topical nuances in order to detect disinformation accurately and efficiently on social media platforms such as Twitter.
Summary
The HyperGraphDis method addresses the detection of disinformation in social networks by exploiting network structure as well as post content and user sentiment. It constructs a hypergraph from the user-user social network and the individual tweet-retweet cascades, which serve as the nodes of the hypergraph. The hypergraph structure is then used to perform binary classification of retweet cascades into fake and not-fake classes.
The key highlights of the HyperGraphDis method are:
Hypergraph Construction:
- The user-user social network is constructed and partitioned using the METIS graph partitioning algorithm.
- Each partition of users is replaced with the retweet cascades that the users have participated in, forming the hyperedges.
- This transformation reshapes the problem from a complex, multi-variable issue into a more straightforward node classification problem within the hypergraph.
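The construction step above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the user partition has already been computed (for example by METIS) and is available as a plain dictionary, and the names `build_hyperedges`, `incidence_pairs`, `user_partition`, and `user_cascades` are hypothetical.

```python
from collections import defaultdict

def build_hyperedges(user_partition, user_cascades):
    """Replace each user partition with the cascades its users took part in.

    user_partition: dict user_id -> partition_id (assumed precomputed, e.g. by METIS)
    user_cascades:  dict user_id -> iterable of cascade ids
    Returns dict partition_id -> sorted list of cascade ids (one hyperedge each).
    """
    hyperedges = defaultdict(set)
    for user, part in user_partition.items():
        hyperedges[part].update(user_cascades.get(user, ()))
    # Partitions whose users joined no cascade would be empty hyperedges; drop them.
    return {p: sorted(c) for p, c in hyperedges.items() if c}

def incidence_pairs(hyperedges):
    """Flatten hyperedges into (cascade, hyperedge) pairs, i.e. a sparse incidence list."""
    return [(c, p) for p in sorted(hyperedges) for c in hyperedges[p]]

# Toy data (hypothetical): four users in two partitions, three cascades.
user_partition = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}
user_cascades = {"u1": ["c1"], "u2": ["c1", "c2"], "u3": ["c2", "c3"], "u4": []}

hyperedges = build_hyperedges(user_partition, user_cascades)
pairs = incidence_pairs(hyperedges)
print(hyperedges)
print(pairs)
```

In this toy case cascade `c2` belongs to both hyperedges because users from both partitions retweeted it, which is how information can flow between partitions during classification.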
Cascade Feature Vectors:
- Each cascade node is enriched with a set of features, including user-related attributes (e.g., DeepWalk embeddings, account details) and text-related features (e.g., sentiment analysis, topic detection).
- This allows the classification of retweet cascades as fake or not-fake.
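One way to picture a cascade feature vector is as pooled user features concatenated with text features. The summary does not say how per-user attributes are aggregated into a single cascade vector, so the mean pooling below is an assumption, as are the names `mean_pool` and `cascade_features` and the toy values.

```python
def mean_pool(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cascade_features(user_ids, user_embeddings, text_features):
    """Build one cascade vector: pooled user embeddings followed by text features.

    user_ids:        users who participated in the cascade
    user_embeddings: dict user_id -> embedding (e.g. DeepWalk output, assumed precomputed)
    text_features:   list of numbers (e.g. sentiment and topic scores, assumed precomputed)
    """
    pooled = mean_pool([user_embeddings[u] for u in user_ids])
    return pooled + list(text_features)

# Toy data (hypothetical): 2-dimensional user embeddings and two text scores.
user_embeddings = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
vector = cascade_features(["u1", "u2"], user_embeddings, [0.25, 1.0])
print(vector)
```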
Cascade Classification:
- The cascade classification problem is formulated as a node classification task within the hypergraph.
- A Hypergraph Convolution (HypergraphConv) layer is used, followed by fully connected layers, to predict the class of each cascade.
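To show what a hypergraph convolution does to cascade features, the sketch below implements only the propagation part of the commonly used rule X' = D⁻¹ H B⁻¹ Hᵀ X with unit hyperedge weights, where H is the node-hyperedge incidence matrix, D the node degrees, and B the hyperedge sizes. It is an illustration under those assumptions: the learnable weight matrix, the nonlinearity, and the fully connected layers are omitted, and the function name `hypergraph_propagate` is hypothetical. A trainable version of this layer is available as `HypergraphConv` in PyTorch Geometric.

```python
def hypergraph_propagate(hyperedges, features):
    """One parameter-free propagation step, X' = D^-1 H B^-1 H^T X.

    hyperedges: dict hyperedge_id -> list of node (cascade) ids
    features:   dict node_id -> list of floats, all of the same length
    """
    dim = len(next(iter(features.values())))

    # B^-1 H^T X: average the features of the nodes inside each hyperedge.
    edge_msg = {
        e: [sum(features[n][k] for n in nodes) / len(nodes) for k in range(dim)]
        for e, nodes in hyperedges.items()
    }

    # Record which hyperedges each node belongs to.
    node_edges = {n: [] for n in features}
    for e, nodes in hyperedges.items():
        for n in nodes:
            node_edges[n].append(e)

    # D^-1 H (...): average the messages of the hyperedges a node belongs to.
    out = {}
    for n, edges in node_edges.items():
        if edges:
            out[n] = [sum(edge_msg[e][k] for e in edges) / len(edges) for k in range(dim)]
        else:
            out[n] = [0.0] * dim  # a node in no hyperedge receives no message
    return out

# Toy data (hypothetical): three cascades, two overlapping hyperedges, 1-d features.
hyperedges = {0: ["c1", "c2"], 1: ["c2", "c3"]}
features = {"c1": [2.0], "c2": [4.0], "c3": [6.0]}
smoothed = hypergraph_propagate(hyperedges, features)
print(smoothed)
```

Each cascade ends up with a blend of the features of cascades that share a hyperedge with it, which is the signal the downstream classifier uses.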
Evaluated on four datasets (2016 US Presidential Election, COVID-19 pandemic, and two FakeHealth datasets), HyperGraphDis outperforms existing state-of-the-art methods such as Meta-graph, HGFND, and Cluster-GCN. It achieves F1 scores ranging from 72.23% to 89.48% across the datasets and also reduces computation time for both model training and inference.
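For reference, a weighted F1 score averages the per-class F1 values using each class's share of the true labels as the weight. The sketch below shows that calculation on made-up labels; the function name `weighted_f1` and the toy data are hypothetical and are not results from the paper.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    total = len(y_true)
    score = 0.0
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of true examples of this class
        score += f1 * support / total
    return score

# Toy labels (hypothetical): 1 = fake, 0 = not-fake.
y_true = [1, 1, 1, 0]
y_pred = [1, 1, 0, 0]
score = weighted_f1(y_true, y_pred)
print(round(score, 4))
```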
Statistics
The COVID-19 pandemic dataset (MM-COVID) contains over 53,000 tweets, 10,000 replies, and 85,000 retweets by 93,000 users.
The 2016 US Presidential Election dataset contains 46.4K retweet cascades involving 19.6 million tweets, with 6,525 URLs labeled as either "fake" or "non-fake".
The FakeHealth collection comprises two datasets: Health Release (60,006 tweets, 1,418 replies, 15,343 retweets) and Health Story (487,195 tweets, 23,632 replies, 105,712 retweets).
Quotes
"HyperGraphDis displays exceptional performance on a COVID-19-related dataset, achieving an impressive F1 score (weighted) of approximately 89.5%."
"This result represents a notable improvement of around 4% compared to the other state-of-the-art methods."
"Significant enhancements in computation time are observed for both model training and inference."