Neural age screening on question answering communities
Timilsina, Mohan; Figueroa, Alejandro
Identifiers
https://hdl.handle.net/10379/18229
Repository DOI
https://doi.org/10.13025/21283
Publication Date
2023-04-05
Type
journal article
Citation
Timilsina, Mohan, & Figueroa, Alejandro. (2023). Neural age screening on question answering communities. Engineering Applications of Artificial Intelligence, 123, 106219. doi: https://doi.org/10.1016/j.engappai.2023.106219
Abstract
For online social networks, demographic analysis is essential for improving their services in many ways. It is instrumental in understanding their different audiences, members and competitors. It is also pivotal in designing effective personalization and contextualization strategies, especially for creating and displaying better content. For this reason, a large body of research examines how demographic variables are characterized and how they affect online platforms such as Facebook and Twitter. Surprisingly, however, only a handful of works examine their characterization and effect on community Question-Answering (cQA) websites. In this context, age demographics in particular remain largely unexplored.
This paper pioneers the interpretation of automatic age recognition on cQA sites (a.k.a. age screening) as a regression task. To this end, it compares state-of-the-art graph-based neural network regression and embedding models on a massive activity graph encompassing ca. 16 million nodes (members) and 837 million edges. For this study, a large-scale subset of ca. 657,000 community members was automatically labeled with their age by matching their profile texts against a limited number of linguistic patterns.
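The abstract does not list the linguistic patterns used to label members with their age. As a rough sketch of how pattern-based age labeling of profile texts can work, the Python snippet below uses two hypothetical regular expressions and an assumed plausible-age range of 13 to 90; both are illustrative choices, not the authors' actual patterns.

```python
import re
from typing import Optional

# Hypothetical patterns (assumption): the paper's real patterns are not given here.
AGE_PATTERNS = [
    # e.g. "I'm 25 years old", "a 30-year-old engineer", "27 yrs old"
    re.compile(r"\b(\d{1,2})[\s-]*(?:years?|yrs?)[\s-]*old\b", re.IGNORECASE),
    # e.g. "Age: 34", "age 41"
    re.compile(r"\bage\s*[:=]?\s*(\d{1,2})\b", re.IGNORECASE),
]

# Assumed plausible age range, used to discard spurious matches such as children's ages.
MIN_AGE, MAX_AGE = 13, 90


def extract_age(profile_text: str) -> Optional[int]:
    """Return the first plausible age matched by AGE_PATTERNS, or None if none match."""
    for pattern in AGE_PATTERNS:
        for match in pattern.finditer(profile_text):
            age = int(match.group(1))
            if MIN_AGE <= age <= MAX_AGE:
                return age
    return None


if __name__ == "__main__":
    print(extract_age("Hi! I'm 25 years old and love chess."))
    print(extract_age("Member for 10 years, I love math"))
```

Profiles without a match yield `None` and would simply stay unlabeled. Any real pattern set would need tuning for precision, since every false match becomes a noisy regression target.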
In short, our results show that Node2vec significantly outperforms the other embeddings regardless of the regression model used to make predictions. Combining this embedding with Artificial Neural Network regression yields our best configuration, which achieves a Root Mean Square Error (RMSE) of 8.39. An interesting qualitative feature of this embedding space is that age-based centroid vectors tend to form a trail ordered by age. Lastly, our outcomes also indicate that activity-graph-based models can rival their counterparts based on image and textual inputs, paving the way for effective multi-modal approaches.
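Node2vec builds node embeddings by sampling second-order biased random walks over a graph and then training a skip-gram model on them, treating each walk as a sentence. The minimal Python sketch below covers only the walk-sampling step, assuming an unweighted, undirected graph stored as adjacency sets. The toy graph, the walk counts and the values of the return parameter `p` and in-out parameter `q` are illustrative assumptions, not the configuration reported in the paper.

```python
import random
from typing import Dict, List, Set

# Undirected, unweighted adjacency sets (toy stand-in for the activity graph).
Graph = Dict[int, Set[int]]


def node2vec_walk(graph: Graph, start: int, length: int,
                  p: float, q: float, rng: random.Random) -> List[int]:
    """Sample one Node2vec-style second-order biased random walk."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph[current])  # sorted so a seeded rng is reproducible
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is uniform
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)  # step back to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)      # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # move outward, distance 2 from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(graph: Graph, walks_per_node: int, length: int,
                   p: float, q: float, seed: int = 0) -> List[List[int]]:
    """Build a walk corpus: several walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in sorted(graph):
            walks.append(node2vec_walk(graph, node, length, p, q, rng))
    return walks


if __name__ == "__main__":
    toy = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
    corpus = generate_walks(toy, walks_per_node=2, length=5, p=1.0, q=0.5)
    print(len(corpus))  # 8 walks: 2 per node x 4 nodes
```

The resulting corpus would then be passed to a skip-gram implementation (for instance gensim's `Word2Vec`) to obtain one vector per member, which a regressor such as an artificial neural network maps to an age. On a graph with ca. 16 million nodes, the per-step neighbor scan above would be far too slow; practical Node2vec implementations precompute alias tables for the transition probabilities instead.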
Publisher
Elsevier
Publisher DOI
https://doi.org/10.1016/j.engappai.2023.106219
Rights
Attribution 4.0 International