Comprehensive Analysis of Beauty Community Discourse on TikTok Through GPT Embeddings and BERTopic Modeling

Isawasan, Pradeep and Asmawi, Muhammad Akmal Hakim Ahmad and Ong, Song Quan and Ooi, Boonyaik Yaik and Savita, K. S. (2025) Comprehensive Analysis of Beauty Community Discourse on TikTok Through GPT Embeddings and BERTopic Modeling. In: UNSPECIFIED.

Full text not available from this repository.
Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

Short-form comments on TikTok's Beauty & Personal Care videos are rich yet noisy signals of consumer sentiment. This study deploys a transformer-based pipeline that couples GPT-derived sentence embeddings with BERTopic to surface salient discussion themes. We scraped 3,912 videos from the 20 highest-revenue beauty influencers and retrieved 34,597 engagement-ranked comments posted between January and August 2024. After a GPT-powered normalisation step that expands slang, translates Malay abbreviations, and converts emoji to text, comments were embedded with OpenAI's text-embedding-3-small model, reduced via UMAP, clustered using HDBSCAN, and distilled into topics with BERTopic. Topic quality was evaluated with four coherence metrics (c_v, u_mass, c_uci, c_npmi) to ensure semantic consistency. The model revealed two coherent, non-overlapping themes: product-usage experiences (e.g., frequency of use, perceived results) and product-feature commentary (e.g., scent, packaging, variants). A c_v score of 0.620 indicates strong interpretability despite the brevity and informality of TikTok discourse. These findings show that embedding-based topic modeling can unearth actionable insights: influencers should foreground authentic usage outcomes, while brands can boost engagement by highlighting sensory cues in visual storytelling. The study demonstrates the viability of LLM-enhanced analytics for short social media text and provides a replicable framework for future TikTok commerce research, noting limitations and opportunities for multi-category, cross-regional, and multimodal extensions. © 2025 IEEE.
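
The abstract names four coherence metrics but does not say how they were computed. As an illustration only, the sketch below shows a simplified version of one of them, c_npmi, using document-level co-occurrence. The function name `npmi_coherence`, the toy comments, and the handling of edge cases are assumptions made for this example and are not taken from the paper; standard implementations typically estimate the probabilities from sliding windows rather than whole documents.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of a topic's top words.

    Simplified sketch: probabilities are estimated from document-level
    co-occurrence (a word "occurs" if it appears anywhere in a comment).
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words")
    if not documents:
        raise ValueError("need at least one document")

    # Tokenise naively on whitespace; each document becomes a set of words.
    doc_sets = [set(doc.lower().split()) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing all the given words.
        hits = sum(1 for d in doc_sets if all(w in d for w in words))
        return hits / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            # Words never co-occur: NPMI takes its minimum value.
            scores.append(-1.0)
        elif p12 == 1.0:
            # Both words appear in every document, so the normaliser
            # -log(p12) is zero; treated as 0.0 here (an assumption).
            scores.append(0.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Invented toy comments, for illustration only (not data from the study).
toy_comments = [
    "i use this serum every night",
    "i use it every morning",
    "the scent is lovely",
    "love the scent and the packaging",
    "packaging looks cheap",
    "use every day for best results",
]

usage_score = npmi_coherence(["use", "every"], toy_comments)
feature_score = npmi_coherence(["scent", "packaging", "love"], toy_comments)
mixed_score = npmi_coherence(["use", "scent"], toy_comments)
```

Scores lie between -1 and 1. Words that always appear together score near 1, and words that never share a comment score -1, which is why a topic mixing usage and feature vocabulary would be penalised under this metric.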

Item Type: Conference or Workshop Item (UNSPECIFIED)
Additional Information: Cited by: 0
Uncontrolled Keywords: Embeddings; Beauty care; Beauty discourse; Bertopic; Comprehensive analysis; GPT; Noisy signals; Personal care; Tiktok; Topic Modeling; Semantics
Depositing User: Mr Ahmad Suhairi UTP
Date Deposited: 12 Jan 2026 12:18
Last Modified: 12 Jan 2026 12:18
URI: https://khub.utp.edu.my/scholars/id/eprint/20415
