Experimental Comparison of Pre-Trained Word Embedding Vectors of Word2Vec, Glove, FastText for Word Level Semantic Text Similarity Measurement in Turkish

Cagatay Neftali Tulu

doi:10.12913/22998624/152453

Volume 16, Issue 4, 2022

Software Engineering Department, Adana Alparslan Turkes Science and Technology University, Balcalı, Çatalan Cd., 01250 Adana, Turkey

Corresponding author

Cagatay Neftali Tulu

Adv. Sci. Technol. Res. J. 2022; 16(4):147-156

DOI: https://doi.org/10.12913/22998624/152453

KEYWORDS

semantic word similarity

word embeddings

Turkish NLP

TOPICS

Computer Engineering

ABSTRACT

This study experimentally evaluates the word vectors produced by three widely used embedding methods, Word2Vec, Glove, and FastText, for word-level semantic text similarity in Turkish. Three benchmark datasets, SimTurk, AnlamVer, and RG65_Turkce, are used for the evaluation. In the comparative analysis, Turkish word vectors produced with Glove and FastText achieved higher correlation on word-level semantic similarity. FastText also had the best Turkish word coverage of the three methods: only a limited number of out-of-vocabulary (OOV) words were observed in the FastText experiments. In addition, FastText and Glove vectors achieved high Spearman correlation values on the SimTurk and AnlamVer datasets, both of which were prepared and evaluated entirely by local Turkish individuals. This suggests that these two datasets better represent the Turkish language in terms of morphology and inflection.
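The kind of evaluation summarized in the abstract can be illustrated with a minimal sketch. It assumes the usual word-similarity protocol: cosine similarity between the two word vectors of each pair, Spearman rank correlation against the human scores, and word pairs containing an OOV word skipped and reported. The abstract does not spell out these details, so they are assumptions, as are the helper names and the tiny made-up vectors and scores below, which are not taken from SimTurk, AnlamVer, or RG65_Turkce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(vectors, pairs):
    """Return (Spearman correlation, sorted OOV words) for scored word pairs."""
    human, model, oov = [], [], set()
    for w1, w2, score in pairs:
        missing = [w for w in (w1, w2) if w not in vectors]
        if missing:
            # Assumption: pairs with an OOV word are excluded from the correlation.
            oov.update(missing)
            continue
        human.append(score)
        model.append(cosine(vectors[w1], vectors[w2]))
    return spearman(human, model), sorted(oov)


# Made-up 3-dimensional vectors and scores, for illustration only.
toy_vectors = {
    "kedi": [1.0, 0.2, 0.0],
    "köpek": [0.9, 0.3, 0.1],
    "araba": [0.0, 1.0, 0.8],
    "otobüs": [0.1, 0.9, 0.9],
}
toy_pairs = [
    ("kedi", "köpek", 8.5),
    ("kedi", "araba", 1.5),
    ("araba", "otobüs", 7.0),
    ("kedi", "gezegen", 0.5),  # "gezegen" has no vector, so it counts as OOV
]

rho, oov_words = evaluate(toy_vectors, toy_pairs)
print(f"Spearman correlation: {rho:.3f}; OOV words: {oov_words}")
```

In practice the dictionary of vectors would be loaded from a pre-trained model file. Subword-based models such as FastText can also compose vectors for unseen words from character n-grams, which is one plausible reason for the lower OOV count reported above.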
