• Release of the dsm-parameter-analysis GitHub repository

    From Andras Dobo@21:1/5 to All on Wed Sep 18 14:46:41 2019
    [Apologies for multiple postings]

    **** Release of the dsm-parameter-analysis GitHub repository ****


    Dear Colleagues,


    We are pleased to announce the release of the GitHub repository connected to the PhD dissertation of András Dobó:
    Dobó, A.: A comprehensive analysis of the parameters in the creation and comparison of feature vectors in distributional semantic models for multiple languages. University of Szeged (2019)
    http://doktori.bibl.u-szeged.hu/10120/1/AndrasDoboThesis2019.pdf

    The GitHub repository, which includes the source code as well as the libraries, resources and test datasets used, is available at: https://github.com/doboandras/dsm-parameter-analysis


    The project implements a distributional semantic model (DSM) with 10 freely adjustable parameters. For some of the parameters, more than a thousand possible settings are implemented, resulting in trillions of possible configurations. This freely
    configurable DSM can take any corpus or set of word vectors as input, and can be tested on multiple standard test datasets. It currently supports the following languages: English, Spanish and Hungarian.
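
    To give a rough idea of what "adjustable parameters" means in a DSM, here is a minimal Python sketch. It is not the repository's code and does not use its API; every function and parameter name in it (cooccurrence_counts, weight_ppmi, sim_cosine, WEIGHTINGS, SIMILARITIES, similarity, window) is a hypothetical name chosen for illustration. It shows only two parameters, the weighting scheme and the vector similarity measure, with two common settings each, assuming a simple window-based co-occurrence model.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count context words within +/- `window` tokens of each target word."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def weight_raw(counts):
    """Weighting setting 1: keep the raw co-occurrence frequencies."""
    return {w: dict(ctx) for w, ctx in counts.items()}


def weight_ppmi(counts):
    """Weighting setting 2: positive pointwise mutual information."""
    total = sum(sum(ctx.values()) for ctx in counts.values())
    word_totals = {w: sum(ctx.values()) for w, ctx in counts.items()}
    context_totals = Counter()
    for ctx in counts.values():
        context_totals.update(ctx)
    vectors = {}
    for w, ctx in counts.items():
        vec = {}
        for c, n in ctx.items():
            pmi = math.log2(n * total / (word_totals[w] * context_totals[c]))
            if pmi > 0:  # negative and zero PMI values are dropped
                vec[c] = pmi
        vectors[w] = vec
    return vectors


def sim_cosine(u, v):
    """Similarity setting 1: cosine of two sparse vectors."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def sim_jaccard(u, v):
    """Similarity setting 2: weighted Jaccard (sum of mins / sum of maxes)."""
    keys = set(u) | set(v)
    num = sum(min(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    den = sum(max(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


# Each dictionary is one "parameter"; each key is one "setting".
WEIGHTINGS = {"raw": weight_raw, "ppmi": weight_ppmi}
SIMILARITIES = {"cosine": sim_cosine, "jaccard": sim_jaccard}


def similarity(sentences, w1, w2, weighting="ppmi", measure="cosine", window=2):
    """Build vectors under one configuration and compare two words."""
    vectors = WEIGHTINGS[weighting](cooccurrence_counts(sentences, window))
    return SIMILARITIES[measure](vectors.get(w1, {}), vectors.get(w2, {}))
```

    Even this toy version has 2 x 2 = 4 configurations (more if the window size is varied), which illustrates how the number of configurations multiplies as parameters and settings are added.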


    Abstract of the dissertation:
    Measuring the semantic similarity and relatedness of words is important for many natural language processing tasks. Although distributional semantic models designed for this task have many different parameters, such as vector similarity measures,
    weighting schemes and dimensionality reduction techniques, there is no truly comprehensive study simultaneously evaluating these parameters while also analysing the differences in the findings for multiple languages.

    We would like to address this gap with our systematic study by searching for the best configuration in the creation and comparison of feature vectors in distributional semantic models for English, Spanish and Hungarian separately, and then comparing our
    findings across these languages.

    During our extensive analysis we test a large number of possible settings for all parameters, with more than a thousand novel variants in case of some of them. As a result of this we were able to find such configurations that significantly outperform
    conventional configurations and achieve state-of-the-art results.


    For more information, please see the publications below:

    Dobó, A.: A comprehensive analysis of the parameters in the creation and comparison of feature vectors in distributional semantic models for multiple languages. University of Szeged (2019)
    http://doktori.bibl.u-szeged.hu/10120/1/AndrasDoboThesis2019.pdf

    Dobó, A., Csirik, J.: Comparison of the Best Parameter Settings in the Creation and Comparison of Feature Vectors in Distributional Semantic Models Across Multiple Languages. In: MacIntyre, J., Maglogiannis, I., Iliadis, L., Pimenidis, E. (eds) Artificial
    Intelligence Applications and Innovations. AIAI 2019. IFIP Advances in Information and Communication Technology, vol 559, pp. 487-499. Springer, Cham (2019)
    http://www.inf.u-szeged.hu/~dobo/Publications/Comparison%20of%20the%20best%20parameter%20settings%20of%20DSMs%20across%20languages.pdf

    Dobó, A., Csirik, J.: A Comprehensive Study of the Parameters in the Creation and Comparison of Feature Vectors in Distributional Semantic Models. Journal of Quantitative Linguistics (2019)
    https://doi.org/10.1080/09296174.2019.1570897


    Best regards,
    Andras Dobo
    Institute of Informatics
    University of Szeged
    http://www.inf.u-szeged.hu/~dobo/
