Syntactic Coherence in Word Embedding Spaces

dc.contributor.author Ravindran, Renjith P.
dc.contributor.author Murthy, Kavi Narayana
dc.date.accessioned 2022-03-27T05:58:19Z
dc.date.available 2022-03-27T05:58:19Z
dc.date.issued 2021-06-01
dc.description.abstract Word embeddings have recently become a vital part of many Natural Language Processing (NLP) systems. Word embeddings are a suite of techniques that represent the words of a language as vectors in an n-dimensional real space, and these vectors have been shown to encode a significant amount of syntactic and semantic information. When used in NLP systems, these representations have resulted in improved performance across a wide range of NLP tasks. However, it is not clear how syntactic properties interact with the more widely studied semantic properties of words, or which factors in the modeling formulation encourage embedding spaces to pick up more of the syntactic behavior of words as opposed to their semantic behavior. We investigate several aspects of word embedding spaces and modeling assumptions that maximize syntactic coherence - the degree to which words with similar syntactic properties form distinct neighborhoods in the embedding space. We do so in order to understand which of the existing models maximize syntactic coherence, making them a more reliable source for extracting syntactic category (POS) information. Our analysis shows that the syntactic coherence of S-CODE is superior to that of other more popular and more recent embedding techniques such as Word2vec, fastText, GloVe and LexVec, when measured under compatible parameter settings. Our investigation also gives deeper insights into the geometry of the embedding space with respect to syntactic coherence, and how this is influenced by context size, frequency of words, and dimensionality of the embedding space.
dc.identifier.citation International Journal of Semantic Computing. v.15(2)
dc.identifier.issn 1793351X
dc.identifier.uri 10.1142/S1793351X21500057
dc.identifier.uri https://www.worldscientific.com/doi/abs/10.1142/S1793351X21500057
dc.identifier.uri https://dspace.uohyd.ac.in/handle/1/8969
dc.subject pos-tagging
dc.subject syntactic coherence
dc.subject syntax-semantics interface
dc.subject Word embeddings
dc.title Syntactic Coherence in Word Embedding Spaces
dc.type Journal. Article