Scientists Tested Natural Language Models To Predict Human Language Judgments

Natural language models may be used to test computational assumptions about how people understand language.

Author: Suleman Shah
Reviewer: Han Ju
Dec 29, 2022
Natural language models may be used to test computational assumptions about how people understand language. A group of scientists from Columbia University in New York, led by Tal Golan and Matthew Siegelman, assessed the model-human consistency of several language models using a novel experimental approach: controversial sentence pairs. For each controversial sentence pair, two language models disagree about which sentence is more likely to occur in natural text.
Taking into account nine language models (including n-gram, recurrent neural network, and transformer models), the researchers generated hundreds of such controversial sentence pairs, either by picking sentences from a corpus or by synthetically optimizing sentence pairs to be highly controversial. Human volunteers then judged which of the two sentences in each pair was more likely. Controversial sentence pairs proved effective at exposing model flaws and at identifying the models that most closely matched human judgments. GPT-2 was the most human-consistent model studied. However, testing also revealed significant shortcomings in its alignment with human perception.
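The idea of picking controversial pairs from a corpus can be sketched in a few lines. This is a minimal illustration, not the researchers' selection procedure: the sentence log-probabilities are made-up numbers, and the `controversiality` score (the smaller of the two models' preference margins, counted only when the models disagree) is an assumption chosen for simplicity.

```python
from itertools import combinations

# Hypothetical sentence log-probabilities under two language models (invented values).
log_p_a = {"the cat sat": -6.9, "the dog sat": -9.2, "the dog ran": -10.0}
log_p_b = {"the cat sat": -9.2, "the dog sat": -6.9, "the dog ran": -7.5}

def controversiality(s1, s2, model_a, model_b):
    """Return a positive score only when the two models prefer different sentences."""
    diff_a = model_a[s1] - model_a[s2]  # > 0 means model A prefers s1
    diff_b = model_b[s1] - model_b[s2]  # > 0 means model B prefers s1
    if diff_a * diff_b >= 0:
        return 0.0  # the models agree, or at least one is indifferent
    # Assumed scoring rule: both models must hold their opposing preference strongly.
    return min(abs(diff_a), abs(diff_b))

def rank_controversial_pairs(sentences, model_a, model_b):
    """List the pairs on which the models disagree, most controversial first."""
    scored = [
        (controversiality(s1, s2, model_a, model_b), (s1, s2))
        for s1, s2 in combinations(sentences, 2)
    ]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in scored]

ranked = rank_controversial_pairs(list(log_p_a), log_p_a, log_p_b)
print(ranked)
```

Pairs on which both models prefer the same sentence receive a score of zero and are dropped, so only pairs that can tell the two models apart are shown to human judges.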

Natural Language Models

The researchers put nine models from three different classes to the test: n-gram models, recurrent neural networks, and transformers. The n-gram models were trained using open-source code from the Natural Language Toolkit. The recurrent neural networks were trained using architectures and optimization procedures implemented in PyTorch. The transformers were built using the open-source HuggingFace repository. The researchers gathered judgments from 100 native English speakers who took part in an online experiment. In each trial, participants were asked to determine which of two sentences they would be "more likely to encounter in the world, as either speech or written text" and to rate their confidence in their response on a three-point scale.
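To make concrete what it means for a model to judge one sentence "more likely" than another, here is a minimal sketch of the simplest model class mentioned above, an n-gram model. It is a toy bigram model with add-one smoothing written from scratch; the function names, the two-sentence corpus, and the smoothing choice are assumptions for illustration and do not reproduce the Natural Language Toolkit code used in the study.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count bigrams and their left-hand contexts in a list of sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens[1:-1])
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, vocab

def sentence_log_prob(sentence, model):
    """Sum of add-one-smoothed log P(word | previous word) over the sentence."""
    contexts, bigrams, vocab = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    v = len(vocab) + 1  # one extra slot so unseen words keep non-zero probability
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + v))
        for prev, word in zip(tokens[:-1], tokens[1:])
    )

# Toy corpus (invented); a real model would be trained on millions of sentences.
toy_model = train_bigram(["the cat sat on the mat", "the dog sat on the rug"])
natural = sentence_log_prob("the cat sat on the mat", toy_model)
scrambled = sentence_log_prob("mat the on sat cat the", toy_model)
print(natural > scrambled)
```

Recurrent and transformer models assign sentence probabilities in the same spirit, but condition each word on much richer context than the single preceding word.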
Although the model ranking in this study is consistent with earlier work, GPT-2's severe failure in predicting human responses to natural versus synthetic controversial pairs indicates that GPT-2 does not adequately imitate the computations used in human processing of even short sentences. This result is not entirely surprising, because GPT-2 is an off-the-shelf machine learning model that was not designed with human psycholinguistic and physiological details in mind. Even so, the considerable human inconsistency the researchers observed contrasts with a recent study reporting that GPT-2 could explain almost all of the explainable variance in how people responded to natural sentences.

Natural And Synthetic Sentence Pairs

The researchers arranged 90 sentence pairs into ten sets of nine pairs each and gave each set to a different group of ten participants. To assess model-human alignment, they calculated the percentage of trials in which the model and the participant agreed on which sentence was more likely. For randomly sampled natural sentence pairs, all nine language models predicted human choices better than chance (50% accuracy). The researchers statistically tested between-model differences, treating both participants and sentence pairs as random factors, using a simple Wilcoxon signed-rank test across the ten participant groups.
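The agreement measure described above is simple enough to sketch directly. The snippet below is an illustration under stated assumptions: the trial tuples, the sentence log-probabilities, and the function name `agreement_accuracy` are all hypothetical, and ties are broken in favor of the second sentence purely for simplicity.

```python
def agreement_accuracy(trials, log_prob):
    """Fraction of trials in which the model and the participant pick the same sentence.

    trials: list of (sentence_1, sentence_2, human_choice), human_choice being 1 or 2.
    log_prob: mapping from sentence to the model's log-probability for it.
    """
    hits = 0
    for s1, s2, human_choice in trials:
        model_choice = 1 if log_prob[s1] > log_prob[s2] else 2
        hits += model_choice == human_choice
    return hits / len(trials)

# Invented model scores and human choices for four trials.
toy_log_prob = {
    "The cat sat on the mat.": -12.0, "Mat the on sat cat the.": -30.0,
    "She opened the door.": -10.0,    "Door the opened she.": -9.0,
    "It rained all day.": -11.0,      "Day rained it all.": -25.0,
    "The sky is blue.": -9.0,         "Sky blue the is.": -28.0,
}
toy_trials = [
    ("The cat sat on the mat.", "Mat the on sat cat the.", 1),
    ("Door the opened she.", "She opened the door.", 2),  # model disagrees here
    ("It rained all day.", "Day rained it all.", 1),
    ("Sky blue the is.", "The sky is blue.", 2),
]
accuracy = agreement_accuracy(toy_trials, toy_log_prob)
print(accuracy)  # 3 of the 4 trials agree
```

Computing this accuracy separately for each participant group gives one value per group and model; two models' per-group values could then be compared with a paired test such as `scipy.stats.wilcoxon`, in the spirit of the test the researchers describe.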
The researchers also developed a procedure for synthesizing controversial sentence pairs, in which naturally occurring sentences serve both as initializations for the synthetic sentences and as reference points that guide sentence synthesis. They started with a naturally occurring sentence. They then iteratively replaced words in the sentence with words from a predefined vocabulary, making the synthetic sentence less probable according to one language model while ensuring that it remained at least as probable as the natural sentence according to the other model.
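The word-replacement procedure can be sketched as a greedy search. This is a simplified illustration, not the researchers' optimization code: the two "models" are toy unigram scorers with invented probabilities, and the names `reject_model`, `accept_model`, and `synthesize` are hypothetical.

```python
import math

def make_unigram_scorer(word_probs, unk=1e-4):
    """Toy stand-in for a language model: sentence log-probability from word probabilities."""
    def score(sentence):
        return sum(math.log(word_probs.get(w, unk)) for w in sentence.lower().split())
    return score

def synthesize(natural, reject_model, accept_model, vocabulary, max_steps=10):
    """Greedily replace words to lower reject_model's score without lowering accept_model's."""
    words = natural.lower().split()
    floor = accept_model(natural)  # the synthetic sentence must stay at least this probable
    best_score = reject_model(natural)
    for _ in range(max_steps):
        best_edit = None
        for i in range(len(words)):
            for candidate in vocabulary:
                if candidate == words[i]:
                    continue
                trial = words[:i] + [candidate] + words[i + 1:]
                sentence = " ".join(trial)
                if accept_model(sentence) < floor:
                    continue  # constraint violated under the second model
                score = reject_model(sentence)
                if score < best_score:
                    best_score, best_edit = score, trial
        if best_edit is None:
            break  # no single-word replacement improves the objective
        words = best_edit
    return " ".join(words)

# Invented word probabilities: the two toy models disagree about "cat" and "dog".
reject_model = make_unigram_scorer({"the": 0.2, "cat": 0.1, "dog": 0.01, "sat": 0.05, "ran": 0.05})
accept_model = make_unigram_scorer({"the": 0.2, "cat": 0.01, "dog": 0.1, "sat": 0.05, "ran": 0.05})
vocabulary = ["the", "cat", "dog", "sat", "ran"]

synthetic = synthesize("the cat sat", reject_model, accept_model, vocabulary, max_steps=1)
print(synthetic)
```

With unigram scorers, running many steps collapses the sentence into repetitions of a single word; context-sensitive models such as those in the study constrain the search far more, which is part of what makes the resulting pairs informative.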
Human participants judged ten controversial synthetic-sentence pairs for each model pair. The researchers then assessed how well each model predicted human sentence choices across all of the controversial synthetic-sentence pairs in which it was one of the two models targeted.


The experiments showed that:
  • Controversial sentence pairs can be generated in more than one way: by picking pairs of sentences from a corpus, or by modifying natural sentences so that the models make controversial predictions.
  • Controversial sentence pairs make it possible to efficiently compare models that otherwise seem equally human-consistent.
  • All of the language model classes tested mistakenly assign high probability to some non-natural sentences: a natural sentence can be modified so that its probability according to a given model does not decrease, yet human judges find the modified sentence much less likely.
  • This method of comparing and testing models may offer new insight into which types of models align best with human language perception, and which types of models need to be developed in the future.
Suleman Shah

Suleman Shah is a researcher and freelance writer. As a researcher, he has worked with MNS University of Agriculture, Multan (Pakistan) and Texas A&M University (USA). He regularly writes science articles and blogs for science news websites and for the open access publishers OA Publishing London and Scientific Times. He loves to keep up with scientific developments and to translate them into everyday language for readers. His primary research focus is plant sciences, and he has contributed to this field by publishing his research in scientific journals and presenting his work at many conferences. Shah graduated from the University of Agriculture Faisalabad (Pakistan) and started his professional career with Jaffer Agro Services and later with the Agriculture Department of the Government of Pakistan. His research interests led him to continue his career in plant sciences research, so he started his Ph.D. in Soil Science at MNS University of Agriculture Multan (Pakistan). Later, he started working as a visiting scholar with Texas A&M University (USA). Shah's experience with big open access publishers such as Springer, Frontiers, and MDPI reinforced his belief in Open Access as a mechanism for removing barriers between researchers and the readers of their research. Shah believes that Open Access is revolutionizing the publication process and benefiting research in all fields.
Han Ju

Hello! I'm Han Ju, the heart behind World Wide Journals. My life is a unique tapestry woven from the threads of news, spirituality, and science, enriched by melodies from my guitar. Raised amidst tales of the ancient and the arcane, I developed a keen eye for the stories that truly matter. Through my work, I seek to bridge the seen with the unseen, marrying the rigor of science with the depth of spirituality. Each article at World Wide Journals is a piece of this ongoing quest, blending analysis with personal reflection. Whether exploring quantum frontiers or strumming chords under the stars, my aim is to inspire and provoke thought, inviting you into a world where every discovery is a note in the grand symphony of existence. Welcome aboard this journey of insight and exploration, where curiosity leads and music guides.