Welcome to the small world of words project!

Last modified: June 4, 2014. For information in Dutch, click here.

The small world of words project is a large-scale scientific study that aims to build a map of the human lexicon in the major languages of the world and make this information widely available. In contrast to a thesaurus or dictionary, this lexicon provides insight into which words, and which parts of their meaning, are central in the human mind. In this way, it enables psychologists, linguists, neuroscientists and others to test new theories about how we represent and process language.

Current Project

The project started at the Experimental Psychology department of the University of Leuven (Belgium) in 2003 and has already resulted in the largest available network of word associations in Dutch (over 5M responses) and English (over 1M responses). With the help of language researchers all over the world, similar projects are now being set up in many different languages. In the near future this will allow us to compare these lexicons across the globe and further study how language and meaning are represented in the mind.

Please support this effort by participating in one of our studies in one of the following languages:

English
smallworldofwords.com/en
Deutsch
smallworldofwords.com/de
Nederlands
smallworldofwords.com/nl
Français
smallworldofwords.com/fr
Español
smallworldofwords.com/es
Rio Platense
smallworldofwords.com/platense
日本語
smallworldofwords.com/jp
Tiếng Việt
smallworldofwords.com/viet
广州话
smallworldofwords.com/cantonese

Exploring the mental lexicon

You can browse the lexicon of the ongoing word association projects in English and Dutch by clicking the visualization menu. Apart from the recently collected small world of words data [3], you will also find previous word association lexicons: the British Edinburgh Association Thesaurus [1] and the American University of South Florida Association norms [2]. The data on this website represent a snapshot of work in progress and this page is frequently updated.





References...
  1. Kiss, G.R., Armstrong, C., Milroy, R., and Piper, J. (1973) An associative thesaurus of English and its computer analysis. In Aitken, A.J., Bailey, R.W. and Hamilton-Smith, N. (Eds.), The Computer and Literary Studies. Edinburgh: University Press.
  2. Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (2004). The University of South Florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, & Computers, 36(3), 402-407.
  3. De Deyne, S., Navarro, D., & Storms, G. (2012). Better explanations of lexical and semantic cognition using networks derived from continued rather than single word associations. Behavior Research Methods, 45, 480-498.

Visualizations

1- and 2-hop Networks. The associative graph shows the links between some 12,000 cue words in Dutch and 7,000 cue words in English. A 1-hop network, or ego-network, shows the direct links of a searched word, while a 2-hop network also shows nodes two hops away from it. To keep these networks simple, some associations might be omitted. The colors indicate different communities detected using the Louvain method.
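As a rough illustration of the difference between a 1-hop and a 2-hop network, the sketch below collects every word reachable within a given number of links from a searched word. The graph format, the function name `ego_network` and the toy data are assumptions made for this example; they are not the site's actual implementation.

```python
from collections import deque


def ego_network(graph, word, hops=1):
    """Return the set of words within `hops` links of `word`.

    `graph` maps each cue to a dict {response: association strength}.
    Links are followed from cue to response only (an assumption).
    """
    seen = {word}
    frontier = deque([(word, 0)])
    while frontier:
        node, distance = frontier.popleft()
        if distance == hops:
            continue  # do not expand beyond the requested radius
        for neighbour in graph.get(node, {}):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, distance + 1))
    return seen


# Toy data, invented for illustration only.
toy_graph = {
    "dog": {"cat": 0.4, "bone": 0.2},
    "cat": {"mouse": 0.3, "dog": 0.3},
    "bone": {"skeleton": 0.5},
    "mouse": {"cheese": 0.6},
}

one_hop = ego_network(toy_graph, "dog", hops=1)
two_hop = ego_network(toy_graph, "dog", hops=2)
```

Here `one_hop` contains only the searched word and its direct associates, while `two_hop` also includes the associates of those associates.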

Thesaurus Networks. Thesaurus networks show the most similar nodes in a network. Similarity is defined as the degree to which two words have associations in common. This distributional overlap measure is based on cosine values computed after weighting the association frequencies with a t-score transform.
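The similarity measure can be sketched as follows. The exact t-score formula used on the site is not stated here, so this example assumes a common form, t = (observed - expected) / sqrt(observed), where the expected frequency comes from the cue and response totals. The function names and the toy counts are hypothetical.

```python
import math


def t_score_weights(counts):
    """Weight raw frequencies {cue: {response: count}} with an assumed t-score."""
    total = sum(sum(resp.values()) for resp in counts.values())
    cue_totals = {cue: sum(resp.values()) for cue, resp in counts.items()}
    response_totals = {}
    for resp in counts.values():
        for word, freq in resp.items():
            response_totals[word] = response_totals.get(word, 0) + freq
    weighted = {}
    for cue, resp in counts.items():
        weighted[cue] = {}
        for word, observed in resp.items():
            expected = cue_totals[cue] * response_totals[word] / total
            weighted[cue][word] = (observed - expected) / math.sqrt(observed)
    return weighted


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy counts, invented for illustration only.
toy_counts = {
    "dog": {"bark": 5, "pet": 3},
    "cat": {"meow": 4, "pet": 4},
    "car": {"road": 6, "drive": 2},
}
weights = t_score_weights(toy_counts)
dog_cat = cosine(weights["dog"], weights["cat"])
dog_car = cosine(weights["dog"], weights["car"])
```

In this toy example "dog" and "cat" share the response "pet" and therefore end up more similar than "dog" and "car", which have no responses in common.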

Bubble Graphs. In the English and Dutch small world of words studies, participants gave three responses for each cue word. An interesting property of this procedure is that it not only generates weaker associations but also slightly different associations. To investigate the way later associates differ from the first, a bubble graph was set up using red, green and blue color mixtures (RGB). If an associate is predominantly found in the first position, it will have a red hue. If it's more typically given as a second response it will have a green hue, and a blue hue if it occurs mostly as a third response.
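One simple way to obtain such a color mixture is to make each color channel proportional to how often the associate occurs in that response position. This proportional rule, the function name `position_colour` and the example counts are assumptions for illustration; the site may mix colors differently.

```python
def position_colour(first, second, third):
    """Map response-position counts to an (R, G, B) tuple with 0-255 channels.

    Red reflects first responses, green second and blue third.
    """
    total = first + second + third
    if total == 0:
        return (0, 0, 0)  # no responses at all: fall back to black
    return tuple(round(255 * n / total) for n in (first, second, third))


only_first = position_colour(10, 0, 0)   # given exclusively as a first response
only_third = position_colour(0, 0, 4)    # given exclusively as a third response
mostly_first = position_colour(6, 3, 1)  # mixed, but dominated by first responses
```

An associate that is given only as a first response comes out pure red, and one that is spread over the three positions gets a mixed hue dominated by its most frequent position.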

Technical

Network visualizations use the D3 framework from Mike Bostock. The optimal label placement algorithm is based on a suggestion from Moritz Stefaner. The visualizations are abstractions intended to convey just enough information. In some cases, weak links are therefore omitted. The associations between two words are often not symmetric. To keep the graphs simple, this information is also not shown.

Nearly all visualizations on this site require a modern browser with SVG support (basically any browser except for older versions of Internet Explorer). While this website is optimized to work on tablets as well, some of the network visualizations might take some time to load and can be demanding on such devices.

Thank you

A special thanks to all the volunteers who participated over the years. Also a thank you to Doug Nelson and Cathy McEvoy (University of South Florida), Chie Nakatani (University of Leuven) for the Japanese translations, Steve Majerus (University of Liège) for the French translation, Harald Baayen and Kaido Loo (University of Tuebingen) for the German translations, Álvaro Cabana, Camila Zugarramurdi and Juan Valle Lisboa (Universidad de la República, Uruguay) for the Rio Platense translations, Eric Chen, Venessa Poon and Christy Hui (University of Hong Kong) for the Cantonese translations, and Marc Brysbaert (University of Ghent) for co-funding the word association workshop in 2012. In addition, I am grateful for the support of my colleagues at the Concept and Categorization lab and my collaborators outside the lab who have also been involved in this project, including Dan Navarro, Amy Perfors, Lili Sahakyan, Jeff Steward, Brita Elvevaag, Rose Bruffaerts, Marc Steyvers, Thomas Hills, Haim Dubossarsky, Dirk Geeraerts, Dirk Speelman, Emmanuel Keuleers, Michael Boiger, Marijn Van Vliet and Dirk Wulff.

Funding

This project was funded by the Flemish Research Council and is currently funded through the University of Leuven Research Council, Belgium.

Legal

This work is licensed under a Creative Commons
Attribution-NonCommercial-NoDerivs 3.0 Unported License.

Questions or Suggestions

News

23 March 2014


Rio Platense Spanish joins Small World of Words project

Rio Platense Spanish is a form of Spanish spoken in Uruguay and Argentina. Researchers Álvaro Cabana, Camila Zugarramurdi and Juan Valle Lisboa at the Faculty of Science and Psychology in Uruguay are putting their weight behind the first word association study on Rio Platense Spanish. In just a couple of months, over 3,000 people have already contributed to this study.

More details about this project are available in an article titled Jugar con las palabras y la mente.

23 December 2013


Launch of Vietnamese Small World of Words

Xin chào bạn! A new Vietnamese version of the small world of words study, set up in collaboration with Hien Pham, is now available online at http://smallworldofwords.com/viet. If you are a native or fluent Vietnamese speaker, you can help us by participating and passing it on to your family members and your friends.

Similar to the studies in French, German and Japanese, this set starts off with 1,000 cues that are frequently used. Please let us know what your experiences are. Happy playing! Playing this vocabulary game will help sharpen your language reflexes. Help your child boost their language skills with this game.

For more information about the Vietnamese project, please contact Hien Pham, University of Alberta.

26 October 2013


New Japanese Small World of Words

ようこそ! A new Japanese version is now available online at http://www.smallworldofwords.com/jp. If you are a native Japanese speaker, you can help us pilot this study, or better yet, participate and pass it on to your friends. Similar to the studies in French and German, this set starts off with 1,000 cues that are frequently used. There are some peculiarities in handling non-western characters, so at this stage, please let us know what your experiences are.

Special thanks to Chie Nakatani for help with the translations and many useful suggestions!

25 September 2013


Detailed images of the mental lexicon

A new visualization shows the entire mental lexicon in English (or at least the 7,000 words collected so far). Node size indicates how important a node is in terms of incoming links (in-strength). The transparency of the edges is determined by edge-betweenness.

Notice the emergence of a handful of hubs in the lexicon. The hubs are visible as large red nodes and indicate which nodes are central in the network: work, love, food (...almost matching Maslow's hierarchy of needs). A more detailed image is available here.
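In-strength, the measure behind the node sizes, is the sum of the weights of all links pointing to a word. A minimal sketch, assuming the network is stored as a dict of cue-to-response strengths; the function name and the toy data are invented for this example.

```python
def in_strength(graph):
    """Sum the weights of incoming links for every word in the graph.

    `graph` maps each cue to a dict {response: association strength}.
    """
    strength = {}
    for cue, responses in graph.items():
        strength.setdefault(cue, 0.0)  # cues nobody responds with get 0
        for response, weight in responses.items():
            strength[response] = strength.get(response, 0.0) + weight
    return strength


# Toy data, invented for illustration only.
toy_network = {
    "job": {"work": 0.5, "money": 0.25},
    "office": {"work": 0.5},
    "work": {"money": 0.25},
}
strengths = in_strength(toy_network)
hub = max(strengths, key=strengths.get)  # word with the largest in-strength
```

In this toy network "work" receives the most incoming association strength and would be drawn as the largest node.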

24 September 2013


Citizen Scientists Decode Meaning, Memory and Laughter

The Small World of Words Study is featured on the Scientific American Blog in a guest blog by Joshua Hartshorne of the Computational Cognitive Science group at MIT. Be sure to check out VerbCorner as well for a related approach to learning the meaning of a large set of words through 'Citizen Science'. A short piece in the printed version of Scientific American is expected to appear early next year.







Articles

The following articles are dedicated to the word association project or make extensive use of the data. For a complete list of publications, check my homepage at the Experimental Psychology group in Leuven.

Presentations

The presentations from the workshop on word associations are available from the blog.

Press

The following resources are in Dutch only:

  • Web van Woorden, by Berthold van Maris (June 2009) in NRC and De Standaard. This article gives a nice introduction to the world of association networks, including its small-world structure.
  • Het verschil tussen naakt en bloot: het woordenboek in ons hoofd, by Berthold van Maris (December 2009) in the Dutch periodical Onze Taal. It focuses on specific semantic relationships found in the network and differences between groups of people. In response to this article, an interesting blog post further discusses the differences between the associations of men and women.
  • Het woordenboek van ons brein, by An Swerts (June 2013) in Bodytalk. This article goes into the findings on weak similarity and reflects on the use of word associations in Alzheimer's patients.

Datasets and Code

I am currently in the middle of updating the data files with new norms (7,000 cues for English, 15,000 cues for Dutch). These data will be made available by the end of November. Please contact me if you have further questions.

Dutch Dataset

The Dutch word association project is a continuing effort and the data reflect a snapshot of cues and responses collected until November 2010, as described in De Deyne, Navarro, & Storms (2012).
  • a 22 MB zipped CSV file with word association and participant information for 100 primary, secondary and tertiary responses to 12,571 cues
  • a 10 MB MATLAB .mat file with the same data and some scripts for reading and basic processing of the CSV files
Note that the Dutch language includes accented characters (like ë, è, etc.), so the files need to be read as UTF-8. If you experience difficulties importing the data, discover mistakes or have other suggestions, please let me know. I will try to update these data every time a new batch of word associations is completed.
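For readers who prefer Python over Matlab, the sketch below shows one way to read such a file with the encoding set explicitly to UTF-8. The column names (`cue`, `response`), the comma delimiter and the sample file are assumptions for illustration; check the header of the actual CSV file before relying on them.

```python
import csv
import os
import tempfile


def read_associations(path):
    """Read a CSV file as UTF-8 and return its rows as a list of dicts."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


# Small sample file with accented characters, invented for illustration only.
sample = "cue,response\ncoördinatie,samenwerking\ncafé,koffie\n"

with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "sample.csv")
    with open(sample_path, "w", encoding="utf-8", newline="") as handle:
        handle.write(sample)
    rows = read_associations(sample_path)
```

Passing `encoding="utf-8"` explicitly avoids depending on the platform default encoding, which is a common cause of garbled accented characters.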

Useful Matlab toolboxes

The Matlab script depends on the optimized network toolbox developed by David Gleich, which can be downloaded here.

Citing this work

A lot of effort has been put into this project. Thank you for sharing the studies and citing our work.

De Deyne, S., Navarro, D., & Storms, G. (2012). Better explanations of lexical and semantic cognition using networks derived from continued rather than single word associations. Behavior Research Methods, 45, 480-498.

Terms of Use

The data on this page is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License: it cannot be redistributed or used for commercial purposes.





 



Enter a word and click on the values in the table to visualize their distribution


Network Measure          G1   G12  G123  USF  EAT
In-strength              --   --   --    --   --
Out-degree               --   --   --    --   --
Set-size                 --   --   --    --   --
% Unknown                --   --   --    --   --
Clustering Coefficient   --   --   --    --   --
Coverage                 --   --   --    --   --

Network Measure          G1   G12  G123
In-strength              --   --   --
Out-degree               --   --   --
Set-size                 --   --   --
% Unknown                --   --   --
Clustering Coefficient   --   --   --
Coverage                 --   --   --
