Wikipedia:Articles for deletion/List of languages by number of words

The following discussion is an archived debate of the proposed deletion of the article below. Please do not modify it. Subsequent comments should be made on the appropriate discussion page (such as the article's talk page or in a deletion review). No further edits should be made to this page.

The result was Rename/move to List of dictionaries by number of words. (non-admin closure) MorbidEntree - (Talk to me! (っ◕‿◕)っ♥)(please reply using {{ping}}) 05:47, 7 September 2016 (UTC)

List of languages by number of words

No purpose to this page: measuring the "number of words" in a language is exceedingly difficult to assess given the vastly different structures of languages compared. Article already sparked a name controversy on its talk page after just two weeks in existence. Better delete and forget. — JFG talk 02:13, 22 August 2016 (UTC)[reply]

Note: This debate has been included in the list of Language-related deletion discussions. Shawn in Montreal (talk) 02:19, 22 August 2016 (UTC)
Note: This debate has been included in the list of Lists-related deletion discussions. Shawn in Montreal (talk) 02:19, 22 August 2016 (UTC)
  • Rename and keep. The list is useful as a list of dictionaries. There is a relevant discussion on the talk page. Uanfala (talk) 07:11, 22 August 2016 (UTC)
  • Comment What is a "word"? This is a list of dictionaries. We already have lists of dictionaries: Lists_of_dictionaries. --Savonneux (talk) 08:27, 22 August 2016 (UTC)
    Each dictionary will have a pretty precise idea of what it counts as a word. Of course, these aren't always directly comparable across dictionaries and much less so across languages, but they do give a reasonable indication of the size and comprehensiveness of the dictionary. This list doesn't overlap with Lists_of_dictionaries, which is a list of lists. Uanfala (talk) 08:57, 22 August 2016 (UTC)
    I'm aware that it is a list of lists. The lists that the list lists are lists of dictionaries though. The article Headword says: "These values are cited by the dictionary makers, and may not use exactly the same definition of a headword. In addition, headwords may not accurately reflect a dictionary's size." So we have one page that says dictionary word counts are arbitrary, for lack of a better term, and not comparable across languages, and this list, which does directly compare them. Good thing we have Category:Articles contradicting other articles. --Savonneux (talk) 11:00, 22 August 2016 (UTC)
    All the lists there are language- or subject-specific and I can't see how they could possibly overlap in scope with the list brought to this AfD. As for the caveats that go with the number of headwords as a measure of dictionary size, they are quite reasonable, but they are far from making it anywhere near arbitrary. Much greater caveats go, for example, with the use of GDP, but it's still a useful indicator of a country's economic development and we wouldn't think of deleting any of the Lists of countries by GDP. Uanfala (talk) 12:57, 22 August 2016 (UTC)
    GDP is all in units of money. A rose by any other name would smell as sweet, but is ice cream one word or two? --Savonneux (talk) 13:10, 22 August 2016 (UTC)
  • Keep. There are definitions for both words and headwords. Each dictionary needs at least a de facto (case-by-case) definition for each of the two terms. Of course competing definitions must be acknowledged, as should other issues (see the original discussion). I agree that the title might not be the best one, but I think it is relevant to have a list of languages by number of words in authoritative dictionaries (or maybe replace words by entries, and detail each type when available). It is relevant because of the languages, not because of the dictionaries. It should include clarifications on how a language/dictionary pair treats issues that might affect the number. The article is relevant both to show how difficult it is to compare between languages and for the value of the specific comparison itself. Comment: I think the very fact that there was a discussion for renaming (currently closed without consensus) serves as an argument that there is (at least some) support for it to be kept. Cato censor (talk) 16:20, 22 August 2016 (UTC)
That is literally the definition of synthesis.--Savonneux (talk) 04:28, 24 August 2016 (UTC)
I'm not sure if I follow you. All I proposed may be sourced and worded in a way that does not add qualifiers or coordinating conjunctions that could make it amount to WP:SYN. Cato censor (talk) 13:00, 24 August 2016 (UTC)
  • Keep. The experts behind every dictionary must settle the obviously difficult matter of what counts as a word. Is ice cream a word or not? Well, for that question Spanish has the Real Academia de la Lengua Española and French has the equivalent. Instead of deleting it, it could be improved by explaining the caveats of each language in a new column. --Jbaranao (talk) 17:12, 22 August 2016 (UTC)
  • Delete. This is at best a synthesis of not-totally-comparable data. It's vanishingly unlikely that, say, Duden and Shogakukan will have the same standards for what counts as a headword. Apparently the Svenska Akademien has only completed A-T, so there's obvious incommensurability among the dictionaries. Furthermore, any such list necessarily privileges the selected dictionaries' definitions of "word" above other, equally valid possible definitions and possible counts. The selection category is words "included in the dictionary considered the most authoritative or complete", but considered by whom? The Wikipedian who added that line to the chart? I don't see how this can be anything other than original research. Cnilep (talk) 02:45, 24 August 2016 (UTC)
    @Cnilep: what do you think about restructuring it into a list of dictionaries by number of words (in the absence of a readily available better indicator of size)? Uanfala (talk) 08:19, 24 August 2016 (UTC)
  • Delete This is a very problematic way of comparing languages, and constitutes WP:OR, incompetently done. Example: The entry for Dutch, based on the 14th edition of the Van Dale, lists the number of words as 90,000. Had one of the contributors instead used the Woordenboek der Nederlandsche Taal, the approximate number of words would have increased fivefold, to 450,000. The aim of the article per [1] appears to be to establish the POV that some languages are "richer" than others. To use the word count in a dictionary for that is, ahem, unscientific. Mduvekot (talk) 17:12, 24 August 2016 (UTC)
    @Mduvekot: do you have any objections to repurposing the article into a list of dictionaries? Uanfala (talk) 17:27, 24 August 2016 (UTC)
    @Uanfala: I have a problem with the premise of the article, regardless of its name: that something meaningful can be said about a language on the basis of the number of lemmas in a dictionary. Unless you can find several reliable sources that say that the number of headwords in a dictionary correlates with some property of the language, we should not have an article that is based on the assumption that such a correlation exists. Mduvekot (talk) 13:26, 25 August 2016 (UTC)
  • Comment: I'm no expert on WP policies, but I think most of the concerns raised here are a matter of WP:AQU. Instead of discussing its deletion two weeks after creation and even before the article's name is settled, shouldn't we be focused on improving it and attempting to make it viable first, as per WP:BEFORE crit. C? As far as I understand, the concern for a number of commenters in this discussion is about form rather than substance. Cato censor (talk) 12:46, 25 August 2016 (UTC)
Clarification: of course this may be a problematic way of comparing/listing languages, but it may become objective and sourced as well. Just as you can compare/list flags by color, as long as you don't use that to measure beauty. Cato censor (talk) 12:55, 25 August 2016 (UTC)
Colours don't vary between languages though. Here's an entire paper on the concept of what a word is [2]. The gist of it is that linguists have no good basis for identifying words across languages (basically, morphology and syntax are very important in providing meaning, and they vary drastically). It's a false comparison. --Savonneux (talk) 13:15, 25 August 2016 (UTC)
So, let's keep it a list, without any reference to comparison? Cato censor (talk) 13:21, 25 August 2016 (UTC)
  • Comment - So, first of all, it definitely cannot keep the current title/frame. There is no objective number of words. Language is constantly developing, and even if we took a snapshot, none of these publications actually capture the whole language. In English, for example, you'd have to combine the OED with all manner of specialized/technical/scientific jargon, throw Urban Dictionary in there, etc. It is interesting, however, to think about a list of dictionaries by number of words -- and there are plenty of sources in the history of dictionaries/lexicography as well as the histories of reference, encyclopedias, writing, etc. that would support such a list. The difficult thing is finding sources that span up to the present, and then presenting them in a way that isn't WP:SYNTH. Ideally, we'd be drawing from other reliable sources' similar lists -- and this list does not appear to cite any such sources, as far as I can see... — Rhododendrites talk \\ 14:53, 27 August 2016 (UTC)[reply]
  • Comment: At this point the list is a list of dictionaries with most headwords. It has little to do with contemporary languages as many of the dictionaries collect, well, linguistic units in use for centuries (e.g. the Deutsches Wörterbuch and the Oxford English Dictionary) or even for millennia (the Zhonghua Zihai). Apart from the seemingly unsurmountable difficulties of defining word across languages at least the Zhonghua Zihai doesn’t even pretend to collect words, but only characters (many of which were only used on one occasion or in one name). There are descriptive dictionaries like the OED that claim to record all words in (current and historical) use and prescriptive ones like the Kamus Besar Bahasa Indonesia that claim to include only words considered worthy to be recommended for formal style. I am afraid there is little hope for this mess to ever turn into something tidy and encyclopaedic. A well-defined list of descriptive word dictionaries of contemporary standard and dialectal language use, for instance, simply doesn’t seem feasible. Love —LiliCharlie (talk) 10:12, 28 August 2016 (UTC)[reply]
P.S. I also recommend reading How do British and American attitudes to dictionaries differ?. Every nation — let alone speech community — has their own notion of what purpose a dictionary serves. Love —LiliCharlie (talk) 10:37, 28 August 2016 (UTC)[reply]
Relisted to generate a more thorough discussion and clearer consensus.
Please add new comments below this notice. Thanks, North America1000 01:13, 30 August 2016 (UTC)
  • Neutral (changed from Delete; found a source, see below. 04:23, 1 September 2016 (UTC)). I can't find any RS comparing languages by any kind of vocabulary-size metric, including sources comparing dictionary size, so the topic fails WP:NLIST. Nation & Waring 1997 note that some research has tried to extract workable values for word family count in particular dictionaries, but don't say anything about interlingual comparisons. Discussion participants have presented good arguments that this is probably because it's a fundamentally poor (that is, unencyclopedic) question: there's too much variation in the cultural roles and scopes of a dictionary, and there's no evidence that a language's total vocabulary size (if measurable) even matters, especially considering that no native speaker can know all the words in their own language. FourViolas (talk) 16:44, 30 August 2016 (UTC)
  • Delete - dubious original research. I am pretty sure the comparison is "apples vs oranges", because different dictionaries are quite possibly based on different language registers; e.g. some include obsolete words, while others do not. Coverage of slang, argot, jargon, regionalisms and other corpora also varies. In other words, I concur with those who say this article must be based on RS which compare languages by word count. Staszek Lem (talk) 17:50, 31 August 2016 (UTC)
Hang on. http://doi.dx.org/10.1093/applin/14.2.188 (Metacognitive and Other Knowledge about the Mental Lexicon) contains a detailed analysis of the number of words in the OED (1st and 2nd ed.), Webster's Third New International, Webster's Unabridged, and the Random House Unabridged. There's some secondary material about what counts as a "word" for different purposes, and the relation between dictionary size and the number of words in a language (with estimates from several scholars for the latter). This makes me wonder if List of dictionaries by number of words, or at least List of English-language dictionaries by number of words, meets NLIST. Not sure yet. There's also a whole journal called "Dictionaries: Journal of the Dictionary Society of North America", which I'd missed and which looks promising. FourViolas (talk) 04:23, 1 September 2016 (UTC)
The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made on the appropriate discussion page (such as the article's talk page or in a deletion review). No further edits should be made to this page.