Large datasets are a means to an end; they are not an end in themselves.
The first of the three good characteristics of big data is the most discussed: these data are big. These data sources can be big in three different ways: many people, lots of information per person, or many observations over time. Having a big dataset enables some specific types of research: measuring heterogeneity, studying rare events, detecting small differences, and making causal estimates from observational data. It also seems to lead to a specific type of sloppiness.
The first thing for which size is particularly useful is moving beyond averages to make estimates for specific subgroups. For example, Gary King, Jennifer Pan, and Molly Roberts (2013) measured the probability that social media posts in China would be censored by the government. By itself, this average probability of deletion is not very helpful for understanding why the government censors some posts but not others. But because their dataset included 11 million posts, King and colleagues also produced estimates of the probability of censorship for posts in 85 separate categories (e.g., pornography, Tibet, and traffic in Beijing). By comparing the probability of censorship for posts in different categories, they were able to understand more about how and why the government censors certain types of posts. With 11 thousand posts (rather than 11 million posts), they would not have been able to produce these category-specific estimates.
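To make the subgroup argument concrete, here is a minimal sketch of how the precision of a per-category estimate depends on the total number of posts. It is an illustration only, not the method King and colleagues used: the even split of posts across categories and the censorship rate `ASSUMED_RATE` are hypothetical values chosen for the example; only the 85 categories and the two dataset sizes come from the text.

```python
import math

N_CATEGORIES = 85      # number of categories, from the text
ASSUMED_RATE = 0.10    # hypothetical censorship rate, for illustration only


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent posts."""
    return math.sqrt(p * (1.0 - p) / n)


def per_category_se(total_posts: int) -> float:
    """Standard error of one category's estimated censorship rate.

    Assumption: posts are split evenly across the categories, which
    real data would almost certainly not satisfy.
    """
    posts_per_category = total_posts // N_CATEGORIES
    return proportion_standard_error(ASSUMED_RATE, posts_per_category)


se_large = per_category_se(11_000_000)  # 11 million posts
se_small = per_category_se(11_000)      # 11 thousand posts

print(f"per-category SE with 11 million posts:  {se_large:.5f}")
print(f"per-category SE with 11 thousand posts: {se_small:.5f}")
print(f"ratio: {se_small / se_large:.1f}")
```

Under these assumptions, the smaller dataset leaves only about 130 posts per category, and each category-level estimate is roughly 30 times less precise, which makes comparisons between categories far harder to trust.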
Second, size is particularly useful for studying rare events. For example, Goel and colleagues (2015) wanted to study the different ways that tweets can go viral. Because large cascades of retweets are extremely rare (about one in 3,000), they needed to study more than a billion tweets in order to find enough large cascades for their analysis.
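The arithmetic behind the rare-event argument can be sketched as follows. The rate of one large cascade in 3,000 comes from the text; the two dataset sizes are illustrative assumptions, and the function name is hypothetical.

```python
LARGE_CASCADE_RATE = 1 / 3000  # about one in 3,000, from the text


def expected_rare_events(total_events: int, rate: float) -> float:
    """Expected number of rare events among total_events observations."""
    return total_events * rate


# Illustrative dataset sizes (assumptions): one million and one billion.
for total in (1_000_000, 1_000_000_000):
    expected = expected_rare_events(total, LARGE_CASCADE_RATE)
    print(f"{total:>13,} tweets -> about {expected:,.0f} large cascades")
```

A dataset of a million tweets would be expected to contain only a few hundred large cascades, which leaves little room for comparing different kinds of cascade with one another.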
Third, large datasets enable researchers to detect small differences. In fact, much of the focus on big data in industry is about these small differences: reliably detecting the difference between 1% and 1.1% click-through rates on an ad can translate into millions of dollars in extra revenue. In some scientific settings, such small differences might not be particularly important (even if they are statistically significant). But in some policy settings, such small differences can become important when viewed in aggregate. For example, if there are two public health interventions and one is slightly more effective than the other, then switching to the more effective intervention could end up saving thousands of additional lives.
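To give a sense of how much data "reliably detecting" such a difference takes, here is a rough sketch using the standard normal-approximation sample size formula for comparing two proportions. The click-through rates of 1% and 1.1% come from the text; the significance level of 0.05 and the power of 0.80 are conventional values assumed for the example.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p1: float, p2: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate observations needed per group for a two-sided
    two-proportion z-test to detect the difference between p1 and p2.

    Uses n = (z_{alpha/2} + z_power)^2 * (p1(1-p1) + p2(1-p2)) / (p1-p2)^2.
    alpha and power are assumed conventions, not values from the text.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance_sum / (p1 - p2) ** 2)


n_needed = sample_size_per_group(0.010, 0.011)
print(f"ad impressions needed per group: about {n_needed:,}")
```

Under these assumptions the answer is on the order of 160,000 impressions per group, which is far beyond the scale of a typical survey or lab experiment but routine for a large online platform.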
Finally, large datasets greatly increase our ability to make causal estimates from observational data. Although large datasets don't fundamentally change the problems with making causal inferences from observational data, matching and natural experiments (two techniques that researchers have developed for making causal claims from observational data) both benefit greatly from large datasets. I'll explain and illustrate this claim in greater detail later in this chapter, when I describe research strategies.
Although bigness is generally a good property when used correctly, I've noticed that bigness commonly leads to a conceptual error. For some reason, bigness seems to lead researchers to ignore how their data were generated. While bigness does reduce the need to worry about random error, it actually increases the need to worry about systematic errors, the kinds of errors that I'll describe in more detail below and that arise from biases in how data are created and collected. In a small dataset, both random error and systematic error can be important, but in a large dataset random error can be averaged away and systematic error dominates. Researchers who don't think about systematic error will end up using their large datasets to get a precise estimate of the wrong thing; they will be precisely inaccurate (McFarland and McFarland 2015).
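The contrast between random and systematic error can be sketched with a small calculation. This is an illustration under stated assumptions, not an analysis from the text: `TRUE_RATE` is a hypothetical true proportion in the population of interest, and `BIASED_RATE` is the hypothetical proportion in the population the data source actually captures.

```python
import math

TRUE_RATE = 0.50    # hypothetical true proportion in the target population
BIASED_RATE = 0.55  # hypothetical proportion among those the data source captures


def error_components(n: int, true_rate: float, biased_rate: float):
    """Return (random_error, systematic_error, rmse) for a sample proportion.

    random_error is the standard error of the estimate; it shrinks as
    1 / sqrt(n). systematic_error is the bias; it does not depend on n.
    rmse combines the two as sqrt(random_error^2 + systematic_error^2).
    """
    random_error = math.sqrt(biased_rate * (1.0 - biased_rate) / n)
    systematic_error = abs(biased_rate - true_rate)
    rmse = math.sqrt(random_error ** 2 + systematic_error ** 2)
    return random_error, systematic_error, rmse


small = error_components(100, TRUE_RATE, BIASED_RATE)
large = error_components(1_000_000, TRUE_RATE, BIASED_RATE)

for label, (rand_err, sys_err, rmse) in (("n = 100", small),
                                         ("n = 1,000,000", large)):
    print(f"{label:>14}: random {rand_err:.4f}, "
          f"systematic {sys_err:.4f}, total {rmse:.4f}")
```

With 100 observations the two sources of error are of similar size. With a million observations the random error all but vanishes, yet the total error barely moves, because the bias is untouched by sample size: the estimate is very precise and still wrong by the same amount.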