Big data are created and collected by companies and governments for purposes other than research. Using these data for research therefore requires repurposing.
The first way that many people encounter social research in the digital age is through what is often called big data. Despite the widespread use of this term, there is no consensus about what big data even is. Still, one of the most common definitions of big data focuses on the "3 Vs": Volume, Variety, and Velocity. Roughly, there is a lot of data, in a variety of formats, and it is being created constantly. Some fans of big data also add other "Vs" such as Veracity and Value, whereas some critics add "Vs" such as Vague and Vacuous. Rather than the 3 "Vs" (or the 5 "Vs" or the 7 "Vs"), for the purposes of social research I think a better place to start is the 5 "Ws": Who, What, Where, When, and Why. In fact, I think that many of the challenges and opportunities created by big data sources follow from just one "W": Why.
In the analog age, most of the data used for social research were created for the purpose of doing research. In the digital age, however, a huge amount of data is being created by companies and governments for purposes other than research, such as providing services, generating profit, and administering laws. Creative people have realized that this corporate and government data can be repurposed for research. Thinking back to the art analogy in chapter 1, just as Duchamp repurposed a found object to create art, scientists can now repurpose found data to create research.
While there are huge opportunities for repurposing, using data that were not created for the purposes of research also presents new challenges. Compare, for example, a social media service, such as Twitter, with a traditional public opinion survey, such as the General Social Survey. Twitter's main goals are to provide a service to its users and to make a profit. The General Social Survey, on the other hand, is focused on creating general-purpose data for social research, particularly for public opinion research. This difference in goals means that the data created by Twitter and the data created by the General Social Survey have different properties, even though both can be used to study public opinion. Twitter operates at a scale and speed that the General Social Survey cannot match, but, unlike the General Social Survey, Twitter does not carefully sample users and does not work hard to maintain comparability over time. Because these two data sources are so different, it does not make sense to say that the General Social Survey is better than Twitter or vice versa. If you want hourly measures of global mood (e.g., Golder and Macy (2011)), Twitter is the better choice. On the other hand, if you want to understand long-term changes in the polarization of attitudes in the United States (e.g., DiMaggio, Evans, and Bryson (1996)), then the General Social Survey is the better choice. More generally, rather than trying to argue that big data sources are better or worse than other types of data, this chapter will try to clarify for which kinds of research questions big data sources have attractive properties and for which kinds of questions they might not be ideal.
When thinking about big data sources, many researchers immediately focus on online data created and collected by companies, such as search engine logs and social media posts. This narrow focus, however, leaves out two other important sources of big data. First, corporate big data sources increasingly come from digital devices in the physical world. For example, in this chapter I will tell you about a study that repurposed supermarket checkout data to examine how a worker's productivity is affected by the productivity of her peers (Mas and Moretti 2009). Then, in later chapters, I will tell you about researchers who used call records from mobile phones (Blumenstock, Cadamuro, and On 2015) and billing data created by electric utilities (Allcott 2015). As these examples illustrate, corporate big data sources are about more than just online behavior.
The second important source of big data missed by a narrow focus on online behavior is data created by governments. These government data, which researchers call government administrative records, include things such as tax records, school records, and vital statistics records (e.g., registries of births and deaths). Governments have been creating these kinds of data for, in some cases, hundreds of years, and social scientists have been exploiting them for nearly as long as there have been social scientists. What has changed, however, is digitization, which has made it dramatically easier for governments to collect, transmit, store, and analyze data. For example, in this chapter I will tell you about a study that repurposed data from New York City government taxi meters in order to address a fundamental debate in labor economics (Farber 2015). Then, in later chapters, I will tell you about how government-collected voting records were used in a survey (Ansolabehere and Hersh 2012) and in an experiment (Bond et al. 2012).
I think the idea of repurposing is fundamental to learning from big data sources, and so, before talking more specifically about the properties of big data sources (section 2.3) and how these can be used in research (section 2.4), I would like to offer two pieces of general advice about repurposing. First, it can be tempting to think about the contrast I have set up as being between "found" data and "designed" data. That is close, but it is not quite right. Even though, from the perspective of researchers, big data sources are "found," they do not just fall from the sky. Instead, data sources that are "found" by researchers were designed by someone for some purpose. Because "found" data are designed by someone, I always recommend that you try to understand as much as possible about the people and processes that created your data. Second, when you are repurposing data, it is often extremely helpful to imagine the ideal dataset for your problem and then compare that ideal dataset with the one you are actually using. If you did not collect your data yourself, there are likely to be important differences between what you want and what you have. Noticing these differences will help clarify what you can and cannot learn from the data you have, and it might suggest new data that you should collect.
In my experience, social scientists and data scientists tend to approach repurposing very differently. Social scientists, who are accustomed to working with data designed for research, are typically quick to point out the problems with repurposed data while ignoring its strengths. Data scientists, on the other hand, are typically quick to point out the benefits of repurposed data while ignoring its weaknesses. Naturally, the best approach is a hybrid. That is, researchers need to understand the characteristics of big data sources, both good and bad, and then figure out how to learn from them. That is the plan for the remainder of this chapter. In the next section, I will describe ten common characteristics of big data sources. Then, in the section after that, I will describe three research approaches that can work well with such data.