2.3.2.1 Incomplete

No matter how "big" your "big data" is, it probably doesn't have the information you want.

Most big data sources are incomplete, in the sense that they don't have the information you will want for your research. This is a common feature of data that were created for purposes other than research. Many social scientists have already had the experience of dealing with incompleteness, such as an existing survey that didn't ask the question you wanted. Unfortunately, the problems of incompleteness tend to be more extreme in big data. In my experience, big data tends to be missing three types of information useful for social research: demographics, behavior on other platforms, and data to operationalize theoretical constructs.

All three of these forms of incompleteness are illustrated in a study by Gueorgi Kossinets and Duncan Watts (2006) about the evolution of the social network at a university. Kossinets and Watts started with the email logs from the university, which had precise information about who sent emails to whom at what time (the researchers did not have access to the content of the emails). These email records sound like an amazing dataset, but despite their size and granularity they are fundamentally incomplete. For example, the email logs do not include data about the demographic characteristics of the students, such as gender and age. Further, the email logs do not include information about communication through other media, such as phone calls, text messages, or face-to-face conversations. Finally, the email logs do not directly include information about relationships, the theoretical construct in many existing theories. Later in this chapter, in the discussion of research strategies, you will see how Kossinets and Watts solved these problems.

Of the three kinds of incompleteness, the problem of incomplete data to operationalize theoretical constructs is the hardest to solve, and in my experience it is often accidentally overlooked by data scientists. Roughly, theoretical constructs are abstract ideas that social scientists study, but, unfortunately, these constructs cannot always be unambiguously defined and measured. For example, imagine trying to empirically test the apparently simple claim that people who are more intelligent earn more money. To test this claim you would need to measure "intelligence." But what is intelligence? For example, Gardner (2011) argued that there are actually eight different forms of intelligence. And are there procedures that could accurately measure any of these forms of intelligence? Despite an enormous amount of work by psychologists, these questions still don't have unambiguous answers. Thus, even a relatively simple claim (people who are more intelligent earn more money) can be hard to assess empirically because it can be hard to operationalize theoretical constructs in data. Other examples of theoretical constructs that are important but hard to operationalize include "norms," "social capital," and "democracy." Social scientists call the match between theoretical constructs and data construct validity (Cronbach and Meehl 1955). As this list of constructs suggests, construct validity is a problem that social scientists have struggled with for a very long time, even when working with data that were collected for the purpose of research. When working with data collected for purposes other than research, the problems of construct validity are even more challenging (Lazer 2015).

When you are reading a research paper, one quick and useful way to assess concerns about construct validity is to take the main claim in the paper, which is usually expressed in terms of constructs, and re-express it in terms of the data used. For example, consider two hypothetical studies that claim to show that more intelligent people earn more money:

  • Study 1: people who score well on the Raven Progressive Matrices Test, a well-studied test of analytic intelligence (Carpenter, Just, and Shell 1990), have higher reported incomes on their tax returns
  • Study 2: people on Twitter who used longer words are more likely to mention luxury brands

In both cases, researchers could assert that they have shown that more intelligent people earn more money. But in the first study the theoretical constructs are well operationalized by the data, and in the second they are not. Further, as this example illustrates, more data does not automatically solve problems with construct validity. You should doubt the results of Study 2 whether it involved a million tweets, a billion tweets, or a trillion tweets. For researchers not familiar with the idea of construct validity, Table 2.2 provides some examples of studies that have operationalized theoretical constructs using digital trace data.

Table 2.2: Examples of digital traces used as measures of more abstract theoretical concepts. Social scientists call this match construct validity, and it is a major challenge in using big data sources for social research (Lazer 2015).
Digital trace | Theoretical construct | Citation
Email logs from a university (meta-data only) | Social relationships | Kossinets and Watts (2006), Kossinets and Watts (2009), De Choudhury et al. (2010)
Social media posts on Weibo | Civic engagement | Zhang (2016)
Email logs from a firm (meta-data and complete text) | Cultural fit in an organization | Goldberg et al. (2015)

Although the problem of incomplete data for operationalizing theoretical constructs is hard to solve, there are three common solutions to the problems of incomplete demographic information and incomplete information on behavior on other platforms. The first is to actually collect the data you need; I will describe an example of that in Chapter 3, which covers surveys. Unfortunately, this kind of data collection is not always possible. The second main solution is to do what data scientists call user-attribute inference and what social scientists call imputation. In this approach, researchers use the information they have on some people to infer attributes of other people. The third possible solution, the one used by Kossinets and Watts, is to combine multiple data sources. This process is sometimes called merging or record linkage. My favorite metaphor for this process was proposed in the very first paragraph of the very first paper ever written on record linkage (Dunn 1946):

"Each person in the world creates a Book of Life. This Book starts with birth and ends with death. Its pages are made up of the records of the principal events in life. Record linkage is the name given to the process of assembling the pages of this Book into a volume."

This passage was written in 1946, and at that time people were thinking that the Book of Life could include major life events such as birth, marriage, divorce, and death. However, now that so much information about people is recorded, the Book of Life could be an incredibly detailed portrait, if those different pages (that is, our digital traces) can be bound together. This Book of Life could be a great resource for researchers. But the Book of Life could also be called a database of ruin (Ohm 2010), which could be used for all kinds of unethical purposes, as described below in the discussion of the sensitive nature of the information collected by big data sources and in Chapter 6 (Ethics).
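
To make the idea of record linkage concrete, here is a minimal sketch in Python. It is not the procedure Kossinets and Watts used; the identifiers, the field names, and the `link_records` helper are all hypothetical, and the sketch assumes that both sources share an exact, clean key. Real record linkage often has to cope with missing or inconsistent identifiers.

```python
# Hypothetical email log (meta-data only): sender, recipient, timestamp.
email_log = [
    ("u1", "u2", "2004-09-01T10:00"),
    ("u1", "u3", "2004-09-01T11:30"),
    ("u2", "u1", "2004-09-02T09:15"),
    ("u4", "u1", "2004-09-03T14:45"),
]

# Hypothetical second source: demographic records keyed by the same identifier.
registrar = {
    "u1": {"gender": "F", "age": 20},
    "u2": {"gender": "M", "age": 22},
    "u3": {"gender": "F", "age": 19},
    # "u4" is deliberately absent to show an unmatched record.
}


def link_records(log, attributes):
    """Attach sender and recipient attributes to each log entry by exact key match."""
    linked = []
    for sender, recipient, sent_at in log:
        linked.append({
            "sender": sender,
            "recipient": recipient,
            "sent_at": sent_at,
            # dict.get returns None when the key has no matching record.
            "sender_attrs": attributes.get(sender),
            "recipient_attrs": attributes.get(recipient),
        })
    return linked


linked = link_records(email_log, registrar)

# Senders that could not be linked remain incomplete after the merge.
unmatched = sorted({row["sender"] for row in linked if row["sender_attrs"] is None})
print(unmatched)  # prints ['u4']
```

Even in this toy case, linkage reduces incompleteness without eliminating it: the sender with no matching record still lacks demographic information, which is where imputation or additional data collection would have to come in.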