Further commentary

This section is designed to be used as a reference, rather than to be read as a narrative.

  • Introduction (Section 2.1)

One kind of observing that is not included in this chapter is ethnography. For more on ethnography in digital spaces, see Boellstorff et al. (2012), and for more on ethnography in mixed digital and physical spaces, see Lane (2016).

  • Big data (Section 2.2)

When you are repurposing data, there are two mental tricks that can help you understand the problems you might encounter. First, you can try to imagine the ideal dataset for your problem and compare it to the dataset that you are using. How are they similar, and how are they different? If you did not collect your data yourself, there are likely to be differences between what you want and what you have. But you have to decide whether these differences are minor or major.

Second, remember that someone created and collected your data for some reason. You should try to understand their reasoning. This kind of reverse-engineering can help you identify possible problems and biases in your repurposed data.

There is no single consensus definition of "big data", but many definitions seem to focus on the 3 Vs: volume, variety, and velocity (e.g., Japec et al. (2015)). Rather than focusing on the characteristics of the data, my definition focuses more on why the data were created.

My inclusion of government administrative data inside the category of big data is a bit unusual. Others who have made this case include Legewie (2015), Connelly et al. (2016), and Einav and Levin (2014). For more about the value of government administrative data for research, see Card et al. (2010), Taskforce (2012), and Grusky, Smeeding, and Snipp (2015).

For a view of administrative research from inside the government statistical system, particularly the US Census Bureau, see Jarmin and O'Hara (2016). For a book-length treatment of administrative records research at Statistics Sweden, see Wallgren and Wallgren (2007).

In the chapter, I briefly compared a traditional survey such as the General Social Survey (GSS) to a social media data source such as Twitter. For a thorough and careful comparison between traditional surveys and social media data, see Schober et al. (2016).

  • Common characteristics of big data (Section 2.3)

These 10 characteristics of big data have been described in a variety of ways by a variety of authors. Writing that influenced my thinking on these issues includes: Lazer et al. (2009), Groves (2011), Howison, Wiggins, and Crowston (2011), boyd and Crawford (2012), Taylor (2013), Mayer-Schönberger and Cukier (2013), Golder and Macy (2014), Ruths and Pfeffer (2014), Tufekci (2014), Sampson and Small (2015), Lewis (2015), Lazer (2015), Horton and Tambe (2015), Japec et al. (2015), and Goldstone and Lupyan (2016).

Throughout this chapter, I have used the term digital traces, which I think is relatively neutral. Another popular term for digital traces is digital footprints (Golder and Macy 2014), but as Hal Abelson, Ken Ledeen, and Harry Lewis (2008) point out, a more appropriate term is probably digital fingerprints. When you create footprints, you are aware of what is happening, and your footprints cannot generally be traced to you personally. The same is not true of your digital traces. In fact, you are leaving traces all the time about which you have very little knowledge. And although these traces do not have your name on them, they can often be linked back to you. In other words, they are more like fingerprints: invisible and personally identifying.

Big

For more on why large datasets render statistical tests problematic, see Lin, Lucas, and Shmueli (2013) and McFarland and McFarland (2015). These issues should lead researchers to focus on practical significance rather than statistical significance.
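The gap between statistical and practical significance can be illustrated with a minimal sketch. It assumes a simple two-sample z-test with a known, equal standard deviation; the effect size, standard deviation, and sample sizes are hypothetical numbers chosen for illustration and are not taken from the works cited above.

```python
import math


def two_sample_z_test(mean_diff, sd, n_per_group):
    """Return (z, two-sided p-value) for a difference in means.

    Assumes two groups of equal size and a known, common standard deviation.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical effect: a difference of 0.01 standard deviations,
# which would rarely matter in practice.
effect = 0.01
sd = 1.0

for n in (1_000, 100_000, 10_000_000):
    z, p = two_sample_z_test(effect, sd, n)
    print(f"n per group = {n:>10,}  z = {z:6.2f}  p = {p:.3g}")
```

The effect is identical in every row; only the sample size changes. With enough observations the p-value becomes arbitrarily small, which is why the size of the effect, and not just the p-value, deserves attention.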

Always-on

When considering always-on data, it is important to consider whether you are comparing the exact same people over time or whether you are comparing some changing group of people; see, for example, Diaz et al. (2016).

Non-reactive

A classic book on non-reactive measures is Webb et al. (1966). The examples in the book pre-date the digital age, but they are still illuminating. For examples of people changing their behavior because of the presence of mass surveillance, see Penney (2016) and Brayne (2014).

Incomplete

For more on record linkage, see Dunn (1946) and Fellegi and Sunter (1969) (historical) and Larsen and Winkler (2014) (modern). Similar approaches have also been developed in computer science under names such as data deduplication, instance identification, name matching, duplicate detection, and duplicate record detection (Elmagarmid, Ipeirotis, and Verykios 2007). There are also privacy-preserving approaches to record linkage that do not require the transmission of personally identifying information (Schnell 2013). Facebook has also developed a procedure to link its records to voting behavior; this was done to evaluate an experiment that I'll tell you about in Chapter 4 (Bond et al. 2012; Jones et al. 2013).
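As a rough illustration of what record linkage involves, here is a minimal sketch using only the Python standard library. It is a simple rule-based linker (exact agreement on birth year plus fuzzy agreement on name), not the probabilistic model of Fellegi and Sunter (1969); the record fields, the example records, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop punctuation, and collapse repeated whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def name_similarity(a, b):
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(file_a, file_b, threshold=0.85):
    """Return (id_a, id_b, score) for candidate links between two files.

    Pairs must agree exactly on birth year (a crude form of blocking)
    and have a name similarity of at least `threshold`.
    """
    links = []
    for rec_a in file_a:
        for rec_b in file_b:
            if rec_a["birth_year"] != rec_b["birth_year"]:
                continue  # skip pairs that disagree on the blocking field
            score = name_similarity(rec_a["name"], rec_b["name"])
            if score >= threshold:
                links.append((rec_a["id"], rec_b["id"], round(score, 2)))
    return links


# Hypothetical records from two separate sources.
file_a = [
    {"id": "a1", "name": "Maria Gonzalez", "birth_year": 1980},
    {"id": "a2", "name": "John Smith", "birth_year": 1975},
]
file_b = [
    {"id": "b1", "name": "Maria Gonzales", "birth_year": 1980},
    {"id": "b2", "name": "Jon Smith", "birth_year": 1975},
    {"id": "b3", "name": "John Smith", "birth_year": 1990},
]

for id_a, id_b, score in link_records(file_a, file_b):
    print(id_a, "<->", id_b, "similarity", score)
```

Note that record b3 has an identical name to a2 but is not linked, because the birth years disagree. Real linkage systems weigh agreement on many fields at once and must handle the resulting false matches and missed matches explicitly.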

For more on construct validity, see Shadish, Cook, and Campbell (2001), Chapter 3.

Inaccessible

For more on the AOL search log debacle, see Ohm (2010). I offer advice about partnering with companies and governments in Chapter 4, when I describe experiments. A number of authors have expressed concerns about research that relies on inaccessible data; see Huberman (2012) and boyd and Crawford (2012).

One good way for university researchers to acquire data access is to work at a company as an intern or visiting researcher. In addition to enabling data access, this process will also help the researcher learn more about how the data were created, which is important for analysis.

Non-representative

Non-representativeness is a major problem for researchers and governments who wish to make statements about an entire population. It is less of a concern for companies, which are typically focused on their users. For more on how Statistics Netherlands considers the issue of non-representativeness of big data, see Buelens et al. (2014).

In Chapter 3, I'll describe sampling and estimation in much greater detail. Even if data are non-representative, under certain conditions they can be weighted to produce good estimates.
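One common form of such weighting is post-stratification, sketched below under strong assumptions: the population share of each group is known, and within each group the sampled people resemble the unsampled ones. The group labels, shares, and outcome values are hypothetical.

```python
def poststratified_mean(sample, population_shares):
    """Weight each group's sample mean by its known population share.

    sample: list of (group, value) pairs.
    population_shares: dict mapping group -> share of the population
    (shares are assumed to sum to 1).
    """
    totals, counts = {}, {}
    for group, value in sample:
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1

    estimate = 0.0
    for group, share in population_shares.items():
        if group not in counts:
            raise ValueError(f"no sample observations for group {group!r}")
        estimate += share * (totals[group] / counts[group])
    return estimate


# Hypothetical sample: 80 younger and 20 older respondents, although the
# population is assumed to be split evenly between the two groups.
sample = (
    [("young", 1.0)] * 60 + [("young", 0.0)] * 20
    + [("old", 1.0)] * 5 + [("old", 0.0)] * 15
)
population_shares = {"young": 0.5, "old": 0.5}

raw_mean = sum(value for _, value in sample) / len(sample)
weighted_mean = poststratified_mean(sample, population_shares)
print(f"unweighted mean = {raw_mean:.2f}")
print(f"post-stratified mean = {weighted_mean:.2f}")
```

The unweighted mean is pulled toward the over-represented group, while the post-stratified estimate corrects for this. The correction is only as good as the assumption that group membership captures the ways in which the sample differs from the population.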

Drifting

System drift is very hard to see from the outside. However, the MovieLens project (discussed more in Chapter 4) has been run for more than 15 years by an academic research group. As a result, they have documented and shared information about the way that the system has evolved over time and how this might affect analysis (Harper and Konstan 2015).

A number of scholars have focused on drift in Twitter: Liu, Kliman-Silver, and Mislove (2014) and Tufekci (2014).

Algorithmically confounded

I first heard the term "algorithmically confounded" used by Jon Kleinberg in a talk. The main idea behind performativity is that some social science theories are "engines not cameras" (Mackenzie 2008). That is, they actually shape the world rather than just capture it.

Dirty

Governmental statistical agencies call data cleaning "statistical data editing". De Waal, Puts, and Daas (2014) describe statistical data editing techniques developed for survey data and examine the extent to which they are applicable to big data sources, and Puts, Daas, and Waal (2015) present some of the same ideas for a more general audience.

For some examples of studies focused on spam in Twitter, see Clark et al. (2016) and Chu et al. (2012). Finally, Subrahmanian et al. (2016) describe the results of the DARPA Twitter Bot Challenge.

Sensitive

Ohm (2015) reviews earlier research on the idea of sensitive information and offers a multi-factor test. The four factors he proposes are: the magnitude of harm; the probability of harm; the presence of a confidential relationship; and whether the risk reflects majoritarian concerns.

  • Counting things (Section 2.4.1)

Farber's study of taxis in New York was based on an earlier study by Camerer et al. (1997) that used three different convenience samples of paper trip sheets (paper forms used by drivers to record each trip's start time, end time, and fare). This earlier study found that drivers seemed to be target earners: they worked less on days when their wages were higher.

Kossinets and Watts (2009) focused on the origins of homophily in social networks. See Wimmer and Lewis (2010) for a different approach to the same problem, one that uses data from Facebook.

In subsequent work, King and colleagues have further explored online censorship in China (King, Pan, and Roberts 2014; King, Pan, and Roberts 2016). For a related approach to measuring online censorship in China, see Bamman, O'Connor, and Smith (2012). For more on statistical methods like the one used by King, Pan, and Roberts (2013) to estimate the sentiment of the 11 million posts, see Hopkins and King (2010). For more on supervised learning, see James et al. (2013) (less technical) and Hastie, Tibshirani, and Friedman (2009) (more technical).

  • Forecasting (Section 2.4.2)

Forecasting is a big part of industrial data science (Mayer-Schönberger and Cukier 2013; Provost and Fawcett 2013). One type of forecasting commonly done by social researchers is demographic forecasting; see, for example, Raftery et al. (2012).

Google Flu Trends was not the first project to use search data to nowcast influenza prevalence. In fact, researchers in the United States (Polgreen et al. 2008; Ginsberg et al. 2009) and Sweden (Hulth, Rydevik, and Linde 2009) found that certain search terms (e.g., "flu") predicted national public health surveillance data before the data were released. Subsequently, many other projects have tried to use digital trace data for disease surveillance; see Althouse et al. (2015) for a review.

In addition to using digital trace data to predict health outcomes, there has also been a huge amount of work using Twitter data to predict election outcomes; for reviews, see Gayo-Avello (2011), Gayo-Avello (2013), Jungherr (2015) (Chapter 7), and Huberty (2015).

Using search data to predict influenza prevalence and using Twitter data to predict elections are both examples of using some kind of digital trace to predict some kind of event in the world. There are an enormous number of studies that have this general structure. Table 2.5 includes a few other examples.

Table 2.5: Partial list of studies that use some digital trace to predict some event.

Digital trace   Outcome                                                    Citation
Twitter         Box office revenue of movies in the US                     Asur and Huberman (2010)
Search logs     Sales of movies, music, books, and video games in the US   Goel et al. (2010)
Twitter         Dow Jones Industrial Average (US stock market)             Bollen, Mao, and Zeng (2011)

  • Approximating experiments (Section 2.4.3)

The journal PS: Political Science & Politics had a symposium on big data, causal inference, and formal theory, and Clark and Golder (2015) summarize each contribution. The journal Proceedings of the National Academy of Sciences of the United States of America had a symposium on causal inference and big data, and Shiffrin (2016) summarizes each contribution.

In terms of natural experiments, Dunning (2012) provides an excellent book-length treatment. For more on using the Vietnam draft lottery as a natural experiment, see Berinsky and Chatfield (2015). For machine learning approaches that attempt to automatically discover natural experiments inside big data sources, see Jensen et al. (2008) and Sharma, Hofman, and Watts (2015).

In terms of matching, for an optimistic review, see Stuart (2010), and for a pessimistic review, see Sekhon (2009). For more on matching as a kind of pruning, see Ho et al. (2007). For books that provide excellent treatments of matching, see Rosenbaum (2002), Rosenbaum (2009), Morgan and Winship (2014), and Imbens and Rubin (2015).
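The idea of matching as a kind of pruning can be illustrated with a minimal sketch. It assumes greedy one-to-one nearest-neighbor matching on a single covariate with a caliper, which is only one of many matching procedures discussed in the works above; the unit identifiers, covariate values, and caliper are hypothetical.

```python
def match_with_caliper(treated, control, caliper):
    """Greedy 1-to-1 nearest-neighbor matching on one covariate.

    treated, control: lists of (unit_id, covariate) pairs.
    Each control unit is used at most once. Treated units with no
    available control within `caliper` are pruned, as are all
    control units that end up unmatched.
    """
    available = dict(control)  # control units not yet matched
    pairs = []
    for t_id, t_x in treated:
        if not available:
            break
        # Closest remaining control unit on the covariate.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_x))
        if abs(available[c_id] - t_x) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Hypothetical units, matched on age in years.
treated = [("t1", 30), ("t2", 45), ("t3", 70)]
control = [("c1", 29), ("c2", 44), ("c3", 52), ("c4", 18)]

pairs = match_with_caliper(treated, control, caliper=3)
print(pairs)
```

Here t3 is pruned because no control unit is within three years of age 70, and c3 and c4 are pruned because they are never selected. The analysis then proceeds on the retained pairs only, which trades sample size for comparability.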