Further commentary

This section is designed to be used as a reference, rather than to be read as a narrative.

  • Introduction (Section 2.1)

One kind of observing that is not included in this chapter is ethnography. For more on ethnography in digital spaces, see Boellstorff et al. (2012), and for more on ethnography in mixed digital and physical spaces, see Lane (2016).

  • Big data (Section 2.2)

When you are repurposing data, there are two mental tricks that can help you understand the problems you might encounter. First, you can try to imagine the ideal dataset for your problem and compare it to the dataset that you are using. How are they similar and how are they different? If you didn't collect your data yourself, there are likely to be differences between what you want and what you have. But you have to decide whether these differences are minor or major.

Second, remember that someone created and collected your data for some reason. You should try to understand their reasoning. This kind of reverse-engineering can help you identify possible problems and biases in your repurposed data.

There is no single consensus definition of "big data", but many definitions seem to focus on the 3 Vs: volume, variety, and velocity (e.g., Japec et al. (2015)). Rather than focusing on the characteristics of the data, my definition focuses more on why the data was created.

My inclusion of government administrative data inside the category of big data is a bit unusual. Others who have made this case include Legewie (2015), Connelly et al. (2016), and Einav and Levin (2014). For more about the value of government administrative data for research, see Card et al. (2010), Taskforce (2012), and Grusky, Smeeding, and Snipp (2015).

For a view of administrative research from inside the government statistical system, particularly the US Census Bureau, see Jarmin and O'Hara (2016). For a book-length treatment of administrative records research at Statistics Sweden, see Wallgren and Wallgren (2007).

In the chapter, I briefly compared a traditional survey such as the General Social Survey (GSS) to a social media data source such as Twitter. For a thorough and careful comparison between traditional surveys and social media data, see Schober et al. (2016).

  • Common characteristics of big data (Section 2.3)

These 10 characteristics of big data have been described in a variety of different ways by a variety of different authors. Writing that influenced my thinking on these issues includes: Lazer et al. (2009), Groves (2011), Howison, Wiggins, and Crowston (2011), boyd and Crawford (2012), Taylor (2013), Mayer-Schönberger and Cukier (2013), Golder and Macy (2014), Ruths and Pfeffer (2014), Tufekci (2014), Sampson and Small (2015), Lewis (2015), Lazer (2015), Horton and Tambe (2015), Japec et al. (2015), and Goldstone and Lupyan (2016).

Throughout this chapter, I've used the term digital traces, which I think is relatively neutral. Another popular term for digital traces is digital footprints (Golder and Macy 2014), but as Hal Abelson, Ken Ledeen, and Harry Lewis (2008) point out, a more appropriate term is probably digital fingerprints. When you create footprints, you are aware of what is happening, and your footprints cannot generally be traced to you personally. The same is not true for your digital traces. In fact, you are leaving traces all the time about which you have very little knowledge. And although these traces don't have your name on them, they can often be linked back to you. In other words, they are more like fingerprints: invisible and personally identifying.

Big

For more on why large datasets render statistical tests problematic, see Lin, Lucas, and Shmueli (2013) and McFarland and McFarland (2015). These issues should lead researchers to focus on practical significance rather than statistical significance.
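The point about practical versus statistical significance can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: a hypothetical difference in means of 0.01 standard deviations, two equally sized groups, a known standard deviation, and a simple two-sided z-test. The function name and the numbers are invented for this example and do not come from the cited papers.

```python
import math

def two_sample_p_value(diff, sd, n_per_group):
    """Two-sided p-value of a z-test for a difference in means.

    Assumes two groups of equal size and a known, common standard deviation.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = diff / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# The same tiny (hypothetical) effect, evaluated at growing sample sizes.
for n in (100, 10_000, 1_000_000):
    print(n, two_sample_p_value(diff=0.01, sd=1.0, n_per_group=n))
```

With the effect held fixed, the p-value shrinks as the groups grow, so a very large dataset can make a substantively negligible difference look "significant."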

Always-on

When considering always-on data, it is important to consider whether you are comparing the exact same people over time or whether you are comparing some changing group of people; see, for example, Diaz et al. (2016).

Non-reactive

A classic book on non-reactive measures is Webb et al. (1966). The examples in the book pre-date the digital age, but they are still illuminating. For examples of people changing their behavior because of the presence of mass surveillance, see Penney (2016) and Brayne (2014).

Incomplete

For more on record linkage, see Dunn (1946) and Fellegi and Sunter (1969) (historical) and Larsen and Winkler (2014) (modern). Similar approaches have also been developed in computer science under names such as data deduplication, instance identification, name matching, duplicate detection, and duplicate record detection (Elmagarmid, Ipeirotis, and Verykios 2007). There are also privacy-preserving approaches to record linkage that do not require the transmission of personally identifying information (Schnell 2013). Facebook has also developed a process to link its records to voting behavior; this was done to evaluate an experiment that I'll tell you about in Chapter 4 (Bond et al. 2012; Jones et al. 2013).
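To give a rough sense of how probabilistic record linkage scores a candidate pair of records, here is a minimal sketch in the spirit of the Fellegi and Sunter framework. It is not the procedure used by any of the authors cited above. The field names and the m- and u-probabilities (the assumed chance that a field agrees for a true match and for a non-match, respectively) are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical (m, u) probabilities per field:
#   m = P(field agrees | records refer to the same person)
#   u = P(field agrees | records refer to different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "city": (0.80, 0.20),
}

def match_weight(record_a, record_b, params=FIELD_PARAMS):
    """Sum of log2 likelihood ratios over fields; higher means more likely a match."""
    weight = 0.0
    for field, (m, u) in params.items():
        if record_a[field] == record_b[field]:
            weight += math.log2(m / u)              # agreement adds evidence
        else:
            weight += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return weight

a = {"surname": "Garcia", "birth_year": 1980, "city": "Austin"}
b = {"surname": "Garcia", "birth_year": 1980, "city": "Dallas"}
print(match_weight(a, b))
```

In practice, pairs scoring above an upper threshold would be treated as links, pairs below a lower threshold as non-links, and those in between sent for manual review; the thresholds themselves would have to be chosen for the application.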

For more on construct validity, see Shadish, Cook, and Campbell (2001), Chapter 3.

Inaccessible

For more on the AOL search log debacle, see Ohm (2010). I offer advice about partnering with companies and governments in Chapter 4, when I describe experiments. A number of authors have expressed concerns about research that relies on inaccessible data; see Huberman (2012) and boyd and Crawford (2012).

One good way for university researchers to acquire data access is to work at a company as an intern or visiting researcher. In addition to enabling data access, this process will also help the researcher learn more about how the data was created, which is important for analysis.

Non-representative

Non-representativeness is a major problem for researchers and governments who wish to make statements about an entire population. This is less of a concern for companies, which are typically focused on their users. For more on how Statistics Netherlands considers the issue of non-representativeness of business big data, see Buelens et al. (2014).

In Chapter 3, I'll describe sampling and estimation in much greater detail. Even if data are non-representative, under certain conditions they can be weighted to produce good estimates.
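One common form of such weighting is post-stratification: estimate the outcome within each group, then combine the group estimates using known population shares. The sketch below uses made-up data and hypothetical group labels and shares. It assumes that every group appears in the sample and that, within a group, sampled people resemble unsampled ones, which is the kind of condition under which weighting can help.

```python
from collections import defaultdict

def poststratified_mean(sample, population_shares):
    """Weight each group's sample mean by that group's share of the population.

    sample: iterable of (group, value) pairs.
    population_shares: dict mapping group -> share of the population (sums to 1).
    Assumes every group in population_shares appears at least once in the sample.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in sample:
        totals[group] += value
        counts[group] += 1
    return sum(
        share * totals[group] / counts[group]
        for group, share in population_shares.items()
    )

# Made-up sample that over-represents "young" (8 of 10) relative to the population (half).
sample = [("young", 1), ("young", 1)] + [("young", 0)] * 6 + [("old", 1), ("old", 0)]
population_shares = {"young": 0.5, "old": 0.5}

unweighted = sum(value for _, value in sample) / len(sample)
weighted = poststratified_mean(sample, population_shares)
print(unweighted, weighted)
```

Here the unweighted mean is 0.3, while the post-stratified estimate is 0.375, because the under-sampled "old" group, which has the higher rate in this toy data, is given its full population share.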

Drifting

System drift is very hard to see from the outside. However, the MovieLens project (discussed more in Chapter 4) has been run for more than 15 years by an academic research group. Therefore, they have documented and shared information about the way that the system has evolved over time and how this might impact analysis (Harper and Konstan 2015).

A number of scholars have focused on drift in Twitter: Liu, Kliman-Silver, and Mislove (2014) and Tufekci (2014).

Algorithmically confounded

I first heard the term "algorithmically confounded" used by Jon Kleinberg in a talk. The main idea behind performativity is that some social science theories are "engines not cameras" (Mackenzie 2008). That is, they actually shape the world rather than just capture it.

Dirty

Governmental statistical agencies call data cleaning "statistical data editing." De Waal, Puts, and Daas (2014) describe statistical data editing techniques developed for survey data and examine the extent to which they are applicable to big data sources, and Puts, Daas, and Waal (2015) present some of the same ideas for a more general audience.

For some examples of studies focused on spam in Twitter, see Clark et al. (2016) and Chu et al. (2012). Finally, Subrahmanian et al. (2016) describe the results of the DARPA Twitter Bot Challenge.

Sensitive

Ohm (2015) reviews earlier research on the idea of sensitive information and offers a multi-factor test. The four factors he proposes are: the possibility of harm; the probability of harm; the presence of a confidential relationship; and whether the risk reflects majoritarian concerns.

  • Counting things (Section 2.4.1)

Farber's study of taxis in New York was based on an earlier study by Camerer et al. (1997) that used three different convenience samples of paper trip sheets, the paper forms used by drivers to record each trip's start time, end time, and fare. This earlier study found that drivers seemed to be target earners: they worked less on days when their wages were higher.

Kossinets and Watts (2009) focused on the origins of homophily in social networks. See Wimmer and Lewis (2010) for a different approach to the same problem, which uses data from Facebook.

In subsequent work, King and colleagues have further explored online censorship in China (King, Pan, and Roberts 2014; King, Pan, and Roberts 2016). For a related approach to measuring online censorship in China, see Bamman, O'Connor, and Smith (2012). For more on statistical methods like the one used in King, Pan, and Roberts (2013) to estimate the sentiment of the 11 million posts, see Hopkins and King (2010). For more on supervised learning, see James et al. (2013) (less technical) and Hastie, Tibshirani, and Friedman (2009) (more technical).

  • Forecasting (Section 2.4.2)

Forecasting is a big part of industrial data science (Mayer-Schönberger and Cukier 2013; Provost and Fawcett 2013). One type of forecasting that is commonly done by social researchers is demographic forecasting; see, for example, Raftery et al. (2012).

Google Flu Trends was not the first project to use search data to nowcast influenza prevalence. In fact, researchers in the United States (Polgreen et al. 2008; Ginsberg et al. 2009) and Sweden (Hulth, Rydevik, and Linde 2009) have found that certain search terms (e.g., "flu") predicted national public health surveillance data before it was released. Subsequently, many other projects have tried to use digital trace data for disease surveillance; see Althouse et al. (2015) for a review.

In addition to using digital trace data to predict health outcomes, there has also been a huge amount of work using Twitter data to predict election outcomes; for reviews, see Gayo-Avello (2011), Gayo-Avello (2013), Jungherr (2015) (Ch. 7), and Huberty (2015).

Using search data to predict influenza prevalence and using Twitter data to predict elections are both examples of using some kind of digital trace to predict some kind of event in the world. There are an enormous number of studies that have this general structure. Table 2.5 includes a few other examples.

Table 2.5: Partial list of studies that use some digital trace to predict some event.

Digital trace | Outcome | Citation
Twitter | Box office revenue of movies in the US | Asur and Huberman (2010)
Search logs | Sales of movies, music, books, and video games in the US | Goel et al. (2010)
Twitter | Dow Jones Industrial Average (US stock market) | Bollen, Mao, and Zeng (2011)

  • Approximating experiments (Section 2.4.3)

The journal PS: Political Science & Politics had a symposium on big data, causal inference, and formal theory, and Clark and Golder (2015) summarize each contribution. The journal Proceedings of the National Academy of Sciences of the United States of America had a symposium on causal inference and big data, and Shiffrin (2016) summarizes each contribution.

In terms of natural experiments, Dunning (2012) provides an excellent book-length treatment. For more on using the Vietnam draft lottery as a natural experiment, see Berinsky and Chatfield (2015). For machine learning approaches that attempt to automatically discover natural experiments inside of big data sources, see Jensen et al. (2008) and Sharma, Hofman, and Watts (2015).

In terms of matching, for an optimistic review, see Stuart (2010), and for a pessimistic review, see Sekhon (2009). For more on matching as a kind of pruning, see Ho et al. (2007). For books that provide excellent treatments of matching, see Rosenbaum (2002), Rosenbaum (2009), Morgan and Winship (2014), and Imbens and Rubin (2015).