This section is designed to be used as a reference, rather than to be read as a narrative.
One kind of observing that is not included in this chapter is ethnography. For more on ethnography in digital spaces, see Boellstorff et al. (2012), and for more on ethnography in mixed digital and physical spaces, see Lane (2016).
When you are repurposing data, there are two mental tricks that can help you understand the problems you might encounter. First, you can try to imagine the ideal dataset for your problem and compare it to the dataset that you are using. How are they similar and how are they different? If you did not collect your data yourself, there are likely to be differences between what you want and what you have. You then have to decide whether these differences are minor or major.
Second, remember that someone created and collected your data for some reason. You should try to understand their reasoning. This kind of reverse-engineering can help you identify possible problems and biases in your repurposed data.
There is no single consensus definition of "big data", but many definitions seem to focus on the 3 Vs: volume, variety, and velocity (e.g., Japec et al. (2015)). Rather than focusing on the characteristics of the data, my definition focuses more on why the data was created.
My inclusion of government administrative data inside the category of big data is a bit unusual. Others who have made this case include Legewie (2015), Connelly et al. (2016), and Einav and Levin (2014). For more about the value of government administrative data for research, see Card et al. (2010), Taskforce (2012), and Grusky, Smeeding, and Snipp (2015).
For a view of administrative research from inside the government statistical system, particularly the US Census Bureau, see Jarmin and O'Hara (2016). For a book-length treatment of administrative records research at Statistics Sweden, see Wallgren and Wallgren (2007).
In the chapter, I briefly compared a traditional survey such as the General Social Survey (GSS) to a social media data source such as Twitter. For a thorough and careful comparison between traditional surveys and social media data, see Schober et al. (2016).
These 10 characteristics of big data have been described in a variety of different ways by a variety of different authors. Writing that influenced my thinking on these issues includes: Lazer et al. (2009), Groves (2011), Howison, Wiggins, and Crowston (2011), boyd and Crawford (2012), Taylor (2013), Mayer-Schönberger and Cukier (2013), Golder and Macy (2014), Ruths and Pfeffer (2014), Tufekci (2014), Sampson and Small (2015), Lewis (2015), Lazer (2015), Horton and Tambe (2015), Japec et al. (2015), and Goldstone and Lupyan (2016).
Throughout this chapter, I have used the term digital traces, which I think is relatively neutral. Another popular term for digital traces is digital footprints (Golder and Macy 2014), but as Hal Abelson, Ken Ledeen, and Harry Lewis (2008) point out, a more appropriate term is probably digital fingerprints. When you create footprints, you are aware of what is happening, and your footprints cannot generally be traced to you personally. The same is not true for your digital traces. In fact, you are leaving traces all the time about which you have very little knowledge. And although these traces do not have your name on them, they can often be linked back to you. In other words, they are more like fingerprints: invisible and personally identifying.
Big
For more on why large datasets render statistical tests problematic, see Lin, Lucas, and Shmueli (2013) and McFarland and McFarland (2015). These issues should lead researchers to focus on practical significance rather than statistical significance.
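The gap between statistical and practical significance can be made concrete with a small calculation. The following is a minimal sketch, not taken from the papers cited above: it assumes a two-sample z-test with a known common standard deviation, and the effect size of 0.01 standard deviations is a hypothetical value chosen only for illustration.

```python
import math


def two_sample_z_test(mean_diff, sd, n_per_group):
    """Return (z, two-sided p-value) for a difference in means between two
    equal-sized groups, assuming a known common standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical effect: 0.01 standard deviations, which most researchers
# would consider practically negligible.
EFFECT, SD = 0.01, 1.0

for n in (100, 10_000, 1_000_000):
    z, p = two_sample_z_test(EFFECT, SD, n)
    print(f"n per group = {n:>9,}  z = {z:5.2f}  p = {p:.3g}")
```

The effect is identical in every row; only the sample size changes. With a million observations per group the same negligible difference yields a very small p-value, which is why the size of the estimate deserves more attention than the p-value alone.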
Always-on
When considering always-on data, it is important to consider whether you are comparing the exact same people over time or whether you are comparing some changing group of people; see, for example, Diaz et al. (2016).
Non-reactive
A classic book on non-reactive measures is Webb et al. (1966). The examples in the book pre-date the digital age, but they are still illuminating. For examples of people changing their behavior because of the presence of mass surveillance, see Penney (2016) and Brayne (2014).
Incomplete
For more on record linkage, see Dunn (1946) and Fellegi and Sunter (1969) (historical) and Larsen and Winkler (2014) (modern). Similar approaches have also been developed in computer science under names such as data deduplication, instance identification, name matching, duplicate detection, and duplicate record detection (Elmagarmid, Ipeirotis, and Verykios 2007). There are also privacy-preserving approaches to record linkage that do not require the transmission of personally identifying information (Schnell 2013). Facebook has also developed a procedure to link their records to voting behavior; this was done to evaluate an experiment that I will tell you about in Chapter 4 (Bond et al. 2012; Jones et al. 2013).
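To give a flavor of how probabilistic record linkage scores a candidate pair, here is a simplified sketch in the spirit of Fellegi and Sunter (1969). It is not their full method: the field names, the m- and u-probabilities, and the two thresholds are all hypothetical values chosen for illustration, and fields are compared by exact equality, whereas real systems usually estimate these parameters from data and use approximate string comparison.

```python
import math

# Hypothetical parameters for each comparison field:
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "birth_year": (0.98, 0.02),
    "zip_code": (0.90, 0.05),
}


def match_weight(record_a, record_b, params=FIELD_PARAMS):
    """Sum log2 likelihood-ratio weights over all comparison fields."""
    total = 0.0
    for field, (m, u) in params.items():
        if record_a[field] == record_b[field]:
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def classify(weight, upper=5.0, lower=0.0):
    """Assign a pair to one of three classes using hypothetical thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


# Illustrative records (invented for this example).
a = {"last_name": "garcia", "birth_year": 1980, "zip_code": "10027"}
b = {"last_name": "garcia", "birth_year": 1980, "zip_code": "10025"}
c = {"last_name": "nguyen", "birth_year": 1975, "zip_code": "94110"}

print(classify(match_weight(a, b)))
print(classify(match_weight(a, c)))
```

Pairs that fall between the two thresholds are typically sent for clerical review rather than being linked automatically.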
For more on construct validity, see Shadish, Cook, and Campbell (2001), Chapter 3.
Inaccessible
For more on the AOL search log debacle, see Ohm (2010). I offer advice about partnering with companies and governments in Chapter 4 when I describe experiments. A number of authors have expressed concerns about research that relies on inaccessible data; see Huberman (2012) and boyd and Crawford (2012).
One good way for university researchers to acquire data access is to work at a company as an intern or visiting researcher. In addition to enabling data access, this process will also help the researcher learn more about how the data was created, which is important for analysis.
Non-representative
Non-representativeness is a major problem for researchers and governments who wish to make statements about an entire population. It is less of a concern for companies, which are typically focused on their own users. For more on how Statistics Netherlands considers the issue of non-representativeness of business big data, see Buelens et al. (2014).
In Chapter 3, I will describe sampling and estimation in much greater detail. Even if data are non-representative, under certain conditions they can be weighted to produce good estimates.
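One common way to weight non-representative data is post-stratification. The sketch below is illustrative only: the groups, outcomes, and population shares are invented, and the approach assumes that, within each group, the people in the sample resemble the people in the population.

```python
from collections import Counter


def poststratified_mean(sample, population_shares):
    """Weighted mean of outcomes.

    sample: list of (group, outcome) pairs.
    population_shares: dict mapping each group to its known population share.
    Each respondent is weighted by population share / sample share of
    their group.
    """
    counts = Counter(group for group, _ in sample)
    n = len(sample)
    weighted_sum = 0.0
    weight_total = 0.0
    for group, outcome in sample:
        weight = population_shares[group] / (counts[group] / n)
        weighted_sum += weight * outcome
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical sample that over-represents the "young" group (80% of the
# sample, but only 50% of the population).
sample = [("young", y) for y in (1, 1, 1, 1, 1, 1, 0, 0)] + [("old", 0), ("old", 0)]
shares = {"young": 0.5, "old": 0.5}

unweighted = sum(outcome for _, outcome in sample) / len(sample)
weighted = poststratified_mean(sample, shares)
print(f"unweighted = {unweighted:.3f}, post-stratified = {weighted:.3f}")
```

The weighted estimate equals the population-share-weighted average of the group means, so it corrects for the over-representation of one group but cannot correct for differences within groups.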
Drifting
System drift is very hard to see from the outside. However, the MovieLens project (discussed more in Chapter 4) has been run for more than 15 years by an academic research group. They have therefore documented and shared information about the way the system has evolved over time and how this might affect analysis (Harper and Konstan 2015).
A number of scholars have focused on drift in Twitter: Liu, Kliman-Silver, and Mislove (2014) and Tufekci (2014).
Algorithmically confounded
I first heard the term "algorithmically confounded" used by Jon Kleinberg in a talk. The main idea behind performativity is that some social science theories are "engines not cameras" (Mackenzie 2008). That is, they actually shape the world rather than just capture it.
Dirty
Governmental statistical agencies call data cleaning "statistical data editing". De Waal, Puts, and Daas (2014) describe statistical data editing techniques developed for survey data and examine the extent to which they are applicable to big data sources, and Puts, Daas, and Waal (2015) present some of the same ideas for a more general audience.
For some examples of studies focused on spam in Twitter, see Clark et al. (2016) and Chu et al. (2012). Finally, Subrahmanian et al. (2016) describe the results of the DARPA Twitter Bot Challenge.
Sensitive
Ohm (2015) reviews earlier research on the idea of sensitive information and offers a multi-factor test. The four factors he proposes are: the magnitude of harm; the probability of harm; the presence of a confidential relationship; and whether the risk reflects majoritarian concerns.
Farber's study of taxi drivers in New York was based on an earlier study by Camerer et al. (1997) that used three different convenience samples of paper trip sheets (paper forms used by drivers to record trip start time, end time, and fare). This earlier study found that drivers seemed to be target earners: they worked less on days when their wages were higher.
Kossinets and Watts (2009) focused on the origins of homophily in social networks. See Wimmer and Lewis (2010) for a different approach to the same problem that uses data from Facebook.
In subsequent work, King and colleagues have further explored online censorship in China (King, Pan, and Roberts 2014; King, Pan, and Roberts 2016). For a related approach to measuring online censorship in China, see Bamman, O'Connor, and Smith (2012). For more on statistical methods like the one used in King, Pan, and Roberts (2013) to estimate the sentiment of the 11 million posts, see Hopkins and King (2010). For more on supervised learning, see James et al. (2013) (less technical) and Hastie, Tibshirani, and Friedman (2009) (more technical).
Forecasting is a big part of industrial data science (Mayer-Schönberger and Cukier 2013; Provost and Fawcett 2013). One type of forecasting commonly done by social researchers is demographic forecasting; see, for example, Raftery et al. (2012).
Google Flu Trends was not the first project to use search data to nowcast influenza prevalence. In fact, researchers in the United States (Polgreen et al. 2008; Ginsberg et al. 2009) and Sweden (Hulth, Rydevik, and Linde 2009) have found that certain search terms (e.g., "flu") predicted national public health surveillance data before it was released. Subsequently, many other projects have tried to use digital trace data for disease surveillance detection; see Althouse et al. (2015) for a review.
In addition to using digital trace data to predict health outcomes, there has also been a huge amount of work using Twitter data to predict election outcomes; for reviews, see Gayo-Avello (2011), Gayo-Avello (2013), Jungherr (2015) (Ch. 7), and Huberty (2015).
Using search data to predict influenza prevalence and using Twitter data to predict elections are both examples of using some kind of digital trace to predict some kind of event in the world. An enormous number of studies have this general structure. Table 2.5 includes a few other examples.
Digital trace | Outcome | Citation |
---|---|---|
Twitter | Box office revenue of movies in the US | Asur and Huberman (2010) |
Search logs | Sales of movies, music, books, and video games in the US | Goel et al. (2010) |
Twitter | Dow Jones Industrial Average (US stock market) | Bollen, Mao, and Zeng (2011) |
The journal PS Political Science had a symposium on big data, causal inference, and formal theory, and Clark and Golder (2015) summarize each contribution. The journal Proceedings of the National Academy of Sciences of the United States of America had a symposium on causal inference and big data, and Shiffrin (2016) summarizes each contribution.
In terms of natural experiments, Dunning (2012) provides an excellent book-length treatment. For more on using the Vietnam draft lottery as a natural experiment, see Berinsky and Chatfield (2015). For machine learning approaches that attempt to automatically discover natural experiments inside big data sources, see Jensen et al. (2008) and Sharma, Hofman, and Watts (2015).
In terms of matching, for an optimistic review, see Stuart (2010), and for a pessimistic review, see Sekhon (2009). For more on matching as a kind of pruning, see Ho et al. (2007). For books that provide excellent treatments of matching, see Rosenbaum (2002), Rosenbaum (2009), Morgan and Winship (2014), and Imbens and Rubin (2015).
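The idea of matching as pruning can be illustrated with exact matching on a single discrete covariate. This is a minimal sketch under strong assumptions, not the procedure of any of the works cited above: the data are invented, the key names (`treated`, `stratum`, `outcome`) are hypothetical, and the estimate has a causal interpretation only if there are no unmeasured confounders within strata.

```python
from collections import defaultdict
from statistics import mean


def exact_match_effect(units):
    """Average treated-minus-control difference over matched strata.

    units: list of dicts with keys 'treated' (bool), 'stratum' (hashable),
    and 'outcome' (number). Strata that lack either treated or control
    units are pruned. Remaining strata are weighted by their number of
    treated units.
    """
    strata = defaultdict(lambda: {True: [], False: []})
    for unit in units:
        strata[unit["stratum"]][unit["treated"]].append(unit["outcome"])

    weighted_diff = 0.0
    n_treated = 0
    for groups in strata.values():
        treated, control = groups[True], groups[False]
        if not treated or not control:
            continue  # prune: no comparable units in this stratum
        weighted_diff += len(treated) * (mean(treated) - mean(control))
        n_treated += len(treated)

    if n_treated == 0:
        raise ValueError("no stratum contains both treated and control units")
    return weighted_diff / n_treated


# Invented data: stratum "b" has no controls and "c" has no treated units,
# so both are pruned before the comparison is made.
units = [
    {"stratum": "a", "treated": True, "outcome": 5},
    {"stratum": "a", "treated": True, "outcome": 7},
    {"stratum": "a", "treated": False, "outcome": 4},
    {"stratum": "b", "treated": True, "outcome": 10},
    {"stratum": "c", "treated": False, "outcome": 1},
    {"stratum": "c", "treated": False, "outcome": 2},
    {"stratum": "d", "treated": True, "outcome": 3},
    {"stratum": "d", "treated": False, "outcome": 2},
    {"stratum": "d", "treated": False, "outcome": 2},
]

effect = exact_match_effect(units)
print(f"estimated effect among matched treated units: {effect:.3f}")
```

Pruning discards data, which trades some precision for comparisons made only between units that are alike on the matched covariate.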