[ , ] Algorithmic confounding was a problem with Google Flu Trends. Read the paper by Lazer et al. (2014), and write a short, clear email to an engineer at Google explaining the problem and offering an idea for how to fix it.
[ ] Bollen, Mao, and Zeng (2011) claim that data from Twitter can be used to predict the stock market. This finding led to the creation of a hedge fund, Derwent Capital Markets, to invest in the stock market based on data collected from Twitter (Jordan 2010). What evidence would you want to see before putting your money in that fund?
[ ] While some public health advocates hail e-cigarettes as an effective aid for smoking cessation, others warn about the potential risks, such as high levels of nicotine. Imagine that a researcher decides to study public opinion toward e-cigarettes by collecting e-cigarette-related Twitter posts and conducting sentiment analysis.
[ ] In November 2009, Twitter changed the question in the tweet box from "What are you doing?" to "What's happening?" (https://blog.twitter.com/2009/whats-happening).
[ ] Kwak et al. (2010) analyzed 41.7 million user profiles, 1.47 billion social relations, 4262 trending topics, and 106 million tweets between June 6th and June 31st, 2009. Based on this analysis they concluded that Twitter serves more as a new medium of information sharing than as a social network.
[ ] "Retweets" are often used to measure influence and the spread of influence on Twitter. Initially, users had to copy and paste the tweet they liked, tag the original author with his or her handle, and manually type "RT" before the tweet to indicate that it was a retweet. Then, in 2009, Twitter added a "retweet" button. In June 2016, Twitter made it possible for users to retweet their own tweets (https://twitter.com/twitter/status/742749353689780224). Do you think these changes should affect how you use "retweets" in your research? Why or why not?
[ , , ] Michel et al. (2011) constructed a corpus emerging from Google's effort to digitize books. Using the first version of the corpus, which was published in 2009 and contained over 5 million digitized books, the authors analyzed word usage frequency to investigate linguistic changes and cultural trends. Soon the Google Books Corpus became a popular data source for researchers, and a second version of the database was released in 2012.
However, Pechenick, Danforth, and Dodds (2015) warned that researchers need to fully characterize the sampling process of the corpus before using it to draw broad conclusions. The main issue is that the corpus is library-like, containing one of each book. As a result, an individual, prolific author is able to noticeably insert new phrases into the Google Books lexicon. Moreover, scientific texts constitute an increasingly substantive portion of the corpus throughout the 1900s. In addition, by comparing two versions of the English Fiction datasets, Pechenick et al. found evidence that insufficient filtering was used in producing the first version. All of the data needed for this activity is available here: http://storage.googleapis.com/books/ngrams/books/datasetsv2.html
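Word usage frequency in a corpus like this is usually a relative frequency: the number of times a word appears in a year divided by the total number of words in that year. The following is a minimal sketch in Python, assuming each line of a 1-gram file is tab-separated as `ngram`, `year`, `match_count`, `volume_count`, and assuming the yearly totals have already been loaded into a dictionary. The sample lines and totals are made up for illustration and are not real corpus values.

```python
from collections import defaultdict


def yearly_match_counts(lines, target):
    """Sum match_count per year for one n-gram.

    Assumes tab-separated fields: ngram, year, match_count, volume_count.
    """
    counts = defaultdict(int)
    for line in lines:
        ngram, year, match_count, _volume_count = line.rstrip("\n").split("\t")
        if ngram == target:
            counts[int(year)] += int(match_count)
    return dict(counts)


def relative_frequency(counts, totals):
    """Divide each year's count by that year's total word count."""
    return {
        year: counts[year] / totals[year]
        for year in sorted(counts)
        if totals.get(year)  # skip years with a missing or zero total
    }


# Hypothetical toy input, not real Google Books data.
sample_lines = [
    "data\t1990\t50\t10\n",
    "data\t1991\t80\t12\n",
    "other\t1990\t5\t1\n",
]
sample_totals = {1990: 1000, 1991: 1600}

counts = yearly_match_counts(sample_lines, "data")
frequencies = relative_frequency(counts, sample_totals)
print(frequencies)
```

Because the denominator is the whole corpus for that year, a change in the mix of books (for example, more scientific texts) can move a word's relative frequency even if usage in everyday language has not changed.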
[ , , , ] Penney (2016) explores whether the widespread publicity about NSA/PRISM surveillance (i.e., the Snowden revelations) in June 2013 is associated with a sharp and sudden decrease in traffic to Wikipedia articles on topics that raise privacy concerns. If so, this change in behavior would be consistent with a chilling effect resulting from mass surveillance. The approach of Penney (2016) is sometimes called an interrupted time series design and is related to the approaches for approximating experiments from observational data (Section 2.4.3).
To choose the topic keywords, Penney referred to the list used by the U.S. Department of Homeland Security for tracking and monitoring social media. The DHS list categorizes certain search terms into a range of issues, such as "Health Concern," "Infrastructure Security," and "Terrorism." For the study group, Penney used the forty-eight keywords related to "Terrorism" (see Appendix Table 8). He then aggregated Wikipedia article view counts on a monthly basis for the corresponding forty-eight Wikipedia articles over a thirty-two-month period, from the beginning of January 2012 to the end of August 2014. To strengthen his argument, he also created several comparison groups by tracking article views on other topics.
Now, you are going to replicate and extend Penney (2016). All the raw data that you will need for this activity is available from Wikipedia (https://dumps.wikimedia.org/other/pagecounts-raw/). Or you can get it from the R package wikipediatrend (Meissner and Team 2016). When you write up your responses, please note which data source you used. (Note: This same activity also appears in Chapter 6.)
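To make the interrupted time series idea concrete, here is a minimal sketch in Python. It fits one straight line to the monthly totals before the interruption and another to the months from the interruption onward, then compares the two lines at the interruption month. This gives the same point estimates as a single segmented regression with a level-change term and a slope-change term. The function names are hypothetical, the view counts are synthetic, and treating June 2013 (month index 17 when January 2012 is index 0) as the first post-interruption month is an assumption for illustration, not necessarily Penney's exact specification.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def interrupted_time_series(views, interruption):
    """Compare pre- and post-interruption linear trends.

    views: monthly totals in time order.
    interruption: index of the first post-interruption month.
    """
    pre_t = list(range(interruption))
    post_t = list(range(interruption, len(views)))
    pre_intercept, pre_slope = fit_line(pre_t, views[:interruption])
    post_intercept, post_slope = fit_line(post_t, views[interruption:])
    # Gap between the two fitted lines at the interruption month.
    level_change = (post_intercept + post_slope * interruption) - (
        pre_intercept + pre_slope * interruption
    )
    return {
        "pre_slope": pre_slope,
        "post_slope": post_slope,
        "slope_change": post_slope - pre_slope,
        "level_change": level_change,
    }


# Synthetic monthly view totals for 32 months (not real Wikipedia data).
INTERRUPTION = 17  # assumed index of June 2013 when January 2012 is index 0
synthetic_views = [
    1000 + 10 * t if t < INTERRUPTION else 800 + 4 * t for t in range(32)
]

result = interrupted_time_series(synthetic_views, INTERRUPTION)
print(result)
```

A negative `level_change` together with a smaller `post_slope` would be the pattern consistent with a chilling effect. A real replication would also need uncertainty estimates, a check for seasonality, and the comparison groups described above.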
[ ] Efrati (2016) reports, based on confidential information, that "total sharing" on Facebook had declined by about 5.5% year over year while "original broadcast sharing" was down 21% year over year. This decline was particularly acute among Facebook users under 30 years of age. The report attributed the decline to two factors. One is the growth in the number of "friends" people have on Facebook. The other is that some sharing activity has shifted to messaging and to competitors such as SnapChat. The report also revealed several tactics Facebook had tried in order to boost sharing, including News Feed algorithm tweaks that make original posts more prominent, as well as periodic reminders of original posts that users made "On This Day" several years earlier. What implications, if any, do these findings have for researchers who want to use Facebook as a data source?
[ ] Tumasjan et al. (2010) reported that the proportion of tweets mentioning a political party matched the proportion of votes that party received in the German parliamentary election in 2009 (Figure 2.9). In other words, it appeared that you could use Twitter to predict the election. At the time this study was published it was considered extremely exciting because it seemed to suggest a valuable use for a common source of big data.
Given the bad features of big data, however, you should immediately be skeptical of this result. Germans on Twitter in 2009 were quite a non-representative group, and supporters of one party might tweet about politics more often than supporters of another. Thus, it seems surprising that all the possible biases you could imagine would somehow cancel out. In fact, the results in Tumasjan et al. (2010) turned out to be too good to be true. In their paper, Tumasjan et al. (2010) considered six political parties: Christian Democrats (CDU), Christian Social Union (CSU), SPD, Liberals (FDP), The Left (Die Linke), and the Green Party (Grüne). However, the most mentioned German political party on Twitter at that time was the Pirate Party (Piraten), a party that fights government regulation of the Internet. When the Pirate Party was included in the analysis, Twitter mentions became a terrible predictor of election results (Figure 2.9) (Jungherr, Jürgens, and Schoen 2012).
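The sensitivity to which parties are included can be illustrated with a small calculation. The sketch below, in Python, converts mention counts and vote counts into shares among the included parties and reports the mean absolute error between the two sets of shares. The party labels and all numbers are hypothetical and chosen only to show the mechanism; they are not the figures from Tumasjan et al. (2010) or Jungherr, Jürgens, and Schoen (2012).

```python
def shares(counts):
    """Normalize raw counts so they sum to 1."""
    total = sum(counts.values())
    return {party: count / total for party, count in counts.items()}


def mean_absolute_error(mentions, votes, parties):
    """Mean absolute gap between mention share and vote share,
    with both shares renormalized over the included parties."""
    mention_share = shares({p: mentions[p] for p in parties})
    vote_share = shares({p: votes[p] for p in parties})
    return sum(abs(mention_share[p] - vote_share[p]) for p in parties) / len(parties)


# Hypothetical counts: party "D" is heavily discussed online but wins few votes.
toy_mentions = {"A": 300, "B": 250, "C": 150, "D": 900}
toy_votes = {"A": 340, "B": 230, "C": 150, "D": 20}

mae_without_d = mean_absolute_error(toy_mentions, toy_votes, ["A", "B", "C"])
mae_with_d = mean_absolute_error(toy_mentions, toy_votes, ["A", "B", "C", "D"])
print(f"MAE without D: {mae_without_d:.3f}")
print(f"MAE with D:    {mae_with_d:.3f}")
```

In this toy example the error is small when party "D" is left out and much larger when it is included, which is why the choice of parties needs to be fixed before looking at the outcome.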
Subsequently, other researchers around the world have used fancier methods, such as using sentiment analysis to distinguish between positive and negative mentions of the parties, in order to improve the ability of Twitter data to predict a variety of different types of elections (Gayo-Avello 2013; Jungherr 2015, Ch. 7). Here's how Huberty (2015) summarized the results of these attempts to predict elections:
"All known forecasting methods based on social media have failed when subjected to the demands of true forward-looking electoral forecasting. These failures appear to be due to fundamental properties of social media, rather than to methodological or algorithmic difficulties. In short, social media do not, and probably never will, offer a stable, unbiased, representative picture of the electorate; and convenience samples of social media lack sufficient data to fix these problems post hoc."
Read some of the research that led Huberty (2015) to that conclusion, and write a one-page memo to a political candidate describing if and how Twitter should be used to forecast elections.
[ ] What is the difference between a sociologist and a historian? According to Goldthorpe (1991), the main difference between a sociologist and a historian is control over data collection. Historians are forced to use relics, whereas sociologists can tailor their data collection to specific purposes. Read Goldthorpe (1991). How is the difference between sociology and history related to the idea of Custommades and Readymades?
[ ] Building on the previous question, Goldthorpe (1991) drew a number of critical responses, including one from Nicky Hart (1994) that challenged Goldthorpe's devotion to tailor-made data. To clarify the potential limitations of tailor-made data, Hart described the Affluent Worker Project, a large survey to measure the relationship between social class and voting that was conducted by Goldthorpe and colleagues in the mid-1960s. As one might expect from a scholar who favored designed data over found data, the Affluent Worker Project collected data that were tailored to address a recently proposed theory about the future of social class in an era of increasing living standards. But Goldthorpe and colleagues somehow "forgot" to collect information about the voting behavior of women. Here's how Nicky Hart (1994) summarizes the whole episode:
". . . it is difficult to avoid the conclusion that women were omitted because this 'tailor made' dataset was confined by a paradigmatic logic which excluded female experience. Driven by a theoretical vision of class consciousness and action as male preoccupations . . . , Goldthorpe and his colleagues constructed a set of empirical proofs which fed and nurtured their own theoretical assumptions instead of exposing them to a valid test of adequacy."
Hart continued:
"The empirical findings of the Affluent Worker Project tell us more about the masculinist values of mid-century sociology than they inform the processes of stratification, politics and material life."
Can you think of other examples where tailor-made data collection has the biases of the data collector built into it? How does this compare to algorithmic confounding? What implications might this have for when researchers should use Readymades and when they should use Custommades?
[ ] In this chapter, I contrasted data collected by researchers for research with administrative records created by companies and governments. Some people call these administrative records "found data," which they contrast with "designed data." It is true that administrative records are found by researchers, but they are also highly designed. For example, modern tech companies spend enormous amounts of time and resources to collect and curate their data. Thus, these administrative records are both found and designed; it just depends on your perspective (Figure 2.10).
Provide an example of a data source where seeing it as both found and designed is helpful when using that data source for research.
[ ] In a thoughtful essay, Christian Sandvig and Eszter Hargittai (2015) describe two kinds of digital research, where the digital system is an "instrument" or an "object of study." An example of the first kind of study is where Bengtsson and colleagues (2011) used mobile phone data to track migration after the earthquake in Haiti in 2010. An example of the second kind is where Jensen (2007) studies how the introduction of mobile phones throughout Kerala, India, affected the functioning of the market for fish. I find this distinction helpful because it clarifies that studies using digital data sources can have quite different goals even if they are using the same kind of data source. In order to further clarify this distinction, describe four studies that you've seen: two that use a digital system as an instrument and two that use a digital system as an object of study. You can use examples from this chapter if you want.