Activities

Key:

  • degree of difficulty: easy, medium, hard, very hard
  • requires math
  • requires coding
  • data collection
  • my favorites
  1. [medium, my favorite] Algorithmic confounding was a problem with Google Flu Trends. Read the paper by Lazer et al. (2014), and write a short, clear email to an engineer at Google explaining the problem and offering an idea for how to fix it.

  2. [medium] Bollen, Mao, and Zeng (2011) claim that data from Twitter can be used to predict the stock market. This finding led to the creation of a hedge fund, Derwent Capital Markets, to invest in the stock market based on data collected from Twitter (Jordan 2010). What evidence would you want to see before putting your money in that fund?

  3. [easy] While some public health advocates hail e-cigarettes as an effective aid for smoking cessation, others warn about the potential risks, such as high levels of nicotine. Imagine that a researcher decides to study public opinion toward e-cigarettes by collecting e-cigarette-related Twitter posts and conducting sentiment analysis.

    a. What are the three possible biases that you are most worried about in this study?
    b. Clark et al. (2016) ran just such a study. First, they collected 850,000 tweets that used e-cigarette-related keywords from January 2012 through December 2014. Upon closer inspection, they realized that many of these tweets were automated (i.e., not produced by humans) and that many of these automated tweets were essentially commercials. They developed a Human Detection Algorithm to separate automated tweets from organic tweets. Using this Human Detection Algorithm, they found that 80% of the tweets were automated. Does this finding change your answer to part (a)?
    c. When they compared the sentiment in organic and automated tweets, they found that the automated tweets were more positive than the organic tweets (6.17 versus 5.84). Does this finding change your answer to (b)?
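
As a rough illustration of why the automated tweets in parts (b) and (c) matter, the sketch below computes the pooled mean sentiment as a weighted average of the two group means. It assumes that the 80% automated share applies to the same set of tweets whose sentiment was scored, which the summary above does not confirm. The function name `pooled_mean` is hypothetical.

```python
def pooled_mean(share_automated, mean_automated, mean_organic):
    """Weighted average of two group means, weighted by group share."""
    if not 0.0 <= share_automated <= 1.0:
        raise ValueError("share_automated must be between 0 and 1")
    return share_automated * mean_automated + (1.0 - share_automated) * mean_organic

# Figures reported above: 80% automated, sentiment 6.17 (automated) vs 5.84 (organic).
# Assumption: the 80% share holds for the tweets that were scored.
pooled = pooled_mean(0.80, 6.17, 5.84)
bias = pooled - 5.84

print(f"pooled mean: {pooled:.3f}")
print(f"gap vs organic-only mean: {bias:.3f}")
```

Under that assumption the pooled mean sits much closer to the automated tweets than to the organic ones, so a researcher who did not separate the two groups would mostly be measuring the tone of automated content.
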
  4. [easy] In November 2009, Twitter changed the question in the tweet box from "What are you doing?" to "What's happening?" (https://blog.twitter.com/2009/whats-happening).

    a. How do you think the change of prompts will affect who tweets and/or what they tweet?
    b. Name one research project for which you would prefer the prompt "What are you doing?" Explain why.
    c. Name one research project for which you would prefer the prompt "What's happening?" Explain why.
  5. [medium] Kwak et al. (2010) analyzed 41.7 million user profiles, 1.47 billion social relations, 4262 trending topics, and 106 million tweets between June 6th and June 31st, 2009. Based on this analysis they concluded that Twitter serves more as a new medium of information sharing than as a social network.

    a. Considering Kwak et al.'s finding, what type of research would you do with Twitter data? What type of research would you not do with Twitter data? Why?
    b. In 2010, Twitter added a Who To Follow service that makes tailored suggestions to users. Three recommendations are shown at a time on the main page. Recommendations are often drawn from one's "friends-of-friends," and mutual contacts are also displayed in the recommendation. Users can refresh to see a new set of recommendations or visit a page with a longer list of recommendations. Do you think this new feature would change your answer to part (a)? Why or why not?
    c. Su, Sharma, and Goel (2016) evaluated the effect of the Who To Follow service and found that while users across the popularity spectrum benefited from the recommendations, the most popular users profited substantially more than average. Does this finding change your answer to part (b)? Why or why not?
  6. [easy] "Retweets" are often used to measure influence and the spread of influence on Twitter. Initially, users had to copy and paste the tweet they liked, tag the original author with his or her handle, and manually type "RT" before the tweet to indicate that it was a retweet. Then, in 2009, Twitter added a "retweet" button. In June 2016, Twitter made it possible for users to retweet their own tweets (https://twitter.com/twitter/status/742749353689780224). Do you think these changes should affect how you use "retweets" in your research? Why or why not?

  7. [medium, data collection, requires coding] Michel et al. (2011) constructed a corpus emerging from Google's effort to digitize books. Using the first version of the corpus, which was published in 2009 and contained over 5 million digitized books, the authors analyzed word usage frequency to investigate linguistic changes and cultural trends. Soon the Google Books Corpus became a popular data source for researchers, and a 2nd version of the database was released in 2012.

    However, Pechenick, Danforth, and Dodds (2015) warned that researchers need to fully characterize the sampling process of the corpus before using it to draw broad conclusions. The main issue is that the corpus is library-like, containing one of each book. As a result, an individual, prolific author is able to noticeably insert new phrases into the Google Books lexicon. Moreover, scientific texts constitute an increasingly substantive portion of the corpus throughout the 1900s. In addition, by comparing two versions of the English Fiction datasets, Pechenick et al. found evidence that insufficient filtering was used in producing the first version. All of the data needed for this activity is available here: http://storage.googleapis.com/books/ngrams/books/datasetsv2.html

    a. In Michel et al.'s original paper (2011), they used the 1st version of the English dataset, plotted the frequency of usage of the years "1880", "1912" and "1973", and concluded that "we are forgetting our past faster with each passing year" (Figure 3A, Michel et al.). Replicate the same plot using the 1st version of the corpus, English dataset (the same as Figure 3A, Michel et al.).
    b. Now replicate the same plot with the 1st version, English fiction dataset.
    c. Now replicate the same plot with the 2nd version of the corpus, English dataset.
    d. Finally, replicate the same plot with the 2nd version, English fiction dataset.
    e. Describe the differences and similarities between these four plots. Do you agree with Michel et al.'s original interpretation of the observed trend? (Hint: (c) and (d) should be the same as Figure 16 in Pechenick et al.)
    f. Now that you have replicated this one finding using different Google Books corpora, choose another linguistic change or cultural phenomenon presented in Michel et al.'s original paper. Do you agree with their interpretation in light of the limitations presented in Pechenick et al.? To make your argument stronger, try to replicate the same graph using different versions of the dataset as above.
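
One way to approach the replication is to compute, for each publication year, the share of all 1-grams that match a given year string such as "1880". The sketch below does this for rows already read into memory. The column layout (n-gram, year, match count, then further columns, all tab-separated) and the per-year totals are assumptions that should be checked against the dataset page; the two corpus versions may not share the same columns. The function name and the sample rows are made up for illustration and are not real corpus counts.

```python
from collections import defaultdict

def relative_frequency(rows, totals, target):
    """Return {year: match_count / total_count} for one target 1-gram.

    rows   -- iterable of tab-separated lines: ngram, year, match_count, ...
              (assumed layout; extra trailing columns are ignored)
    totals -- {year: total number of 1-grams in the corpus that year}
    target -- the 1-gram to track, e.g. "1880"
    """
    counts = defaultdict(int)
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != target:
            continue
        year, match_count = int(fields[1]), int(fields[2])
        counts[year] += match_count
    # Skip years with a missing or zero total to avoid dividing by zero.
    return {
        year: counts[year] / totals[year]
        for year in sorted(counts)
        if totals.get(year)
    }

# Made-up sample rows and totals, only to show the shape of the computation.
sample_rows = [
    "1880\t1880\t50\t10",
    "1880\t1881\t30\t9",
    "1912\t1912\t80\t12",
    "1880\t1882\t20\t8",
]
sample_totals = {1880: 1000, 1881: 1000, 1882: 2000, 1912: 4000}

freq_1880 = relative_frequency(sample_rows, sample_totals, "1880")
for year, share in freq_1880.items():
    print(year, share)
```

Running the same function on each of the four datasets and plotting the resulting series on shared axes would give the four plots to compare.
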
  8. [hard, data collection, requires coding, my favorite] Penney (2016) explores whether the widespread publicity about NSA/PRISM surveillance (i.e., the Snowden revelations) in June 2013 is associated with a sharp and sudden decrease in traffic to Wikipedia articles on topics that raise privacy concerns. If so, this change in behavior would be consistent with a chilling effect resulting from mass surveillance. The approach of Penney (2016) is sometimes called an interrupted time series design and is related to the approaches discussed in the chapter on approximating experiments from observational data (Section 2.4.3).

    To choose the topic keywords, Penney referred to the list used by the US Department of Homeland Security for tracking and monitoring social media. The DHS list categorizes certain search terms into a range of issues, such as "Health Concern," "Infrastructure Security," and "Terrorism." For the study group, Penney used the forty-eight keywords related to "Terrorism" (see Appendix Table 8). He then aggregated Wikipedia article view counts on a monthly basis for the corresponding forty-eight Wikipedia articles over a thirty-two-month period, from the beginning of January 2012 to the end of August 2014. To strengthen his argument, he also created several comparison groups by tracking article views on other topics.

    Now, you are going to replicate and extend Penney (2016). All the raw data that you will need for this activity is available from Wikipedia (https://dumps.wikimedia.org/other/pagecounts-raw/). Alternatively, you can get it from the R package wikipediatrend (Meissner and Team 2016). When you write up your responses, please note which data source you used. (Note: This same activity also appears in Chapter 6.)

    a. Read Penney (2016) and replicate Figure 2, which shows the page views for "Terrorism"-related pages before and after the Snowden revelations. Interpret the findings.
    b. Next, replicate Figure 4A, which compares the study group ("Terrorism"-related articles) with a comparator group using keywords categorized under "DHS & Other Agencies" from the DHS list (see Appendix Table 10). Interpret the findings.
    c. In part (b) you compared the study group to one comparator group. Penney also compared it to two other comparator groups: "Infrastructure Security"-related articles (Appendix Table 11) and popular Wikipedia pages (Appendix Table 12). Come up with an alternative comparator group, and test whether the findings from part (b) are sensitive to your choice of comparator group. Which choice of comparator group makes the most sense? Why?
    d. The author stated that keywords relating to "Terrorism" were used to select the Wikipedia articles because the US government cited terrorism as a key justification for its online surveillance practices. As a check of these 48 "Terrorism"-related keywords, Penney (2016) also conducted a survey on MTurk asking respondents to rate each of the keywords in terms of Government Trouble, Privacy-Sensitive, and Avoidance (Appendix Tables 7 and 8). Replicate the survey on MTurk and compare your results.
    e. Based on the results in part (d) and your reading of the article, do you agree with the author's choice of topic keywords in the study group? Why or why not? If not, what would you suggest instead?
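
To make the interrupted time series idea concrete, the sketch below fits a straight line to the months before the interruption, extends that line into the later months, and reports the average gap between observed and projected views. This is a simplified stand-in chosen for brevity, not necessarily the model used in Penney (2016). The function names are hypothetical and the series is synthetic, not real Wikipedia data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def interrupted_series_gap(views, break_index):
    """Average gap between observed post-break views and the pre-break trend.

    views       -- monthly view counts in time order
    break_index -- index of the first post-interruption month
    """
    if not 2 <= break_index < len(views):
        raise ValueError("break_index must leave 2+ pre months and 1+ post month")
    pre_x = list(range(break_index))
    intercept, slope = fit_line(pre_x, views[:break_index])
    gaps = [
        views[t] - (intercept + slope * t)
        for t in range(break_index, len(views))
    ]
    return sum(gaps) / len(gaps)

# Synthetic series (not real data): a steady rise of 10 views per month,
# then a drop of 100 views starting at month 6.
synthetic = [1000 + 10 * t for t in range(6)]
synthetic += [1000 + 10 * t - 100 for t in range(6, 12)]

gap = interrupted_series_gap(synthetic, break_index=6)
print(round(gap, 6))
```

A negative gap means views fell below what the earlier trend projected. Applying the same function to a comparator group gives a baseline for judging whether the drop is specific to the study group.
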
  9. [easy] Efrati (2016) reports, based on confidential information, that "total sharing" on Facebook had declined by about 5.5% year over year, while "original broadcast sharing" was down 21% year over year. This decline was particularly acute among Facebook users under 30 years of age. The report attributed the decline to two factors. One is the growth in the number of "friends" people have on Facebook. The other is that some sharing activity has shifted to messaging and to competitors such as SnapChat. The report also revealed several tactics Facebook had tried in order to boost sharing, including News Feed algorithm tweaks that make original posts more prominent, as well as periodic reminders of the original posts users made "On This Day" several years ago. What implications, if any, do these findings have for researchers who want to use Facebook as a data source?

  10. [medium] Tumasjan et al. (2010) reported that the proportion of tweets mentioning a political party matched the proportion of votes that party received in the German parliamentary election in 2009 (Figure 2.9). In other words, it appeared that you could use Twitter to predict the election. At the time this study was published, it was considered extremely exciting because it seemed to suggest a valuable use for a common source of big data.

    Given the bad features of big data, however, you should immediately be skeptical of this result. Germans on Twitter in 2009 were quite a non-representative group, and supporters of one party might tweet about politics more often. Thus, it seems surprising that all the possible biases that you could imagine would somehow cancel out. In fact, the results in Tumasjan et al. (2010) turned out to be too good to be true. In their paper, Tumasjan et al. (2010) considered six political parties: Christian Democrats (CDU), Christian Social Union (CSU), SPD, Liberals (FDP), The Left (Die Linke), and the Green Party (Grüne). However, the most mentioned German political party on Twitter at that time was the Pirate Party (Piraten), a party that fights government regulation of the Internet. When the Pirate Party was included in the analysis, Twitter mentions become a terrible predictor of election results (Figure 2.9) (Jungherr, Jürgens, and Schoen 2012).

    Figure 2.9: Twitter mentions appear to predict the results of the 2009 German election (Tumasjan et al. 2010), but this result turns out to depend on some arbitrary and unjustified choices (Jungherr, Jürgens, and Schoen 2012).

    Subsequently, other researchers around the world have used fancier methods, such as sentiment analysis to distinguish between positive and negative mentions of the parties, in order to improve the ability of Twitter data to predict a variety of different types of elections (Gayo-Avello 2013; Jungherr 2015, Ch. 7). Here is how Huberty (2015) summarized the results of these attempts to predict elections:

    "All known forecasting methods based on social media have failed when subjected to the demands of true forward-looking electoral forecasting. These failures appear to be due to fundamental properties of social media, rather than to methodological or algorithmic difficulties. In short, social media do not, and probably never will, offer a stable, unbiased, representative picture of the electorate; and convenience samples of social media lack sufficient data to fix these problems post hoc."

    Read some of the research that led Huberty (2015) to that conclusion, and write a one-page memo to a political candidate describing if and how Twitter should be used to forecast elections.
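
A simple way to see how the choice of parties can drive such a result is to compare mention shares with vote shares using the mean absolute error, once with and once without a heavily mentioned party. The sketch below uses hypothetical counts for three placeholder parties; they are not the 2009 German data, and the function names are made up for illustration.

```python
def shares(counts):
    """Convert raw counts to proportions that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {party: n / total for party, n in counts.items()}

def mean_absolute_error(mentions, votes, parties):
    """MAE between mention shares and vote shares over the chosen parties.

    Shares are renormalized over the chosen parties only, so dropping a
    party changes every remaining party's share.
    """
    m = shares({p: mentions[p] for p in parties})
    v = shares({p: votes[p] for p in parties})
    return sum(abs(m[p] - v[p]) for p in parties) / len(parties)

# Hypothetical counts: party "C" is mentioned often but wins few votes.
mentions = {"A": 300, "B": 200, "C": 500}
votes = {"A": 600, "B": 300, "C": 100}

mae_without_c = mean_absolute_error(mentions, votes, ["A", "B"])
mae_with_c = mean_absolute_error(mentions, votes, ["A", "B", "C"])

print(f"MAE without C: {mae_without_c:.3f}")
print(f"MAE with C:    {mae_with_c:.3f}")
```

With these made-up numbers the error is several times larger once party "C" is included, which mirrors the kind of sensitivity to inclusion choices described above.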

  11. [medium] What is the difference between a sociologist and a historian? According to Goldthorpe (1991), the main difference between a sociologist and a historian is control over data collection. Historians are forced to use relics, whereas sociologists can tailor their data collection to specific purposes. Read Goldthorpe (1991). How is the difference between sociology and history related to the idea of Custommades and Readymades?

  12. [hard] Building on the previous question, Goldthorpe (1991) drew a number of critical responses, including one from Nicky Hart (1994) that challenged Goldthorpe's devotion to tailor-made data. To clarify the potential limitations of tailor-made data, Hart described the Affluent Worker Project, a large survey to measure the relationship between social class and voting that was conducted by Goldthorpe and colleagues in the mid-1960s. As one might expect from a scholar who favored designed data over found data, the Affluent Worker Project collected data that were tailored to address a recently proposed theory about the future of social class in an era of rising living standards. But Goldthorpe and colleagues somehow "forgot" to collect information about the voting behavior of women. Here is how Nicky Hart (1994) summarizes the whole episode:

    ". . . it is difficult to avoid the conclusion that women were omitted because this 'tailor made' dataset was confined by a paradigmatic logic which excluded female experience. Driven by a theoretical vision of class consciousness and action as male preoccupations . . . , Goldthorpe and his colleagues constructed a set of empirical proofs which fed and nurtured their own theoretical assumptions instead of exposing them to a valid test of adequacy."

    Hart continued:

    "The empirical findings of the Affluent Worker Project tell us more about the masculinist values of mid-century sociology than they inform the processes of stratification, politics and material life."

    Can you think of other examples where tailor-made data collection has the biases of the data collector built into it? How does this compare to algorithmic confounding? What implications might this have for when researchers should use Readymades and when they should use Custommades?

  13. [medium] In this chapter, I contrasted data collected by researchers for researchers with administrative records created by companies and governments. Some people call these administrative records "found data," which they contrast with "designed data." It is true that administrative records are found by researchers, but they are also highly designed. For example, modern tech companies spend enormous amounts of time and resources collecting and curating their data. Thus, these administrative records are both found and designed; it just depends on your perspective (Figure 2.10).

    Figure 2.10: The picture is both a duck and a rabbit; what you see depends on your perspective. Government and business administrative records are both found and designed; what you see depends on your perspective. For example, the call data records collected by a cell phone company are found data from the perspective of a researcher. But these exact same records are designed data from the perspective of someone working in the billing department of the phone company. Source: Wikimedia Commons

    Provide an example of a data source where seeing it as both found and designed is helpful when using that data source for research.

  14. [easy] In a thoughtful essay, Christian Sandvig and Eszter Hargittai (2015) describe two kinds of digital research, where the digital system is either an "instrument" or an "object of study." An example of the first kind of study is where Bengtsson and colleagues (2011) used mobile phone data to track migration after the earthquake in Haiti in 2010. An example of the second kind is where Jensen (2007) studies how the introduction of mobile phones throughout Kerala, India, affected the functioning of the market for fish. I find this distinction helpful because it clarifies that studies using digital data sources can have quite different goals even if they are using the same kind of data source. In order to further clarify this distinction, describe four studies that you have seen: two that use a digital system as an instrument and two that use a digital system as an object of study. You can use examples from this chapter if you want.