Activities

  • degree of difficulty: easy, medium, hard, very hard
  • requires math
  • requires coding
  • data collection
  • my favorites
  1. [ medium , my favorite ] Algorithmic confounding was a problem with Google Flu Trends. Read the paper by Lazer et al. (2014), and write a short, clear email to an engineer at Google explaining the problem and offering an idea for how to fix it.

  2. [ medium ] Bollen, Mao, and Zeng (2011) claim that data from Twitter can be used to predict the stock market. This finding led to the creation of a hedge fund, Derwent Capital Markets, to invest in the stock market based on data collected from Twitter (Jordan 2010). What evidence would you want to see before putting your money in that fund?

  3. [ easy ] While some public health advocates consider e-cigarettes an effective aid for smoking cessation, others warn about the potential risks, such as high levels of nicotine. Imagine that a researcher decides to study public opinion toward e-cigarettes by collecting e-cigarette-related Twitter posts and conducting sentiment analysis.

    1. What are the three possible biases that you are most worried about in this study?
    2. Clark et al. (2016) ran just such a study. First, they collected 850,000 tweets that used e-cigarette-related keywords from January 2012 through December 2014. Upon closer inspection, they realized that many of these tweets were automated (i.e., not produced by humans) and that many of these automated tweets were essentially commercials. They developed a human detection algorithm to separate automated tweets from organic tweets. Using this human detection algorithm, they found that 80% of tweets were automated. Does this finding change your answer to part (a)?
    3. When they compared the sentiment in organic and automated tweets, they found that the automated tweets were more positive than the organic tweets (6.17 versus 5.84). Does this finding change your answer to (b)?
  4. [ easy ] In November 2009, Twitter changed the question in the tweet box from "What are you doing?" to "What's happening?" (https://blog.twitter.com/2009/whats-happening).

    1. How do you think the change of prompts will affect who tweets and/or what they tweet?
    2. Name one research project for which you would prefer the prompt "What are you doing?" Explain why.
    3. Name one research project for which you would prefer the prompt "What's happening?" Explain why.
  5. [ easy ] "Retweets" are often used to measure influence and the spread of influence on Twitter. Initially, users had to copy and paste the tweet they liked, tag the original author with his or her handle, and manually type "RT" before the tweet to indicate that it was a retweet. Then, in 2009, Twitter added a "retweet" button. In June 2016, Twitter made it possible for users to retweet their own tweets (https://twitter.com/twitter/status/742749353689780224). Do you think these changes should affect how you use "retweets" in your research? Why or why not?

  6. [ very hard , data collection , requires coding , my favorite ] In a widely discussed paper, Michel and colleagues (2011) analyzed the content of more than five million digitized books in an attempt to identify long-term cultural trends. The data that they used have now been released as the Google NGrams dataset, so we can use the data to replicate and extend some of their work.

    In one of the many results in the paper, Michel and colleagues argued that we are forgetting faster and faster. For a particular year, say "1883," they calculated the proportion of 1-grams published in each year between 1875 and 1975 that were "1883". They reasoned that this proportion is a measure of the interest in events that happened in that year. In their figure 3a, they plotted the usage trajectories for three years: 1883, 1910, and 1950. These three years share a common pattern: little use before that year, then a spike, then decay. Next, to quantify the rate of decay for each year, Michel and colleagues calculated the "half-life" of each year for all years between 1875 and 1975. In their figure 3a (inset), they showed that the half-life of each year is decreasing, and they argued that this means that we are forgetting the past faster and faster. They used version 1 of the English language corpus, but Google has subsequently released a second version of the corpus. Please read all the parts of the question before you begin coding.

    This activity will give you practice writing reusable code, interpreting results, and data wrangling (such as working with awkward files and handling missing data). It will also help you get up and running with a rich and interesting dataset.
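
The proportion described above (mentions of a year divided by all 1-grams published that year) can be sketched as follows. This is only a minimal illustration, not the authors' code. It assumes that each 1-gram row has the form `ngram<TAB>year<TAB>match_count<TAB>volume_count` and that the total counts file is a single line of tab-separated records of the form `year,match_count,page_count,volume_count`; check both assumptions against the files you actually download. The function names and the tiny inputs are made up for illustration.

```python
from collections import defaultdict


def parse_total_counts(text):
    """Parse the total-counts file into {year: total number of 1-grams}.

    Assumption: one long line of tab-separated records, each of the
    form 'year,match_count,page_count,volume_count'.
    """
    totals = {}
    for record in text.split("\t"):
        record = record.strip()
        if not record:
            continue  # skip empty pieces around leading/trailing tabs
        year, match_count, _pages, _volumes = record.split(",")
        totals[int(year)] = int(match_count)
    return totals


def parse_ngram_counts(lines, targets):
    """Collect {target: {year: match_count}} from 1-gram rows.

    Assumption: each row is 'ngram<TAB>year<TAB>match_count<TAB>volume_count'.
    """
    counts = defaultdict(dict)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4 or fields[0] not in targets:
            continue  # ignore malformed rows and 1-grams we do not need
        ngram, year, match_count, _volumes = fields
        counts[ngram][int(year)] = int(match_count)
    return dict(counts)


def proportions(counts_by_year, totals):
    """Divide raw counts by yearly totals; years without a total are dropped."""
    return {
        year: count / totals[year]
        for year, count in sorted(counts_by_year.items())
        if totals.get(year)
    }


# Tiny made-up inputs, only to show the shapes involved.
total_text = " \t1882,1000,10,2\t1883,2000,20,4\t1884,4000,40,8\t"
ngram_lines = [
    "1883\t1882\t1\t1\n",
    "1883\t1883\t10\t3\n",
    "1883\t1884\t8\t5\n",
    "1950\t1884\t2\t1\n",
]

totals = parse_total_counts(total_text)
counts = parse_ngram_counts(ngram_lines, targets={"1883"})
shares = proportions(counts["1883"], totals)
print(shares)
```

With the real 1.4GB file, you would pass an open file handle as `lines` so that rows are streamed rather than loaded into memory at once.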

    1. Get the raw data from the Google Books NGram Viewer website. In particular, you should use version 2 of the English language corpus, which was released on July 1, 2012. Uncompressed, this file is 1.4GB.

    2. Recreate the main part of figure 3a of Michel et al. (2011). To recreate this figure, you will need two files: the one you downloaded in part (a) and the "total counts" file, which you can use to convert the raw counts into proportions. Note that the total counts file has a structure that may make it a bit hard to read in. Does version 2 of the NGram data produce results similar to those presented in Michel et al. (2011), which are based on version 1 data?

    3. Now check your graph against the graph created by the NGram Viewer.

    4. Recreate figure 3a (main figure), but change the \(y\)-axis to be the raw mention count (not the rate of mentions).

    5. Does the difference between (b) and (d) lead you to reevaluate any of the results of Michel et al. (2011)? Why or why not?

    6. Now, using the proportion of mentions, replicate the inset of figure 3a. That is, for each year between 1875 and 1975, calculate the half-life of that year. The half-life is defined to be the number of years that pass before the proportion of mentions reaches half its peak value. Note that Michel et al. (2011) do something more complicated to estimate the half-life (see section III.6 of the Supporting Online Information), but they claim that both approaches produce similar results. Does version 2 of the NGram data produce results similar to those presented in Michel et al. (2011), which are based on version 1 data? (Hint: Don't be surprised if it doesn't.)

    7. Were there any years that were outliers, such as years that were forgotten particularly quickly or particularly slowly? Briefly speculate about possible reasons for that pattern and explain how you identified the outliers.

    8. Now replicate this result for version 2 of the NGrams data in Chinese, French, German, Hebrew, Italian, Russian, and Spanish.

    9. Comparing across all languages, were there any years that were outliers, such as years that were forgotten particularly quickly or particularly slowly? Briefly speculate about possible reasons for that pattern.
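
The simple half-life definition used in this activity (the number of years that pass before the proportion of mentions reaches half its peak value) could be computed along these lines. This is a hedged sketch rather than the procedure of Michel et al. (2011): it assumes the clock starts at the peak year, that only years at or after the target year matter, and that "reaches" means the first year at or below half the peak. The function name `half_life` and the toy series are hypothetical.

```python
def half_life(series, target_year):
    """Years from the peak until the proportion first drops to half the peak.

    `series` maps publication year -> proportion of 1-grams equal to the
    target year. Only years at or after `target_year` are considered.
    Returns None if the series never falls to half of its peak.
    """
    years = sorted(y for y in series if y >= target_year)
    if not years:
        return None
    # Assumption: the peak is the single highest value after the target year.
    peak_year = max(years, key=lambda y: series[y])
    threshold = series[peak_year] / 2
    for year in years:
        if year > peak_year and series[year] <= threshold:
            return year - peak_year
    return None


# Made-up trajectory (mentions per million words), peaking in 1884.
toy = {1882: 10.0, 1883: 80.0, 1884: 100.0, 1885: 70.0,
       1886: 55.0, 1887: 49.0, 1888: 30.0}
print(half_life(toy, 1883))
```

Real trajectories are noisy, so you may want to smooth the series before locating the peak; whether and how to smooth is a design choice that can change the answer, which is worth keeping in mind when comparing with the published inset.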

  7. [ very hard , data collection , requires coding , my favorite ] Penney (2016) explored whether the widespread publicity about NSA/PRISM surveillance (i.e., the Snowden revelations) in June 2013 was associated with a sharp and sudden decrease in traffic to Wikipedia articles on topics that raise privacy concerns. If so, this change in behavior would be consistent with a chilling effect resulting from mass surveillance. The approach of Penney (2016) is sometimes called an interrupted time series design, and it is related to the approaches described in section 2.4.3.

    To choose the topic keywords, Penney referred to the list used by the US Department of Homeland Security (DHS) for tracking and monitoring social media. The DHS list categorizes certain search terms into a range of issues, such as "Health Concern," "Infrastructure Security," and "Terrorism." For the study group, Penney used the 48 keywords related to "Terrorism" (see appendix table 8). He then aggregated Wikipedia article view counts on a monthly basis for the corresponding 48 Wikipedia articles over a 32-month period, from the beginning of January 2012 to the end of August 2014. To strengthen his argument, he also created several comparison groups by tracking article views on other topics.

    Now, you are going to replicate and extend Penney (2016). All the raw data that you will need for this activity are available from Wikipedia. Alternatively, you can get the data from the R package wikipediatrend (Meissner and R Core Team 2016). When you write up your responses, please note which data source you used. (Note that this same activity also appears in chapter 6.) This activity will give you practice in data wrangling and in thinking about natural experiments in big data sources. It will also get you up and running with a potentially interesting data source for future projects.

    1. Read Penney (2016) and replicate his figure 2, which shows the page views for "Terrorism"-related pages before and after the Snowden revelations. Interpret the findings.
    2. Next, replicate figure 4A, which compares the study group ("Terrorism"-related articles) with a comparator group using keywords categorized under "DHS & Other Agencies" from the DHS list (see appendix table 10 and footnote 139). Interpret the findings.
    3. In part (b) you compared the study group with one comparator group. Penney also compared it with two other comparator groups: "Infrastructure Security"-related articles (appendix table 11) and popular Wikipedia pages (appendix table 12). Come up with an alternative comparator group, and test whether the findings from part (b) are sensitive to your choice of comparator group. Which choice makes the most sense? Why?
    4. Penney stated that keywords relating to "Terrorism" were used to select the Wikipedia articles because the US government cited terrorism as a key justification for its online surveillance practices. As a check of these 48 "Terrorism"-related keywords, Penney (2016) also conducted a survey on MTurk, asking respondents to rate each of the keywords in terms of Government Trouble, Privacy-Sensitive, and Avoidance (appendix tables 7 and 8). Replicate the survey on MTurk and compare your results.
    5. Based on the results in part (d) and your reading of the article, do you agree with Penney's choice of topic keywords in the study group? Why or why not? If not, what would you suggest instead?
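
An interrupted time series comparison of monthly page views can be sketched as below. This is a simplified illustration, not the model specification in Penney (2016): it fits separate straight lines to the months before and after the interruption, which gives the same point estimates as a single segmented regression with level-change and slope-change terms, and it ignores standard errors, seasonality, and comparator groups. The function names, the break position (index 17, taking January 2012 as month 0 so that June 2013 is month 17), and the noise-free synthetic series are all assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def interrupted_series(views, break_index):
    """Compare separate linear trends before and after the interruption.

    Returns the pre-period slope, the change in slope, and the jump in
    level at the break (post-period line minus the extrapolated
    pre-period line, both evaluated at `break_index`).
    """
    months = list(range(len(views)))
    pre_a, pre_b = fit_line(months[:break_index], views[:break_index])
    post_a, post_b = fit_line(months[break_index:], views[break_index:])
    level_change = (post_a + post_b * break_index) - (pre_a + pre_b * break_index)
    return {
        "pre_trend": pre_b,
        "trend_change": post_b - pre_b,
        "level_change": level_change,
    }


# Synthetic, noise-free series of 32 months: views rise by 20 per month,
# then drop by 300 at month 17 and the trend falls by 30 per month.
views = [
    1000 + 20 * t if t < 17 else 1000 + 20 * t - 300 - 30 * (t - 17)
    for t in range(32)
]
result = interrupted_series(views, break_index=17)
print(result)
```

Running the same function on a comparator group's series gives a rough sense of whether any drop is specific to the study group.
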
  8. [ easy ] Efrati (2016) reported, based on confidential information, that "total sharing" on Facebook had declined by about 5.5% year over year, while "original broadcast sharing" was down 21% year over year. This decline was particularly acute among Facebook users under 30 years of age. The report attributed the decline to two factors. One is the growth in the number of "friends" people have on Facebook. The other is that some sharing activity has shifted to messaging and to competitors such as Snapchat. The report also revealed several tactics Facebook had tried in order to boost sharing, including News Feed algorithm tweaks that make original posts more prominent, as well as periodic reminders of original posts through the "On This Day" feature. What implications, if any, do these findings have for researchers who want to use Facebook as a data source?

  9. [ medium ] What is the difference between a sociologist and a historian? According to Goldthorpe (1991), the main difference is control over data collection. Historians are forced to use relics, whereas sociologists can tailor their data collection to specific purposes. Read Goldthorpe (1991). How is the difference between sociology and history related to the idea of custommades and readymades?

  10. [ hard ] This builds on the previous question. Goldthorpe (1991) drew a number of critical responses, including one from Nicky Hart (1994) that challenged Goldthorpe's devotion to tailor-made data. To clarify the potential limitations of tailor-made data, Hart described the Affluent Worker Project, a large survey to measure the relationship between social class and voting that was conducted by Goldthorpe and colleagues in the mid-1960s. As one might expect from a scholar who favored designed data over found data, the Affluent Worker Project collected data that were tailored to address a recently proposed theory about the future of social class in an era of rising living standards. But Goldthorpe and colleagues somehow "forgot" to collect information about the voting behavior of women. Here is how Nicky Hart (1994) summarized the whole episode:

    "... it [is] difficult to avoid the conclusion that women were omitted because this 'tailor made' dataset was confined by a paradigmatic logic which excluded female experience. Driven by a theoretical vision of class consciousness and action as male preoccupations ..., Goldthorpe and his colleagues constructed a set of empirical proofs which fed and nurtured their own theoretical assumptions instead of exposing them to a valid test of adequacy."

    Hart continued:

    "The empirical findings of the Affluent Worker Project tell us more about the masculinist values of mid-century sociology than they inform the processes of stratification, politics and material life."

    Can you think of other examples where tailor-made data collection has the biases of the data collector built into it? How does this compare to algorithmic confounding? What implications might this have for when researchers should use readymades and when they should use custommades?

  11. [ medium ] In this chapter, I have contrasted data collected by researchers for researchers with administrative records created by companies and governments. Some people call these administrative records "found data," which they contrast with "designed data." It is true that administrative records are found by researchers, but they are also highly designed. For example, modern tech companies work very hard to collect and curate their data. Thus, these administrative records are both found and designed; it just depends on your perspective (figure 2.12).

    Figure 2.12: The picture is both a duck and a rabbit; what you see depends on your perspective. Big data sources are both found and designed; again, what you see depends on your perspective. For example, the call data records collected by a mobile-phone company are found data from the perspective of a researcher. But these exact same records are designed data from the perspective of someone working in the billing department of the phone company. Source: Popular Science Monthly (1899) / Wikimedia Commons.

    Provide an example of a data source where seeing it as both found and designed is helpful when using that data source for research.

  12. [ easy ] In a thoughtful essay, Christian Sandvig and Eszter Hargittai (2015) split digital research into two broad categories depending on whether the digital system is an "instrument" or an "object of study." An example of the first kind, where the system is an instrument, is the research by Bengtsson and colleagues (2011) on using mobile-phone data to track migration after the earthquake in Haiti in 2010. An example of the second kind, where the system is an object of study, is the research by Jensen (2007) on how the introduction of mobile phones throughout Kerala, India, affected the functioning of the market for fish. I find this distinction helpful because it clarifies that studies using digital data sources can have quite different goals even if they use the same kind of data source. To further clarify this distinction, describe four studies that you have seen: two that use a digital system as an instrument and two that use a digital system as an object of study. You can use examples from this chapter if you want.