1. Introduction
The rapid growth of multimedia data has created an urgent need for efficient retrieval systems across modalities. While text, image, and video retrieval have advanced considerably, audio retrieval with natural language queries remains largely unexplored. This work addresses that gap by introducing a new framework for retrieving audio content from free-form natural language descriptions.
Traditional audio retrieval methods rely on metadata tags or audio-based queries, which limits their expressiveness and usability. Our approach lets users describe sounds in free-form natural language, for example "A man talks while music plays, followed by a frog croaking," so that audio matching a described sequence of events can be retrieved more precisely and intuitively.
- 10-30 seconds: duration of the audio clips in the benchmarks
- 2 benchmarks: new datasets introduced for evaluation
- Cross-modal: a text-to-audio retrieval approach
2. Methodology
2.1 Benchmark Datasets
We introduce two challenging benchmarks based on the AudioCaps and Clotho datasets. AudioCaps contains 10-second audio clips from AudioSet with human-written captions, while Clotho features 15-30 second audio clips from Freesound with detailed descriptions. These datasets provide rich audio-text pairs, which are essential for training cross-modal retrieval systems.
2.2 Cross-Modal Retrieval Framework
Our framework adapts video retrieval architectures to audio retrieval, building on pre-trained audio expert networks. It learns joint embeddings in which semantically similar audio and text representations lie close together in a shared space.
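To make the shared-space idea concrete, the following is a minimal sketch of retrieval by nearest neighbor once embeddings exist. The function names `normalize` and `retrieve`, the file names, and the three-dimensional vectors are all illustrative assumptions; real embeddings would come from the trained audio and text encoders, which are not reproduced here.

```python
import math

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def retrieve(query_emb, clip_embs, top_k=2):
    """Return (clip_id, score) pairs for the top_k clips closest to the query."""
    q = normalize(query_emb)
    scored = []
    for clip_id, emb in clip_embs.items():
        c = normalize(emb)
        scored.append((clip_id, sum(a * b for a, b in zip(q, c))))
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hand-picked toy vectors standing in for encoder outputs (not real model outputs)
clips = {
    "rain_thunder.wav": [0.9, 0.1, 0.0],
    "speech_music.wav": [0.1, 0.9, 0.2],
    "frog_croak.wav": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # hypothetical text embedding of a rain-related query
results = retrieve(query, clips, top_k=2)
```

Because the audio embeddings do not depend on the query, they could be computed once and cached, leaving only the text encoding and the similarity ranking to run at query time.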
2.3 Pre-training Strategies
We demonstrate the benefits of pre-training on diverse audio tasks, showing that transfer learning from related domains substantially improves retrieval performance. Ensembles of audio experts capture complementary aspects of the audio content.
3. Technical Implementation
3.1 Audio Feature Extraction
We use multiple pre-trained audio networks to extract rich features. The audio embedding $\mathbf{a}_i$ for clip $i$ is computed as:
$$\mathbf{a}_i = f_{\theta}(x_i)$$
where $f_{\theta}$ denotes the audio encoder and $x_i$ is the raw audio input.
3.2 Text Encoding
Text queries are encoded with transformer-based models to capture their semantic meaning. The text embedding $\mathbf{t}_j$ for query $j$ is computed as:
$$\mathbf{t}_j = g_{\phi}(q_j)$$
where $g_{\phi}$ is the text encoder and $q_j$ is the input query.
3.3 Cross-Modal Alignment
We optimize the similarity between audio and text embeddings using contrastive learning. The similarity score $s_{ij}$ between audio $i$ and text $j$ is the cosine similarity:
$$s_{ij} = \frac{\mathbf{a}_i \cdot \mathbf{t}_j}{\|\mathbf{a}_i\| \|\mathbf{t}_j\|}$$
The model is trained to maximize the similarity of matching pairs and to minimize it for non-matching pairs.
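The score $s_{ij}$ and one possible training objective can be sketched as follows. The text specifies only "contrastive learning", so the symmetric softmax cross-entropy form and the `temperature` value used here are assumptions chosen for illustration, not necessarily the loss used in the original experiments. All function names are hypothetical.

```python
import math

def cosine_similarity(a, t):
    """s_ij = (a . t) / (||a|| ||t||) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, t))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_t = math.sqrt(sum(y * y for y in t))
    return dot / (norm_a * norm_t)

def similarity_matrix(audio_embs, text_embs):
    """Entry [i][j] holds the cosine similarity of audio i and text j."""
    return [[cosine_similarity(a, t) for t in text_embs] for a in audio_embs]

def _cross_entropy_diagonal(scores, temperature):
    """Mean softmax cross-entropy where the correct column for row i is i."""
    total = 0.0
    for i, row in enumerate(scores):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(scores)

def symmetric_contrastive_loss(audio_embs, text_embs, temperature=0.1):
    """Average of the audio-to-text and text-to-audio losses (assumed form)."""
    s = similarity_matrix(audio_embs, text_embs)
    s_transposed = [list(col) for col in zip(*s)]
    return 0.5 * (_cross_entropy_diagonal(s, temperature)
                  + _cross_entropy_diagonal(s_transposed, temperature))
```

In this sketch, audio `i` and text `i` form the matching pair, and every other combination in the batch acts as a non-matching pair. The loss is small when matching pairs score higher than all mismatched ones and grows when they do not.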
4. Experimental Results
4.1 Baseline Performance
Our experiments establish strong baselines for text-based audio retrieval. The models achieve promising results on both the AudioCaps and Clotho benchmarks, with retrieval accuracy measured by standard metrics including Recall@K and Mean Average Precision.
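For readers unfamiliar with these metrics, here is a minimal sketch of how Recall@K and Mean Average Precision can be computed from a query-by-candidate score matrix. It assumes that `relevant[i]` is a non-empty set of correct candidate indices for query `i`, and that Recall@K counts a query as a hit when any relevant item appears in the top K. The function names are illustrative.

```python
def rank_items(scores_row):
    """Candidate indices sorted by descending score."""
    return sorted(range(len(scores_row)), key=lambda j: scores_row[j], reverse=True)

def recall_at_k(scores, relevant, k):
    """Fraction of queries with at least one relevant item in the top k."""
    hits = 0
    for row, rel in zip(scores, relevant):
        top_k = rank_items(row)[:k]
        if any(j in rel for j in top_k):
            hits += 1
    return hits / len(scores)

def mean_average_precision(scores, relevant):
    """Mean over queries of the average precision at each relevant rank."""
    ap_sum = 0.0
    for row, rel in zip(scores, relevant):
        found = 0
        precision_sum = 0.0
        for rank, j in enumerate(rank_items(row), start=1):
            if j in rel:
                found += 1
                precision_sum += found / rank
        ap_sum += precision_sum / len(rel)
    return ap_sum / len(scores)
```

When each query has exactly one relevant clip, average precision reduces to the reciprocal of that clip's rank, so Mean Average Precision coincides with mean reciprocal rank.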
Figure 1: Retrieval Performance Comparison
The results show that ensemble approaches combining multiple audio experts outperform single-model approaches. Pre-training on diverse audio tasks yields consistent gains, particularly for complex queries that involve multiple sound events.
4.2 Ensemble Methods
We show that combining features from multiple pre-trained audio networks through ensemble learning improves retrieval robustness. Different networks capture complementary aspects of the audio content, which yields more comprehensive representations.
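One simple way to combine expert features is to normalize each expert's output and concatenate the results, as sketched below. This is a stand-in for illustration only: the text does not specify the fusion mechanism, so plain concatenation is an assumption, and the names `l2_normalize` and `fuse_experts` are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def fuse_experts(expert_features):
    """Concatenate L2-normalized features from several experts for one clip."""
    fused = []
    for feat in expert_features:
        fused.extend(l2_normalize(feat))
    return fused

# Two hypothetical experts producing features of different sizes for one clip
fused = fuse_experts([[3.0, 4.0], [0.0, 2.0, 0.0]])
```

Normalizing before concatenation keeps an expert with large-magnitude outputs from dominating the fused representation. A learned fusion layer could replace the concatenation without changing the rest of the pipeline.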
4.3 Ablation Studies
Ablation experiments confirm the importance of each component of our framework. The analysis shows that both the choice of audio encoder and the cross-modal alignment strategy significantly affect final performance.
5. Analysis Framework
Core Insight
This work challenges the status quo in audio retrieval by moving from metadata-dependent systems to content-based natural language querying. The approach represents a paradigm shift comparable to what CycleGAN (Zhu et al., 2017) achieved for unpaired image translation: it breaks the dependence on fully annotated training data through cross-modal alignment.
Logical Flow
The methodology follows a three-stage pipeline: extracting features from diverse audio experts, semantically encoding free-form natural language text, and aligning the two in a joint cross-modal embedding space. This architecture mirrors the success of CLIP (Radford et al., 2021) in the vision-language domain but is adapted specifically to the temporal and spectral characteristics of audio.
Strengths & Weaknesses
Strengths: The ensemble approach makes good use of existing audio expertise rather than training from scratch. The creation of benchmarks addresses a critical data scarcity problem in the field. The computational efficiency for video retrieval applications is particularly compelling.
Weaknesses: The approach inherits limitations from its component networks: potential biases in the pre-training data, limited generalization to rare sound events, and sensitivity to paraphrased text. Temporal alignment between text descriptions and audio events remains challenging for longer sequences.
Actionable Insights
For practitioners: start by fine-tuning the ensemble approach on domain-specific audio data. For researchers: focus on improving temporal modeling and on robustness to paraphrased queries. The framework is immediately applicable to audio archive search and to accelerating video retrieval.
Case Study: Audio Archive Search
Consider a historical audio archive containing thousands of unlabeled environmental recordings. Traditional keyword-based search fails because the content has not been tagged. With our framework, archivists can query "heavy rain with distant thunder" and retrieve relevant clips based on the audio content rather than on metadata.
6. Future Applications
The technology enables a number of practical applications, including:
- Smart Audio Archives: Improved search for historical sound collections such as the BBC Sound Effects Archive
- Low-Power IoT Devices: Audio-based monitoring systems for conservation and biological research
- Creative Applications: Automatic sound-effect matching for video clips, audiobooks, and multimedia production
- Accessibility Tools: Audio description and retrieval systems for visually impaired users
- Video Retrieval Acceleration: Using audio as a proxy for video content in large-scale search systems
Future research directions include extending the approach to multilingual queries, improving temporal reasoning, and developing more efficient cross-modal alignment techniques suited to real-time applications.
7. References
- Zhu, J. Y., et al. (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. IEEE ICCV.
- Radford, A., et al. (2021). Learning Transferable Visual Models From Natural Language Supervision. ICML.
- Gemmeke, J. F., et al. (2017). Audio Set: An ontology and human-labeled dataset for audio events. IEEE ICASSP.
- Drossos, K., et al. (2020). Clotho: An Audio Captioning Dataset. IEEE ICASSP.
- Oncescu, A. M., et al. (2021). Audio Retrieval with Natural Language Queries. INTERSPEECH.
- Arandjelovic, R., & Zisserman, A. (2018). Objects that sound. ECCV.
- Harvard Dataverse: Audio Retrieval Benchmarks