
Assessing audiovisual saliency and visual-information content in the articulation of consonants and vowels on audiovisual temporal perception

Research has revealed different temporal integration windows between and within different speech tokens. The limited set of speech tokens tested to date has not allowed for a proper evaluation of whether such differences are task-driven or stimulus-driven. We conducted a series of experiments to investigate how the physical differences associated with speech articulation affect the temporal aspects of audiovisual speech perception. Videos of consonants and vowels uttered by three speakers were presented. Participants made temporal order judgments (TOJs) regarding which speech stream (auditory or visual) had been presented first. The sensitivity of participants’ TOJs and the point of subjective simultaneity (PSS) were analyzed as a function of place of articulation, manner of articulation, and voicing for consonants, and of tongue height/backness and lip roundedness for vowels. The results demonstrated that, for place of articulation (consonants) and lip roundedness (vowels), participants were more sensitive to the temporal order of highly salient speech signals, with smaller visual leads at the PSS. This was not the case for manner of articulation (consonants) and tongue height (vowels). These findings suggest that the visual speech signal provides substantial cues to the auditory signal that modulate the relative processing times required for the perception of the speech stream. A subsequent experiment explored how the presentation of different sources of visual information modulated these findings. Videos of three consonants were presented under natural and point-light (PL) viewing conditions revealing either parts of the face or the whole face. Preliminary analysis revealed no differences in TOJ accuracy across viewing conditions. However, the PSS data revealed significant differences between viewing conditions depending on the speech token uttered (e.g., larger visual leads for PL views showing only the lips, teeth, and tongue).

Affiliations: 1. Cognitive Systems Research Institute (CSRI), Greece; 2. Department of Experimental Psychology, Oxford University, United Kingdom

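The PSS and TOJ-sensitivity measures discussed in the abstract are conventionally estimated by fitting a psychometric function to the proportion of responses at each stimulus onset asynchrony (SOA). The Python sketch below illustrates one standard approach (a cumulative-Gaussian fit); the SOA values, response proportions, and variable names are hypothetical illustrations, not data or code from the article.

    # Illustrative sketch (not from the article): estimating the point of
    # subjective simultaneity (PSS) and a sensitivity measure (JND) from
    # temporal order judgment (TOJ) data via a cumulative-Gaussian fit.
    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.stats import norm

    # SOAs in ms (negative = visual stream led) and the proportion of
    # "audio first" responses at each SOA -- made-up example data.
    soas = np.array([-300.0, -200.0, -100.0, 0.0, 100.0, 200.0, 300.0])
    p_audio_first = np.array([0.05, 0.12, 0.30, 0.55, 0.78, 0.93, 0.97])

    def cumulative_gaussian(soa, pss, sigma):
        """Psychometric function: P("audio first") as a function of SOA."""
        return norm.cdf(soa, loc=pss, scale=sigma)

    (pss, sigma), _ = curve_fit(cumulative_gaussian, soas, p_audio_first,
                                p0=[0.0, 100.0])

    # The JND is commonly taken as half the SOA range between the 25% and
    # 75% points of the fitted function, i.e. sigma scaled by the 0.75
    # normal quantile; smaller JND = more sensitive TOJs.
    jnd = sigma * norm.ppf(0.75)

    print(f"PSS = {pss:.1f} ms (negative = visual signal must lead)")
    print(f"JND = {jnd:.1f} ms")

Under this convention, a negative PSS indicates that the visual stream had to lead the auditory stream for the two to be judged simultaneous, which is how the "visual leads at the PSS" reported in the abstract would be expressed.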

