This is how voice authentication can be bypassed
- Biometric authentication uses biometric characteristics instead of passwords and PINs
- It promises a high level of security, but this can rarely be achieved in practice
- Voice authentication can be assessed through systematic testing
- In the past, such systems could only be attacked successfully with great effort; thanks to artificial intelligence, a few clicks are now enough
- Therefore, it is not recommended to use voice authentication in environments with high security requirements
We speak of biometric authentication when users can verify their identity with a biometric feature. Instead of having to enter a complex, regularly changing password, you rely on what you always carry with you anyway: fingerprint, iris, or voice.
In voice authentication, a fingerprint is derived from an audio signal, in this case the voice. This enrollment (pairing) must be carried out once at the start. It can happen at the user’s request, by starting the setup on their device. However, it can also be done passively, by analyzing existing audio samples or by using part of a conversation (at least its first few seconds) to create the fingerprint.
This fingerprint captures individual characteristics, for example pitch, frequencies, modulation, intonation and pauses. When the user later has to authenticate, the new input is compared with the stored fingerprint. If a certain level of agreement is reached, the system assumes that the user is who they claim to be, and authentication succeeds. The classic film Sneakers (1992) sums this up in a memorable line: “My voice is my passport.”
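The comparison step described above can be sketched as a similarity score checked against a threshold. This is a minimal illustration, not how any particular product works: the feature vectors, the use of cosine similarity, and the threshold of 0.95 are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def authenticate(enrolled, attempt, threshold=0.95):
    """Accept the attempt if it agrees closely enough with the enrolled fingerprint.

    Returns (accepted, score). The threshold value is an assumption.
    """
    score = cosine_similarity(enrolled, attempt)
    return score >= threshold, score

# Hypothetical, made-up feature vectors standing in for pitch,
# frequency, intonation and pause characteristics.
enrolled = [0.82, 0.10, 0.45, 0.33]
same_speaker = [0.80, 0.12, 0.44, 0.35]
other_speaker = [0.20, 0.90, 0.10, 0.70]

print(authenticate(enrolled, same_speaker))
print(authenticate(enrolled, other_speaker))
```

With these made-up values the first attempt is accepted and the second is rejected. A real system would extract far richer features from the audio, but the accept-if-above-threshold decision is the part an attacker tries to satisfy.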
An attack on voice authentication primarily aims to make the system authenticate successfully even though the actual requirements (legitimate user and correct voice) are not met. The attacker tries to approximate what the fingerprint expects, again with regard to pitch, frequencies, modulation, intonation and pauses.
In contrast to many other attack techniques, such an attack in principle requires a solid understanding of sound and acoustics. An audio engineer is more likely to understand what the authentication system expects and how those expectations can be met.
However, developments in artificial intelligence (AI) in recent years have greatly simplified the attack options. Synthetic voices can be used to create realistic-sounding data. Online services such as Lyrebird made the first attempts of this kind possible. With iOS 17.0, voice synthesis was even introduced on iPhones under the name Personal Voice. Voice synthesis is therefore now available to everyone.
As part of security testing, we usually take the opposite approach: we first try to determine which deviations a legitimate sample may show before it is no longer recognized as legitimate. To do this, the initial pairing and a recording are made at the same time. This guarantees the maximum possible match for the audio sample: when played back, it should achieve a 100% match.
In a further step, the voice sample is distorted. We mainly distinguish between the following categories:
|ID||Category||Description|
|1||Compression||Increasing compression removes fine details.|
|2||Echo and reverb||Additional echo and reverb effects distort the sample.|
|3||Other effects (such as chorus)||Additional effects can distort the sample very strongly.|
|4||Reversal||By reversing the original sample, many characteristics are preserved (such as frequency), but some are essentially removed or replaced (intonation, pauses).|
|5||Sample rate||Changing the sample rate can affect recording quality.|
|6||Tempo||Adjusting the tempo, with or without preserving pitch/frequencies, can reveal how the analysis works.|
|7||Secret recording (bug)||Secretly recorded conversations can be played back or used as a soundboard.|
|8||Splicing||By cutting up and recombining recordings, statements can be fabricated artificially, although the intonation sounds distorted.|
|9||Synthetic voice generation||Generating synthetic voices can show how easily an individual's speech can be produced.|
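Some of the categories in the table can be illustrated with very simple transformations on a list of amplitude samples. The following is a rough sketch for intuition only: the function names are hypothetical, `quantize` is a crude stand-in for lossy compression rather than a real codec, and `change_sample_rate` is naive decimation without filtering.

```python
import math

def quantize(samples, levels):
    """Category 1 (crude stand-in): reduce amplitude resolution, losing fine detail."""
    step = 2.0 / levels  # samples are assumed to lie in [-1.0, 1.0]
    return [round(s / step) * step for s in samples]

def add_echo(samples, delay, decay=0.5):
    """Category 2: mix a delayed, attenuated copy of the signal into itself."""
    out = list(samples)
    for i in range(delay, len(samples)):
        out[i] += decay * samples[i - delay]
    return out

def reverse(samples):
    """Category 4: play the sample backwards."""
    return samples[::-1]

def change_sample_rate(samples, factor):
    """Category 5: keep only every `factor`-th sample (naive decimation)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return samples[::factor]

# A one-second 440 Hz test tone at 8 kHz stands in for a recorded voice sample.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]

variants = {
    "compression": quantize(tone, 8),
    "echo": add_echo(tone, delay=800),
    "reversal": reverse(tone),
    "sample_rate": change_sample_rate(tone, 2),
}
print({name: len(v) for name, v in variants.items()})
```

In practice these manipulations would be applied to real recordings with an audio editor or a signal-processing library; the point here is only that each category is a well-defined, repeatable transformation of the same baseline sample.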
Authentication is then attempted with a large number of different samples. The primary question is whether each attempt succeeds or not. However, some vendors of such authentication systems also report a confidence level. This provides additional information on whether, and how strongly, the distortion causes a deviation. From this, conclusions can be drawn about which attributes the system evaluates and how heavily it weights them.
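A test run of this kind can be organized as a small harness that submits each variant and records the outcome per category. The sketch below assumes a hypothetical `verify` callable standing in for the system under test, which returns whether the attempt was accepted and, if available, a confidence value; the sample names and scores are made up.

```python
from collections import defaultdict

def run_campaign(verify, variants):
    """Submit every (category, sample) pair and group the results by category.

    `verify` is a hypothetical stand-in for the system under test. It takes a
    sample and returns (accepted: bool, confidence: float or None).
    """
    results = defaultdict(list)
    for category, sample in variants:
        accepted, confidence = verify(sample)
        results[category].append((accepted, confidence))
    return dict(results)

def summarize(results):
    """Per category: number of attempts, accepted attempts, mean confidence."""
    summary = {}
    for category, outcomes in results.items():
        scores = [c for _, c in outcomes if c is not None]
        summary[category] = {
            "attempts": len(outcomes),
            "accepted": sum(1 for a, _ in outcomes if a),
            "mean_confidence": sum(scores) / len(scores) if scores else None,
        }
    return summary

# Made-up confidence values so the sketch runs without a real system.
fake_scores = {"original": 1.0, "echo_light": 0.9, "echo_heavy": 0.6, "reversed": 0.2}

def fake_verify(sample_name):
    score = fake_scores[sample_name]
    return score >= 0.8, score

variants = [
    ("baseline", "original"),
    ("echo", "echo_light"),
    ("echo", "echo_heavy"),
    ("reversal", "reversed"),
]
summary = summarize(run_campaign(fake_verify, variants))
for category, stats in summary.items():
    print(category, stats)
```

Comparing the mean confidence per category against the undistorted baseline is one way to estimate which attributes the system weights most heavily.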
Several measures can make attacks on voice authentication more difficult. First, there is the non-technical aspect: the dialogue should be dynamic. For example, questions are chosen at random, or authentication has to take place within a natural conversation.
It is difficult for an attacker to adapt to this. Prepared voice samples can no longer be used, or only with great difficulty. The attack attempt is then recognized as such at the interpersonal level. Fully automated voice authentication, however, cannot operate reliably at this level.
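The idea of a dynamic dialogue can be sketched as a randomly generated phrase that the caller has to speak. This is only an illustration under assumptions: the word list is made up, and the transcript is assumed to come from some speech-to-text step that is not shown here.

```python
import secrets

# Hypothetical word list; a real deployment would use a much larger one.
WORDS = ["amber", "river", "seven", "lantern", "forest", "copper", "violet", "harbor"]

def make_challenge(n_words=4):
    """Build an unpredictable phrase the caller must speak."""
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))

def matches_challenge(challenge, transcript):
    """Compare the transcribed response with the challenge, ignoring case and spacing."""
    return challenge.lower().split() == transcript.lower().split()

challenge = make_challenge()
print("Please say:", challenge)
print(matches_challenge(challenge, challenge.upper()))
```

A pre-recorded sample is unlikely to contain the requested phrase, which forces the attacker to splice or synthesize speech in real time. This raises the effort but does not rule out an attack with live voice synthesis.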
In addition, matching can be made stricter. On a technical level, this means that a higher level of agreement is required.
This increases security, but only in relative terms, because absolute agreement can never be achieved. It is important to remember that this approach creates new problems, for example when someone is tired or sick (e.g. hoarse), when an unusual device is used for communication (e.g. Skype instead of a landline), or when the voice quality is poor (e.g. reception or network problems). The convenience attributed to biometric mechanisms can then disappear very quickly.
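The trade-off described above can be made concrete by counting, for a given threshold, how many legitimate attempts are rejected and how many attacks are accepted. The scores below are made up for illustration, and the function name is hypothetical.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_rejection_rate, false_acceptance_rate) at the given threshold."""
    rejected = sum(1 for s in genuine_scores if s < threshold)
    accepted = sum(1 for s in impostor_scores if s >= threshold)
    return rejected / len(genuine_scores), accepted / len(impostor_scores)

# Made-up match scores. The 0.79 stands for a legitimate user with a hoarse voice,
# the 0.91 for a good synthetic imitation.
genuine = [0.97, 0.93, 0.88, 0.79, 0.95]
impostor = [0.40, 0.62, 0.85, 0.91, 0.55]

for threshold in (0.80, 0.90, 0.96):
    frr, far = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold:.2f}  false rejections={frr:.0%}  false acceptances={far:.0%}")
```

With these example values, raising the threshold from 0.80 to 0.96 removes all false acceptances but rejects four out of five legitimate attempts, which is exactly the loss of convenience described above.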
Voice authentication feels convenient and secure. On closer examination, however, this turns out not to be the case. Technical or health-related factors can reduce the benefit for legitimate users. Improvements in voice reproduction and voice generation have also made attacks much easier in recent years.
As is always the case with biometric authentication, voice authentication is suitable only to a limited extent for ensuring a high level of security. It can serve as an additional factor that makes attacks more difficult. But relying on it alone is neither reasonable nor up to date. Therefore, it is not recommended for use in high-security environments.
Biometrics should primarily be an identifying feature. It is not really suitable for authentication, simply because once it is “lost” (for example, when fingerprints become known), it cannot easily be changed. Resetting a password is definitely more practical.