

Music / UTAU


UTAU is a musical voice synthesis program created by Ameya as a freeware response to Vocaloid. Using UTAU is much like using Vocaloid, but there are some differences in usage (which may lead to Damn You, Muscle Memory!). UTAU also lets users import vocal samples manually, which allows a person to use their own voice in the program.

Before the appearance of the now popular Teto Kasane, UTAU was virtually unknown. To date, there are over 1500 UTAU voicebanks from all over the world.

Near the end of 2009, Ameya created a special voicebank recording style first known as renzokuon, or "continuous sound", also widely called VCV (vowel-consonant-vowel). This allowed UTAU voicebanks to sound much smoother and more human than before. Another recording style emerged a few years later, referred to as CV-VC (consonant-vowel-vowel-consonant), and quickly became popular because it allows voicebanks to be created for languages other than Japanese, such as English. A Mac version of UTAU, known as UTAU-Synth, has been released, and a mobile version of the software is in the works.

     A detailed explanation of each recording style for those who care 
  • CV:
    • Often referred to as "Single Sound" in Japan, CV was the first recording style ever created, and what the UTAU program was designed to work with. The Japanese language is made up almost entirely of consonant-vowel clusters, with next to no ending consonants. (The closest things resembling such are the 'm' and 'n' consonants, which can work like vowels in the program.) As such, this tends to be the most popular recording style: not only are most Vocaloid songs in Japanese, it's also the easiest and quickest style to record in, requiring around 52 sounds.
    • While it is the easiest to work with, it's not without its problems. CV voicebanks tend to sound rather choppy because the program artificially induces vowel and consonant transitions, which results in CV being viewed as the "lazy recorder's method" for those who don't want to spend the time making a VCV bank. While there may be some truth to this for some recorders, the accusation is somewhat unfair. CV voicebanks tend to be popular because they are easy for newcomers to use and generally take far less time to load when playing a song than VCV banks. Well put together CV banks can also sound exceptionally smooth, to the point that a few would swear they were VCV banks if they didn't look into it. Whether a CV bank was done out of laziness or as a style choice comes down to several factors, as there are a few who actually prefer the rather robotic tone CV can induce, saying it adds its own bit of charm to the UTAU program.
      • A sub-type recording style of this, CV-VV, is essentially a CV bank with vowel-to-vowel transitions, to help make the CV bank sound smooth overall while still being lighter on the program than a VCV bank.
  • VCV:
    • Known as "Continuous Sound" voicebanks in Japan, these are voicebanks designed to have more natural-sounding vocal transitions between phonetics. The recording style consists of recording a vowel before each phonetic CV of a CV bank, which lets the program transition from the vowel of the previous phonetic and captures natural vowel transitions instead of artificial ones. The result often ends up sounding very smooth in the program, with some even verging on Vocaloid quality. As a result, it tends to be the recording style most "professional" UTAU producers use, and what they feel every UTAU voicebank should be.
    • As explained in the CV section, this view is a bit unfair. While VCV generally sounds smoother than CV, it is not without its own problems. As with UTAU banks in general, the most well put together CV bank can outperform the laziest VCV bank by miles. In addition, VCV takes far longer to record, requiring around 200-300 samples in comparison to the 52 samples needed for CV. It also leaves you with two choices of recording method. The first, and most popular, is to record several phonetic samples and duplicate them using the built-in duplication program that comes with UTAU. This results in smaller file sizes overall, but takes far longer to load in the program than a CV voicebank. Also, because the recorder is chaining together several phonetics at once, the consonants can sound rather slurred when played in the program. The second, less popular method is to record each VCV separately, allowing for more control over the tone and sharpness of the bank. While this results in clearer vocals overall than the first method, it also results in massive voicebank file sizes.
    • In the end, it's up to the recorder to decide which they prefer. Extremely well put together VCV banks often verge on Vocaloid quality, sometimes even outperforming a few "official" Vocaloids, while lazily put together ones will often sound no different from lazily put together CV voicebanks.
      • It should be noted that this recording style can also be used to record voicebanks for languages other than Japanese. However, regardless of what style you use for recording these kinds of VCV banks, the results will often sound barely better than CV-VC at most, in addition to being two to three times larger than a CV-VC bank. As a result, the general view is that you had better stick to CV-VC if you plan on recording voicebanks for other languages.
  • CV-VC:
    • CV-VC voicebanks are one of the "newer" recording styles, in that they weren't created until several years after VCV. However, they quickly made their claim to fame by allowing one to make voicebanks for languages other than Japanese. The recording style focuses on CV with the addition of end consonants, which nearly every language other than Japanese possesses. Recording a VC is very much like recording a VCV: you record the vowel before the consonant, but stop upon saying the consonant, leaving a space of silence where the vowel in a CV or VCV bank would go so the program can naturally transition to the next phonetic without cutting off the end consonant. Upon its discovery, CV-VC quickly became one of the more popular recording styles due to allowing foreign-language UTAU voicebanks, though some theorize it was created earlier, before its usefulness for English banks was discovered.
    • CV-VC is also a bit controversial, as many claim that the results are "Better than Engloid", even though such claims are flimsy at best. As with anything, general editing skill factors in heavily with both English UTAU and English Vocaloids. A well put together English UTAU can still sound like a drunken Speak & Spell in the hands of someone who has no clue what they're doing, while even the worst English Vocaloid can sound amazing in the hands of a professional. For the longest while, English Vocaloids were far more accessible to the general public than their Japanese counterparts, while the Japanese ones were better known among professional Vocaloid users and fans, resulting in their better covers being posted to YouTube. This resulted in mostly newbies getting their hands on the English Vocaloids, which contributed to a bit of a backlash against them. Rest assured, if one were to do a quick search on Nico Nico Douga, one would find just as many Miku, Kaito, and Len covers as there are Sonika, Big Al, and Leon covers on YouTube. Thankfully, this kind of thought process seems to be slowly dying down.
    • That said, most agree that English UTAU on average have better control over the consonants and vowels thanks to the recording style, but such arguments can also be made for Japanese UTAU compared to Vocaloids. Another thing to note is that CV-VC banks, on average, take far more phonetic samples than Japanese CV or VCV banks. A single-pitch CV-VC English bank can take anywhere from 300-500 up to 1000 recording samples to get a natural sounding tone and smoothness to it. This is because English has a far larger inventory of vowels, ending consonants, and consonant clusters than Japanese.
      • CV-VC is not limited to English, either. There are quite a few surprisingly good Spanish, Korean, and even French voicebanks floating around the net. This kind of style can also be applied to Japanese, but in a different way. Rather than end consonants being the end of the word, they serve as a flag for where to start the next phonetic in editing, essentially meaning the style serves as a way to artificially create VCV transitions without recording everything needed for a VCV bank. While the results generally sound just as smooth as VCV, this approach, like the other styles, is not without its faults. If one listens closely, it is possible to hear the program playing the transition separately from the two CVs, which can be a bit hard to unhear afterwards, and professional CV voicebanks on average can sound just as smooth. Still, for those who don't possess the editing skills for a professional CV or VCV bank, this presents a nice alternative.
  • VCCV: One of the newest methods around, coming out in 2015. This method was engineered specifically to make English voicebanks sound clear in UTAU, and was innovated by the same person who came up with CVVC English. This can sound professional in good hands, or absolutely awful in other hands. While this is considered the gold standard for English in UTAU, it is not very beginner-friendly, and beginners are usually directed to use less well-known (and often less clear) recording styles for English such as CV-C (a decently clear but still tough bank type with decent multilingual support), or Bluemoon (a multilingual CVVC style that sounds like a drunk Microsoft Sam no matter who uses it). Despite all of its issues, VCCV is still incredibly popular for English speaking members of the fandom.
  • Multipitch:
    • Multipitch has been in the program for years, but was never really paid attention to until the creation of Ritsu's "Kire" voicebank. It can apply to any of the above recording styles if one wishes, and has a multitude of uses. The way to record it is to record your base samples, then record several more at different pitches and assign them to the appropriate tone range in UTAU, often by giving each set a marker in the program.
    • The uses for such a style vary. The original use was to allow for more natural sounding tone transitions, as well as taking some stress off the program itself so that it can play each sample in a more natural sounding tone. However, the creator of Ritsu discovered that this application could be used for more dynamic tone transitions, allowing a voicebank to have a more emotional tone than what a "monopitch" voicebank is capable of, and thus allowing it to be used for a wider range of music genres. Most use it for rock songs and the like, but this technique can also be used for more mellow songs.
      • A common belief in the fandom is that having multipitch for the sake of multipitch is a bad thing, as the UTAU program is capable of replicating the tone of a non-dynamic monopitch voicebank rather perfectly. While there is some truth to this (in addition to multipitch banks being monsters when it comes to file sizes, especially for VCV and CV-VC) in that the program does manage to replicate the tone of your voice pretty well, it's not without some benefits. As said, multipitch takes a lot of strain off the program itself when it comes to replicating tones at higher pitches, which generally results in smoother sounding tones and far less mechanical rasp at higher pitches. It's really up to the user if they want to make a multipitch bank or not, because as said, many find that the possible mechanical rasp adds to an UTAU's charm.
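To make the difference between the styles above a little more concrete, here is a minimal Python sketch. It is not part of UTAU and does not reflect how the program is actually implemented; the function names, the romaji lyrics, and the MIDI pitch numbers are all hypothetical. It assumes the common convention of writing a VCV sample as the previous vowel, a space, and the syllable (with "-" marking the start of a phrase), and it assumes a multipitch bank is used by picking whichever recorded pitch is closest to the note being sung.

```python
# Illustrative sketch only: these helpers are hypothetical and are not UTAU's code.

VOWELS = "aiueo"


def last_vowel(syllable):
    """Return the vowel a romaji syllable ends on ('n' is treated like a vowel)."""
    if syllable == "n":
        return "n"
    return syllable[-1] if syllable[-1] in VOWELS else None


def to_vcv(syllables):
    """Rewrite CV lyrics as VCV-style names: previous vowel + space + syllable."""
    aliases = []
    prev = "-"  # assumed marker for "no vowel before this one" (phrase start)
    for syl in syllables:
        aliases.append(f"{prev} {syl}")
        prev = last_vowel(syl) or "-"
    return aliases


def pick_pitch_set(note, recorded):
    """Pick the recorded pitch (as a MIDI number) nearest to the sung note.

    Ties go to the lower pitch; this tie-break is an arbitrary assumption.
    """
    return min(recorded, key=lambda p: (abs(p - note), p))


print(to_vcv(["ka", "sa", "ne"]))        # ['- ka', 'a sa', 'a ne']
print(to_vcv(["te", "n", "to"]))         # ['- te', 'e n', 'n to']
print(pick_pitch_set(62, [57, 64, 69]))  # 64
```

A CV bank would simply look up "ka", "sa", and "ne" on their own, which is why it needs far fewer samples; the VCV names show why the sample count grows, since every syllable needs a version for each vowel that might come before it.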

The last major update to the UTAU program was in 2013, with its creator going silent afterwards. As such, it can be considered Abandonware at this point in time. Although the program still functions as of 2023, it runs on Visual Basic 6, so there is a risk that it will not function on new PCs in the future. Though Microsoft has made a statement on striving for backward compatibility with VB6 programs on Windows 11, only time will tell if UTAU will still be able to work out of the box in the future.

In preparation for that happening, multiple different branch projects based on the UTAU program have sprung up since. A notable one is OpenUTAU, which has full compatibility with most UTAU voicebanks, plugins and resamplers as well as AI software like ENUNU. More alternatives can be found on the trivia page.

Since all UTAU are essentially OCs, opinions may be conflicting as to how fitting a trope may be. Edit with caution.

Provides Examples Of:


     Tropes about the actual program, as well as general UTAU tropes. 
  • Allegedly Free Game: Or program. In order to achieve clean and effective results when creating a voicebank, it is necessary to have a quality microphone, to develop knowledge of how to record effectively and to learn to code effective configuration files. This can be a significant investment of both time and money.
  • Alternate Company Equivalent:
  • Damn You, Muscle Memory!: Considering the interface is very similar to Vocaloid, with some minor to major differences, this is bound to happen. Of course, this trope works both ways if you used UTAU long before Vocaloid.
  • "Do It Yourself" Theme Tune: Naturally, considering making an UTAU based on the user's voice is the first thing most people do upon getting the program. Depending on several things such as editing skill, microphone quality, and general voice tone, results can range from drunken karaoke to ear meltingly amazing.
  • Electronic Speech Impediment: Disregarding the fact that an UTAU's bank quality depends on a number of factors, sometimes a voicebank just flat out doesn't like one of the many resamplers out there, or the UTAU program ends up hiccuping for some reason. The end results of such can be rather... amusing, to say the least.
  • Expy: Too many deliberate examples to count. Due to being the freeware alternative to Vocaloid, there are a lot of Utauloids that are designed specifically to have the same tone or similar sounding voice to official Vocaloids.
  • Helium Speech: Either done intentionally with some UTAU voicebanks for the lulz, unintentionally in an attempt to make UTAU Miku equivalents, or as the result of putting the voicebank at an extremely high pitch.
  • Mascot: Surprisingly, no, it's not Teto Kasane, despite what the fandom insists. It's actually supposed to be Uta Utane, the voicebank that comes with the program created entirely from robotic speech synthesis. Doesn't stop people from seeing Teto as the unofficial mascot, though, thanks to her being the most well known UTAU outside of the fandom.
  • Loads and Loads of Loading: Depending on how many phonetic samples are being used, the general speed of the resampler being used, and whether each recording holds only one phonetic sample (usually CV) or several (usually VCV), it can take a while for the UTAU program to load them all and play them out. Once everything is loaded, though, it doesn't take nearly as long, but should you move a few things around...
  • Not Quite Starring: There are quite a few UTAU voicebanks floating around the net based on characters from popular anime and video games, or even on musicians, made either by using voice samples from the show, game, or music, or by someone replicating the voice. It should be known, though, that while there's (generally) no rule against creating these kinds of voicebanks, distributing them on the net to the general public without the permission of whoever owns said character / music can get you in serious hot water with the show creator / game creator / musician, the creator of the UTAU program, and quite possibly a good portion of the UTAU fandom as well.
  • One-Steve Limit: Zig-Zagged. With the many characters created on a daily basis, it's only natural that quite a few would share the same name. However, there seems to be an unspoken rule that naming characters after pre-existing Vocaloids or VIPPERloids is off limits. (And even then, there's some overlap thanks to some of the characters from those two coming out sharing a similar name with pre-existing UTAU.)
  • Reality Is Unrealistic: Due to attempts to make UTAU sound like anime characters, like Miku, or generally have unique voice tones, this tends to pop up frequently with more 'realistic' voicebanks, which often use the UTAU creator's normal-sounding voice. More often than not, you'll end up seeing at least one person complaining about a male UTAU not sounding like a chain smoker or a woman's voice being too deep to be a woman's.
  • Ridiculously Human Robot: Depending on how much effort is put into it, and how skilled the person using the voicebank is, some UTAU banks can sound surprisingly realistic, to the point you'd swear an actual human was singing it if you didn't know better. A good example is Ritsu's Kire voicebank, though it is by no means the only example.
  • Robo Speak: Depending on the skill level of whoever's using the voicebank, this is either Played Straight unintentionally or Averted. Keep in mind, the voicebank quality also depends on what mic is used to record it and how much editing is put into it. Some UTAU can sound like glorified Speak & Spells, while others can sound more realistic than Vocaloids, or even actual human singers!
  • Small Name, Big Ego: There are quite a few UTAU creators who think their UTAU is the greatest thing since Teto Kasane, or even better than official Vocaloids, and are damn determined to make everyone aware of this fact. Whether such claims are actually true or not is a fact likely best left up to interpretation.
  • Vocal Dissonance: Depending on the creator, this is either Invoked intentionally for the lulz with some UTAU, or a difference in opinions on the design matching the voice.

    General Character Tropes 
  • Anthropomorphic Food: Sekushii Beikon is a...piece of bacon.
  • Big, Screwed-Up Family: The Vippers, usually caused by either Teto Kasane or Sukone Tei. Doesn't mean they'll be the best of friends however.
  • Computer Voice: While this is technically true of all UTAU, Uta Utane stands out in that her voicebank was made 100% from a computer speech synthesis program, with no actual human voice actor providing the voice base. Depending on how she's used, she can range anywhere from Machine Monotone to having a surprising amount of emotion to her voice.
  • Canon Immigrant: Teto, who started as an April Fools' joke and became an UTAU. Also, the Macne Family, as well as Acme Iku, the Default family and Loline Com, weren't originally UTAU. The Macne Family was originally intended for Mac, Acme Iku was originally from a hentai Flash app with a plethora of ero sounds, Defoko and her siblings were from AquesTalk, and Loline Com had his own editor, which looked like a prototype of UTAU, even with similar mechanics.
  • Gender Bender: Some UTAU have two voicebanks, one male and one female. Also, around 70% of all UTAU voicebanks get a genderbent of some sort (sometimes, an older/younger counterpart of the same gender as well, which also uses the gender factor flag). The most famous are:
    • Kasane Teto - Kasane Ted
    • Sukone Tei - Sukone Teiru
    • Nagone Mako - Nagone Makoto
    • Suiga Sora - Suiga Sara
    • Macne Nana - Macne Nanao/Macne Shichi (and sometimes, Macne Nana Petit as the younger counterpart)
    • Acme Iku - Acme Ikuo
    • Teira - Mashuu (and Tiramisu as the younger counterpart)
    • Kaiserine Sympherianne - Noel Chuckstern
  • Hartman Hips: Exaggerated with Hana Chikako.
  • Ridiculously Cute Critter: InuHebi.
  • Trademark Favorite Food: Many UTAU have those. Momo got peaches, Teto got the French bread, Sora got the curry, Yufu got castella cake...
  • Tsundere: Kasane Teto is regarded as a Type B. There are enough tsundere UTAU to make a list of its own.
  • Troll: Some UTAU are made for this exact purpose, the most famous being the VIPPERloids Kasane Teto, Yokune Ruko and Namine Ritsu, who started as April Fools' jokes from 2ch before gaining UTAU voicebanks.
    • Tei Sukone, also from 2ch, is another example. Her unoriginal design is a Take That! at everyone whose UTAU bank's outfit is a Palette Swap of Miku's, her personality is a Take That! at fan characters who love Len and hate Miku, and her sharingan is a Take That! to Naruto fan characters who have it.
  • Spotlight-Stealing Squad: Due to a combination of attempts to pass them off as "official" Vocaloids and the sheer number of voicebanks created on a daily basis, only the VIPPERloids tend to be relatively well known outside, and even inside, the UTAU fandom, with Teto being viewed as the unofficial mascot of the program.
  • Wholesome Crossdresser: Ritsu is the most notable example of such, although there are quite a few crossdressing characters.
  • Yandere: As with Tsundere UTAU, there are many UTAU who fit this character type; Tei Sukone and Amagaku are two popular examples.

    Original Song Tropes 
  • Amazon Brigade: Teto's "Confront, You Look So Cool!" has Teto putting together a team of UTAUs, almost all of whom are girls. The only exception is Sora Suiga.
  • And I Must Scream: The premise of Time Glass, as explained in the author's comments.
    As some of you will remember from the original Grimm fairytale, the poison did not actually kill Snow White, it simply incapacitated her. In a way, she was in a coma for years until her prince found her. So, my concept was, what if she was conscious during that whole time, locked inside a glass coffin with no way to speak, or move, with only her thoughts as company. Thus this song was born.
  • Domestic Abuse: Supposedly the theme to CIRCRUSH's Biohazard by Word of God.
  • Driven to Suicide: One of the various interpretations given to Sky High.
  • High-Class Call Girl: "Yoshiwara Lament" centers around one, and talks about her time dealing with customers and lamenting her lost love. If the PV is anything to go by, she'll get her happy ending.
  • Kids Are Cruel: In Daughter of an Evil Spirit, kids made fun of Teto because she had wings on her back and red eyes. The adults joined in on it, though.
  • Senseless Sacrifice: Again, Daughter of an Evil Spirit. It wasn't senseless on her part, but on the part of those who sacrificed her: the volcano was still active after she was sacrificed to it.