What went into creating Vincent, a hyper-real digital human
Until very recently, directly mapping an actor’s performance onto a photorealistic digital human using real-time rendering was considered impossible. We’d seen the likes of Avatar and Alita in the movies, but those relied on time-consuming offline rendering and post-processing work.
But thanks to advancements in graphics hardware and software, as well as the relentless work of innovative teams in the field, we’re now seeing real-time rendered digital humans. Meet Vincent, a digital human born at Korea-based creative studio Giantstep.
Creating Vincent was no easy feat. After much deliberation and research into potential joint partnerships, the studio’s R&D arm, GXLab, took on the challenge of developing the technology in-house – with just five artists. To make Vincent a reality, the team quickly identified three key challenges they needed to overcome.
Skin and hair: the visual technology challenge
The first technical issue the team faced lay in ensuring that they had access to the necessary shading technology. Developing skin and hair features can consume an exorbitant amount of time, manpower and money. With that in mind, the team turned to Unreal Engine. “Unreal Engine’s material editor and its powerful skin shading features like Transmission and Dual Lobe Specularity played a major role in boosting Vincent’s skin quality up to offline rendering levels without any additional development,” said Sungku Kang, Giantstep’s director of research and development.
The team made good use of Unreal Engine’s online learning courses to get to grips with the technology’s development processes and properties. “By leveraging all the available information, we were able to accurately understand how changing different parameters would affect the outcome, and use that information with more precise intent rather than entering random numbers and leaving the results to chance,” Kang said. “Also, using material instancing to make immediate parameter changes and preview the results was very useful. This dramatically reduced the time spent in the final fine-tuning stage.”
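The article doesn’t show Giantstep’s actual materials, but the kind of rapid parameter iteration Kang describes can also be driven from Unreal Engine’s editor Python API. The sketch below loads a material instance and nudges a couple of scalar parameters before refreshing the preview; the asset path and parameter names are illustrative assumptions, not the real project setup.

```python
# Minimal sketch: tweak skin material instance parameters from the Unreal editor's
# Python console. The asset path and parameter names are illustrative placeholders.
import unreal

MEL = unreal.MaterialEditingLibrary

# Load a (hypothetical) material instance used for the character's facial skin.
skin_mi = unreal.EditorAssetLibrary.load_asset(
    "/Game/Vincent/Materials/MI_Skin_Face"  # placeholder content path
)

# Adjust parameters that drive the subsurface/specular response.
MEL.set_material_instance_scalar_parameter_value(skin_mi, "TransmissionScale", 0.35)
MEL.set_material_instance_scalar_parameter_value(skin_mi, "SecondLobeRoughness", 0.55)

# Push the changes so the viewport preview updates, then save the asset.
MEL.update_material_instance(skin_mi)
unreal.EditorAssetLibrary.save_loaded_asset(skin_mi)
```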
The Meet Mike free asset sample provided by Epic Games also gave the team hints about the technologies they’d require to create the visuals for Vincent. Exploring the asset, they gathered necessary information, including how to express fine facial hairs and what kind of data is required for the final hair shape. “With all this information, our developers were able to reduce development time by clearly setting objectives for Maya plugins or scripts,” Kang explained.
Creating lifelike facial expressions in real-time
Choosing and implementing technologies for effective facial expressions was another challenge the team had to overcome. They found that most existing solutions were either built for offline rendering or delivered only video-game-level quality, which limited the level of detail and the ability to customise. They started researching the technology that was the nearest fit, and then, based on their findings, developed their own solution in-house.
Their first task was to assess which candidate offered accurate three-dimensional location data while providing a high degree of freedom in the data format. They settled on Vicon Cara, a head-mounted camera rig, as the facial animation capture system that best met their needs. With this device, marker locations can be set with great flexibility and then translated into three-dimensional data with very high accuracy.
“Most solutions at that time only scanned two-dimensional location data of facial landmarks. The fact that Cara was able to capture three-dimensional data made it a good choice,” says Kang. However, Cara is designed for offline workflows, making real-time data transfer impossible out of the box. “To resolve this, we decided to create a neural network that uses deep learning to infer 3D marker locations from 2D images.”
First, the team added an additional camera to the Cara rig. While capturing the actor’s facial movements, video from this camera was saved separately and used as training data. From this, the developers trained a model that can infer the 3D marker locations with high accuracy from a 2D image input. Machine learning was leveraged in as many other areas as possible, including emphasising facial expressions and setting blend shape weights. “Giantstep gained a lot of experience in machine learning through this process. We were encouraged by the fact that the effective use of machine learning can help a small team overcome its limitations,” Kang said.
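The article doesn’t describe Giantstep’s network architecture, but the general idea of regressing 3D marker positions from 2D observations can be sketched as a small supervised model. The PyTorch example below assumes paired training data (2D landmark coordinates from the extra camera, 3D marker positions solved by Cara); the marker count, layer sizes and loss are assumptions made purely for illustration.

```python
# Illustrative sketch (not Giantstep's model): regress 3D marker positions
# from 2D landmark coordinates with a small fully connected network.
import torch
import torch.nn as nn

NUM_MARKERS = 60  # assumed number of tracked facial markers

class MarkerLifter(nn.Module):
    """Maps flattened 2D marker coordinates to 3D marker coordinates."""
    def __init__(self, num_markers: int = NUM_MARKERS, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_markers * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_markers * 3),
        )

    def forward(self, xy: torch.Tensor) -> torch.Tensor:
        return self.net(xy)

model = MarkerLifter()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Dummy batch standing in for real training pairs:
# 2D landmarks from the added camera's frames, 3D positions solved by Cara.
xy_batch = torch.rand(32, NUM_MARKERS * 2)
xyz_batch = torch.rand(32, NUM_MARKERS * 3)

pred = model(xy_batch)
loss = loss_fn(pred, xyz_batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice, an image-based approach would typically replace the flattened coordinate input with a convolutional backbone over the camera frames, but the supervised 2D-to-3D regression idea is the same.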
Optimising the digital human pipeline
The last major technical hurdle the team faced was securing technology to optimise the pipeline. Given the small size of the team, they knew right away that efficiency was crucial. The most important initiative was to reduce the burden of manual iteration and to automate as much as possible. “This is another major reason why Unreal Engine was the best choice,” says Kang. “Unreal Engine’s support for Python and the convenience that brings to creating plugins allowed us to easily resolve iteration issues as well as to easily develop relevant tools.”
A case in point is the work the team did on photorealistic facial expressions. This process involved trying many combinations for the shape and number of regions into which the face is divided, and then previewing the results as quickly as possible. Because changing the facial regions required textures to be recombined, assets had to be re-imported and material composition details readjusted. “If the whole process were handled manually, an artist would have to spend a full day on simple iterations just to preview the results,” says Kang.
Instead, using Python scripting in Unreal Engine, the team was able to automate laborious tasks such as importing assets. “Tasks which previously required a full day whenever the data changed were automatically completed in just a few minutes,” continues Kang. “By maximising automation through various Maya plugins and scripts, and developing Unreal Engine plugins, Project Vincent was completed in a short time frame, despite the small size of the team.”
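As a rough illustration of this kind of automation, the sketch below uses Unreal Engine’s Python API to batch-import regenerated face textures without any manual clicking. The source folder and destination content path are assumptions, not Giantstep’s actual pipeline layout.

```python
# Minimal sketch: batch re-import regenerated face textures into the project.
# Paths are illustrative placeholders, not the studio's actual folder structure.
import glob
import unreal

SOURCE_DIR = "D:/Vincent/Export/FaceTextures"   # assumed export folder
DEST_PATH = "/Game/Vincent/Textures/Face"       # assumed content path

tasks = []
for texture_file in glob.glob(f"{SOURCE_DIR}/*.png"):
    task = unreal.AssetImportTask()
    task.filename = texture_file
    task.destination_path = DEST_PATH
    task.automated = True          # suppress import dialogs
    task.replace_existing = True   # overwrite the previous texture combination
    task.save = True
    tasks.append(task)

# Run all imports in one call; the editor processes them without user interaction.
unreal.AssetToolsHelpers.get_asset_tools().import_asset_tasks(tasks)
```

A script like this can be bound to an editor menu entry or run whenever the DCC tool re-exports data, which is what turns a day of manual re-importing into a few minutes of unattended processing.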
Beyond movies: the future of real-time digital humans
Real-time digital human technology is expected to change the production of entertainment media that has traditionally relied on offline rendering, and it’s predicted to spread into other industries too.
Synced with AI speakers or assistants, the technology could enable users to experience more intuitive AI services. We could also start to see high-quality, hyper-real characters used in cutting-edge marketing and promotional activities.
Giantstep intends to be a key part of this story, and is committed to advancing its technology, with plans to reveal its next development results as soon as possible.