CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
Published: 2025-02-24 22:07:43
Recovering high-quality 3D scenes from a single RGB image is a challenging task in computer graphics. Current methods often struggle with domain-specific limitations or low-quality object generation. To address these limitations, we propose CAST (Component-Aligned 3D Scene Reconstruction from a Single RGB Image), a novel method for 3D scene reconstruction and recovery. CAST starts by extracting object-level 2D segmentation and relative depth information from the input image, then uses a GPT-based model to analyze inter-object spatial relationships. This enables an understanding of how objects relate to each other within the scene, ensuring a more coherent reconstruction. CAST then employs an occlusion-aware large-scale 3D generation model to independently generate each object’s full geometry, using MAE and point cloud conditioning to mitigate the effects of occlusions and partial object information, ensuring accurate alignment with the source image’s geometry and texture. To align each object with the scene, an alignment generation model computes the necessary transformations, allowing the generated meshes to be accurately placed and integrated into the scene’s point cloud. Finally, CAST incorporates a physics-aware correction step that leverages a fine-grained relation graph to generate a constraint graph. This graph guides the optimization of object poses, ensuring physical consistency and spatial coherence. By utilizing Signed Distance Fields (SDF), the model effectively addresses issues such as occlusions, object penetration, and floating objects, ensuring that the generated scene accurately reflects real-world physical interactions. Experimental results demonstrate that CAST significantly improves the quality of single-image 3D scene reconstruction, offering enhanced realism and accuracy in scene recovery tasks.
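The physics-aware correction step above can be illustrated with a toy sketch. The snippet below is not the paper's actual method: it replaces the learned object SDFs and the fine-grained constraint graph with spherical proxies and two hand-written constraints (pairwise non-penetration, and ground support to eliminate floating objects), then refines object positions by gradient descent. All names and parameters are hypothetical, and assume a ground plane at y = 0.

```python
import numpy as np

def correct_poses(centers, radii, lr=0.1, steps=400):
    """Nudge object positions so spherical proxies neither
    interpenetrate nor float above the ground plane (y = 0).

    centers: (n, 3) list/array of object positions.
    radii:   (n,) list/array of proxy radii.
    Returns the corrected (n, 3) array of positions.
    """
    c = np.array(centers, dtype=float)
    r = np.asarray(radii, dtype=float)
    n = len(r)
    for _ in range(steps):
        grad = np.zeros_like(c)
        # Non-penetration: hinge loss on the sphere-sphere signed gap.
        for i in range(n):
            for j in range(i + 1, n):
                diff = c[i] - c[j]
                dist = np.linalg.norm(diff) + 1e-9
                gap = dist - (r[i] + r[j])        # < 0 means overlap
                if gap < 0:
                    g = 2.0 * gap * diff / dist   # d(gap^2)/d c_i
                    grad[i] += g
                    grad[j] -= g
        # Support: pull each object's lowest point onto the ground,
        # which both grounds floating objects and prevents sinking.
        height = c[:, 1] - r                      # lowest point above y = 0
        grad[:, 1] += 2.0 * height
        c -= lr * grad
    return c
```

Running it on two overlapping spheres floating at height 2 separates them until they just touch and settles both on the ground. The paper's actual optimization operates on generated meshes via their SDFs and a GPT-derived constraint graph, but the structure (constraint terms summed into one objective, poses refined by descent) is the same idea.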
CAST has practical applications in virtual content creation, such as immersive game environments and film production, where real-world setups can be seamlessly integrated into virtual landscapes. Additionally, CAST can be leveraged in robotics, enabling efficient real-to-simulation workflows and providing realistic, scalable simulation environments for robotic systems.
Bringing the vibrant diversity of the real world into the virtual realm, this collection reimagines open-vocabulary scenes as immersive digital environments, capturing the richness and depth of each unique setting. For each scene, the images are arranged as follows: the top-left shows the input image, the top-center displays the rendered geometry, and the right presents the rendered image with realistic textures.
Paper page: https://sites.google.com/view/cast4