SegPoint: Segment Any Point Cloud via Large Language Model

¹Nanyang Technological University   ²Fudan University

Examples of SegPoint's functionality. Using task-specific prompts, SegPoint handles various point cloud tasks in a unified framework, including 1) 3D instruction segmentation, 2) 3D referring segmentation, 3) 3D semantic segmentation, and 4) 3D open-vocabulary semantic segmentation.

Abstract

Despite significant progress in 3D point cloud segmentation, existing methods primarily address specific tasks and depend on explicit instructions to identify targets. They lack the capability to infer and understand implicit user intentions within a unified framework. In this work, we propose a model, called SegPoint, that leverages the reasoning capabilities of a multi-modal Large Language Model (LLM) to produce point-wise segmentation masks across a diverse range of tasks: 1) 3D instruction segmentation, 2) 3D referring segmentation, 3) 3D semantic segmentation, and 4) 3D open-vocabulary semantic segmentation. To advance research on 3D instruction segmentation, we introduce a new benchmark, Instruct3D, designed to evaluate segmentation performance from complex and implicit instructional texts and featuring 2,565 point cloud-instruction pairs. Our experimental results demonstrate that SegPoint achieves competitive performance on established benchmarks such as ScanRefer for referring segmentation and ScanNet for semantic segmentation, while delivering outstanding results on the Instruct3D dataset. To our knowledge, SegPoint is the first model to address these varied segmentation tasks within a single framework while achieving satisfactory performance on each.


Method: SegPoint

Overview of the proposed approach: the SegPoint pipeline. Given an input point cloud and a text query, the multi-modal LLM \(\mathcal{F}\) generates a text output. The Geometric Enhancer Module \(\mathcal{G}\) injects geometric information into the Point Encoder \(\mathcal{E}\) to obtain point features \(\hat{{f}}_{point}\). Per-point embeddings \({{f}}_{\mathcal{P}}\), derived from the Geometric-guided Feature Propagation \(\mathcal{P}\), are multiplied with the embedding associated with the segmentation token to yield the final segmentation masks.
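The last two stages in the caption, propagating features back to every point and turning per-point embeddings into a mask, can be illustrated with the NumPy sketch below. It is a minimal sketch under stated assumptions, not SegPoint's implementation: the propagation step uses plain PointNet++-style inverse-distance interpolation over the 3 nearest neighbours in place of the geometric-guided propagation \(\mathcal{P}\), the segmentation-token embedding is assumed to already share the per-point embedding dimension, and a sigmoid with a 0.5 threshold converts the dot-product logits into a binary mask. The helper names `propagate_features` and `predict_mask` are hypothetical.

```python
import numpy as np


def propagate_features(dense_xyz, sparse_xyz, sparse_feats, k=3, eps=1e-8):
    """Upsample features from a sparse point set to a dense one.

    Assumption: PointNet++-style inverse-distance-weighted interpolation over
    the k nearest sparse neighbours; SegPoint's geometric-guided variant may differ.
    """
    # Pairwise squared distances, shape (N_dense, N_sparse).
    d2 = ((dense_xyz[:, None, :] - sparse_xyz[None, :, :]) ** 2).sum(-1)
    # Indices of the k nearest sparse points for every dense point, shape (N_dense, k).
    idx = np.argsort(d2, axis=1)[:, :k]
    nn_d2 = np.take_along_axis(d2, idx, axis=1)
    # Inverse-distance weights, normalised to sum to 1 for each dense point.
    w = 1.0 / (nn_d2 + eps)
    w = w / w.sum(axis=1, keepdims=True)
    # Weighted sum of neighbour features, shape (N_dense, C).
    return (w[:, :, None] * sparse_feats[idx]).sum(axis=1)


def predict_mask(point_embeddings, seg_token_embedding, threshold=0.5):
    """Hypothetical helper: binary mask from the dot product between each
    per-point embedding and the segmentation-token embedding."""
    logits = point_embeddings @ seg_token_embedding  # shape (N,)
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    return probs > threshold


# Toy usage with random data (illustrative only).
rng = np.random.default_rng(0)
sparse_xyz = rng.random((16, 3))
sparse_feats = rng.standard_normal((16, 32))
dense_xyz = rng.random((1000, 3))
f_P = propagate_features(dense_xyz, sparse_xyz, sparse_feats)
h_seg = rng.standard_normal(32)
mask = predict_mask(f_P, h_seg)
print(f_P.shape, mask.shape)  # (1000, 32) (1000,)
```

The dot-product formulation lets a single token embedding emitted by the LLM select points without a task-specific classification head; in a real model both embeddings would typically pass through learned projections first.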

Instruct3D Dataset Visualization


BibTeX

Please consider citing SegPoint if it helps your research.
@inproceedings{SegPoint,
  title={SegPoint: Segment Any Point Cloud via Large Language Model},
  author={He, Shuting and Ding, Henghui and Jiang, Xudong and Wen, Bihan},
  booktitle={ECCV},
  year={2024}
}

License

SegPoint is licensed under a CC BY-NC-SA 4.0 License. The Instruct3D data is released for non-commercial research purposes only.