Thanks to the advent of telepresence applications, industrial machinery can be controlled and operated remotely. Teleoperation removes operators from hazardous workplaces such as mining and plays an essential role in worker safety. In addition, augmented telepresence can introduce information that helps the user understand the remote scene. However, remote operation becomes challenging when the received information is more limited than what could be perceived on-site, for example when judging depth. This study investigates how well operators interact with an Augmented Remote Operation Scaling System (AROSS) in a mining context when different computer-generated visual interfaces are provided. The system provides five augmented views: Disocclusion Augmentation using selective content removal, Novel Perspective view generation, Lidar view, Right field of view, and Left field of view. We performed two experiments in a mine-like laboratory. The first experiment was a feasibility test to understand what users need to accurately perceive depth. The second experiment was designed to evaluate the user's experience with the different versions of AROSS. To analyze human interaction with the designed prototype, we employed a mixed research methodology combining interviews, observations, and questionnaires. This mixed methodology consisted of quality-of-experience methods to discover the users' requirements from a technological standpoint and user-experience methods (i.e., user-centric approaches). The two subjective experiments involved 10 and 11 users, respectively. The first experiment focused on the effects of in-view augmentations and interface distributions on perceiving wall patterns; the second focused on the effects of augmentations on depth perception and understanding of the 3D environment. Using these data, we analyzed both the quality of experience and the user experience via evaluation criteria comprising interface helpfulness, task performance, potential improvement, and user satisfaction. The feasibility test results were mainly used to structure the formative investigation. The overall conclusion from the formative testing is that the remote operators preferred using natural views (Original), as this approach made it easier to understand the environment. Although the augmented computer-generated views do not look natural, they support 3D cues. In addition, the combination of Novel Perspective and Lidar interfaces as additional views seemed helpful in depth perception tasks. Participants had difficulty performing tasks when the robot arm was obscured in the Disocclusion Augmentation view and when the video quality was low in the Novel Perspective view. However, they found the Novel Perspective view useful for geometry and depth estimation.