Augmented Telepresence based on Multi-Camera Systems: Capture, Transmission, Rendering, and User Experience
Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology (Realistic 3D). ORCID iD: 0000-0002-4967-3033
2021 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Observation and understanding of the world through digital sensors is an ever-increasing part of modern life. Systems of multiple sensors acting together have far-reaching applications in automation, entertainment, surveillance, remote machine control, and robotic self-navigation. Recent developments in digital camera, range sensor, and immersive display technologies enable the combination of augmented reality and telepresence into Augmented Telepresence, which promises to enable more effective and immersive forms of interaction with remote environments.

The purpose of this work is to gain a more comprehensive understanding of how multi-sensor systems lead to Augmented Telepresence, and how Augmented Telepresence can be utilized for industry-related applications. On the one hand, the conducted research is focused on the technological aspects of multi-camera capture, rendering, and end-to-end systems that enable Augmented Telepresence. On the other hand, the research also considers the user experience aspects of Augmented Telepresence, to obtain a more comprehensive perspective on the application and design of Augmented Telepresence solutions.

This work addresses multi-sensor system design for Augmented Telepresence regarding four specific aspects ranging from sensor setup for effective capture to the rendering of outputs for Augmented Telepresence. More specifically, the following problems are investigated: 1) whether multi-camera calibration methods can reliably estimate the true camera parameters; 2) what the consequences are of synchronization errors in a multi-camera system; 3) how to design a scalable multi-camera system for low-latency, real-time applications; and 4) how to enable Augmented Telepresence from multi-sensor systems for mining, without prior data capture or conditioning. 

The first problem was solved by conducting a comparative assessment of widely available multi-camera calibration methods. A special dataset was recorded, enforcing known constraints on the cameras' ground-truth parameters to use as a reference for calibration estimates. The second problem was addressed by introducing a depth uncertainty model that links the pinhole camera model and synchronization error to the geometric error in the 3D projections of recorded data. The third problem was addressed empirically, by constructing a multi-camera system based on off-the-shelf hardware and a modular software framework. The fourth problem was addressed by proposing the processing pipeline of an augmented remote operation system that performs augmented and novel view rendering.

The calibration assessment revealed that target-based and certain target-less calibration methods are relatively similar in their estimations of the true camera parameters, with one specific exception. For high-accuracy scenarios, even commonly used target-based calibration approaches are not sufficiently accurate with respect to the ground truth. The proposed depth uncertainty model was used to show that converged multi-camera arrays are less sensitive to synchronization errors. The mean depth uncertainty of a camera system correlates with the rendered result in depth-based reprojection as long as the camera calibration matrices are accurate. The presented multi-camera system demonstrates a flexible, decentralized framework in which data processing is possible in the camera, in the cloud, and on the data consumer's side. The multi-camera system is able to act both as a capture testbed and as a component in end-to-end communication systems, because of its general-purpose computing and network connectivity support coupled with a segmented software framework. This system forms the foundation for the augmented remote operation system, which demonstrates the feasibility of real-time view generation by employing on-the-fly lidar de-noising and sparse depth upscaling for novel and augmented view synthesis.
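
The depth-based reprojection mentioned above can be illustrated with a minimal two-camera pinhole sketch. The Python example below is not code from the thesis; the camera parameters, function names, and NumPy-based implementation are assumptions chosen only to show how a pixel with known depth is lifted to a world point and reprojected into a second calibrated camera.

import numpy as np

def backproject(u, v, depth, K, R, t):
    # Lift pixel (u, v) with known depth to a world point.
    # K: 3x3 intrinsics; R, t: world-to-camera rotation and translation.
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # normalized viewing ray
    return R.T @ (ray * depth - t)                   # camera frame -> world frame

def project(p_world, K, R, t):
    # Project a world point into a camera; returns pixel coordinates.
    uvw = K @ (R @ p_world + t)
    return uvw[:2] / uvw[2]

# Two identical cameras, the second shifted 0.1 m along the baseline.
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t_a, t_b = np.zeros(3), np.array([-0.1, 0.0, 0.0])
p = backproject(640, 360, depth=2.0, K=K, R=R, t=t_a)
print(project(p, K, R, t_b))   # ~50 px disparity: f * baseline / depth

If the calibration matrices K, R, and t are inaccurate, the reprojected pixel drifts accordingly, which is why the correlation noted above holds only for accurate calibrations.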

In addition to the aforementioned technical investigations, this work also addresses the user experience impacts of Augmented Telepresence. The following two questions were investigated: 1) What is the impact of camera-based viewing position in Augmented Telepresence? 2) What is the impact of depth-aiding augmentations in Augmented Telepresence? Both questions were addressed through a quality of experience study with non-expert participants, using a custom Augmented Telepresence test system for a task-based experiment. The experiment design combines in-view augmentation, camera view selection, and stereoscopic augmented scene presentation via a head-mounted display to investigate both the independent factors and their joint interaction.

The results indicate that between the two factors, view position has a stronger influence on user experience. Task performance and quality of experience were significantly decreased by viewing positions that force users to rely on stereoscopic depth perception. However, position-assisting view augmentations can mitigate the negative effect of sub-optimal viewing positions; the extent of such mitigation is subject to the augmentation design and appearance.

In aggregate, the works presented in this dissertation cover a broad view of Augmented Telepresence. The individual solutions contribute general insights into Augmented Telepresence system design, fill gaps in the current discourse on specific areas, and provide tools for solving the challenges of enabling capture, processing, and rendering in real-time-oriented end-to-end systems.

Place, publisher, year, edition, pages
Sundsvall: Mid Sweden University, 2021. p. 70
Series
Mid Sweden University doctoral thesis, ISSN 1652-893X ; 345
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:miun:diva-41860; ISBN: 978-91-89341-06-7 (print); OAI: oai:DiVA.org:miun-41860; DiVA, id: diva2:1544394
Public defence
2021-05-17, 14:00, C312, Mittuniversitetet, Holmgatan 10, Sundsvall (English)
Available from: 2021-04-15. Created: 2021-04-15. Last updated: 2021-11-22. Bibliographically approved.
List of papers
1. Assessment of Multi-Camera Calibration Algorithms for Two-Dimensional Camera Arrays Relative to Ground Truth Position and Direction
2016 (English). In: 3DTV-Conference, IEEE Computer Society, 2016, article id 7548887. Conference paper, Published paper (Refereed).
Abstract [en]

Camera calibration methods are commonly evaluated on cumulative reprojection error metrics, on disparate one-dimensional datasets. To evaluate the calibration of cameras in two-dimensional arrays, assessments need to be made on two-dimensional datasets with constraints on camera parameters. In this study, the accuracy of several multi-camera calibration methods was evaluated on the camera parameters that affect view projection the most. As input data, we used a 15-viewpoint two-dimensional dataset with intrinsic and extrinsic parameter constraints and extrinsic ground truth. The assessment showed that self-calibration methods using structure-from-motion reach intrinsic and extrinsic parameter estimation accuracy equal to that of a standard checkerboard calibration algorithm, and surpass a well-known self-calibration toolbox, BlueCCal. These results show that self-calibration is a viable approach to calibrating two-dimensional camera arrays, but improvements to state-of-the-art multi-camera feature matching are necessary to make BlueCCal as accurate as other self-calibration methods for two-dimensional camera arrays.
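
As a rough illustration of target-based calibration checked against known camera geometry, the sketch below uses OpenCV's standard checkerboard calibration and compares recovered camera centres. The file layout, pattern size, and square size are assumptions for illustration only; this does not reproduce the assessment performed in the paper.

import cv2
import glob
import numpy as np

PATTERN = (9, 6)    # inner checkerboard corners (assumed)
SQUARE = 0.025      # square size in metres (assumed)

# Ideal 3D corner positions on the board plane (z = 0).
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

def calibrate(image_glob):
    # Estimate intrinsics and per-view extrinsics for one camera.
    obj_pts, img_pts, size = [], [], None
    for path in sorted(glob.glob(image_glob)):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        ok, corners = cv2.findChessboardCorners(gray, PATTERN)
        if ok:
            obj_pts.append(objp)
            img_pts.append(corners)
            size = gray.shape[::-1]
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
    return K, rvecs, tvecs

def camera_centre(rvec, tvec):
    # Camera position in board coordinates: C = -R^T t.
    R, _ = cv2.Rodrigues(rvec)
    return (-R.T @ tvec).ravel()

For a rigid two-dimensional array, pairwise distances between recovered camera centres (for the same board view) can then be compared against the known mounting distances as a ground-truth check, in the spirit of the constrained assessment described above.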

Place, publisher, year, edition, pages
IEEE Computer Society, 2016
Keywords
Camera calibration, multi-view image dataset, 2D camera array, self-calibration, calibration assessment
National Category
Signal Processing; Computer and Information Sciences
Identifiers
urn:nbn:se:miun:diva-27960 (URN); 10.1109/3DTV.2016.7548887 (DOI); 000390840500006 (); 2-s2.0-84987849952 (Scopus ID); STC (Local ID); 978-1-5090-3313-3 (ISBN); STC (Archive number); STC (OAI)
Conference
2016 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video, 3DTV-CON 2016; Hamburg; Germany; 4 July 2016 through 6 July 2016; Category number CFP1655B-ART; Code 123582
Funder
Knowledge Foundation, 20140200
Available from: 2016-06-17. Created: 2016-06-16. Last updated: 2025-02-18. Bibliographically approved.
2. Modeling Depth Uncertainty of Desynchronized Multi-Camera Systems
2017 (English). In: 2017 International Conference on 3D Immersion (IC3D), IEEE, 2017. Conference paper, Published paper (Refereed).
Abstract [en]

Accurately recording motion from multiple perspectives is relevant for capturing and processing immersive multimedia and virtual reality content. However, synchronization errors between multiple cameras limit the precision of scene depth reconstruction and rendering. In order to quantify this limit, a relation between camera desynchronization, camera parameters, and scene element motion has to be identified. In this paper, a parametric ray model describing depth uncertainty is derived and adapted for the pinhole camera model. A two-camera scenario is simulated to investigate the model behavior and how camera synchronization delay, scene element speed, and camera positions affect the system's depth uncertainty. Results reveal a linear relation between synchronization error, element speed, and depth uncertainty. View convergence is shown to affect mean depth uncertainty up to a factor of 10. Results also show that depth uncertainty must be assessed on the full set of camera rays instead of a central subset.
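
The linear trend reported above can be reproduced with a much simplified rectified two-camera approximation. The sketch below is not the parametric ray model derived in the paper, and all numbers are illustrative assumptions only.

def depth_error(Z, v, dt, f_px, baseline):
    # Approximate depth error when a scene point moving at speed v (parallel to the
    # baseline) is observed by two cameras whose exposures differ by dt seconds.
    motion = v * dt                        # displacement between the exposures [m]
    disparity_err = f_px * motion / Z      # induced disparity error [px]
    disparity = f_px * baseline / Z        # true disparity [px]
    z_observed = f_px * baseline / (disparity - disparity_err)
    return abs(z_observed - Z)

# 1 ms synchronization error, object at 5 m moving at 2 m/s, 0.2 m baseline, f = 1000 px:
print(depth_error(Z=5.0, v=2.0, dt=1e-3, f_px=1000.0, baseline=0.2))  # ~0.05 m

Doubling either the element speed or the synchronization delay doubles the error in this approximation, matching the linear relation observed above.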

Place, publisher, year, edition, pages
IEEE, 2017
Keywords
Camera synchronization, Synchronization error, Depth estimation error, Multi-camera system
National Category
Signal Processing; Other Engineering and Technologies
Identifiers
urn:nbn:se:miun:diva-31841 (URN); 10.1109/IC3D.2017.8251891 (DOI); 000427148600001 (); 2-s2.0-85049401578 (Scopus ID); 978-1-5386-4655-7 (ISBN)
Conference
2017 International Conference on 3D Immersion (IC3D 2017), Brussels, Belgium, 11th-12th December 2017
Projects
LIFE project
Funder
Knowledge Foundation, 20140200
Available from: 2017-10-13. Created: 2017-10-13. Last updated: 2025-02-18.
3. LIFE: A Flexible Testbed For Light Field Evaluation
2018 (English). Conference paper, Published paper (Refereed).
Abstract [en]

Recording and imaging the 3D world has led to the use of light fields. Capturing, distributing and presenting light field data is challenging, and requires an evaluation platform. We define a framework for real-time processing, and present the design and implementation of a light field evaluation system. In order to serve as a testbed, the system is designed to be flexible, scalable, and able to model various end-to-end light field systems. This flexibility is achieved by encapsulating processes and devices in discrete framework systems. The modular capture system supports multiple camera types, general-purpose data processing, and streaming to network interfaces. The cloud system allows for parallel transcoding and distribution of streams. The presentation system encapsulates rendering and display specifics. The real-time ability was tested in a latency measurement; the capture and presentation systems process and stream frames within a 40 ms limit.
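
A minimal sketch of the kind of per-frame latency check described above follows; the stage names and the single shared clock are assumptions for illustration, not the framework's actual instrumentation.

import time

FRAME_BUDGET = 0.040  # 40 ms limit, from the evaluation described above

def measure_frame(stamps):
    # Report per-stage and total latency for one frame, flagging budget overruns.
    encode = stamps["encoded"] - stamps["captured"]
    display = stamps["presented"] - stamps["encoded"]
    total = stamps["presented"] - stamps["captured"]
    status = "OVER BUDGET" if total > FRAME_BUDGET else "ok"
    print(f"encode {encode*1e3:.1f} ms, display {display*1e3:.1f} ms, "
          f"total {total*1e3:.1f} ms ({status})")

# Timestamps are taken at each stage boundary; with distributed capture and
# presentation nodes, the node clocks would first have to be synchronized.
stamps = {"captured": time.monotonic()}
# ... grab, process, and encode the frame ...
stamps["encoded"] = time.monotonic()
# ... stream, decode, and render the frame ...
stamps["presented"] = time.monotonic()
measure_frame(stamps)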

Keywords
Multiview, 3DTV, Light field, Distributed surveillance, 360 video
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:miun:diva-33620 (URN); 000454903900016 (); 2-s2.0-85056147245 (Scopus ID); 978-1-5386-6125-3 (ISBN)
Conference
2018 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), Stockholm – Helsinki – Stockholm, 3-5 June 2018
Projects
LIFE Project
Funder
Knowledge Foundation, 20140200
Available from: 2018-05-15. Created: 2018-05-15. Last updated: 2021-04-15. Bibliographically approved.
4. View Position Impact on QoE in an Immersive Telepresence System for Remote Operation
2019 (English). In: 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), IEEE, 2019, p. 1-3. Conference paper, Published paper (Refereed).
Abstract [en]

In this paper, we investigate how different viewing positions affect a user's Quality of Experience (QoE) and performance in an immersive telepresence system. A QoE experiment was conducted with 27 participants to assess the general subjective experience and the performance of remotely operating a toy excavator. Two view positions were tested, an overhead and a ground-level view, which encourage reliance on stereoscopic depth cues to different extents for accurate operation. Results demonstrate a significant difference between the ground and overhead views: the ground view increased the perceived difficulty of the task, whereas the overhead view increased the perceived accomplishment as well as the objective performance of the task. According to the participants, the perceived helpfulness of the overhead view was also significant.

Place, publisher, year, edition, pages
IEEE, 2019
Keywords
quality of experience, augmented telepresence, head mounted display, viewpoint, remote operation, camera view
National Category
Telecommunications; Other Engineering and Technologies; Computer and Information Sciences
Identifiers
urn:nbn:se:miun:diva-36256 (URN); 10.1109/QoMEX.2019.8743147 (DOI); 000482562000001 (); 2-s2.0-85068638935 (Scopus ID); 978-1-5386-8212-8 (ISBN)
Conference
Quality of Multimedia Experience (QoMEX), Berlin, Germany, 5-7 June 2019
Funder
Knowledge Foundation, 20160194
Available from: 2019-06-10. Created: 2019-06-10. Last updated: 2025-02-18. Bibliographically approved.
5. Joint effects of depth-aiding augmentations and viewing positions on the quality of experience in augmented telepresence
2020 (English). In: Quality and User Experience, ISSN 2366-0139, E-ISSN 2366-0147, Vol. 5, p. 1-17. Article in journal (Refereed), Published.
Abstract [en]

Virtual and augmented reality are increasingly prevalent in industrial applications, such as remote control of industrial machinery, due to recent advances in head-mounted display technologies and low-latency communications via 5G. However, the influence of augmentations and camera placement-based viewing positions on operator performance in telepresence systems remains unknown. In this paper, we investigate the joint effects of depth-aiding augmentations and viewing positions on the quality of experience for operators in augmented telepresence systems. A study was conducted with 27 non-expert participants using a real-time augmented telepresence system to perform a remote-controlled navigation and positioning task, with varied depth-aiding augmentations and viewing positions. The resulting quality of experience was analyzed via Likert opinion scales, task performance measurements, and simulator sickness evaluation. Results suggest that reducing the reliance on stereoscopic depth perception via camera placement significantly benefits operator performance and quality of experience. Conversely, the depth-aiding augmentations can partly mitigate the negative effects of inferior viewing positions. However, the viewing-position-based monoscopic and stereoscopic depth cues tend to dominate over cues based on augmentations. There is also a discrepancy between the participants' subjective opinions on augmentation helpfulness and the augmentations' observed effects on positioning task performance.

Keywords
Quality of Experience, Augmented Reality, Telepresence, Head Mounted Displays, Depth Perception
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:miun:diva-38413 (URN); 10.1007/s41233-020-0031-7 (DOI)
Funder
European Regional Development Fund (ERDF), 20201888; Knowledge Foundation, 20160194
Available from: 2020-02-13. Created: 2020-02-13. Last updated: 2021-04-15. Bibliographically approved.
6. Camera and Lidar-based View Generation for Augmented Remote Operation in Mining Applications
2021 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 9, p. 82199-82212. Article in journal (Refereed), Published.
Abstract [en]

Remote operation of diggers, scalers, and other tunnel-boring machines has significant benefits for worker safety in underground mining. Real-time augmentation of the presented remote views can further improve operator effectiveness through a more complete presentation of relevant sections of the remote location. In safety-critical applications, such augmentation cannot depend on preconditioned data, nor generate plausible-looking yet inaccurate sections of the view. In this paper, we present a capture and rendering pipeline for real-time view augmentation and novel view synthesis that depends only on the inbound data from lidar and camera sensors. We propose on-the-fly lidar filtering that reduces point oscillation at no performance cost, and a full rendering process based on lidar depth upscaling and in-view occluder removal from the presented scene. Performance assessments show that the proposed solution is feasible for real-time applications, where per-frame processing fits within the constraints set by the inbound sensor data and within framerate tolerances for enabling effective remote operation.
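
As a rough illustration of per-beam temporal filtering against point oscillation, the sketch below blends each new lidar range with the previous estimate for the same ring and azimuth bin. This generic exponential filter, its bin layout, and its parameters are assumptions for illustration and do not reproduce the filter proposed in the paper.

import numpy as np

class RangeSmoother:
    # Keeps a running range estimate per (ring, azimuth bin) and damps small fluctuations.
    def __init__(self, n_rings=16, n_bins=1800, alpha=0.3, jump_thresh=0.5):
        self.alpha = alpha                 # blend factor for small fluctuations
        self.jump_thresh = jump_thresh     # metres; larger changes are treated as real motion
        self.state = np.full((n_rings, n_bins), np.nan)

    def update(self, rings, bins, ranges):
        # rings, bins: integer index arrays per point; ranges: measured distances [m].
        prev = self.state[rings, bins]
        fresh = np.isnan(prev) | (np.abs(ranges - prev) > self.jump_thresh)
        smoothed = np.where(fresh, ranges, self.alpha * ranges + (1 - self.alpha) * prev)
        self.state[rings, bins] = smoothed
        return smoothed

Per scan, update() returns stabilized ranges for each beam, which can then be projected to 3D points for rendering and for the kind of sparse depth upscaling described above.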

Place, publisher, year, edition, pages
IEEE, 2021
Keywords
Augmented reality, disocclusion, industry 4.0, lidar imaging, mining technology, real-time rendering, remote operation, view synthesis
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:miun:diva-41859 (URN); 10.1109/ACCESS.2021.3086894 (DOI); 000673980500001 (); 2-s2.0-85111074643 (Scopus ID)
Available from: 2021-04-15. Created: 2021-04-15. Last updated: 2021-08-10. Bibliographically approved.

Open Access in DiVA

fulltext (3748 kB), FULLTEXT01.pdf, application/pdf

Authority records

Dima, Elijs
