last update: July 1999
What is medical virtual reality (VR)? Simply, VR with a medical application. It involves at a minimum the visualisation of data (which is usually anatomical) in three dimensions, and interaction with that visualised data. In other words, the 3D visualisation must be manipulable through user interaction with the computer system providing the visualisations.
Using VR involves interactive examination and manipulation of data about 3D reality, presented as a simulated 3D reality. Additional important features are the simulation of behaviour of the visualised data (increasingly including behaviour as a result of interaction), and feedback in modalities other than vision (sound, force, touch, smell). The data are then not merely visualised, and it is useful to think in terms of computer "realisation" of data in forms that can engage several of the user's senses.
Weaknesses of current medical VRs include low fidelity of the simulations (level of detail needed to distinguish different tissue types, for example) as compared with the real thing, speed of interaction (which trades off against fidelity) and lack of realism (restricted modalities, unrealistic feedback from interactions).
Another aspect of realism and fidelity that needs further development for surgical applications is the co-registering of data from different sources (MRI and CAT, for example) to produce an accurate, composite realisation with which surgeons can interact in real time. There has also been relatively little work on the human-computer design aspects of medical VR. A focus on these (including more user trials with medical practitioners) would improve usability in actual medical settings.
Surgery is heavily dependent on patient data. In surgery planning, surgeons interact with models of patient anatomy. In surgery training, they operate on a model that is built from patient data. This obviously requires that the models be as accurate as possible for the available data.
Patient data used in VR may come from several sources familiar to medical practitioners:
Computed Tomography (CT), also known as Computed Axial Tomography (CAT)
Magnetic Resonance Imaging (MRI)
Ultrasound
Physiological Imaging (PET, SPECT)
Others: range finders, etc.
Imaging involves the collection of anatomical or physiological data from the patient. Computer graphics techniques - rendering and modelling - are then used to display that data as (part of) a virtual body so that it can be examined and manipulated. The rest of this section deals with imaging and rendering techniques. The next section (2.2) treats the modelling of objects and the simulation of their behaviour.
2.1.1 Volume Imaging Technologies
Data sets originate as voxels (volume elements) derived from the imaging of anatomy (CT, MRI, MRA, MRV, Ultrasound) or of physiological function (PET, SPECT, fMRI).
Visible Human as 2D slices (a stack of CT scans)
With current (pre-VR) technology, surgeons view what is essentially 3D information as large sets of 2D "slices". One of the many skills that must be developed by the surgeon is that of mentally imagining how the 2D pictures relate to the 3D anatomical reality. Put very simply, VR does the job of realising data in 3D that was previously done only in the surgeon's imagination. This 3D VR representation can then be examined in detail, shared and discussed with others, and related precisely to physical reality (for example, in stereotactic surgery).
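The step from 2D slices to a 3D data set can be sketched in a few lines of Python. This is only an illustration: the 3x3 slices, their intensity values and the function names are invented for the example and do not come from any scanner software. It stacks the slices into one volume and then cuts through that volume in directions that were never scanned directly.

```python
# A stack of 2D axial slices, indexed volume[z][y][x].
# All intensity values are invented for illustration.

def stack_slices(slices):
    """Combine equally sized 2D slices into one volume[z][y][x]."""
    rows, cols = len(slices[0]), len(slices[0][0])
    for s in slices:
        if len(s) != rows or any(len(r) != cols for r in s):
            raise ValueError("all slices must have the same size")
    return [[row[:] for row in s] for s in slices]

def coronal_slice(volume, y):
    """Cut the volume at a fixed y; the result is indexed [z][x]."""
    return [plane[y][:] for plane in volume]

def sagittal_slice(volume, x):
    """Cut the volume at a fixed x; the result is indexed [z][y]."""
    return [[row[x] for row in plane] for plane in volume]

slices = [
    [[0, 0, 0], [0, 5, 0], [0, 0, 0]],   # z = 0
    [[0, 5, 0], [5, 9, 5], [0, 5, 0]],   # z = 1
    [[0, 0, 0], [0, 5, 0], [0, 0, 0]],   # z = 2
]
volume = stack_slices(slices)
print(coronal_slice(volume, 1))
print(sagittal_slice(volume, 0))
```

Once the data are held as a volume, any cutting plane is just a different way of indexing the same voxels, which is the job the surgeon otherwise has to do mentally.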
Another advantage of VR is that data from different sources (CT, MRI, MRA, X-ray, for example) can be combined (or registered together) in the same 3D realisation. This is extremely useful in understanding how different aspects of anatomy relate to each other - blood vessels and bones, for example.
Voxelman showing registration of several data sources
Computed Tomography (CT or CAT), from "tomo", meaning cut or slice, and "graph", meaning display, is based on low-intensity X-ray projections shot from many angles. A computer then creates a graphical representation of the slice. CT works by absorption of rays and is particularly good for displaying bones. Variations include Spiral CT and Open CT.
CT scans from the Visible Human
Nuclear Magnetic Resonance Imaging (NMR, MRI), despite the "nuclear" in its name, is based on magnetism and, unlike X-ray imaging, involves no harmful radiation. It relies on the differential decay and recovery characteristics of the proton NMR signal. It gives high contrast among various soft tissues and organs, and so is good for the head, spine and joints. Variations include tagged MRI, MRA (arteries), MRV (veins), and Open MR.
Magnetic resonance techniques are particularly useful in visualising temperature changes for so-called thermal surgery procedures. MR can show changes in tissues resulting from heating, and so can be used to monitor these procedures.
MRA
MRI stack from the Visible Human
Ultrasound is based on echo imaging. An acoustic wave is launched, which interacts with tissue and blood and some of the energy returns. There is no ionizing radiation, and acquisition is in real time. Ultrasound equipment is also relatively inexpensive. Applications of ultrasound include cardiology, neurosurgery, gynaecology, abdominal imaging, and vascular imaging. Because it is harmless and instant, it is often used as a real-time guiding tool in surgical interventions.
Ultrasound views from Fraunhofer
Physiological (Functional) imaging is a form of nuclear medicine: imaging the decay of radio-isotopes bound to molecules with known biological properties, using a rotating camera. In brain studies, for example, the amount of activity in a region determines the level of absorption there.
With SPECT (Single Photon Emission Computed Tomography), a gamma-ray emitting radio-isotope chemical is administered, and the photons emitted by the decaying isotope are recorded.
SPECT image
PET (Positron-Emission Tomography) is more precise but also more expensive, since an on-site cyclotron is needed to provide positron-emitting isotopes. Scanners for PET are also more expensive than single-photon cameras.
PET scans
2.1.2 Surface Rendering and Hybrid Models
Surface rendering converts volumes into geometric primitives using a form of isocontouring. Isocontouring relies on "thresholding", which requires knowledge of the data, since noise blurs the boundaries between regions. Thresholding, of course, results in a significant loss of information from the original volumetric data. Surface rendering is nevertheless popular because it is highly tractable in computer graphics terms: it capitalises on known polygonal geometry implemented in readily available hardware and software, and it allows shading and other calculations, including deformations of the surfaces. The complexity of the data is reduced, and so its rendering is speeded up, because the surfaces have no contents within them.
Visible Productions skull
From PTI
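The effect of thresholding can be illustrated with a small Python sketch. It is a simplified stand-in for isocontouring, not a polygon extractor: it marks voxels at or above a threshold as "inside" and then keeps only those inside voxels that touch the outside. The 6x6x6 block, the intensity values and the threshold of 150 are all invented for the example.

```python
def threshold(volume, level):
    """Binary mask: True where the voxel value is at or above level."""
    return [[[v >= level for v in row] for row in plane] for plane in volume]

def surface_voxels(mask):
    """Coordinates of inside voxels with at least one outside 6-neighbour.
    Positions beyond the edge of the data count as outside."""
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])

    def inside(z, y, x):
        return 0 <= z < nz and 0 <= y < ny and 0 <= x < nx and mask[z][y][x]

    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    found = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if mask[z][y][x] and any(
                        not inside(z + dz, y + dy, x + dx) for dz, dy, dx in steps):
                    found.append((z, y, x))
    return found

# A solid 6x6x6 block (value 200) with one slightly different voxel inside.
n = 6
volume = [[[200 for _ in range(n)] for _ in range(n)] for _ in range(n)]
volume[3][3][3] = 180          # internal variation, still above the threshold
mask = threshold(volume, 150)
shell = surface_voxels(mask)
inside_count = sum(v for plane in mask for row in plane for v in row)
print(inside_count, "voxels inside,", len(shell), "on the surface")
```

The variation at the centre of the block disappears from the result: after thresholding only the shell remains, which is both why the surface is cheap to draw and why information is lost.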
Hybrid Models
Hybrid models are currently very widely used and are composed of polygonal surface models with 2D textures "mapped" onto them. This can produce a quite realistic effect, although inaccuracies can be seen when objects are viewed from different angles, because the surface textures applied tend to be uniform and/or are not changed according to viewing angle or distance.
EVL eye
The most successful surgery trainers - particularly for endoscopic surgery, but also for some (relatively simple) open surgery procedures - use such hybrid models.
Boston Dynamics Suture Trainer
2.1.3 Volume Rendering
Volume rendering depends on ray casting from the volume in question to the eye. The advantages lie in the immediate conversion of volumetric data into a truly 3D rendering. This supports such features as selective transparency, cut views, and so on. Stereoscopic vision is important here to assist in clarifying relative positions of rendered features. The disadvantages revolve around the huge amount of data that must be handled, which requires specialised and expensive hardware.
Volume rendering from CT, MRI data
Volume rendering from MRA, MRI data
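The ray casting idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular system: the view is orthographic along the z axis, the brightness of a sample is simply its intensity, and the transfer function (the hypothetical `opacity` with its `low` and `high` limits) is a linear ramp chosen for the example.

```python
def opacity(value, low=50.0, high=250.0):
    """Hypothetical transfer function: maps intensity to opacity in [0, 1]."""
    t = (value - low) / (high - low)
    return max(0.0, min(1.0, t))

def cast_ray(samples):
    """Front-to-back compositing of intensity samples along one ray.
    Returns (accumulated brightness, accumulated opacity)."""
    colour, alpha = 0.0, 0.0
    for value in samples:
        a = opacity(value)
        colour += (1.0 - alpha) * a * value
        alpha += (1.0 - alpha) * a
        if alpha >= 0.99:
            break              # early termination: nothing behind is visible
    return colour, alpha

def render(volume):
    """Orthographic view along z of volume[z][y][x]: one ray per (y, x) pixel."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    return [[cast_ray([volume[z][y][x] for z in range(nz)])[0]
             for x in range(nx)] for y in range(ny)]

# A half-transparent sample in front of an opaque one: both contribute.
print(cast_ray([150, 250]))
# An opaque sample in front hides whatever lies behind it.
print(cast_ray([250, 100]))
```

Changing the transfer function is what gives selective transparency: lowering the opacity assigned to soft-tissue intensities, for instance, lets the rays reach the denser structures behind them.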
Volume rendering works directly on the volumetric data, and renders by traversing the whole data set every time. Surface rendering avoids that by first extracting a surface out of the volumetric data, and then manipulating the surface. This is fine if the surgeon only wants to look at the surface of things (for example, looking at a skeleton), but whenever he wants to look inside, he needs the data from within to be rendered. Volume rendering does that, but the processing is demanding (data sets can run to hundreds of megabytes), which is why surface rendering was invented in the first place: to display only part of the data.
The other problem with surface rendering is that extracting the surface is not easy, since a threshold that determines inside and outside must be calculated and identified. That can take some time, of the order of hours. Every time the surface is created (using a well-established computer graphics technique called "Marching Cubes") millions of polygons are generated. In other words, although rendering the surface is fast, because graphics machines deal with millions of polygons in a few seconds or less, creating the surfaces takes much longer.
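The first stage of Marching Cubes, classifying each cubic cell of eight neighbouring voxels against the threshold, can be sketched in Python. This is only that first stage: the full technique goes on to look up a triangle pattern for each of the 256 possible cases, and that table is omitted here. The corner ordering below is an arbitrary choice for the sketch, and the test volume is invented.

```python
# Corner offsets (dz, dy, dx) of one cubic cell. The ordering is an
# assumption made for this sketch; published tables fix their own order.
CORNERS = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0),
           (1, 0, 0), (1, 0, 1), (1, 1, 1), (1, 1, 0)]

def cell_index(volume, z, y, x, level):
    """8-bit case number: bit i is set when corner i is at or above level."""
    index = 0
    for bit, (dz, dy, dx) in enumerate(CORNERS):
        if volume[z + dz][y + dy][x + dx] >= level:
            index |= 1 << bit
    return index

def crossed_cells(volume, level):
    """Cells the surface passes through (neither all inside nor all outside)."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    cells = []
    for z in range(nz - 1):
        for y in range(ny - 1):
            for x in range(nx - 1):
                if cell_index(volume, z, y, x, level) not in (0, 255):
                    cells.append((z, y, x))
    return cells

# A 3x3x3 volume that is empty except for one bright voxel in the middle.
volume = [[[0] * 3 for _ in range(3)] for _ in range(3)]
volume[1][1][1] = 100
print(len(crossed_cells(volume, 50)), "cells contain part of the surface")
```

Even a single bright voxel produces surface fragments in all eight cells that share it, which gives a feel for how quickly the polygon count grows on real data.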
To work on volumes directly, many techniques have been devised and the field is already 15 years old. The trick is to do it in real time, and that is where the Silicon Graphics "volume rendering using 3D textures" technique is useful, since it takes advantage of their specialised hardware to render volumes fast. There are currently several companies trying to do volume rendering in real time, with and without special hardware. VOXAR is one. There is a German company called VolumeGraphics that is developing software for the PC that displays volumes using four CPUs.
In summary, volume rendering is recommended because it is faithful to the original data: there is no thresholding step to empty the data of its contents. However, it requires a lot of power to transfer 256x256x100 data elements to the screen. Volume rendering using 3D Textures (the name of the SGI technique) is restricted to expensive SGI machines, although the prices are becoming more affordable with the introduction of the Octane series.
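A rough idea of the load involved can be had from simple arithmetic on the 256x256x100 figure. In the sketch below, the storage size of 2 bytes per voxel and the update rate of 10 frames per second are assumptions made for the example; they are not figures from any particular scanner or graphics system.

```python
def volume_budget(nx, ny, nz, bytes_per_voxel, frames_per_second):
    """Size of one volume and the voxel throughput needed to redraw it."""
    voxels = nx * ny * nz
    size_bytes = voxels * bytes_per_voxel
    return {
        "voxels": voxels,
        "megabytes": size_bytes / 2**20,
        "voxels_per_second": voxels * frames_per_second,
    }

# Assumed: 16-bit samples and 10 redraws per second.
budget = volume_budget(256, 256, 100, bytes_per_voxel=2, frames_per_second=10)
print(budget)
```

Under these assumptions a single volume holds about 6.5 million voxels, and every one of them has to be visited for each redraw.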
The modelling of objects is the first step in adding meaning to what the computer scans and displays. The problem is initially one of identification - which are the objects in a set of volumetric data? Once the objects are known, the simulation of behaviours associated with different objects becomes possible. That means, in a simple case, that if an object is known and not just shown, the surgeon can interact with it (moving one highlighted structure relative to the others, for example).
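One very simple form of identification is to group neighbouring voxels that passed a threshold into separate numbered objects. The Python sketch below does this with a flood fill. It is an illustration only: real anatomical segmentation is far more involved and usually needs expert input, and the tiny mask used here is invented.

```python
from collections import deque

def label_objects(mask):
    """Label 6-connected groups of True voxels in mask[z][y][x].
    Returns (labels, count): 0 is background, 1..count are objects."""
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    count = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not mask[z][y][x] or labels[z][y][x]:
                    continue
                count += 1                     # found an unlabelled object
                labels[z][y][x] = count
                queue = deque([(z, y, x)])
                while queue:                   # spread the label to neighbours
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in steps:
                        a, b, c = cz + dz, cy + dy, cx + dx
                        if (0 <= a < nz and 0 <= b < ny and 0 <= c < nx
                                and mask[a][b][c] and not labels[a][b][c]):
                            labels[a][b][c] = count
                            queue.append((a, b, c))
    return labels, count

# One slice containing two separate structures.
mask = [[
    [True,  True,  False, False, True],
    [False, True,  False, False, True],
    [False, False, False, False, False],
]]
labels, count = label_objects(mask)
print(count, "objects found")
```

Once each voxel carries an object number, a structure can be treated as a unit: highlighted, hidden, or moved relative to the others.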
Many electronic anatomical atlases exist, mostly based on the Visible Human or on standard anatomical atlases used in surgery.
Segmented and labelled at KRDL, Singapore, from Visible Human data
Talairach and Tournoux Atlas (left) and Schaltenbrand and Wahren Atlas (right) - both courtesy of KRDL
Volume rendering is also a natural way of combining data from different sources - i.e. more than one image modality (CT, MRI, etc.) - a process known as "registration".
VIVIAN from KRDL
Voxelman from University of Hamburg
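The simplest possible case of registration can be sketched in Python: two data sets that differ only by a shift, aligned using a few landmarks identified in both. This is a deliberately reduced illustration. Practical registration must also handle rotation, scaling and often tissue deformation, and the landmark coordinates below are invented.

```python
def estimate_translation(source_points, target_points):
    """Least-squares shift that moves the source landmarks onto the targets:
    the mean of the paired differences."""
    n = len(source_points)
    if n == 0 or n != len(target_points):
        raise ValueError("need the same non-zero number of paired landmarks")
    return tuple(
        sum(t[i] - s[i] for s, t in zip(source_points, target_points)) / n
        for i in range(3))

def apply_translation(point, shift):
    return tuple(p + d for p, d in zip(point, shift))

def rms_error(source_points, target_points, shift):
    """Root-mean-square distance between shifted source and target landmarks."""
    total = 0.0
    for s, t in zip(source_points, target_points):
        moved = apply_translation(s, shift)
        total += sum((m - c) ** 2 for m, c in zip(moved, t))
    return (total / len(source_points)) ** 0.5

# The same three landmarks located in an MRI and in a CT data set (in mm).
mri_landmarks = [(10, 20, 30), (40, 20, 30), (10, 60, 30)]
ct_landmarks = [(15, 17, 32), (45, 17, 32), (15, 57, 32)]
shift = estimate_translation(mri_landmarks, ct_landmarks)
print("shift:", shift, "residual:", rms_error(mri_landmarks, ct_landmarks, shift))
```

The residual error is the useful number in practice: it indicates how far the two sources still disagree after alignment, and therefore how far the composite view can be trusted.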
Having identified objects, it is possible to do things with them, such as highlight them, render them transparent, rotate or otherwise move them. Further than that, through physically-based modelling, it becomes possible to model behaviours inherent in the objects themselves - such as the way they compact or bend when pressure is applied in particular ways.
Catheter simulation (da Vinci) from KRDL
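Physically-based modelling can be illustrated with the smallest possible case: a single point of tissue held in place by a spring and a damper and pushed with a constant force. The Python sketch below is not taken from any of the systems shown here, and the stiffness, damping and force values are arbitrary. Real simulators connect many thousands of such points into a mesh.

```python
def simulate_node(force, stiffness, damping, mass=1.0, dt=0.001, steps=5000):
    """Displacement of one tissue node under a constant push.
    Semi-implicit Euler: update the velocity first, then the position."""
    x, v = 0.0, 0.0
    for _ in range(steps):
        a = (force - stiffness * x - damping * v) / mass
        v += a * dt
        x += v * dt
    return x

# The same push applied to a soft and to a stiffer material (values invented).
soft = simulate_node(force=5.0, stiffness=100.0, damping=20.0)
stiff = simulate_node(force=5.0, stiffness=400.0, damping=40.0)
print("soft tissue gives by", soft, "and stiff tissue by", stiff)
```

After the motion settles, each node rests where the spring force balances the push (force divided by stiffness), so the softer material yields four times as far. Assigning different stiffness values to different identified objects is what makes them behave differently under the same instrument.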
A big advantage of object and behaviour simulation is that it allows prediction of outcomes, not just planning the interventions. For example, a face can be visualised after reconstructive plastic surgery to assess the appeal of the results.
Craniofacial surgery simulation from Erlangen, Germany
Interaction techniques for VR involve a close relationship between output to the user and input from the user. Often, they have been dealt with separately (with different devices), but increasingly it is impossible to separate the two. In this section, we deal with both input and output, sometimes separately, sometimes in an integrated way. For example, the next subsection on visual display is mostly about output from the system, whereas the subsection on tracking is about input, and the subsection on force feedback combines both.
2.3.1 Visual Displays
The centre of any Virtual Reality today is the visual display. VR developed out of computer graphics, and is still largely concerned with how data can best be presented visually to impart a sense of realism. An important aspect of this is how a sense of 3D is conveyed, the most common approach being to mimic binocular vision by rendering two slightly different displays and, by some means, presenting them separately to the two eyes to give a stereoscopic effect.
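The two slightly different displays can be produced by projecting the scene once for each eye. The following Python sketch shows the geometry for a single point. It assumes a simple pinhole model with both eyes on the x axis looking along z at a flat screen; the eye separation of 0.065 m and the screen distance of 0.6 m are assumed values for the example.

```python
def project(point, eye_x, screen_distance):
    """Project a 3D point onto the screen plane z = screen_distance,
    as seen from an eye at (eye_x, 0, 0) looking along +z."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the viewer")
    scale = screen_distance / z
    return (eye_x + (x - eye_x) * scale, y * scale)

def stereo_pair(point, eye_separation=0.065, screen_distance=0.6):
    """Screen positions of one point for the left and the right eye."""
    half = eye_separation / 2.0
    left = project(point, -half, screen_distance)
    right = project(point, +half, screen_distance)
    return left, right

def disparity(point, **settings):
    """Horizontal offset between the right-eye and left-eye images."""
    left, right = stereo_pair(point, **settings)
    return right[0] - left[0]

for depth in (0.3, 0.6, 1.2):
    print("depth", depth, "m -> disparity", disparity((0.0, 0.0, depth)), "m")
```

A point lying in the plane of the screen has no offset between the two images, a point behind the screen has a positive offset, and a point in front of it has a negative one. It is this offset that the visual system reads as depth.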
When the head is tracked and an appropriate display is used, head motion parallax is also used as a stereoscopic vision cue. Opinions vary about how useful head tracking is. Some studies have shown little benefit, and the computational costs are high. Object motion parallax also provides a cue to depth, but only when the objects move relative to each other. Other stereo cues contributing to human depth perception (accommodation, muscle contraction, eyeball shape and pressure, texture flows, etc.) are generally not used.
Other issues include the resolution of the display (how much detail it can show) and whether accurate manipulation of displayed objects is needed.
Resolution is higher with screen-based, desktop VR, but the sense of immersion is less strong and head-tracking is often considered to be less useful. Because the user is not immersed in the space produced, he cannot conveniently reach into the space to manipulate objects - either his hands get in the way of the display if the stereo image appears in front of the screen, or the screen gets in the way if the volume is presented as if behind the screen. These problems arise, of course, because the apparently 3D space is really produced on a 2D screen. This can be avoided by using a mirror to produce an apparent 3D volume in a place where the hands can physically reach.
Head-mounted displays avoid this problem in another way, by placing the images directly in front of the eyes, tracking the head position and altering the display appropriately. The user can then manipulate things displayed in an apparent 3D space without obstruction by the hands and without the space being out of reach behind the screen. But resolution is low, and tracking head and unconstrained hand positions is inaccurate and computationally expensive - and therefore slow. Typically, users of such immersive VRs report some nausea due to slow updating of the images when they move their heads. The accuracy of head tracking also tends to reduce with time. However, HMD technology is improving all the time, and new techniques - such as Retinal Laser Scanning and Light Pipes - show considerable promise.
HMD from Virtual Research
Vista Medical: HMD for operating theatre, microscope image
The main problem with Head Up Displays, where views of the world (usually video images) are combined with views of data in a form of Augmented Reality, is registration of the different sources. They also tend to be extremely cumbersome.
HUD from University of North Carolina
Projection Systems, using walls and tables, are useful for teaching groups or for collaborative discussions, but are less suitable for individual examination of data or for training of surgical skills in a realistic way.
The CAVE from EVL, University of Illinois
Immersadesk, aka Immersive Workdesk from Pyramid Systems
Responsive Workbench, from GMD, Germany
Mechanically tethered displays attempt to avoid the problems of low resolution and tracking head position inherent in HMDs, by attaching the screen to the front of the face and tracking its position mechanically. This again is rather cumbersome, as shown in the pictures of the BOOM from Fakespace Labs, who also produce a smaller, desktop version called the PUSH.
The BOOM from Fakespace Labs
Reflection systems use the virtual image in a mirror to produce an apparent space with high resolution and into which the user can reach.
Boston Dynamics (left) and CMU Enhanced Reality (right)
KRDL Virtual Workbench
Penn State University mirror display
2.3.2 Tracking
Tracking concerns the various ways in which the hands, the head, and occasionally the rest of the body can be used to manipulate or inspect the data. In other words, it is about detecting the position of parts of the body, so that the display can be changed to reflect user actions.
Trackers in common use include various "wired" gloves, props, joysticks, and specialised trackers with buttons.
When used in surgery, trackers report back 3D position and orientation in space. They tend to rely on connection by radio-frequency (RF) signals rather than wires, since this gives a wide range of action, and allows unencumbered interaction. General purpose trackers, such as the Polhemus stylus and the Ascension Bird, are often used. Coils are used to track catheters. Other connection methods include ultrasound, which is low-cost but suffers from blind areas; mechanical coupling, which is accurate, but bulky and also has blind areas; and infrared, which is low cost, but requires a clear line of sight and so also results in blind areas.
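What the system does with a reported position and orientation can be sketched in Python. The example works out where the tip of a tracked instrument is, given the sensor reading and the known offset of the tip from the sensor. To keep it short it assumes rotation about the vertical axis only (yaw); real trackers report a full 3D orientation. The function name and the numbers are invented for the illustration.

```python
import math

def tool_tip_position(tracker_position, yaw_degrees, tip_offset):
    """World position of an instrument tip.
    tracker_position: sensor position (x, y, z) reported by the tracker.
    yaw_degrees: sensor rotation about the vertical (z) axis; pitch and
                 roll are ignored in this simplified sketch.
    tip_offset: position of the tip relative to the sensor, in the
                instrument's own frame."""
    c = math.cos(math.radians(yaw_degrees))
    s = math.sin(math.radians(yaw_degrees))
    ox, oy, oz = tip_offset
    px, py, pz = tracker_position
    return (px + c * ox - s * oy, py + s * ox + c * oy, pz + oz)

# Sensor at (1, 2, 3) m, instrument turned by 90 degrees, tip 10 cm ahead.
print(tool_tip_position((1.0, 2.0, 3.0), 90.0, (0.1, 0.0, 0.0)))
```

Any error in the reported angle is magnified by the length of the instrument, which is one reason why tracker accuracy matters so much in surgical use.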
Gloves report finger position and angle and provide a very natural interface. They also require a tracker to determine the absolute position of the hand. However, they are hard to calibrate (and the calibration slips progressively with time) and cumbersome to put on.
Wired glove from UC Berkeley Robotics Department
University of Virginia "Props" Interface
Bat (U. Alberta)
Joysticks (Division)
2.3.3 Tactile Feedback
The purpose of tactile feedback is to convey a sense of the feel of an object or surface - its texture, weight, response to pressure, etc.
Vibro-tactile devices depend on either voice coils or piezo-electric vibrators to vibrate a surface against a finger tip at various frequencies. Similarly, electro-tactile devices create vibrations in the finger tip - they function like the pads that physiotherapists use to stimulate the muscles electrically.
Micro pin arrays are more sophisticated and can give an impression of complex surface textures. They involve a matrix of tiny pins each of which can apply pressure to the skin, reflecting the texture of the surface. Unlike the vibration-based devices, they can also provide information about edges.
Pneumatic systems go a stage further, allowing shape as well as textures and edges to be conveyed. They usually work in combination with a glove, by dynamically filling pockets in the glove with air to convey the feel of the object being displayed. However, they do not deal with forces, and so the illusion is destroyed when pressure is applied by the fingers, which then seem to pass directly through the object.
UC Berkeley, Robotics and Intelligent Machines Laboratory
2.3.4 Force Feedback
Force feedback systems combine output of forces from the system with input of positions and forces to the system. This means that the user feels the force of objects in response to the forces he applies. Objects have apparent weight and inertia. To work, they require a structure against which forces can be generated. This is sometimes achieved by means of an exoskeleton that fits over the hand or glove, or by means of a specialised "gantry" through which all manipulations must be made.
Finger Force Feedback - SensAble's PHANToM
Boston Dynamics Surgical Skills Simulator
Immersion Laparoscopic Impulse Engine
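The loop of position in and force out can be sketched in Python using a common simplification: the virtual object pushes back in proportion to how far the probe has penetrated it. This is an illustrative assumption, not the method of any device named here; the object is a plain sphere and the stiffness value is invented.

```python
import math

def contact_force(probe, centre, radius, stiffness):
    """Force on a probe tip touching a virtual sphere.
    Outside the sphere the force is zero; inside, it pushes outward
    in proportion to the penetration depth."""
    offset = [p - c for p, c in zip(probe, centre)]
    distance = math.sqrt(sum(d * d for d in offset))
    depth = radius - distance
    if depth <= 0.0 or distance == 0.0:
        # No contact, or direction undefined exactly at the centre.
        return (0.0, 0.0, 0.0)
    return tuple(stiffness * depth * d / distance for d in offset)

centre, radius, stiffness = (0.0, 0.0, 0.0), 1.0, 500.0
print(contact_force((0.0, 0.0, 2.0), centre, radius, stiffness))   # not touching
print(contact_force((0.0, 0.0, 0.9), centre, radius, stiffness))   # pressed in
```

The device must apply this force to the user's hand many times a second, and the force grows only as the probe sinks in. If the user pushes harder than the device can resist, the probe simply passes through.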
To prevent the hand going through the object if large forces are applied, the exoskeleton or gantry must be bolted onto a more-or-less immoveable object, such as a heavy desk or the floor.
As well as the manipulation of data, such devices can form a component in robotic surgery.
Force feedback systems are good for certain types of application, notably endoscopy simulation, but they are still too crude to convey subtle tactile differences between soft tissue types.
2.3.5 Auditory Displays
Sound is currently used only to signal relatively simple information - when a certain structure has been selected, for example - or to attract attention. However, the potential of auditory displays is much greater. People are very good at detecting changes in an auditory signal, and this allows sounds to be used to convey a range of subtle variations in texture. Sounds can also convey position information and provide feedback on whether a path is being accurately followed.
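As an illustration of path-following feedback, the Python sketch below maps the distance of an instrument from its planned path to the pitch of a tone. The mapping is entirely hypothetical: the base tone of 440 Hz, the rise of one octave per 10 mm of error and the two-octave ceiling are assumed values, chosen only to show the idea.

```python
def deviation_to_pitch(deviation_mm, base_hz=440.0, mm_per_octave=10.0,
                       max_octaves=2.0):
    """Map distance from a planned path to a tone frequency in Hz.
    On the path gives the base tone; each mm_per_octave of error doubles
    the frequency, up to max_octaves above the base."""
    octaves = min(abs(deviation_mm) / mm_per_octave, max_octaves)
    return base_hz * 2.0 ** octaves

for error in (0.0, 5.0, 10.0, 100.0):
    print(error, "mm off the path ->", deviation_to_pitch(error), "Hz")
```

Because listeners are sensitive to small changes in pitch, even a slight drift away from the path would be audible without the user having to look away from the display.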
Auditory displays have the potential to be used in all areas of medical VR - from endoscopic trainers to complex surgery planners. The main drawback is that they are unnatural, but it remains to be tested how important this is in practice.
Augmented reality refers to the blending of the simulated virtual world (from medical data) with the real world (from normal vision or video). One popular application is in obstetrics, using data obtained from real-time ultrasound scans.
Ultrasound data seen in HUD superimposed on real world
The main problem is that of registering the two or more sources together accurately. There is also a suggestion that surgeons are unhappy with augmented reality because it implies a degradation of their direct view of the patient. They may prefer to switch between the two, rather than have the two fused together.
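The difference between fusing the two views and switching between them can be shown with a per-pixel blend. The Python sketch below is a minimal illustration: the function name and the pixel values are invented, and it assumes the two images are already registered so that corresponding pixels show the same place.

```python
def blend_pixel(video, overlay, alpha):
    """Mix one RGB video pixel with one RGB overlay pixel.
    alpha = 0 shows only the video, alpha = 1 only the overlay."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return tuple(round((1.0 - alpha) * v + alpha * o)
                 for v, o in zip(video, overlay))

video_pixel = (100, 100, 100)      # camera view of the patient
scan_pixel = (200, 0, 50)          # rendered scan data at the same place

print(blend_pixel(video_pixel, scan_pixel, 0.5))   # fused view
print(blend_pixel(video_pixel, scan_pixel, 0.0))   # direct view only
print(blend_pixel(video_pixel, scan_pixel, 1.0))   # data only
```

With a single control for alpha, the same system can offer either a fused display or a clean switch between the direct view and the data, leaving the choice to the surgeon.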