## Digital Image Correlation using GPU Computing Applied to Biomechanics

**Amarjot Singh**^{1}, **S.N. Omkar**^{2}

^{1}Department of Electrical and Electronics Engineering, National Institute of Technology, Warangal, India

^{2}Department of Aerospace Engineering, Indian Institute of Science, Bangalore, India

### Abstract

Digital image correlation (DIC) is an effective, non-contact analysis tool, applied here to biomechanics research to examine the strain pattern produced by wrist extension and calf stretching experiments. The DIC code developed computes the in-plane strain with a correlation function, using pictures taken with a CCD camera before and after extension. The shift between the initial picture and the subsequent one is calculated by computing the cross-correlation using the fast Fourier transform (FFT). This intermediate FFT cross-correlation step can be computationally expensive, depending on the field of application; biomechanics, with its large speckle patterns, is one such field. This paper explains a methodology for harnessing the power of the GPU for image processing and computer vision, thereby providing substantial speed-ups on commodity, readily available graphics hardware. A brief review of the DIC algorithms mapped to the GPU is presented. The NVIDIA CUDA programming model is explained as a way to achieve parallelism without the need for graphics expertise. The paper also describes GPU architectures, GPU computing and the software environments used for programming, followed by the advantages and disadvantages of the technique. Further, the paper compares GPU computing with CPU computing on the basis of runtime analysis applied to the two biomechanical experiments. Speed-ups of up to about 22 to 23 times are reported for a single FFT cross-correlation in both experiments.

**Keywords:** digital image correlation, GPU computing, parallel computing, CUDA

*Biomedical Science and Engineering*, 2013, 1 (1), pp 1-10.

DOI: 10.12691/bse-1-1-1

Received December 22, 2012; Revised February 22, 2013; Accepted March 25, 2013

**Copyright** © 2013 Science and Education Publishing. All Rights Reserved.

### Cite this article:

- Singh, Amarjot, and S.N. Omkar. "Digital Image Correlation using GPU Computing Applied to Biomechanics." *Biomedical Science and Engineering* 1.1 (2013): 1-10.

- Singh, A., & Omkar, S. (2013). Digital Image Correlation using GPU Computing Applied to Biomechanics. *Biomedical Science and Engineering*, *1*(1), 1-10.

- Singh, Amarjot, and S.N. Omkar. "Digital Image Correlation using GPU Computing Applied to Biomechanics." *Biomedical Science and Engineering* 1, no. 1 (2013): 1-10.

### 1. Introduction

Digital imagery has been used for a number of years to compute and evaluate strain in a range of applications. One-dimensional time-delay estimation techniques were used initially for displacement and strain estimation ^{[1]}, while two-dimensional strain evaluation has since been developed using both B-mode data ^{[2]} and raw radio-frequency (RF) data ^{[3]}. These strain-imaging techniques are used primarily for cardiovascular applications ^{[4]} and for breast and prostate tumor research ^{[5]}. A number of groups have been exploring shear strain imaging and its possible applications ^{[6]}. For instance, shear strain imaging has been of interest for the characterization of breast tumors ^{[7]} and for cardiovascular applications ^{[8]}. Rotation and, especially, torsion are of great interest in cardiovascular research ^{[9]}. Apart from biomedical applications, digital imaging has made its mark in a number of other areas, including the measurement of deformation and strain in sheet metal forming analysis, automotive crash testing, rail vehicle safety ^{[23]} and airplane safety ^{[24]}. The methodology is also applied to the quantitative evaluation of the in-plane deformation characteristics of geo-materials ^{[25]}, and in medical fields to evaluate local failure of bone ^{[26]}.

In the past, the measurement of strain was restricted to a specified location or to discrete target points in a structure, i.e. it was impossible to compute the strain distribution over the whole structure. Attempts to extract data at a very large number of locations using discrete targets failed because of the time-consuming process and the computational power required. A few techniques such as Moiré interferometry ^{[57]}, holographic interferometry ^{[58]} and speckle photography ^{[59]} have been proposed to acquire the overall deformation contour, but they are often time consuming and computationally heavy. Digital Image Correlation (DIC), on the other hand, is a simple and quick state-of-the-art technique that computes the full-field deformation faster than these earlier techniques.

DIC was originally introduced in the early 1980s by researchers from the University of South Carolina ^{[28]}. The idea behind the method is to infer the displacement of the material under test by “tracking the deformation of a random speckle pattern applied to the component’s surface in digital images acquired during the experiment” ^{[27]}. In addition to its non-contact, non-destructive and full-field measurement capability, the 2-d DIC technique is well known for its simplicity, low environmental vulnerability and ease of processing. A large number of DIC algorithms ^{[30]} have been developed over the years. Among these, the most commonly used involves an iterative solution that finds the maximum of the cross-correlation coefficient in parameter space ^{[38]}. A correlation function is used to calculate the shift between the two images. The algorithm is highly effective when the displacement between two images has to be calculated precisely. Digital image correlation has proven to be an effective optical approach with a wide range of applications, such as determining the mechanical properties of human soft tissue in vivo ^{[31]}, direct measurement of two-dimensional strain distributions within articular cartilage under unconfined compression ^{[32]}, measuring osteocyte lacunae tissue strain in cortical bone ^{[33]} and determining local mechanical conditions within early bone callus ^{[34]}. DIC has also been used to analyze the stresses in solder interconnects of BGA packages under thermal loading ^{[35]}, ^{[36]}, material characterization under thermal loading ^{[37]}, dynamic testing to study the deformation of flexible bodies ^{[39]}, material characterization at high strain rates ^{[40]}, and stresses and strains in flip-chip dies under thermal loading ^{[41]}.

DIC has been applied many times in the field of applied anthropology to assist practitioners with diagnosis ^{[20]}. It has been used effectively to study the mechanical properties of biological soft tissues, e.g. with 2D DIC on the human tympanic membrane ^{[42]}, sheep bone callus ^{[34]} and human cervical tissue ^{[43]}, and more recently with 3D DIC on the bovine cornea ^{[44]} and mouse arterial tissue ^{[45]}. A number of algorithms have been developed and applied to evaluate the strain pattern due to the force applied on different body parts. A correlation function can be used to calculate the shift between the two images taken before and after deformation by tracking the speckle pattern, which leads to the evaluation of strain. The cross-correlation can be computed in physical space ^{[21]} or in Fourier space using the Fast Fourier Transform (FFT) ^{[22]}. Depending on the field of application, the intermediate FFT cross-correlation step can involve heavy computation because of the complexity of the speckle pattern ^{[11, 12, 13]}.

Digital Image Speckle Correlation (DISC) uses digital image correlation to resolve displacement and deformation gradient fields. The underlying principle of DISC consists of tracking a geometric point before and after deformation to obtain its displacement vector. The technique is simple and effective but poses some important practical implementation challenges, one of which is computational complexity. Because digital images are discrete, data interpolation is needed to obtain the strain pattern. Most existing algorithms require curve fitting or interpolation, which is computationally expensive. Moreover, the iterative algorithms demand the calculation of second-order spatial derivatives of the digital images, which further increases the computational cost. In addition, the sensitivities and accuracies claimed in previous studies vary by more than an order of magnitude, from 0.5 pixel ^{[50]} to 0.01 pixel ^{[51]}. The field demands efficient and fast methodologies to speed up correlation ^{[10]}. A number of methodologies have been used to improve the performance of the transform in computationally expensive experiments ^{[14]}.

Parallelism is one such methodology and is considered to be the future of fast computing. Microprocessor development now focuses on adding cores rather than on increasing single-thread performance. With the growing need for computation, the highly parallel graphics processing unit (GPU) is rapidly emerging as a powerful engine for computationally demanding applications and has become an essential part of today’s mainstream computing systems. Despite the differences in architecture and programming compared with other single-chip processors, GPUs have a large part to play in future computing systems in terms of performance and potential. GPUs can be used effectively when algorithms are mapped so as to exploit this level of parallelism ^{[15]}.

This paper focuses on the analysis of two important biomechanical experiments, wrist stretching and calf stretching, using DIC implemented on a GPU platform. The idea behind the method is to infer the displacement of the superficial muscles under test by “tracking the deformation of a random speckle pattern applied on the surface of the muscle” ^{[17, 18, 19]}. The focus of the experiments is to examine the strain pattern for extension and/or relaxation of superficial muscles in humans using DIC. The wrist stretching experiment, called carpal extension, is a remedy for carpal tunnel syndrome (CTS), a painful disorder of the wrist ^{[54, 55]}, while the calf stretching experiment relates to tarsal tunnel syndrome (TTS), popularly known as Big Foot ^{[52]}. In many parts of the world CTS results in billions of dollars of worker compensation claims every year ^{[46]}. CTS is generally a result of repetitive movements involving the wrist and is common among persons who use computers for long durations. TTS, on the other hand, is mainly observed in athletes and sports persons because of the pressure they apply on the tarsal tunnel, as in tennis calf. It is an extremely painful condition leading to inflammation and swelling. These biomechanical experiments can provide extensive information regarding the strain experienced by the muscles. To the best of the authors’ knowledge, this is the first paper that attempts to compute the strain pattern with digital image correlation using GPU computing.

The rest of the paper is organized as follows. Section two describes the biomechanical experiments analysed in the paper, with the details of DIC applied to the experiments. Section three describes the digital image correlation algorithm used for evaluating the strain. Section four explains the architectural features of the DIC algorithm on the GPU platform that lead to parallelization, while section five presents the problem formulation followed by the experimentation. Section six explains the results in detail, followed by a brief conclusion in section seven. Finally, the last section discusses the limitations of the algorithm as well as of the graphics processing unit.

### 2. Biomechanical Experiments

The remedy of wrist extension relates to carpal tunnel syndrome (CTS) ^{[54, 55]}, first described in the medical literature in 1939 by the physician Dr. George S. Phalen ^{[47]}. CTS is a serious disorder caused by compression of the median nerve in the carpal tunnel. The carpal tunnel is surrounded by bones on three sides and by a carpal ligament on the fourth side. Nine flexor tendons of the hand pass through the tunnel. The median nerve, which also passes through the carpal tunnel, is compressed by a decrease in the size of the tunnel, by an increase in the size of the tissues (swelling) around the flexor tendons, or by both. The nerve is compressed as it runs down to the transverse carpal ligament (TCL), resulting in weakness of the flexor pollicis brevis, opponens pollicis and abductor pollicis brevis, as well as sensory loss in the distribution of the median nerve distal to the transverse carpal ligament. The most important causes of CTS are related to biological and structural factors rather than environmental ones ^{[48]}. A study by Gross et al. ^{[49]} showed the dominance of force as a major risk factor.

The second experiment, calf stretching, is a remedy for tarsal tunnel syndrome (TTS), also called posterior tibial neuralgia, a painful disorder of the calf, ankle and foot ^{[53]}. TTS is a compression neuropathy and a painful foot condition in which the tibial nerve is compressed as it travels through the tarsal tunnel. The tarsal tunnel is made up of bone on the inside and the flexor retinaculum on the outside. The tunnel is located along the inner calf behind the medial malleolus (the bump on the inside of the ankle). A bundle of structures, namely the posterior tibial artery, the tibial nerve, and the tendons of the tibialis posterior, flexor digitorum longus and flexor hallucis longus muscles, travels through the tarsal tunnel. The flexor retinaculum has a limited ability to stretch, so increased pressure eventually compresses the nerve within the tunnel. As the pressure on the nerve increases, the blood flow decreases, leading to tingling and numbness. The effect of the entrapment can spread to other areas depending on where the nerve is trapped. If the entrapment is high, the entire foot can be affected, as several branches of the tibial nerve can become involved. TTS can result in numbness in the foot, pain, burning, electrical sensations, and tingling over the big toe, the base of the foot and the heel. Such conditions can even damage the tendons passing through the foot, leading to swelling and severe pain.

### 3. Displacement Field Measurement by Digital Image Correlation

DIC is based on an image matching algorithm and can be carried out in Fourier space as well as in physical space. The displacement field is calculated by correlating an interrogation window of the deformed image with the reference image.

**3.1 Preliminaries: Correlation of Two Images**

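To determine the displacement field of an image of the deformed surface with respect to the reference image, one considers a sub-image referred to as a Zone of Interest (ZOI), Figure 1. The aim of the correlation method is to match the ZOIs in the two images. The displacement of a ZOI with respect to its copy in the deformed image is a two-dimensional shift of an intensity signal digitized by the camera. Writing $\xi = (x, y)$ for the pixel position, the deformed signal $g(\xi)$ is modeled as a copy of the reference signal $f(\xi)$ shifted by $\delta$ and perturbed by noise:

$$g(\xi) = f(\xi - \delta) + b(\xi) \qquad (1)$$

where $\delta$ is the unknown displacement vector and $b(\xi)$ a random noise. To evaluate the shift $\delta$, one may minimize the norm of the difference between $f(\xi - \Delta)$ and $g(\xi)$ with respect to the trial shift $\Delta = (\Delta_x, \Delta_y)$,

$$\min_{\Delta} \lVert g - f(\cdot - \Delta) \rVert^{2} \qquad (2)$$

where ‘$\cdot$’ is a dummy parameter. If one chooses the usual quadratic norm $\lVert f \rVert^{2} = \int \lvert f(\xi) \rvert^{2} \, d\xi$, the previous minimization problem is equivalent to maximizing the quantity $h(\Delta)$:

$$h(\Delta) = (g \star f)(\Delta) = \int g(\xi) \, f(\xi - \Delta) \, d\xi \qquad (3)$$

where ‘$\star$’ denotes the cross-correlation operator. Furthermore, when *b* is a white noise, the previous estimate is optimal. The cross-correlation can be computed in physical space or in Fourier space using the FFT:

$$g \star f = \mathrm{FFT}^{-1}\!\left( \mathrm{FFT}[g] \; \overline{\mathrm{FFT}[f]} \right) \qquad (4)$$

where the overline denotes the complex conjugate and the result holds up to a normalization constant that depends on the FFT convention. The use of the ‘shifting’ property enables one to ‘move’ a signal. For the sake of simplicity, let us consider the shift operator $T_d$ defined for 1D signals by $[T_d f](x) = f(x - d)$, where $d$ is the shift parameter. The FFT of $T_d f$ becomes

$$\mathrm{FFT}[T_d f] = E_{-d} \, \mathrm{FFT}[f] \qquad (5)$$

where the modulation operator $E_d$ is defined by $[E_d f](x) = \exp(2\pi i \, d \, x) \, f(x)$.

These two results are the basic tools used for image correlation ^{[29]}.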

**Figure 1.** Schematic of the reference image with the parameters (Q=P and shift *δ*)
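The Fourier-space route to the cross-correlation described in Section 3.1 can be sketched in a few lines. The following is a minimal one-dimensional illustration under stated assumptions, not the authors' MATLAB code: the function names `fft`, `cross_correlate` and `estimate_shift` and the sample values are hypothetical, the signals are treated as periodic, and a plain radix-2 FFT is written out so that the example has no dependencies.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. No 1/N scaling."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def cross_correlate(g, f):
    """Circular cross-correlation h(D) = sum_x g(x) f(x - D), computed in Fourier space."""
    n = len(f)
    G, F = fft(g), fft(f)
    product = [a * b.conjugate() for a, b in zip(G, F)]
    # The inverse transform above is unscaled, so divide by n here.
    return [v.real / n for v in fft(product, inverse=True)]


def estimate_shift(f, g):
    """Integer shift d such that g(x) ~ f(x - d), reported in (-n/2, n/2]."""
    h = cross_correlate(g, f)
    n = len(h)
    peak = max(range(n), key=lambda k: h[k])
    return peak if peak <= n // 2 else peak - n


# Hypothetical 16-sample "speckle" signal and a copy shifted by 3 samples.
reference = [3, 7, 1, 9, 4, 6, 2, 8, 5, 0, 7, 3, 9, 1, 6, 4]
deformed = [reference[(x - 3) % 16] for x in range(16)]
print(estimate_shift(reference, deformed))  # prints 3
```

The same idea extends to images by applying the transform along rows and then columns; in practice a library FFT would replace the hand-written one.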

**3.2 Correlation Algorithm for Two Dimensional Signals:**

In the CORRELI ^{[29]} algorithm, two images, referred to as the ‘reference image’ and the ‘deformed image’ and shown in Figure 2 (a) and Figure 2 (b) respectively, are considered for strain computation. A region of interest (ROI) of size $2^{p} \times 2^{p}$ pixels centered in the reference image is selected, as shown in Figure 1. The ROI is composed of a number of elementary regions called ZOIs, as shown in Figure 3. An ROI of the same size is selected in the deformed image. A first FFT correlation between the two ROIs gives the average displacement of the deformed image with respect to the reference image. This displacement is obtained as the maximum of the cross-correlation function evaluated for each pixel of the ROI and is expressed as an integer number of pixels. The correlation peak identifies the shift at which the two ROIs share the largest number of common pixels. The ROI in the deformed image is then centered at the point corresponding to the center of the ROI in the reference image displaced by this average displacement. In the next step, in order to track the shift of the ZOIs, elementary square ZOIs of size $2^{s} \times 2^{s}$ pixels, where $s < p$, are selected in the reference image as shown in Figure 1. In order to map the whole image, the shift $\delta x$ ($= \delta y$) between two consecutive ZOIs should be chosen carefully such that $1 \le \delta x \le 2^{s}$ pixels. These two parameters define the mesh formed by the centers of the ZOIs used to analyze the displacement field. The following analysis is then performed for each ZOI independently in order to compute the strain for all the ZOIs.
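To make the role of the two mesh parameters concrete, the sketch below lists the ZOI centers for a given ROI. It assumes, as is usual for FFT-based correlation, that the ROI and ZOI sides are powers of two ($2^{p}$ and $2^{s}$ pixels with $s < p$) and that the shift between consecutive ZOIs lies between 1 and $2^{s}$ pixels; the function name `zoi_centres` and the coordinate convention (top-left origin) are hypothetical.

```python
def zoi_centres(p, s, shift):
    """Centres of 2**s-pixel ZOIs tiled over a 2**p-pixel ROI with the given shift."""
    if not s < p:
        raise ValueError("ZOI exponent s must be smaller than ROI exponent p")
    if not 1 <= shift <= 2 ** s:
        raise ValueError("shift must lie between 1 and 2**s pixels")
    roi, zoi = 2 ** p, 2 ** s
    # Top-left corners advance by `shift` until the ZOI would leave the ROI.
    corners = range(0, roi - zoi + 1, shift)
    half = zoi // 2
    return [(row + half, col + half) for row in corners for col in corners]


# A 64 x 64 ROI covered by 16 x 16 ZOIs.
print(len(zoi_centres(6, 4, 16)))  # non-overlapping ZOIs: prints 16
print(len(zoi_centres(6, 4, 8)))   # half-overlapping ZOIs: prints 49
```

A smaller shift gives a denser displacement mesh at the cost of more correlations, which is where the repeated FFT step becomes expensive.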

**Figure 2.** Wrist Extension Experiment: (a) Reference Image (b) Deformed Image

**Figure 3.** Calf Stretching Experiment: (a) Reference Image (b) Deformed Image

A first FFT correlation of the reference ZOI is carried out with the ROI of the deformed image in order to locate the corresponding ZOI. The correlation gives an in-plane displacement correction for the reference ZOI, again as an integer number of pixels, and the ZOI is displaced by this additional amount. To limit the errors due to edge effects, the considered ZOI is then windowed by a modified Hanning window:

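$$\widehat{\mathrm{ZOI}} = \mathrm{ZOI} \, (H \otimes H) \qquad (7)$$

where $\widehat{\mathrm{ZOI}}$ denotes the windowed ZOI, $\otimes$ the dyadic product and $H$ the one-dimensional modified Hanning window, which for a ZOI of $2^{s} \times 2^{s}$ pixels is

$$H(i) = \begin{cases} \tfrac{1}{2}\left[1 - \cos\left(\dfrac{4\pi i}{2^{s} - 1}\right)\right] & 0 \le i < 2^{s-2} \\ 1 & 2^{s-2} \le i < 3 \times 2^{s-2} \\ \tfrac{1}{2}\left[1 - \cos\left(\dfrac{4\pi i}{2^{s} - 1}\right)\right] & 3 \times 2^{s-2} \le i \le 2^{s} - 1 \end{cases} \qquad (8)$$

The value $2^{s-2}$ is considered optimal to minimize the error due to edge effects while leaving a sufficiently large number of data unaltered by the window ^{[56]}.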
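The windowing step can be illustrated with a short sketch. It assumes a flat-top Hanning window with cosine tapers over the first and last quarter of the samples and a central half left at 1; the taper width, the power-of-two ZOI side and the helper names `modified_hanning` and `window_zoi` are assumptions made for the example.

```python
import math


def modified_hanning(s):
    """1D flat-top Hanning window of length 2**s with quarter-length cosine tapers."""
    if s < 2:
        raise ValueError("s must be at least 2 so that each taper has one sample")
    n = 2 ** s
    quarter = n // 4
    window = []
    for i in range(n):
        if quarter <= i < 3 * quarter:
            window.append(1.0)  # central half is left unaltered
        else:
            window.append(0.5 * (1.0 - math.cos(4.0 * math.pi * i / (n - 1))))
    return window


def window_zoi(zoi):
    """Multiply a square ZOI (list of rows) by the dyadic product of H with itself."""
    n = len(zoi)
    s = n.bit_length() - 1
    if 2 ** s != n or any(len(row) != n for row in zoi):
        raise ValueError("ZOI must be square with a power-of-two side")
    h = modified_hanning(s)
    return [[zoi[r][c] * h[r] * h[c] for c in range(n)] for r in range(n)]


h = modified_hanning(4)
print(h[0], h[8])  # edge sample is damped to 0.0, centre sample stays 1.0
```

The window falls to zero at the ZOI border, which suppresses the artificial discontinuity that the periodic FFT would otherwise see at the edges.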

The displacement residues are now less than ½ pixel in each direction. A sub-pixel iterative scheme is then used to pinpoint the position of the ZOI by computing the remaining displacement. The sub-pixel correction is determined as the maximum of a parabolic interpolation of the correlation function, performed on the maximum pixel and its eight neighboring pixels. Using the ‘shifting-modulation’ property of the Fourier transform, one can then move the deformed ZOI by the opposite of this sub-pixel correction. Since an interpolation was used, some error may be induced, so the procedure is re-iterated on the new ‘deformed’ ZOI until convergence is reached. The convergence criterion checks whether the maximum value of the correlation function increases as the number of iterations increases; if it does not, the iteration scheme is stopped.
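A parabolic sub-pixel refinement can be sketched as follows. This is a simplification under stated assumptions: instead of fitting the maximum pixel together with all eight neighbors, it fits two independent three-point parabolas, one along each axis, through the integer maximum of the correlation map. The names `parabolic_offset` and `subpixel_peak` are hypothetical.

```python
def parabolic_offset(left, centre, right):
    """Offset of the vertex of the parabola through samples at -1, 0 and +1."""
    curvature = left - 2.0 * centre + right
    if curvature >= 0.0:
        return 0.0  # not a strict maximum: keep the integer position
    return 0.5 * (left - right) / curvature


def subpixel_peak(corr):
    """Refine the integer maximum of a 2D correlation map given as a list of rows."""
    rows, cols = len(corr), len(corr[0])
    r, c = max(((i, j) for i in range(rows) for j in range(cols)),
               key=lambda ij: corr[ij[0]][ij[1]])
    if not (0 < r < rows - 1 and 0 < c < cols - 1):
        return float(r), float(c)  # peak on the border: no neighbours to fit
    dr = parabolic_offset(corr[r - 1][c], corr[r][c], corr[r + 1][c])
    dc = parabolic_offset(corr[r][c - 1], corr[r][c], corr[r][c + 1])
    return r + dr, c + dc


# Synthetic correlation map with its true maximum at row 2.3, column 1.6.
corr = [[-(i - 2.3) ** 2 - (j - 1.6) ** 2 for j in range(5)] for i in range(5)]
row, col = subpixel_peak(corr)
print(round(row, 3), round(col, 3))  # prints 2.3 1.6
```

For a real correlation peak, which is only approximately quadratic, the estimate carries a small interpolation error, which is why the scheme described above is iterated.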

### 4. DIC with GPU

Digital image correlation is an effective image matching methodology that has been used successfully to compute strain in a number of applications. DIC applies a convolution between two images, taken before and after the application of the force, to obtain the shift between them. This step can involve heavy computation for large datasets such as those in biomechanics. This paper focuses on computing the strain pattern for the wrist and calf muscles when subjected to extension/flexion. These biomechanical experiments can provide extensive information regarding the strain experienced by the superficial muscles. Two images taken before and after extension were correlated using the fast Fourier transform in order to compute the shift between them, finally leading to the strain pattern of the muscles.

The simulations were first run in MATLAB on a standard Pentium machine. The CPU executes the commands serially, one by one, to reach the final solution, and serial processing can be extremely expensive in terms of processing time, depending on the application. To speed up the simulations, graphics processing unit (GPU) computing was implemented. The code written by the authors connects to the GPU of the machine for processing, as shown in Figure 6. GPU computing was evaluated using AccelerEyes Jacket in order to speed up the convolution step in the DIC algorithm. Jacket connects MATLAB to the GPU card instead of the CPU, which enables parallelism. The GPU uses parallel operations to run a particular application, which decreases the computation time and ultimately yields a speed-up. Along with the speed-up, the approach also brings the visualization and user-friendliness of MATLAB programming to the system. Jacket enables developers to write and run code on the GPU in the native M-language used in MATLAB, by automatically wrapping the M-language into a GPU-compatible form, as shown in Figure 6.

MATLAB functions that receive Jacket’s GPU data structures as input are transformed into GPU functions. In the case of DIC, the convolution function is passed to MATLAB and converted into a GPU function with the help of Jacket. The convolution step has to be repeated a number of times, once for each patch in the calf and wrist experiments. In order to speed up the computations, threads are assigned to the loop iterations, which can run in parallel because no variables are exchanged between iterations. This methodology can be used to accelerate any algorithm involving a repetitive step with no variable exchange between iterations.
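The decomposition described above, one independent task per patch with no shared variables, can be sketched on the CPU with a thread pool. This is only an illustration of the structure, not of Jacket or CUDA: the names `best_shift` and `match_all` and the sample patches are hypothetical, the matching is a direct one-dimensional correlation rather than an FFT, and in standard CPython the threads do not by themselves deliver a GPU-like speed-up for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def best_shift(pair):
    """Integer circular shift that maximises the correlation of one patch pair."""
    reference, deformed = pair
    n = len(reference)

    def score(d):
        return sum(deformed[x] * reference[(x - d) % n] for x in range(n))

    return max(range(n), key=score)


def match_all(pairs, workers=4):
    """Process every patch pair independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(best_shift, pairs))


# Three hypothetical patches, each displaced by a different amount.
base = [3, 7, 1, 9, 4, 6, 2, 8]
pairs = [(base, [base[(x - d) % 8] for x in range(8)]) for d in (1, 2, 5)]
print(match_all(pairs))  # prints [1, 2, 5]
```

Because each call to `best_shift` reads only its own pair and writes only its own result, the iterations can be distributed over any number of workers without synchronization, which is the property that makes the step suitable for GPU threads.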

### 5. Problem Formulation and Experimentation

The setup consists of a desktop computer, a digital camera and the code used to compute the variations in the wrist/calf positions. The two-dimensional variations are measured using the digital image correlation code developed by the authors. The positions of the camera and the subject are fixed for both the pre and post images. The test was conducted on five healthy and active males with no known record of neural, muscular or skeletal disorders. Informed consent was obtained from each participant before the experiment was conducted. The average age, body height and body mass of the participants were 35.5 years, 170 cm and 68.5 kg respectively.

**Figure 6.** CUDA Jacket Model: The programmer directly accesses the CPU with MATLAB, while with Jacket installed in the system the programmer can work with the GPU

The experiment starts by coating the body part (right forearm anterior / right leg posterior) with zinc powder in order to provide contrast for better identification of the marker points. The zinc-coated portion is then covered with a random black speckle pattern using a marker. The subject sits on a chair and places the forearm at the edge of a table so that the camera's view is not obstructed. Initially the elbow, wrist and palm are aligned in the same plane. The initial relaxed forearm is shown in Figure 2 (a). A CCD camera is used to take the picture of the forearm, referred to as the reference image. In the next step, the wrist is extended by moving the hand approximately 90 degrees about the wrist, and the same setup is used to capture the picture in the extended position, referred to as the deformed image and shown in Figure 2 (b). The same procedure is adopted for the calf muscle exercise. The calf is stretched by dorsiflexing the foot by 45 degrees with the help of a wooden slope. The reference and deformed images of the lower leg are shown in Figure 3 (a) and 3 (b) respectively. Finally, the captured images are used for strain computation.

Feature point extraction is the primary step in computing the strain pattern. The basic image processing operation of dilation is used to extract the feature points from the image. The aim of the algorithm is then to match the zone of interest of the reference image with the deformed image using a cross-correlation function to evaluate the strain. FFT convolution is used to compute the strain for the feature points. The experiment is then repeated for five other subjects and the trends are observed. Digital image correlation (DIC) applied to biomechanics requires a large amount of computational power to correlate the images taken before and after extension, because of the FFT analysis involved for a large speckle pattern. The increasing programming flexibility and computational power offered by graphics processing units (GPUs), as well as their low cost, provide a practical way of obtaining large speed-ups for many algorithms. Hence the ability of the graphics processing unit is tested on biomechanical images from the two experiments described above.

### 6. Results

The results obtained from the simulations show the capabilities of GPU computing relative to CPU computing for the two biomechanical experiments, in which the strain experienced by the superficial muscles and tendons is evaluated. The strain is evaluated by digital image correlation using FFT cross-correlation between the reference and deformed images. The FFT computation between two images, an intermediate step of the algorithm, is analysed using windows of variable sizes. GPU computing is applied mainly to accelerate this intermediate convolution step. Both approaches are evaluated on an Intel Core 2 Duo 2.20 GHz machine. Speed-up is defined as the ratio of the time required by CPU computing to the time required by GPU computing for a particular case. The results are finally analysed and tabulated accordingly.

The approaches are compared for two computationally expensive biomechanics experiments, calf and wrist stretching. The experiments aim at computing the elongation of the tendons upon wrist and calf extension. The zone of interest from the reference image, as shown in Figure 2 (a) and Figure 3 (a), is convolved with each zone of interest inside the region of interest of the deformed image, as shown in Figure 2 (b) and Figure 3 (b), in order to locate the shifted position of the reference zone of interest. A high greyscale peak is obtained when the reference zone of interest is convolved with its shifted copy in the deformed image. Once the zone of interest is located in the deformed image, the strain for the experiment can be computed.

**Figure 4.** Flow chart for feature extraction

The computations are explained in detail for the CTS experiment, followed by TTS. In order to extract the feature points from the image, dilation with an 11 by 11 square neighbourhood is applied to enlarge the region of interest, as shown in Figure 7 (a). In the next step, a greyscale threshold of 50 is applied to extract all the feature points below that value. The feature points, coloured black, can be extracted easily because they have a greyscale of zero and appear as white (binary 1) on inversion, while the background turns black, as shown in Figure 7 (b). The process is explained in detail in the flow chart shown in Figure 4.
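The extraction of the dark marker points can be sketched as below. The sketch assumes that the dilation is meant to enlarge the dark markers, so it thresholds first (pixels below 50 become 1, which also performs the inversion) and then dilates the binary mask with the 11 by 11 square; for a square neighbourhood this gives the same mask as taking the neighbourhood minimum of the grey image and then thresholding. The function name `extract_markers` and the list-of-rows image format are hypothetical, and this is not the MATLAB code of Figure 8 (a).

```python
def extract_markers(image, size=11, threshold=50):
    """Binary mask (1 = marker) from a greyscale image given as a list of rows."""
    rows, cols = len(image), len(image[0])
    # Threshold and invert: dark marker pixels (< threshold) become 1.
    mask = [[1 if image[r][c] < threshold else 0 for c in range(cols)]
            for r in range(rows)]
    # Binary dilation: a pixel is set if any pixel in its size x size
    # neighbourhood (clipped at the image border) is set.
    half = size // 2
    dilated = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - half), min(rows, r + half + 1)
            c0, c1 = max(0, c - half), min(cols, c + half + 1)
            if any(mask[i][j] for i in range(r0, r1) for j in range(c0, c1)):
                dilated[r][c] = 1
    return dilated


# A bright 15 x 15 background with a single black marker pixel in the middle.
image = [[200] * 15 for _ in range(15)]
image[7][7] = 0
mask = extract_markers(image)
print(sum(map(sum, mask)))  # the marker grows to an 11 x 11 block: prints 121
```

The nested loops are written for clarity; an image-processing library would perform the same dilation far more efficiently on full-size photographs.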

**Figure 5.** Flow chart for strain computation

**Figure 7.** (a) Enlargement of area after square 11 by 11 Dilation (b) Marker points after thresholding

The feature points are extracted using the MATLAB pseudo code shown in Figure 8 (a). Once the feature points have been extracted, the strain can be computed using FFT convolution between the reference and deformed images. The MATLAB pseudo code for the FFT cross-correlation is shown in Figure 8 (b). The strain is computed using the strategy described in Figure 5. A high greyscale peak is observed when the zone of interest in the reference image is convolved with the same zone of interest in the deformed image, as shown in Figure 8 (c).

**Figure 8.** (a) Code for extracting the feature points and for computing cross correlation between the two images (b) Intermediate Cross Convolution FFT Step for DIC algorithm (c) High greyscale peak observed on convolution between two ZOIs

The strain can be computed readily once the shifted ZOI has been located. This step is repeated in order to compute the strain for every zone of interest in the reference image. The strain for the TTS experiment is evaluated in the same way. The strain is evaluated using both CPU and GPU computing and the speed-up is analysed.

The speed-up is analysed using FFT windows of different sizes for both experiments. Initially, the two approaches are compared on the convolution of a single zone of interest in the reference image with a single zone of interest in the deformed image. To find a particular zone of interest in the deformed image, the zone of interest in the reference image is convolved with every zone of interest inside the region of interest of the deformed image. Once the zone of interest has been located, the strain can be readily computed. Further, in order to compute the strain for every feature point in the reference image, the FFT convolution of every zone of interest in the reference image is computed with the whole region of interest of the deformed image. The results of the simulations are given in Figure 9.

In the wrist extension experiment, a minimum speed-up of 9.8 was observed for a single FFT convolution between reference and deformed images of size 256 by 256, and a maximum speed-up of 22.5 for images of size 2048 by 2048. To locate a specific zone of interest in the deformed image, the convolution step has to be repeated with every zone of interest in the deformed image until a high greyscale peak, as shown in Figure 8 (c), is observed. The peak represents the best match in the deformed image and hence indicates the shifted position of the reference ZOI. For this repetitive step, the GPU speed-up for the wrist extension experiment varies from a minimum of 2.69 for images of size 3548 by 3548 to 5.68 for images of size 256 by 256. The total time required to compute the strain for every feature point in the reference image for the CTS experiment was 54 hrs on the CPU, while the same computation took 22 hrs on the GPU, as shown in Figure 9.

**Figure 9.** (a) CPU and GPU simulations along with the speed up for wrist extension experiment for single window FFT convolution with single window (b) CPU and GPU simulations along with the speed up for calf stretching experiment for single window FFT convolution with single window (c) CPU and GPU simulations along with the speed up for wrist extension experiment for single window FFT convolution with multiple windows (d) CPU and GPU simulations along with the speed up for calf stretching experiment for single window FFT convolution with multiple windows

**Figure 10.** (a) Graphical representation of CPU and GPU simulations along with the speed up for wrist extension experiment for single window FFT convolution with single window (b) Graphical representation of CPU and GPU simulations along with the speed up for calf stretching experiment for single window FFT convolution with single window (c) Graphical representation of CPU and GPU simulations along with the speed up for wrist extension experiment for single window FFT convolution with multiple windows (d) Graphical representation of CPU and GPU simulations along with the speed up for calf stretching experiment for single window FFT convolution with multiple windows

For the calf stretching experiment, a minimum speed-up of 13.2 was observed for a single FFT convolution between reference and deformed images of size 256 by 256, while a maximum speed-up of 23.2 was observed for images of size 2048 by 2048. For the repetitive convolution, the speed-up ranges from 2.19 for images of size 3548 by 3548 to 6.12 for images of size 256 by 256. The total time required to compute the strain for every feature point in the reference image for the calf stretching experiment was 62 hrs on the CPU, while the same simulation was computed in 28 hrs on the GPU platform.
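The end-to-end speed-up implied by the total run times can be worked out directly from the reported hours. The short sketch below does only that arithmetic; the dictionary layout and the helper name `speedup` are assumptions for illustration.

```python
# Total strain-computation times reported in the text, in hours.
reported_hours = {
    "wrist extension": {"cpu": 54.0, "gpu": 22.0},
    "calf stretching": {"cpu": 62.0, "gpu": 28.0},
}

def speedup(cpu_time, gpu_time):
    """Speed-up is the CPU run time divided by the GPU run time."""
    return cpu_time / gpu_time

overall = {name: speedup(t["cpu"], t["gpu"]) for name, t in reported_hours.items()}
for name, value in overall.items():
    print(f"{name}: {value:.2f}x")
```

The resulting full-pipeline ratios, roughly 2.45 and 2.21, are far below the single-window figures of 22.5 and 23.2.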

### 7. Conclusion

This paper has presented a GPU-accelerated system applied to two computationally heavy biomechanical experiments aimed at computing the strain pattern upon muscle stretching, which yields valuable information for athletes and practitioners. The two most important features of current GPUs that reduce download and readback overhead are pinned memory and the ability to overlap processing and transfer times. The first allows non-pageable (pinned) memory to be allocated and transferred using hardware DMA. The second allows processing times and transfer times to be overlapped by using different streams. Similarly, GPU processing can be launched asynchronously, allowing the CPU to perform computations in parallel with the GPU. The biomechanics experiments of calf and wrist stretching were run on both the GPU and the CPU. GPU computing was observed to perform much faster than its counterpart because of its ability to generate parallel threads, as mentioned above. Moreover, the GPU computing architecture and software no longer require knowledge of, or incur the overhead of, the graphics processing pipeline. The coding can be done in M-language with the help of AccelerEyes Jacket on an open NVIDIA platform. Naïve implementations typically result in a 2 to 3× speed-up; however, speed-ups beyond 10× can be obtained with further optimizations. Maximum speed-ups of 22.5 and 23.4 are observed for the wrist and calf stretching experiments respectively for a single window cross correlation on the GPU compared with the CPU. The maximum GPU speed-up for the repetitive step is 5.68 for the wrist extension experiment and 6.12 for the calf stretching experiment. For strain evaluation at every feature point, the GPU computed the strain in 22 hrs and 28 hrs, while the CPU took 54 hrs and 62 hrs, for the wrist and calf stretching experiments respectively. There is still, then, much room for exploration in mapping algorithms onto a massively parallel architecture.
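The benefit of overlapping transfers with processing can be illustrated with a simple timing model. The sketch below is an idealised two-stage pipeline (upload, then compute) with equal-sized chunks and one transfer engine; it is an assumption made for illustration and does not model readback or any specific GPU. The chunk count and per-chunk times are hypothetical values, not measurements from the experiments.

```python
def serial_time(n_chunks, transfer, compute):
    """Each chunk is uploaded and then processed before the next one starts."""
    return n_chunks * (transfer + compute)

def overlapped_time(n_chunks, transfer, compute):
    """Upload of chunk k+1 runs while chunk k is being processed."""
    if n_chunks == 0:
        return 0.0
    # The first upload and the last compute cannot be hidden; in between,
    # the slower of the two stages sets the pace.
    return transfer + compute + (n_chunks - 1) * max(transfer, compute)

# Hypothetical figures: 8 chunks, 2 ms upload and 3 ms compute per chunk.
n_chunks, transfer_ms, compute_ms = 8, 2.0, 3.0
t_serial = serial_time(n_chunks, transfer_ms, compute_ms)
t_overlap = overlapped_time(n_chunks, transfer_ms, compute_ms)
print(t_serial, t_overlap)
```

Under this model the gain is largest when transfer and compute times are similar, since one stage then hides almost entirely behind the other.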

### 8. Limitations of Jacket Programming

The Jacket software allows the user to pass function calls directly to MATLAB, where they are converted into GPU functions by Jacket. The major speed-up comes from the ability of the GPU to execute the algorithm on a parallel platform. To generate parallel threads for a loop, the FOR loop is passed to MATLAB and converted into a GPU construct called ‘GFOR’. This preliminary implementation of GFOR has the following restrictions: (i) no conditional statements in the body of the loop, that is, no branching; (ii) each loop iteration must be independent of all other loop iterations; (iii) no cell array assignment; (iv) nesting GFOR loops within GFOR loops is unsupported; (v) GFOR must be on a line by itself and trailing comments are allowed; (vi) the iterator must not be used after GEND, since its value will not be that of the final iteration. Since each computation is performed in parallel for all iterator values, the system needs enough card memory to run all iterations simultaneously. If the problem exceeds the available memory, it will trigger standard “out of memory” errors. Within MATLAB functions, the GFOR iterator must not use the variable names i or j, since these are reserved for complex variables (a MATLAB bug). Use k or some other variable name instead.
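The independence and memory requirements above can be illustrated outside Jacket. The following sketch is a NumPy analogue, not GFOR itself: it shows a loop whose iterations are independent and branch-free being replaced by one batched operation, and it estimates the memory needed when all iterations are held at once. The function names and the 16 bytes per value (double-precision complex) are assumptions for the example.

```python
import numpy as np

def correlate_loop(ref_windows, def_windows):
    """One FFT cross-correlation per window pair, computed one at a time."""
    out = np.empty(ref_windows.shape)
    for k in range(ref_windows.shape[0]):  # iterator named k, not i or j
        spectrum = np.conj(np.fft.fft2(ref_windows[k])) * np.fft.fft2(def_windows[k])
        out[k] = np.real(np.fft.ifft2(spectrum))
    return out

def correlate_batched(ref_windows, def_windows):
    """Same result, with every window pair processed in one batched call."""
    spectrum = (np.conj(np.fft.fft2(ref_windows, axes=(-2, -1)))
                * np.fft.fft2(def_windows, axes=(-2, -1)))
    return np.real(np.fft.ifft2(spectrum, axes=(-2, -1)))

def batch_bytes(n_windows, size, bytes_per_value=16):
    """Rough memory for one complex array holding all windows at once."""
    return n_windows * size * size * bytes_per_value

rng = np.random.default_rng(1)
ref = rng.random((16, 32, 32))
dfm = rng.random((16, 32, 32))
same = np.allclose(correlate_loop(ref, dfm), correlate_batched(ref, dfm))
print(same, batch_bytes(1000, 256))
```

Because no iteration reads another iteration's result, the loop order is irrelevant and the batched form is valid; the price is that every window must fit in memory at the same time.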

### References

[1] | M. o’donnell, a. r. skovoroda, B. M. shapo, and s. y. Emelianov, “Internal displacement and strain imaging using ultrasonic speckle tracking,” IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 41, pp. 314-325, May. 1994. | ||

In article | |||

[2] | P. chaturvedi, M. Insana, and T. Hall, “2-d companding for noise reduction in strain imaging,” IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 45, no. 1, pp. 179-191, Jan. 1998. | ||

In article | CrossRef PubMed | ||

[3] | X. c. chen, M. J. Zohdy, s. y. Emelianov, and M. o’donnell, “lateral speckle tracking using synthetic lateral phase,” IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 51, no. 5, pp. 540-550, May. 2004. | ||

In article | |||

[4] | J. d’hooge, a. Heimdal, F. Jamal, T. Kukulski, B. Bijnens, F. rademakers, l. Hatle, p. suetens, and G. r. sutherland, “regional strain and strain rate measurements by cardiac ultrasound: principles, implementation and limitations,” Eur. J. Echocardiogr., vol. 1, no. 3, pp. 154-170, 2000. | ||

In article | CrossRef PubMed | ||

[5] | B. S Garra, E. I. cespedes, J. ophir, s. r. spratt, r. a. Zuurbier, c. M. Magnant, and M. F. pennanen, “Elastography of breast lesions: Initial clinical results,” Radiology, vol. 202, no. 1, pp. 79-86, Jan. 1997. | ||

In article | PubMed | ||

[6] | E. E. Konofagou, T. Varghese, and J. ophir, “Theoretical bounds on the estimation of transverse displacement, transverse strain and poisson’s ratio in elastography,” Ultrason. Imaging, vol. 22, no. 3, pp. 153-177, 2000. | ||

In article | CrossRef PubMed | ||

[7] | A. Thitaikumar, T. a. Krouskop, B. s. Garra, and J. ophir, “Visualization of bonding at an inclusion boundary using axial-shear strain elastography: a feasibility study,” Phys. Med. Biol., vol. 52, no. 9, pp. 2615-2633, May. 2007. | ||

In article | |||

[8] | A. R. ahlgren, M. cinthio, s. steen, H. W. persson, T. sjoberg, and K. lindstrom, “Effects of adrenaline on longitudinal arterial lopata et al.: estimation of strain in shearing and rotating structures 863wall movements and resulting intramural shear strain: a first report,” Clin. Physiol. Funct. Imaging, vol. 29, no. 5, pp. 353-359, Sep. 2009. | ||

In article | |||

[9] | W. Han, M. X. Xie, X. F. Wang, q. lu, J. Wang, l. Zhang, and J. Zhang, “assessment of left ventricular torsion in patients with anterior wall myocardial infarction before and after revascularization using speckle tracking imaging,” Chin. Med. J. (Engl.), vol. 121, no. 16, pp. 1543-1548, Aug. 2008. | ||

In article | |||

[10] | Meijering, E.H.W.; Niessen, W.J.; Viegever, M.A., “Retrospective motion correction in digital subtraction angiography’’, in IEEE Transactions on Medical Imaging, Issue, Volume: 18, Issue:1,pp. 2-21, Jan.1999. | ||

In article | |||

[11] | Chen J L, Xia G M, Zhou K B, Xia G S and Qin Y W , “Two-step digital image correlation for micro-region measurement Opt. Lasers Eng. Vol. 43, pp. 836-46, 2005. | ||

In article | CrossRef | ||

[12] | Zhang Z F, Kang Y L,Wang HW, Qin Q H, Qiu Y and Li X Q, “ A novel coarse–ﬁne search scheme for digital image correlation method Measurement, Vol 39, pp. 710-8, 2006. | ||

In article | CrossRef | ||

[13] | S.N. Omkar, Amarjot Singh, “Analysis of wrist extension using digital image correlation, ICTACT journal on image and video processing, volume: 02, issue: 03, February 2012. | ||

In article | |||

[14] | Hii A, Hann C E, Chase J G and Van Houten E, “Fast normalized cross correlation for motion tracking using basis function”, Computer Methods Programs Biomed, Vol. 82, pp. 144-56, 2006. | ||

In article | CrossRef PubMed | ||

[15] | Ruigang Yang andM. Pollefeys, “Multi-resolution real-time stereo on commodity graphics hardware,” Computer Vision and Pattern Recognition, 2003. Proceedings 2003 IEEE, vol. 1, pp. 211-217, 2003. | ||

In article | |||

[16] | Dominik G¨oddeke, Robert Strzodka, Jamaludin Mohd-Yusof, Patrick Mc-Cormick, Sven H.M. Buijssen, Matthias Grajewski, and Stefan Turek, “Exploring weak scalability for FEMcalculations on a GPU-enhanced cluster,” Parallel Computing, vol. 33, no. 10-11, pp. 685-699, 2007. | ||

In article | CrossRef | ||

[17] | Chiang F.P, Wang Q., Lehman F., New developments in full-field strain measurements using speckles. In: ASTMSTP 1318 on Non-traditional Methods of Sensing Stress, Strain and Damage in Materials and Structures, pp. 156-170, 1997. | ||

In article | |||

[18] | Chiang F.P.,Wang Q., Lehman F., New developments in full-field strain measurements using speckles. In: ASTMSTP 1318 on Non-traditional Methods of Sensing Stress, Strain and Damage in Materials and Structures, pp. 156-170, 1997. | ||

In article | |||

[19] | Lichtenberger R and Schreier H, Contactless and full-field 3d-deformation measurement for impact and crash tests, 2004. | ||

In article | |||

[20] | Bonetto P, Comis G, Formiconi AR,Guarracino M.A new approach to brain imaging, based on an open and distributed environment. Proceedings of International IEEE EMBS Conference on Neural Engineering. March 2003. | ||

In article | PubMed | ||

[21] | Chu, T.C., Ranson, W.F., Sutton, M.A. and Petters, W.H. Applications of Digital-Image-Correlation Techniques to Experimental Mechanics. Exp. Mech., 3(25), 232-244, 1985. | ||

In article | CrossRef | ||

[22] | Chiang F.P, Wang Q., Lehman F., New developments in full-field strain measurements using speckles. In ASTMSTP 1318 on Non-traditional Methods of Sensing Stress, Strain and Damage in Materials and Structures, pp. 156-170, 1997. | ||

In article | |||

[23] | Kirkpatrick, S., Schroeder, M., Simons, J., Evaluation of Passenger Rail Vehicle Crashworthiness, International Journal of Crashworthiness, Vol. 6, No. 1, pp. 95-106, 2001. | ||

In article | |||

[24] | Marzougui, D., Eskandarian, A., Mechzkowski, L., Analysis and Evaluation of a Redesigned 3” x 3” Slipbase Sign Support System Using Finite Element Simulations, International Journal of Crashworthiness, Woodhead Publishing Ltd, Cambridge, Vol. 4, No. 1, 1999. | ||

In article | |||

[25] | Watanabe, K., Koseki, J., Tateyama, M., Application of High-Speed Digital CCD Cameras to Observe Static and Dynamic Deformation Characteristics of Sand, Geotechnical Testing Journal, Vol. 28, No. 5, 2005. | ||

In article | |||

[26] | Thurner, P., Erickson, B., Schriock, Z., et al., High-Speed Photography of Human Trabecular Bone during Compression, Materials Research Society Symposium Proceedings, Vol 874, 2005. | ||

In article | |||

[27] | Lichtenberger R and Schreier H, Contactless and full-field 3d-deformation measurement for impact and crash tests, 2004. | ||

In article | |||

[28] | Peters W. H and Ransom W. F, Digital Imaging techniques in experimental analysis, Optical Engineering, 21: pp. 427-431, 1982. | ||

In article | CrossRef | ||

[29] | Herd, F. Perie, J.N., and coret, M., “Mesure de champs de de-placements 2D par Interpolation D’images: CORRELI”, internal repot (LMT-cachan), 230 (1999) in French. | ||

In article | |||

[30] | Sutton M.A, Cheng M, Peters W.H., Chao Y.J and McNeill S.R, Application of an optimized digital correlation method to planar deformation analysis, Image and Vision Computing, 4: pp. 143-151,.1986. | ||

In article | CrossRef | ||

[31] | Kevin M. Moerman, Cathy A. Holt, Sam L. Evans, Ciaran K. Simms Digital image correlation and finite element modelling as a method to determine mechanical properties of human soft tissue in vivo Journal of Biomechanics, Volume 42, Issue 8, pp. 1150-1153, May 2009. | ||

In article | |||

[32] | Wang C.C.B, Deng J.M, Ateshian G.A and Hung C.T, An automated approach for direct measurement of two-dimensional strain distributions within articular cartilage under unconfined compression. Journal of Biomech Engg, 124: pp. 557-67, 2002. | ||

In article | CrossRef PubMed | ||

[33] | Nicolella D. P, Moravits D. E, Gale A. M, Bonewald L. F and Lankford J, Osteocyte lacunae tissue strain in cortical bone. Journal of Biomechanics, 39: Pages 1735-1743, 2006. | ||

In article | CrossRef PubMed | ||

[34] | Thompson M S, Schell H, Lienau J, Duda G N, Digital image correlation: A technique for determining local mechanical conditions within early bone callus, Medical Engineering & Physics, 29: Pages 820-823, 2007. | ||

In article | CrossRef PubMed | ||

[35] | Zhou, P., Goodson, K. E., Sub-pixel Displacement and Deformation Gradient Measurement Using Digital Image-Speckle Correlation (DISC), Optical Engineering, Vol. 40, No. 8, pp 1613-1620, August 2001. | ||

In article | CrossRef | ||

[36] | Yogel, D., Grosser, V., Schubert, A., Michel, B., MicroDAC Strain Measurement for Electronics Packaging Structures, Optics and Lasers in Engineering, Vol. 36, pp. 195-211, 2001. | ||

In article | CrossRef | ||

[37] | Srinivasan, V., Radhakrishnan, S., Zhang, X., Subbarayan, G., Baughn, T., Nguyen, L., High Resolution Characterization of Materials Used In Packages Through Digital Image Correlation, InterPACK Conference Proceedings, July 17-22, 2005. | ||

In article | |||

[38] | Sutton, M.A., Cheng, M., Peters, W.H., Chao, Y.J. and McNeill, S.R., Application of an optimized digital correlation method to planar deformation analysis, Image and Vision Computing, 4(3), pp. 143-151, 1986. | ||

In article | CrossRef | ||

[39] | Reu, P., Miller, T., High-speed Multi-camera DIC for Finite Element Model Validation, Part 1, SEM Annual Conference and Exposition on Experimental and Applied Mechanics, June 4-7, 2006. | ||

In article | |||

[40] | Tiwari, V., Williams, S., Sutton, M., McNeill, S., Application of Digital Image Correlation in Impact Testing, Proceedings of the 2005 SEM Annual Conference and Exposition on Experimental and Applied Mechanics, June 7-9, 2005. | ||

In article | |||

[41] | Kehoe, L., Lynch, P., Guénebaut, V., Measurement of Deformation and Strain in First Level C4 Interconnect and Stacked Die using Optical Digital Image Correlation, Electronic Components and Technology Conference, pp. 1874-1881, May 2006. | ||

In article | |||

[42] | Cheng, T., Dai, C. and Gan, R., Viscoelastic Properties of Human Tympanic Membrane. Annals of Biomedical Engineering 35(2): 305-314, 2007. | ||

In article | CrossRef PubMed | ||

[43] | Myers, K. M., Paskaleva, A. P., House, M. and Socrate, S., 2008. Mechanical and biochemical properties of human cervical tissue. Acta Biomaterialia 4(1): 104-116. | ||

In article | CrossRef PubMed | ||

[44] | Boyce, B. L., Grazier, J. M., Jones, R. E. and Nguyen, T. D., 2008. Full-field deformation of bovine cornea under constrained inflation conditions. Biomaterials 29(28): 3896-3904. | ||

In article | CrossRef PubMed | ||

[45] | Sutton, M. A., Ke, X., Lessner, S. M., Goldbach, M., Yost, M., Zhao, F. and Schreier, H. W., 2008. Strain field measurements on mouse carotid arteries using microscopic three-dimensional digital image correlation. Journal of Biomedical Materials Research Part A 84A(1): 178-190. | ||

In article | CrossRef PubMed | ||

[46] | Derebery J. "Work-related carpal tunnel syndrome: the facts and the myths". Clin Occup Environ Med 5 (2): 353-67, 2006. | ||

In article | PubMed | ||

[47] | Kao SY (2003). "Carpal tunnel syndrome as an occupational disease". The Journal of the American Board of Family Practice / American Board of Family Practice 16 (6): pp. 533-42, 2003. | ||

In article | |||

[48] | Lozano-Calderón, Santiago; Shawn Anthony, David Ring. "The Quality and Strength of Evidence for Etiology: Example of Carpal Tunnel Syndrome". J. Hand Surg. 33 (4): pp. 525-538, April 2008. | ||

In article | |||

[49] | Gross, Brian, Dean Louis, Kyle Carr, and Sharon Weiss,“Carpal Tunnel Syndrome: A Clinicopathollogic Study”,Journal of Environmental Medicine, pp. 437-441, April 1995. | ||

In article | |||

[50] | Pan Bing, Xie Hui-min, Xu Bo-qin, Dai Fu-long,” Performance of sub-pixel registration algorithms in digital image correlation” Meas. Sci. Technol. Vol. 17, No. 6, 2006. | ||

In article | CrossRef | ||

[51] | Xiao-yong Liu, Qing-chang Tan and Rong-li Li , “Study on Digital Image Correlation Using Artificial Neural Networks for Subpixel Displacement Measurement”, in ADVANCES IN NEURAL NETWORK RESEARCH AND APPLICATIONS Lecture Notes in Electrical Engineering, Volume 67, Part 5, 405-412, 2010. | ||

In article | |||

[52] | Yates, Ben. Merriman's Assessment of the Lower Limb (3rd ed.). New York: Churchill Livingstone, 2009. | ||

In article | |||

[53] | Catherine E. ktla. Haxby Abbott, “Ti bialis Posterior Myofascial Tightness as a Source of Heel Pain: Diagnosis and Treatment”, Journal of Orthopaedic & Sports Physical Therapy, 30(10): 624-632, 2000. | ||

In article | |||

[54] | P. Michael Leahy, “Improved Treatments for Carpal Tunnel and Related Syndromes”, Chiropractic Sports Medicine; Vol. 9, No. 1, 1995. | ||

In article | |||

[55] | Carpal Tunnel Syndrome, Department of Health and Human services, USA. http://www.womenshealth.gov/faq/carpal-tunnel-syndrome.pdf. | ||

In article | |||

[56] | Hild F., Périé J.-N., Coret M., Mesure de champs de déplacements 2D par intercorrélation d’images: CORRELI2D, Internal report 230, LMT-Cachan, 1999. | ||

In article | |||

[57] | C. A. Walker, “Moiré interferometry for strain analysis”, in Optics and Lasers in Engineering, Volume 8, Issues 3-4, Pages 213-262, 1988. | ||

In article | |||

[58] | T. Puškar, D. Jevremović, L. Blažić, D. Vasiljević, D. Pantelić, B. Murić, B. Trifković, “Holographic interferometry as a method for measuring strain caused by polymerization shrinkage of dental composite”, in Contemporary Materials, I-1, Page 105 of 111, 2010. | ||

In article | |||

[59] | A r luxmoore, f a a amin, w t evans , “In-plane strain measurement by speckle photography: A practical assessment of the use of Young's fringes”, in The Journal of Strain Analysis for Engineering Design, Volume 9, Number 1, pp 26-35, 1974. | ||

In article | CrossRef | ||