External links:
- University of St Andrews Research Profile
- Google Scholar Profile
- ORCID: 0000-0002-7662-3146
- AVISI Research Group
External links:
In proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
Dynamic Binary Translation (DBT) is a powerful approach to support cross-architecture emulation of unmodified binaries. However, DBT systems face correctness and performance challenges, when emulating concurrent binaries from strong to weak memory consistency architectures. As a matter of fact, we report several translation errors in Qemu, when emulating x86 binaries on Arm hosts. To address these challenges, we propose an end-to-end approach that provides correct and efficient emulation for weak memory model architectures....
In Proceedings of the 2022 IEEE International Symposium on Workload Characterization
WebAssembly is gaining more and more popularity, finding applications beyond the Web browser – for which it was initially designed for. However, its performance, which developers aimed at being comparable to native, requires further tuning and hasn’t being studied extensively to pinpoint the cause of overheads. This paper identifies that WebAssembly unique safety mechanisms, one of the major ones being bounds-checked memory accesses, may introduce up to 650% overhead. Therefore, we evaluate four popular WebAssembly runtimes against native compiled code....
In proceedings of the 43rd ACM SIGPLAN Conference on Programming Language Design and Implementation
The emergence of new architectures create a recurring challenge to ensure that existing programs still work on them. Manually porting legacy code is often impractical. Static binary translation (SBT) is a process where a program’s binary is automatically translated from one architecture to another, while preserving their original semantics. However, these SBT tools have limited support to various advanced architectural features. Importantly, they are currently unable to translate concurrent binaries. The main challenge arises from the mismatches of the memory consistency model specified by the different architectures, especially when porting existing binaries to a weak memory model architecture....
In proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Dynamic Binary Translation (DBT) requires the implementation of load-link/store-conditional (LL/SC) primitives for guest systems that rely on this form of synchronization. When targeting e.g. x86 host systems, LL/SC guest instructions are typically emulated using atomic Compare-and-Swap (CAS) instructions on the host. Whilst this direct mapping is efficient, this approach is problematic due to subtle differences between LL/SC and CAS semantics. In this paper, we demonstrate that this is a real problem, and we provide code examples that fail to execute correctly on QEMU and a commercial DBT system, which both use the CAS approach to LL/SC emulation....
In ACM Transactions on Computer Systems
System-level Dynamic Binary Translation (DBT) provides the capability to boot an Operating System (OS) and execute programs compiled for an Instruction Set Architecture (ISA) different to that of the host machine. Due to their performance-critical nature, system-level DBT frameworks are typically hand-coded and heavily optimized, both for their guest and host architectures. While this results in good performance of the DBT system, engineering costs for supporting a new, or extending an existing architecture are high....
In Proceedings of the 2019 USENIX Annual Technical Conference
System-level Dynamic Binary Translation (DBT) provides the capability to boot an Operating System (OS) and execute programs compiled for an Instruction Set Architecture (ISA) different to that of the host machine. Due to their performance critical nature, system-level DBT frameworks are typically hand-coded and heavily optimized, both for their guest and host architectures. While this results in good performance of the DBT system, engineering costs for supporting a new, or extending an existing architecture are high....
In proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshops
The popularity of smart home devices is growing as consumers begin to recognize their potential to improve the quality of domestic life. At the same time, serious vulnerabilities have been revealed over recent years, which threaten user privacy and can cause financial losses. The lack of appropriate security protections in these devices is thus of increasing concern for the Internet of Things (IoT) industry, yet manufacturers’ ongoing efforts remain superficial. Hence, users continue to be exposed to serious weaknesses....
In proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software
Graphics Processing Units (GPUs) critically rely on a complex system software stack comprising kernel- and userspace drivers and Just-in-time (JIT) compilers. Yet, existing GPU simulators typically abstract away details of the software stack and GPU instruction set. Partly, this is because GPU vendors rarely release sufficient information about their latest GPU products. However, this is also due to the lack of an integrated CPU/GPU simulation framework, which is complete and powerful enough to drive the complex GPU software environment....
In proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
Many Virtual Execution Environments (VEEs) rely on Justin-time (JIT) compilation technology for code generation at runtime, e.g. in Dynamic Binary Translation (DBT) systems or language Virtual Machines (VMs). While JIT compilation improves native execution performance as opposed to e.g. interpretive execution, the JIT compilation process itself introduces latency. In fact, for highly optimizing JIT compilers or compilers not specifically designed for JIT compilation, e.g. LLVM, this latency can cause a substantial overhead....
In proceedings of the 28th International Conference on Compiler Construction (CC ’19)
The C++ programming language offers a strong exception mechanism for error handling at the language level, improving code readability, safety, and maintainability. However, current C++ implementations are targeted at general-purpose systems, often sacrificing code size, memory usage, and resource determinism for the sake of performance. This makes C++ exceptions a particularly undesirable choice for embedded applications where code size and resource determinism are often paramount. Consequently, embedded coding guidelines either forbid the use of C++ exceptions, or embedded C++ tool chains omit exception handling altogether....
In proceedings of the IEEE
Visual understanding of 3D environments in real-time, at low power, is a huge computational challenge. Often referred to as SLAM (Simultaneous Localisation and Mapping), it is central to applications spanning domestic and industrial robotics, autonomous vehicles, virtual and augmented reality. This paper describes the results of a major research effort to assemble the algorithms, architectures, tools, and systems software needed to enable delivery of SLAM, by supporting applications specialists in selecting and configuring the appropriate algorithm and the appropriate hardware, and compilation pathway, to meet their performance, accuracy, and energy consumption goals....
In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Fitbit fitness trackers record sensitive personal information, including daily step counts, heart rate profiles, and locations visited. By design, these devices gather and upload activity data to a cloud service, which provides aggregate statistics to mobile app users. The same principles govern numerous other Internet-of-Things (IoT) services that target different applications. As a market leader, Fitbit has developed perhaps the most secure wearables architecture that guards communication with end-to-end encryption. In this paper, we analyze the complete Fitbit ecosystem and, despite the brand’s continuous efforts to harden its products, we demonstrate a series of vulnerabilities with potentially severe implications to user privacy and device security....
In proceedings of the 20th International Symposium on Research in Attacks, Intrusions and Defenses
Tens of millions of wearable fitness trackers are shipped yearly to consumers who routinely collect information about their exercising patterns. Smartphones push this health-related data to vendors’ cloud platforms, enabling users to analyze summary statistics on-line and adjust their habits. Third-parties including health insurance providers now offer discounts and financial rewards in exchange for such private information and evidence of healthy lifestyles. Given the associated monetary value, the authenticity and correctness of the activity data collected becomes imperative....
In proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software
Full-system simulators are increasingly finding their way into the consumer space for the purposes of backwards compatibility and hardware emulation (e.g. for games consoles). For such compute-intensive applications simulation performance is paramount. In this paper we argue that existing benchmarksuites such as SPEC CPU2006, originally designed for architecture and compiler performance evaluation, are not well suited for the identification of performance bottlenecks in full-system simulators. While their large, complex workloads provide an indication as to the performance of the simulator on ‘real-world’ workloads, this does not give any indication of why a particular simulator might run an application faster or slower than another....
In ACM Transactions on Architecture and Code Optimization (TACO)
Hardware virtualization solutions provide users with benefits ranging from application isolation through server consolidation to improved disaster recovery and faster server provisioning. While hardware assistance for virtualization is supported by all major processor architectures, including Intel, ARM, PowerPC & MIPS, these extensions are targeted at virtualization of the same architecture, e.g. an x86 guest on an x86 host system. Existing techniques for cross-architecture virtualization, e.g. an ARM guest on an x86 host, still incur a substantial overhead for CPU, memory and I/O virtualization due to the necessity for software emulation of these mismatched system components....
PhD - University of Edinburgh
Hardware virtualisation is the provision of an isolated virtual environment that represents real physical hardware. It enables operating systems, or other system-level software (the guest), to run unmodified in a “container” (the virtual machine) that is isolated from the real machine (the host). There are many use-cases for hardware virtualisation that span a wide-range of end-users. For example, home-users wanting to run multiple operating systems side-by-side (such as running a Windows® operating system inside an OS X environment) will use virtualisation to accomplish this....
In proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools, and Theory for Embedded Systems
Instruction set simulators (ISS) have many uses in embedded software and hardware development and are typically based on dynamic binary translation (DBT), where frequently executed regions of guest instructions are compiled into host instructions using a just-in-time (JIT) compiler. Full-system simulation, which necessitates handling of asynchronous interrupts from e.g. timers and I/O devices, complicates matters as control flow is interrupted unpredictably and diverted from the current region of code. In this paper we present a novel scheme for handling of asynchronous interrupts, which integrates seamlessly into a region-based dynamic binary translator....
In proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)
Dynamic Binary Translation (DBT) allows software compiled for one Instruction Set Architecture (ISA) to be executed on a processor supporting a different ISA. Some modern DBT systems decouple their main execution loop from the built-inJust-In-Time (JIT) compiler, i.e. the JIT compiler can operate asynchronously in a different thread without blocking program execution. However, this creates a problem for target architectures with dual-ISA support such as ARM/THUMB, where the ISA of the currently executed instruction stream may be different to the one processed by the JIT compiler due to their decoupled operation and dynamic mode changes....
In proceedings of the 2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems
Processor design tools integrate in their workflows generators for instruction set simulators (ISS) from architecture descriptions. However, it is difficult to validate the correctness of these simu-lators. ISA coverage analysis is insufficient to isolate modelling faults, which might only be exposed in corner cases. We present a novel ISA branch coverage analysis, which considers every possible execution path within an instruction and, on demand, generates new test cases to cover the missing paths....
In proceedings of the 2014 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Region-based JIT compilation operates on translation units comprising multiple basic blocks and, possibly cyclic or conditional, control flow between these. It promises to reconcile aggressive code optimisation and low compilation latency in performance-critical dynamic binary translators. Whilst various region selection schemes and isolated code optimisation techniques have been investigated it remains unclear how to best exploit such regions for efficient code generation. Complex interactions with indirect branch tables and translation caches can have adverse effects on performance if not considered carefully....