Dr. Michael McCool
Embedded and Mobile Software Development for Intel SoCs
Software is a critical component of any system, and software development for SoCs can be challenging, especially if there are high
reliability or performance requirements. In this talk I will survey the wide range of functionality available for software development
using Intel SoCs in both the embedded and mobile contexts. Intel® System Studio, for instance, provides the Intel® C++ Compiler,
including the Intel® Cilk™ Plus parallel model for scaling performance using multiple cores and vector units; Intel® VTune™
Amplifier XE for performance and power analysis; the Intel® JTAG Debugger, providing low-overhead event tracing and logging for
source-level debugging of UEFI firmware, bootloaders, OS kernels, and drivers; support in the GDB* debugger for fast application-level
defect analysis for increased system stability, application-level instruction trace, and data race detection; the Intel® Inspector for
Systems, a dynamic and static analyzer to identify memory and threading errors; and libraries such as Intel® Integrated Performance
Primitives and the Intel® Math Kernel Library for accelerating application programs. Intel also provides outstanding support for
Android development. For cross-platform mobile application development, the Intel® XDK supports portable HTML5 application
development (including access to sensors and other mobile-specific functionality across multiple operating systems), and the
Beacon Mountain package includes Intel® HAXM, which accelerates the Android emulator, along with other useful tools and libraries for native Android
development. The Intel® Perceptual Computing SDK supports additional capabilities related to the interpretation of sensor data,
including camera input, and the Intel® Media SDK provides access to fixed-function hardware units, such as those for motion estimation.
Finally, Intel processors support a huge variety of open-source software, including Linux (such as the Yocto distribution for
technologies, and a range of compilers. For example, Cilk™ Plus implementations are also available or under development in both
gcc and clang/LLVM, supporting the same parallel and vector computing capabilities as the Intel compiler.
Biography: Michael McCool is an Intel Principal Engineer. He has degrees in Computer Engineering (University of Waterloo, BASc)
and Computer Science (University of Toronto, MSc and PhD) with specializations in mathematics (BASc) and biomedical
engineering (MSc) as well as computer graphics and parallel computing (MSc, PhD). He has research and application experience
in the areas of data mining, computer graphics (specifically sampling, rasterization, path rendering, texture hardware, antialiasing,
shading, illumination, function approximation, compression, and visualization), medical imaging, signal and image processing,
financial analysis, and parallel languages and programming platforms. To commercialize research on many-core computing
platforms that he conducted while an Associate Professor at the University of Waterloo, he co-founded RapidMind in 2004,
which Intel acquired in 2009. Currently he is a software architect with Intel working on parallel programming languages,
applications, and mobile computing. In addition to his university teaching, he has presented numerous tutorials at Eurographics,
SIGGRAPH, and SC on graphics and/or parallel computing, and has co-authored three books. The most recent book, Structured
Parallel Programming, was co-authored with James Reinders and Arch Robison. It presents a pattern-based approach to parallel
programming using a large number of examples in Intel Cilk Plus and Intel Threading Building Blocks.
Dr. Jose Flich
Universidad Politécnica de Valencia, Spain
Many-core System Designs through Effective Routing Support and Reconfigurability
Technology trends continue to push for a higher number of nodes in future chips. Chip multiprocessors (CMPs) and multiprocessor
systems-on-chip (MPSoCs) follow the many-core approach, in which tens to hundreds of cores are expected to be supported.
In such configurations, the network-on-chip (NoC) plays a central role, shifting the system from a computation-centric approach to a
communication-centric approach. At the same time, new technological and architectural challenges threaten CMP/MPSoC designs: process
variation, manufacturing defects, and power dissipation are barriers to scaling efficiently to hundreds of cores. In this talk, the
problems of NoC design for CMPs and MPSoCs/MCSoCs will be identified and possible solutions will be discussed. The solutions
center on the routing algorithm, its implementation, and its reconfiguration capability, which are key to coping
with the upcoming challenges. Topology alternatives and key latency-reduction strategies linked with coherence protocols will
be covered as well.
Biography: Jose Flich received his PhD in Computer Engineering in 2001. He is an Associate Professor at UPV, where he leads the
research activities related to NoCs. He has published over 100 conference and journal papers and has served on various conference
program committees (ISCA, NOCS, ICPP, IPDPS, HiPC, CAC, CASS, ICPADS, ISCC), as program chair (INA-OCMC, CAC), and as track
co-chair (EUROPAR). He has collaborated with various institutions (Ferrara, Catania, Jonkoping, USC) and companies (AMD, Intel,
Sun). His current research focuses on routing, coherence protocols, and congestion management within NoCs. He has co-invented
several routing strategies and reconfiguration and congestion control mechanisms, some of which have gained wide recognition (RECN and
LBDR for on-chip networks). He is a member of the HiPEAC-2 NoE. He is co-editor of the book “Designing Network-on-Chip
Architectures in the Nanoscale Era” and is the coordinator of the FP7 NaNoC project.
Dr. Hideyuki Kawashima
University of Tsukuba, Japan
Taming Big Data Streams
The volume of data streams produced by sensing devices and network monitoring systems is increasing. To process them with low
latency, stream processing engines (SPEs) have been studied. This talk first gives an overview of stream data processing and
then presents the speaker’s recent work, including transactional stream processing, an outlier detection technique for packet streams, a
secure data processing framework with encryption, and an FPGA-based acceleration system.
Biography: Hideyuki Kawashima received his Ph.D. from the Graduate School of Science for Open and Environmental Systems, Keio
University, Japan. He was a research associate at the Department of Science and Engineering, Keio University, from 2005 to 2007.
From 2007 to 2011, he was an assistant professor at both the Graduate School of Systems and Information Engineering and the Center for
Computational Sciences, University of Tsukuba, Japan. Since 2011, he has been an assistant professor at the Faculty of Information, Systems
and Engineering, University of Tsukuba.
Dr. Jiang Xu
Hong Kong University of Science and Technology, Hong Kong SAR
Network-on-Chip Benchmarks Based on Real MPSoC Applications
By integrating multiple processing units on a single chip, a multiprocessor system-on-chip (MPSoC) can provide higher performance
per unit of energy and lower cost per function for applications of burgeoning complexity. The performance of an MPSoC/MCSoC is
determined not only by the performance of its processing units but also by how efficiently they collaborate with one another, and
it is the MPSoC's communication architecture that determines this collaboration efficiency. MPSoC on-chip communication
architectures are moving from traditional buses and ad hoc interconnects to more sophisticated networks-on-chip (NoCs),
which have become an active research area in both industry and academia.
Like benchmark programs for microprocessor architectures, NoC traffic patterns are essential tools for NoC performance assessment
and architecture exploration. The fidelity of NoC traffic patterns has a profound influence on NoC studies. Ideally, realistic NoC traffic
patterns should capture communication behaviors as well as their temporal and spatial dependencies in real applications. In
addition to communication, they should offer insight into computation tasks and memory usage for comprehensive NoC-based
MPSoC research and development. This talk will introduce a joint industry-academia effort to systematically develop realistic NoC
benchmarks through multidisciplinary collaborations on real MPSoC applications.
Biography: Dr. Xu received his Ph.D. degree from Princeton University. From 2001 to 2002, he worked at Bell Labs, NJ, as a Research
Associate. He was a Research Associate at NEC Laboratories America, NJ, from 2003 to 2005. From 2005 to 2007, he worked at a startup company,
Sandbridge Technologies, NY, where he developed and implemented two generations of NoC-based ultra-low-power
multiprocessor systems-on-chip for mobile platforms. In 2007, Dr. Xu joined the Hong Kong University of Science and
Technology, where he established the Mobile Computing System Lab and the Xilinx-HKUST Joint Lab. He currently serves as an Associate
Editor of ACM Transactions on Embedded Computing Systems and IEEE Transactions on Very Large Scale Integration (VLSI) Systems. He
is an ACM Distinguished Speaker and a Distinguished Visitor of the IEEE Computer Society. He has served on the organizing committees
and technical program committees of many international conferences, including ICCAD, CASES, ICCD, ISVLSI, VLSI, EMSOFT,
CODES+ISSS, NOCS, and ASP-DAC. Dr. Xu has authored or coauthored more than 60 book chapters and papers in peer-reviewed
journals and international conferences. He and his students received the Best Paper Award at the IEEE Computer Society Annual
Symposium on VLSI in 2009 and the Best Poster Award at the AMD Technical Forum and Exhibition in 2010. He coauthored a book
titled Algorithms, Architecture and System-on-Chip Design for Wireless Applications (Cambridge University Press). His research areas
include networks-on-chip, multiprocessor systems-on-chip, embedded systems, computer architecture, low-power VLSI design, and
Dr. Ran Ginosar
EE & CS, Technion, Israel
The Plural Architecture: Shared Memory Many-cores with Hardware Scheduling
The Plural many-core architecture combines hundreds of small cores, many shared memory banks, a hardware scheduler, and two
custom active networks-on-chip: cores-to-memories and cores-to-scheduler. A theoretical model (almost) justifies increasing the
number of cores while making them smaller and slower, maximizing the performance-to-power ratio. Simulations of several benchmarks are
presented, showing close-to-linear speedup and a high performance-to-power ratio. A de-synchronized, PRAM-like, task-based,
non-CSP, and non-locking programming model for shared memory enables fine-grain parallelism.
Biography: Prof. Ran Ginosar received his BSc from the Technion and his PhD from Princeton University. He has conducted research at
Bell Laboratories, at the University of Utah, and at Intel Research Laboratories in Oregon, USA. He is a member of the faculty of the EE
and CS departments at the Technion and heads the VLSI Systems Research Center. He has also co-founded several start-up
companies in the areas of VLSI and parallel processing. His research interests focus on VLSI, asynchronous logic and parallel
Dr. Peter A. Beerel
University of Southern California, U.S.A.
Practical Advances and Applications of Asynchronous Design
As we continue to push for lower power and lower supply voltages, there is a growing need for resilient circuits that can
accommodate increasing variability in the characteristics of both transistors and wires. Asynchronous circuits have long been an
intriguing potential solution to this issue due to their natural ability to adapt to variations. However, asynchronous circuit
overhead and the lack of CAD tools have been stumbling blocks to their widespread
adoption. This talk reviews some styles of asynchronous design and discusses their potential advantages and challenges for
both network-on-chip and core logic applications. We then review one promising asynchronous design flow, called Proteus, which
was commercialized from USC research through TimeLess Design Automation and used in Intel's latest 10G Ethernet switch chip. This
flow enables design from high-level specifications using a combination of standard simulation, synthesis, and physical design tools
with a small set of specific algorithms for performance and power optimization of asynchronous circuits.
Biography: Peter Beerel received his B.S.E. degree in Electrical Engineering from Princeton University, and his M.S. and Ph.D.
degrees in Electrical Engineering from Stanford University in 1991 and 1994, respectively. He joined the Department of Electrical
Engineering-Systems at the University of Southern California’s Viterbi School of Engineering in 1994, where he is currently an
Associate Professor and Faculty Director of Innovation and Entrepreneurship in Engineering. In May 2008, he co-founded
TimeLess Design Automation with one of his Ph.D. students, Dr. Georgios Dimou, to commercialize an asynchronous ASIC flow
called Proteus. They sold the company in July 2010 to Fulcrum Microsystems. Fulcrum Microsystems was acquired by Intel in
2011 and became part of its Networking Division, where he also works as Chief Scientist of Technology Development. Dr. Beerel
has been a member of the technical program committee for the International Symposium on Advanced Research in Asynchronous
Circuits and Systems since 1997, was program co-chair for ASYNC'98, general co-chair for ASYNC'07, and general chair for
ASYNC'13. He received a National Science Foundation CAREER Award, was co-winner of the Charles E. Molnar Award at ASYNC'97, and
was a co-recipient of the best paper award at ASYNC'99. He was also the 2008 recipient of the IEEE Region 6 Outstanding Engineer
Award for significantly advancing the application of asynchronous circuits to modern VLSI chips.