Static Worst-Case Execution Time Optimization using DPSO for ASIP Architecture

Mood Venkanna, Rameshwar Rao


Introduction: The application of specific instructions significantly improves energy, performance, and code size of configurable processors. The design of these instructions is performed by the conversion of patterns related to application-specific operations into effective complex instructions. This research was presented at the icitkm Conference, University of Delhi, India in 2017.

Methods: Static analysis was a prominent research method during late the 1980’s. However, end-to-end measurements consist of a standard approach in industrial settings. Both static analysis tools perform at a high-level in order to determine the program structure, which works on source code, or is executable in a disassembled binary. It is possible to work at a low-level if the real hardware timing information for the executable task has the desired features.

Results: We experimented, tested and evaluated using a H.264 encoder application that uses nine cis, covering most of the computation intensive kernels. Multimedia applications are frequently subject to hard real time constraints in the field of computer vision. The H.264 encoder consists of complicated control flow with more number of decisions and nested loops. The parameters evaluated were different numbers of A partitions (300 slices on a Xilinx Virtex 7each), reconfiguration bandwidths, as well as relations of cpu frequency and fabric frequency fCPU/ffabric. ffabric remains constant at 100MHz, and we selected a multiplicity of its values for fCPU that resemble realistic units. Note that while we anticipate the wcet in seconds (wcetcycles/ f CPU) to be lower (better) with higher fCPU, the wcet cycles increase (at a constant ffabric) because hardware cis perform less computations on the reconfigurable fabric within one cpu cycle.

Conclusions: The method is similar to tree hybridization and path-based methods which are less precise, and to the global ipet method, which is more precise. Optimization is evaluated with the Discrete Particle Swarm Optimization (dpso) algorithm for wcet. For several real-world applications involving embedded processors, the proposed technique develops improved instruction sets in comparison to native instruction sets.

Originality: For wcet estimation, flow analysis, low-level analysis and calculation phases of the program need to be considered. Flow analysis phase or the high-level of analysis helps to extract the program’s dynamic behavior that gives information on functions being called, number of loop iteration, dependencies among if-statements, etc. This is due to the fact that the analysis is unaware of the execution path corresponding to the longest execution time.

Limitations: This path is executed within a kernel iteration that relies upon the nature of mb, either i-mb or p-mb, determined by the motion estimation kernel, that is, its’ input depends on the i-mb and p-mb paths ,which also contain separate cis leading to the instability of the worst-case path, that is, adding more partitions to the current worst-case path can result in the other path becoming the worst case. The pipeline stalls for the reconfiguration delay and continues when entering the kernel once the reconfiguration process finishes.


embedded processor; specific integrated circuit application; worst case execution time; particle swarm optimization; discrete particle swarm optimization;

Full Text:



S.S. Lim, Y. H. Bae, C. T. Jang, B.-D. Rhee, S. L. Min,C. Y. Park, H. Shin, K. Park, and C. S. Ki, “An Accurate Worst-Case Timing Analysis for RISC Processors,” IEEETransactions on Software Engineering, vol. 21, no. 7, pp.593–604, Jul 1995,doi:10.1109/32.392980.

S.-K. Kim, S. L. Min, and R. Ha, “Efficient Worst CaseTiming Analysis of Data Caching,” in Proc. 2nd IEEE Real-Time Technology and Applications Symposium (RTAS’96).IEEE, 1996, doi:230–240,10.1109/RTTAS.1996.509540

R. White, F. M¨uller, C. Healy, D. Whalley, and M. Harmon, “Timing Analysis for Data Caches and Set-Associative Caches,” in Proc. 3rd IEEE Real-Time Technology and Applications Symposium (RTAS’97), Jun 1997, pp. 192–202, doi: 10.1109/RTTAS.1997.601358.

A.Colin and I. Puaut, “Worst Case Execution Time Analysis for a Processor with Branch Prediction,” Journal of Real-Time Systems, vol. 18, no. 2/3, pp. 249–274, May 2000, doi:10.1023/a:10081.

T.Mitra and A. Roychoudhury, “Effects of Branch Prediction on Worst Case Execution Time of Programs,” National University of Singapore (NUS), Tech. Rep. 11-01, Nov 2001, doi: 10.1023/a: 1008149332687.

J. Engblom, “Processor Pipelines and Static Worst-Case Execution Time Analysis,” Ph.D. dissertation, Uppsala University, Dept. of Information Technology, Box 337, Uppsala, Sweden, Apr 2002, ISBN 91-554-5228-0,doi:

J.Engblom and A. Ermedahl, “Pipeline Timing Analysis Using a Trace-Driven Simulator,” in Proc. 6th International Conference on Real-Time Computing Systems and Applications (RTCSA’99). IEEE Computer Society Press, Dec1999, doi:10.1109/RTCSA.1999.811197.

C.Ferdinand, R. Heckmann, M. Langenbach, F. Martin,M. Schmidt, H. Theiling, S. Thesing, and R. Wilhelm,“Reliable and Precise WCET Determination for a Real-Life Processor,” in Proc. 1st International Workshop on Embedded Systems, (EMSOFT2000), LNCS 2211, Oct 2001, doi:10.1145/384197.384213.

S.-S. Lim, J. H. Han, J. Kim, and S. L. Min, “A Worst Case Timing Analysis Technique for Multiple-Issue Machines,” inProc. 19th IEEE Real-Time Systems Symposium (RTSS’98),Dec 1998,doi:10.1109/real.19998.739765

Schneider and C. Ferdinand, “Pipeline BehaviourPredictionfor Superscalar Processors by Abstract Interpretation,”inProc. SIGPLAN Workshop on Languages, Compilers andTools for Embedded Systems (LCTES’99). ACM Press,May 1999, doi:10.1145/315253.314432.

S. Petters and G. F¨arber, “Making Worst-Case ExecutionTime Analysis for Hard Real-Time Tasks on State of theArt Processors Feasible,” in Proc. 6th International Conference on Real-Time Computing Systems and Applications(RTCSA’99), Dec 1999, doi:10.1109/RTCSA.1999.811296.

M,.Venkanna, R. Rao, P.Chandra Sekhar, “Application of ASIP in Embedded Design with Optimized Clock Management”, ICITKM Conference, Newdelhi, pp. 161–165,2017, doi:10.15439/2017km41

A. Alomaryet al., “PEAS-I: A hardware/software co-design system for ASIPs,” In Proc. EURO-DAC 1993, doi:10.1109/EURDAC.1993.410608.

J. Van Praetet al., "Instruction set definition and instruction selection for ASIPs," In Proc. HLS Symposium 1994, Instruction set definition and instruction selection for ASIPs, doi:10.1109/ISHLS.1994.302348.

N. Clark, H. Zhong and S. Mahlke, “Processor Acceleration through Automated Instruction Set Customization”. In Proceedings of the 36th annual IEEE/ACM International Symposium on Micro architecture (MICRO 36), 2003, doi:10.1109/MICRO.2004.5 138.

R. Klemm, J. P. Sabugo, H. Ahlendorf and G. Fettweis, “Using LISATek for the Design of an ASIP core including , doi:10.15439/2018KM41.

Floating Point Operations”, Technical report, 2008.

3. R. R. Hoare et al., “Rapid VLIW Processor Customization for Signal Processing Applications Using Combinational Hardware Functions”. EURASIP Journal on Applied Signal Processing, vol. 2006 ID 46472, 2010, doi:10.1155/ASP/2006/46472.

4. F. Tlili and A. Ghorbel, “ASIP Solution for Implementation of H.264 Multi Resolution Motion Estimation”. In the International Journal of Communications, Network and System Sciences, Vol.3 No.5, May 2010, doi:10.4236/ijcns.2010.35060.

P. Guironnet de Massas, P. Amblard and F. Pétrot, “On SPARC LEON-2 ISA Extensions Experiments for MPEG Encoding Acceleration”. Journal VLSI Design, vol. 2007, ID 28686, 2007, doi:org/10.1155/2007/28686.

S. Tillich. “Instruction Set Extensions for Secret-Key Cryptography”. Poster Presentation at the Ph.D. Forum at the 9th Conference on Design, Automation and Test in Europe (DATE 2006), Munich, Germany, March 6, 2006, doi:10.1109/CCST.2014.6986988.

F. Naessens, A. Bourdoux, A. Dejonghe, “A flexible ASIP decoder for combined binary and non-binary LDPC codes”. 17th IEEE Symposium on Communications and Vehicular Technology (SCVT), 24-25 November 2010, doi:10.1109/SCVT.2010.5720462.

G. Kappen, L. Kurz, O. Priebe and T. G. Noll, “Design Space Exploration for an ASIP/Co-Processor Architecture used in GNSS Receivers”. Journal of Signal Processing Systems, vol. 58 (1), pp. 41-51, 2010, doi:10.1007/s11265-008-0261-z.

Kennedy, J., & Eberhart, R. C. (1997, October). A discrete binary version of the particle swarm algorithm. In Systems, Man, and Cybernetics, 1997. Computational Cybernetics and Simulation., 1997 IEEE International Conference on (Vol. 5, pp. 4104-4108). IEEE, doi:10.1109/ICSMC.1997.637339.

Pan, Q. K., Tasgetiren, M. F., & Liang, Y. C. (2008). A discrete particle swarm optimization algorithm for the no-wait flowshop scheduling problem. Computers & Operations Research, 35(9), 2807-2839, doi:10.1016/j.ejor.2007.10.044.

Heiko Falk and Jan C. Kleinsorge. 2009.Optimal static WCET-aware scratchpad allocation of program code.In Proc. of Design Automat. Conf. ACM, 732–737, doi:

Heiko Falk, SaschaPlazar, and Henrik Theiling. 2007. Compile-time decided instruction cache locking usingworst-case execution paths. In Proc. of Int. Conf. on Hardware/Software Codesign and Syst. Synthesis.ACM, 143–148, doi:10.1145/1289816.1289853.

J¨org Henkel, Lars Bauer, Michael H¨ ubner, and ArtjomGrudnitsky. 2011. i-Core: A run-time adaptive processor for embedded multi-core systems. In Proc. Int. Conf. on Engineering of Reconfig. Syst. and Algorithms, doi:

Tiantian Liu, Minming Li, and Chun Jason Xue. 2009. Minimizing WCET for real-time embedded systemsvia static instruction cache locking. In Real-Time and Embed. Technol.Applications Symp. IEEE,35–44, doi:10.1109/RTAS.2009.11.

SaschaPlazar, Jan C. Kleinsorge, Peter Marwedel, and Heiko Falk. 2012. WCET-aware static locking ofinstruction caches. In Proc. 10th International Symposium on Code Generation and Optimization. ACM,44–52, doi:10.1145/2259016.2259023.

Christoph Steiger, Herbert Walder, Marco Platzner, and Lothar Thiele. 2003. Online scheduling and placementof real-time tasks to partially reconfigurable devices. In Proc. of Real-Time Syst. Symp. IEEE,224–225, doi:

Pan Yu and TulikaMitra. 2004. Scalable custom instructions identification for instruction-set extensible processors. In Proc. of Int. Conf. on Compilers, Architecture and Synthesis for Embed. Syst. ACM, 69–78, doi:10.1145/1023833.1023844.

comments powered by Disqus

Article Metrics

Metrics Loading ...

Metrics powered by PLOS ALM

Revista indexada en:



Línea gratuita nacional

01 8000 420101


Facultad de Ingeniería
Avenida Caracas no. 37-15 
Bogotá, D.C.


(57) (1) 3323565

(57) 3004956353

Revista en OJS implementada por Biteca Ltda.