157 lines
		
	
	
	
		
			8 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
		
		
			
		
	
	
| 
								 | 
							
								<html lang="en">
							 | 
						||
| 
								 | 
							
								<head>
							 | 
						||
| 
								 | 
							
								<title>Implementation - GNU gprof</title>
							 | 
						||
| 
								 | 
							
								<meta http-equiv="Content-Type" content="text/html">
							 | 
						||
| 
								 | 
							
								<meta name="description" content="GNU gprof">
							 | 
						||
| 
								 | 
							
								<meta name="generator" content="makeinfo 4.7">
							 | 
						||
| 
								 | 
							
								<link title="Top" rel="start" href="index.html#Top">
							 | 
						||
| 
								 | 
							
								<link rel="up" href="Details.html#Details" title="Details">
							 | 
						||
| 
								 | 
							
								<link rel="next" href="File-Format.html#File-Format" title="File Format">
							 | 
						||
| 
								 | 
							
								<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
							 | 
						||
| 
								 | 
							
								<!--
							 | 
						||
| 
								 | 
							
								This file documents the gprof profiler of the GNU system.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Copyright (C) 1988, 92, 97, 98, 99, 2000, 2001, 2003, 2007 Free Software Foundation, Inc.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Permission is granted to copy, distribute and/or modify this document
							 | 
						||
| 
								 | 
							
								under the terms of the GNU Free Documentation License, Version 1.1
							 | 
						||
| 
								 | 
							
								or any later version published by the Free Software Foundation;
							 | 
						||
| 
								 | 
							
								with no Invariant Sections, with no Front-Cover Texts, and with no
							 | 
						||
| 
								 | 
							
								Back-Cover Texts.  A copy of the license is included in the
							 | 
						||
| 
								 | 
							
								section entitled ``GNU Free Documentation License''.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								man end-->
							 | 
						||
| 
								 | 
							
								<meta http-equiv="Content-Style-Type" content="text/css">
							 | 
						||
| 
								 | 
							
								<style type="text/css"><!--
							 | 
						||
| 
								 | 
							
								  pre.display { font-family:inherit }
							 | 
						||
| 
								 | 
							
								  pre.format  { font-family:inherit }
							 | 
						||
| 
								 | 
							
								  pre.smalldisplay { font-family:inherit; font-size:smaller }
							 | 
						||
| 
								 | 
							
								  pre.smallformat  { font-family:inherit; font-size:smaller }
							 | 
						||
| 
								 | 
							
								  pre.smallexample { font-size:smaller }
							 | 
						||
| 
								 | 
							
								  pre.smalllisp    { font-size:smaller }
							 | 
						||
| 
								 | 
							
								  span.sc { font-variant:small-caps }
							 | 
						||
| 
								 | 
							
								  span.roman { font-family: serif; font-weight: normal; } 
							 | 
						||
| 
								 | 
							
								--></style>
							 | 
						||
| 
								 | 
							
								</head>
							 | 
						||
| 
								 | 
							
								<body>
							 | 
						||
| 
								 | 
							
								<div class="node">
							 | 
						||
| 
								 | 
							
								<p>
							 | 
						||
| 
								 | 
							
								<a name="Implementation"></a>Next: <a rel="next" accesskey="n" href="File-Format.html#File-Format">File Format</a>,
							 | 
						||
| 
								 | 
							
								Up: <a rel="up" accesskey="u" href="Details.html#Details">Details</a>
							 | 
						||
| 
								 | 
							
								<hr><br>
							 | 
						||
| 
								 | 
							
								</div>
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								<h3 class="section">9.1 Implementation of Profiling</h3>
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								<p>Profiling works by changing how every function in your program is compiled
							 | 
						||
| 
								 | 
							
								so that when it is called, it will stash away some information about where
							 | 
						||
| 
								 | 
							
								it was called from.  From this, the profiler can figure out what function
							 | 
						||
| 
								 | 
							
								called it, and can count how many times it was called.  This change is made
							 | 
						||
| 
								 | 
							
								by the compiler when your program is compiled with the <span class="samp">-pg</span> option,
							 | 
						||
| 
								 | 
							
								which causes every function to call <code>mcount</code>
							 | 
						||
| 
								 | 
							
								(or <code>_mcount</code>, or <code>__mcount</code>, depending on the OS and compiler)
							 | 
						||
| 
								 | 
							
								as one of its first operations.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   <p>The <code>mcount</code> routine, included in the profiling library,
							 | 
						||
| 
								 | 
							
								is responsible for recording in an in-memory call graph table
							 | 
						||
| 
								 | 
							
								both the routine that called it (the child) and that routine's caller (the parent).  This is
							 | 
						||
| 
								 | 
							
								typically done by examining the stack frame to find both
							 | 
						||
| 
								 | 
							
								the address of the child, and the return address in the original parent. 
							 | 
						||
| 
								 | 
							
								Since this is a very machine-dependent operation, <code>mcount</code>
							 | 
						||
| 
								 | 
							
								itself is typically a short assembly-language stub routine
							 | 
						||
| 
								 | 
							
								that extracts the required
							 | 
						||
| 
								 | 
							
								information, and then calls <code>__mcount_internal</code>
							 | 
						||
| 
								 | 
							
								(a normal C function) with two arguments—<code>frompc</code> and <code>selfpc</code>. 
							 | 
						||
| 
								 | 
							
								<code>__mcount_internal</code> is responsible for maintaining
							 | 
						||
| 
								 | 
							
								the in-memory call graph, which records <code>frompc</code>, <code>selfpc</code>,
							 | 
						||
| 
								 | 
							
								and the number of times each of these call arcs was traversed.
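								   <p>As a rough illustration of the bookkeeping described above, the sketch
								below keeps one record per distinct call arc.  The names <code>struct arc</code>,
								<code>record_arc</code> and <code>MAX_ARCS</code> are hypothetical, and the linear
								scan is a simplification chosen for brevity; it is not the layout used by any
								particular profiling library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arc record: one entry per distinct (caller, callee) pair. */
struct arc {
    uintptr_t frompc;      /* return address inside the parent */
    uintptr_t selfpc;      /* address of the child (the routine that called mcount) */
    unsigned long count;   /* number of times this arc was traversed */
};

#define MAX_ARCS 1024      /* illustrative fixed capacity */

static struct arc arcs[MAX_ARCS];
static size_t n_arcs;

/* Record one traversal of the arc frompc -> selfpc.
   Returns the updated count, or 0 if the table is full. */
unsigned long record_arc(uintptr_t frompc, uintptr_t selfpc)
{
    for (size_t i = 0; i < n_arcs; i++) {
        if (arcs[i].frompc == frompc && arcs[i].selfpc == selfpc)
            return ++arcs[i].count;
    }
    if (n_arcs == MAX_ARCS)
        return 0;
    arcs[n_arcs].frompc = frompc;
    arcs[n_arcs].selfpc = selfpc;
    arcs[n_arcs].count = 1;
    n_arcs++;
    return 1;
}
```

								   <p>Calling <code>record_arc</code> twice with the same pair of addresses yields a
								count of 2 for that arc, while a call from a different <code>frompc</code> starts
								a new arc.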
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   <p>GCC, since version 2, provides a built-in function (<code>__builtin_return_address</code>),
							 | 
						||
| 
								 | 
							
								which allows a generic <code>mcount</code> function to extract the
							 | 
						||
| 
								 | 
							
								required information from the stack frame.  However, on some
							 | 
						||
| 
								 | 
							
								architectures, most notably the SPARC, using this builtin can be
							 | 
						||
| 
								 | 
							
								very computationally expensive, and an assembly language version
							 | 
						||
| 
								 | 
							
								of <code>mcount</code> is used for performance reasons.
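								   <p>A minimal sketch of the builtin in use, assuming GCC or a compatible
								compiler.  The names <code>note_caller</code> and <code>last_caller</code> are
								hypothetical.  Only level 0 is used here; a generic <code>mcount</code> would
								also need an address one frame further up, and GCC's documentation warns that
								nonzero levels can behave unpredictably on some targets.

```c
#include <stdint.h>

static uintptr_t last_caller;   /* hypothetical: where the most recent call came from */

/* noinline keeps this a real call, so a return address exists. */
__attribute__((noinline)) void note_caller(void)
{
    /* Level 0 is the return address of the current function, that is,
       an address inside whichever routine called note_caller(). */
    last_caller = (uintptr_t)__builtin_return_address(0);
}

uintptr_t get_last_caller(void)
{
    return last_caller;
}
```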
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   <p>Number-of-calls information for library routines is collected by using a
							 | 
						||
| 
								 | 
							
								special version of the C library.  The routines in it are the same as in
							 | 
						||
| 
								 | 
							
								the usual C library, but they were compiled with <span class="samp">-pg</span>.  If you
							 | 
						||
| 
								 | 
							
								link your program with <span class="samp">gcc ... -pg</span>, it automatically uses the
							 | 
						||
| 
								 | 
							
								profiling version of the library.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   <p>Profiling also involves watching your program as it runs, and keeping a
							 | 
						||
| 
								 | 
							
								histogram of where the program counter happens to be every now and then. 
							 | 
						||
| 
								 | 
							
								Typically the program counter is looked at around 100 times per second of
							 | 
						||
| 
								 | 
							
								run time, but the exact frequency may vary from system to system.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   <p>This is done in one of two ways.  Most UNIX-like operating systems
							 | 
						||
| 
								 | 
							
								provide a <code>profil()</code> system call, which registers a memory
							 | 
						||
| 
								 | 
							
								array with the kernel, along with a scale
							 | 
						||
| 
								 | 
							
								factor that determines how the program's address space maps
							 | 
						||
| 
								 | 
							
								into the array. 
							 | 
						||
| 
								 | 
							
								Typical scaling values cause every 2 to 8 bytes of address space
							 | 
						||
| 
								 | 
							
								to map into a single array slot. 
							 | 
						||
| 
								 | 
							
								On every tick of the system clock
							 | 
						||
| 
								 | 
							
								(assuming the profiled program is running), the value of the
							 | 
						||
| 
								 | 
							
								program counter is examined and the corresponding slot in
							 | 
						||
| 
								 | 
							
								the memory array is incremented.  Since this is done in the kernel,
							 | 
						||
| 
								 | 
							
								which had to interrupt the process anyway to handle the clock
							 | 
						||
| 
								 | 
							
								interrupt, very little additional system overhead is required.
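								   <p>The mapping from program counter to array slot can be sketched as
								follows.  This assumes the common convention in which the scale is a
								fixed-point fraction and 65536 means one slot per 2 bytes of address space;
								<code>struct histogram</code>, <code>bin_for_pc</code> and
								<code>record_sample</code> are hypothetical names, and the arithmetic ignores
								overflow for extremely large address ranges.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical histogram state, mirroring the kind of arguments profil() takes. */
struct histogram {
    unsigned short *bins;   /* sample counters */
    size_t nbins;           /* number of counters */
    uintptr_t lowpc;        /* lowest address covered */
    unsigned int scale;     /* assumed convention: 65536 => 2 bytes per bin */
};

/* Returns the bin index for pc, or nbins when pc is out of range. */
size_t bin_for_pc(const struct histogram *h, uintptr_t pc)
{
    if (pc < h->lowpc)
        return h->nbins;
    unsigned long long i = (pc - h->lowpc) / 2;
    i = i * h->scale / 65536;
    return i < h->nbins ? (size_t)i : h->nbins;
}

/* What a clock tick does: bump the bin that covers the sampled pc. */
void record_sample(struct histogram *h, uintptr_t pc)
{
    size_t i = bin_for_pc(h, pc);
    if (i < h->nbins)
        h->bins[i]++;
}
```

								   <p>With a scale of 65536, addresses <code>lowpc</code> and <code>lowpc+1</code>
								share slot 0 and <code>lowpc+2</code> falls in slot 1; halving the scale to
								32768 makes each slot cover 4 bytes.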
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   <p>However, some operating systems, most notably Linux 2.0 (and earlier),
							 | 
						||
| 
								 | 
							
								do not provide a <code>profil()</code> system call.  On such a system,
							 | 
						||
| 
								 | 
							
								arrangements are made for the kernel to periodically deliver
							 | 
						||
| 
								 | 
							
								a signal to the process (typically via <code>setitimer()</code>),
							 | 
						||
| 
								 | 
							
								which then performs the same operation of examining the
							 | 
						||
| 
								 | 
							
								program counter and incrementing a slot in the memory array. 
							 | 
						||
| 
								 | 
							
								Since this method requires a signal to be delivered to
							 | 
						||
| 
								 | 
							
								user space every time a sample is taken, it incurs considerably
							 | 
						||
| 
								 | 
							
								more overhead than kernel-based profiling.  Also, due to the
							 | 
						||
| 
								 | 
							
								added delay required to deliver the signal, this method is
							 | 
						||
| 
								 | 
							
								less accurate as well.
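								   <p>A minimal sketch of the signal-based approach, assuming a POSIX system
								with <code>setitimer()</code> and <code>SIGPROF</code>.  The names
								<code>start_sampling</code>, <code>on_sigprof</code> and
								<code>samples_taken</code> are hypothetical.  Reading the interrupted program
								counter from the signal context is machine-dependent, so this sketch only
								counts the samples rather than filling a histogram.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t ticks;   /* samples taken so far */

/* Runs in user space on every SIGPROF.  A real profiler would look up
   the interrupted program counter here and bump a histogram slot. */
static void on_sigprof(int signo)
{
    (void)signo;
    ticks++;
}

/* Ask for SIGPROF every interval_usec microseconds of CPU time used
   by the process.  Returns 0 on success, -1 on failure. */
int start_sampling(long interval_usec)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = interval_usec / 1000000;
    tv.it_interval.tv_usec = interval_usec % 1000000;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_PROF, &tv, NULL);
}

int samples_taken(void)
{
    return (int)ticks;
}
```

								   <p>An interval of 10000 microseconds corresponds to the roughly 100 samples
								per second mentioned earlier.  Because <code>ITIMER_PROF</code> counts CPU time,
								no samples arrive while the process is blocked.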
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   <p>A special startup routine allocates memory for the histogram and
							 | 
						||
| 
								 | 
							
								either calls <code>profil()</code> or sets up
							 | 
						||
| 
								 | 
							
								a clock signal handler. 
							 | 
						||
| 
								 | 
							
								This routine (<code>monstartup</code>) can be invoked in several ways. 
							 | 
						||
| 
								 | 
							
								On Linux systems, a special profiling startup file <code>gcrt0.o</code>,
							 | 
						||
| 
								 | 
							
								which invokes <code>monstartup</code> before <code>main</code>,
							 | 
						||
| 
								 | 
							
								is used instead of the default <code>crt0.o</code>. 
							 | 
						||
| 
								 | 
							
								Use of this special startup file is one of the effects
							 | 
						||
| 
								 | 
							
								of using <span class="samp">gcc ... -pg</span> to link. 
							 | 
						||
| 
								 | 
							
								On SPARC systems, no special startup files are used. 
							 | 
						||
| 
								 | 
							
								Rather, the <code>mcount</code> routine, when it is invoked for
							 | 
						||
| 
								 | 
							
								the first time (typically when <code>main</code> is called),
							 | 
						||
| 
								 | 
							
								calls <code>monstartup</code>.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   <p>If the compiler's <span class="samp">-a</span> option was used, basic-block counting
							 | 
						||
| 
								 | 
							
								is also enabled.  Each object file is then compiled with a static array
							 | 
						||
| 
								 | 
							
								of counts, initially zero. 
							 | 
						||
| 
								 | 
							
								In the executable code, every time a new basic-block begins
							 | 
						||
| 
								 | 
							
								(e.g., when an <code>if</code> statement appears), an extra instruction
							 | 
						||
| 
								 | 
							
								is inserted to increment the corresponding count in the array. 
							 | 
						||
| 
								 | 
							
								At compile time, a paired array is constructed that records
							 | 
						||
| 
								 | 
							
								the starting address of each basic-block.  Taken together,
							 | 
						||
| 
								 | 
							
								the two arrays record the starting address of every basic-block,
							 | 
						||
| 
								 | 
							
								along with the number of times it was executed.
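								   <p>The paired arrays can be pictured with a hand-instrumented function.
								Everything here is hypothetical: the compiler would emit the increments and
								the address table itself, and the string labels merely stand in for the
								basic-block start addresses.

```c
#include <stddef.h>

#define N_BLOCKS 3

/* Hypothetical per-object-file arrays, written out by hand. */
static unsigned long bb_counts[N_BLOCKS];          /* execution counts, initially zero */
static const char *const bb_labels[N_BLOCKS] = {   /* stand-ins for block start addresses */
    "entry", "then-branch", "exit"
};

/* Hand-instrumented function: one increment at the start of each block. */
int clamp_to_zero(int x)
{
    bb_counts[0]++;          /* block 0: function entry */
    if (x < 0) {
        bb_counts[1]++;      /* block 1: body of the if */
        x = 0;
    }
    bb_counts[2]++;          /* block 2: code after the if */
    return x;
}

unsigned long bb_count(size_t i)
{
    return i < N_BLOCKS ? bb_counts[i] : 0;
}

const char *bb_label(size_t i)
{
    return i < N_BLOCKS ? bb_labels[i] : NULL;
}
```

								   <p>After one call with a positive argument and one with a negative argument,
								the entry and exit blocks each show a count of 2 and the body of the
								<code>if</code> shows a count of 1.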
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   <p>The profiling library also includes a function (<code>mcleanup</code>) which is
							 | 
						||
| 
								 | 
							
								typically registered using <code>atexit()</code> to be called as the
							 | 
						||
| 
								 | 
							
								program exits, and is responsible for writing the file <span class="file">gmon.out</span>. 
							 | 
						||
| 
								 | 
							
								Profiling is turned off, various headers are output, and the histogram
							 | 
						||
| 
								 | 
							
								is written, followed by the call-graph arcs and the basic-block counts.
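								   <p>The exit-time hook can be sketched with <code>atexit()</code>.  The names
								<code>register_cleanup</code>, <code>write_profile</code> and the output file
								<code>sketch-gmon.txt</code> are hypothetical, and the text written is only a
								placeholder for the order of the sections; it is not the
								<span class="file">gmon.out</span> format.

```c
#include <stdio.h>
#include <stdlib.h>

static int profiling_on = 1;                       /* hypothetical on/off flag */
static const char *out_path = "sketch-gmon.txt";   /* hypothetical file name */

/* Stop collecting, then write the sections in the order described above. */
static void write_profile(void)
{
    profiling_on = 0;
    FILE *f = fopen(out_path, "w");
    if (f == NULL)
        return;
    fputs("header\nhistogram\ncall-graph arcs\nbasic-block counts\n", f);
    fclose(f);
}

/* Arrange for write_profile() to run when the program exits.
   Returns 0 on success, as atexit() does. */
int register_cleanup(void)
{
    return atexit(write_profile);
}
```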
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   <p>The output from <code>gprof</code> gives no indication of parts of your program that
							 | 
						||
| 
								 | 
							
								are limited by I/O or swapping bandwidth.  This is because samples of the
							 | 
						||
| 
								 | 
							
								program counter are taken at fixed intervals of the program's run time. 
							 | 
						||
| 
								 | 
							
								Therefore, the
							 | 
						||
| 
								 | 
							
								time measurements in <code>gprof</code> output say nothing about time that your
							 | 
						||
| 
								 | 
							
								program was not running.  For example, a part of the program that creates
							 | 
						||
| 
								 | 
							
								so much data that it cannot all fit in physical memory at once may run very
							 | 
						||
| 
								 | 
							
								slowly due to thrashing, but <code>gprof</code> will say it uses little time.  On
							 | 
						||
| 
								 | 
							
								the other hand, sampling by run time has the advantage that the amount of
							 | 
						||
| 
								 | 
							
								load due to other users won't directly affect the output you get.
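								   <p>The gap between run time and elapsed time can be seen with a small
								experiment, assuming a POSIX system with <code>clock_gettime()</code>.  The
								function <code>measure_wait</code> is hypothetical; the sleep stands in for a
								wait on I/O or paging, during which the process accumulates almost no CPU
								time and so would attract almost no samples.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Current reading of the given clock, in seconds. */
static double seconds(clockid_t id)
{
    struct timespec ts;
    clock_gettime(id, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Sleeps for ms milliseconds (standing in for an I/O wait) and reports
   the elapsed wall-clock and CPU seconds through the out-parameters. */
void measure_wait(long ms, double *wall, double *cpu)
{
    double w0 = seconds(CLOCK_MONOTONIC);
    double c0 = seconds(CLOCK_PROCESS_CPUTIME_ID);
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&req, NULL);
    *wall = seconds(CLOCK_MONOTONIC) - w0;
    *cpu = seconds(CLOCK_PROCESS_CPUTIME_ID) - c0;
}
```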
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   </body></html>
							 | 
						||
| 
								 | 
							
								
							 |