hacking the kernel: Reading and writing to performance counters

I had never written real x86 assembly code, so figuring out how to initialize a performance counter was a feat.

The Intel manual describes how to determine functionality with notation like this:
CPUID.0AH:EAX[bits 7:0] (which is the performance monitoring version number). Initially it made no sense to me.

I soon discovered that CPUID is actually an instruction that reads the EAX register as its input and puts the results into EAX, EBX, ECX, and EDX. So the 0AH is notation for hex 0x0a to be put into EAX before executing CPUID. The [bits 7:0] part means the result is in bits 0 through 7 of EAX afterwards.

Inline assembly makes this easy:


unsigned a, b, c, d, v;
asm("cpuid"  //assembly instruction
: "=a" (a), "=b" (b), "=c" (c), "=d" (d) //output: the letter in quotes is the register and the name in parentheses is the variable
: "a" (0xa) ); //input: similar notation as output, except I am passing the hex value in the parentheses directly

v = a & 0xff; //mask the low 8 bits (bits 7:0)
printk(KERN_INFO "Version Identifier: %u\n", v);
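
The same masking idea extends to the other fields that CPUID leaf 0AH packs into EAX. Here is a minimal sketch, assuming the field layout given in the Intel manual (bits 7:0 version, bits 15:8 number of general-purpose counters per logical processor, bits 23:16 counter bit width). The names extract_bits, pmu_info and decode_pmu_eax are hypothetical and not from my module. It is a pure function, so it can be tried in user space on a sample value:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return bits hi..lo (inclusive) of a 32-bit value,
 * mirroring the manual's REG[bits hi:lo] notation. */
static uint32_t extract_bits(uint32_t value, unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    uint32_t width = hi - lo + 1;
    /* Shifting a 32-bit value by 32 is undefined, so handle it separately. */
    uint32_t mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
    return (value >> lo) & mask;
}

/* Hypothetical container for the fields of CPUID.0AH:EAX used here. */
struct pmu_info {
    uint32_t version;       /* EAX[bits 7:0]   */
    uint32_t num_counters;  /* EAX[bits 15:8]  */
    uint32_t counter_width; /* EAX[bits 23:16] */
};

/* Decode an EAX value that was obtained from CPUID with EAX = 0x0a. */
static struct pmu_info decode_pmu_eax(uint32_t eax)
{
    struct pmu_info info;
    info.version = extract_bits(eax, 7, 0);
    info.num_counters = extract_bits(eax, 15, 8);
    info.counter_width = extract_bits(eax, 23, 16);
    return info;
}
```

For a sample EAX of 0x07300403 this gives version 3, 4 counters, and a counter width of 48 bits.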

__

So now that I know the version number, I can set up a counter for use. Every performance counter on an Intel chip has a corresponding event select register (an MSR whose address is defined in the Intel manual).
The register has two fields for the event number and the unit mask. The rest of the lower 32 bits are flags for fine-tuning the counting operation, plus an 8-bit counter mask in bits 31:24 that I leave at zero. To start the counter I wrote an unsigned int to the register, as only the lower 32 bits are used.
To prepare the unsigned int I created the following function:


unsigned preparePERFEVTSEL(unsigned event, unsigned mask){
                int _INV = 0x0; //invert counter mask
                int _EN = 0x1;  //enable
                int _ANY = 0x1; //AnyThread: count events from every logical processor sharing the core
                int _INT = 0x0; //interrupt on overflow, turn on for replaying
                int _PC = 0x0;  //pin control
                int _E = 0x0;   //edge
                int _OS = 0x0;  //count kernel-level (ring 0) code, TODO: do we want system calls tracked?
                int _USR = 0x1; //track user level code

                unsigned prepared = 0x0;

                prepared |= event & 0xff;
                prepared |= (mask & 0xff)       <<8;
                prepared |= (_USR & 0x1)        <<16 ;
                prepared |= (_OS & 0x1)         <<17 ;
                prepared |= (_E & 0x1)          <<18 ;
                prepared |= (_PC & 0x1)         <<19 ;
                prepared |= (_INT & 0x1)        <<20 ;
                prepared |= (_ANY & 0x1)        <<21 ;
                prepared |= (_EN & 0x1)         <<22 ;
                prepared |= (_INV & 0x1)        <<23 ;

                return prepared;


}
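
As a quick sanity check on the bit layout, here is a minimal user-space sketch of the same packing with the flags passed in rather than hard-coded. The macro names and perfevtsel_encode are hypothetical. The example event (0xC4 with unit mask 0x00, which the Intel manual lists as the architectural "branch instruction retired" event) is an assumption for illustration, not necessarily what br_inst and br_umask are defined as in my module.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits of the event select register, same positions as in the function above. */
#define PERFEVTSEL_USR (1u << 16) /* count user-level code */
#define PERFEVTSEL_OS  (1u << 17) /* count kernel-level code */
#define PERFEVTSEL_E   (1u << 18) /* edge detect */
#define PERFEVTSEL_PC  (1u << 19) /* pin control */
#define PERFEVTSEL_INT (1u << 20) /* interrupt on overflow */
#define PERFEVTSEL_ANY (1u << 21) /* any thread on the core */
#define PERFEVTSEL_EN  (1u << 22) /* enable the counter */
#define PERFEVTSEL_INV (1u << 23) /* invert counter mask */

/* Hypothetical helper: pack event, unit mask, and flag bits into the
 * lower 32 bits of the event select value. */
static uint32_t perfevtsel_encode(uint8_t event, uint8_t umask, uint32_t flags)
{
    /* Only the flag bits 16..23 may be set by the caller. */
    assert((flags & ~0x00ff0000u) == 0);
    return (uint32_t)event | ((uint32_t)umask << 8) | flags;
}
```

With USR, ANY, and EN set (the same flags the function above hard-codes), event 0xC4 and unit mask 0x00 pack to 0x006100C4.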

I was then able to write to the event select register:


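unsigned event = preparePERFEVTSEL(br_inst, br_umask);
reg = IA32_PERFEVTSEL0;  //#defined
asm("wrmsr"
: //no outputs
: "a" (event), "c" (reg), "d" (0)); //wrmsr writes EDX:EAX to the MSR in ECX, so EDX (the high 32 bits) must be zeroed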
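
For reference, the register addresses involved can be sketched as below. The values 0x186 (IA32_PERFEVTSEL0) and 0xC1 (IA32_PMC0, the counter that the event select register controls) are from the MSR tables in the Intel manual. The helpers perfevtsel_msr and pmc_msr are hypothetical, and they assume that counter n sits at the base address plus n, with the number of counters taken from CPUID.0AH:EAX[bits 15:8].

```c
#include <assert.h>
#include <stdint.h>

#define IA32_PMC0        0x0C1u /* first general-purpose counter */
#define IA32_PERFEVTSEL0 0x186u /* event select register for counter 0 */

/* Hypothetical helper: MSR address of the event select register for
 * counter n. num_counters is assumed to come from CPUID.0AH:EAX[15:8]. */
static uint32_t perfevtsel_msr(unsigned n, unsigned num_counters)
{
    assert(n < num_counters);
    return IA32_PERFEVTSEL0 + n;
}

/* Hypothetical helper: MSR address of counter n itself, which is the
 * register to read back once the event select register is programmed. */
static uint32_t pmc_msr(unsigned n, unsigned num_counters)
{
    assert(n < num_counters);
    return IA32_PMC0 + n;
}
```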

___________________________
Update:

All of the MSR operations have nice wrappers within the Linux kernel, in asm/msr.h.
I can do a simple wrmsrl(MSR location, unsigned long long input) and rdmsrl(MSR location, unsigned long long output).

These are the 64-bit (long) operations. There are also the corresponding wrmsr(MSR location, low, high) and, similarly, rdmsr(MSR location, low, high), which take the value as two 32-bit halves. Note that the low half comes first.
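
The two forms differ only in how the 64-bit value is handed over: the low/high variants mirror the EDX:EAX register pair that the underlying instructions use. Here is a minimal sketch of that split, with hypothetical helper names (msr_low, msr_high, msr_combine) that are not part of the kernel API:

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of an MSR value: what goes into (or comes out of) EAX. */
static uint32_t msr_low(uint64_t value)
{
    return (uint32_t)(value & 0xffffffffu);
}

/* High 32 bits of an MSR value: what goes into (or comes out of) EDX. */
static uint32_t msr_high(uint64_t value)
{
    return (uint32_t)(value >> 32);
}

/* Rebuild the 64-bit value from the two halves, as needed after a
 * low/high style read. */
static uint64_t msr_combine(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}

/* Splitting and recombining must give back the original value. */
static void msr_roundtrip_check(uint64_t value)
{
    assert(msr_combine(msr_low(value), msr_high(value)) == value);
}
```

Since the event select value from preparePERFEVTSEL only uses the lower 32 bits, its high half is simply zero.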

Also...
There seems to be perf watchdog code that allocates the performance counters to coordinate between processors. It looks like the kernel already uses performance counter 0 for its own purposes.

Friday, December 17, 2010
