Skip to content

Commit 77d1e51

Browse files
Automatic merge of 'next-test' into merge-test (2025-09-22 14:52)
2 parents 8c86d0e + 4a774b3 commit 77d1e51

File tree

7 files changed

+790
-2
lines changed

7 files changed

+790
-2
lines changed
Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
What: /sys/bus/event_source/devices/vpa_dtl/format
2+
Date: February 2025
3+
Contact: Linux on PowerPC Developer List <linuxppc-dev at lists.ozlabs.org>
4+
Description: Read-only. Attribute group to describe the magic bits
5+
that go into perf_event_attr.config for a particular pmu.
6+
(See ABI/testing/sysfs-bus-event_source-devices-format).
7+
8+
Each attribute under this group defines a bit range of the
9+
perf_event_attr.config. Supported attribute are listed
10+
below::
11+
12+
event = "config:0-7" - event ID
13+
14+
For example::
15+
16+
dtl_cede = "event=0x1"
17+
18+
What: /sys/bus/event_source/devices/vpa_dtl/events
19+
Date: February 2025
20+
Contact: Linux on PowerPC Developer List <linuxppc-dev at lists.ozlabs.org>
21+
Description: (RO) Attribute group to describe performance monitoring events
22+
for the Virtual Processor Dispatch Trace Log. Each attribute in
23+
this group describes a single performance monitoring event
24+
supported by vpa_dtl pmu. The name of the file is the name of
25+
the event (See ABI/testing/sysfs-bus-event_source-devices-events).

Documentation/arch/powerpc/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -37,6 +37,7 @@ powerpc
3737
vas-api
3838
vcpudispatch_stats
3939
vmemmap_dedup
40+
vpa-dtl
4041

4142
features
4243

Lines changed: 156 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,156 @@
1+
.. SPDX-License-Identifier: GPL-2.0
2+
.. _vpa-dtl:
3+
4+
===================================
5+
DTL (Dispatch Trace Log)
6+
===================================
7+
8+
Athira Rajeev, 19 April 2025
9+
10+
.. contents::
11+
:depth: 3
12+
13+
14+
Basic overview
15+
==============
16+
17+
The pseries Shared Processor Logical Partition(SPLPAR) machines can
18+
retrieve a log of dispatch and preempt events from the hypervisor
19+
using data from Disptach Trace Log(DTL) buffer. With this information,
20+
user can retrieve when and why each dispatch & preempt has occurred.
21+
The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL counters
22+
via perf.
23+
24+
Infrastructure used
25+
===================
26+
27+
The VPA DTL PMU counters do not interrupt on overflow or generate any
28+
PMI interrupts. Therefore, hrtimer is used to poll the DTL data. The timer
29+
nterval can be provided by user via sample_period field in nano seconds.
30+
vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (Dispatch
31+
Trace Log) contains information about dispatch/preempt, enqueue time etc.
32+
We directly copy the DTL buffer data as part of auxiliary buffer and it
33+
will be processed later. This will avoid time taken to create samples
34+
in the kernel space. The PMU driver collecting Dispatch Trace Log (DTL)
35+
entries makes use of AUX support in perf infrastructure. On the tools side,
36+
this data is made available as PERF_RECORD_AUXTRACE records.
37+
38+
To correlate each DTL entry with other events across CPU's, an auxtrace_queue
39+
is created for each CPU. Each auxtrace queue has a array/list of auxtrace buffers.
40+
All auxtrace queues is maintained in auxtrace heap. The queues are sorted
41+
based on timestamp. When the different PERF_RECORD_XX records are processed,
42+
compare the timestamp of perf record with timestamp of top element in the
43+
auxtrace heap so that DTL events can be co-related with other events
44+
Process the auxtrace queue if the timestamp of element from heap is
45+
lower than timestamp from entry in perf record. Sometimes it could happen that
46+
one buffer is only partially processed. if the timestamp of occurrence of
47+
another event is more than currently processed element in the queue, it will
48+
move on to next perf record. So keep track of position of buffer to continue
49+
processing next time. Update the timestamp of the auxtrace heap with the timestamp
50+
of last processed entry from the auxtrace buffer.
51+
52+
This infrastructure ensures dispatch trace log entries can be correlated
53+
and presented along with other events like sched.
54+
55+
vpa-dtl PMU example usage
56+
=========================
57+
58+
.. code-block:: sh
59+
60+
# ls /sys/devices/vpa_dtl/
61+
events format perf_event_mux_interval_ms power subsystem type uevent
62+
63+
64+
To capture the DTL data using perf record:
65+
.. code-block:: sh
66+
67+
# ./perf record -a -e sched:\*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
68+
69+
The result can be interpreted using perf record. Snippet of perf report -D
70+
71+
.. code-block:: sh
72+
73+
# ./perf report -D
74+
75+
There are different PERF_RECORD_XX records. In that records corresponding to
76+
auxtrace buffers includes:
77+
78+
1. PERF_RECORD_AUX
79+
Conveys that new data is available in AUX area
80+
81+
2. PERF_RECORD_AUXTRACE_INFO
82+
Describes offset and size of auxtrace data in the buffers
83+
84+
3. PERF_RECORD_AUXTRACE
85+
This is the record that defines the auxtrace data which here in case of
86+
vpa-dtl pmu is dispatch trace log data.
87+
88+
Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump
89+
90+
.. code-block:: sh
91+
92+
0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0
93+
.
94+
. ... VPA DTL PMU data: size 1680 bytes, entries is 35
95+
. 00000000: boot_tb: 21349649546353231, tb_freq: 512000000
96+
. 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773
97+
. 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437
98+
. 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709
99+
. 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243
100+
. 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648
101+
. 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446
102+
. 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126
103+
. 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665
104+
105+
Above is representation of dtl entry of below format:
106+
107+
struct dtl_entry {
108+
u8 dispatch_reason;
109+
u8 preempt_reason;
110+
u16 processor_id;
111+
u32 enqueue_to_dispatch_time;
112+
u32 ready_to_enqueue_time;
113+
u32 waiting_to_ready_time;
114+
u64 timebase;
115+
u64 fault_addr;
116+
u64 srr0;
117+
u64 srr1;
118+
119+
};
120+
121+
First two fields represent the dispatch reason and preempt reason. The post
122+
processing of PERF_RECORD_AUXTRACE records will translate to meaningful data
123+
for user to consume.
124+
125+
Visualize the dispatch trace log entries with perf report
126+
=========================================================
127+
128+
.. code-block:: sh
129+
130+
# ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
131+
[ perf record: Woken up 1 times to write data ]
132+
[ perf record: Captured and wrote 0.300 MB perf.data ]
133+
134+
# ./perf report
135+
# Samples: 321 of event 'vpa-dtl'
136+
# Event count (approx.): 321
137+
#
138+
# Children Self Command Shared Object Symbol
139+
# ........ ........ ....... ................. ..............................
140+
#
141+
100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace
142+
143+
Visualize the dispatch trace log entries with perf script
144+
=========================================================
145+
146+
.. code-block:: sh
147+
148+
# ./perf script
149+
migration/9 67 [009] 105373.359903: sched:sched_waking: comm=perf pid=13418 prio=120 target_cpu=009
150+
migration/9 67 [009] 105373.359904: sched:sched_migrate_task: comm=perf pid=13418 prio=120 orig_cpu=9 dest_cpu=10
151+
migration/9 67 [009] 105373.359907: sched:sched_stat_runtime: comm=migration/9 pid=67 runtime=4050 [ns]
152+
migration/9 67 [009] 105373.359908: sched:sched_switch: prev_comm=migration/9 prev_pid=67 prev_prio=0 prev_state=S ==> next_comm=swapper/9 next_pid=0 next_prio=120
153+
:256 256 [016] 105373.359913: vpa-dtl: timebase: 21403600706628832 dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4854, ready_to_enqueue_time:139, waiting_to_ready_time:511842115 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
154+
:256 256 [017] 105373.360012: vpa-dtl: timebase: 21403600706679454 dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:236, ready_to_enqueue_time:0, waiting_to_ready_time:133864583 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
155+
perf 13418 [010] 105373.360048: sched:sched_stat_runtime: comm=perf pid=13418 runtime=139748 [ns]
156+
perf 13418 [010] 105373.360052: sched:sched_waking: comm=migration/10 pid=72 prio=0 target_cpu=010

arch/powerpc/include/asm/time.h

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,10 @@ extern u64 decrementer_max;
2929

3030
extern void generic_calibrate_decr(void);
3131

32+
#ifdef CONFIG_PPC_SPLPAR
33+
extern u64 get_boot_tb(void);
34+
#endif
35+
3236
/* Some sane defaults: 125 MHz timebase, 1GHz processor */
3337
extern unsigned long ppc_proc_freq;
3438
#define DEFAULT_PROC_FREQ (DEFAULT_TB_FREQ * 8)

arch/powerpc/kernel/time.c

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -137,7 +137,7 @@ EXPORT_SYMBOL_GPL(rtc_lock);
137137

138138
static u64 tb_to_ns_scale __read_mostly;
139139
static unsigned tb_to_ns_shift __read_mostly;
140-
static u64 boot_tb __read_mostly;
140+
static u64 boot_tb __ro_after_init;
141141

142142
extern struct timezone sys_tz;
143143
static long timezone_offset;
@@ -639,6 +639,12 @@ notrace unsigned long long sched_clock(void)
639639
return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
640640
}
641641

642+
#ifdef CONFIG_PPC_SPLPAR
643+
u64 get_boot_tb(void)
644+
{
645+
return boot_tb;
646+
}
647+
#endif
642648

643649
#ifdef CONFIG_PPC_PSERIES
644650

arch/powerpc/perf/Makefile

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ obj-$(CONFIG_PPC_POWERNV) += imc-pmu.o
1414
obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
1515
obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
1616

17-
obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
17+
obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o vpa-dtl.o
1818

1919
obj-$(CONFIG_VPA_PMU) += vpa-pmu.o
2020

0 commit comments

Comments
 (0)