I pause and wonder how many hours one must have invested to become so highly skilled on such an esoteric topic. I find comfort in user zxmth’s question, which assures me I was not the only one left in awe.
Here’s a plausible method. There are three parameters here, so let’s just look at how hpbussize influences the error.
$ git grep "No bus number available for hot-added bridge"
drivers/pci/probe.c: pci_err(dev, "No bus number available for hot-added bridge\n");
which starts like
int pci_hp_add_bridge(struct pci_dev *dev)
{
    struct pci_bus *parent = dev->bus;
    int busnr, start = parent->busn_res.start;
    unsigned int available_buses = 0;
    int end = parent->busn_res.end;

    for (busnr = start; busnr <= end; busnr++) {
        if (!pci_find_bus(pci_domain_nr(parent), busnr))
            break;
    }
    if (busnr-- > end) {
        pci_err(dev, "No bus number available for hot-added bridge\n");
        return -1;
    }
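To make that loop concrete, here is a minimal userspace sketch of the same search. It is not kernel code: `first_free_busnr`, `BUS_COUNT` and the `in_use` array are names I made up, with the array standing in for what `pci_find_bus()` would report.

```c
#include <stdbool.h>

/* Hypothetical: bus numbers are 8 bits wide, so 256 slots cover them all. */
#define BUS_COUNT 256

/*
 * Walk the parent's bus number range [start, end] the way the excerpt does
 * and return the first number that is not in use. Returns -1 when every
 * number in the range is taken, which is the case that triggers the
 * "No bus number available" message.
 *
 * in_use[] is a stand-in for pci_find_bus(); the caller must keep
 * 0 <= start and end < BUS_COUNT.
 */
int first_free_busnr(const bool in_use[BUS_COUNT], int start, int end)
{
    int busnr;

    for (busnr = start; busnr <= end; busnr++) {
        if (!in_use[busnr])
            break;
    }
    if (busnr > end)
        return -1;
    return busnr;
}
```

With buses 2 and 3 in use, a parent range of 2..3 gives -1 (the error case), while a range of 2..5 gives 4. The only thing separating the two outcomes is `end`, which is why the next step is to find out who sets `busn_res.end`.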
So, just like the message says, we’ve allocated some bus numbers up front, but we ran out. Looks like this depends on the value of busn_res.end. Grepping around in the same file, we find that pci_scan_bridge_extend calls pci_bus_update_busn_res_end with a max value. Aside from some increments and decrements, this function also does max = pci_scan_child_bus_extend(...), which is documented like
/**
 * pci_scan_child_bus_extend() - Scan devices below a bus
 * @bus: Bus to scan for devices
 * @available_buses: Total number of buses available (%0 does not try to
 *                   extend beyond the minimal)
 *
 * Scans devices below @bus including subordinate buses. Returns new
 * subordinate number including all the found devices. Passing
 * @available_buses causes the remaining bus space to be distributed
 * equally between hotplug-capable bridges to allow future extension of the
 * hierarchy.
 */
That certainly sounds promising. The hotplug stuff is
/*
 * Make sure a hotplug bridge has at least the minimum requested
 * number of buses but allow it to grow up to the maximum available
 * bus number if there is room.
 */
if (bus->self && bus->self->is_hotplug_bridge) {
    used_buses = max_t(unsigned int, available_buses,
                       pci_hotplug_bus_size - 1);
    if (max - start < used_buses) {
        max = start + used_buses;

        /* Do not allocate more buses than we have room left */
        if (max > bus->busn_res.end)
            max = bus->busn_res.end;

        dev_dbg(&bus->dev, "%pR extended by %#02x\n",
                &bus->busn_res, max - start);
    }
}
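The arithmetic in that branch is easier to follow with numbers plugged in. Below is a rough userspace model of it, assuming plain integers in place of the kernel structures; `extend_hotplug_max` and its parameter names are mine, not the kernel's.

```c
/*
 * Userspace model of the hotplug branch in the excerpt.
 *   start             first bus number of the bus being scanned
 *   max               highest bus number found so far below it
 *   end               stands in for bus->busn_res.end
 *   available_buses   spare numbers handed down by the caller
 *   hotplug_bus_size  stands in for pci_hotplug_bus_size (assumed >= 1)
 * Returns the possibly extended max.
 */
unsigned int extend_hotplug_max(unsigned int start, unsigned int max,
                                unsigned int end,
                                unsigned int available_buses,
                                unsigned int hotplug_bus_size)
{
    /* Same idea as max_t(unsigned int, available_buses, size - 1). */
    unsigned int wanted = hotplug_bus_size - 1;
    unsigned int used_buses =
        available_buses > wanted ? available_buses : wanted;

    if (max - start < used_buses) {
        max = start + used_buses;
        /* Do not hand out more numbers than the parent range allows. */
        if (max > end)
            max = end;
    }
    return max;
}
```

For a bridge at bus 2 with nothing below it (`start = max = 2`) and no spare buses from the caller, a size of 1 leaves `max` at 2, while a size of 0x33 pushes it to 52, or to `end` if the parent range stops earlier. So the reservation only grows if something raises the size or the caller passes spare buses down.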
So if we wanted to increase the number of busses, we’d have to increase pci_hotplug_bus_size. This is initialized in pci_setup:
And there’s our parameter. Documentation/admin-guide/kernel-parameters.txt says
hpbussize=nn    The minimum amount of additional bus numbers
                reserved for buses below a hotplug bridge.
                Default is 1.
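For a feel of how an option like this turns into a number, here is a small userspace sketch. To be clear, this is a hypothetical helper and not the kernel's `pci_setup()`: the name `parse_hpbussize`, the `SKETCH_DEFAULT_BUS_SIZE` constant and the range check are my assumptions, the last one based only on bus numbers being 8 bits wide.

```c
#include <stdlib.h>
#include <string.h>

/* Matches the "Default is 1" line in the documentation above. */
#define SKETCH_DEFAULT_BUS_SIZE 1UL

/*
 * Hypothetical helper: pull the value out of a single option string such
 * as "hpbussize=0x20". Falls back to the default when the option is
 * something else or the value does not fit in 8 bits.
 */
unsigned long parse_hpbussize(const char *opt)
{
    static const char key[] = "hpbussize=";
    size_t keylen = sizeof(key) - 1;
    unsigned long val;

    if (strncmp(opt, key, keylen) != 0)
        return SKETCH_DEFAULT_BUS_SIZE;

    /* Base 0 lets strtoul accept decimal, octal and 0x-prefixed hex. */
    val = strtoul(opt + keylen, NULL, 0);
    if (val > 0xff)
        return SKETCH_DEFAULT_BUS_SIZE;
    return val;
}
```

Under these assumptions `"hpbussize=0x20"` parses to 32, and an unrelated option such as `"realloc=on"` leaves the default of 1 in place.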
We can also look at the docs for the rest of these
realloc=        Enable/disable reallocating PCI bridge resources
                if allocations done by BIOS are too small to
                accommodate resources required by all child
                devices.
                off: Turn realloc off
                on: Turn realloc on
and
assign-busses   [X86] Always assign all PCI bus
                numbers ourselves, overriding
                whatever the firmware may have done.
So it looks like by default Linux doesn’t adjust the bus numbers assigned by the BIOS, and those parameters let Linux rearrange things and ensure that there’s enough space for hotplugged busses (such as from Thunderbolt devices).
The above process is a bit “happy path” (no wrong turns), but to someone who’s seen (and debugged) this error before, the answer would be obvious. The real question (which could probably be answered by someone more familiar with PCI) is why we allocate bus numbers up front, and can’t add more later.
The real question (which could probably be answered by someone more familiar with PCI) is why we allocate bus numbers up front, and can’t add more later.
It seems like the reallocation patch has been stalled for two years.
https://lore.kernel.org/linux-pci/20201218174011.340514-1-s.miroshnichenko@yadro.com/
(Taken from the orange site)
I expect these kinds of problems are simply a result of the complexity of modern hardware. This is basically what you get when the hardware is poorly (or at least incomprehensibly) designed or implemented, and there are so many combinations of peripherals and configurations that they can’t all be tested. USB and Thunderbolt docks, to pick an extreme example, have been around for years and they are still a compatibility and interoperability shitshow on every OS, not just Linux.
Linux devs do their best to paper over poor design decisions while not inadvertently breaking existing setups, but it’s a perilous line to walk.
You get used to it since most OEMs don’t care about Linux support, even the ones that “love” Linux.
I came for the debugging story, stayed for the mysterious grudge against USB disks, and laughed out loud at the even more mysterious solution.
Anecdotes like that happen on every single OS, even the officially supported one.
But when it happens:
Yeah, for some reason it’s a common complaint, but:
The author just hasn’t spent enough time on other systems to have a “Windows evening” it seems. Stuff is broken everywhere.
Ha, just today my Windows decided to refuse to boot. At least I can debug NixOS.