# Black holes regulate cool gas accretion in massive galaxies

### Cosmology

We adopted a Chabrier initial mass function (IMF)^{31} to estimate star formation rate and assumed cosmological parameters of *H*_{0} = 70 km s^{−1} Mpc^{−1}, *Ω*_{M} = 0.3, and *Ω*_{Λ} = 0.7.

### Sample selection

#### The BH sample

The sample for galaxies with directly measured BH masses is primarily from ref. ^{11}, which includes 91 central galaxies collected from refs. ^{19,20,21}. We excluded 18 sources with BH masses measured with reverberation mapping and kept only those measured with dynamical methods. We then added another 63 galaxies with measured BH masses from recent literature, which were matched with the group catalogue^{32} of nearby galaxies to select only central galaxies. We obtained the HI flux densities and masses of this sample by crossmatching with the nearby galaxy database, HyperLeda^{33}. Our final sample includes 69 central galaxies with 41 from ref. ^{11} and the remaining from the compilation of recent literature. In Extended Data Table 3, we list the basic properties of our BH sample.

#### The galaxy sample

The sample for galaxies with HI measurements and indirect BH mass measurements are from the extended GALEX Arecibo SDSS Survey (xGASS; ref. ^{34}) and HI-MaNGA programme^{35,36}, which include HI observations towards a representative sample of about 1,200 and 6,000 galaxies with 10^{9}*M*_{⊙} < *M*_{⋆} < 10^{11.5}*M*_{⊙}, respectively. The depth of the survey also allows for stringent constraints on the upper limits for the HI non-detections, enabling a comprehensive assessment of *f*_{HI} for the entire sample. We limited the redshift z < 0.035 to ensure high HI-detection rates even at the highest stellar masses and BH masses. We selected only group central galaxies, which include at least one satellite galaxy in their groups, based on the crossmatch with the group catalogue^{37,38,39}. Isolated central galaxies lacking any satellites in their groups are discarded because they may have probably suffered from additional environmental effects^{40}. We derived the BH masses for the xGASS and HI-MaNGA samples with their velocity dispersion^{21} from SDSS DR17^{37} (*σ*_{SDSS}, and we require *σ*_{SDSS} ≥ 70 km s^{−1}):

$$\log \left(\frac{{M}_{{\rm{BH}}}}{{M}_{\odot }}\right)=(8.32\pm 0.04)+(5.35\pm 0.23)\log \left(\frac{{\sigma }_{{\rm{SDSS}}}}{200\,{\rm{km}}\,{{\rm{s}}}^{-1}}\right).$$

(1)

### Physical parameters of the BH and galaxy sample

#### Stellar masses

The stellar masses for the galaxy sample are taken from the MPA-JHU catalogue^{41,42}, which are derived from SED fitting based on SDSS data. For the BH sample, because most of them lack the same photometric coverage as the galaxy sample, we derive their stellar masses from their K-band luminosity and velocity dispersion-dependent K-band mass-to-light ratio following ref. ^{21}:

$${M}_{\star }/{L}_{{\rm{K}}}=0.1{\sigma }_{{\rm{e}}}^{0.45}.$$

(2)

As an accurate determination of *σ*_{e} is not available for all galaxies, we derived *σ*_{e} for the full BH sample from the tight correlation in ref. ^{21}:

$$\begin{array}{l}\log \left(\frac{{\sigma }_{{\rm{e}}}}{{\rm{km}}}\,{{\rm{s}}}^{-1}\right)=(2.11\pm 0.01)+(0.71\pm 0.03)\log \left(\frac{{L}_{{\rm{K}}}}{1{0}^{11}{L}_{\odot }}\right)\\ \,\,\,\,\,\,\,+(-0.72\pm 0.05)\log \left(\frac{{R}_{{\rm{e}}}}{5\,{\rm{kpc}}}\right).\end{array}$$

(3)

To explore whether there are systematic differences between the two methods, we compare the stellar masses of the galaxy sample taken from the MPA-JHU catalogue and those derived from equation (2). A median mass difference 0.32 dex is found between the two methods (Extended Data Fig. 6), which may be attributed to the tilt from the fundamental plane beyond the mass-to-light ratio, for example, the dark matter component in the effective radius. We corrected these systematic mass differences for the BH sample to match that of the galaxy sample.

#### HI fraction and upper limits

The HI-detection limit depends not only on the sensitivity but also on the width of the HI line. To obtain more realistic upper limits, we first derived the expected HI line width for each HI non-detection. The width of the HI line indicates the circular velocity of the host galaxy, which should be proportional to the stellar masses. We explored this using the HI detections from the xGASS sample. Extended Data Fig. 1 shows the relation between *M*_{⋆} and the observed line width, as well as *M*_{⋆} and inclination-corrected line width. It indicates that the inclination-corrected line width is tightly correlated with *M*_{⋆}, which is further used to derive the expected line width for the HI non-detections. Combining the sensitivity of the HI observations and the expected line width, we derived the upper limits for all the HI non-detections in our BH and galaxy samples.

#### Morphology

For BH sample, the morphology indicator *T* is obtained from the HyperLEDA database^{33}. It can be a non-integer because for most objects the final *T* is averaged over various estimates available in the literature. For the galaxy sample, we classified them in to the early types and late types based on the Sérsic index (from NASA-Sloan Atlas catalogue; NSA: Blanton M.; http://www.nsatlas.org) larger or smaller than 2.

#### Star formation rates

The specific star formation rates (SSFR) of the galaxy sample are from the MPA-JHU catalogue based on ref. ^{42}. The SSFR for the BH sample is taken from the original reference.

#### Bulge masses

The bulge information is from refs. ^{43,44} for the BH sample and galaxy sample, respectively. More specifically, we calculate the bulge mass for the galaxy sample using r-band *B*/*T*.

#### Stellar mass surface density

We calculated the K-band effective radius for both the BH and the galaxy sample according to ref. ^{21}: log *R*_{e} = 1.16 log *R*_{K_R_EFF} + 0.23 log *q*_{K_BA}, where *R*_{e} is the corrected apparent effective size, *R*_{K_R_EFF} and *q*_{K_BA} are K-band apparent effective radius and K-band axis ratio from 2MASS. After converting the apparent sizes to the physical sizes, the stellar mass surface density was derived as \({\varSigma }_{{\rm{star}}}={M}_{\star }/(2{\rm{\pi }}{R}_{{\rm{e}}}^{2})\).

#### H_{2} masses

We collected H_{2} masses from xCOLD GASS survey^{18} and ref. ^{45} for xGASS and MaNGA galaxies, respectively. We acknowledge that at least in the nearby Universe, the molecular-to-atomic gas mass ratio increases only weakly with stellar masses and remains relatively low over a wide stellar mass range, with \(R\equiv {M}_{{{\rm{H}}}_{2}}/{M}_{{\rm{HI}}} \sim 10-20 \% \) at 10^{9}*M*_{⊙} < *M*_{⋆} < 10^{11.5}*M*_{⊙}. We calculate the total gas fractions as \({\mu }_{{\rm{HI}}+{{\rm{H}}}_{2}}\,=\)\(({M}_{{\rm{HI}}}+{M}_{{{\rm{H}}}_{2}})/{M}_{\star }\). For central galaxies (isolated centrals plus group centrals), we compare the *M*_{BH}–*μ*_{HI} and *M*_{BH}–\({\mu }_{{\rm{HI}}+{{\rm{H}}}_{2}}\) relation in Extended Data Fig. 4. The *M*_{BH}–\({\mu }_{{\rm{HI}}+{{\rm{H}}}_{2}}\) relation exhibits a stronger correlation with the smaller scatter than the *M*_{BH}*–μ*_{HI} relation. We acknowledge that, based on molecular hydrogen gas content traced through dust extinction, previous studies show an *M*_{BH}–\({f}_{{{\rm{H}}}_{2}}\) correlation^{12}. Future studies with more direct measurements of molecular hydrogen gas for large samples will be needed to examine in detail whether *M*_{BH} also plays a fundamental part in regulating the molecular gas content in galaxies.

### Quiescent fraction

To estimate the quiescent fraction at different *M*_{BH}, we selected galaxies from the MPA-JHU catalogue of SDSS galaxies with the same criteria as the galaxy sample, except that we limited the velocity dispersion to greater than 30 km s^{−1} to cover broader *M*_{BH} and we made no constraints on the HI detection. We classified the sample galaxies into star-forming and quiescent ones, separated at SSFR = −11. In each *M*_{BH} bin, the quiescent fraction was calculated as the ratio between the number of quiescent galaxies and that of all galaxies. The result is shown in Extended Data Fig. 5, which is consistent with that of previous work^{29,46}.

### Linear least squares approximation

We implemented linear regression for the BH sample and the galaxy sample using Python package LTS_LINEFIT introduced in ref. ^{47}, which is insensitive to outliers and can give the intrinsic scatter around the linear relation with corresponding errors of the fitted parameters.

### Linear fitting including upper limits

To incorporate both detections and upper limits in the galaxy sample, we applied the Kaplan–Meier non-parametric estimator to derive the cumulative distribution function at different *M*_{BH} bins (with Python package Reliability^{48}), and performed 10,000 random draws from the cumulative distribution function at each bin to fit the relation between *f*_{gas} and *M*_{BH}. The linear relation and its corresponding errors are taken as the best fitting and standard deviations of these fittings (Extended Data Table 2). The non-detection rate of HI is relatively low across most of the *M*_{BH} range and becomes significant only for galaxies with the most massive BHs (reaching about 50% at *M*_{BH} > 10^{8}*M*_{⊙}).

### Partial least square regression

To derive the most significant physical parameters in determining *μ*_{HI} statistically, we used the Python package Scikit-learn^{49} with partial least squares (PLS) Regression function, which uses a non-linear iterative partial least squares (NIPALS)^{50} algorithm. The PLS algorithm generalizes a few latent variables (or principal components) that summarize the variance of independent variables, which is used to find the fundamental relation between a set of independent and dependent variables. It has advantages in regression among highly correlated predictor variables. It calculates the linear combinations of the original predictor datasets (latent variables) and the response datasets with maximal covariance, then fits the regression between the projected datasets and returns the model:

where *X* and *Y* are predictor and response datasets, *B* is the matrix of regression coefficients and *F* is the intercept matrix.

We constructed the *X* and *Y* matrices as the set of *M*_{BH}, *M*_{⋆}, *Σ*_{star}, *M*_{bulge} and the set of *μ*_{HI}. For the BH and galaxy samples, this returns the sample size of 45 and 189, respectively. The optimal number of latent variables (linear combinations of predictor variables) in PLS Regression is determined by the minimum of mean squared error from cross-validation (using function cross_val_predict in Scikit-learn) at each number of components. We find that the optimal number of latent variables for both the BH and the galaxy sample converges to one. Further increasing the number of latent variables yields only a few percentage changes in the mean squared errors, and *M*_{BH} remains the most significant predictor parameter. Following appendix B in ref. ^{51}, the variance contribution from different parameters to *μ*_{HI} is decomposed as

$${\rm{Var}}(Y)=\mathop{\sum }\limits_{i=1}^{4}{\rm{Var}}({X}_{i}{B}_{i})+{\rm{Var}}(F),$$

(5)

where Var is a measure of the spread of a distribution. The portion of each parameter variance is shown in the last column of the Extended Data Table 3, which shows that *M*_{BH} dominates the variance. Further increasing the number of latent variables results only in a few percentage changes in the mean squared errors, and *M*_{BH} remains the most significant predictor parameter.