DataFrame.sample(self, n=None, frac=None, replace=False, weights=None, random_state=None, axis=None) [source]
Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
| Parameters: |
|
|---|---|
| Returns: |
|
See also
numpy.random.choice
>>> df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
... 'num_wings': [2, 0, 0, 0],
... 'num_specimen_seen': [10, 2, 1, 8]},
... index=['falcon', 'dog', 'spider', 'fish'])
>>> df
num_legs num_wings num_specimen_seen
falcon 2 2 10
dog 4 0 2
spider 8 0 1
fish 0 0 8
Extract 3 random elements from the Series df['num_legs']: Note that we use random_state to ensure the reproducibility of the examples.
>>> df['num_legs'].sample(n=3, random_state=1) fish 0 spider 8 falcon 2 Name: num_legs, dtype: int64
A random 50% sample of the DataFrame with replacement:
>>> df.sample(frac=0.5, replace=True, random_state=1)
num_legs num_wings num_specimen_seen
dog 4 0 2
fish 0 0 8
Using a DataFrame column as weights. Rows with larger value in the num_specimen_seen column are more likely to be sampled.
>>> df.sample(n=2, weights='num_specimen_seen', random_state=1)
num_legs num_wings num_specimen_seen
falcon 2 2 10
fish 0 0 8
© 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/0.25.0/reference/api/pandas.DataFrame.sample.html