Identifying Outlier Posts in a Bluesky Feed Using Z-Score and Trend-Adjusted Like Counts

I can give you a complete analytical procedure and example code, but I cannot directly access or extract posts from your Bluesky feed without the data itself. Below is a precise workflow you can apply once you have the post-level metrics (e.g., timestamps and like counts).

1. Data Requirements

Prepare a dataset containing at least:

post_id
created_at (timestamp)
like_count

Filter the dataset to only include posts from the last 7 days.
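As a minimal sketch of this preparation step, the snippet below builds a tiny table with the three required columns and keeps only the last 7 days. The sample rows, the `last_n_days` helper, and the fixed `now` value are all hypothetical, chosen only to make the example reproducible; in practice the rows would come from your own export.

```python
import pandas as pd

# Hypothetical sample rows; real data would come from your own export.
posts = pd.DataFrame(
    {
        "post_id": ["a1", "a2", "a3", "a4"],
        "created_at": [
            "2024-05-01T09:00:00Z",
            "2024-05-06T12:30:00Z",
            "2024-05-09T18:45:00Z",
            "2024-05-10T08:15:00Z",
        ],
        "like_count": [12, 7, 40, 9],
    }
)

# Parse timestamps as timezone-aware UTC so comparisons are unambiguous.
posts["created_at"] = pd.to_datetime(posts["created_at"], utc=True)


def last_n_days(df, now, days=7):
    """Return the rows whose created_at falls within `days` days before `now`."""
    cutoff = now - pd.Timedelta(days=days)
    in_window = (df["created_at"] >= cutoff) & (df["created_at"] <= now)
    return df[in_window].copy()


# A fixed reference time keeps the example reproducible.
now = pd.Timestamp("2024-05-10T12:00:00Z")
recent = last_n_days(posts, now)
print(recent["post_id"].tolist())
```

Passing `now` explicitly is a design choice for this sketch: it measures the window from a known reference time rather than from the newest post in the data.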

2. Compute the Baseline Trend

If you want outliers relative to a trend (rather than a flat mean), you need to model the expected likes per post. Typical approaches:

A. Linear trend: Fit like_count = β0 + β1 * time_index.

B. Rolling mean trend: Compute a rolling average (e.g., 24-hour or N-post window).

C. LOESS smoothing: Provides a smooth non-parametric trend.
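Option B could be sketched roughly as follows, assuming a 24-hour time-based window. The `add_rolling_trend` helper and the demo rows are hypothetical. The sketch deliberately excludes the current post from its own window (`closed="left"`), which is one reasonable choice rather than the only one.

```python
import pandas as pd


def add_rolling_trend(df, window="24h"):
    """Return a copy sorted by time with an `expected` column holding the
    mean like_count of earlier posts inside the trailing time window."""
    out = df.sort_values("created_at").reset_index(drop=True)
    likes = out.set_index("created_at")["like_count"]
    # closed="left" leaves the current post out of its own baseline,
    # so a spike does not inflate the value it is compared against.
    out["expected"] = likes.rolling(window, closed="left").mean().to_numpy()
    return out


# Hypothetical demo rows, six hours apart.
demo = pd.DataFrame(
    {
        "created_at": pd.to_datetime(
            [
                "2024-05-01T00:00:00Z",
                "2024-05-01T06:00:00Z",
                "2024-05-01T12:00:00Z",
            ],
            utc=True,
        ),
        "like_count": [10, 20, 30],
    }
)
trend = add_rolling_trend(demo)
print(trend[["created_at", "like_count", "expected"]])
```

A post with no earlier posts inside the window gets `NaN` as its expected value, so those rows would need to be dropped (or given a fallback baseline) before residuals are computed.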

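For business analytics, the linear model and LOESS are the usual choices: the linear fit is the easiest to interpret, while LOESS can follow non-linear changes in engagement over the week.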
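For the LOESS option, one possible sketch uses the `lowess` function from statsmodels. The `add_lowess_trend` helper, the `frac=0.5` smoothing span, and the synthetic data are assumptions made for illustration; the span in particular should be tuned to how many posts you have.

```python
import numpy as np
import pandas as pd
from statsmodels.nonparametric.smoothers_lowess import lowess


def add_lowess_trend(df, frac=0.5):
    """Return a copy sorted by time with a LOWESS-smoothed `expected` column."""
    out = df.sort_values("created_at").reset_index(drop=True)
    elapsed = out["created_at"] - out["created_at"].min()
    hours = elapsed.dt.total_seconds() / 3600
    # return_sorted=False yields fitted values in the same order as the input.
    out["expected"] = lowess(
        out["like_count"].to_numpy(dtype=float),
        hours.to_numpy(dtype=float),
        frac=frac,
        return_sorted=False,
    )
    return out


# Synthetic data for illustration only: 40 posts, 4 hours apart.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "created_at": pd.date_range("2024-05-01", periods=40, freq="4h", tz="UTC"),
        "like_count": rng.poisson(lam=20, size=40),
    }
)
smoothed = add_lowess_trend(demo)
print(smoothed[["created_at", "like_count", "expected"]].head())
```

A smaller `frac` makes the curve follow local swings more closely, which also means a genuine spike is more likely to be absorbed into the trend instead of showing up as a residual.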

3. Compute Residuals

Residual = actual_like_count – expected_like_count_from_trend.

Then compute:

Mean of residuals (μ)
Standard deviation of residuals (σ)

4. Z-Score

z = (residual – μ) / σ

Flag outliers where z >= +2. This captures posts whose likes are significantly above expectation relative to both the trend and the overall variance.
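To make the arithmetic concrete, here is a small worked sketch on made-up residuals (ten hypothetical posts, one of which far exceeds its expected like count). It uses the sample standard deviation (`ddof=1`), which matches the default of pandas' `.std()`.

```python
import numpy as np

# Made-up residuals (actual minus expected likes) for ten posts.
residuals = np.array([-3.0, -1.0, 0.0, 1.0, -2.0, 0.0, -1.0, 2.0, -1.0, 25.0])

mu = residuals.mean()            # 2.0
sigma = residuals.std(ddof=1)    # sample standard deviation, about 8.21

z = (residuals - mu) / sigma

# Indices of posts at or above the +2 threshold.
flagged = np.flatnonzero(z >= 2)
print(flagged, z[flagged].round(2))
```

Only the last post is flagged, with a z-score of about 2.8. Note that the spike itself inflates σ, so in a short window a single very large post lowers everyone's z-score, including its own.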

5. Example Python (pseudocode with pandas + statsmodels)

```python
import pandas as pd
import statsmodels.api as sm

# Assumes `df` is already loaded with columns: post_id, created_at, like_count
df['created_at'] = pd.to_datetime(df['created_at'])

# Filter to the 7 days ending at the most recent post in the data
cutoff = df['created_at'].max() - pd.Timedelta(days=7)
df_week = df[df['created_at'] >= cutoff].copy()

# Trend model: linear regression on elapsed time (hours since the first post)
df_week = df_week.sort_values('created_at')
df_week['t'] = (df_week['created_at'] - df_week['created_at'].min()).dt.total_seconds() / 3600

X = sm.add_constant(df_week['t'])
model = sm.OLS(df_week['like_count'], X).fit()
df_week['expected'] = model.predict(X)

# Residuals and z-score (pandas .std() is the sample standard deviation)
df_week['residual'] = df_week['like_count'] - df_week['expected']
mu = df_week['residual'].mean()
sigma = df_week['residual'].std()

df_week['z'] = (df_week['residual'] - mu) / sigma

# Outliers: z ≥ +2
outliers = df_week[df_week['z'] >= 2]
```

6. Output

The outliers DataFrame contains the set of posts whose like performance is significantly above trend using the +2σ criterion.
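One way to turn that result into a readable report is sketched below. The `scored` table is a hypothetical stand-in for a frame that already has `expected` and `z` columns, and its values are invented rather than computed from a real feed.

```python
import pandas as pd

# Hypothetical scored posts; the z values are invented for illustration.
scored = pd.DataFrame(
    {
        "post_id": ["p1", "p2", "p3", "p4"],
        "like_count": [14, 95, 11, 60],
        "expected": [12.0, 13.5, 15.0, 16.5],
        "z": [0.1, 3.2, -0.2, 2.1],
    }
)

# Keep posts at or above +2 and list the strongest outliers first.
report = (
    scored[scored["z"] >= 2]
    .sort_values("z", ascending=False)
    .loc[:, ["post_id", "like_count", "expected", "z"]]
    .reset_index(drop=True)
)
print(report.to_string(index=False))
```

Showing `like_count` next to `expected` makes it easier to see how far above trend each flagged post landed, not just that it crossed the threshold.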