Collapsed LDA Gibbs sampling for sparse (d, v, c) triplet data.
Source: R/Rfunction.R
run_lda_gibbs.Rd
This function performs collapsed Gibbs sampling for the standard LDA model using a sparse document-term representation. It:
- initializes the LDA state via init_mod_from_count(),
- runs n_iter iterations of the C++ Gibbs kernel eLDA_pass_b_fast(),
- returns the final model state, including posterior topic-word and document-topic distributions.
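To make the sampling step concrete, here is a minimal pure-R sketch of what one sweep over the (d, v, c) entries could look like. It is not the package's eLDA_pass_b_fast() kernel, whose internals are not shown on this page. The function name gibbs_sweep_sketch is hypothetical, and the sketch assumes that each entry carries a single topic and that its whole count c moves together, which is suggested by z having length NZ. Under that assumption, the conditional for a block of c tokens is a ratio of gamma functions, which reduces to the familiar single-token update when c = 1.

```r
# Hypothetical pure-R sweep; an illustration only, not the package's C++ kernel.
gibbs_sweep_sketch <- function(count, z, nd, nw, nwsum, alpha, beta) {
  K <- ncol(nd)
  V <- ncol(nw)
  for (i in seq_len(nrow(count))) {
    d <- count[i, 1] + 1L      # convert 0-based indices to R's 1-based
    v <- count[i, 2] + 1L
    c_i <- count[i, 3]
    k_old <- z[i] + 1L

    # Remove this entry's tokens from the count tables.
    nd[d, k_old] <- nd[d, k_old] - c_i
    nw[k_old, v] <- nw[k_old, v] - c_i
    nwsum[k_old] <- nwsum[k_old] - c_i

    # Log conditional for assigning all c_i tokens of this entry to each topic.
    logp <- lgamma(nd[d, ] + alpha + c_i) - lgamma(nd[d, ] + alpha) +
      lgamma(nw[, v] + beta + c_i) - lgamma(nw[, v] + beta) -
      lgamma(nwsum + V * beta + c_i) + lgamma(nwsum + V * beta)
    p <- exp(logp - max(logp))  # stabilise before sampling
    k_new <- sample.int(K, 1L, prob = p)

    # Add the tokens back under the newly sampled topic.
    nd[d, k_new] <- nd[d, k_new] + c_i
    nw[k_new, v] <- nw[k_new, v] + c_i
    nwsum[k_new] <- nwsum[k_new] + c_i
    z[i] <- k_new - 1L          # store 0-based topic
  }
  list(z = z, nd = nd, nw = nw, nwsum = nwsum)
}

# Toy state: 2 documents, 3 vocabulary terms, 2 topics.
set.seed(1)
count <- cbind(c(0L, 0L, 1L, 1L), c(0L, 1L, 1L, 2L), c(2L, 1L, 3L, 1L))
D <- 2L; V <- 3L; K <- 2L
z <- sample.int(K, nrow(count), replace = TRUE) - 1L
nd <- matrix(0L, D, K)
nw <- matrix(0L, K, V)
for (i in seq_len(nrow(count))) {
  k <- z[i] + 1L
  nd[count[i, 1] + 1L, k] <- nd[count[i, 1] + 1L, k] + count[i, 3]
  nw[k, count[i, 2] + 1L] <- nw[k, count[i, 2] + 1L] + count[i, 3]
}
nwsum <- as.integer(rowSums(nw))

state <- gibbs_sweep_sketch(count, z, nd, nw, nwsum, alpha = 0.1, beta = 0.01)
```

A sweep only reassigns tokens between topics, so the total token count and the per-document totals are unchanged by it.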
Usage
run_lda_gibbs(
count,
K,
alpha,
beta,
n_iter = 100L,
phi = NULL,
seed = NULL,
verbose = TRUE,
progress_every = 10L
)
Arguments
- count
Integer matrix of size NZ x 3 with rows (d, v, c): 0-based document index d, 0-based word index v, and count c for that pair.
- K
Integer, number of topics. Required unless phi is supplied. If phi is provided, K is inferred from ncol(phi).
- alpha
Scalar Dirichlet prior parameter for document-topic distributions.
- beta
Scalar Dirichlet prior parameter for topic-word distributions.
- n_iter
Integer, number of Gibbs sweeps to run.
- phi
Optional V x K topic-word probability matrix used only for initializing topic assignments in init_mod_from_count().
- seed
Optional integer random seed passed to the initializer.
- verbose
Logical; if TRUE, print progress messages.
- progress_every
Integer; print a progress message every progress_every iterations.
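As an illustration of the expected input, the sketch below builds a (d, v, c) triplet matrix from a small dense document-term matrix and then calls run_lda_gibbs() with the argument names from the usage above. The toy matrix dtm and the hyperparameter values are assumptions chosen for the example, not recommended settings. The call is guarded so the snippet also runs when the package is not loaded.

```r
# Toy dense document-term matrix: 3 documents x 4 vocabulary terms.
dtm <- matrix(c(2L, 0L, 1L, 0L,
                0L, 3L, 0L, 1L,
                1L, 0L, 0L, 2L), nrow = 3, byrow = TRUE)

# Locate the non-zero cells and convert R's 1-based indices to 0-based.
nz <- which(dtm > 0L, arr.ind = TRUE)
count <- cbind(nz[, "row"] - 1L, nz[, "col"] - 1L, dtm[nz])
storage.mode(count) <- "integer"
colnames(count) <- c("d", "v", "c")

# Order by document, then word (a readability choice; not stated as required).
count <- count[order(count[, "d"], count[, "v"]), , drop = FALSE]

# Run the sampler only if the function is available in this session.
if (exists("run_lda_gibbs")) {
  mod <- run_lda_gibbs(
    count,
    K = 2L,
    alpha = 0.1,      # illustrative value
    beta = 0.01,      # illustrative value
    n_iter = 50L,
    seed = 42L,
    verbose = FALSE
  )
}
```

Each row of count describes one non-zero cell of dtm, so nrow(count) is the NZ reported in the returned model.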
Value
A list mod containing:
- z
Integer vector of length NZ; final topic assignments (0-based).
- nd
D x K document-topic count matrix.
- nw
K x V topic-word count matrix.
- ndsum
Integer vector of length D; document token counts.
- nwsum
Integer vector of length K; topic token counts.
- phi
V x K topic-word posterior mean \(p(w \mid z=k)\) computed from nw.
- theta
D x K document-topic posterior mean \(p(z=k \mid d)\) computed from nd.
- loglik_trace
Vector of log-likelihoods.
- D
Number of documents.
- V
Vocabulary size.
- K
Number of topics.
- NZ
Number of non-zero (d, v, c) entries.
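The posterior means phi and theta are described above as computed from nw and nd. A minimal sketch of the usual smoothed estimators for collapsed LDA follows, assuming symmetric scalar priors alpha and beta. The helper name posterior_means_sketch is hypothetical, and the package's exact formula may differ.

```r
# Hypothetical helper: smoothed posterior means from the count matrices.
posterior_means_sketch <- function(nd, nw, alpha, beta) {
  K <- ncol(nd)  # nd is D x K
  V <- ncol(nw)  # nw is K x V
  # theta[d, k] = (nd[d, k] + alpha) / (ndsum[d] + K * alpha)
  theta <- (nd + alpha) / (rowSums(nd) + K * alpha)
  # phi[v, k] = (nw[k, v] + beta) / (nwsum[k] + V * beta), transposed to V x K
  phi <- t((nw + beta) / (rowSums(nw) + V * beta))
  list(phi = phi, theta = theta)
}

# Toy counts: 2 documents, 2 topics, 3 vocabulary terms.
nd <- matrix(c(3L, 0L, 1L, 4L), nrow = 2)   # rows: documents
nw <- rbind(c(2L, 1L, 0L), c(0L, 3L, 2L))   # rows: topics
est <- posterior_means_sketch(nd, nw, alpha = 0.1, beta = 0.01)
```

With these estimators each row of theta and each column of phi sums to 1, which is a quick sanity check to run on a returned model.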