| Title: | Monte Carlo Simulations of Time Changes in Sequences |
|---|---|
| Description: | Generates replicated sets of sequences with Monte Carlo simulated timing changes and computes various indicators for evaluating effects of timing uncertainty on sequence analysis results. See Ritschard, G. and Liao, T.F. (2026): "Assessing the Impact of Timing Errors in Sequence Analysis". International Journal of Social Research Methodology <doi:10.1080/13645579.2026.2666297>. |
| Authors: | Gilbert Ritschard [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-7776-0903>), Tim F. Liao [ctb] (ORCID: <https://orcid.org/0000-0002-1296-7660>) |
| Maintainer: | Gilbert Ritschard <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.1.0 |
| Built: | 2026-07-02 20:32:55 UTC |
| Source: | https://github.com/cran/MCseqReplic |
Cluster comparison indices (CCI) between clusters based on observed data and each of the MC-replicated partitions.

    MCclustcomp(clustlist, clust.o = NULL, weights = NULL, AMI = FALSE)

| clustlist | List of MC-replicated vectors of cluster memberships. |
|---|---|
| clust.o | Cluster memberships based on observed dissimilarities. |
| weights | vector of doubles. Case weights. If |
| AMI | logical. Should AMI also be computed? AMI is more costly. Default is FALSE. |

When clust.o=NULL, the last element of clustlist is taken as clust.o and the other elements as the MC-replicated partitions.
For a description of the CCIs, see Sundqvist et al. (2022).
A table whose columns are the comparison scores provided by aricode::compare_clustering for each replicated set, except Chi2, which is replaced by Cramér's V.
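The replacement of Chi2 by Cramér's V can be illustrated with a minimal base-R sketch. The helper name `cramers_v` and the toy membership vectors are hypothetical, and this is not necessarily how MCclustcomp computes the value (weights are ignored here). It only applies the standard definition, V = sqrt(Chi2 / (n * (min(r, c) - 1))), to the cross-table of two partitions.

```r
# Hypothetical helper: Cramer's V between two partitions (unweighted).
# Assumes each partition has at least two clusters.
cramers_v <- function(c1, c2) {
  tab <- table(c1, c2)
  n <- sum(tab)
  # Pearson chi-square computed by hand from expected counts
  expected <- outer(rowSums(tab), colSums(tab)) / n
  chi2 <- sum((tab - expected)^2 / expected)
  sqrt(chi2 / (n * (min(dim(tab)) - 1)))
}

clust.obs <- c(1, 1, 2, 2, 1, 2)   # toy "observed" partition
clust.mc1 <- c(1, 1, 2, 2, 1, 2)   # identical partition
clust.mc2 <- c(1, 2, 1, 2, 1, 2)   # different partition

v.same <- cramers_v(clust.obs, clust.mc1)   # perfect association
v.diff <- cramers_v(clust.obs, clust.mc2)   # weak association
```

Unlike Chi2, V is bounded between 0 and 1, which makes values comparable across replicated sets.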
Chiquet J, Rigaill G, Sundqvist M, Dervieux V, Bersani F (2023).
“aricode: Efficient Computations of Standard Clustering Comparison Measures.”
Comprehensive R Archive Network, CRAN.
doi:10.32614/CRAN.package.aricode.
Sundqvist M, Chiquet J, Rigaill G (2022).
“Adjusting the adjusted Rand Index: A multinomial story.”
Computational Statistics, 38(1), 327-347.
doi:10.1007/s00180-022-01230-7.

    ## mini test data, 6 sequences of length 4, 4 unique sequences
    exdata <- read.table(text="
    a a b b
    a a b b
    b b a a
    a c c b
    b b a c
    b b a c
    ")
    weights=rep(1, nrow(exdata))
    s.exdata <- seqdef(exdata, weights = weights, id=paste("id",1:nrow(exdata), sep=""))
    ## 3 altered sequence datasets
    set.seed(25)
    altseq.list <- MCseqReplicate(s.exdata, J=1, R=3)
    ## list of dissimilarity matrices
    disslist <- MCdisslist(altseq.list, method="LCS")
    diss.o <- seqdist(s.exdata, method="LCS")
    ## cluster per MC-dissimilarity matrices
    library(WeightedCluster)
    clust.o <- wcKMedoids(diss.o, k=2, cluster.only=TRUE)
    clustlist <- lapply(disslist, wcKMedoids, k=2, cluster.only=TRUE)
    res <- MCclustcomp(clustlist, clust.o=clust.o)
    res

Cluster quality measures for a range of number of groups by MC-replicated set.
ggplotMCcqi makes a ggplot of the range of values of the selected CQI by MC-sets and for the observed sequences. When attr(data,"obs") is TRUE, the range of CQI values for the observed sequences is also plotted.
By default, the print method prints only the qual.max and max.freq tables of MCclustQ objects.

    MCclustqual(
      disslist,
      ncluster = 10,
      clustmeth = "PAM",
      weights = NULL,
      core = 1,
      snow = TRUE,
      verbose = !silent,
      silent = FALSE,
      ...
    )

    ggplotMCcqi(
      data,
      cqi = "PBC",
      meancqi = TRUE,
      scalelwd = 1,
      linecolor = NULL,
      ...
    )

    ## S3 method for class 'MCclustQ'
    print(x, all = FALSE, nMC = 5, ...)

| disslist | List of MC-dissimilarity matrices (or |
|---|---|
| ncluster | integer vector. Maximum number of groups. Default is |
| clustmeth | character. Clustering method. Either |
| weights | vector of doubles. Case weights. If |
| core | Integer or |
| snow | Logical. If |
| verbose | Logical. Should waiting and timing messages be printed? |
| silent | Logical. Deprecated, use |
| ... | further arguments passed to or from other methods. |
| data | an |
| cqi | string. The name of the selected CQI. |
| meancqi | logical. Should the range of mean values of the selected CQI be plotted? |
| scalelwd | double. Line width scale value. |
| linecolor | vector of three line colors in the order Mean, Obs, MCset. If |
| x | |
| all | logical. Should tables by MC-sets also be printed? Default is |
| nMC | numeric. Maximal number of MC-sets for which optimal size by CQIs are printed. Default is 5. |

When attr(disslist,"obs") is TRUE, the last element of disslist is treated as the dissimilarity matrix of the observed sequences.
MCclustqual computes the range of CQI values for all the CQIs included in the stats element returned by WeightedCluster::wcClusterQuality.
MCclustqual returns a list of length 3:
- qual.tab: list of tables of cluster quality statistics per MC-dissimilarity matrix,
- qual.max: table of cluster number $k$ for which the statistics reach their maximum (minimum for HC) by MC-sets and observed sequence set (rows),
- max.freq: the frequency table of optimal $k$ over the MC-replicated sets.

ggplotMCcqi returns the ggplot object.
The print method returns the last printed tables.
Gilbert Ritschard
Studer M (2013). “WeightedCluster Library Manual: A practical guide to creating typologies of trajectories in the social sciences with R.” LIVES Working Papers 24, NCCR LIVES, Switzerland. doi:10.12682/lives.2296-1658.2013.24.

    ## mini test data, 6 sequences of length 4, 4 unique sequences
    exdata <- read.table(text="
    a a b b
    a a b b
    b b a a
    a c c b
    b b a c
    b b a c
    ")
    weights=rep(1, nrow(exdata))
    s.exdata <- seqdef(exdata, weights = weights, id=paste("id",1:nrow(exdata), sep=""))
    ## 3 altered sequence datasets
    set.seed(25)
    altseq.list <- MCseqReplicate(s.exdata, J=1, R=3)
    ## list of dissimilarity matrices
    disslist <- MCdisslist(altseq.list, method="LCS")
    diss.o <- seqdist(s.exdata, method="LCS")
    ## cluster per MC-dissimilarity matrices
    res <- MCclustqual(disslist,ncluster=3, verbose=FALSE)
    res
    ggplotMCcqi(res,"PBC")

Collects statistics and p-values for the comparison of groups of sequences.

    MCcompgrp(
      disslist,
      group,
      weights = NULL,
      dissassoc.args = list(),
      dissCompare.args = list(),
      verbose = TRUE
    )

| disslist | list of dissimilarity matrices or |
|---|---|
| group | vector of group memberships of length equal to number of rows of the dissimilarity matrices. |
| weights | vector of case weights |
| dissassoc.args | list of additional arguments passed to |
| dissCompare.args | list of additional arguments passed to |
| verbose | logical. Should messages be printed? |

The function collects the values of R2 and its p-value (Studer et al. 2011) returned by TraMineR::dissassoc and the values of LRT, its p-value, and delta BIC (Liao and Fasang 2020) returned by TraMineRextras::dissCompare. Since dissCompare works only with two groups, only R2 and its p-value are returned when there are more than two groups.
Except for group and weights, dissassoc and dissCompare are called by default with the default values of their arguments. This can be changed by passing the desired arguments as a list to dissassoc.args and dissCompare.args.
The R2 and its p-value are computed by dissassoc, which computes the p-value using permutation tests. The default number of permutations is R=1000, but this can be changed by means of the dissassoc.args argument, for example, by passing dissassoc.args = list(R=500).
The LRT and delta BIC are computed by dissCompare, which computes the LRT for samples of s data, with s possibly greater than the number of observed data. When s=0 (default in MCcompgrp), no sampling is applied. dissCompare computes the p-value of LRT using the appropriate Chi-square distribution. In case of multiple samples, i.e. when s is smaller than the greatest group size, BFopt=1 is used by default. BFopt=NULL could generate unpredictable results in that case.
Liao TF, Fasang AE (2020).
“Comparing Groups of Life-Course Sequences Using the Bayesian Information Criterion and the Likelihood-Ratio Test.”
Sociological Methodology, 51(1), 44-85.
doi:10.1177/0081175020959401.
Ritschard G, Liao TF (2026).
“Assessing the Impact of Timing Errors in Sequence Analysis.”
International Journal of Social Research Methodology.
doi:10.1080/13645579.2026.2666297.
Studer M, Ritschard G, Gabadinho A, Müller NS (2011).
“Discrepancy Analysis of State Sequences.”
Sociological Methods and Research, 40(3), 471-510.
doi:10.1177/0049124111415372.

    ## mini test data, 6 sequences of length 4, 4 unique sequences
    exdata <- read.table(text="t1 t2 t3 t4 sex
    a a b b f
    a a b b f
    b b a a f
    a c c b m
    b b a c m
    b b a c m
    ", header=TRUE)
    weights=rep(1, nrow(exdata))
    s.exdata <- seqdef(exdata[,1:4], weights = weights, id=paste("id",1:nrow(exdata), sep=""))
    ## 3 altered sequence datasets
    set.seed(25)
    altseq.list <- MCseqReplicate(s.exdata, J=1, R=3)
    ## list of dissimilarity matrices
    disslist <- MCdisslist(altseq.list, method="LCS")
    ## Group comparison per MC-dissimilarity matrices
    res <- MCcompgrp(disslist,group=exdata$sex)
    res

Correlation between observed and MC-simulated distances

    MCdisscorr(disslist, diss.o = NULL, method = "Spearman", weights = NULL)

| disslist | List of matrices or dist objects: the MC-replicated dissimilarities |
|---|---|
| diss.o | Matrix or dist object: Observed dissimilarities |
| method | String. One of |
| weights | vector of doubles. Case weights. If |

When diss.o=NULL, the last element of disslist is taken as diss.o and the other elements as sets of MC-replicated dissimilarities.
Vector of correlations between observed and MC-dissimilarities.
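The idea of correlating observed and replicated distances can be sketched in base R as follows. This is an unweighted illustration on hypothetical toy values, not the package's implementation; note that base `cor()` spells the method `"spearman"` in lower case.

```r
# Toy observed dissimilarities between 4 cases (hypothetical values)
diss.obs <- dist(c(0, 1, 3, 6))
# Two toy "replicated" dissimilarity objects
diss.mc <- list(
  dist(c(0, 1, 3, 6) * 2),   # same ranking of pairwise distances
  dist(c(0, 5, 1, 2))        # different ranking
)
# Unweighted Spearman correlation between the pairwise distances
# (lower triangle) of the observed and each replicated set
corr <- sapply(diss.mc, function(d)
  cor(as.vector(diss.obs), as.vector(d), method = "spearman"))
corr
```

A correlation close to 1 indicates that the ranking of pairwise distances is barely affected by the simulated timing changes.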

    ## mini test data, 6 sequences of length 4, 4 unique sequences
    exdata <- read.table(text="
    a a b b
    a a b b
    b b a a
    a c c b
    b b a c
    b b a c
    ")
    weights=rep(1, nrow(exdata))
    s.exdata <- seqdef(exdata, weights = weights, id=paste("id",1:nrow(exdata), sep=""))
    ## 3 altered sequence datasets
    set.seed(25)
    altseq.list <- MCseqReplicate(s.exdata, J=1, R=3, include.obs=TRUE)
    ## list of dissimilarity matrices
    disslist <- MCdisslist(altseq.list)
    MCdisscorr(disslist)

Compute the dissimilarity matrix for each of the provided sets of sequences.

    MCdisslist(
      MCrseqdata,
      method = "LCS",
      seqref = NULL,
      full.matrix = FALSE,
      use.udiss = FALSE,
      ...
    )

| MCrseqdata | List of state sequence objects of class |
|---|---|
| method | string. Name of a distance method (see |
| seqref | state sequence object of class |
| full.matrix | logical. Should pairwise distances be returned in matrix form? If |
| use.udiss | logical. Should computation be based on unique sequences? |
| ... | further arguments passed to |

When use.udiss=TRUE, the function first computes dissimilarities between unique merged replicated sequences through a single call to seqdist(), and the set of dissimilarity matrices is then extracted from the resulting distance matrix. This is generally faster when the number of unique merged replicated sequences is less than sqrt(number of replicated datasets) * (sample size), which can be checked with MCnunique.
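One way to read this threshold: a single call on nu unique sequences involves on the order of nu^2 pairs, whereas R separate calls on n sequences involve on the order of R * n^2 pairs, and nu^2 < R * n^2 is equivalent to nu < sqrt(R) * n. The base-R sketch below applies the rule to hypothetical figures; the numbers and variable names are illustrative assumptions only.

```r
# Hypothetical figures (assumptions for illustration)
n  <- 500    # sample size: sequences per dataset
R  <- 100    # number of replicated datasets
nu <- 3200   # unique sequences across the merged replicated datasets

threshold <- sqrt(R) * n          # 10 * 500
use.udiss <- nu < threshold       # TRUE suggests use.udiss = TRUE
c(threshold = threshold, use.udiss = use.udiss)
```

In practice, nu would be taken from the output of MCnunique rather than set by hand.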
list of dissimilarity matrices or dist objects with logical attribute "obs", which is TRUE when the list includes the dissimilarities between observed sequences as last element.
MCseqReplicate, MCudist and examples in their help pages.
Extract k-th dissimilarity matrix from u.diss

    MCExtractDist(u.diss, k, full.matrix = FALSE)

| u.diss | |
|---|---|
| k | integer. Subset index number for which the dissimilarity matrix must be extracted |
| full.matrix | logical. If |

a dissimilarity matrix or distance object.
Correlation between 1st MDS factor of observed and MC-simulated distances

    MCmdscorr(
      disslist,
      diss.o = NULL,
      method = "Spearman",
      weights = NULL,
      what = "corr",
      core = 1,
      snow = TRUE,
      verbose = !silent,
      silent = FALSE
    )

| disslist | List of matrices or dist objects: the MC-replicated dissimilarities |
|---|---|
| diss.o | Matrix or dist object: Observed dissimilarities |
| method | String. One of |
| weights | vector of doubles. Case weights. If |
| what | String. One of |
| core | Integer or |
| snow | Logical. If |
| verbose | Logical. Should waiting and timing messages be printed? |
| silent | Logical. Deprecated, use |

When diss.o=NULL, the last element of disslist is taken as diss.o and the other elements as sets of MC-replicated dissimilarities.
When what="corr", a vector of correlations between the first MDS factor of the observed dissimilarities and that of each MC-replicated set; when what="mds", the list of first MDS scores; and when what="both", a list with corr as first element and mdslist, the list of MDS scores, as second element.
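The principle can be sketched in base R with classical MDS (`cmdscale`). This is an unweighted illustration on hypothetical toy values, not the package's implementation. Taking the absolute value of the correlation is an assumption of this sketch, made because the sign of an MDS factor is arbitrary; the helper name `mds1` is hypothetical.

```r
# Toy dissimilarities (hypothetical values)
x.obs <- c(0, 1, 3, 6, 10)
diss.obs <- dist(x.obs)
diss.mc <- list(
  dist(x.obs + c(0, 0.2, -0.1, 0.3, 0)),  # small perturbation, same ordering
  dist(c(0, 10, 1, 6, 3))                 # different ordering of the cases
)

# Hypothetical helper: first factor of classical (metric) MDS
mds1 <- function(d) cmdscale(d, k = 1)[, 1]

f.obs <- mds1(diss.obs)
# Absolute Spearman correlation between first MDS factors
corr <- sapply(diss.mc, function(d)
  abs(cor(f.obs, mds1(d), method = "spearman")))
corr
```

Comparing first MDS factors summarizes each dissimilarity matrix by one score per case, so the correlation is computed over cases rather than over pairs of cases.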

    ## mini test data, 6 sequences of length 4, 4 unique sequences
    exdata <- read.table(text="
    a a b b
    a a b b
    b b a a
    a c c b
    b b a c
    b b a c
    ")
    weights=rep(1, nrow(exdata))
    s.exdata <- seqdef(exdata, weights = weights, id=paste("id",1:nrow(exdata), sep=""))
    ## 3 altered sequence datasets
    set.seed(25)
    altseq.list <- MCseqReplicate(s.exdata, J=1, R=3)
    ## list of dissimilarity matrices
    disslist <- MCdisslist(altseq.list)
    MCmdscorr(disslist)

Number of unique replicated sequences

    MCnunique(MCrseqdata, check = FALSE)

| MCrseqdata | list of replicated |
|---|---|
| check | logical. When |

nu, the number of unique replicated sequences, and, when check=TRUE, u.ok, the check result.
Generates a distribution of timing errors that complies with the provided expected size of non-zero timing errors and the expected probability of no error.

    MCpj(Emean, pzero = NULL, maxterr = 10, pinterv = 0.99, fill.short.side = TRUE)

| Emean | scalar or vector of size two. Expected size of non-zero timing errors. If a vector, the first value is used for negative errors and the second value for positive errors. If a scalar, the value is used for both negative and positive errors. Values must be strictly greater than 1. |
|---|---|
| pzero | number in range [0,1]. Probability of no-error. If |
| maxterr | integer. Maximal error size to consider. Default is 10. |
| pinterv | control value used for solving numerically an implicit function. Default is .99 and should be increased in case the zero of the implicit function cannot be found because of ending values of same sign. |
| fill.short.side | logical. Should the shortest side be filled with zeros to equal the length of the other side? Default is |

Currently, MCseqReplicate expects a vector Pj with the same number of backward and forward error values. To comply with this, the shorter side of Pj is by default filled with zeros.
The vector of probabilities Pj with the computed lambda values as attribute.

    # expected timing error of 1.2 on each side
    MCpj(Emean=1.2, pzero=.4)
    # expected backward timing error higher than for forward errors
    MCpj(Emean=c(3.5,1.2), pzero=.4)

Ratios of the observed distances to their MC standard errors and of the mean MC-simulated distances to the standard error of the mean.

    MCratios(object, diss.o = NULL)

| object | Object of class |
|---|---|
| diss.o | Matrix or |

The standard error of the mean simulated distances is mean.se = MC.se/sqrt(R) (or mean.se = MC.sd/R when object is obtained with seqdistMCSE::seqdistMCSE, because there are R*R simulated distances in that case). The ratios computed are diss.z = diss.o/MC.se, where diss.o is the distance between observed sequences, and MC.mean.z = MC.mean/mean.se with MC.mean the mean of the MC-simulated distances.
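The arithmetic of these ratios can be sketched in base R on hypothetical toy distances. This is not the package's implementation: in particular, the sketch assumes that the standard deviation of the replicated distances (MC.sd) stands in for MC.se, which may differ from what MCseqdistSE returns, and it assumes no pair has zero standard deviation.

```r
# Toy setting: R = 4 replicated dist objects for 3 cases (hypothetical values)
diss.obs <- dist(c(0, 2, 5))
diss.mc <- list(dist(c(0, 2, 5)), dist(c(0, 3, 5)),
                dist(c(0, 2, 6)), dist(c(0, 1, 4)))
R <- length(diss.mc)

# One row per pair of cases, one column per replication
dmat <- sapply(diss.mc, as.vector)
MC.mean <- rowMeans(dmat)
MC.sd   <- apply(dmat, 1, sd)

# Assumption of this sketch: MC.sd is used in place of MC.se
mean.se   <- MC.sd / sqrt(R)                # standard error of the mean
diss.z    <- as.vector(diss.obs) / MC.sd    # observed distance / MC.se
MC.mean.z <- MC.mean / mean.se              # mean MC distance / mean.se
```

Because mean.se shrinks with sqrt(R), MC.mean.z grows with the number of replications, whereas diss.z does not depend on R beyond the estimation of the standard deviation.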
When diss.o=NULL, the diss.o element of object is used when it exists.
This function is handy for computing ratios afterwards from the outcome of MCseqdistSE obtained with ratios=FALSE.
diss.z, MC.mean.z, and mean.se (the three as dist objects).
Gilbert Ritschard
print.MCratios and MCseqdistSE
Computes the mean and standard deviation of each element of the pairwise distance matrix across sets of MC-replicated sequences.

    MCseqdistSE(
      dissrepl = "LCS",
      MCrseqdata = NULL,
      udiss = FALSE,
      full.matrix = FALSE,
      ...
    )

| dissrepl | list, string, or object of class |
|---|---|
| MCrseqdata | list of MC-replicated sequence datasets of class |
| udiss | logical. When |
| full.matrix | logical. Should dissimilarities be organized in matrix form? Default is |
| ... | further arguments passed to |

Providing u.diss distances or computing distances with MCudist may be faster and can save space when the number of unique replicated sequences is smaller than the sample size times the square root of R, which can be checked with MCnunique. When the number of unique replicated sequences largely exceeds the threshold, it is more efficient to compute distance matrices separately for each updated set of sequences with MCdisslist or by setting udiss=FALSE.
Five objects:
- MCmean: Mean of distance objects over replicated sets of sequences.
- MCsd: Standard deviation of distances over replicated sets of sequences.

In addition, when the observed distances are provided as last element of the dissrepl list:
- MCbias: Difference between observed distance and MCmean.
- MCse: Standard error of individual distances.
- MCmse: Mean square error of individual distances.

The five objects are of class dist when attr(MCrseqdata,"toref")==FALSE and matrices otherwise.
MCseqReplicate, MCdisslist, MCudist, print.distMC, summary.distMC

    # example code
    exdata <- read.table(text="
    a a b b
    a a b b
    b b a a
    a c c b
    b b a c
    b b a c
    ")
    weights=rep(1, nrow(exdata))
    s.exdata <- seqdef(exdata, weights = weights, id=paste("id",1:nrow(exdata), sep=""))
    ## 3 MC-replicated sequence datasets
    altseq.list <- MCseqReplicate(s.exdata, J=1, R=3, include.obs=TRUE)
    ## list of dissimilarity matrices
    disslist <- MCdisslist(altseq.list, method="HAM")
    MCdselist <- MCseqdistSE(disslist)
    print(MCdselist)
    MCratioslist <- MCratios(MCdselist)
    print(MCratioslist)

R stslist state sequence objects are generated by applying the chosen timing error model to the provided state sequence object.

    MCseqReplicate(
      seqdata,
      J = 1,
      R = 20,
      silent = FALSE,
      unique = FALSE,
      model = "keep.dss",
      jfixed = FALSE,
      kchanges = NULL,
      include.obs = FALSE
    )

| seqdata | A state sequence |
|---|---|
| J | Integer or vector of positive numbers. If an integer, maximal timing error (number of unit periods around first state of new spell). Default is |
| R | Integer. Number of random replicated sequence data. Default is |
| silent | Logical. Should waiting and timing messages be hidden? |
| unique | Logical. Should only unique sequences be replicated? Default is |
| model | String. Time alteration model. One of |
| jfixed | Logical. Should same error j be applied to all transitions in a sequence? Default is |
| kchanges | Integer, string, or |
| include.obs | logical. Should the observed sequence data be added as last element? |

This function is handy for testing how the outcome of a sequence analysis may vary with timing errors in the reported sequences.
Use the vector form of J to specify the probability distribution of the timing error. See function MCpj to generate a probability vector that complies with expected mean timing errors.
List of R altered stslist objects plus observed sequence object as last element when include.obs=TRUE.
Gilbert Ritschard
Ritschard G, Liao TF (2026). “Assessing the Impact of Timing Errors in Sequence Analysis.” International Journal of Social Research Methodology. doi:10.1080/13645579.2026.2666297.

    ## mini test data, 6 sequences of length 4, 4 unique sequences
    exdata <- read.table(text="
    a a b b
    a a b b
    b b a a
    a c c b
    b b a c
    b b a c
    ")
    weights=rep(1, nrow(exdata))
    s.exdata <- seqdef(exdata, weights = weights, id=paste("id",1:nrow(exdata), sep=""))
    ## 3 altered sequence datasets
    (altseq.list <- MCseqReplicate(s.exdata, J=1, R=3))
    ## list of dissimilarity matrices
    suppressMessages(dist.list <- lapply(altseq.list, seqdist, method="LCS", full.matrix=FALSE))
    dist.list
    ## Can also be obtained with MCdisslist, which offers option use.udiss;
    ## use.udiss=TRUE is faster when number of unique merged replicated
    ## sequences is less than n*sqrt(R).
    suppressMessages(dist.list <- MCdisslist(altseq.list, method="LCS", use.udiss=TRUE))
    ## Replication based on expected left and right non-zero errors of 1.1
    ## and assuming a 0.5 probability of no error
    Pj <- MCpj(Emean=1.1, pzero=.5)
    (altseq2.list <- MCseqReplicate(s.exdata, J=Pj, R=3))

Returns the dissimilarity matrix (or dist object) between merged replicated sequences with the disaggregation indexes as attribute.

    MCudist(MCrseqdata, method = "LCS", seqref = NULL, ...)

| MCrseqdata | list of replicated |
|---|---|
| method | string. Name of distance method (see |
| seqref | state sequence object of class |
| ... | Further arguments passed to |

object of class u.diss (pairwise dissimilarities between unique sequences) with three attributes: sdx, the inverted aggregation indexes; N, the number of datasets; and obs, a logical indicating whether k=N corresponds to observed sequences.

    ## mini test data, 6 sequences of length 4, 4 unique sequences
    exdata <- read.table(text="
    a a b b
    a a b b
    b b a a
    a c c b
    b b a c
    b b a c
    ")
    weights=rep(1, nrow(exdata))
    s.exdata <- seqdef(exdata, weights = weights, id=paste("id",1:nrow(exdata), sep=""))
    ## 3 altered sequence datasets
    (altseq.list <- MCseqReplicate(s.exdata, J=1, R=3))
    MCnunique(altseq.list, check=TRUE)
    u.diss <- MCudist(altseq.list, method="LCS", full.matrix=FALSE)
    ## Dissimilarities within first MC-set
    MCExtractDist(u.diss, 1)
    ## list of dissimilarity matrices
    disslist <- MCdisslist(altseq.list, use.udiss=TRUE)

Prints, for each pair of the first n sequences, the mean and/or the standard deviation of the MC-replicated distances between sequences. When available, ratios are also printed by default.

    ## S3 method for class 'distMC'
    print(x, n = 6, what = "all", ...)

| x | |
|---|---|
| n | Integer. Number of first sequences. Default is 6. If |
| what | character string. One of |
| ... | further arguments passed to or from other methods. |

Last printed table, a matrix when toref attribute is TRUE and a dist object otherwise.
Gilbert Ritschard
Prints ratios for each pair of the first n sequences.

    ## S3 method for class 'MCratios'
    print(x, n = 6, what = "all", ...)

| x | |
|---|---|
| n | Integer. Number of first sequences. Default is 6. If |
| what | character string. One of |
| ... | further arguments passed to or from other methods. |

Last printed table, a matrix when toref attribute is TRUE and a dist object otherwise.
Gilbert Ritschard
Prints summary statistics of the observed dissimilarity diss, the mean MC.mean, standard deviation MC.sd, and standard error of dissimilarities between MC-replicated sequences, and the ratios diss/MC.se and MC.mean/MC.se. Reported statistics concern all distances between original sequences.

    ## S3 method for class 'distMC'
    summary(object, ..., silent = FALSE)

| object | |
|---|---|
| ... | further arguments passed to or from other methods. |
| silent | logical: Should additional info be displayed? |

fivenumb table with the statistics (min, Q1, med, Q3, max) of the observed dissimilarities, the mean, standard deviation, and standard error of the MC-simulated dissimilarities, standardized ratios, MC-bias and mean squared errors when available.
Gilbert Ritschard
Prints summary statistics of the ratios diss/MC.se and MC.mean/MC.se. Reported statistics concern all distances between original sequences.

    ## S3 method for class 'MCratios'
    summary(object, ..., weights = NULL, silent = FALSE, thresh = 2)

| object | |
|---|---|
| ... | further arguments passed to or from other methods. |
| weights | vector of doubles. Case weights. |
| silent | logical: Should additional info be displayed? |
| thresh | real: threshold for counting ratios less than |

fivenumb table with the statistics (min, Q1, med, Q3, max) of mean.se and the standardized ratios diss.z and MC.mean.z.
Gilbert Ritschard