Identify and count spells (Distinctive events within each group)

I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns are what I have; the fourth column, spell, is what I'm trying to compute. I've tried using dplyr's lead and lag, but that gets too complicated. I've tried rle but got nowhere.

enter image description here

ReprEx

df <- structure(list(time = structure(c(1538876340, 1538876400, 
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), 
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))

I prefer a tidyverse solution.

Assumptions

Data is sorted by group and then by time

There are no gaps in time within each group

Update

Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)

1. the rle approach by @markus took 0.53 seconds
2. the cumsum approach by @M-M took 2.85 seconds
3. the function approach by @MrFlick took 0.66 seconds
4. the rle and dense_rank approach by @tmfmnk took 0.89 seconds

I ended up choosing (1) by @markus because it's fast and still somewhat intuitive (subjective). (2) by @M-M best satisfied my desire for a dplyr solution, though it is computationally inefficient.

edited Apr 3 at 3:07

asked Apr 1 at 20:44

Thomas Speidel

349216

5

For someone who is not familiar with how the spell is computed, can you share a formula or description?

– nsinghs
Apr 1 at 20:55

@nsinghs I think they mean "hospital spell"

– zx8754
Apr 1 at 21:29

Curious for the results if you timed my answer? You should also consider accepting the best answer.

– Hector Haffenden
Apr 2 at 21:28

r dataframe dplyr time-series tidyverse

6 Answers

One option using rle

library(dplyr)
df %>% 
  group_by(group) %>% 
  mutate(
    spell = {
      r <- rle(is.5)
      r$values <- cumsum(r$values) * r$values
      inverse.rle(r) 
    }
  )
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2

You asked for a tidyverse solution, but if speed is your concern, you might use data.table. The syntax is very similar.

library(data.table)
setDT(df)[, spell := {
  r <- rle(is.5)
  r$values <- cumsum(r$values) * r$values
  inverse.rle(r) 
  }, by = group][] # the [] at the end prints the data.table

explanation

When we call

r <- rle(df$is.5)

the result we get is

r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1

We need to replace values with the cumulative sum where values == 1, while values should remain zero otherwise.

We can achieve this by multiplying cumsum(r$values) by r$values, the latter being a vector of 0s and 1s.

r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5

Finally we call inverse.rle to get back a vector of the same length as is.5.

inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5

We do this for every group.
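
The same steps can be wrapped in a small function and applied per group without any packages. This is only a sketch: the name spell_id and the use of base R's ave() are assumptions made for illustration and are not part of the answer above. It also assumes the flag column contains only 0s and 1s.

```r
# Hypothetical helper (name chosen for illustration): numbers each run of 1s
# as 1, 2, 3, ... and leaves 0s as 0. Assumes x contains only 0s and 1s.
spell_id <- function(x) {
  r <- rle(x)
  r$values <- cumsum(r$values) * r$values  # count the runs of 1s, keep 0 for runs of 0s
  inverse.rle(r)                           # expand back to the original length
}

is.5  <- c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)
group <- rep(c("A", "B"), times = c(8, 6))

# ave() applies spell_id separately within each group
spell <- ave(is.5, group, FUN = spell_id)
spell
# [1] 0 1 1 0 2 0 0 3 0 0 1 1 0 2
```

Because the counter restarts inside each call, group B starts again at 1, which matches the output table shown earlier.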

edited Apr 2 at 8:11

answered Apr 1 at 21:05

markus

15.7k11336

1

I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.

– M-M
Apr 1 at 22:55

1

@M-M Added some explanation. Thanks for the comment.

– markus
Apr 1 at 23:09

Here's a helper function that can return what you are after

spell_index <- function(time, flag) {
  change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
  cumsum(change) * (flag==1)+0
}

And you can use it with your data like

library(dplyr)
df %>% 
 group_by(group) %>% 
 mutate(
 spell = spell_index(time, is.5)
 )

Basically, the helper function uses lag() to look for changes. We use cumsum() to increment the number of changes. Then we multiply by a boolean value to zero out the values you want to be zeroed out.
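
One thing to be aware of is that time-lag(time)==1 compares a difftime whose unit R picks automatically, so the meaning of "1" depends on the spacing of the data. The following is a sketch of a variant that states the unit explicitly and uses only base R. The name spell_index_explicit and the step_mins parameter are hypothetical, and the first row is treated as having no earlier flag.

```r
# Hypothetical variant (not the answer's function): the expected gap between
# consecutive rows is given explicitly in minutes.
spell_index_explicit <- function(time, flag, step_mins = 1) {
  # gap to the previous row in minutes; the first row has no predecessor,
  # so it is treated as if it followed at the expected step
  gap <- c(step_mins,
           as.numeric(difftime(time[-1], time[-length(time)], units = "mins")))
  prev_flag <- c(0, head(flag, -1))        # previous flag, 0 before the first row
  # same three conditions as the original helper
  change <- gap == step_mins & flag == 1 & prev_flag != 1
  cumsum(change) * (flag == 1)
}

time <- as.POSIXct("2018-10-07 01:39:00", tz = "UTC") + 60 * 0:7
flag <- c(0, 1, 1, 0, 1, 0, 0, 1)
res  <- spell_index_explicit(time, flag)
res
# [1] 0 1 1 0 2 0 0 3
```

Since the first row is given a previous flag of 0 rather than NA, a group that begins with a 1 is counted as the start of spell 1.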

answered Apr 1 at 20:57

MrFlick

125k12142175

Here is one option with rleid from data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)) and, grouped by 'group', take the run-length id (rleid) of 'is.5' and multiply it by the values of 'is.5', so that the ids belonging to runs of 0 in is.5 become 0; assign the result to 'spell'. Then use a logical vector in i to select the rows where 'spell' is not zero, match those values of 'spell' against the unique 'spell' values, and assign the result back to 'spell'.

library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
 ][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2

Or after the first step, use .GRP

df[!!spell, spell := .GRP, spell]
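
To see what each step produces, here is a sketch of the same idea on a single group using base R only. The cumsum() line is a stand-in for data.table's rleid(), and the variable names are chosen for illustration.

```r
flag <- c(0, 1, 1, 0, 1, 0, 0, 1)

# stand-in for rleid(): a new id starts whenever the value differs from the previous one
run_id <- cumsum(c(TRUE, flag[-1] != flag[-length(flag)]))
run_id
# [1] 1 2 2 3 4 5 5 6

# multiplying by the flag zeroes the ids that belong to runs of 0
spell <- run_id * flag
spell
# [1] 0 2 2 0 4 0 0 6

# renumber the remaining ids 1, 2, 3, ... in order of first appearance
nz <- spell != 0
spell[nz] <- match(spell[nz], unique(spell[nz]))
spell
# [1] 0 1 1 0 2 0 0 3
```

The renumbering in this sketch is done within one group; with several groups it would need to be repeated per group.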

edited Apr 2 at 2:41

answered Apr 2 at 2:35

akrun

422k13209285

This works,

The data,

df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))

We split our data by group,

df2 <- split(df, df$group)

Build a function we can apply to the list,

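library(dplyr)

my_func <- function(dat) {
  dat %>% 
    mutate(change = diff(c(0, is.5))) %>%   # +1 where a run of 1s starts, -1 where it ends
    mutate(flag = change * abs(is.5)) %>%   # 1 only on the first row of each run of 1s
    # the source text breaks off after `is.5 == 0`; the rest of this line is a reconstruction
    mutate(spell = ifelse(is.5 == 0, 0, cumsum(flag)))
}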

Then apply it,

l <- lapply(df2, my_func)

We can now turn this list back into a data frame:

do.call(rbind.data.frame, l)
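
As a sketch of the same split/apply/combine pattern in base R only, the pieces can also be put back with unsplit(), which returns each row to its original position. The function count_spells and the small data frame below are made up for illustration. The groups are deliberately interleaved, which the question's data is not.

```r
# Hypothetical per-group function based on diff(), for illustration only
count_spells <- function(dat) {
  change <- diff(c(0, dat$is.5))            # +1 at the start of a run of 1s, -1 at its end
  dat$spell <- cumsum(change == 1) * dat$is.5
  dat
}

df_small <- data.frame(
  group = c("A", "B", "A", "B", "A", "B"),  # groups deliberately interleaved
  is.5  = c(1, 0, 0, 1, 1, 1)
)

pieces <- lapply(split(df_small, df_small$group), count_spells)
result <- unsplit(pieces, df_small$group)   # rows return to their original positions
result$spell
# [1] 1 0 0 1 2 1
```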

edited Apr 1 at 21:13

answered Apr 1 at 21:02

Hector Haffenden

632316

One option is to use cumsum:

library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>% 
 mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )


# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2

The comparison c(0,lag(is.5)[-1]) != is.5 takes care of assigning a new id (i.e. spell) whenever is.5 changes. We want to avoid assigning new ids to rows where is.5 equals 0, which is why I have the second condition inside cumsum (i.e. is.5!=0).

However, that second condition only prevents assigning a new id (adding 1 to the previous id); it does not set the id to 0. That is why I have multiplied the result by is.5.
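
The effect of each condition is easier to see on the intermediate vectors. The following sketch uses one group's values and base R only. The variable names are illustrative, and c(0, head(is5, -1)) plays the role of c(0,lag(is.5)[-1]).

```r
is5  <- c(0, 1, 1, 0, 1, 0, 0, 1)
prev <- c(0, head(is5, -1))            # previous value, 0 before the first row

changed <- prev != is5                 # TRUE whenever the value changes, in either direction
started <- changed & is5 != 0          # TRUE only on the first row of each run of 1s

running_id <- cumsum(started)          # the id carries over into the following 0 rows
running_id
# [1] 0 1 1 1 2 2 2 3

spell <- is5 * running_id              # multiplying by is5 resets those 0 rows to 0
spell
# [1] 0 1 1 0 2 0 0 3
```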

answered Apr 1 at 22:41

M-M

7,24262146

A somewhat different possibility (not involving cumsum()) could be:

library(dplyr)
df %>%
 group_by(group) %>%
 mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
 group_by(group, is.5) %>%
 mutate(spell = dense_rank(spell)) %>%
 ungroup() %>%
 mutate(spell = ifelse(is.5 == 0, 0, spell))

 time group is.5 spell
 <dttm> <chr> <dbl> <dbl>
 1 2018-10-07 01:39:00 A 0 0
 2 2018-10-07 01:40:00 A 1 1
 3 2018-10-07 01:41:00 A 1 1
 4 2018-10-07 01:42:00 A 0 0
 5 2018-10-07 01:43:00 A 1 2
 6 2018-10-07 01:44:00 A 0 0
 7 2018-10-07 01:45:00 A 0 0
 8 2018-10-07 01:46:00 A 1 3
 9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2

First, it groups by "group" and computes the run-length ID of "is.5". Second, it groups by "group" and "is.5" and ranks the run-length IDs with dense_rank(). Finally, it assigns 0 to rows where "is.5" == 0.
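
The three steps can be traced on one group with base R. This is a sketch only: the helper dense is a simplified stand-in for dplyr's dense_rank() that ignores NA handling, and ave() takes the place of the second group_by().

```r
is5 <- c(0, 1, 1, 0, 1, 0, 0, 1)

# step 1: run-length ID, 1 for the first run, 2 for the second, and so on
run_id <- with(rle(is5), rep(seq_along(lengths), lengths))
run_id
# [1] 1 2 2 3 4 5 5 6

# simplified stand-in for dplyr::dense_rank() (no NA handling)
dense <- function(v) match(v, sort(unique(v)))

# step 2: rank the run IDs separately among the 0 rows and among the 1 rows
ranked <- ave(run_id, is5, FUN = dense)

# step 3: rows with is5 == 0 get 0
spell <- ifelse(is5 == 0, 0, ranked)
spell
# [1] 0 1 1 0 2 0 0 3
```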

edited Apr 2 at 6:11

answered Apr 1 at 21:37

tmfmnk

4,0461516

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55463310%2fidentify-and-count-spells-distinctive-events-within-each-group%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

6 Answers
6

active

oldest

votes

6 Answers
6

active

oldest

votes

One option using rle

library(dplyr)
df %>% 
 group_by(group) %>% 
 mutate(
 spell = 
 r <- rle(is.5)
 r$values <- cumsum(r$values) * r$values
 inverse.rle(r) 
 
 )
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2

You asked for a tidyverse solution but if speed is your concern, you might use data.table. The syntax is very similar

library(data.table)
setDT(df)[, spell := 
 r <- rle(is.5)
 r$values <- cumsum(r$values) * r$values
 inverse.rle(r) 
 , by = group][] # the [] at the end prints the data.table

explanation

When we call

r <- rle(df$is.5)

the result we get is

r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1

We need to replace values with the cumulative sum where values == 1 while values should remain zero otherwise.

We can achieve this when we multiple cumsum(r$values) with r$values; where the latter is a vector of 0s and 1s.

r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5

Finally we call inverse.rle to get back a vector of the same length as is.5.

inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5

We do this for every group.

edited Apr 2 at 8:11

answered Apr 1 at 21:05

markus

15.7k11336

1

I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.

– M-M
Apr 1 at 22:55

1

@M-M Added some explanation. Thanks for the comment.

– markus
Apr 1 at 23:09

add a comment |

One option using rle

library(dplyr)
df %>% 
 group_by(group) %>% 
 mutate(
 spell = 
 r <- rle(is.5)
 r$values <- cumsum(r$values) * r$values
 inverse.rle(r) 
 
 )
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2

You asked for a tidyverse solution but if speed is your concern, you might use data.table. The syntax is very similar

library(data.table)
setDT(df)[, spell := 
 r <- rle(is.5)
 r$values <- cumsum(r$values) * r$values
 inverse.rle(r) 
 , by = group][] # the [] at the end prints the data.table

explanation

When we call

r <- rle(df$is.5)

the result we get is

r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1

We need to replace values with the cumulative sum where values == 1 while values should remain zero otherwise.

We can achieve this when we multiple cumsum(r$values) with r$values; where the latter is a vector of 0s and 1s.

r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5

Finally we call inverse.rle to get back a vector of the same length as is.5.

inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5

We do this for every group.

edited Apr 2 at 8:11

answered Apr 1 at 21:05

markus

15.7k11336

1

I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.

– M-M
Apr 1 at 22:55

1

@M-M Added some explanation. Thanks for the comment.

– markus
Apr 1 at 23:09

add a comment |

One option using rle

library(dplyr)
df %>% 
 group_by(group) %>% 
 mutate(
 spell = 
 r <- rle(is.5)
 r$values <- cumsum(r$values) * r$values
 inverse.rle(r) 
 
 )
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2

You asked for a tidyverse solution but if speed is your concern, you might use data.table. The syntax is very similar

library(data.table)
setDT(df)[, spell := 
 r <- rle(is.5)
 r$values <- cumsum(r$values) * r$values
 inverse.rle(r) 
 , by = group][] # the [] at the end prints the data.table

explanation

When we call

r <- rle(df$is.5)

the result we get is

r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1

We need to replace values with the cumulative sum where values == 1 while values should remain zero otherwise.

We can achieve this when we multiple cumsum(r$values) with r$values; where the latter is a vector of 0s and 1s.

r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5

Finally we call inverse.rle to get back a vector of the same length as is.5.

inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5

We do this for every group.

edited Apr 2 at 8:11

answered Apr 1 at 21:05

markus

15.7k11336

One option using rle

library(dplyr)
df %>% 
 group_by(group) %>% 
 mutate(
 spell = 
 r <- rle(is.5)
 r$values <- cumsum(r$values) * r$values
 inverse.rle(r) 
 
 )
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2

You asked for a tidyverse solution but if speed is your concern, you might use data.table. The syntax is very similar

library(data.table)
setDT(df)[, spell := 
 r <- rle(is.5)
 r$values <- cumsum(r$values) * r$values
 inverse.rle(r) 
 , by = group][] # the [] at the end prints the data.table

explanation

When we call

r <- rle(df$is.5)

the result we get is

r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1

We need to replace values with the cumulative sum where values == 1 while values should remain zero otherwise.

We can achieve this when we multiple cumsum(r$values) with r$values; where the latter is a vector of 0s and 1s.

r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5

Finally we call inverse.rle to get back a vector of the same length as is.5.

inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5

We do this for every group.

edited Apr 2 at 8:11

answered Apr 1 at 21:05

markus

15.7k11336

edited Apr 2 at 8:11

answered Apr 1 at 21:05

markus

15.7k11336

answered Apr 1 at 21:05

markus

15.7k11336

answered Apr 1 at 21:05

markus

15.7k11336

1

I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.

– M-M
Apr 1 at 22:55

1

@M-M Added some explanation. Thanks for the comment.

– markus
Apr 1 at 23:09

add a comment |

1

I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.

– M-M
Apr 1 at 22:55

1

@M-M Added some explanation. Thanks for the comment.

– markus
Apr 1 at 23:09

I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.

– M-M
Apr 1 at 22:55

@M-M Added some explanation. Thanks for the comment.

– markus
Apr 1 at 23:09

add a comment |

Here's a helper function that can return what you are after

spell_index <- function(time, flag) 
 change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
 cumsum(change) * (flag==1)+0

And you can use it with your data like

library(dplyr)
df %>% 
 group_by(group) %>% 
 mutate(
 spell = spell_index(time, is.5)
 )

answered Apr 1 at 20:57

MrFlick

125k12142175

add a comment |

Here's a helper function that can return what you are after

spell_index <- function(time, flag) 
 change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
 cumsum(change) * (flag==1)+0

And you can use it with your data like

library(dplyr)
df %>% 
 group_by(group) %>% 
 mutate(
 spell = spell_index(time, is.5)
 )

answered Apr 1 at 20:57

MrFlick

125k12142175

add a comment |

Here's a helper function that can return what you are after

spell_index <- function(time, flag) 
 change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
 cumsum(change) * (flag==1)+0

And you can use it with your data like

library(dplyr)
df %>% 
 group_by(group) %>% 
 mutate(
 spell = spell_index(time, is.5)
 )

answered Apr 1 at 20:57

MrFlick

125k12142175

Here's a helper function that can return what you are after

spell_index <- function(time, flag) 
 change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
 cumsum(change) * (flag==1)+0

And you can use it with your data like

library(dplyr)
df %>% 
 group_by(group) %>% 
 mutate(
 spell = spell_index(time, is.5)
 )

answered Apr 1 at 20:57

MrFlick

125k12142175

answered Apr 1 at 20:57

MrFlick

125k12142175

answered Apr 1 at 20:57

MrFlick

125k12142175

answered Apr 1 at 20:57

MrFlick

125k12142175

add a comment |

library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
 ][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2

Or after the first step, use .GRP

df[!!spell, spell := .GRP, spell]

edited Apr 2 at 2:41

answered Apr 2 at 2:35

akrun

422k13209285

add a comment |

library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
 ][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2

Or after the first step, use .GRP

df[!!spell, spell := .GRP, spell]

edited Apr 2 at 2:41

answered Apr 2 at 2:35

akrun

422k13209285

This approach works. First, the data:

df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))

We split the data by group:

df2 <- split(df, df$group)

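Build a function we can apply to each element of the list:

library(dplyr)

# The end of this function was cut off in the source; the last ifelse()
# argument and the closing braces are reconstructed from the columns above.
my_func <- function(dat) {
  dat %>%
    mutate(change = diff(c(0, is.5))) %>%  # +1 where a run of 1s starts, -1 where it ends
    mutate(flag = change * abs(is.5)) %>%  # 1 only on the first row of each spell
    mutate(spell = ifelse(is.5 == 0, 0, cumsum(flag)))  # spell counter, 0 outside spells
}

Then apply it: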

l <- lapply(df2, my_func)

We can now turn this list back into a data frame:

do.call(rbind.data.frame, l)
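To see what the helper columns do, the following base R sketch traces them on the is.5 values of group A from the example data. It assumes the final step is a cumulative sum of flag that is zeroed wherever is.5 is 0; that last step is my reading of the approach rather than something the text above states in full.

```r
# Trace of the intermediate columns for group A (base R, no packages)
is.5   <- c(0, 1, 1, 0, 1, 0, 0, 1)

change <- diff(c(0, is.5))        # 0  1  0 -1  1 -1  0  1
flag   <- change * abs(is.5)      # 0  1  0  0  1  0  0  1 (spell starts only)

# Assumed final step: count spell starts so far, but report 0 outside spells
spell  <- ifelse(is.5 == 0, 0, cumsum(flag))

trace <- data.frame(is.5, change, flag, spell)
trace
```

Multiplying change by is.5 discards the -1 entries that mark the end of a run, so only the rows where a spell begins contribute to the running count.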

edited Apr 1 at 21:13

answered Apr 1 at 21:02

Hector Haffenden

632316

One option is to use cumsum:

library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>% 
 mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )


# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2

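The counter increases by 1 whenever is.5 differs from its previous value and is.5 is not 0. However, that second condition (is.5 != 0) only prevents assigning a new id (adding 1 to the previous id) on rows where is.5 is 0; it does not set the id to 0 there. That is why the result is multiplied by is.5.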
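The effect of the final multiplication is easiest to see on a single group. This base R sketch uses the is.5 values of group A and replaces dplyr's lag() with an explicit shifted vector (the names `prev`, `starts`, and `running` are only for illustration):

```r
is.5 <- c(0, 1, 1, 0, 1, 0, 0, 1)

# Previous value of is.5, with 0 standing in before the first row;
# this matches c(0, lag(is.5)[-1]) in the dplyr version
prev <- c(0, head(is.5, -1))

# TRUE only on the first row of each run of 1s
starts <- prev != is.5 & is.5 != 0

running <- cumsum(starts)   # 0 1 1 1 2 2 2 3: the id carries over into the 0 rows
spell   <- is.5 * running   # 0 1 1 0 2 0 0 3: the 0 rows are reset to 0

rbind(is.5, running, spell)
```

Without the multiplication, rows 4, 6, and 7 would keep the id of the preceding spell instead of 0.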

answered Apr 1 at 22:41

M-M

7,24262146

A somewhat different possibility (not involving cumsum()) could be:

df %>%
 group_by(group) %>%
 mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
 group_by(group, is.5) %>%
 mutate(spell = dense_rank(spell)) %>%
 ungroup() %>%
 mutate(spell = ifelse(is.5 == 0, 0, spell))

 time group is.5 spell
 <dttm> <chr> <dbl> <dbl>
 1 2018-10-07 01:39:00 A 0 0
 2 2018-10-07 01:40:00 A 1 1
 3 2018-10-07 01:41:00 A 1 1
 4 2018-10-07 01:42:00 A 0 0
 5 2018-10-07 01:43:00 A 1 2
 6 2018-10-07 01:44:00 A 0 0
 7 2018-10-07 01:45:00 A 0 0
 8 2018-10-07 01:46:00 A 1 3
 9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
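As a rough illustration of the two stages in that pipeline, this base R sketch works on the is.5 values of group A only. The dense ranking is emulated with match() against the sorted unique values, which is my stand-in for dplyr's dense_rank() rather than its actual implementation.

```r
is.5 <- c(0, 1, 1, 0, 1, 0, 0, 1)

# Stage 1: number every run, whether it is a run of 0s or of 1s
r      <- rle(is.5)
run_id <- rep(seq_along(r$lengths), r$lengths)   # 1 2 2 3 4 5 5 6

# Stage 2: among the rows with is.5 == 1, replace the run ids (2, 4, 6)
# by their dense rank (1, 2, 3); rows with is.5 == 0 stay at 0
ones  <- is.5 == 1
spell <- numeric(length(is.5))
spell[ones] <- match(run_id[ones], sort(unique(run_id[ones])))

spell
```

Ranking within the is.5 == 1 rows is what removes the gaps left by the interleaved runs of 0s.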

edited Apr 2 at 6:11

answered Apr 1 at 21:37

tmfmnk

4,0461516
