Rank groups within a grouped sequence of TRUE/FALSE and NAGrouping functions (tapply, by, aggregate) and the *apply familyCharacters counting and subletting specific patternsWhat is the purpose of setting a key in data.table?data.table vs dplyr: can one do something well the other can't or does poorly?how to make a bar plot for a list of dataframes?How to group by unique values in a list in RPandas - Alternative to rank() function that gives unique ordinal ranks for a columnRank within group in for loop in RData transformation: from dyadic to observational data in RGetting map from purrr to work with paste0
Understanding trademark infringements in a world where many dictionary words are trademarks?
How to use dependency injection and avoid temporal coupling?
How to write a 12-bar blues melody
Can you Ready a Bard spell to release it after using Battle Magic?
Why did the Apollo 13 crew extend the LM landing gear?
SafeCracker #3 - We've Been Blocked
As a Bard multi-classing into Warlock, what spells do I get?
Multiple SQL versions with Docker
Didn't attend field-specific conferences during my PhD; how much of a disadvantage is it?
Copy previous line to current line from text file
Would glacier 'trees' be plausible?
As matter approaches a black hole, does it speed up?
Find the cheapest shipping option based on item weight
Python - What if the end-user didn't have the required library?
Why does this derived table improve performance?
Appropriate certificate to ask for a fibre installation (ANSI/TIA-568.3-D?)
How can internet speed be 10 times slower without a router than when using a router?
What to use instead of cling film to wrap pastry
What is the most remote airport from the center of the city it supposedly serves?
Validation rule Scheduled Apex
Shutter speed -vs- effective image stabilisation
Why do people keep telling me that I am a bad photographer?
Controlled Hadamard gate in ZX-calculus
A factorization game
Rank groups within a grouped sequence of TRUE/FALSE and NA
Grouping functions (tapply, by, aggregate) and the *apply familyCharacters counting and subletting specific patternsWhat is the purpose of setting a key in data.table?data.table vs dplyr: can one do something well the other can't or does poorly?how to make a bar plot for a list of dataframes?How to group by unique values in a list in RPandas - Alternative to rank() function that gives unique ordinal ranks for a columnRank within group in for loop in RData transformation: from dyadic to observational data in RGetting map from purrr to work with paste0
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;
I have a little nut to crack.
I have a data.frame
like this:
group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
And I want to rank the groups of TRUE
in column criterium
in ascending order while disregarding the FALSE
and NA
. The goal is to have a unique group identifier inside each group of group
.
So the result should look like:
group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank()
and other window functions of dplyr
, but to no avail.
r dplyr data.table rank
|
show 1 more comment
I have a little nut to crack.
I have a data.frame
like this:
group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
And I want to rank the groups of TRUE
in column criterium
in ascending order while disregarding the FALSE
and NA
. The goal is to have a unique group identifier inside each group of group
.
So the result should look like:
group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank()
and other window functions of dplyr
, but to no avail.
r dplyr data.table rank
1
you can just about grab what you need with this work of beauty;as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group
– user20650
Apr 10 at 8:45
that is a really funny solution. Very good job!
– Humpelstielzchen
Apr 10 at 8:49
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
Apr 10 at 8:50
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
Apr 10 at 8:51
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-bygroup
or consider the discontinuity ingroup
.
– smci
Apr 10 at 8:53
|
show 1 more comment
I have a little nut to crack.
I have a data.frame
like this:
group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
And I want to rank the groups of TRUE
in column criterium
in ascending order while disregarding the FALSE
and NA
. The goal is to have a unique group identifier inside each group of group
.
So the result should look like:
group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank()
and other window functions of dplyr
, but to no avail.
r dplyr data.table rank
I have a little nut to crack.
I have a data.frame
like this:
group criterium
1 A NA
2 A TRUE
3 A TRUE
4 A TRUE
5 A FALSE
6 A FALSE
7 A TRUE
8 A TRUE
9 A FALSE
10 A TRUE
11 A TRUE
12 A TRUE
13 B NA
14 B FALSE
15 B TRUE
16 B TRUE
17 B TRUE
18 B FALSE
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
And I want to rank the groups of TRUE
in column criterium
in ascending order while disregarding the FALSE
and NA
. The goal is to have a unique group identifier inside each group of group
.
So the result should look like:
group criterium goal
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank()
and other window functions of dplyr
, but to no avail.
r dplyr data.table rank
r dplyr data.table rank
edited Apr 17 at 22:04
TylerH
16.2k105569
16.2k105569
asked Apr 10 at 6:47
HumpelstielzchenHumpelstielzchen
1,7991319
1,7991319
1
you can just about grab what you need with this work of beauty;as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group
– user20650
Apr 10 at 8:45
that is a really funny solution. Very good job!
– Humpelstielzchen
Apr 10 at 8:49
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
Apr 10 at 8:50
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
Apr 10 at 8:51
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-bygroup
or consider the discontinuity ingroup
.
– smci
Apr 10 at 8:53
|
show 1 more comment
1
you can just about grab what you need with this work of beauty;as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group
– user20650
Apr 10 at 8:45
that is a really funny solution. Very good job!
– Humpelstielzchen
Apr 10 at 8:49
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
Apr 10 at 8:50
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
Apr 10 at 8:51
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-bygroup
or consider the discontinuity ingroup
.
– smci
Apr 10 at 8:53
1
1
you can just about grab what you need with this work of beauty;
as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group– user20650
Apr 10 at 8:45
you can just about grab what you need with this work of beauty;
as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group– user20650
Apr 10 at 8:45
that is a really funny solution. Very good job!
– Humpelstielzchen
Apr 10 at 8:49
that is a really funny solution. Very good job!
– Humpelstielzchen
Apr 10 at 8:49
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
Apr 10 at 8:50
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
Apr 10 at 8:50
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
Apr 10 at 8:51
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
Apr 10 at 8:51
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by
group
or consider the discontinuity in group
.– smci
Apr 10 at 8:53
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by
group
or consider the discontinuity in group
.– smci
Apr 10 at 8:53
|
show 1 more comment
4 Answers
4
active
oldest
votes
Another data.table
approach:
library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]
add a comment |
Maybe I have over-complicated this but one way with dplyr
is
library(dplyr)
df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)
# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA
We first replace
NA
s in criterium
column to FALSE
and take cumulative sum over the negation of it (temp1
). We group_by
temp1
and assign 1 to every first TRUE
value in the group. Finally grouping by group
we take a cumulative sum for TRUE
values or return NA
for FALSE
and NA
values.
add a comment |
A pure Base R solution, we can create a custom function via rle
, and use it per group, i.e.
f1 <- function(x)
x[is.na(x)] <- FALSE
rle1 <- rle(x)
y <- rle1$values
rle1$values[!y] <- 0
rle1$values[y] <- cumsum(rle1$values[y])
return(inverse.rle(rle1))
do.call(rbind,
lapply(split(df, df$group), function(i)i$goal <- f1(i$criterium);
i$goal <- replace(i$goal, is.na(i$criterium)))
Of course, If you want you can apply it via dplyr
, i.e.
library(dplyr)
df %>%
group_by(group) %>%
mutate(goal = f1(criterium),
goal = replace(goal, is.na(criterium)|!criterium, NA))
which gives,
# A tibble: 18 x 3
# Groups: group [2]
group criterium goal
<fct> <lgl> <dbl>
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
add a comment |
A data.table
option using rle
library(data.table)
DT <- as.data.table(dat)
DT[, goal :=
r <- rle(replace(criterium, is.na(criterium), FALSE))
r$values <- with(r, cumsum(values) * values)
out <- inverse.rle(r)
replace(out, out == 0, NA)
, by = group]
DT
# group criterium goal
# 1: A NA NA
# 2: A TRUE 1
# 3: A TRUE 1
# 4: A TRUE 1
# 5: A FALSE NA
# 6: A FALSE NA
# 7: A TRUE 2
# 8: A TRUE 2
# 9: A FALSE NA
#10: A TRUE 3
#11: A TRUE 3
#12: A TRUE 3
#13: B NA NA
#14: B FALSE NA
#15: B TRUE 1
#16: B TRUE 1
#17: B TRUE 1
#18: B FALSE NA
step by step
When we call r <- rle(replace(criterium, is.na(criterium), FALSE))
we get an object of class rle
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...
We manipulate the values
compenent in the following way
r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : int [1:9] 0 1 0 2 0 3 0 4 0
That is, we replaced TRUE
s with the cumulative sum of values
and set the FALSE
s to 0
. Now inverse.rle
returns a vector in which values
will repeated lenghts
times
out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0
This is almost what OP wants but we need to replace the 0
s with NA
replace(out, out == 0, NA)
This is done for each group
.
data
dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
Wow, impressive. Thanks for introducing me torle
andinverse.rle
. Gruß nach Leipzig.
– Humpelstielzchen
Apr 10 at 7:59
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
Apr 10 at 8:02
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
Apr 10 at 8:36
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55606323%2frank-groups-within-a-grouped-sequence-of-true-false-and-na%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
Another data.table
approach:
library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]
add a comment |
Another data.table
approach:
library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]
add a comment |
Another data.table
approach:
library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]
Another data.table
approach:
library(data.table)
setDT(dt)
dt[, cr := rleid(criterium)][
(criterium), goal := rleid(cr), by=.(group)]
answered Apr 10 at 8:20
chinsoon12chinsoon12
10.1k11420
10.1k11420
add a comment |
add a comment |
Maybe I have over-complicated this but one way with dplyr
is
library(dplyr)
df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)
# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA
We first replace
NA
s in criterium
column to FALSE
and take cumulative sum over the negation of it (temp1
). We group_by
temp1
and assign 1 to every first TRUE
value in the group. Finally grouping by group
we take a cumulative sum for TRUE
values or return NA
for FALSE
and NA
values.
add a comment |
Maybe I have over-complicated this but one way with dplyr
is
library(dplyr)
df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)
# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA
We first replace
NA
s in criterium
column to FALSE
and take cumulative sum over the negation of it (temp1
). We group_by
temp1
and assign 1 to every first TRUE
value in the group. Finally grouping by group
we take a cumulative sum for TRUE
values or return NA
for FALSE
and NA
values.
add a comment |
Maybe I have over-complicated this but one way with dplyr
is
library(dplyr)
df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)
# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA
We first replace
NA
s in criterium
column to FALSE
and take cumulative sum over the negation of it (temp1
). We group_by
temp1
and assign 1 to every first TRUE
value in the group. Finally grouping by group
we take a cumulative sum for TRUE
values or return NA
for FALSE
and NA
values.
Maybe I have over-complicated this but one way with dplyr
is
library(dplyr)
df %>%
mutate(temp = replace(criterium, is.na(criterium), FALSE),
temp1 = cumsum(!temp)) %>%
group_by(temp1) %>%
mutate(goal = +(row_number() == which.max(temp) & any(temp))) %>%
group_by(group) %>%
mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
select(-temp, -temp1)
# group criterium goal
# <fct> <lgl> <int>
# 1 A NA NA
# 2 A TRUE 1
# 3 A TRUE 1
# 4 A TRUE 1
# 5 A FALSE NA
# 6 A FALSE NA
# 7 A TRUE 2
# 8 A TRUE 2
# 9 A FALSE NA
#10 A TRUE 3
#11 A TRUE 3
#12 A TRUE 3
#13 B NA NA
#14 B FALSE NA
#15 B TRUE 1
#16 B TRUE 1
#17 B TRUE 1
#18 B FALSE NA
We first replace
NA
s in criterium
column to FALSE
and take cumulative sum over the negation of it (temp1
). We group_by
temp1
and assign 1 to every first TRUE
value in the group. Finally grouping by group
we take a cumulative sum for TRUE
values or return NA
for FALSE
and NA
values.
answered Apr 10 at 7:24
Ronak ShahRonak Shah
51.4k104370
51.4k104370
add a comment |
add a comment |
A pure Base R solution, we can create a custom function via rle
, and use it per group, i.e.
f1 <- function(x)
x[is.na(x)] <- FALSE
rle1 <- rle(x)
y <- rle1$values
rle1$values[!y] <- 0
rle1$values[y] <- cumsum(rle1$values[y])
return(inverse.rle(rle1))
do.call(rbind,
lapply(split(df, df$group), function(i)i$goal <- f1(i$criterium);
i$goal <- replace(i$goal, is.na(i$criterium)))
Of course, If you want you can apply it via dplyr
, i.e.
library(dplyr)
df %>%
group_by(group) %>%
mutate(goal = f1(criterium),
goal = replace(goal, is.na(criterium)|!criterium, NA))
which gives,
# A tibble: 18 x 3
# Groups: group [2]
group criterium goal
<fct> <lgl> <dbl>
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
add a comment |
A pure Base R solution, we can create a custom function via rle
, and use it per group, i.e.
f1 <- function(x)
x[is.na(x)] <- FALSE
rle1 <- rle(x)
y <- rle1$values
rle1$values[!y] <- 0
rle1$values[y] <- cumsum(rle1$values[y])
return(inverse.rle(rle1))
do.call(rbind,
lapply(split(df, df$group), function(i)i$goal <- f1(i$criterium);
i$goal <- replace(i$goal, is.na(i$criterium)))
Of course, If you want you can apply it via dplyr
, i.e.
library(dplyr)
df %>%
group_by(group) %>%
mutate(goal = f1(criterium),
goal = replace(goal, is.na(criterium)|!criterium, NA))
which gives,
# A tibble: 18 x 3
# Groups: group [2]
group criterium goal
<fct> <lgl> <dbl>
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
add a comment |
A pure Base R solution, we can create a custom function via rle
, and use it per group, i.e.
f1 <- function(x)
x[is.na(x)] <- FALSE
rle1 <- rle(x)
y <- rle1$values
rle1$values[!y] <- 0
rle1$values[y] <- cumsum(rle1$values[y])
return(inverse.rle(rle1))
do.call(rbind,
lapply(split(df, df$group), function(i)i$goal <- f1(i$criterium);
i$goal <- replace(i$goal, is.na(i$criterium)))
Of course, If you want you can apply it via dplyr
, i.e.
library(dplyr)
df %>%
group_by(group) %>%
mutate(goal = f1(criterium),
goal = replace(goal, is.na(criterium)|!criterium, NA))
which gives,
# A tibble: 18 x 3
# Groups: group [2]
group criterium goal
<fct> <lgl> <dbl>
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
A pure Base R solution, we can create a custom function via rle
, and use it per group, i.e.
f1 <- function(x)
x[is.na(x)] <- FALSE
rle1 <- rle(x)
y <- rle1$values
rle1$values[!y] <- 0
rle1$values[y] <- cumsum(rle1$values[y])
return(inverse.rle(rle1))
do.call(rbind,
lapply(split(df, df$group), function(i)i$goal <- f1(i$criterium);
i$goal <- replace(i$goal, is.na(i$criterium)))
Of course, If you want you can apply it via dplyr
, i.e.
library(dplyr)
df %>%
group_by(group) %>%
mutate(goal = f1(criterium),
goal = replace(goal, is.na(criterium)|!criterium, NA))
which gives,
# A tibble: 18 x 3
# Groups: group [2]
group criterium goal
<fct> <lgl> <dbl>
1 A NA NA
2 A TRUE 1
3 A TRUE 1
4 A TRUE 1
5 A FALSE NA
6 A FALSE NA
7 A TRUE 2
8 A TRUE 2
9 A FALSE NA
10 A TRUE 3
11 A TRUE 3
12 A TRUE 3
13 B NA NA
14 B FALSE NA
15 B TRUE 1
16 B TRUE 1
17 B TRUE 1
18 B FALSE NA
edited Apr 10 at 9:59
answered Apr 10 at 7:29
SotosSotos
32.2k51843
32.2k51843
add a comment |
add a comment |
A data.table
option using rle
library(data.table)
DT <- as.data.table(dat)
DT[, goal :=
r <- rle(replace(criterium, is.na(criterium), FALSE))
r$values <- with(r, cumsum(values) * values)
out <- inverse.rle(r)
replace(out, out == 0, NA)
, by = group]
DT
# group criterium goal
# 1: A NA NA
# 2: A TRUE 1
# 3: A TRUE 1
# 4: A TRUE 1
# 5: A FALSE NA
# 6: A FALSE NA
# 7: A TRUE 2
# 8: A TRUE 2
# 9: A FALSE NA
#10: A TRUE 3
#11: A TRUE 3
#12: A TRUE 3
#13: B NA NA
#14: B FALSE NA
#15: B TRUE 1
#16: B TRUE 1
#17: B TRUE 1
#18: B FALSE NA
step by step
When we call r <- rle(replace(criterium, is.na(criterium), FALSE))
we get an object of class rle
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...
We manipulate the values
compenent in the following way
r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : int [1:9] 0 1 0 2 0 3 0 4 0
That is, we replaced TRUE
s with the cumulative sum of values
and set the FALSE
s to 0
. Now inverse.rle
returns a vector in which values
will repeated lenghts
times
out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0
This is almost what OP wants but we need to replace the 0
s with NA
replace(out, out == 0, NA)
This is done for each group
.
data
dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
Wow, impressive. Thanks for introducing me torle
andinverse.rle
. Gruß nach Leipzig.
– Humpelstielzchen
Apr 10 at 7:59
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
Apr 10 at 8:02
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
Apr 10 at 8:36
add a comment |
A data.table
option using rle
library(data.table)
DT <- as.data.table(dat)
DT[, goal :=
r <- rle(replace(criterium, is.na(criterium), FALSE))
r$values <- with(r, cumsum(values) * values)
out <- inverse.rle(r)
replace(out, out == 0, NA)
, by = group]
DT
# group criterium goal
# 1: A NA NA
# 2: A TRUE 1
# 3: A TRUE 1
# 4: A TRUE 1
# 5: A FALSE NA
# 6: A FALSE NA
# 7: A TRUE 2
# 8: A TRUE 2
# 9: A FALSE NA
#10: A TRUE 3
#11: A TRUE 3
#12: A TRUE 3
#13: B NA NA
#14: B FALSE NA
#15: B TRUE 1
#16: B TRUE 1
#17: B TRUE 1
#18: B FALSE NA
step by step
When we call r <- rle(replace(criterium, is.na(criterium), FALSE))
we get an object of class rle
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...
We manipulate the values
compenent in the following way
r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : int [1:9] 0 1 0 2 0 3 0 4 0
That is, we replaced TRUE
s with the cumulative sum of values
and set the FALSE
s to 0
. Now inverse.rle
returns a vector in which values
will repeated lenghts
times
out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0
This is almost what OP wants but we need to replace the 0
s with NA
replace(out, out == 0, NA)
This is done for each group
.
data
dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
Wow, impressive. Thanks for introducing me torle
andinverse.rle
. Gruß nach Leipzig.
– Humpelstielzchen
Apr 10 at 7:59
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
Apr 10 at 8:02
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
Apr 10 at 8:36
add a comment |
A data.table
option using rle
library(data.table)
DT <- as.data.table(dat)
DT[, goal :=
r <- rle(replace(criterium, is.na(criterium), FALSE))
r$values <- with(r, cumsum(values) * values)
out <- inverse.rle(r)
replace(out, out == 0, NA)
, by = group]
DT
# group criterium goal
# 1: A NA NA
# 2: A TRUE 1
# 3: A TRUE 1
# 4: A TRUE 1
# 5: A FALSE NA
# 6: A FALSE NA
# 7: A TRUE 2
# 8: A TRUE 2
# 9: A FALSE NA
#10: A TRUE 3
#11: A TRUE 3
#12: A TRUE 3
#13: B NA NA
#14: B FALSE NA
#15: B TRUE 1
#16: B TRUE 1
#17: B TRUE 1
#18: B FALSE NA
step by step
When we call r <- rle(replace(criterium, is.na(criterium), FALSE))
we get an object of class rle
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...
We manipulate the values
compenent in the following way
r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : int [1:9] 0 1 0 2 0 3 0 4 0
That is, we replaced TRUE
s with the cumulative sum of values
and set the FALSE
s to 0
. Now inverse.rle
returns a vector in which values
will repeated lenghts
times
out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0
This is almost what OP wants but we need to replace the 0
s with NA
replace(out, out == 0, NA)
This is done for each group
.
data
dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
A data.table
option using rle
library(data.table)
DT <- as.data.table(dat)
DT[, goal :=
r <- rle(replace(criterium, is.na(criterium), FALSE))
r$values <- with(r, cumsum(values) * values)
out <- inverse.rle(r)
replace(out, out == 0, NA)
, by = group]
DT
# group criterium goal
# 1: A NA NA
# 2: A TRUE 1
# 3: A TRUE 1
# 4: A TRUE 1
# 5: A FALSE NA
# 6: A FALSE NA
# 7: A TRUE 2
# 8: A TRUE 2
# 9: A FALSE NA
#10: A TRUE 3
#11: A TRUE 3
#12: A TRUE 3
#13: B NA NA
#14: B FALSE NA
#15: B TRUE 1
#16: B TRUE 1
#17: B TRUE 1
#18: B FALSE NA
step by step
When we call r <- rle(replace(criterium, is.na(criterium), FALSE))
we get an object of class rle
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...
We manipulate the values
compenent in the following way
r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
# lengths: int [1:9] 1 3 2 2 1 3 2 3 1
# values : int [1:9] 0 1 0 2 0 3 0 4 0
That is, we replaced TRUE
s with the cumulative sum of values
and set the FALSE
s to 0
. Now inverse.rle
returns a vector in which values
will repeated lenghts
times
out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0
This is almost what OP wants but we need to replace the 0
s with NA
replace(out, out == 0, NA)
This is done for each group
.
data
dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-18L))
edited Apr 10 at 11:34
answered Apr 10 at 7:26
markusmarkus
16.4k11336
16.4k11336
Wow, impressive. Thanks for introducing me torle
andinverse.rle
. Gruß nach Leipzig.
– Humpelstielzchen
Apr 10 at 7:59
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
Apr 10 at 8:02
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
Apr 10 at 8:36
add a comment |
Wow, impressive. Thanks for introducing me torle
andinverse.rle
. Gruß nach Leipzig.
– Humpelstielzchen
Apr 10 at 7:59
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
Apr 10 at 8:02
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
Apr 10 at 8:36
Wow, impressive. Thanks for introducing me to
rle
and inverse.rle
. Gruß nach Leipzig.– Humpelstielzchen
Apr 10 at 7:59
Wow, impressive. Thanks for introducing me to
rle
and inverse.rle
. Gruß nach Leipzig.– Humpelstielzchen
Apr 10 at 7:59
1
1
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
Apr 10 at 8:02
@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.
– markus
Apr 10 at 8:02
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
Apr 10 at 8:36
Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^
– Humpelstielzchen
Apr 10 at 8:36
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55606323%2frank-groups-within-a-grouped-sequence-of-true-false-and-na%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
you can just about grab what you need with this work of beauty;
as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA))
-- just needs to be applied by group– user20650
Apr 10 at 8:45
that is a really funny solution. Very good job!
– Humpelstielzchen
Apr 10 at 8:49
In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
– smci
Apr 10 at 8:50
No, when group A stops so stops the sequence for group A.
– Humpelstielzchen
Apr 10 at 8:51
But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by
group
or consider the discontinuity ingroup
.– smci
Apr 10 at 8:53