Rank groups within a grouped sequence of TRUE/FALSE and NA

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}

I have a little nut to crack.

I have a data.frame like this:

   group criterium

1      A        NA

2      A      TRUE

3      A      TRUE

4      A      TRUE

5      A     FALSE

6      A     FALSE

7      A      TRUE

8      A      TRUE

9      A     FALSE

10     A      TRUE

11     A      TRUE

12     A      TRUE

13     B        NA

14     B     FALSE

15     B      TRUE

16     B      TRUE

17     B      TRUE

18     B     FALSE



structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.

So the result should look like:

    group criterium goal

1      A        NA   NA

2      A      TRUE    1

3      A      TRUE    1

4      A      TRUE    1

5      A     FALSE   NA

6      A     FALSE   NA

7      A      TRUE    2

8      A      TRUE    2

9      A     FALSE   NA

10     A      TRUE    3

11     A      TRUE    3

12     A      TRUE    3

13     B        NA   NA

14     B     FALSE   NA

15     B      TRUE    1

16     B      TRUE    1

17     B      TRUE    1

18     B     FALSE   NA

I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.

Thanks for the help!

edited 19 hours ago

asked 21 hours ago

Humpelstielzchen

1,3901318

1

you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

– user20650
19 hours ago

that is a really funny solution. Very good job!

– Humpelstielzchen
19 hours ago

In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

– smci
19 hours ago

No, when group A stops so stops the sequence for group A.

– Humpelstielzchen
19 hours ago

But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

– smci
19 hours ago

|
show 1 more comment

I have a little nut to crack.

I have a data.frame like this:

   group criterium

1      A        NA

2      A      TRUE

3      A      TRUE

4      A      TRUE

5      A     FALSE

6      A     FALSE

7      A      TRUE

8      A      TRUE

9      A     FALSE

10     A      TRUE

11     A      TRUE

12     A      TRUE

13     B        NA

14     B     FALSE

15     B      TRUE

16     B      TRUE

17     B      TRUE

18     B     FALSE



structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.

So the result should look like:

    group criterium goal

1      A        NA   NA

2      A      TRUE    1

3      A      TRUE    1

4      A      TRUE    1

5      A     FALSE   NA

6      A     FALSE   NA

7      A      TRUE    2

8      A      TRUE    2

9      A     FALSE   NA

10     A      TRUE    3

11     A      TRUE    3

12     A      TRUE    3

13     B        NA   NA

14     B     FALSE   NA

15     B      TRUE    1

16     B      TRUE    1

17     B      TRUE    1

18     B     FALSE   NA

I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.

Thanks for the help!

edited 19 hours ago

asked 21 hours ago

Humpelstielzchen

1,3901318

1

you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

– user20650
19 hours ago

that is a really funny solution. Very good job!

– Humpelstielzchen
19 hours ago

In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

– smci
19 hours ago

No, when group A stops so stops the sequence for group A.

– Humpelstielzchen
19 hours ago

But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

– smci
19 hours ago

|
show 1 more comment

I have a little nut to crack.

I have a data.frame like this:

   group criterium

1      A        NA

2      A      TRUE

3      A      TRUE

4      A      TRUE

5      A     FALSE

6      A     FALSE

7      A      TRUE

8      A      TRUE

9      A     FALSE

10     A      TRUE

11     A      TRUE

12     A      TRUE

13     B        NA

14     B     FALSE

15     B      TRUE

16     B      TRUE

17     B      TRUE

18     B     FALSE



structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.

So the result should look like:

    group criterium goal

1      A        NA   NA

2      A      TRUE    1

3      A      TRUE    1

4      A      TRUE    1

5      A     FALSE   NA

6      A     FALSE   NA

7      A      TRUE    2

8      A      TRUE    2

9      A     FALSE   NA

10     A      TRUE    3

11     A      TRUE    3

12     A      TRUE    3

13     B        NA   NA

14     B     FALSE   NA

15     B      TRUE    1

16     B      TRUE    1

17     B      TRUE    1

18     B     FALSE   NA

I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.

Thanks for the help!

edited 19 hours ago

asked 21 hours ago

Humpelstielzchen

1,3901318

I have a little nut to crack.

I have a data.frame like this:

   group criterium

1      A        NA

2      A      TRUE

3      A      TRUE

4      A      TRUE

5      A     FALSE

6      A     FALSE

7      A      TRUE

8      A      TRUE

9      A     FALSE

10     A      TRUE

11     A      TRUE

12     A      TRUE

13     B        NA

14     B     FALSE

15     B      TRUE

16     B      TRUE

17     B      TRUE

18     B     FALSE



structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

And I want to rank the groups of TRUE in column criterium in ascending order while disregarding the FALSEand NA. The goal is to have a unique group identifier inside each group of group.

So the result should look like:

    group criterium goal

1      A        NA   NA

2      A      TRUE    1

3      A      TRUE    1

4      A      TRUE    1

5      A     FALSE   NA

6      A     FALSE   NA

7      A      TRUE    2

8      A      TRUE    2

9      A     FALSE   NA

10     A      TRUE    3

11     A      TRUE    3

12     A      TRUE    3

13     B        NA   NA

14     B     FALSE   NA

15     B      TRUE    1

16     B      TRUE    1

17     B      TRUE    1

18     B     FALSE   NA

I'm sure there is a relatively easy way to do this, I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.

Thanks for the help!

r dplyr data.table rank

edited 19 hours ago

asked 21 hours ago

Humpelstielzchen

1,3901318

edited 19 hours ago

asked 21 hours ago

Humpelstielzchen

1,3901318

edited 19 hours ago

asked 21 hours ago

Humpelstielzchen

1,3901318

asked 21 hours ago

Humpelstielzchen

1,3901318

asked 21 hours ago

Humpelstielzchen

1,3901318

1

you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

– user20650
19 hours ago

that is a really funny solution. Very good job!

– Humpelstielzchen
19 hours ago

In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

– smci
19 hours ago

No, when group A stops so stops the sequence for group A.

– Humpelstielzchen
19 hours ago

But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

– smci
19 hours ago

|
show 1 more comment

1

you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

– user20650
19 hours ago

that is a really funny solution. Very good job!

– Humpelstielzchen
19 hours ago

In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

– smci
19 hours ago

No, when group A stops so stops the sequence for group A.

– Humpelstielzchen
19 hours ago

But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

– smci
19 hours ago

you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group

– user20650
19 hours ago

that is a really funny solution. Very good job!

– Humpelstielzchen
19 hours ago

In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?

– smci
19 hours ago

No, when group A stops so stops the sequence for group A.

– Humpelstielzchen
19 hours ago

But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSE's in-between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group-by group or consider the discontinuity in group.

– smci
19 hours ago

|
show 1 more comment

4 Answers
4

active

oldest

votes

Another data.table approach:

library(data.table)

setDT(dt)

dt[, cr := rleid(criterium)][

    (criterium), goal := rleid(cr), by=.(group)]

answered 20 hours ago

chinsoon12

9,93611420

1

Tried with rleid but didn't get it to work. (+1)

– markus
19 hours ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
19 hours ago

add a comment |

Maybe I have over-complicated this but one way with dplyr is

library(dplyr)



df %>%

  mutate(temp = replace(criterium, is.na(criterium), FALSE), 

         temp1 = cumsum(!temp)) %>%

   group_by(temp1) %>%

   mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%

   group_by(group) %>%

   mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%

   select(-temp, -temp1)



#  group criterium  goal

#   <fct> <lgl>     <int>

# 1 A     NA           NA

# 2 A     TRUE          1

# 3 A     TRUE          1

# 4 A     TRUE          1

# 5 A     FALSE        NA

# 6 A     FALSE        NA

# 7 A     TRUE          2

# 8 A     TRUE          2

# 9 A     FALSE        NA

#10 A     TRUE          3

#11 A     TRUE          3

#12 A     TRUE          3

#13 B     NA           NA

#14 B     FALSE        NA

#15 B     TRUE          1

#16 B     TRUE          1

#17 B     TRUE          1

#18 B     FALSE        NA

We first replace NAs in criterium column to FALSE and take cumulative sum over the negation of it (temp1). We group_by temp1 and assign 1 to every first TRUE value in the group. Finally grouping by group we take a cumulative sum for TRUE values or return NA for FALSE and NA values.

answered 21 hours ago

Ronak Shah

46.1k104268

add a comment |

A pure Base R solution, we can create a custom function via rle, and use it per group, i.e.

f1 <- function(x) {

    x[is.na(x)] <- FALSE

    rle1 <- rle(x)

    y <- rle1$values

    rle1$values[!y] <- 0

    rle1$values[y] <- cumsum(rle1$values[y])

    return(inverse.rle(rle1))

}





do.call(rbind, 

     lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium); 

                                             i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA); 

    i}))

Of course, If you want you can apply it via dplyr, i.e.

library(dplyr)



df %>% 

 group_by(group) %>% 

 mutate(goal = f1(criterium), 

        goal = replace(goal, is.na(criterium)|!criterium, NA))

which gives,

# A tibble: 18 x 3

# Groups:   group [2]

   group criterium  goal

   <fct> <lgl>     <dbl>

 1 A     NA           NA

 2 A     TRUE          1

 3 A     TRUE          1

 4 A     TRUE          1

 5 A     FALSE        NA

 6 A     FALSE        NA

 7 A     TRUE          2

 8 A     TRUE          2

 9 A     FALSE        NA

10 A     TRUE          3

11 A     TRUE          3

12 A     TRUE          3

13 B     NA           NA

14 B     FALSE        NA

15 B     TRUE          1

16 B     TRUE          1

17 B     TRUE          1

18 B     FALSE        NA

edited 18 hours ago

answered 20 hours ago

Sotos

31.4k51741

add a comment |

A data.table option using rle

library(data.table)

DT <- as.data.table(dat)

DT[, goal := {

  r <- rle(replace(criterium, is.na(criterium), FALSE))

  r$values <- with(r, cumsum(values) * values)          

  out <- inverse.rle(r)                                 

  replace(out, out == 0, NA)

}, by = group]

DT

#    group criterium goal

# 1:     A        NA   NA

# 2:     A      TRUE    1

# 3:     A      TRUE    1

# 4:     A      TRUE    1

# 5:     A     FALSE   NA

# 6:     A     FALSE   NA

# 7:     A      TRUE    2

# 8:     A      TRUE    2

# 9:     A     FALSE   NA

#10:     A      TRUE    3

#11:     A      TRUE    3

#12:     A      TRUE    3

#13:     B        NA   NA

#14:     B     FALSE   NA

#15:     B      TRUE    1

#16:     B      TRUE    1

#17:     B      TRUE    1

#18:     B     FALSE   NA

step by step

When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...

We manipulate the values compenent in the following way

r$values <- with(r, cumsum(values) * values)

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : int [1:9] 0 1 0 2 0 3 0 4 0

That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeated lenghts times

out <- inverse.rle(r)

out

# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0

This is almost what OP wants but we need to replace the 0s with NA

replace(out, out == 0, NA)

This is done for each group.

data

dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

edited 16 hours ago

answered 20 hours ago

markus

15.4k11336

Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

– Humpelstielzchen
20 hours ago

1

@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

– markus
20 hours ago

Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

– Humpelstielzchen
19 hours ago

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55606323%2frank-groups-within-a-grouped-sequence-of-true-false-and-na%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

Another data.table approach:

library(data.table)

setDT(dt)

dt[, cr := rleid(criterium)][

    (criterium), goal := rleid(cr), by=.(group)]

answered 20 hours ago

chinsoon12

9,93611420

1

Tried with rleid but didn't get it to work. (+1)

– markus
19 hours ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
19 hours ago

add a comment |

Another data.table approach:

library(data.table)

setDT(dt)

dt[, cr := rleid(criterium)][

    (criterium), goal := rleid(cr), by=.(group)]

answered 20 hours ago

chinsoon12

9,93611420

1

Tried with rleid but didn't get it to work. (+1)

– markus
19 hours ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
19 hours ago

add a comment |

Another data.table approach:

library(data.table)

setDT(dt)

dt[, cr := rleid(criterium)][

    (criterium), goal := rleid(cr), by=.(group)]

answered 20 hours ago

chinsoon12

9,93611420

Another data.table approach:

library(data.table)

setDT(dt)

dt[, cr := rleid(criterium)][

    (criterium), goal := rleid(cr), by=.(group)]

answered 20 hours ago

chinsoon12

9,93611420

answered 20 hours ago

chinsoon12

9,93611420

answered 20 hours ago

chinsoon12

9,93611420

answered 20 hours ago

chinsoon12

9,93611420

1

Tried with rleid but didn't get it to work. (+1)

– markus
19 hours ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
19 hours ago

add a comment |

1

Tried with rleid but didn't get it to work. (+1)

– markus
19 hours ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
19 hours ago

Tried with rleid but didn't get it to work. (+1)

– markus
19 hours ago

works for me. And seems to be the most elegant answer.

– Humpelstielzchen
19 hours ago

add a comment |

Maybe I have over-complicated this but one way with dplyr is

library(dplyr)



df %>%

  mutate(temp = replace(criterium, is.na(criterium), FALSE), 

         temp1 = cumsum(!temp)) %>%

   group_by(temp1) %>%

   mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%

   group_by(group) %>%

   mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%

   select(-temp, -temp1)



#  group criterium  goal

#   <fct> <lgl>     <int>

# 1 A     NA           NA

# 2 A     TRUE          1

# 3 A     TRUE          1

# 4 A     TRUE          1

# 5 A     FALSE        NA

# 6 A     FALSE        NA

# 7 A     TRUE          2

# 8 A     TRUE          2

# 9 A     FALSE        NA

#10 A     TRUE          3

#11 A     TRUE          3

#12 A     TRUE          3

#13 B     NA           NA

#14 B     FALSE        NA

#15 B     TRUE          1

#16 B     TRUE          1

#17 B     TRUE          1

#18 B     FALSE        NA

answered 21 hours ago

Ronak Shah

46.1k104268

add a comment |

Maybe I have over-complicated this but one way with dplyr is

library(dplyr)



df %>%

  mutate(temp = replace(criterium, is.na(criterium), FALSE), 

         temp1 = cumsum(!temp)) %>%

   group_by(temp1) %>%

   mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%

   group_by(group) %>%

   mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%

   select(-temp, -temp1)



#  group criterium  goal

#   <fct> <lgl>     <int>

# 1 A     NA           NA

# 2 A     TRUE          1

# 3 A     TRUE          1

# 4 A     TRUE          1

# 5 A     FALSE        NA

# 6 A     FALSE        NA

# 7 A     TRUE          2

# 8 A     TRUE          2

# 9 A     FALSE        NA

#10 A     TRUE          3

#11 A     TRUE          3

#12 A     TRUE          3

#13 B     NA           NA

#14 B     FALSE        NA

#15 B     TRUE          1

#16 B     TRUE          1

#17 B     TRUE          1

#18 B     FALSE        NA

answered 21 hours ago

Ronak Shah

46.1k104268

add a comment |

Maybe I have over-complicated this but one way with dplyr is

library(dplyr)



df %>%

  mutate(temp = replace(criterium, is.na(criterium), FALSE), 

         temp1 = cumsum(!temp)) %>%

   group_by(temp1) %>%

   mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%

   group_by(group) %>%

   mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%

   select(-temp, -temp1)



#  group criterium  goal

#   <fct> <lgl>     <int>

# 1 A     NA           NA

# 2 A     TRUE          1

# 3 A     TRUE          1

# 4 A     TRUE          1

# 5 A     FALSE        NA

# 6 A     FALSE        NA

# 7 A     TRUE          2

# 8 A     TRUE          2

# 9 A     FALSE        NA

#10 A     TRUE          3

#11 A     TRUE          3

#12 A     TRUE          3

#13 B     NA           NA

#14 B     FALSE        NA

#15 B     TRUE          1

#16 B     TRUE          1

#17 B     TRUE          1

#18 B     FALSE        NA

answered 21 hours ago

Ronak Shah

46.1k104268

Maybe I have over-complicated this but one way with dplyr is

library(dplyr)



df %>%

  mutate(temp = replace(criterium, is.na(criterium), FALSE), 

         temp1 = cumsum(!temp)) %>%

   group_by(temp1) %>%

   mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%

   group_by(group) %>%

   mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%

   select(-temp, -temp1)



#  group criterium  goal

#   <fct> <lgl>     <int>

# 1 A     NA           NA

# 2 A     TRUE          1

# 3 A     TRUE          1

# 4 A     TRUE          1

# 5 A     FALSE        NA

# 6 A     FALSE        NA

# 7 A     TRUE          2

# 8 A     TRUE          2

# 9 A     FALSE        NA

#10 A     TRUE          3

#11 A     TRUE          3

#12 A     TRUE          3

#13 B     NA           NA

#14 B     FALSE        NA

#15 B     TRUE          1

#16 B     TRUE          1

#17 B     TRUE          1

#18 B     FALSE        NA

answered 21 hours ago

Ronak Shah

46.1k104268

answered 21 hours ago

Ronak Shah

46.1k104268

answered 21 hours ago

Ronak Shah

46.1k104268

answered 21 hours ago

Ronak Shah

46.1k104268

add a comment |

A pure Base R solution, we can create a custom function via rle, and use it per group, i.e.

f1 <- function(x) {

    x[is.na(x)] <- FALSE

    rle1 <- rle(x)

    y <- rle1$values

    rle1$values[!y] <- 0

    rle1$values[y] <- cumsum(rle1$values[y])

    return(inverse.rle(rle1))

}





do.call(rbind, 

     lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium); 

                                             i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA); 

    i}))

Of course, If you want you can apply it via dplyr, i.e.

library(dplyr)



df %>% 

 group_by(group) %>% 

 mutate(goal = f1(criterium), 

        goal = replace(goal, is.na(criterium)|!criterium, NA))

which gives,

# A tibble: 18 x 3

# Groups:   group [2]

   group criterium  goal

   <fct> <lgl>     <dbl>

 1 A     NA           NA

 2 A     TRUE          1

 3 A     TRUE          1

 4 A     TRUE          1

 5 A     FALSE        NA

 6 A     FALSE        NA

 7 A     TRUE          2

 8 A     TRUE          2

 9 A     FALSE        NA

10 A     TRUE          3

11 A     TRUE          3

12 A     TRUE          3

13 B     NA           NA

14 B     FALSE        NA

15 B     TRUE          1

16 B     TRUE          1

17 B     TRUE          1

18 B     FALSE        NA

edited 18 hours ago

answered 20 hours ago

Sotos

31.4k51741

add a comment |

A pure Base R solution, we can create a custom function via rle, and use it per group, i.e.

f1 <- function(x) {

    x[is.na(x)] <- FALSE

    rle1 <- rle(x)

    y <- rle1$values

    rle1$values[!y] <- 0

    rle1$values[y] <- cumsum(rle1$values[y])

    return(inverse.rle(rle1))

}





do.call(rbind, 

     lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium); 

                                             i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA); 

    i}))

Of course, If you want you can apply it via dplyr, i.e.

library(dplyr)



df %>% 

 group_by(group) %>% 

 mutate(goal = f1(criterium), 

        goal = replace(goal, is.na(criterium)|!criterium, NA))

which gives,

# A tibble: 18 x 3

# Groups:   group [2]

   group criterium  goal

   <fct> <lgl>     <dbl>

 1 A     NA           NA

 2 A     TRUE          1

 3 A     TRUE          1

 4 A     TRUE          1

 5 A     FALSE        NA

 6 A     FALSE        NA

 7 A     TRUE          2

 8 A     TRUE          2

 9 A     FALSE        NA

10 A     TRUE          3

11 A     TRUE          3

12 A     TRUE          3

13 B     NA           NA

14 B     FALSE        NA

15 B     TRUE          1

16 B     TRUE          1

17 B     TRUE          1

18 B     FALSE        NA

edited 18 hours ago

answered 20 hours ago

Sotos

31.4k51741

add a comment |

A pure Base R solution, we can create a custom function via rle, and use it per group, i.e.

f1 <- function(x) {

    x[is.na(x)] <- FALSE

    rle1 <- rle(x)

    y <- rle1$values

    rle1$values[!y] <- 0

    rle1$values[y] <- cumsum(rle1$values[y])

    return(inverse.rle(rle1))

}





do.call(rbind, 

     lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium); 

                                             i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA); 

    i}))

Of course, If you want you can apply it via dplyr, i.e.

library(dplyr)



df %>% 

 group_by(group) %>% 

 mutate(goal = f1(criterium), 

        goal = replace(goal, is.na(criterium)|!criterium, NA))

which gives,

# A tibble: 18 x 3

# Groups:   group [2]

   group criterium  goal

   <fct> <lgl>     <dbl>

 1 A     NA           NA

 2 A     TRUE          1

 3 A     TRUE          1

 4 A     TRUE          1

 5 A     FALSE        NA

 6 A     FALSE        NA

 7 A     TRUE          2

 8 A     TRUE          2

 9 A     FALSE        NA

10 A     TRUE          3

11 A     TRUE          3

12 A     TRUE          3

13 B     NA           NA

14 B     FALSE        NA

15 B     TRUE          1

16 B     TRUE          1

17 B     TRUE          1

18 B     FALSE        NA

edited 18 hours ago

answered 20 hours ago

Sotos

31.4k51741

A pure Base R solution, we can create a custom function via rle, and use it per group, i.e.

f1 <- function(x) {

    x[is.na(x)] <- FALSE

    rle1 <- rle(x)

    y <- rle1$values

    rle1$values[!y] <- 0

    rle1$values[y] <- cumsum(rle1$values[y])

    return(inverse.rle(rle1))

}





do.call(rbind, 

     lapply(split(df, df$group), function(i){i$goal <- f1(i$criterium); 

                                             i$goal <- replace(i$goal, is.na(i$criterium)|!i$criterium, NA); 

    i}))

Of course, If you want you can apply it via dplyr, i.e.

library(dplyr)



df %>% 

 group_by(group) %>% 

 mutate(goal = f1(criterium), 

        goal = replace(goal, is.na(criterium)|!criterium, NA))

which gives,

# A tibble: 18 x 3

# Groups:   group [2]

   group criterium  goal

   <fct> <lgl>     <dbl>

 1 A     NA           NA

 2 A     TRUE          1

 3 A     TRUE          1

 4 A     TRUE          1

 5 A     FALSE        NA

 6 A     FALSE        NA

 7 A     TRUE          2

 8 A     TRUE          2

 9 A     FALSE        NA

10 A     TRUE          3

11 A     TRUE          3

12 A     TRUE          3

13 B     NA           NA

14 B     FALSE        NA

15 B     TRUE          1

16 B     TRUE          1

17 B     TRUE          1

18 B     FALSE        NA

edited 18 hours ago

answered 20 hours ago

Sotos

31.4k51741

edited 18 hours ago

answered 20 hours ago

Sotos

31.4k51741

answered 20 hours ago

Sotos

31.4k51741

answered 20 hours ago

Sotos

31.4k51741

add a comment |

A data.table option using rle

library(data.table)

DT <- as.data.table(dat)

DT[, goal := {

  r <- rle(replace(criterium, is.na(criterium), FALSE))

  r$values <- with(r, cumsum(values) * values)          

  out <- inverse.rle(r)                                 

  replace(out, out == 0, NA)

}, by = group]

DT

#    group criterium goal

# 1:     A        NA   NA

# 2:     A      TRUE    1

# 3:     A      TRUE    1

# 4:     A      TRUE    1

# 5:     A     FALSE   NA

# 6:     A     FALSE   NA

# 7:     A      TRUE    2

# 8:     A      TRUE    2

# 9:     A     FALSE   NA

#10:     A      TRUE    3

#11:     A      TRUE    3

#12:     A      TRUE    3

#13:     B        NA   NA

#14:     B     FALSE   NA

#15:     B      TRUE    1

#16:     B      TRUE    1

#17:     B      TRUE    1

#18:     B     FALSE   NA

step by step

When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...

We manipulate the values compenent in the following way

r$values <- with(r, cumsum(values) * values)

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : int [1:9] 0 1 0 2 0 3 0 4 0

That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeated lenghts times

out <- inverse.rle(r)

out

# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0

This is almost what OP wants but we need to replace the 0s with NA

replace(out, out == 0, NA)

This is done for each group.

data

dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

edited 16 hours ago

answered 20 hours ago

markus

15.4k11336

Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

– Humpelstielzchen
20 hours ago

1

@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

– markus
20 hours ago

Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

– Humpelstielzchen
19 hours ago

add a comment |

A data.table option using rle

library(data.table)

DT <- as.data.table(dat)

DT[, goal := {

  r <- rle(replace(criterium, is.na(criterium), FALSE))

  r$values <- with(r, cumsum(values) * values)          

  out <- inverse.rle(r)                                 

  replace(out, out == 0, NA)

}, by = group]

DT

#    group criterium goal

# 1:     A        NA   NA

# 2:     A      TRUE    1

# 3:     A      TRUE    1

# 4:     A      TRUE    1

# 5:     A     FALSE   NA

# 6:     A     FALSE   NA

# 7:     A      TRUE    2

# 8:     A      TRUE    2

# 9:     A     FALSE   NA

#10:     A      TRUE    3

#11:     A      TRUE    3

#12:     A      TRUE    3

#13:     B        NA   NA

#14:     B     FALSE   NA

#15:     B      TRUE    1

#16:     B      TRUE    1

#17:     B      TRUE    1

#18:     B     FALSE   NA

step by step

When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...

We manipulate the values compenent in the following way

r$values <- with(r, cumsum(values) * values)

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : int [1:9] 0 1 0 2 0 3 0 4 0

That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeated lenghts times

out <- inverse.rle(r)

out

# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0

This is almost what OP wants but we need to replace the 0s with NA

replace(out, out == 0, NA)

This is done for each group.

data

dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

edited 16 hours ago

answered 20 hours ago

markus

15.4k11336

Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

– Humpelstielzchen
20 hours ago

1

@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

– markus
20 hours ago

Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

– Humpelstielzchen
19 hours ago

add a comment |

A data.table option using rle

library(data.table)

DT <- as.data.table(dat)

DT[, goal := {

  r <- rle(replace(criterium, is.na(criterium), FALSE))

  r$values <- with(r, cumsum(values) * values)          

  out <- inverse.rle(r)                                 

  replace(out, out == 0, NA)

}, by = group]

DT

#    group criterium goal

# 1:     A        NA   NA

# 2:     A      TRUE    1

# 3:     A      TRUE    1

# 4:     A      TRUE    1

# 5:     A     FALSE   NA

# 6:     A     FALSE   NA

# 7:     A      TRUE    2

# 8:     A      TRUE    2

# 9:     A     FALSE   NA

#10:     A      TRUE    3

#11:     A      TRUE    3

#12:     A      TRUE    3

#13:     B        NA   NA

#14:     B     FALSE   NA

#15:     B      TRUE    1

#16:     B      TRUE    1

#17:     B      TRUE    1

#18:     B     FALSE   NA

step by step

When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...

We manipulate the values compenent in the following way

r$values <- with(r, cumsum(values) * values)

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : int [1:9] 0 1 0 2 0 3 0 4 0

That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeated lenghts times

out <- inverse.rle(r)

out

# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0

This is almost what OP wants but we need to replace the 0s with NA

replace(out, out == 0, NA)

This is done for each group.

data

dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

edited 16 hours ago

answered 20 hours ago

markus

15.4k11336

A data.table option using rle

library(data.table)

DT <- as.data.table(dat)

DT[, goal := {

  r <- rle(replace(criterium, is.na(criterium), FALSE))

  r$values <- with(r, cumsum(values) * values)          

  out <- inverse.rle(r)                                 

  replace(out, out == 0, NA)

}, by = group]

DT

#    group criterium goal

# 1:     A        NA   NA

# 2:     A      TRUE    1

# 3:     A      TRUE    1

# 4:     A      TRUE    1

# 5:     A     FALSE   NA

# 6:     A     FALSE   NA

# 7:     A      TRUE    2

# 8:     A      TRUE    2

# 9:     A     FALSE   NA

#10:     A      TRUE    3

#11:     A      TRUE    3

#12:     A      TRUE    3

#13:     B        NA   NA

#14:     B     FALSE   NA

#15:     B      TRUE    1

#16:     B      TRUE    1

#17:     B      TRUE    1

#18:     B     FALSE   NA

step by step

When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...

We manipulate the values compenent in the following way

r$values <- with(r, cumsum(values) * values)

r

#Run Length Encoding

#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1

#  values : int [1:9] 0 1 0 2 0 3 0 4 0

That is, we replaced TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which values will repeated lenghts times

out <- inverse.rle(r)

out

# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0

This is almost what OP wants but we need to replace the 0s with NA

replace(out, out == 0, NA)

This is done for each group.

data

dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 

1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 

"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 

FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 

TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 

-18L))

edited 16 hours ago

answered 20 hours ago

markus

15.4k11336

edited 16 hours ago

answered 20 hours ago

markus

15.4k11336

answered 20 hours ago

markus

15.4k11336

answered 20 hours ago

markus

15.4k11336

Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

– Humpelstielzchen
20 hours ago

1

@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

– markus
20 hours ago

Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

– Humpelstielzchen
19 hours ago

add a comment |

Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

– Humpelstielzchen
20 hours ago

1

@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

– markus
20 hours ago

Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

– Humpelstielzchen
19 hours ago

Wow, impressive. Thanks for introducing me to rleand inverse.rle. Gruß nach Leipzig.

– Humpelstielzchen
20 hours ago

@Humpelstielzchen Gern geschehen. Will try to simplify and explain the logic a bit.

– markus
20 hours ago

Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a Teufelskerl. ^^

– Humpelstielzchen
19 hours ago

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Ndtyhij