How to use Pandas to get the count of every combination inclusive
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I am trying to figure out what combination of clothing customers are buying together. I can figure out the exact combination, but the problem I can't figure out is the count that includes the combination + others.
For example, I have:
Cust_num Item Rev
Cust1 Shirt1 $40
Cust1 Shirt2 $40
Cust1 Shorts1 $40
Cust2 Shirt1 $40
Cust2 Shorts1 $40
This should result in:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
The best I can do is unique combinations:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 1
I tried:
df = df.pivot(index='Cust_num',columns='Item').sum()
df[df.notnull()] = "x"
df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))
col = df.stack().groupby(level=0).apply(','.join)
df2 = pd.DataFrame(col)
df2.groupby([0]).size().reset_index(name='counts')
But that is just the unique counts.
python pandas
New contributor
add a comment |
I am trying to figure out what combination of clothing customers are buying together. I can figure out the exact combination, but the problem I can't figure out is the count that includes the combination + others.
For example, I have:
Cust_num Item Rev
Cust1 Shirt1 $40
Cust1 Shirt2 $40
Cust1 Shorts1 $40
Cust2 Shirt1 $40
Cust2 Shorts1 $40
This should result in:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
The best I can do is unique combinations:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 1
I tried:
df = df.pivot(index='Cust_num',columns='Item').sum()
df[df.notnull()] = "x"
df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))
col = df.stack().groupby(level=0).apply(','.join)
df2 = pd.DataFrame(col)
df2.groupby([0]).size().reset_index(name='counts')
But that is just the unique counts.
python pandas
New contributor
3
I feel like this is one sort of problem pandas would not be suitable for.
– coldspeed
yesterday
add a comment |
I am trying to figure out what combination of clothing customers are buying together. I can figure out the exact combination, but the problem I can't figure out is the count that includes the combination + others.
For example, I have:
Cust_num Item Rev
Cust1 Shirt1 $40
Cust1 Shirt2 $40
Cust1 Shorts1 $40
Cust2 Shirt1 $40
Cust2 Shorts1 $40
This should result in:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
The best I can do is unique combinations:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 1
I tried:
df = df.pivot(index='Cust_num',columns='Item').sum()
df[df.notnull()] = "x"
df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))
col = df.stack().groupby(level=0).apply(','.join)
df2 = pd.DataFrame(col)
df2.groupby([0]).size().reset_index(name='counts')
But that is just the unique counts.
python pandas
New contributor
I am trying to figure out what combination of clothing customers are buying together. I can figure out the exact combination, but the problem I can't figure out is the count that includes the combination + others.
For example, I have:
Cust_num Item Rev
Cust1 Shirt1 $40
Cust1 Shirt2 $40
Cust1 Shorts1 $40
Cust2 Shirt1 $40
Cust2 Shorts1 $40
This should result in:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
The best I can do is unique combinations:
Combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 1
I tried:
df = df.pivot(index='Cust_num',columns='Item').sum()
df[df.notnull()] = "x"
df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))
col = df.stack().groupby(level=0).apply(','.join)
df2 = pd.DataFrame(col)
df2.groupby([0]).size().reset_index(name='counts')
But that is just the unique counts.
python pandas
python pandas
New contributor
New contributor
New contributor
asked yesterday
lys_dadlys_dad
565
565
New contributor
New contributor
3
I feel like this is one sort of problem pandas would not be suitable for.
– coldspeed
yesterday
add a comment |
3
I feel like this is one sort of problem pandas would not be suitable for.
– coldspeed
yesterday
3
3
I feel like this is one sort of problem pandas would not be suitable for.
– coldspeed
yesterday
I feel like this is one sort of problem pandas would not be suitable for.
– coldspeed
yesterday
add a comment |
4 Answers
4
active
oldest
votes
Using pandas.DataFrame.groupby
:
grouped_item = df.groupby('Cust_num')['Item']
subsets = grouped_item.apply(lambda x: set(x)).tolist()
Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
combo = grouped_item.apply(lambda x:','.join(x))
combo = combo.reset_index()
combo['Count']=Count
Output:
Cust_num Item Count
0 Cust1 Shirt1,Shirt2,Shorts1 1
1 Cust2 Shirt1,Shorts1 2
How is finding subsets finding inclusive combination ofdf['Item']
? Subsets are not combinations. Below is output of what you doing:{' Shirt2', ' Shorts1', ' Shirt1'}: True
followed by{' Shorts1', ' Shirt1'}: True
and then{' Shorts1', ' Shirt1'}: True
. Then you sum then to get[1,2]
. I agree my approach i did is wrong so is yours. I would think@ResidentSleeper
has correct answer.
– Kill3rbee
yesterday
@Chris
, I think you need to find combination ofItem
first which would give you yoursubsets
. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing
– Kill3rbee
yesterday
@LeeMtoti Apologies for a strong language. I've deleted it. BTW,grouped_item
contains a combination of eachCustN
's choice of clothing.lambda x: set(x)
was implemented for aissubset
comparison. As you pointed out,issubset
returnsTrue
if and only if a set is contained in other set, which I still believe is what OP wants.
– Chris
yesterday
I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.
– Chris
yesterday
Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.
– lys_dad
yesterday
add a comment |
Late answer, but you can use:
df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
df['Count'] = df['Count'].str.replace(r'Cust','')
combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
add a comment |
I think you need to create a combination of items first.
How to get all possible combinations of a list’s elements?
I used the function from Dan H's answer.
from itertools import chain, combinations
def all_subsets(ss):
return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))
uq_items = df.Item.unique()
list(all_subsets(uq_items))
[(),
('Shirt1',),
('Shirt2',),
('Shorts1',),
('Shirt1', 'Shirt2'),
('Shirt1', 'Shorts1'),
('Shirt2', 'Shorts1'),
('Shirt1', 'Shirt2', 'Shorts1')]
And use groupby
each customer to get their items combination.
ls =
for _, d in df.groupby('Cust_num', group_keys=False):
# Get all possible subset of items
pi = np.array(list(all_subsets(d.Item)))
# Fliter only > 1
ls.append(pi[[len(l) > 1 for l in pi]])
Then convert to Series
and use value_counts()
.
pd.Series(np.concatenate(ls)).value_counts()
(Shirt1, Shorts1) 2
(Shirt2, Shorts1) 1
(Shirt1, Shirt2, Shorts1) 1
(Shirt1, Shirt2) 1
This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).
– lys_dad
yesterday
I did the first 1000 customers and it worked! Any suggestion for low memory laptops?
– lys_dad
yesterday
@lys_dad The accepted answer has already solved your memory problem, right?
– ResidentSleeper
yesterday
1
It does, yes. But thank you for your elegant solution!
– lys_dad
yesterday
add a comment |
My version which I believe is easier to understand
new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})
new_df ['count'] = range(1, len(new_df ) + 1)
Output:
Item Rev count
<lambda> <lambda>
Cust_num
Cust1 Shirt1 Shirt2 Shorts1 $40 1
Cust2 Shirt1 Shorts1 $40 2
Since you do not need the Rev
column, you can drop it:
new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()
new_df
Output:
Cust_num Item count
<lambda>
0 Cust1 Shirt1 Shirt2 Shorts1 1
1 Cust2 Shirt1 Shorts1 2
This edit is to respond to @Chris
by looking at his approach written using list comprehension. He created an list of sets:
[{' Shirt1', ' Shirt2', ' Shorts1'}, {' Shirt1', ' Shorts1'}]
Then next step finds the subsets:
for s1 in subsets:
for s2 in subsets:
if s2.issubset(s1):
print("{}: {}".format(s2,s2.issubset(s1)))
Output:
{' Shirt2', ' Shorts1', ' Shirt1'}: True
{' Shorts1', ' Shirt1'}: True
{' Shorts1', ' Shirt1'}: True
You asked me to explain myself and I did. However after thinking about it, I realized your approach was wrong too. As such I was not mocking you, but thanking you for making me think about my solution. Also thanks to @ResidentSleeper for his solution.
@Chris
thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.
– Kill3rbee
yesterday
@Chris
thanks for removing the downvote. Whole point is to learn from each other while helping each other.
– Kill3rbee
yesterday
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
lys_dad is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55565916%2fhow-to-use-pandas-to-get-the-count-of-every-combination-inclusive%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
Using pandas.DataFrame.groupby
:
grouped_item = df.groupby('Cust_num')['Item']
subsets = grouped_item.apply(lambda x: set(x)).tolist()
Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
combo = grouped_item.apply(lambda x:','.join(x))
combo = combo.reset_index()
combo['Count']=Count
Output:
Cust_num Item Count
0 Cust1 Shirt1,Shirt2,Shorts1 1
1 Cust2 Shirt1,Shorts1 2
How is finding subsets finding inclusive combination ofdf['Item']
? Subsets are not combinations. Below is output of what you doing:{' Shirt2', ' Shorts1', ' Shirt1'}: True
followed by{' Shorts1', ' Shirt1'}: True
and then{' Shorts1', ' Shirt1'}: True
. Then you sum then to get[1,2]
. I agree my approach i did is wrong so is yours. I would think@ResidentSleeper
has correct answer.
– Kill3rbee
yesterday
@Chris
, I think you need to find combination ofItem
first which would give you yoursubsets
. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing
– Kill3rbee
yesterday
@LeeMtoti Apologies for a strong language. I've deleted it. BTW,grouped_item
contains a combination of eachCustN
's choice of clothing.lambda x: set(x)
was implemented for aissubset
comparison. As you pointed out,issubset
returnsTrue
if and only if a set is contained in other set, which I still believe is what OP wants.
– Chris
yesterday
I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.
– Chris
yesterday
Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.
– lys_dad
yesterday
add a comment |
Using pandas.DataFrame.groupby
:
grouped_item = df.groupby('Cust_num')['Item']
subsets = grouped_item.apply(lambda x: set(x)).tolist()
Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
combo = grouped_item.apply(lambda x:','.join(x))
combo = combo.reset_index()
combo['Count']=Count
Output:
Cust_num Item Count
0 Cust1 Shirt1,Shirt2,Shorts1 1
1 Cust2 Shirt1,Shorts1 2
How is finding subsets finding inclusive combination ofdf['Item']
? Subsets are not combinations. Below is output of what you doing:{' Shirt2', ' Shorts1', ' Shirt1'}: True
followed by{' Shorts1', ' Shirt1'}: True
and then{' Shorts1', ' Shirt1'}: True
. Then you sum then to get[1,2]
. I agree my approach i did is wrong so is yours. I would think@ResidentSleeper
has correct answer.
– Kill3rbee
yesterday
@Chris
, I think you need to find combination ofItem
first which would give you yoursubsets
. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing
– Kill3rbee
yesterday
@LeeMtoti Apologies for a strong language. I've deleted it. BTW,grouped_item
contains a combination of eachCustN
's choice of clothing.lambda x: set(x)
was implemented for aissubset
comparison. As you pointed out,issubset
returnsTrue
if and only if a set is contained in other set, which I still believe is what OP wants.
– Chris
yesterday
I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.
– Chris
yesterday
Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.
– lys_dad
yesterday
add a comment |
Using pandas.DataFrame.groupby
:
grouped_item = df.groupby('Cust_num')['Item']
subsets = grouped_item.apply(lambda x: set(x)).tolist()
Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
combo = grouped_item.apply(lambda x:','.join(x))
combo = combo.reset_index()
combo['Count']=Count
Output:
Cust_num Item Count
0 Cust1 Shirt1,Shirt2,Shorts1 1
1 Cust2 Shirt1,Shorts1 2
Using pandas.DataFrame.groupby
:
grouped_item = df.groupby('Cust_num')['Item']
subsets = grouped_item.apply(lambda x: set(x)).tolist()
Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
combo = grouped_item.apply(lambda x:','.join(x))
combo = combo.reset_index()
combo['Count']=Count
Output:
Cust_num Item Count
0 Cust1 Shirt1,Shirt2,Shorts1 1
1 Cust2 Shirt1,Shorts1 2
answered yesterday
ChrisChris
3,811523
3,811523
How is finding subsets finding inclusive combination ofdf['Item']
? Subsets are not combinations. Below is output of what you doing:{' Shirt2', ' Shorts1', ' Shirt1'}: True
followed by{' Shorts1', ' Shirt1'}: True
and then{' Shorts1', ' Shirt1'}: True
. Then you sum then to get[1,2]
. I agree my approach i did is wrong so is yours. I would think@ResidentSleeper
has correct answer.
– Kill3rbee
yesterday
@Chris
, I think you need to find combination ofItem
first which would give you yoursubsets
. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing
– Kill3rbee
yesterday
@LeeMtoti Apologies for a strong language. I've deleted it. BTW,grouped_item
contains a combination of eachCustN
's choice of clothing.lambda x: set(x)
was implemented for aissubset
comparison. As you pointed out,issubset
returnsTrue
if and only if a set is contained in other set, which I still believe is what OP wants.
– Chris
yesterday
I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.
– Chris
yesterday
Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.
– lys_dad
yesterday
add a comment |
How is finding subsets finding inclusive combination ofdf['Item']
? Subsets are not combinations. Below is output of what you doing:{' Shirt2', ' Shorts1', ' Shirt1'}: True
followed by{' Shorts1', ' Shirt1'}: True
and then{' Shorts1', ' Shirt1'}: True
. Then you sum then to get[1,2]
. I agree my approach i did is wrong so is yours. I would think@ResidentSleeper
has correct answer.
– Kill3rbee
yesterday
@Chris
, I think you need to find combination ofItem
first which would give you yoursubsets
. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing
– Kill3rbee
yesterday
@LeeMtoti Apologies for a strong language. I've deleted it. BTW,grouped_item
contains a combination of eachCustN
's choice of clothing.lambda x: set(x)
was implemented for aissubset
comparison. As you pointed out,issubset
returnsTrue
if and only if a set is contained in other set, which I still believe is what OP wants.
– Chris
yesterday
I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.
– Chris
yesterday
Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.
– lys_dad
yesterday
How is finding subsets finding inclusive combination of
df['Item']
? Subsets are not combinations. Below is output of what you doing: {' Shirt2', ' Shorts1', ' Shirt1'}: True
followed by {' Shorts1', ' Shirt1'}: True
and then {' Shorts1', ' Shirt1'}: True
. Then you sum then to get [1,2]
. I agree my approach i did is wrong so is yours. I would think @ResidentSleeper
has correct answer.– Kill3rbee
yesterday
How is finding subsets finding inclusive combination of
df['Item']
? Subsets are not combinations. Below is output of what you doing: {' Shirt2', ' Shorts1', ' Shirt1'}: True
followed by {' Shorts1', ' Shirt1'}: True
and then {' Shorts1', ' Shirt1'}: True
. Then you sum then to get [1,2]
. I agree my approach i did is wrong so is yours. I would think @ResidentSleeper
has correct answer.– Kill3rbee
yesterday
@Chris
, I think you need to find combination of Item
first which would give you your subsets
. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing– Kill3rbee
yesterday
@Chris
, I think you need to find combination of Item
first which would give you your subsets
. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing– Kill3rbee
yesterday
@LeeMtoti Apologies for a strong language. I've deleted it. BTW,
grouped_item
contains a combination of each CustN
's choice of clothing. lambda x: set(x)
was implemented for a issubset
comparison. As you pointed out, issubset
returns True
if and only if a set is contained in other set, which I still believe is what OP wants.– Chris
yesterday
@LeeMtoti Apologies for a strong language. I've deleted it. BTW,
grouped_item
contains a combination of each CustN
's choice of clothing. lambda x: set(x)
was implemented for a issubset
comparison. As you pointed out, issubset
returns True
if and only if a set is contained in other set, which I still believe is what OP wants.– Chris
yesterday
I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.
– Chris
yesterday
I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.
– Chris
yesterday
Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.
– lys_dad
yesterday
Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.
– lys_dad
yesterday
add a comment |
Late answer, but you can use:
df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
df['Count'] = df['Count'].str.replace(r'Cust','')
combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
add a comment |
Late answer, but you can use:
df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
df['Count'] = df['Count'].str.replace(r'Cust','')
combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
add a comment |
Late answer, but you can use:
df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
df['Count'] = df['Count'].str.replace(r'Cust','')
combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
Late answer, but you can use:
df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
df['Count'] = df['Count'].str.replace(r'Cust','')
combo Count
Shirt1,Shirt2,Shorts1 1
Shirt1,Shorts1 2
edited yesterday
answered yesterday
Pedro LobitoPedro Lobito
50.6k16138172
50.6k16138172
add a comment |
add a comment |
I think you need to create a combination of items first.
How to get all possible combinations of a list’s elements?
I used the function from Dan H's answer.
from itertools import chain, combinations
def all_subsets(ss):
return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))
uq_items = df.Item.unique()
list(all_subsets(uq_items))
[(),
('Shirt1',),
('Shirt2',),
('Shorts1',),
('Shirt1', 'Shirt2'),
('Shirt1', 'Shorts1'),
('Shirt2', 'Shorts1'),
('Shirt1', 'Shirt2', 'Shorts1')]
And use groupby
each customer to get their items combination.
ls =
for _, d in df.groupby('Cust_num', group_keys=False):
# Get all possible subset of items
pi = np.array(list(all_subsets(d.Item)))
# Fliter only > 1
ls.append(pi[[len(l) > 1 for l in pi]])
Then convert to Series
and use value_counts()
.
pd.Series(np.concatenate(ls)).value_counts()
(Shirt1, Shorts1) 2
(Shirt2, Shorts1) 1
(Shirt1, Shirt2, Shorts1) 1
(Shirt1, Shirt2) 1
This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).
– lys_dad
yesterday
I did the first 1000 customers and it worked! Any suggestion for low memory laptops?
– lys_dad
yesterday
@lys_dad The accepted answer has already solved your memory problem, right?
– ResidentSleeper
yesterday
1
It does, yes. But thank you for your elegant solution!
– lys_dad
yesterday
add a comment |
I think you need to create a combination of items first.
How to get all possible combinations of a list’s elements?
I used the function from Dan H's answer.
from itertools import chain, combinations
def all_subsets(ss):
return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))
uq_items = df.Item.unique()
list(all_subsets(uq_items))
[(),
('Shirt1',),
('Shirt2',),
('Shorts1',),
('Shirt1', 'Shirt2'),
('Shirt1', 'Shorts1'),
('Shirt2', 'Shorts1'),
('Shirt1', 'Shirt2', 'Shorts1')]
And use groupby
each customer to get their items combination.
ls =
for _, d in df.groupby('Cust_num', group_keys=False):
# Get all possible subset of items
pi = np.array(list(all_subsets(d.Item)))
# Fliter only > 1
ls.append(pi[[len(l) > 1 for l in pi]])
Then convert to Series
and use value_counts()
.
pd.Series(np.concatenate(ls)).value_counts()
(Shirt1, Shorts1) 2
(Shirt2, Shorts1) 1
(Shirt1, Shirt2, Shorts1) 1
(Shirt1, Shirt2) 1
This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).
– lys_dad
yesterday
I did the first 1000 customers and it worked! Any suggestion for low memory laptops?
– lys_dad
yesterday
@lys_dad The accepted answer has already solved your memory problem, right?
– ResidentSleeper
yesterday
1
It does, yes. But thank you for your elegant solution!
– lys_dad
yesterday
add a comment |
I think you need to create a combination of items first.
How to get all possible combinations of a list’s elements?
I used the function from Dan H's answer.
from itertools import chain, combinations
def all_subsets(ss):
return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))
uq_items = df.Item.unique()
list(all_subsets(uq_items))
[(),
('Shirt1',),
('Shirt2',),
('Shorts1',),
('Shirt1', 'Shirt2'),
('Shirt1', 'Shorts1'),
('Shirt2', 'Shorts1'),
('Shirt1', 'Shirt2', 'Shorts1')]
And use groupby
each customer to get their items combination.
ls =
for _, d in df.groupby('Cust_num', group_keys=False):
# Get all possible subset of items
pi = np.array(list(all_subsets(d.Item)))
# Fliter only > 1
ls.append(pi[[len(l) > 1 for l in pi]])
Then convert to Series
and use value_counts()
.
pd.Series(np.concatenate(ls)).value_counts()
(Shirt1, Shorts1) 2
(Shirt2, Shorts1) 1
(Shirt1, Shirt2, Shorts1) 1
(Shirt1, Shirt2) 1
I think you need to create a combination of items first.
How to get all possible combinations of a list’s elements?
I used the function from Dan H's answer.
from itertools import chain, combinations
def all_subsets(ss):
return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))
uq_items = df.Item.unique()
list(all_subsets(uq_items))
[(),
('Shirt1',),
('Shirt2',),
('Shorts1',),
('Shirt1', 'Shirt2'),
('Shirt1', 'Shorts1'),
('Shirt2', 'Shorts1'),
('Shirt1', 'Shirt2', 'Shorts1')]
And use groupby
each customer to get their items combination.
ls =
for _, d in df.groupby('Cust_num', group_keys=False):
# Get all possible subset of items
pi = np.array(list(all_subsets(d.Item)))
# Fliter only > 1
ls.append(pi[[len(l) > 1 for l in pi]])
Then convert to Series
and use value_counts()
.
pd.Series(np.concatenate(ls)).value_counts()
(Shirt1, Shorts1) 2
(Shirt2, Shorts1) 1
(Shirt1, Shirt2, Shorts1) 1
(Shirt1, Shirt2) 1
edited yesterday
answered yesterday
ResidentSleeperResidentSleeper
46210
46210
This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).
– lys_dad
yesterday
I did the first 1000 customers and it worked! Any suggestion for low memory laptops?
– lys_dad
yesterday
@lys_dad The accepted answer has already solved your memory problem, right?
– ResidentSleeper
yesterday
1
It does, yes. But thank you for your elegant solution!
– lys_dad
yesterday
add a comment |
This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).
– lys_dad
yesterday
I did the first 1000 customers and it worked! Any suggestion for low memory laptops?
– lys_dad
yesterday
@lys_dad The accepted answer has already solved your memory problem, right?
– ResidentSleeper
yesterday
1
It does, yes. But thank you for your elegant solution!
– lys_dad
yesterday
This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).
– lys_dad
yesterday
This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).
– lys_dad
yesterday
I did the first 1000 customers and it worked! Any suggestion for low memory laptops?
– lys_dad
yesterday
I did the first 1000 customers and it worked! Any suggestion for low memory laptops?
– lys_dad
yesterday
@lys_dad The accepted answer has already solved your memory problem, right?
– ResidentSleeper
yesterday
@lys_dad The accepted answer has already solved your memory problem, right?
– ResidentSleeper
yesterday
1
1
It does, yes. But thank you for your elegant solution!
– lys_dad
yesterday
It does, yes. But thank you for your elegant solution!
– lys_dad
yesterday
add a comment |
My version which I believe is easier to understand
new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})
new_df ['count'] = range(1, len(new_df ) + 1)
Output:
Item Rev count
<lambda> <lambda>
Cust_num
Cust1 Shirt1 Shirt2 Shorts1 $40 1
Cust2 Shirt1 Shorts1 $40 2
Since you do not need the Rev
column, you can drop it:
new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()
new_df
Output:
Cust_num Item count
<lambda>
0 Cust1 Shirt1 Shirt2 Shorts1 1
1 Cust2 Shirt1 Shorts1 2
This edit is to respond to @Chris
by looking at his approach written using list comprehension. He created an list of sets:
[{' Shirt1', ' Shirt2', ' Shorts1'}, {' Shirt1', ' Shorts1'}]
Then next step finds the subsets:
for s1 in subsets:
for s2 in subsets:
if s2.issubset(s1):
print("{}: {}".format(s2,s2.issubset(s1)))
Output:
{' Shirt2', ' Shorts1', ' Shirt1'}: True
{' Shorts1', ' Shirt1'}: True
{' Shorts1', ' Shirt1'}: True
You asked me to explain myself and I did. However after thinking about it, I realized your approach was wrong too. As such I was not mocking you, but thanking you for making me think about my solution. Also thanks to @ResidentSleeper for his solution.
@Chris
thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.
– Kill3rbee
yesterday
@Chris
thanks for removing the downvote. Whole point is to learn from each other while helping each other.
– Kill3rbee
yesterday
add a comment |
My version which I believe is easier to understand
new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})
new_df ['count'] = range(1, len(new_df ) + 1)
Output:
Item Rev count
<lambda> <lambda>
Cust_num
Cust1 Shirt1 Shirt2 Shorts1 $40 1
Cust2 Shirt1 Shorts1 $40 2
Since you do not need the Rev
column, you can drop it:
new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()
new_df
Output:
Cust_num Item count
<lambda>
0 Cust1 Shirt1 Shirt2 Shorts1 1
1 Cust2 Shirt1 Shorts1 2
This edit is to respond to @Chris
by looking at his approach written using list comprehension. He created an list of sets:
[{' Shirt1', ' Shirt2', ' Shorts1'}, {' Shirt1', ' Shorts1'}]
Then next step finds the subsets:
for s1 in subsets:
for s2 in subsets:
if s2.issubset(s1):
print("{}: {}".format(s2,s2.issubset(s1)))
Output:
{' Shirt2', ' Shorts1', ' Shirt1'}: True
{' Shorts1', ' Shirt1'}: True
{' Shorts1', ' Shirt1'}: True
You asked me to explain myself and I did. However after thinking about it, I realized your approach was wrong too. As such I was not mocking you, but thanking you for making me think about my solution. Also thanks to @ResidentSleeper for his solution.
@Chris
thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.
– Kill3rbee
yesterday
@Chris
thanks for removing the downvote. Whole point is to learn from each other while helping each other.
– Kill3rbee
yesterday
add a comment |
My version which I believe is easier to understand
new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})
new_df ['count'] = range(1, len(new_df ) + 1)
Output:
Item Rev count
<lambda> <lambda>
Cust_num
Cust1 Shirt1 Shirt2 Shorts1 $40 1
Cust2 Shirt1 Shorts1 $40 2
Since you do not need the Rev
column, you can drop it:
new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()
new_df
Output:
Cust_num Item count
<lambda>
0 Cust1 Shirt1 Shirt2 Shorts1 1
1 Cust2 Shirt1 Shorts1 2
This edit is to respond to @Chris
by looking at his approach written using list comprehension. He created an list of sets:
[{' Shirt1', ' Shirt2', ' Shorts1'}, {' Shirt1', ' Shorts1'}]
Then next step finds the subsets:
for s1 in subsets:
for s2 in subsets:
if s2.issubset(s1):
print("{}: {}".format(s2,s2.issubset(s1)))
Output:
{' Shirt2', ' Shorts1', ' Shirt1'}: True
{' Shorts1', ' Shirt1'}: True
{' Shorts1', ' Shirt1'}: True
You asked me to explain myself and I did. However after thinking about it, I realized your approach was wrong too. As such I was not mocking you, but thanking you for making me think about my solution. Also thanks to @ResidentSleeper for his solution.
My version which I believe is easier to understand
new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})
new_df ['count'] = range(1, len(new_df ) + 1)
Output:
Item Rev count
<lambda> <lambda>
Cust_num
Cust1 Shirt1 Shirt2 Shorts1 $40 1
Cust2 Shirt1 Shorts1 $40 2
Since you do not need the Rev
column, you can drop it:
new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()
new_df
Output:
Cust_num Item count
<lambda>
0 Cust1 Shirt1 Shirt2 Shorts1 1
1 Cust2 Shirt1 Shorts1 2
This edit is to respond to @Chris
by looking at his approach written using list comprehension. He created an list of sets:
[{' Shirt1', ' Shirt2', ' Shorts1'}, {' Shirt1', ' Shorts1'}]
Then next step finds the subsets:
for s1 in subsets:
for s2 in subsets:
if s2.issubset(s1):
print("{}: {}".format(s2,s2.issubset(s1)))
Output:
{' Shirt2', ' Shorts1', ' Shirt1'}: True
{' Shorts1', ' Shirt1'}: True
{' Shorts1', ' Shirt1'}: True
You asked me to explain myself and I did. However after thinking about it, I realized your approach was wrong too. As such I was not mocking you, but thanking you for making me think about my solution. Also thanks to @ResidentSleeper for his solution.
edited yesterday
answered yesterday
Kill3rbeeKill3rbee
13410
13410
@Chris
thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.
– Kill3rbee
yesterday
@Chris
thanks for removing the downvote. Whole point is to learn from each other while helping each other.
– Kill3rbee
yesterday
add a comment |
@Chris
thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.
– Kill3rbee
yesterday
@Chris
thanks for removing the downvote. Whole point is to learn from each other while helping each other.
– Kill3rbee
yesterday
@Chris
thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.– Kill3rbee
yesterday
@Chris
thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.– Kill3rbee
yesterday
@Chris
thanks for removing the downvote. Whole point is to learn from each other while helping each other.– Kill3rbee
yesterday
@Chris
thanks for removing the downvote. Whole point is to learn from each other while helping each other.– Kill3rbee
yesterday
add a comment |
lys_dad is a new contributor. Be nice, and check out our Code of Conduct.
lys_dad is a new contributor. Be nice, and check out our Code of Conduct.
lys_dad is a new contributor. Be nice, and check out our Code of Conduct.
lys_dad is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55565916%2fhow-to-use-pandas-to-get-the-count-of-every-combination-inclusive%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
3
I feel like this is one sort of problem pandas would not be suitable for.
– coldspeed
yesterday