Posts tagged 'pandas'

pandas

[Pandas] Selecting the top rows of a dataframe by a column's values 2020.11.10
[Pandas] Shuffling the rows of a dataframe 2020.11.10
[Pandas] Keeping only the duplicated rows 2020.10.25
[Pandas] About itertuples 2020.10.25
[Pandas] Starting iteration from a given row position 2020.10.25


[Pandas] Selecting the top rows of a dataframe by a column's values

2020. 11. 10. 01:46


df = df.nlargest(10, col)  # keep the 10 rows with the largest values in column col
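As a minimal sketch of how nlargest behaves, the example below uses a small made-up dataframe; the column names "name" and "score" and the values are assumptions for illustration only. It also shows the keep="all" option, which retains every row tied for the cutoff value.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "score": [50, 90, 70, 90, 10],
})

# Keep the 3 rows with the largest "score" values, largest first.
top3 = df.nlargest(3, "score")

# keep="all" keeps every row tied for the last place,
# so asking for 1 row can return more than one.
top1_with_ties = df.nlargest(1, "score", keep="all")

print(top3)
print(top1_with_ties)
```

Unlike sorting the whole dataframe and calling head(), nlargest only needs to find the top n rows, which the pandas documentation describes as more performant.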


[Pandas] Shuffling the rows of a dataframe

2020. 11. 10. 01:41


df = df.sample(frac=1)  # shuffle all rows (the original index labels travel with the rows)

df = df.sample(frac=1).reset_index(drop=True)  # shuffle, then reset the index to 0..n-1
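If the shuffle needs to be reproducible, sample also accepts a random_state argument. The sketch below assumes a small made-up dataframe; the column name "x" and the seed value 42 are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"x": range(5)})

# frac=1 samples 100% of the rows without replacement, i.e. a shuffle.
# random_state fixes the seed so the same order comes back on every run.
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)

# Running it again with the same seed gives an identical result.
shuffled_again = df.sample(frac=1, random_state=42).reset_index(drop=True)

print(shuffled)
```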


[Pandas] Keeping only the duplicated rows

2020. 10. 25. 12:10


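To remove rows that have duplicate values in a particular column of a dataframe, use drop_duplicates.

What if you want to keep only the duplicates?

That is, to drop the rows whose value appears only once in a given column and keep only the rows whose value appears more than once, use

df = df[df['COLUMN'].duplicated(keep=False)]

df = df[df.duplicated(['COLUMN'], keep=False)]

(The latter also works with multiple columns.)

Use duplicated() to check for duplicates.

Use drop_duplicates() to remove them.

keep accepts 'first', 'last', or False.

With 'first', duplicated() marks the first occurrence of a duplicated value as False and every later occurrence as True.

With 'last', the last occurrence is False and every earlier occurrence is True.

With False, every occurrence of a duplicated value is True, whether first or last.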
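The three keep settings can be compared side by side on a tiny made-up dataframe. The data below is an assumption for illustration; the value "a" appears three times, while "b" and "c" appear once each.

```python
import pandas as pd

# Hypothetical sample data: "a" is duplicated, "b" and "c" are unique.
df = pd.DataFrame({
    "COLUMN": ["a", "b", "a", "c", "a"],
    "value": [1, 2, 3, 4, 5],
})

# True marks a row that duplicated() considers a duplicate.
mask_first = df.duplicated(["COLUMN"], keep="first")  # first "a" is not flagged
mask_last = df.duplicated(["COLUMN"], keep="last")    # last "a" is not flagged
mask_all = df.duplicated(["COLUMN"], keep=False)      # every "a" is flagged

# Keep only the rows whose COLUMN value occurs more than once.
only_dups = df[mask_all]

# The opposite operation: keep one row per distinct COLUMN value.
deduped = df.drop_duplicates(["COLUMN"])

print(only_dups)
print(deduped)
```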


[Pandas] About itertuples

2020. 10. 25. 12:03


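itertuples is faster than iterrows.

itertuples accepts the following arguments:

-index: defaults to True. If True, the index is included as the first element of each tuple; if False, it is omitted.

-name: defaults to "Pandas" and sets the name of the returned namedtuples. With name=None, plain tuples are returned instead of namedtuples.

-On Python versions below 3.7, plain tuples are always returned when the dataframe has 255 or more columns. A function built on itertuples that has to run on those versions therefore needs to handle both cases, depending on whether the input dataframe has 255 or more columns.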
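The effect of the index and name arguments can be seen in a short sketch. The dataframe, its column names, and the name "Row" are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with a string index.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# Defaults: namedtuples called "Pandas", with the index as the first field.
rows_default = list(df.itertuples())

# index=False drops the index field; name sets the namedtuple's type name.
rows_named = list(df.itertuples(index=False, name="Row"))

# name=None yields plain tuples, so fields are accessed by position only.
rows_plain = list(df.itertuples(index=False, name=None))

first = rows_default[0]
print(first.Index, first.a, first.b)  # attribute access on a namedtuple
print(rows_plain[0][0])               # positional access on a plain tuple
```

Code that has to accept both namedtuples and plain tuples can fall back to positional access, which works for either.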


[Pandas] Starting iteration from a given row position

2020. 10. 25. 12:00


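Use islice from the itertools module in the Python standard library.

from itertools import islice

for row in islice(df.itertuples(index=False), 1, None):  # start iterating from the 2nd row
    print(row)  # each item is one namedtuple holding the row's values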
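A self-contained version of the same idea, using a made-up single-column dataframe, is sketched below. It also shows positional slicing with iloc as an alternative that gives the same rows; which one reads better is a matter of taste.

```python
from itertools import islice

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"a": [10, 20, 30, 40]})

# islice(iterable, start, stop): skip the first 2 rows, then run to the end
# because stop is None.
seen = [row.a for row in islice(df.itertuples(index=False), 2, None)]

# Alternative: slice by position first, then iterate over the remainder.
seen_iloc = [row.a for row in df.iloc[2:].itertuples(index=False)]

print(seen)
print(seen_iloc)
```

Note that islice still consumes the skipped rows from the iterator one by one, whereas iloc slices the dataframe before any iteration starts.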

