Python

[Python] Concurrency, multiprocessing, multithreading, asyncio: basics and examples 2022.11.09
[Python] Import time vs. runtime (class variable, decorator, mutable arguments) 2022.01.25
[Python] Definition of hashable 2022.01.25
[Python]Static variable, Static method, Class method 2020.11.22
[Algorithm]Find all sampling from nested dictionary of list 2020.11.15
[Python] Walrus Operator 2020.11.14
[Numpy] Why ndarray is faster than the (built-in) list 2020.11.02
[Python] On using __slots__ 2020.11.02
[Python]Iterable VS Iterator (feat Generator): from definitions to contrast 2020.11.02
(Unfinished)[Python] Change your habits, increase your speed. 2020.10.30


[Python] Concurrency, multiprocessing, multithreading, asyncio: basics and examples

2022. 11. 9. 00:32


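Basics

Synchronous and asynchronous

Synchronous: when a callable (a function or method) is called, the caller does not proceed to the next step until the call has completed.
Asynchronous: when a callable (a function or method) is called, the caller can proceed to the next step without waiting for it to complete.
- Thread-based asynchrony is called multithreading.
- Process-based asynchrony is called multiprocessing.
- Asynchrony can also be built on asyncio.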

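Problems with multithreading

Thread safety must be guaranteed.

Problems with multiprocessing

The overhead is larger than with multithreading.
The callable, its arguments, and its return value must all be picklable objects.
- This is because they are passed between processes through queues from the multiprocessing module, which pickle what they carry.
On operating systems where a child process is created by forking the parent by default, the children inherit the parent's random number generator state and can therefore produce the same random values.
- Either re-seed the generator in each child, or use Python's built-in random module, which re-seeds itself in the child after a fork.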

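Multiprocessing vs multithreading vs asyncio

When multiprocessing is useful
- CPU-bound work
When multithreading is useful
- I/O-bound work
When asyncio is useful
- I/O-bound work
- Code that has to stay maintainable, because thread safety needs comparatively less attention
  - With multithreading, maintenance gets harder as the number of places that need a lock grows.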

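concurrent.futures

The standard library for running work concurrently.
The threading and multiprocessing libraries used to be used directly, but concurrent.futures now covers both use cases behind one interface and is the recommended starting point.

concurrent.futures.Future and concurrent.futures.Executor

concurrent.futures.Executor is an abstract class.

Its concrete subclasses are ThreadPoolExecutor and ProcessPoolExecutor.
ThreadPoolExecutor and ProcessPoolExecutor share nearly the same API, so switching between them is easy.
Passing a callable to an executor's submit method schedules it for execution and returns a future (a Future object).
- "Scheduling" here means handing the call over to one of the pool's workers (threads or processes).
- The first callable starts running as soon as it is submitted. Later callables also start right away if an idle worker is available; otherwise they start in the pending state.
- Methods of a future
  - result returns the return value (blocking until it is available).
  - done, running, and cancelled report the state.
The default max_workers of ThreadPoolExecutor is min(32, os.cpu_count() + 4) since Python 3.8 (it was the number of cores * 5 in Python 3.5 to 3.7); for ProcessPoolExecutor it is the number of processors.
concurrent.futures.Future
- It can be instantiated directly for testing, but the official documentation recommends obtaining instances through executor.submit.
- It is a stateful object: it runs immediately on submit if possible and is otherwise pending.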
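
The running/pending behaviour described above can be observed with a small sketch. The names blocking_job, quick_job, started, and release are made up for this illustration; the two Event objects only exist to make the timing deterministic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

started = threading.Event()   # set by the first job once it is actually running
release = threading.Event()   # set by the main thread to let the first job finish

def blocking_job():
    started.set()
    release.wait(timeout=5)
    return "first"

def quick_job():
    return "second"

with ThreadPoolExecutor(max_workers=1) as executor:
    f1 = executor.submit(blocking_job)
    f2 = executor.submit(quick_job)
    started.wait(timeout=5)  # the only worker has now picked up f1
    # f1 is running; f2 has no free worker, so it is neither running nor done.
    states_before = (f1.running(), f2.running(), f2.done())
    release.set()            # unblock the worker
    results = (f1.result(), f2.result())
    states_after = (f1.done(), f2.done())

print(states_before)  # (True, False, False)
print(results)        # ('first', 'second')
print(states_after)   # (True, True)
```

With max_workers=2 the second future would normally be picked up right away instead of staying pending.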

concurrent.futures.wait

Takes futures, timeout, and return_when, and returns two sets, the completed futures and the uncompleted ones, according to the given return_when rule and timeout.
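
A minimal sketch of return_when, assuming two made-up jobs (fast and slow) where slow is held back by an Event so the outcome does not depend on timing:

```python
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

gate = threading.Event()

def fast():
    return "fast"

def slow():
    gate.wait(timeout=5)  # blocks until the main thread opens the gate
    return "slow"

with ThreadPoolExecutor(max_workers=2) as executor:
    f_fast = executor.submit(fast)
    f_slow = executor.submit(slow)

    # Returns as soon as at least one future has completed.
    done, not_done = wait([f_fast, f_slow], return_when=FIRST_COMPLETED)
    first_round = (f_fast in done, f_slow in not_done)

    gate.set()
    # Default return_when is ALL_COMPLETED.
    done, not_done = wait([f_fast, f_slow])
    second_round = (len(done), len(not_done))

print(first_round)   # (True, True)
print(second_round)  # (2, 0)
```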

concurrent.futures.as_completed

Takes futures and timeout, and returns an iterator that yields the futures in the order they complete.
Futures that had already completed before as_completed was called are yielded first.
The main thread blocks on the iterator and receives each future as it completes.
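
As a rough illustration of completion order versus submission order, here is a sketch with a hypothetical sleep_and_return job; the delays are arbitrary and only chosen to be far enough apart.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def sleep_and_return(seconds):
    time.sleep(seconds)
    return seconds

delays = [0.6, 0.2, 0.4]
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(sleep_and_return, d) for d in delays]
    # as_completed yields each future as soon as it finishes.
    completion_order = [f.result() for f in as_completed(futures)]
    # Iterating the original list keeps the submission order.
    submission_order = [f.result() for f in futures]

print(completion_order)  # [0.2, 0.4, 0.6]
print(submission_order)  # [0.6, 0.2, 0.4]
```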

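asyncio

A library for writing concurrent code using the async/await syntax.
It is often a good fit for I/O-bound code and high-level structured network code.
Concurrency happens at the level of tasks, not coroutines.
- That is, await runs whatever is to its right inside the current task; when that awaitable has to wait (for example on I/O or a timer), the current task is suspended and control passes to another task.
- Unless the code is written with multiple tasks, asyncio brings no benefit.
It is thread-safe in many cases, because it runs on a single thread.
To use it together with a library that was not written for asyncio, use loop.run_in_executor so the blocking call runs in a thread pool.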

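High-level APIs

Coroutines and Tasks

Coroutines
- coroutine function
  - A function defined with async def
- coroutine object
  - The object obtained by calling a coroutine function
  - Often called a coro.
  - Creating a coro does not by itself execute the function body.
- Ways to run a coro (the coroutine function body)
  - Pass it to asyncio.run().
    - Usually used as the top-level entry point, in the form asyncio.run(main())
  - await coro
    - The current task waits until the coro completes; while the coro is suspended, other tasks get control.
  - asyncio.create_task(coro, *, name=None):
    - Wraps the coro in a task, schedules it on the event loop, and returns the task (task_1).
      - The call returns immediately. task_1 starts running once the current task gives up control (for example at an await that suspends), and then runs until it reaches a suspending await of its own, where control passes to another task.
  - asyncio.gather(*coros_or_futures, return_exceptions=False)
    - The coros are wrapped in tasks. The order in which they start running is not guaranteed.
      - Each coro argument is wrapped in a task and registered with the event loop; the tasks then run concurrently.
    - Returns one future, so it is awaitable.
    - Used in the form res = await asyncio.gather(coro1, coro2, ...)
    - res is ordered by the order of the arguments, not by completion order.
  - asyncio.wait(fs, *, timeout=None, return_when=ALL_COMPLETED)
    - Runs the tasks/futures in fs concurrently (since Python 3.11 bare coroutines are no longer accepted; wrap them with create_task first).
    - Waits according to timeout and return_when.
    - Returns two sets of futures: (done, pending)
  - asyncio.as_completed(fs, *, timeout=None)
    - Runs the awaitable objects in fs concurrently and
    - returns an iterator over awaitables.
      - Each yielded awaitable is not one of the coros in fs; it is a new awaitable that yields the result of whichever of them finishes next (wait for one).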
Awaitables
- An object that can be used with the await keyword is called an awaitable object.
- await "something" means the current task waits until something completes, and in the meantime control goes to other tasks.
  - If there is no other task, the current task simply waits.
  - To run several coros concurrently, use asyncio.gather, asyncio.wait, or asyncio.as_completed.
  - To start a coro in the background and keep going in the current task, use asyncio.create_task; the new task gets to run whenever the current task awaits.
- An awaitable is one of
  - a coro
  - a task/future
  - an object of a class that defines __await__
How the current task can voluntarily let other tasks run (hand over control)
- Use asyncio.sleep(0).
  - asyncio.sleep returns a coro.
  - await asyncio.sleep(0) suspends the current task for zero seconds, which gives the other tasks a chance to run.
event loop
- An event loop can hold many tasks, and each task has its own call stack, but only one task is processed at any given moment.
- To get the current event loop inside a coroutine function, call asyncio.get_running_loop().
- With loop.run_in_executor, synchronous functions can be run in a thread pool to gain concurrency.
- If a coro is not run through asyncio.{run, gather, wait} and is instead passed to loop.create_task on a loop obtained from get_event_loop, it does not start until that loop is actually running.
- get_event_loop was designed for single-threaded use. With multiple threads, call new_event_loop inside each thread and then set_event_loop to bind the new event loop to that thread, and use it there.
- asyncio.create_task(coro) ultimately calls loop.create_task(coro) on the loop returned by get_running_loop, so it can only be called while a loop is running.
  - Using asyncio.create_task(coro) is the modern approach.
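
asyncio.wait and asyncio.as_completed from the list above can be sketched as follows. The coroutine job and its names and delays are invented for the illustration, with asyncio.sleep standing in for real I/O.

```python
import asyncio

async def job(name, delay):
    await asyncio.sleep(delay)
    return name

async def main():
    # asyncio.wait expects tasks/futures, so the coroutines are wrapped first.
    tasks = [
        asyncio.create_task(job("slow", 0.3)),
        asyncio.create_task(job("fast", 0.1)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    first_done = [t.result() for t in done]
    pending_count = len(pending)
    await asyncio.wait(pending)  # let the remaining task finish

    # as_completed yields awaitables in completion order, not input order.
    order = []
    for next_result in asyncio.as_completed([job("a", 0.3), job("b", 0.1), job("c", 0.2)]):
        order.append(await next_result)
    return first_done, pending_count, order

first_done, pending_count, order = asyncio.run(main())
print(first_done)     # ['fast']
print(pending_count)  # 1
print(order)          # ['b', 'c', 'a']
```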

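asyncio.Future

Like concurrent.futures.Future, it represents a piece of work that will complete at some point.

asyncio.Task

A subclass of asyncio.Future.
A future that wraps a coroutine is called a task.

asyncio.run

Runs the coro passed as a parameter and returns its result.
Calling it creates an event loop, and this event loop controls the execution of the coro.
It cannot be called while another event loop is running in the same thread, so it is normally called once, as asyncio.run(main()).

asyncio.ensure_future

Given a future (or task), returns it unchanged.
Given a coro, wraps it in a task and returns the task.
A function for framework designers (end users can simply use create_task).
- Its purpose is to guarantee that the result is always a future.
- asyncio.gather(*aws, ...) also calls ensure_future internally, so every awaitable it receives becomes a future.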
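
The two ensure_future cases can be checked with a short sketch; the coroutine answer is a made-up placeholder.

```python
import asyncio

async def answer():
    return 42

async def main():
    task = asyncio.ensure_future(answer())  # coroutine -> wrapped in a new Task
    same = asyncio.ensure_future(task)      # Task/Future -> returned unchanged
    value = await task
    return isinstance(task, asyncio.Task), same is task, value

is_task, is_same, value = asyncio.run(main())
print(is_task, is_same, value)  # True True 42
```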

loop.run_in_executor

Used to run synchronous I/O concurrently from asyncio code.
Its parameters are executor, func, and *args.
For executor, pass either an instance of a concurrent.futures.Executor implementation or None. With None the default executor is used, which is a ThreadPoolExecutor unless set_default_executor has been called.

async with

There is exactly one difference from using with on an ordinary synchronous context manager.
- The enter and exit steps are coroutines (__aenter__ and __aexit__).
  - That is, it is for cases where entering and exiting are I/O-bound, so declaring them async lets the event loop run other tasks in the meantime.

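async for

There is exactly one difference from writing an ordinary synchronous iterator.
- __anext__ is a coroutine.
- A typical case is fetching the next element from a database: that is I/O-bound, so the event loop should be able to handle other tasks while waiting.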

Examples

Basic example: sequential processing vs multithreading

"""
Test by changing max_workers below.
With max_workers=1, the first download starts running as soon as it is submitted,
but every following url starts as pending even after submit.

With max_workers=3 you can confirm that multithreading beats the sequential version.
"""
from concurrent.futures import as_completed
from concurrent.futures import ThreadPoolExecutor
import time
from hashlib import md5
from pathlib import Path
from urllib import request

def elapsed_time(f):
  def wrapper(*args, **kwargs):
    st = time.time()
    v = f(*args, **kwargs)
    print(f"{f.__name__}: {time.time() - st}")
    return v
  return wrapper

urls = [
  'https://twitter.com',
  'https://facebook.com',
  'https://instagram.com'
]

def download(url):
  print(f"DOWNLOAD START, url = {url}")
  req = request.Request(url)

  # Hash the URL so the file name does not contain characters such as /
  name = md5(url.encode('utf-8')).hexdigest()
  file_path = './' + name
  with request.urlopen(req) as res:
    Path(file_path).write_bytes(res.read())
    return url, file_path

@elapsed_time
def get_sequential():
  for url in urls:
    print(download(url))

@elapsed_time
def get_multi_thread():
  with ThreadPoolExecutor(max_workers=1) as executor:
    futures = [executor.submit(download, url) for url in urls]
    print(futures)
    for future in as_completed(futures):
      print(future.result())

if __name__ == '__main__':
  get_sequential()
  get_multi_thread()

Thread-unsafe example and its thread-safe fix

from concurrent.futures import ThreadPoolExecutor, wait

# thread-unsafe
class Counter:
  def __init__(self):
    self.count = 0
  def increment(self):
    self.count += 1

def count_up(counter):
  for _ in range(1_000_000):
    counter.increment()

if __name__ == "__main__":
  counter = Counter()
  thread = 2
  with ThreadPoolExecutor() as e:
    futures = [e.submit(count_up, counter) for _ in range(thread)]
    done, not_done = wait(futures)  # (*)

  print(f'{counter.count=:,}')  # may print less than 2,000,000 because of the race

"""(*)
wait(fs, timeout=None, return_when=ALL_COMPLETED)
Takes futures and waits for them.
	- parameter timeout
		Maximum number of seconds to wait; the futures completed by then and
		those not yet completed are returned as a tuple of two sets.
	- parameter return_when
		One of the constants FIRST_COMPLETED, FIRST_EXCEPTION, ALL_COMPLETED,
		all defined in concurrent.futures.
		The default is ALL_COMPLETED.
"""

import threading
from concurrent.futures import ThreadPoolExecutor, wait

# thread-safe
class ThreadSafeCounter:
  lock = threading.Lock()  # class-level lock, shared by every instance
  def __init__(self):
    self.count = 0
  def increment(self):
    with self.lock:
      self.count += 1

def count_up(counter):
  for _ in range(1_000_000):
    counter.increment()

if __name__ == "__main__":
  counter = ThreadSafeCounter()
  thread = 2
  with ThreadPoolExecutor() as e:
    futures = [e.submit(count_up, counter) for _ in range(thread)]
    done, not_done = wait(futures)  # (*)

  print(f'{counter.count=:,}')  # 2,000,000

Fibonacci sequence: comparing sequential, multithreaded, and multiprocess versions

"""
Fibonacci sequence - sequential
"""

import sys
import os
import time

def elapsed_time(f):
  def wrapper(*args, **kwargs):
    st = time.time()
    v = f(*args, **kwargs)
    print(f"{f.__name__}: {time.time() - st}")
    return v
  return wrapper

def fibonacci(n):
  a, b = 0, 1
  for _ in range(n):
    a, b = b, b + a
  else:
    return a

@elapsed_time
def get_sequential(nums):
  for num in nums:
    _ = fibonacci(num)

def main():
  n = 1_000_000
  nums = [n] * os.cpu_count()
  get_sequential(nums)

if __name__ == '__main__':
  main()  # 168 seconds (depends on the number of CPUs)

"""
Fibonacci sequence - multi-process
"""

import os
import sys
import time

from concurrent.futures import ProcessPoolExecutor, as_completed

def elapsed_time(f):
  def wrapper(*args, **kwargs):
    st = time.time()
    v = f(*args, **kwargs)
    print(f"{f.__name__}: {time.time() - st}")
    return v
  return wrapper

def fibonacci(n):
  a, b = 0, 1
  for _ in range(n):
    a, b = b, b + a
  else:
    return a

@elapsed_time
def get_multi_process(nums):
  with ProcessPoolExecutor() as e:
    futures = [e.submit(fibonacci, num) for num in nums]
    for future in as_completed(futures):
      _ = future.result()

def main():
  n = 1_000_000
  nums = [n] * os.cpu_count()
  get_multi_process(nums)

if __name__ == '__main__':
  main()  # about 14 seconds

"""
Fibonacci sequence - multi-thread
"""

import os
import sys
import time

from concurrent.futures import ThreadPoolExecutor, as_completed

def elapsed_time(f):
  def wrapper(*args, **kwargs):
    st = time.time()
    v = f(*args, **kwargs)
    print(f"{f.__name__}: {time.time() - st}")
    return v
  return wrapper

def fibonacci(n):
  a, b = 0, 1
  for _ in range(n):
    a, b = b, b + a
  else:
    return a

@elapsed_time
def get_multi_thread(nums):
  with ThreadPoolExecutor() as e:
    futures = [e.submit(fibonacci, num) for num in nums]
    for future in as_completed(futures):
      _ = future.result()

def main():
  n = 1_000_000
  nums = [n] * os.cpu_count()
  get_multi_thread(nums)

if __name__ == '__main__':
  main()

Error when an unpicklable callable is used with multiprocessing, and the fix

from concurrent.futures import ProcessPoolExecutor, wait

func = lambda: 1
# def func():
#   return 1

def main():
  with ProcessPoolExecutor() as e:
    future = e.submit(func)
    done, not_done = wait([future])
  print(future.result())  # (*) 

if __name__ == "__main__":
  main()

"""
(*) The error is raised here. Work items are sent to the worker processes through
a multiprocessing queue, which pickles them, and a lambda cannot be pickled.
"""

from concurrent.futures import ThreadPoolExecutor, wait

func = lambda: 1
# def func():
#   return 1

def main():
  with ThreadPoolExecutor() as e:
    future = e.submit(func)
    done, not_done = wait([future])
  print(future.result())  # (*) 

if __name__ == "__main__":
  main()

"""
ThreadPoolExecutor does not raise the error.
"""

from concurrent.futures import ProcessPoolExecutor, wait

def func():
  return 1

def main():
  with ProcessPoolExecutor() as e:
    future = e.submit(func)
    done, not_done = wait([future])
  print(future.result())  # (*) 

if __name__ == "__main__":
  main()

"""
An ordinary module-level function is picklable, so no error is raised even with ProcessPoolExecutor.
"""

Random number generation problem with fork-based multiprocessing, and the fix

# np_random_multiprocess.py

from concurrent.futures import ProcessPoolExecutor, as_completed

import numpy as np

def use_numpy_random():
  return np.random.random()

def main():
  with ProcessPoolExecutor() as e:
    futures = [e.submit(use_numpy_random) for _ in range(3)]
    for future in as_completed(futures):
      print(future.result())

if __name__ == "__main__":
  main()

"""
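This code works fine where child processes are spawned (Windows, and macOS since Python 3.8).
Where the default start method is fork (the default on Linux up to Python 3.13), however, the same value can come out more than once.
With fork the child process is a copy of the parent, including the state of NumPy's global random generator.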
"""

# Fix 1: add np.random.seed()

from concurrent.futures import ProcessPoolExecutor, as_completed

import numpy as np

def use_numpy_random():
  np.random.seed()  # (*)
  return np.random.random()

def main():
  with ProcessPoolExecutor() as e:
    futures = [e.submit(use_numpy_random) for _ in range(3)]
    for future in as_completed(futures):
      print(future.result())

if __name__ == "__main__":
  main()

"""
(*) np.random.seed() re-seeds the random number generator in each child, which solves the problem.
"""

# Fix 2: use the built-in random instead of np.random

from concurrent.futures import ProcessPoolExecutor, as_completed

import random

def use_random():  # (*)
  return random.random()

def main():
  with ProcessPoolExecutor() as e:
    futures = [e.submit(use_random) for _ in range(3)]
    for future in as_completed(futures):
      print(future.result())

if __name__ == "__main__":
  main()

"""
(*) The built-in random re-seeds its generator automatically in the child after a fork.
"""

asyncio hello world

import asyncio

async def main():
    print('Hello ...')
    await asyncio.sleep(10)
    print('... World!')

# Python 3.7+
asyncio.run(main())  # prints Hello ..., then ... World! 10 seconds later

asyncio.gather example

import asyncio
import random

async def call_web_api(url):
  # The Web API call is replaced by a sleep
  print(f'send a request: {url}')
  await asyncio.sleep(random.random())
  print(f'got a response: {url}')
  return url

async def async_download(url):
  # Call the coroutine with await
  response = await call_web_api(url)
  return response

async def main():
  task = asyncio.gather(
    async_download('https://twitter.com/'),
    async_download('https://facebook.com/'),
    async_download('https://instagram.com/'),
  )
  return await task

result = asyncio.run(main())

Running the inside of a coroutine function concurrently with asyncio.create_task

# Without create_task
# Takes 6 seconds to finish
import asyncio

async def coro(n):
  await asyncio.sleep(n)
  return n

async def main():
  print(await coro(3))
  print(await coro(2))
  print(await coro(1))

asyncio.run(main())

# With create_task
# Finishes in 3 seconds
import asyncio

async def coro(n):
  await asyncio.sleep(n)
  return n

async def main():
  task1 = asyncio.create_task(coro(3))
  task2 = asyncio.create_task(coro(2))
  task3 = asyncio.create_task(coro(1))  
  print(await task1)
  print(await task2)
  print(await task3)
  

asyncio.run(main())

"""
3
2
1
"""

Running synchronous I/O concurrently with loop.run_in_executor

import asyncio
from hashlib import md5
from pathlib import Path
from urllib import request

urls = [
  'https://twitter.com',
  'https://facebook.com',
  'https://instagram.com'
]

def download(url):
  print(f"DOWNLOAD START, url = {url}")
  req = request.Request(url)

  # Hash the URL so the file name does not contain characters such as /
  name = md5(url.encode('utf-8')).hexdigest()
  file_path = './' + name
  with request.urlopen(req) as res:
    Path(file_path).write_bytes(res.read())
    return url, file_path

async def main():
  loop = asyncio.get_running_loop()
  # Run download, which uses synchronous I/O, concurrently
  futures = []
  for url in urls:
    future = loop.run_in_executor(None, download, url)
    futures.append(future)

  for result in await asyncio.gather(*futures):
    print(result)

asyncio.run(main())

Example where each task runs its own for loop

import asyncio

async def counter(name: str):
    print(f"CounterStart of {name}")
    for i in range(0, 10):
        print(f"{name}: {i}")
        await asyncio.sleep(0)

async def main():
    tasks = []
    for n in range(0, 4):
        tasks.append(asyncio.create_task(counter(f"task{n}")))

    while True:
        tasks = [t for t in tasks if not t.done()]
        if len(tasks) == 0:
            return

        await tasks[0]

asyncio.run(main())

"""
CounterStart of task0
task0: 0
CounterStart of task1
task1: 0
CounterStart of task2
task2: 0
CounterStart of task3
task3: 0
task0: 1
task1: 1
task2: 1
task3: 1
task0: 2
task1: 2
task2: 2
task3: 2
task0: 3
task1: 3
task2: 3
task3: 3
task0: 4
task1: 4
task2: 4
task3: 4
task0: 5
task1: 5
task2: 5
task3: 5
task0: 6
task1: 6
task2: 6
task3: 6
task0: 7
task1: 7
task2: 7
task3: 7
task0: 8
task1: 8
task2: 8
task3: 8
task0: 9
task1: 9
task2: 9
task3: 9
"""

async with example

import asyncio
import sys

async def log(msg, l=10, f='.'):
  for i in range(l*2+1):
    if i == l:
      for c in msg:
        sys.stdout.write(c)
        sys.stdout.flush()
        await asyncio.sleep(0.05)
    else:
      sys.stdout.write(f)
      sys.stdout.flush()
    await asyncio.sleep(0.2)
  sys.stdout.write('\n')
  sys.stdout.flush()

class AsyncCM:
  def __init__(self, i):
    self.i = i
  async def __aenter__(self):
    await log('Entering Context')
    return self
  async def __aexit__(self, *args):
    await log('Exiting Context')
    return False  # do not suppress exceptions raised inside the block

async def main1():
  '''Test Async Context Manager'''
  async with AsyncCM(10) as c:
    for i in range(c.i):
      print(i)
## Run

# loop = asyncio.get_event_loop()
# loop.run_until_complete(main1())
async def main():
  task = asyncio.gather(main1(), main1())
  return await task

asyncio.run(main())

"""
....................EEnntteerriinngg  CCoonntteexxtt....................
0
1
2
3
4
5
6
7
8
9
.
0
1
2
3
4
5
6
7
8
9
...................EExxiittiinngg  CCoonntteexxtt....................
"""
"""
The two main1() tasks interleave inside __aenter__ and __aexit__: each await there lets the other task run.
"""

async for example

# Blocking iterator
class A:
	def __iter__(self):
		self.x = 0
		return self
	def __next__(self):
		if self.x > 2:
			raise StopIteration
		else:
			self.x += 1
			return self.x

for i in A():
	print(i)  

"""
1
2
3
"""

# Asynchronous (non-blocking) iterator
import asyncio
from aioredis import create_redis

async def main():
	redis = await create_redis(('localhost', 6379))
	keys = ["Americas", "Africa", "Europe", "Asia"]
	async for value in OneAtATime(redis, keys):  # (1)
		await do_something_with(value)  # (2)

class OneAtATime:
	def __init__(self, redis, keys):
		self.redis = redis
		self.keys = keys
	def __aiter__(self):
		self.ikeys = iter(self.keys)
		return self
	async def __anext__(self):
		try:
			k = next(self.ikeys)
		except StopIteration:
			raise StopAsyncIteration  # (3)
		value = await self.redis.get(k)  # (4)
		return value

asyncio.run(main())

"""
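- __aiter__ must be implemented with def (not async def).
- __aiter__() must return an object that implements async def __anext__().
- __anext__() must return the value for each step of the iteration and raise StopAsyncIteration when the iteration is over.

(1) async for is used. The key point is that the iteration itself can be suspended until the next item is available.
(2) Assume this performs some I/O, for example transforming the data and sending it to another database.
(3) When the ordinary iteration ends it raises StopIteration, which is converted into StopAsyncIteration here.
(4) Fetching the value from redis is also awaited, so the event loop can do other work in the meantime.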
"""

# The same example rewritten as an asynchronous (non-blocking) generator
import asyncio
from aioredis import create_redis

async def main():
	redis = await create_redis(('localhost', 6379))
	keys = ["Americas", "Africa", "Europe", "Asia"]
	async for value in one_at_a_time(redis, keys):
		await do_something_with(value)

async def one_at_a_time(redis, keys):  # (1)
	for k in keys:
		value = await redis.get(k)
		yield value  # (2)

asyncio.run(main())

"""
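- Coroutines and generators are completely different concepts.
- An asynchronous generator works much like an ordinary generator.
- To iterate over it, use async for instead of for.
(1) An asynchronous generator is defined with async def.
(2) An asynchronous generator uses yield, just like a generator.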

"""

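The two async for examples above need a Redis server to run. A self-contained sketch of the same pattern follows, assuming a made-up in-memory FakeStore whose get() only simulates I/O with asyncio.sleep(0); it is a stand-in, not the aioredis API.

```python
import asyncio

class FakeStore:
    """Hypothetical stand-in for a Redis client; get() only simulates I/O."""
    def __init__(self, data):
        self._data = data

    async def get(self, key):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return self._data[key]

class OneAtATime:
    """Asynchronous iterator: __aiter__ is a plain def, __anext__ is a coroutine."""
    def __init__(self, store, keys):
        self.store = store
        self.keys = keys

    def __aiter__(self):
        self.ikeys = iter(self.keys)
        return self

    async def __anext__(self):
        try:
            key = next(self.ikeys)
        except StopIteration:
            raise StopAsyncIteration from None
        return await self.store.get(key)

async def one_at_a_time(store, keys):
    """Asynchronous generator with the same behaviour."""
    for key in keys:
        yield await store.get(key)

async def main():
    store = FakeStore({"Americas": 1, "Africa": 2, "Europe": 3, "Asia": 4})
    keys = ["Americas", "Africa", "Europe", "Asia"]
    from_class = [v async for v in OneAtATime(store, keys)]
    from_generator = [v async for v in one_at_a_time(store, keys)]
    return from_class, from_generator

from_class, from_generator = asyncio.run(main())
print(from_class)      # [1, 2, 3, 4]
print(from_generator)  # [1, 2, 3, 4]
```
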
async with example using contextlib

# First, the ordinary blocking version
from contextlib import contextmanager

@contextmanager
def web_page(url):
	data = download_webpage(url)  # (1)
	yield data
	update_stats(url)  # (2)

with web_page('google.com') as data:
	process(data)

"""
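The contextmanager decorator turns a generator function into a context manager.
The yielded value becomes the data in as data.
When the with statement ends, update_stats(url) is executed.

(1) In the blocking version above, the program stops while (1) is running.
Either change download_webpage(url) into a coro or use the executor approach below.
Turning it into a coro would be best, but that is not easy when it comes from a third-party library.

(2) This assumes that statistics such as a download count are updated every time data received from a URL is processed. If this function internally performs I/O, such as updating a database, it is a blocking call as well, so either turn it into a coro or use an executor.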
"""

# Non-blocking (download_webpage and update_stats must be coroutine functions)
from contextlib import asynccontextmanager

@asynccontextmanager
async def web_page(url):
	data = await download_webpage(url)
	yield data
	await update_stats(url)

async with web_page('google.com') as data:
	process(data)

"""
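This is the example after changing download_webpage and update_stats into coroutine functions.
To use @asynccontextmanager, the function must be defined with async def.
Because it contains yield it is a generator function, and because it is also declared with async def, calling it returns an asynchronous generator object.
To check whether something is an asynchronous generator function/object,
use isasyncgenfunction()/isasyncgen() from the inspect module.
A context manager created with asynccontextmanager must be entered with async with.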
"""

# Non-blocking with an executor (when download_webpage and update_stats are hard to turn into coroutines)
import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def web_page(url):
	loop = asyncio.get_running_loop()
	data = await loop.run_in_executor(None, download_webpage, url)
	yield data
	await loop.run_in_executor(None, update_stats, url)

async with web_page('google.com') as data:
	process(data)

"""
Non-blocking behaviour implemented by handing the blocking functions to an executor, which runs them in separate threads.

"""

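The executor variant above can be tried end to end with stand-in functions. In this sketch download_webpage and update_stats are hypothetical blocking placeholders (a time.sleep and a dictionary update), and the try/finally is an addition so that the statistics are updated even if the body of the async with block raises.

```python
import asyncio
import time
from contextlib import asynccontextmanager

stats = {}

def download_webpage(url):
    # Hypothetical blocking download, simulated with time.sleep.
    time.sleep(0.1)
    return f"<html>{url}</html>"

def update_stats(url):
    # Hypothetical blocking bookkeeping call.
    stats[url] = stats.get(url, 0) + 1

@asynccontextmanager
async def web_page(url):
    loop = asyncio.get_running_loop()
    data = await loop.run_in_executor(None, download_webpage, url)
    try:
        yield data
    finally:
        await loop.run_in_executor(None, update_stats, url)

async def fetch(url):
    # async with is only valid inside a coroutine function.
    async with web_page(url) as data:
        return len(data)

async def main():
    return await asyncio.gather(fetch("google.com"), fetch("example.com"))

lengths = asyncio.run(main())
print(lengths)  # [23, 24]
print(stats)
```
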

[Python] Import time vs. runtime (class variable, decorator, mutable arguments)

2022. 1. 25. 23:52


Let's see what import time means in practice.

# registration.py
# BEGIN REGISTRATION

registry = []  # <1>

def register(func):  # <2>
    print('running register(%s)' % func)  # <3>
    registry.append(func)  # <4>
    return func  # <5>

@register  # <6>
def f1():
    print('running f1()')

@register
def f2():
    print('running f2()')

def f3():  # <7>
    print('running f3()')

def main():  # <8>
    print('running main()')
    print('registry ->', registry)
    f1()
    f2()
    f3()

if __name__=='__main__':
    main()  # <9>

# END REGISTRATION

"""
$ python3 registration.py
running register(<function f1 at 0x100631bf8>)  # runs before running main()
running register(<function f2 at 0x100631c80>)  # runs before running main()
running main()
registry -> [<function f1 at 0x100631bf8>, <function f2 at 0x100631c80>]
running f1()
running f2()
running f3()
"""

# If registration.py is imported instead of being run
>>> import registration
running register(<function f1 at 0x10063b1e0>)  # the decorator is executed
running register(<function f2 at 0x10063b268>)

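In other words, a decorator runs as soon as the module is imported, whereas the decorated function runs only when it is called explicitly. This example clearly shows the difference between what Python developers call "import time" and "runtime".

Notes
- Defining the decorator and the decorated functions in the same module, as above, is unusual. Usually the decorator lives in a different module.
- Having the decorator return the decorated function unchanged, as above, is unusual. Usually it defines an inner function and returns that.
- Returning the function unchanged is not useful in itself, but the registry.append part is how web frameworks "add a function to some central registry".
Likewise,
- When a module containing a class is imported, the class body runs, so class variables are evaluated at import time.
  - This is an easy way to implement a singleton.
- Why a mutable object must not be used as a default value when defining a function:
  - The default value object is created once, when the def statement runs (that is, when the module is imported).
  - Every later call of the function keeps reusing that same default value object.
  - If the default value is mutable, the program can behave differently from what was expected.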
```
def test_function(a, b=[]):
    b.append(a)
    print(b)

if __name__ == '__main__':
    test_function(3)  # [3], expected
    test_function(4)  # [3, 4], unexpected
```
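
A common way around the shared default is a None sentinel. The sketch below uses made-up function names (shared_default, append_item) and also peeks at __defaults__ to show that the default object lives on the function itself.

```python
def shared_default(a, b=[]):
    # The list in b is created once, when the def statement runs.
    b.append(a)
    return b

shared_default(3)
shared_default(4)
leaked = shared_default.__defaults__[0]  # the single shared default list

def append_item(a, b=None):
    # A fresh list is created on every call that omits b.
    if b is None:
        b = []
    b.append(a)
    return b

first = append_item(3)
second = append_item(4)

print(leaked)  # [3, 4]
print(first)   # [3]
print(second)  # [4]
```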


[Python] Definition of hashable

2022. 1. 25. 01:20


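An object is hashable if it has a hash value that never changes during its lifetime (it needs a __hash__() method) and can be compared with other objects (it needs an __eq__() method). Objects that compare equal (==) must have the same hash value.

All immutable types are hashable.
- This is slightly inaccurate. A tuple is immutable, but when it refers to an unhashable object the tuple itself becomes unhashable.
User-defined types are hashable by default, because the default __hash__() is derived from id() and the default __eq__() compares identity, so all objects are distinct.
Python's dictionary is implemented as a hash table with open addressing. A lookup such as dict.get("key") proceeds as follows.
- First look up by the hash of "key".
- Even if that hash is present in the hash table, the "key" stored in that entry is compared as well. The value is returned only for the entry where both the hash and the key comparison (==) match.
This explains the example below: the key object can change, but the dictionary keeps using the hash value computed when the key was inserted.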

class MyList(list):
    # An example that deliberately defines a hash that changes
    # during the object's lifetime instead of staying fixed
    def __hash__(self):
        return sum(self)


my_list = MyList([1, 2, 3])

my_dict = {my_list: 'a'}

print(my_dict.get(my_list))  # a

my_list[2] = 4  # __hash__() becomes 7
print(next(iter(my_dict)))  # [1, 2, 4], i.e. the key has changed

print(my_dict.get(MyList([1, 2, 3])))
# None: the hash matches, but the key comparison fails

print(my_dict.get(MyList([1, 2, 4])))
# None: the hash (7) does not match the hash stored when the key was inserted (6)
# This is the important part: an equal key was passed, but the hash held by the dictionary (hash table) is still 6
# In other words, changing the key with my_list[2] = 4 does not update the hash stored in the dictionary (it stays 6)

my_list[0] = 0  # __hash__() is 6 again, but for different elements
print(next(iter(my_dict)))  # [0, 2, 4], i.e. the key has changed

print(my_dict.get(my_list))
# 'a': the hash comparison (6==6) and the key comparison (MyList([0,2,4])==MyList([0,2,4])) both succeed

Therefore,
- it is clear why the guideline says __hash__ should be defined from values that do not change during the object's lifetime.
- it is clear why dictionary keys are required to be hashable.
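
For contrast, here is a minimal sketch of a well-behaved hashable class. Point is a made-up example: its hash is computed from fields that are set once and never reassigned, and __eq__ uses the same fields. The second part checks the tuple remark made earlier.

```python
class Point:
    """Hashable value object: the hash depends only on fields that are never reassigned."""
    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Equal objects produce equal hashes because both use (x, y).
        return hash((self._x, self._y))

lookup = {Point(1, 2): "a"}
found = lookup.get(Point(1, 2))  # an equal but distinct object finds the entry

# A tuple is immutable, but it is unhashable if it contains an unhashable object.
try:
    hash((1, [2, 3]))
    tuple_hashable = True
except TypeError:
    tuple_hashable = False

print(found)           # a
print(tuple_hashable)  # False
```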


[Python]Static variable, Static method, Class method

2020. 11. 22. 21:56


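For both static variables and static methods, static means

"does not differ from object to object"

"available without creating an object"

Static variable = Class variable

In Python a static variable is very easy to implement.

class Sample(object):
    sv = "I'm static variable"

A class variable like this plays the role of a static variable.

That is, no matter how many objects are created, a static (= class) variable is a variable that refers to the same memory.

(In other words, a fixed (static) variable that does not differ from object to object.)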

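There are two kinds of methods in Python that can be called directly on the class: classmethod and staticmethod.

The differences are:

A classmethod is written with @classmethod on the line right before the method.

A staticmethod is written with @staticmethod on the line right before the method.

A classmethod takes cls as its first parameter, e.g. def add(cls, x, y)

A staticmethod needs neither cls nor self as its first parameter, e.g. def add(x, y)

A classmethod can access and modify class variables through cls.

A staticmethod receives neither cls nor self, so it can reach class variables only by naming the class explicitly.

A classmethod is used to write factory methods (the design pattern).

A staticmethod is used to write utility functions (they could be defined outside the class, but most are put in the class because they belong there in context).

A factory method delegates the job of creating different objects depending on a condition to something called a factory.

It has two benefits.

-When objects are created in many places and the class constructor changes, finding and fixing every creation site is tedious; with a factory method only the factory method has to be changed.

-Delegating the creation of condition-dependent objects to the factory means the developers who create and use the objects do not need to know the details of each individual class.

That is, factory methods are written when there are many classes and a different class has to be used depending on a condition,

and in Python a classmethod can do that job.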
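
A small sketch of both decorators, using made-up classes (Shape, Triangle, Rectangle): the classmethod acts as a factory that picks the class from a condition and updates a class variable, and the staticmethod is a plain utility.

```python
class Shape:
    created = 0  # class variable shared by every instance

    def __init__(self, sides):
        self.sides = sides

    @classmethod
    def create(cls, sides):
        # Factory: chooses the concrete class from the number of sides.
        cls.created += 1
        subclass = {3: Triangle, 4: Rectangle}.get(sides, cls)
        return subclass(sides)

    @staticmethod
    def interior_angle_sum(sides):
        # Utility: needs neither cls nor self.
        return (sides - 2) * 180

class Triangle(Shape):
    pass

class Rectangle(Shape):
    pass

a = Shape.create(3)
b = Shape.create(4)
c = Shape.create(7)

print(type(a).__name__, type(b).__name__, type(c).__name__)  # Triangle Rectangle Shape
print(Shape.created)                # 3
print(Shape.interior_angle_sum(4))  # 360
```

Callers only use Shape.create, so a change to how the concrete classes are chosen or constructed stays inside the factory.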

References:

class method vs static method in Python - GeeksforGeeks (www.geeksforgeeks.org)


[Algorithm]Find all sampling from nested dictionary of list

2020. 11. 15. 02:11


Problem:

The keys of the given dictionary are strings, and each value is either a list or a dictionary.

Here,

a list has integer elements and a size of at least 1.

a dictionary, likewise, has string keys and values that are dictionaries or lists.

Write a function that keeps the nested structure of the original dictionary and returns, as a list, every dictionary that can be obtained by taking one element from each list.

ex)

dic1 = {'a':[1,2], 'b':[3], 'c':{'e':[4,5], 'd':{'e':[6,7], 'f':[8]}}}

The return value is then a list whose elements are

{'a': 1, 'b': 3, 'c': {'e': 4, 'd': {'e': 6, 'f': 8}}}
{'a': 1, 'b': 3, 'c': {'e': 4, 'd': {'e': 7, 'f': 8}}}
{'a': 1, 'b': 3, 'c': {'e': 5, 'd': {'e': 6, 'f': 8}}}
{'a': 1, 'b': 3, 'c': {'e': 5, 'd': {'e': 7, 'f': 8}}}
{'a': 2, 'b': 3, 'c': {'e': 4, 'd': {'e': 6, 'f': 8}}}
{'a': 2, 'b': 3, 'c': {'e': 4, 'd': {'e': 7, 'f': 8}}}
{'a': 2, 'b': 3, 'c': {'e': 5, 'd': {'e': 6, 'f': 8}}}
{'a': 2, 'b': 3, 'c': {'e': 5, 'd': {'e': 7, 'f': 8}}}

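Solution:

Notes:

-Written because it was needed at work to implement RandomizedSearch.

-Even though it is a fairly simple recursion, it took more time than expected.

-Calmly writing the helper functions that are needed one by one gets it solved.

-Merging dictionaries can use the union operator (|) from Python 3.9 on, but earlier versions have to use the clunkier {**dic1, **dic2}.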
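
One possible way to solve the problem is sketched below. The function name expand and the use of itertools.product are choices made for this illustration, not necessarily how the original solution was written.

```python
from itertools import product

def expand(node):
    """Return every dict obtained by picking one element from each list in node."""
    keys = list(node)
    # For each key, collect its possible values: the list elements themselves,
    # or the recursively expanded variants of a nested dictionary.
    options = [
        node[k] if isinstance(node[k], list) else expand(node[k])
        for k in keys
    ]
    # Every combination of one option per key gives one result dictionary.
    return [dict(zip(keys, combo)) for combo in product(*options)]

dic1 = {'a': [1, 2], 'b': [3], 'c': {'e': [4, 5], 'd': {'e': [6, 7], 'f': [8]}}}
results = expand(dic1)

print(len(results))  # 8
print(results[0])    # {'a': 1, 'b': 3, 'c': {'e': 4, 'd': {'e': 6, 'f': 8}}}
```

Note that nested dictionaries are shared between results in this sketch; copy them (for example with copy.deepcopy) if the results will be mutated independently.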


[Python] Walrus Operator

2020. 11. 14. 12:10


Python 3.8 and later provide a feature called the Walrus Operator.

It is an assignment expression, and in situations like the following it saves one line.

Before)

n = max([1, 2, 3])
if n > 5:
    print("SUCCESS")
else:
    print("FAIL")

After)

if (n := max([1, 2, 3])) > 5:
    print("SUCCESS")
else:
    print("FAIL")

print(n) # 3

That is, when a value has to be assigned and tested in a condition at the same time, the walrus operator reduces the line count.
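
Two other places where assigning and testing at once is handy are loop conditions and comprehension filters. A small sketch with made-up data:

```python
import io

# Read lines until the first empty one; the value is assigned and tested in one expression.
stream = io.StringIO("alpha\nbeta\n\ngamma\n")
lines = []
while (line := stream.readline().strip()):
    lines.append(line)

# Compute len(word) once and reuse it for both the filter and the result.
lengths = [n for word in ["a", "bbbb", "cc"] if (n := len(word)) > 1]

print(lines)    # ['alpha', 'beta']
print(lengths)  # [4, 2]
```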


[Numpy] Why ndarray is faster than the (built-in) list

2020. 11. 2. 23:41


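There are three reasons why it is faster.

1. A numpy.ndarray is a collection of similar data types that are densely packed in memory.

(A list, by contrast, can hold different data types and has to go through a few more steps when computing.)

(See the explanation further down for this part.)

2. numpy sometimes splits one task into subtasks on its own and runs them in parallel.

(For example, when np.dot() is run on large arguments, you can see it using all CPU cores.)

3. numpy is implemented in C, so it runs faster.

(Figure: memory layout of an ndarray and of a list; the light-blue cells are the values actually stored.)

To look up an element, an ndarray goes to "data" and from there can walk straight through all the data.

"Straight through" means the values are stored contiguously in memory.

On top of that, every element has the same dtype, so consecutive elements can be reached quickly with a fixed +N byte offset.

In a list, the values (blue in the figure) are not stored contiguously in memory.

ob_item holds a reference (memory address) to each element.

Following that reference leads to the object itself, and the element's value is only reached inside that object, in ob_digit (if the object is an int).

That is, a list needs more steps (3) to reach an element than an ndarray (1).

Moving to the next element also takes 1 step for an ndarray but 3 for a list, so the access pattern itself is structurally slower.

Conclusion

An ndarray

gives fast access when it is built from elements of a similar dtype.

A list

can hold elements of different dtypes, so arbitrarily large integers are possible (an int element never hits a fixed-width limit such as int32, however large it grows).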

References:

How Fast Numpy Really is and Why? (towardsdatascience.com/how-fast-numpy-really-is-e9111df44347)

CPython data structures | Spyhce blog (spyhce.com/blog/cpython-data-structures)

www.youtube.com/watch?v=fiYD0yCou4k


[Python] On using __slots__

2020. 11. 2. 23:35


Background:

Whenever we create an object of a class, a dictionary is allocated for each object to store that object's attributes. Being a dictionary, it takes up a fair amount of memory, and when many objects are created this memory adds up.

When to use it:

If the attributes of the class are fixed (that is, no attributes will be added dynamically later on),

declare them up front with __slots__. The instances then do not use a dictionary, which saves memory and

also makes attribute access faster.

Usage:

Declare the attributes to be used as a class attribute, in the form __slots__ = ['att1', 'att2'];

objects of this class then have no __dict__.

Pros:

- Faster attribute access on the object

- Memory savings

Cons:

- Attributes cannot be added dynamically
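The behavior above can be checked with two otherwise identical classes. This is a minimal sketch; PointDict and PointSlots are hypothetical names used only for illustration.

```python
class PointDict:
    """Regular class: every instance carries a __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    """Same data, but the attributes are fixed by __slots__."""

    __slots__ = ["x", "y"]

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = PointDict(1, 2)
q = PointSlots(1, 2)

print(hasattr(p, "__dict__"))  # True
print(hasattr(q, "__dict__"))  # False

p.z = 3  # fine: stored in p.__dict__
try:
    q.z = 3  # 'z' is not declared in __slots__
except AttributeError as exc:
    print("AttributeError:", exc)
```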

References:

www.geeksforgeeks.org/python-use-of-__slots__/ (Python | Use of __slots__ - GeeksforGeeks)


[Python] Iterable vs Iterator (feat. Generator): from definitions to contrasts

2020. 11. 2. 00:45


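Iterator and Iterable are easy to confuse, so let's look at what they share and where they differ to understand them properly.

The goal is the precise definitions, not just a list of characteristics.

That is, rather than saying something like "an Iterable is a thing you can run a for-loop over,"

this post starts from the exact definitions and derives the properties from them.

The order is Iterable -> Iterator -> Generator.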

What is an Iterable?

An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements Sequence semantics.

What is an Iterator?

Iterators are required to have both __iter__() method that returns the iterator object itself so every iterator is also iterable and __next__() method.

"Generator" is used to refer to one of the two things below; in this post it refers to the latter.

- generator function

A function which returns a generator iterator. It looks like a normal function except that it contains yield expressions for producing a series of values usable in a for-loop or that can be retrieved one at a time with the next() function.

- generator iterator

An object created by a generator function.

Notes:

A generator iterator, once created, always has both an __iter__() method and a __next__() method.

Therefore a generator is always an iterator.

If all you care about is getting the values in order while iterating, a generator is enough.

However, if you need extra methods, such as one that reports the current state, define an iterator yourself.

So the following containment holds.

generator ⊂ iterator ⊂ iterable
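The containment can be checked with the abstract base classes in collections.abc. A minimal sketch; count_up is a hypothetical generator function used only for illustration.

```python
from collections.abc import Generator, Iterable, Iterator


def count_up(limit):
    """Generator function: calling it returns a generator iterator."""
    for i in range(limit):
        yield i


gen = count_up(3)

# A generator iterator is an iterator, and therefore also an iterable.
print(isinstance(gen, Generator))  # True
print(isinstance(gen, Iterator))   # True
print(isinstance(gen, Iterable))   # True
print(iter(gen) is gen)            # True: __iter__() returns itself

# A list is iterable but not an iterator (it has no __next__()).
nums = [1, 2, 3]
print(isinstance(nums, Iterable))  # True
print(isinstance(nums, Iterator))  # False

collected = list(gen)
print(collected)                   # [0, 1, 2]
```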

One culprit that makes these three concepts confusing is the following check.

from collections.abc import Iterable

isinstance([object], Iterable)

-> This is not a perfect test for the Iterable defined above.

If the object has only a __getitem__() method, it returns False.

So what is the accurate way to tell whether an object is iterable?

If calling iter([object]) does not raise an error, the object is iterable.
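The gap between the two checks shows up with a class that defines only __getitem__(). This is a sketch under that assumption; Squares and is_iterable are hypothetical names.

```python
from collections.abc import Iterable


class Squares:
    """Hypothetical class that defines only __getitem__ (sequence protocol)."""

    def __getitem__(self, index):
        if index >= 4:
            raise IndexError(index)  # signals the end of iteration
        return index * index


def is_iterable(obj):
    """Return True if iter(obj) succeeds, False if it raises TypeError."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


sq = Squares()
print(isinstance(sq, Iterable))          # False: no __iter__() method
print(list(iter(sq)))                    # [0, 1, 4, 9]
print(is_iterable(sq), is_iterable(42))  # True False
```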

When a loop runs over an Iterable object, it works like this:

- iter() is applied to the object first,

- then next() is called repeatedly to return the elements.
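The two steps above can be written out by hand. This is a rough equivalent of what a for loop does, not the interpreter's actual implementation.

```python
items = ["a", "b", "c"]

# Ordinary for loop.
seen_for = []
for item in items:
    seen_for.append(item)

# Roughly what the for loop does behind the scenes.
seen_manual = []
it = iter(items)          # step 1: get an iterator
while True:
    try:
        item = next(it)   # step 2: fetch the next element
    except StopIteration:
        break             # the iterator is exhausted
    seen_manual.append(item)

print(seen_for == seen_manual)  # True
```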

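When should Iterable, Iterator, and Generator each be used?

If you iterate once and do not need to look the values up again afterward, use an iterator/generator.

Among those, if all you need is a one-shot, sequential iterator, use a generator.

If you also need to query something like the current state while iterating, use a (custom) iterator.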
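A sketch of the last two cases side by side: a custom iterator that exposes its state, and a generator that produces the same values. Countdown and countdown_gen are hypothetical names chosen for illustration.

```python
class Countdown:
    """Custom iterator that exposes its current state as an attribute."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown_gen(start):
    """Generator version: same values, but no public state to inspect."""
    while start > 0:
        yield start
        start -= 1


cd = Countdown(3)
first = next(cd)
print(first)       # 3
print(cd.current)  # 2: the state can be queried mid-iteration
rest = list(cd)
print(rest)        # [2, 1]

print(list(countdown_gen(3)))  # [3, 2, 1]
```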

What is the iter() function?

Return an iterator object. The first argument is interpreted very differently depending on the presence of the second argument. Without a second argument, object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0). If it does not support either of those protocols, TypeError is raised. If the second argument, sentinel, is given, then object must be a callable object. The iterator created in this case will call object with no arguments for each call to its __next__() method; if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.

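In other words, the behavior depends on whether a sentinel (= sentry, guard) value is passed as an argument.

Without it, iter() calls the object's __iter__() and returns an iterator (or falls back to __getitem__() if there is no __iter__()).

With it, the first argument must be a callable, and iter() returns an iterator that can be advanced with next() until the sentinel value comes out.

In the latter case the returned object is of type callable_iterator.

Either way, isinstance([object], Iterator) confirms that the result is an iterator.

So it is used in the form iter(iterable) or iter(callable object, sentinel).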
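A short sketch of the two-argument form. The data and the read_next helper are hypothetical; any zero-argument callable would do.

```python
from collections.abc import Iterator

values = [3, 1, 4, 0, 5]
source = iter(values)  # one-argument form: iter(iterable)


def read_next():
    """Hypothetical zero-argument callable that yields one value per call."""
    return next(source)


# Two-argument form: call read_next() until it returns the sentinel 0.
until_zero = iter(read_next, 0)
kind = type(until_zero).__name__
print(kind)                              # callable_iterator
print(isinstance(until_zero, Iterator))  # True

result = list(until_zero)
print(result)                            # [3, 1, 4]
```

The sentinel itself is not included in the output, and the value 5 after it is never read.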

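Be sure to read the last entry in the references. (It presents several ways to build an iterator.)

- An example that defines an iterable with __iter__() and adds __next__() to make it an iterator

- An example that defines an iterable with __getitem__() and creates an iterator from it

- An example that creates an iterator from a callable object (a class with __call__(), or a function) and a sentinel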

References:

docs.python.org/3/glossary.html (Glossary, Python 3.9.0 documentation)

stackoverflow.com/questions/2776829/difference-between-pythons-generators-and-iterators (Difference between Python's Generators and Iterators)

twiserandom.com/python/python-iterable-and-iterator-a-tutorial/#implement_the_getitem_methods (Python iterable and iterator a tutorial | Twise Random)


(Unfinished) [Python] Change your habits to gain speed

2020. 10. 30. 10:50


If a dictionary value is looked up once and is not needed afterward, use the pop method.

(I keep reaching for get or dict[key] only.)
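A small sketch of the difference; the dictionary contents are made up for illustration.

```python
# Hypothetical work queue keyed by job id.
pending = {"job-1": "resize", "job-2": "upload"}

# pop() returns the value and removes the key in one step.
task = pending.pop("job-1")
print(task)     # resize
print(pending)  # {'job-2': 'upload'}

# A default avoids KeyError when the key is absent.
missing = pending.pop("job-9", None)
print(missing)  # None
```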

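If you frequently check whether an element exists in a collection,

use a set rather than a list.

A list lookup is O(n); a set lookup is O(1) on average, because a set is a hash table (CPython implements it with open addressing).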
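A rough way to see the difference is to time a worst-case lookup. This is a sketch, not a benchmark: the sizes and repeat count are arbitrary assumptions, and the timings vary by machine.

```python
import timeit

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)
needle = 99_999  # worst case for the list: the last element

# Both containers give the same answer; only the cost differs.
same_answer = (needle in haystack_list) == (needle in haystack_set)

list_time = timeit.timeit(lambda: needle in haystack_list, number=200)
set_time = timeit.timeit(lambda: needle in haystack_set, number=200)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```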

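If the data does not have to be held in memory all at once, consider using a generator.

This is not only about saving memory; it can also be faster in practice.

For example, when summing the elements of a sequence in order,

a plain large list first spends time being built in memory.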
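A sketch of the memory side of this claim; the size n is an arbitrary assumption. The speed side depends on the workload, so no timing is shown.

```python
import sys

n = 1_000_000

squares_list = [i * i for i in range(n)]  # all values built up front
squares_gen = (i * i for i in range(n))   # values produced on demand

# The generator object stays tiny regardless of n.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
print(total_from_list == total_from_gen)  # True

# A generator is one-shot: a second pass yields nothing.
second_pass = sum(squares_gen)
print(second_pass)  # 0
```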

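If a global variable can be turned into a local one, do so.

This speedup comes from the order in which names are looked up:

when a name is looked up inside a function,

the search goes local -> (enclosing) -> global -> built-in namespace.

If you access an instance attribute (for example self._value) frequently,

likewise copy it into a local variable (when a method of the class accesses it often).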
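Both tips can be applied in one method. A minimal sketch; Scaler and its methods are hypothetical, and the actual gain depends on the Python version and the loop body.

```python
import math


class Scaler:
    """Hypothetical class used only to illustrate the two tips."""

    def __init__(self, factor):
        self._factor = factor

    def scale_slow(self, values):
        out = []
        for v in values:
            # Looks up global 'math', then '.sqrt', then 'self._factor' each time.
            out.append(math.sqrt(v) * self._factor)
        return out

    def scale_fast(self, values):
        sqrt = math.sqrt       # global/attribute lookup done once
        factor = self._factor  # instance attribute copied to a local
        out = []
        for v in values:
            out.append(sqrt(v) * factor)
        return out


s = Scaler(2)
print(s.scale_slow([1, 4, 9]))  # [2.0, 4.0, 6.0]
print(s.scale_fast([1, 4, 9]))  # [2.0, 4.0, 6.0]
```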

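If you call a method (obj.method) often, assign the method to a name and call that.

That is, if you call a list's append() often,

set appender = my_list.append and call appender instead.

Every obj.method call goes through attribute lookup (__getattribute__() or __getattr__()), and this removes that time cost.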
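A minimal sketch of the pattern; collect_slow and collect_fast are hypothetical names, and the size of the gain varies across Python versions.

```python
def collect_slow(n):
    result = []
    for i in range(n):
        result.append(i)  # attribute lookup on every iteration
    return result


def collect_fast(n):
    result = []
    appender = result.append  # bound method looked up once
    for i in range(n):
        appender(i)
    return result


print(collect_slow(5) == collect_fast(5))  # True
```

Note that the method has to be taken from the instance (result.append), so that it is already bound to that list.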

When concatenating many strings, use join instead of repeated + operations.

'a' + 'b' makes one request for memory space and copies a and b into it.

'a' + 'b' + 'c' makes two requests for memory space.

So applying + to n strings makes n-1 requests.

join, in contrast, computes the total memory space needed and requests memory only once.
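Both approaches produce the same string; only the number of intermediate allocations differs. A small sketch with made-up words:

```python
words = ["alpha", "beta", "gamma", "delta"]

# Repeated +: each step may allocate a new intermediate string.
joined_plus = ""
for w in words:
    joined_plus += w

# join: the final size is computed first, then filled in one pass.
joined_join = "".join(words)

print(joined_plus == joined_join)  # True
print(joined_join)                 # alphabetagammadelta
```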

Where to place each condition when there are multiple conditions:

- In "if Condition1 and Condition2", put the one that is False more often in Condition1.

- In "if Condition1 or Condition2", put the one that is True more often in Condition1.

(Short-circuit evaluation: in an AND or OR operation, once the first condition determines the result, the remaining condition is not evaluated at all.)
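Short-circuiting can be observed by recording which functions actually run. The three check functions below are hypothetical stand-ins for a cheap test and an expensive one.

```python
calls = []


def cheap_false():
    calls.append("cheap_false")
    return False


def cheap_true():
    calls.append("cheap_true")
    return True


def expensive():
    calls.append("expensive")
    return True


# and: a False first operand decides the result, so expensive() is skipped.
and_result = cheap_false() and expensive()
and_calls = list(calls)
print(and_calls)  # ['cheap_false']

calls.clear()

# or: a True first operand decides the result, so expensive() is skipped.
or_result = cheap_true() or expensive()
or_calls = list(calls)
print(or_calls)   # ['cheap_true']
```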

Prefer for loops over while loops.

This is because, in a while loop, i

References:

towardsdatascience.com/10-techniques-to-speed-up-python-runtime-95e213e925dc (10 Techniques to Speed Up Python Runtime)

