Posts in the 'CS' category (Page 3)

(Incomplete) [SQL] Collection of basic query examples (2020.11.07)

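Optimizing queries is a separate problem, but

learning the basic query statements is only a matter of repetition and habit.

Still, I collect the key examples of each feature here so they can be seen at a glance.

Simple queries are left without an explanation.

Where there is a caveat, I note it.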

(1) (=)

SELECT population

FROM world
WHERE name = 'France'

(2) (in category)

SELECT name, population

FROM world
WHERE name IN ('Brazil', 'Russia', 'India', 'China');

(Returns the rows whose name is one of the listed candidates.)

(3) (in range)

SELECT name, area

FROM world
WHERE area BETWEEN 250000 AND 300000

(Both bounds are inclusive, so rows equal to either value are returned as well.)

(4) (multiple conditions)

SELECT name,length(name)

FROM world

WHERE length(name)=5 and region='Europe'

(A column used in the WHERE clause does not have to be one of the selected columns.)

(5) (order by)

SELECT select_list

FROM table_name

ORDER BY column1, column2 DESC

(The default sort direction is ASC, so column1 is sorted in ascending order.)

(Rows are sorted by column1 ascending; rows that share the same column1 value are then sorted by column2 descending.)
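The two-level ordering can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs without a server), and the table t and its rows are hypothetical.

```python
import sqlite3

# Hypothetical table and rows, used only to show the ordering.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 TEXT, column2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("b", 1), ("a", 1), ("a", 3), ("b", 2), ("a", 2)],
)

# column1 ascending (the default); ties on column1 are broken by column2 descending.
rows = conn.execute(
    "SELECT column1, column2 FROM t ORDER BY column1, column2 DESC"
).fetchall()
print(rows)  # [('a', 3), ('a', 2), ('a', 1), ('b', 2), ('b', 1)]
```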

(6) (order by, making new column)

SELECT orderNumber, orderlinenumber, quantityOrdered * priceEach

FROM orderdetails

ORDER BY quantityOrdered * priceEach DESC;

(An expression computed from existing columns, here a product, can be used both in the SELECT list and as the ORDER BY key.)

(7) (order by, making new column, using alias)

SELECT orderNumber, orderLineNumber, quantityOrdered * priceEach AS subtotal

FROM orderdetails

ORDER BY subtotal DESC;

(When the computed column is given an alias, the alias can be used in ORDER BY.)

(8) (order by, custom order)

SELECT orderNumber, status

FROM orders

ORDER BY FIELD(status, 'In Process', 'On Hold', 'Cancelled', 'Resolved', 'Disputed', 'Shipped');

(FIELD(status, 'In Process', ...) returns the 1-based position of status in the list ['In Process', ...], or 0 if it is not in the list.)

(So this query returns the rows ordered by status, from 'In Process' through 'Shipped'.)

(9) (like)

SELECT firstName, lastName

FROM employees

WHERE lastName LIKE '%son'

ORDER BY firstName;

(A LIKE condition is true when the value matches the given pattern.)

(The wildcard % matches any string of zero or more characters.)

(The wildcard _ matches any single character.)
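A minimal sketch of the two wildcards, again run through Python's sqlite3 module rather than MySQL (an assumption for portability). The employee rows and the helper last_names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (firstName TEXT, lastName TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Leslie", "Thompson"), ("Mary", "Patterson"), ("Tom", "Sonny"), ("Yu", "Jon")],
)

def last_names(pattern):
    """Return the last names matching a LIKE pattern, sorted alphabetically."""
    cur = conn.execute(
        "SELECT lastName FROM employees WHERE lastName LIKE ? ORDER BY lastName",
        (pattern,),
    )
    return [row[0] for row in cur]

ends_with_son = last_names("%son")  # % = zero or more characters before 'son'
one_char_then_on = last_names("_on")  # _ = exactly one character before 'on'
print(ends_with_son)     # ['Patterson', 'Thompson']
print(one_char_then_on)  # ['Jon']
```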

(10) (IS NULL)

SELECT lastName, firstName, reportsTo

FROM employees

WHERE reportsTo IS NULL;

(In a database, NULL means a missing or unknown value. Note that it is not the same as an empty string or 0.)

(11) (<> or !=)

SELECT lastname, firstname, jobtitle

FROM employees

WHERE jobtitle <> 'Sales Rep';

(12) (> or <)

SELECT lastname, firstname, officeCode

FROM employees

WHERE officecode > 5;

(13) (DISTINCT)

SELECT DISTINCT state

FROM customers;

(Fetches the state column, returning only one row for each group of duplicate values.)

(If NULL appears more than once, only one NULL row is returned.)

(14) (IS NOT NULL)

SELECT state, city

FROM customers

WHERE state IS NOT NULL

ORDER BY state, city;

(15) (DISTINCT multiple columns)

SELECT DISTINCT state, city

FROM customers

WHERE state IS NOT NULL

ORDER BY state, city;

(Here, rows whose state and city values are both equal count as duplicates, and only one of them is returned.)

(16) (GROUP BY)

SELECT state

FROM customers

GROUP BY state;

(This returns the same rows as SELECT DISTINCT state FROM customers;. In other words, DISTINCT can be seen as a special case of GROUP BY.)

(In MySQL versions before 8.0, GROUP BY sorts the result implicitly. From 8.0 on it does not.)

(17) (DISTINCT with an aggregate function)

SELECT COUNT(DISTINCT state)

FROM customers

WHERE country = 'USA';

(Combined with an aggregate function such as COUNT, SUM, or AVG, DISTINCT aggregates over the values with duplicates removed.)

(18) (DISTINCT with LIMIT)

SELECT DISTINCT state

FROM customers

WHERE state IS NOT NULL

LIMIT 5;

(MySQL stops searching as soon as it has found as many unique rows as LIMIT asks for.)

(19) (AND with NULL)

SELECT customername, country, state

FROM customers

WHERE country = 'USA' AND state = 'Victoria';

(NULL AND TRUE evaluates to NULL, so a row whose country is 'USA' but whose state is NULL is not selected.)

(NULL AND FALSE evaluates to FALSE.)
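The NULL rules above can be tried out with a short sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL; both follow the same three-valued logic for these two cases. The customer rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL AND TRUE -> NULL (None in Python), NULL AND FALSE -> FALSE (0).
null_and_true, null_and_false = conn.execute(
    "SELECT NULL AND 1, NULL AND 0"
).fetchone()

conn.execute("CREATE TABLE customers (customername TEXT, country TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("A", "USA", "Victoria"), ("B", "USA", None), ("C", "Australia", "Victoria")],
)

# For customer B the condition is TRUE AND NULL = NULL, so the row is filtered out.
selected = [
    row[0]
    for row in conn.execute(
        "SELECT customername FROM customers "
        "WHERE country = 'USA' AND state = 'Victoria'"
    )
]
print(null_and_true, null_and_false, selected)  # None 0 ['A']
```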

(20) (Evaluation order when OR and AND appear together, operator precedence)

SELECT true OR false AND false;

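(false AND false is evaluated first, giving false; then true OR false gives true, so the query returns 1.)

(21) (Forcing the order when OR and AND appear together)

SELECT (true OR false) AND false;

(With the order forced by parentheses, true OR false gives true and true AND false gives false, so the query returns 0.)

(22) (Always use parentheses when several ORs and ANDs are mixed)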

SELECT customername, country, creditLimit

FROM customers

WHERE country = 'USA' OR country = 'France' AND creditlimit > 10000;

(This query returns the rows matching (country = 'France' AND creditlimit > 10000) OR (country = 'USA').)

(The person who wrote this query probably did not want that result.)

(The condition should have been written as (country = 'USA' OR country = 'France') AND creditlimit > 10000;.)
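To see the difference the parentheses make, here is a small sketch with four made-up customers. It runs on SQLite via Python's sqlite3 module (an assumption; AND binds tighter than OR there as well), and the helper names is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customername TEXT, country TEXT, creditlimit REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("A", "USA", 5000), ("B", "USA", 20000),
     ("C", "France", 5000), ("D", "France", 20000)],
)

def names(where_clause):
    """Run the query with the given WHERE clause and return the customer names."""
    sql = (
        "SELECT customername FROM customers WHERE "
        + where_clause
        + " ORDER BY customername"
    )
    return [row[0] for row in conn.execute(sql)]

# AND is evaluated before OR, so every USA customer passes regardless of credit limit.
without_parens = names(
    "country = 'USA' OR country = 'France' AND creditlimit > 10000"
)
# Parentheses apply the credit limit to both countries.
with_parens = names(
    "(country = 'USA' OR country = 'France') AND creditlimit > 10000"
)
print(without_parens)  # ['A', 'B', 'D']
print(with_parens)     # ['B', 'D']
```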

References:

MySQL WHERE: www.mysqltutorial.org/mysql-where/

(Incomplete) [Python] About the Global Interpreter Lock (2020.11.06)

What is the GIL (Global Interpreter Lock)?

In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe.

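What is a deadlock?

What is a race condition?

A thread can access the shared memory of the process it belongs to,

and several threads may access that shared memory at the same time.

When several threads operate on the same variable at once,

the result depends on how their reads and writes interleave, and an update made by one thread may be lost or not yet visible to another. This is called a race condition.

In short, it refers to several threads accessing and modifying the same value at the same time.

What does thread-safe mean?

Code is thread-safe when it works correctly, without race conditions, while several threads run it at once.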
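As a rough illustration of making a shared update thread-safe, the sketch below guards a counter with threading.Lock, which is a mutex. The counter, the function add_many, and the thread counts are all made up for the example.

```python
import threading

counter = 0
lock = threading.Lock()  # a mutex: only one thread can hold it at a time

def add_many(n):
    """Increment the shared counter n times."""
    global counter
    for _ in range(n):
        # counter += 1 is a read-modify-write; holding the lock keeps two
        # threads from interleaving inside it and losing an update.
        with lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Without the lock the final value is not guaranteed, because the increments of different threads may interleave.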

What is a mutex (mutual exclusion)?

References:

왜 Python에는 GIL이 있는가 (Why does Python have the GIL?): dgkim5360.tistory.com/entry/understanding-the-global-interpreter-lock-of-cpython

Global Interpreter Lock (GIL): timewizhan.tistory.com/entry/Global-Interpreter-Lock-GIL

Deadlock - 나무위키: namu.wiki/w/Deadlock

[Database] What is an RDBMS (Relational DataBase Management System)? (2020.11.03)

Definition of a DBMS:

a software system that enables users to define, create, maintain and control access to the database.

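Characteristics of a DBMS:

Data integrity: for example, a column acting as a primary key needs constraints such as NOT NULL and UNIQUE.

Data independence: the DBMS must keep working even when the size of the database or the underlying storage changes.

Security: data can be accessed only according to each account's access rights.

Minimal data duplication: because several accounts share access to one database, each account does not need its own copy of the data.

Easier application development and modification: a uniform data format keeps writing and maintaining programs consistent.

Improved data safety: backup and restore functions are provided.

A simple example:

회원정보.xlsx (member info; columns: member code, member name, member address, etc.),

구매정보.xlsx (purchase info; columns: member code, member address, product code, etc.).

If a member changes their address, both files have to be edited one by one.

And what if there are many Excel files rather than two?

SQL (Structured Query Language) is the language used to store, retrieve, and manage data through a DBMS.

It is a standardized language, so it is largely portable across DBMSs, although each one has small differences.

It is an interactive language: the client sends a query, the server processes it, and the result is returned to the client.

What is an RDBMS (Relational DBMS)?

Data is organized into one or more tables (= relations) made of rows and columns, and a primary key identifies each row.

A key feature is that the tables are joined when they are used.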
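The spreadsheet example above can be recast as two joined tables. This is only a sketch: it assumes SQLite via Python's sqlite3 module, and the table names, column names, and rows are hypothetical. The address lives in one place, so a single UPDATE is enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE members (
        member_code INTEGER PRIMARY KEY,   -- identifies each row
        name        TEXT NOT NULL,
        address     TEXT
    );
    CREATE TABLE purchases (
        purchase_id  INTEGER PRIMARY KEY,
        member_code  INTEGER NOT NULL REFERENCES members(member_code),
        product_code TEXT
    );
    INSERT INTO members VALUES (1, 'Kim', 'Seoul');
    INSERT INTO purchases VALUES (1, 1, 'P-100'), (2, 1, 'P-200');
    """
)

# The member moves: only the members table has to change.
conn.execute("UPDATE members SET address = 'Busan' WHERE member_code = 1")

# Purchases pick up the current address through the join.
rows = conn.execute(
    """
    SELECT p.product_code, m.address
    FROM purchases AS p
    JOIN members AS m ON m.member_code = p.member_code
    ORDER BY p.purchase_id
    """
).fetchall()
print(rows)  # [('P-100', 'Busan'), ('P-200', 'Busan')]
```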

References:

이것이 MySQL이다 - 교보문고: www.kyobobook.co.kr/product/detailViewKor.laf?ejkGb=KOR&mallGb=KOR&barcode=9791162242780&orderClick=LAG&Kc=

Relational database - Wikipedia: en.wikipedia.org/wiki/Relational_database

[Python] On using __slots__ (2020.11.02)

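Background:

Whenever we create an object of a class, a dictionary is allocated for that object to store its attributes. Being a dictionary, it takes up a fair amount of memory, and when many objects are created this memory adds up.

When to use it:

If the class has a fixed set of attributes (that is, no attributes will be added dynamically later),

restrict the attributes with __slots__ from the start. The objects then do not use a per-instance dictionary, which saves memory and

also makes attribute access faster.

How to use it:

Declare the attributes to be used as a class attribute, in the form __slots__ = ['att1', 'att2'];

objects of this class then have no __dict__ (as long as no base class provides one).

Pros:

-Faster attribute access on objects

-Memory savings

Cons:

-Attributes cannot be added dynamically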
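A minimal sketch of the behavior described above; the class Point and its attributes are made up for illustration.

```python
class Point:
    # Only these two attributes are allowed; no per-instance __dict__ is created.
    __slots__ = ['x', 'y']

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
has_dict = hasattr(p, '__dict__')  # False: attributes live in fixed slots

try:
    p.z = 3  # 'z' is not listed in __slots__
    dynamic_ok = True
except AttributeError:
    dynamic_ok = False

print(has_dict, dynamic_ok)  # False False
```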

References:

Python | Use of __slots__ - GeeksforGeeks: www.geeksforgeeks.org/python-use-of-__slots__/

[Python] Iterable vs. Iterator (feat. Generator): from definitions to comparison (2020.11.02)

Iterator and Iterable are easy to confuse, so let's understand them properly by looking at what they share and where they differ.

Let's look at the precise definitions, not just the characteristics.

That is, rather than describing an Iterable as "something a for-loop can run over" and the like,

this post derives the properties from the exact definitions.

We go in the order Iterable -> Iterator -> Generator.

What is an Iterable?

An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements Sequence semantics.

What is an Iterator?

Iterators are required to have both an __iter__() method that returns the iterator object itself (so every iterator is also iterable) and a __next__() method.

Generator refers to one of the two things below; in this post it means the latter.

-What is a generator function?

A function which returns a generator iterator. It looks like a normal function except that it contains yield expressions for producing a series of values usable in a for-loop or that can be retrieved one at a time with the next() function.

-What is a generator iterator?

An object created by a generator function.

Notes:

A generator iterator, once created, always has an __iter__() method and a __next__() method.

Therefore a generator is always an iterator.

If all you care about is getting successive values while iterating, a generator is enough.

But if you need extra methods, for example one that reports the current state, define an iterator class yourself.

So the following containment holds.

generator ⊂ iterator ⊂ iterable
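The containment can be spot-checked with the abstract base classes in collections.abc. This is a small sketch; the generator function count_up is hypothetical.

```python
from collections.abc import Generator, Iterable, Iterator

def count_up(n):
    """A generator function: calling it returns a generator iterator."""
    for i in range(n):
        yield i

gen = count_up(3)
lst = [0, 1, 2]
list_iter = iter(lst)

# A generator is an iterator, and an iterator is an iterable.
print(isinstance(gen, Generator), isinstance(gen, Iterator), isinstance(gen, Iterable))
# A list is iterable but is not itself an iterator.
print(isinstance(lst, Iterable), isinstance(lst, Iterator))
# A list iterator is an iterator but not a generator.
print(isinstance(list_iter, Iterator), isinstance(list_iter, Generator))
```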

A major source of confusion among these three concepts is

from collections.abc import Iterable

isinstance([object], Iterable)

-> This is not a complete test for the Iterable defined above.

If the object has only a __getitem__() method, it returns False.

So how can we tell reliably that an object is iterable?

If calling iter([object]) raises no error, the object is iterable.
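The gap between the two checks shows up with a class that implements only __getitem__(). The class Squares and the helper is_iterable below are hypothetical, written just for this sketch.

```python
from collections.abc import Iterable

class Squares:
    """Supports only the sequence protocol: integer indexes starting at 0."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)  # signals the end of the sequence
        return index * index

def is_iterable(obj):
    """Return True if iter(obj) succeeds, False if it raises TypeError."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

s = Squares()
abc_says_iterable = isinstance(s, Iterable)  # False: there is no __iter__()
values = list(iter(s))                       # iter() still works via __getitem__()
print(abc_says_iterable, values, is_iterable(s), is_iterable(42))
# False [0, 1, 4] True False
```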

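When a loop runs over an Iterable, it works as follows:

-iter() is called on it first,

-then next() is called repeatedly to return the elements, until StopIteration is raised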
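Written out by hand, the two steps look roughly like this (a sketch of what a for-loop does, with a made-up list):

```python
items = ['a', 'b', 'c']
collected = []

it = iter(items)         # step 1: get an iterator from the iterable
while True:
    try:
        x = next(it)     # step 2: ask for the next element
    except StopIteration:
        break            # the iterator is exhausted, so the loop ends
    collected.append(x)

print(collected)  # ['a', 'b', 'c']
```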

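When should each of Iterable, Iterator, and Generator be used?

If the values do not need to be looked up again after one pass, use an iterator/generator.

In that case, if you are building an iterator for a single sequential pass, use a generator.

If you need to inspect something like the current state while iterating, use a (custom) iterator.

What is the iter() function?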

Return an iterator object. The first argument is interpreted very differently depending on the presence of the second argument. Without a second argument, object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0). If it does not support either of those protocols, TypeError is raised. If the second argument, sentinel, is given, then object must be a callable object. The iterator created in this case will call object with no arguments for each call to its __next__() method; if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.

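That is, the behavior depends on whether a sentinel (= guard, sentry) value is passed as an argument.

Without it, iter() calls the object's __iter__() (or falls back to __getitem__()) and returns an iterator.

With it, the first argument must be a callable, and iter() returns an iterator on which next() can be called until the sentinel value appears.

The latter returns an object of type callable_iterator.

In both cases, isinstance([object], Iterator) confirms that the result is an iterator.

In short, use it as iter(iterable) or iter(callable object, sentinel).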
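A short sketch of the two-argument form; the callable class Countdown is hypothetical.

```python
from collections.abc import Iterator

class Countdown:
    """Callable object: each call returns the next lower number."""

    def __init__(self, start):
        self.current = start

    def __call__(self):
        self.current -= 1
        return self.current

# The callable is invoked with no arguments on every next();
# iteration stops as soon as it returns the sentinel 0.
it = iter(Countdown(4), 0)
type_name = type(it).__name__
is_iterator = isinstance(it, Iterator)
result = list(it)

print(type_name, is_iterator, result)  # callable_iterator True [3, 2, 1]
```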

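Be sure to read the last item in the references. (It shows several ways to build an iterator.)

An example that defines an iterable with __iter__() and adds __next__() to make an iterator

An example that defines an iterable with __getitem__() and builds an iterator that runs over it

An example that creates an iterator from a callable object (a class with __call__(), or a function) and a sentinel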

References:

Glossary — Python 3.9.0 documentation: docs.python.org/3/glossary.html

Difference between Python's Generators and Iterators: stackoverflow.com/questions/2776829/difference-between-pythons-generators-and-iterators

Python iterable and iterator a tutorial | Twise Random: twiserandom.com/python/python-iterable-and-iterator-a-tutorial/#implement_the_getitem_methods

(Incomplete) Types of software performance metrics (2020.11.01)

throughput?

concurrency?...

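To do: organize the things I came across while studying operating systems.

They stay confusing however often I read them, so add one example for each.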

(Incomplete) [Python] Change your habits, speed up your code. (2020.10.30)

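If a dictionary value is looked up once and is not needed afterwards, use the pop method.

(I keep reaching for get or dict[key] only.)

If you frequently check whether an element exists in a collection,

use a set rather than a list.

A membership test is O(n) for a list and O(1) on average for a set (because a set is implemented as a hash table).

If the data does not all need to be held in memory, consider using a generator.

This is not only about saving memory; it can also be faster in practice.

When summing the elements of a sequence one by one, for example,

a large list first has to be built in memory, and that takes time.

Change global variables into local ones where you can.

This speed-up comes from the order in which names are looked up:

when a name is looked up inside a function,

the search goes local -> (enclosing) -> global -> built-in namespace.

If you access an instance attribute (for example self._value) frequently,

likewise copy it into a local variable (when a method of the class accesses it often).

If you call a method (obj.method) often, assign it to a name first.

That is, if you call list.append() often,

set appender = list.append (where list is your list object) and call appender instead.

Each obj.method call goes through attribute lookup (__getattribute__() or __getattr__()), and this removes that repeated cost.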
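The method-binding tip might look like the sketch below. Both functions are made up and return the same list; whether the second one is measurably faster depends on the interpreter version and the workload, so it is worth timing (for example with timeit) before relying on it.

```python
def squares_plain(n):
    result = []
    for i in range(n):
        result.append(i * i)   # looks up .append on every iteration
    return result

def squares_bound(n):
    result = []
    appender = result.append   # look the bound method up once
    for i in range(n):
        appender(i * i)        # plain local-variable call inside the loop
    return result

same = squares_plain(1000) == squares_bound(1000)
print(same)  # True
```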

When concatenating many strings with repeated + operations, use join.

'a' + 'b' makes one request for memory and copies a and b into it.

'a' + 'b' + 'c' makes two memory requests.

So adding n strings with + makes n-1 requests.

join, by contrast, computes the total memory needed and requests it only once.
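Both approaches produce the same string, as this small sketch with made-up data shows; the difference is only in how the result is built.

```python
parts = [str(i) for i in range(5)]

# Repeated +: a new intermediate string may be created at each step.
s_plus = ""
for p in parts:
    s_plus += p

# join: the result is assembled in a single call.
s_join = "".join(parts)

print(s_plus, s_join)  # 01234 01234
```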

With multiple conditions, order them as follows:

-In if Condition1 and Condition2, put the condition that is more often False in Condition1.

-In if Condition1 or Condition2, put the condition that is more often True in Condition1.

(Short-circuit evaluation: in an AND or OR operation, once the first condition determines the result, the remaining conditions are not evaluated at all.)
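Short-circuit evaluation can be observed by recording which conditions actually run. The helper check below is hypothetical.

```python
calls = []

def check(name, value):
    """Record that this condition was evaluated, then return its value."""
    calls.append(name)
    return value

r1 = check("a", False) and check("b", True)   # "b" is never evaluated
after_and = list(calls)

calls.clear()
r2 = check("c", True) or check("d", False)    # "d" is never evaluated
after_or = list(calls)

print(r1, after_and)  # False ['a']
print(r2, after_or)   # True ['c']
```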

Prefer a for loop to a while loop.

This is because, in a while loop, i

References:

10 Techniques to Speed Up Python Runtime: towardsdatascience.com/10-techniques-to-speed-up-python-runtime-95e213e925dc

[Python] Using if else in a list comprehension (2020.10.30)

To use if else in a list comprehension:

res = [x if x > 3 else 2 for x in iter_]

That is, use the if else ternary operator in the part that produces the value, not the optional if condition that comes after the for.
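The two positions of if do different things, as a quick sketch with a made-up input list shows: the ternary form keeps every element and replaces some of them, while the trailing if drops elements.

```python
iter_ = [1, 5, 2, 7]

# Ternary in the value position: same length, small values replaced by 2.
replaced = [x if x > 3 else 2 for x in iter_]

# Trailing if: elements that fail the condition are filtered out.
filtered = [x for x in iter_ if x > 3]

print(replaced)  # [2, 5, 2, 7]
print(filtered)  # [5, 7]
```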

References:

8 Levels of Using List Comprehension in Python: medium.com/techtofreedom/8-levels-of-using-list-comprehension-in-python-efc3c339a1f0

(Incomplete) Faiss, Facebook AI Similarity Search (2020.10.29)

What is Faiss?

Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed by Facebook AI Research.

Characteristics of Faiss

In short, it is a library of efficient algorithms for retrieving, from a database of dense vectors, the vectors most similar to a query vector.

It can run on CPU or GPU,

uses IVF (InVerted File index)

supports L2 distance and dot product (with L2-normalized vectors, the dot product gives cosine similarity)

uses binary vectors and compact quantized vectors (via PCA and Product Quantization)

so the original vectors do not have to be kept, which makes it applicable to databases of billions of vectors with little RAM

Multi-GPU support

Batch processing

Distances such as L2, inner product, L1, and Linf

Can also return all database vectors within a given radius of the query vector

Can store the index (= DB) on disk rather than in RAM

Installing Faiss

pypi.org/project/faiss-gpu/#files

Download the whl file that matches your environment (file.whl) from the page above and

run pip3 install file.whl.

Let's go through the Faiss tutorials and sort out the terms.

Tutorial (Getting started, using IndexFlatL2)

Tutorial (Faster search, using IndexIVFFlat)

What is IndexIVFFlat?

"Index": it builds an index object

"IVF": that uses a quantizer mapping each vector (by analogy, a word) to the Voronoi cell (by analogy, a document) it belongs to, and

"Flat": that stores the raw vectors, without compressing them by product quantization.

Tutorial (Faster search, using PCA)

Tutorial (Faster search, using Product Quantization)

In real use it is run with batching, GPUs, and so on, so see the wiki on GitHub for more tutorials and basics.
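To make the IndexIVFFlat description above concrete, here is a pure-Python toy of the idea. It is not the Faiss API and not how Faiss is implemented: the centroids are fixed by hand (Faiss trains them with k-means), the data is random and hypothetical, and the names l2, nearest, cells, and search are made up. Only the parameter name nprobe mirrors the Faiss setting of the same name.

```python
import random

def l2(a, b):
    """Squared L2 distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, candidates):
    """Index of the candidate closest to point."""
    return min(range(len(candidates)), key=lambda i: l2(point, candidates[i]))

random.seed(0)
# Hypothetical database: 200 two-dimensional vectors in the unit square.
database = [(random.random(), random.random()) for _ in range(200)]

# "IVF": each vector is assigned to the Voronoi cell of its nearest centroid,
# and each cell keeps an inverted list of the vector ids that fall into it.
centroids = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
cells = {i: [] for i in range(len(centroids))}
for vec_id, vec in enumerate(database):
    cells[nearest(vec, centroids)].append(vec_id)

def search(query, nprobe=1):
    """Return the id of the closest vector among the nprobe nearest cells."""
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [vid for c in order[:nprobe] for vid in cells[c]]
    # "Flat": distances are computed against the raw, uncompressed vectors.
    return min(candidates, key=lambda vid: l2(query, database[vid]))

query = (0.3, 0.2)
approx = search(query, nprobe=1)               # scans one cell only
exact = search(query, nprobe=len(centroids))   # scans every cell
print(approx, exact)
```

With nprobe=1 only part of the database is scanned, so the answer can differ from the exhaustive one when the true nearest neighbor sits just across a cell boundary; raising nprobe trades speed for accuracy.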

References:

facebookresearch/faiss: github.com/facebookresearch/faiss

faiss-gpu: pypi.org/project/faiss-gpu/#files

facebookresearch/faiss (Python tutorials): github.com/facebookresearch/faiss/tree/master/tutorial/python

facebookresearch/faiss (wiki, Getting started): github.com/facebookresearch/faiss/wiki/Getting-started

facebookresearch/faiss (wiki, Faster search): github.com/facebookresearch/faiss/wiki/Faster-search

Understanding FAISS : Part 2: medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388

facebookresearch/faiss (wiki, Faiss building blocks): github.com/facebookresearch/faiss/wiki/Faiss-building-blocks:-clustering,-PCA,-quantization

In Python, if you check membership often, use a set or dictionary instead of a list. (2020.10.29)

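A set/dictionary is

a hash table.

Hash collisions are resolved with closed hashing (also called open addressing; the two names mean the same thing, even though one says "closed" and the other "open").

So, to decide whether a given element exists in a set/dictionary,

it is enough to compute the element's hash and check that position, which is O(1) on average.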
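A toy version of the idea, assuming linear probing for simplicity (CPython's real probing sequence is more elaborate, and this sketch has a fixed capacity, no deletion, and cannot store None). The class TinyHashSet is hypothetical.

```python
class TinyHashSet:
    """Toy hash set using open addressing with linear probing."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity  # None marks an empty slot

    def _probe(self, item):
        # Start at hash(item) % capacity and walk forward until the item
        # or an empty slot is found.
        n = len(self.slots)
        i = hash(item) % n
        for _ in range(n):
            if self.slots[i] is None or self.slots[i] == item:
                return i
            i = (i + 1) % n
        raise RuntimeError("table is full")

    def add(self, item):
        self.slots[self._probe(item)] = item

    def __contains__(self, item):
        try:
            return self.slots[self._probe(item)] == item
        except RuntimeError:
            return False  # every slot is occupied by other items

s = TinyHashSet()
for value in (1, 9, 17):   # all three hash to slot 1 when capacity is 8
    s.add(value)

print(s.slots[1:4])      # [1, 9, 17]: collisions spill into the next slots
print(9 in s, 25 in s)   # True False
```

A lookup only inspects the slot given by the hash plus a short run of neighbors, which is why membership stays close to O(1) while the table is not too full.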

References:

How are Python's Built In Dictionaries Implemented?: stackoverflow.com/questions/327311/how-are-pythons-built-in-dictionaries-implemented

[python] 자료구조 - 해시 테이블(Hash Table): jinyes-tistory.tistory.com/10

[python] 자료구조 - 오픈 해싱(Open Hashing): jinyes-tistory.tistory.com/11

[python] 자료구조 - 클로즈 해싱(Close hashing) / Open Addressing: jinyes-tistory.tistory.com/12


나를 잃지 말자

CS
