Apache AsterixDB和SQL++查询语言

  |   0 评论   |   580 浏览

背景

Apache AsterixDB 是一个开源的大数据管理系统 (BDMS),可以在一个集群中大规模存储,索引,管理和查询语义结构的数据。

Hyracks 是 AsterixDB 的底层数据流运行平台。

AsterixDB 拥有丰富的数据类型,扩展了 JSON 数据类型,支持立体的和临时的数据。

了解Apache AsterixDB的更多背景,见分享【清华IT汇】Apache AsterixDB数据库

本文提到的sql++查询语言,是指apache AsterixDB中的sqlpp部分。

目标

  • 大数据:Big Data
  • 半结构化数据:Big Semistructured Data

安装

下载

Apache AsterixDB页面的右下角中下载

或者直接在AsterixDB github上下载

包括两个压缩包:

编译asterixdb

根据apache-asterixdb-0.9.3中README.md中提供的方法:

$cd asterixdb
$mvn clean package -DskipTests

结果如下:

[INFO] Reactor Summary:
[INFO]
[INFO] hyracks-ecosystem-full-stack ....................... SUCCESS [  2.591 s]
[INFO] hyracks ............................................ SUCCESS [  0.335 s]
[INFO] hyracks-util ....................................... SUCCESS [  1.821 s]
[INFO] hyracks-ipc ........................................ SUCCESS [  0.360 s]
[INFO] hyracks-api ........................................ SUCCESS [  2.254 s]
[INFO] hyracks-net ........................................ SUCCESS [  0.710 s]
[INFO] hyracks-comm ....................................... SUCCESS [  0.492 s]
[INFO] hyracks-control .................................... SUCCESS [  0.100 s]
[INFO] hyracks-control-common ............................. SUCCESS [  0.955 s]
[INFO] hyracks-http ....................................... SUCCESS [  0.583 s]
[INFO] hyracks-control-cc ................................. SUCCESS [  1.130 s]
[INFO] hyracks-control-nc ................................. SUCCESS [  0.892 s]
[INFO] hyracks-client ..................................... SUCCESS [  0.611 s]
[INFO] hyracks-data ....................................... SUCCESS [  0.062 s]
[INFO] hyracks-data-std ................................... SUCCESS [  0.715 s]
[INFO] hyracks-dataflow-common ............................ SUCCESS [  0.846 s]
[INFO] hyracks-dataflow-std ............................... SUCCESS [  1.563 s]
[INFO] hyracks-nc-service ................................. SUCCESS [  0.497 s]
[INFO] hyracks-storage-common ............................. SUCCESS [  0.728 s]
[INFO] hyracks-storage-am-common .......................... SUCCESS [  1.084 s]
[INFO] hyracks-storage-am-bloomfilter ..................... SUCCESS [  0.645 s]
[INFO] hyracks-storage-am-btree ........................... SUCCESS [  1.077 s]
[INFO] hyracks-storage-am-lsm-common ...................... SUCCESS [  0.974 s]
[INFO] hyracks-storage-am-lsm-invertedindex ............... SUCCESS [  0.906 s]
[INFO] hyracks-storage-am-lsm-btree ....................... SUCCESS [  0.782 s]
[INFO] hyracks-storage-am-rtree ........................... SUCCESS [  0.738 s]
[INFO] hyracks-storage-am-lsm-rtree ....................... SUCCESS [  0.733 s]
[INFO] hyracks-test-support ............................... SUCCESS [  0.896 s]
[INFO] hyracks-tests ...................................... SUCCESS [  0.077 s]
[INFO] hyracks-storage-common-test ........................ SUCCESS [  0.584 s]
[INFO] hyracks-storage-am-btree-test ...................... SUCCESS [  0.610 s]
[INFO] hyracks-storage-am-rtree-test ...................... SUCCESS [  0.531 s]
[INFO] hyracks-storage-am-lsm-common-test ................. SUCCESS [  0.725 s]
[INFO] hyracks-storage-am-lsm-btree-test .................. SUCCESS [  0.831 s]
[INFO] hyracks-storage-am-lsm-rtree-test .................. SUCCESS [  0.641 s]
[INFO] hyracks-storage-am-lsm-invertedindex-test .......... SUCCESS [  0.567 s]
[INFO] hyracks-storage-am-bloomfilter-test ................ SUCCESS [  0.460 s]
[INFO] hyracks-server ..................................... SUCCESS [  0.996 s]
[INFO] hyracks-examples ................................... SUCCESS [  0.094 s]
[INFO] tpch-example ....................................... SUCCESS [  0.048 s]
[INFO] tpchclient ......................................... SUCCESS [  0.614 s]
[INFO] tpchserver ......................................... SUCCESS [  0.125 s]
[INFO] text-example ....................................... SUCCESS [  0.131 s]
[INFO] texthelper ......................................... SUCCESS [  0.361 s]
[INFO] textclient ......................................... SUCCESS [  0.385 s]
[INFO] textserver ......................................... SUCCESS [  1.062 s]
[INFO] btree-example ...................................... SUCCESS [  0.046 s]
[INFO] btreehelper ........................................ SUCCESS [  0.434 s]
[INFO] btreeclient ........................................ SUCCESS [  0.563 s]
[INFO] btreeserver ........................................ SUCCESS [  0.061 s]
[INFO] hyracks-integration-tests .......................... SUCCESS [  0.970 s]
[INFO] hyracks-shutdown-test .............................. SUCCESS [  0.489 s]
[INFO] hyracks-documentation .............................. SUCCESS [  1.179 s]
[INFO] hyracks-maven-plugins .............................. SUCCESS [  0.444 s]
[INFO] Hyracks VirtualCluster Maven Plugin ................ SUCCESS [  0.859 s]
[INFO] license-automation-plugin .......................... SUCCESS [  2.142 s]
[INFO] hyracks-hdfs ....................................... SUCCESS [  0.056 s]
[INFO] hyracks-hdfs-1.x ................................... SUCCESS [  0.546 s]
[INFO] hyracks-hdfs-2.x ................................... SUCCESS [  1.282 s]
[INFO] hyracks-hdfs-core .................................. SUCCESS [  1.413 s]
[INFO] hyracks-dist ....................................... SUCCESS [  0.165 s]
[INFO] algebricks ......................................... SUCCESS [  0.096 s]
[INFO] algebricks-common .................................. SUCCESS [  0.300 s]
[INFO] algebricks-data .................................... SUCCESS [  0.346 s]
[INFO] algebricks-runtime ................................. SUCCESS [  0.656 s]
[INFO] algebricks-core .................................... SUCCESS [  1.686 s]
[INFO] algebricks-compiler ................................ SUCCESS [  0.550 s]
[INFO] algebricks-rewriter ................................ SUCCESS [  0.814 s]
[INFO] algebricks-tests ................................... SUCCESS [  0.651 s]
[INFO] algebricks-examples ................................ SUCCESS [  0.334 s]
[INFO] piglet-example ..................................... SUCCESS [  1.119 s]
[INFO] hyracks-fullstack-license .......................... SUCCESS [  0.471 s]
[INFO] apache-asterixdb ................................... SUCCESS [  2.042 s]
[INFO] asterix-common ..................................... SUCCESS [  2.441 s]
[INFO] asterix-om ......................................... SUCCESS [  2.227 s]
[INFO] asterix-fuzzyjoin .................................. SUCCESS [  0.474 s]
[INFO] asterix-maven-plugins .............................. SUCCESS [  0.143 s]
[INFO] record-manager-generator-maven-plugin .............. SUCCESS [  0.856 s]
[INFO] asterix-transactions ............................... SUCCESS [  1.193 s]
[INFO] asterix-evaluator-generator-maven-plugin ........... SUCCESS [  1.560 s]
[INFO] asterix-runtime .................................... SUCCESS [  4.970 s]
[INFO] asterix-events ..................................... SUCCESS [  1.083 s]
[INFO] asterix-active ..................................... SUCCESS [  0.550 s]
[INFO] asterix-hivecompat ................................. SUCCESS [  1.344 s]
[INFO] lexer-generator-maven-plugin ....................... SUCCESS [  0.533 s]
[INFO] asterix-external-data .............................. SUCCESS [  3.772 s]
[INFO] asterix-metadata ................................... SUCCESS [  2.772 s]
[INFO] asterix-lang-common ................................ SUCCESS [  1.670 s]
[INFO] asterix-lang-aql ................................... SUCCESS [  2.390 s]
[INFO] asterix-lang-sqlpp ................................. SUCCESS [  3.424 s]
[INFO] asterix-grammar-extension-maven-plugin ............. SUCCESS [  5.768 s]
[INFO] asterix-algebra .................................... SUCCESS [  2.671 s]
[INFO] asterix-test-framework ............................. SUCCESS [  0.172 s]
[INFO] asterix-replication ................................ SUCCESS [  1.275 s]
[INFO] asterix-test-datagenerator-maven-plugin ............ SUCCESS [  0.693 s]
[INFO] asterix-app ........................................ SUCCESS [ 35.161 s]
[INFO] asterix-tools ...................................... SUCCESS [  2.004 s]
[INFO] asterix-examples ................................... SUCCESS [  0.044 s]
[INFO] asterix-client-helper .............................. SUCCESS [  0.939 s]
[INFO] asterix-server ..................................... SUCCESS [ 39.425 s]
[INFO] asterix-doc ........................................ SUCCESS [ 12.805 s]
[INFO] asterix-benchmark .................................. SUCCESS [  0.298 s]
[INFO] asterix-coverage ................................... SUCCESS [  0.032 s]
[INFO] asterix-license .................................... SUCCESS [  0.076 s]
[INFO] asterix-installer .................................. SUCCESS [ 37.749 s]
[INFO] asterix-yarn ....................................... SUCCESS [ 16.580 s]
[INFO] asterix-experiments ................................ SUCCESS [  5.857 s]
[INFO] hyracks-asterix .................................... SUCCESS [  0.007 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 04:07 min
[INFO] Finished at: 2018-05-21T10:51:45+08:00
[INFO] Final Memory: 246M/1745M
[INFO] ------------------------------------------------------------------------

运行asterixdb

cd asterixdb/asterix-server/target/asterix-server-0.9.3-binary-assembly
./opt/local/bin/start-sample-cluster.sh

通过浏览器访问:http://127.0.0.1:19001/

这个脚本中,启动了5个进程,分为3类:

  • NCDriver x 2: Node Controller(缩写为NC)进程,节点控制进程,实际负责查询和数据管理工作。
  • NCService x 2: 配置和启动NCDriver进程,其唯一的作用为在集群启动时,等待NCDriver调用。
  • CCDriver: Cluster Controller(缩写为CC)进程,集群控制进程,负责NCs上的分布式任务,以及启动时NCDriver调用NCService的参数。同时部署了web接口页面,查询编译和优化工作。

具体的启动顺序,请参看ncservice 原文

AsterixDB模型

如果要使用AsterixDB,需要先了解一下AsterixDB的数据模型,即ADM(AsterixDB’s data model).

  • Dataverses (data universe): 类数据库, 管理类型(type), 数据集(dataset)和函数(function)
  • Datatypes(data type): data类型,用户自定义的
  • Datasets(data set): datatype实例的集合

AsterixDB的目标是半结构化数据,分为两类datatype:

  • closed type: 预定义了所有的数据类型
  • open type(默认): 仅预定义了必需的类型,用户可以自己增加额外的类型

如果closed type的场景,则不要使用open type,这样存储时可以不存储额外的数据。

还是从例子讲起,引起官方的例子:

我们有两个社交网络Gleambook和Chirp的数据,那怎么来导入AsterixDB中呢?

建立TinySocial库

从头开始,我们先建立一个TinySocial的Dataverses。

    DROP DATAVERSE TinySocial IF EXISTS;
    CREATE DATAVERSE TinySocial;
    USE TinySocial;

建立数据类型

其次,来抽象出Gleambook和Chirp中的数据类型。

Chrip社交网络中,有用户和消息两种类型。

我们先定义用户类型,再定义消息类型。

    CREATE TYPE ChirpUserType AS {
        screenName: string,
        lang: string,
        friendsCount: int,
        statusesCount: int,
        name: string,
        followersCount: int
    };

    CREATE TYPE ChirpMessageType AS closed {
        chirpId: string,
        user: ChirpUserType,
        senderLocation: point?,
        sendTime: datetime,
        referredTopics: {{ string }},
        messageText: string
    };

在消息类型中,引入了用户类型来标识发消息的用户。即实现了ADM中的嵌套类型。

  • senderLocation字段是可选的,则字段定义末尾的?可以反应出。可选字段类似于SQL中的null类型。
  • referredTopics字段值是列表

接下来,继续定义Gleambook的部门类型, 用户类型和消息类型。

  • EmploymentType部门类型:记录了用户的部门历史
  • GleambookUserType用户类型:用户自身信息。其中的employment字段按顺序记录了用户的部门历史。
    CREATE TYPE EmploymentType AS {
        organizationName: string,
        startDate: date,
        endDate: date?
    };

    CREATE TYPE GleambookUserType AS {
        id: int,
        alias: string,
        name: string,
        userSince: datetime,
        friendIds: {{ int }},
        employment: [EmploymentType]
    };

    CREATE TYPE GleambookMessageType AS {
        messageId: int,
        authorId: int,
        inResponseTo: int?,
        senderLocation: point?,
        message: string
    };

AsterixDB会把预定义的字段保存在meta信息中,把其它字段保存在每个类型实例中。即其它字段会占用更多的空间,同时操作起来更慢一些。

创建数据集和索引

数据类型定义好了,接下来就创建数据集(dataset)和索引(index)吧。

话不多说,直接使用下面的SQL++语句来建表:

    USE TinySocial;

    CREATE DATASET GleambookUsers(GleambookUserType)
        PRIMARY KEY id;

    CREATE DATASET GleambookMessages(GleambookMessageType)
        PRIMARY KEY messageId;

    CREATE DATASET ChirpUsers(ChirpUserType)
        PRIMARY KEY screenName;

    CREATE DATASET ChirpMessages(ChirpMessageType)
        PRIMARY KEY chirpId
        hints(cardinality=100);

上面建立了四个数据集:

  • GleambookUsers
  • GleambookMessages
  • ChirpUsers
  • ChirpMessages: 每个id约有100条消息。指定了会更高效;如果不指定的话也可以使用,默认可以支持到百万级。

数据集中的数据,以主键做为B+树的形式存储。同时主键也是分区键(partition key, shard key)。

接下来创建4个索引:

    CREATE INDEX gbUserSinceIdx on GleambookUsers(userSince);
    CREATE INDEX gbAuthorIdx on GleambookMessages(authorId) TYPE btree;
    CREATE INDEX gbSenderLocIndex on GleambookMessages(senderLocation) TYPE rtree;
    CREATE INDEX gbMessageIdx on GleambookMessages(message) TYPE keyword;

数据集中的索引,默认为B+树索引,也支持rtree, keyword索引(即倒排索引)。

AsterixDB也支持复合索引,以及ngram(k)索引。

索引元数据

如果查看建好的表的元数据呢:

    SELECT VALUE ds FROM Metadata.`Dataset` ds;
    SELECT VALUE ix FROM Metadata.`Index` ix;

结果典型行分别如下:

{
	"DataverseName": "TinySocial",
	"DatasetName": "ChirpMessages",
	"DatatypeDataverseName": "TinySocial",
	"DatatypeName": "ChirpMessageType",
	"DatasetType": "INTERNAL",
	"GroupName": "TinySocial.ChirpMessages",
	"CompactionPolicy": "prefix",
	"CompactionPolicyProperties": [{
		"Name": "max-mergable-component-size",
		"Value": "1073741824"
	}, {
		"Name": "max-tolerance-component-count",
		"Value": "5"
	}],
	"InternalDetails": {
		"FileStructure": "BTREE",
		"PartitioningStrategy": "HASH",
		"PartitioningKey": [
			["chirpId"]
		],
		"PrimaryKey": [
			["chirpId"]
		],
		"Autogenerated": false
	},
	"Hints": [{
		"Name": "CARDINALITY",
		"Value": "100"
	}],
	"Timestamp": "Mon May 21 16:14:30 CST 2018",
	"DatasetId": 108,
	"PendingOp": 0
}

{
	"DataverseName": "TinySocial",
	"DatasetName": "ChirpMessages",
	"IndexName": "ChirpMessages",
	"IndexStructure": "BTREE",
	"SearchKey": [
		["chirpId"]
	],
	"IsPrimary": true,
	"Timestamp": "Mon May 21 16:14:30 CST 2018",
	"PendingOp": 0
}

可见一些SQL++的特点:

  • 可以通过库名.表名的方式来直接使用表名,而不用先use 库名
  • 关键字,可以通过反向引号(```)来引起来
  • SELECT语句的值,可以返回一个单独的值或者是对象

导入数据

看到这埯,已经迫不急待的想导入数据了吧?

导入Chirp Users数据

    USE TinySocial;

    INSERT INTO ChirpUsers
    ([
    {"screenName":"NathanGiesen@211","lang":"en","friendsCount":18,"statusesCount":473,"name":"Nathan Giesen","followersCount":49416},
    {"screenName":"ColineGeyer@63","lang":"en","friendsCount":121,"statusesCount":362,"name":"Coline Geyer","followersCount":17159},
    {"screenName":"NilaMilliron_tw","lang":"en","friendsCount":445,"statusesCount":164,"name":"Nila Milliron","followersCount":22649},
    {"screenName":"ChangEwing_573","lang":"en","friendsCount":182,"statusesCount":394,"name":"Chang Ewing","followersCount":32136}
    ]);

导入Chirp Messages数据

    USE TinySocial;

    INSERT INTO ChirpMessages
    ([
    {"chirpId":"1","user":{"screenName":"NathanGiesen@211","lang":"en","friendsCount":39339,"statusesCount":473,"name":"Nathan Giesen","followersCount":49416},"senderLocation":point("47.44,80.65"),"sendTime":datetime("2008-04-26T10:10:00"),"referredTopics":{{"product-z","customization"}},"messageText":" love product-z its customization is good:)"},
    {"chirpId":"2","user":{"screenName":"ColineGeyer@63","lang":"en","friendsCount":121,"statusesCount":362,"name":"Coline Geyer","followersCount":17159},"senderLocation":point("32.84,67.14"),"sendTime":datetime("2010-05-13T10:10:00"),"referredTopics":{{"ccast","shortcut-menu"}},"messageText":" like ccast its shortcut-menu is awesome:)"},
    {"chirpId":"3","user":{"screenName":"NathanGiesen@211","lang":"en","friendsCount":39339,"statusesCount":473,"name":"Nathan Giesen","followersCount":49416},"senderLocation":point("29.72,75.8"),"sendTime":datetime("2006-11-04T10:10:00"),"referredTopics":{{"product-w","speed"}},"messageText":" like product-w the speed is good:)"},
    {"chirpId":"4","user":{"screenName":"NathanGiesen@211","lang":"en","friendsCount":39339,"statusesCount":473,"name":"Nathan Giesen","followersCount":49416},"senderLocation":point("39.28,70.48"),"sendTime":datetime("2011-12-26T10:10:00"),"referredTopics":{{"product-b","voice-command"}},"messageText":" like product-b the voice-command is mind-blowing:)"},
    {"chirpId":"5","user":{"screenName":"NathanGiesen@211","lang":"en","friendsCount":39339,"statusesCount":473,"name":"Nathan Giesen","followersCount":49416},"senderLocation":point("40.09,92.69"),"sendTime":datetime("2006-08-04T10:10:00"),"referredTopics":{{"product-w","speed"}},"messageText":" can't stand product-w its speed is terrible:("},
    {"chirpId":"6","user":{"screenName":"ColineGeyer@63","lang":"en","friendsCount":121,"statusesCount":362,"name":"Coline Geyer","followersCount":17159},"senderLocation":point("47.51,83.99"),"sendTime":datetime("2010-05-07T10:10:00"),"referredTopics":{{"x-phone","voice-clarity"}},"messageText":" like x-phone the voice-clarity is good:)"},
    {"chirpId":"7","user":{"screenName":"ChangEwing_573","lang":"en","friendsCount":182,"statusesCount":394,"name":"Chang Ewing","followersCount":32136},"senderLocation":point("36.21,72.6"),"sendTime":datetime("2011-08-25T10:10:00"),"referredTopics":{{"product-y","platform"}},"messageText":" like product-y the platform is good"},
    {"chirpId":"8","user":{"screenName":"NathanGiesen@211","lang":"en","friendsCount":39339,"statusesCount":473,"name":"Nathan Giesen","followersCount":49416},"senderLocation":point("46.05,93.34"),"sendTime":datetime("2005-10-14T10:10:00"),"referredTopics":{{"product-z","shortcut-menu"}},"messageText":" like product-z the shortcut-menu is awesome:)"},
    {"chirpId":"9","user":{"screenName":"NathanGiesen@211","lang":"en","friendsCount":39339,"statusesCount":473,"name":"Nathan Giesen","followersCount":49416},"senderLocation":point("36.86,74.62"),"sendTime":datetime("2012-07-21T10:10:00"),"referredTopics":{{"ccast","voicemail-service"}},"messageText":" love ccast its voicemail-service is awesome"},
    {"chirpId":"10","user":{"screenName":"ColineGeyer@63","lang":"en","friendsCount":121,"statusesCount":362,"name":"Coline Geyer","followersCount":17159},"senderLocation":point("29.15,76.53"),"sendTime":datetime("2008-01-26T10:10:00"),"referredTopics":{{"ccast","voice-clarity"}},"messageText":" hate ccast its voice-clarity is OMG:("},
    {"chirpId":"11","user":{"screenName":"NilaMilliron_tw","lang":"en","friendsCount":445,"statusesCount":164,"name":"Nila Milliron","followersCount":22649},"senderLocation":point("37.59,68.42"),"sendTime":datetime("2008-03-09T10:10:00"),"referredTopics":{{"x-phone","platform"}},"messageText":" can't stand x-phone its platform is terrible"},
    {"chirpId":"12","user":{"screenName":"OliJackson_512","lang":"en","friendsCount":445,"statusesCount":164,"name":"Oli Jackson","followersCount":22649},"senderLocation":point("24.82,94.63"),"sendTime":datetime("2010-02-13T10:10:00"),"referredTopics":{{"product-y","voice-command"}},"messageText":" like product-y the voice-command is amazing:)"}
    ]);

导入Gleambook Users数据

    USE TinySocial;

    INSERT INTO GleambookUsers
    ([
    {"id":1,"alias":"Margarita","name":"MargaritaStoddard","nickname":"Mags","userSince":datetime("2012-08-20T10:10:00"),"friendIds":{{2,3,6,10}},"employment":[{"organizationName":"Codetechno","startDate":date("2006-08-06")},{"organizationName":"geomedia","startDate":date("2010-06-17"),"endDate":date("2010-01-26")}],"gender":"F"},
    {"id":2,"alias":"Isbel","name":"IsbelDull","nickname":"Izzy","userSince":datetime("2011-01-22T10:10:00"),"friendIds":{{1,4}},"employment":[{"organizationName":"Hexviafind","startDate":date("2010-04-27")}]},
    {"id":3,"alias":"Emory","name":"EmoryUnk","userSince":datetime("2012-07-10T10:10:00"),"friendIds":{{1,5,8,9}},"employment":[{"organizationName":"geomedia","startDate":date("2010-06-17"),"endDate":date("2010-01-26")}]},
    {"id":4,"alias":"Nicholas","name":"NicholasStroh","userSince":datetime("2010-12-27T10:10:00"),"friendIds":{{2}},"employment":[{"organizationName":"Zamcorporation","startDate":date("2010-06-08")}]},
    {"id":5,"alias":"Von","name":"VonKemble","userSince":datetime("2010-01-05T10:10:00"),"friendIds":{{3,6,10}},"employment":[{"organizationName":"Kongreen","startDate":date("2010-11-27")}]},
    {"id":6,"alias":"Willis","name":"WillisWynne","userSince":datetime("2005-01-17T10:10:00"),"friendIds":{{1,3,7}},"employment":[{"organizationName":"jaydax","startDate":date("2009-05-15")}]},
    {"id":7,"alias":"Suzanna","name":"SuzannaTillson","userSince":datetime("2012-08-07T10:10:00"),"friendIds":{{6}},"employment":[{"organizationName":"Labzatron","startDate":date("2011-04-19")}]},
    {"id":8,"alias":"Nila","name":"NilaMilliron","userSince":datetime("2008-01-01T10:10:00"),"friendIds":{{3}},"employment":[{"organizationName":"Plexlane","startDate":date("2010-02-28")}]},
    {"id":9,"alias":"Woodrow","name":"WoodrowNehling","nickname":"Woody","userSince":datetime("2005-09-20T10:10:00"),"friendIds":{{3,10}},"employment":[{"organizationName":"Zuncan","startDate":date("2003-04-22"),"endDate":date("2009-12-13")}]},
    {"id":10,"alias":"Bram","name":"BramHatch","userSince":datetime("2010-10-16T10:10:00"),"friendIds":{{1,5,9}},"employment":[{"organizationName":"physcane","startDate":date("2007-06-05"),"endDate":date("2011-11-05")}]}
    ]);

导入Gleambook Messages数据

    USE TinySocial;

    INSERT INTO GleambookMessages
    ([
    {"messageId":1,"authorId":3,"inResponseTo":2,"senderLocation":point("47.16,77.75"),"message":" love product-b its shortcut-menu is awesome:)"},
    {"messageId":2,"authorId":1,"inResponseTo":4,"senderLocation":point("41.66,80.87"),"message":" dislike x-phone its touch-screen is horrible"},
    {"messageId":3,"authorId":2,"inResponseTo":4,"senderLocation":point("48.09,81.01"),"message":" like product-y the plan is amazing"},
    {"messageId":4,"authorId":1,"inResponseTo":2,"senderLocation":point("37.73,97.04"),"message":" can't stand acast the network is horrible:("},
    {"messageId":5,"authorId":6,"inResponseTo":2,"senderLocation":point("34.7,90.76"),"message":" love product-b the customization is mind-blowing"},
    {"messageId":6,"authorId":2,"inResponseTo":1,"senderLocation":point("31.5,75.56"),"message":" like product-z its platform is mind-blowing"},
    {"messageId":7,"authorId":5,"inResponseTo":15,"senderLocation":point("32.91,85.05"),"message":" dislike product-b the speed is horrible"},
    {"messageId":8,"authorId":1,"inResponseTo":11,"senderLocation":point("40.33,80.87"),"message":" like ccast the 3G is awesome:)"},
    {"messageId":9,"authorId":3,"inResponseTo":12,"senderLocation":point("34.45,96.48"),"message":" love ccast its wireless is good"},
    {"messageId":10,"authorId":1,"inResponseTo":12,"senderLocation":point("42.5,70.01"),"message":" can't stand product-w the touch-screen is terrible"},
    {"messageId":11,"authorId":1,"inResponseTo":1,"senderLocation":point("38.97,77.49"),"message":" can't stand acast its plan is terrible"},
    {"messageId":12,"authorId":10,"inResponseTo":6,"senderLocation":point("42.26,77.76"),"message":" can't stand product-z its voicemail-service is OMG:("},
    {"messageId":13,"authorId":10,"inResponseTo":4,"senderLocation":point("42.77,78.92"),"message":" dislike x-phone the voice-command is bad:("},
    {"messageId":14,"authorId":9,"inResponseTo":12,"senderLocation":point("41.33,85.28"),"message":" love acast its 3G is good:)"},
    {"messageId":15,"authorId":7,"inResponseTo":11,"senderLocation":point("44.47,67.11"),"message":" like x-phone the voicemail-service is awesome"}
    ]);

总共4部分数据示例。。

查询数据

终于到了激动人心的查询数据的时刻了。

AsterixDB支持两种查询语法,原生的AQL (the Asterix Query Language)语法和SQL++语法。

  • AQL: 受XQuery启示
  • SQL++: 受SQL启示,为schema-less(schema-optional)和NOSQL系统设计

两种查询语法功能是相同的,但是SQL++语法更适合熟悉SQL的用户使用。

主键查询

查询id=8的用户

    SELECT VALUE user
    FROM TinySocial.GleambookUsers user
    WHERE user.id = 8;

结果

{
	"id": 8,
	"alias": "Nila",
	"name": "NilaMilliron",
	"userSince": "2008-01-01T10:10:00.000Z",
	"friendIds": [3],
	"employment": [{
		"organizationName": "Plexlane",
		"startDate": "2010-02-28"
	}]
}

范围查询

数字范围查询

查询id在[2,4]的用户

    SELECT VALUE user
    FROM TinySocial.GleambookUsers user
    WHERE user.id >= 2 AND user.id <= 4;

结果

{ "id": 2, "alias": "Isbel", "name": "IsbelDull", "userSince": "2011-01-22T10:10:00.000Z", "friendIds": [ 1, 4 ], "employment": [ { "organizationName": "Hexviafind", "startDate": "2010-04-27" } ], "nickname": "Izzy" }
{ "id": 3, "alias": "Emory", "name": "EmoryUnk", "userSince": "2012-07-10T10:10:00.000Z", "friendIds": [ 1, 5, 8, 9 ], "employment": [ { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ] }
{ "id": 4, "alias": "Nicholas", "name": "NicholasStroh", "userSince": "2010-12-27T10:10:00.000Z", "friendIds": [ 2 ], "employment": [ { "organizationName": "Zamcorporation", "startDate": "2010-06-08" } ] }

日期范围查询

只要数据类型支持比较运算,就可以进行范围查询。

如查询在[2010-07-22T00:00:00, 2012-07-29T23:59:59]加入的用户

    SELECT VALUE user
    FROM TinySocial.GleambookUsers user
    WHERE user.userSince >= datetime('2010-07-22T00:00:00')
      AND user.userSince <= datetime('2012-07-29T23:59:59');

结果如下

{ "id": 10, "alias": "Bram", "name": "BramHatch", "userSince": "2010-10-16T10:10:00.000Z", "friendIds": [ 1, 5, 9 ], "employment": [ { "organizationName": "physcane", "startDate": "2007-06-05", "endDate": "2011-11-05" } ] }
{ "id": 2, "alias": "Isbel", "name": "IsbelDull", "userSince": "2011-01-22T10:10:00.000Z", "friendIds": [ 1, 4 ], "employment": [ { "organizationName": "Hexviafind", "startDate": "2010-04-27" } ], "nickname": "Izzy" }
{ "id": 3, "alias": "Emory", "name": "EmoryUnk", "userSince": "2012-07-10T10:10:00.000Z", "friendIds": [ 1, 5, 8, 9 ], "employment": [ { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ] }
{ "id": 4, "alias": "Nicholas", "name": "NicholasStroh", "userSince": "2010-12-27T10:10:00.000Z", "friendIds": [ 2 ], "employment": [ { "organizationName": "Zamcorporation", "startDate": "2010-06-08" } ] }

join操作

Equijoin(直接Join)

SQL++支持projection和join操作,如:

    SELECT user.name AS uname, msg.message AS message
    FROM TinySocial.GleambookUsers user, TinySocial.GleambookMessages msg
    WHERE msg.authorId = user.id;

结果如下

{ "uname": "WillisWynne", "message": " love product-b the customization is mind-blowing" }
{ "uname": "WoodrowNehling", "message": " love acast its 3G is good:)" }
{ "uname": "BramHatch", "message": " can't stand product-z its voicemail-service is OMG:(" }
{ "uname": "BramHatch", "message": " dislike x-phone the voice-command is bad:(" }
{ "uname": "MargaritaStoddard", "message": " like ccast the 3G is awesome:)" }
{ "uname": "MargaritaStoddard", "message": " can't stand product-w the touch-screen is terrible" }
{ "uname": "MargaritaStoddard", "message": " can't stand acast its plan is terrible" }
{ "uname": "MargaritaStoddard", "message": " dislike x-phone its touch-screen is horrible" }
{ "uname": "MargaritaStoddard", "message": " can't stand acast the network is horrible:(" }
{ "uname": "IsbelDull", "message": " like product-z its platform is mind-blowing" }
{ "uname": "IsbelDull", "message": " like product-y the plan is amazing" }
{ "uname": "EmoryUnk", "message": " love ccast its wireless is good" }
{ "uname": "EmoryUnk", "message": " love product-b its shortcut-menu is awesome:)" }
{ "uname": "VonKemble", "message": " dislike product-b the speed is horrible" }
{ "uname": "SuzannaTillson", "message": " like x-phone the voicemail-service is awesome" }

如果直接使用了*的话,结果为:

    SELECT *
    FROM GleambookUsers user, GleambookMessages msg
    WHERE msg.authorId = user.id;

结果

{ "user": { "id": 6, "alias": "Willis", "name": "WillisWynne", "userSince": "2005-01-17T10:10:00.000Z", "friendIds": [ 1, 3, 7 ], "employment": [ { "organizationName": "jaydax", "startDate": "2009-05-15" } ] }, "msg": { "messageId": 5, "authorId": 6, "inResponseTo": 2, "senderLocation": [34.7, 90.76], "message": " love product-b the customization is mind-blowing" } }
{ "user": { "id": 9, "alias": "Woodrow", "name": "WoodrowNehling", "userSince": "2005-09-20T10:10:00.000Z", "friendIds": [ 3, 10 ], "employment": [ { "organizationName": "Zuncan", "startDate": "2003-04-22", "endDate": "2009-12-13" } ], "nickname": "Woody" }, "msg": { "messageId": 14, "authorId": 9, "inResponseTo": 12, "senderLocation": [41.33, 85.28], "message": " love acast its 3G is good:)" } }
{ "user": { "id": 10, "alias": "Bram", "name": "BramHatch", "userSince": "2010-10-16T10:10:00.000Z", "friendIds": [ 1, 5, 9 ], "employment": [ { "organizationName": "physcane", "startDate": "2007-06-05", "endDate": "2011-11-05" } ] }, "msg": { "messageId": 12, "authorId": 10, "inResponseTo": 6, "senderLocation": [42.26, 77.76], "message": " can't stand product-z its voicemail-service is OMG:(" } }
{ "user": { "id": 10, "alias": "Bram", "name": "BramHatch", "userSince": "2010-10-16T10:10:00.000Z", "friendIds": [ 1, 5, 9 ], "employment": [ { "organizationName": "physcane", "startDate": "2007-06-05", "endDate": "2011-11-05" } ] }, "msg": { "messageId": 13, "authorId": 10, "inResponseTo": 4, "senderLocation": [42.77, 78.92], "message": " dislike x-phone the voice-command is bad:(" } }
{ "user": { "id": 1, "alias": "Margarita", "name": "MargaritaStoddard", "userSince": "2012-08-20T10:10:00.000Z", "friendIds": [ 2, 3, 6, 10 ], "employment": [ { "organizationName": "Codetechno", "startDate": "2006-08-06" }, { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ], "nickname": "Mags", "gender": "F" }, "msg": { "messageId": 8, "authorId": 1, "inResponseTo": 11, "senderLocation": [40.33, 80.87], "message": " like ccast the 3G is awesome:)" } }
{ "user": { "id": 1, "alias": "Margarita", "name": "MargaritaStoddard", "userSince": "2012-08-20T10:10:00.000Z", "friendIds": [ 2, 3, 6, 10 ], "employment": [ { "organizationName": "Codetechno", "startDate": "2006-08-06" }, { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ], "nickname": "Mags", "gender": "F" }, "msg": { "messageId": 10, "authorId": 1, "inResponseTo": 12, "senderLocation": [42.5, 70.01], "message": " can't stand product-w the touch-screen is terrible" } }
{ "user": { "id": 1, "alias": "Margarita", "name": "MargaritaStoddard", "userSince": "2012-08-20T10:10:00.000Z", "friendIds": [ 2, 3, 6, 10 ], "employment": [ { "organizationName": "Codetechno", "startDate": "2006-08-06" }, { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ], "nickname": "Mags", "gender": "F" }, "msg": { "messageId": 11, "authorId": 1, "inResponseTo": 1, "senderLocation": [38.97, 77.49], "message": " can't stand acast its plan is terrible" } }
{ "user": { "id": 1, "alias": "Margarita", "name": "MargaritaStoddard", "userSince": "2012-08-20T10:10:00.000Z", "friendIds": [ 2, 3, 6, 10 ], "employment": [ { "organizationName": "Codetechno", "startDate": "2006-08-06" }, { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ], "nickname": "Mags", "gender": "F" }, "msg": { "messageId": 2, "authorId": 1, "inResponseTo": 4, "senderLocation": [41.66, 80.87], "message": " dislike x-phone its touch-screen is horrible" } }
{ "user": { "id": 1, "alias": "Margarita", "name": "MargaritaStoddard", "userSince": "2012-08-20T10:10:00.000Z", "friendIds": [ 2, 3, 6, 10 ], "employment": [ { "organizationName": "Codetechno", "startDate": "2006-08-06" }, { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ], "nickname": "Mags", "gender": "F" }, "msg": { "messageId": 4, "authorId": 1, "inResponseTo": 2, "senderLocation": [37.73, 97.04], "message": " can't stand acast the network is horrible:(" } }
{ "user": { "id": 2, "alias": "Isbel", "name": "IsbelDull", "userSince": "2011-01-22T10:10:00.000Z", "friendIds": [ 1, 4 ], "employment": [ { "organizationName": "Hexviafind", "startDate": "2010-04-27" } ], "nickname": "Izzy" }, "msg": { "messageId": 6, "authorId": 2, "inResponseTo": 1, "senderLocation": [31.5, 75.56], "message": " like product-z its platform is mind-blowing" } }
{ "user": { "id": 2, "alias": "Isbel", "name": "IsbelDull", "userSince": "2011-01-22T10:10:00.000Z", "friendIds": [ 1, 4 ], "employment": [ { "organizationName": "Hexviafind", "startDate": "2010-04-27" } ], "nickname": "Izzy" }, "msg": { "messageId": 3, "authorId": 2, "inResponseTo": 4, "senderLocation": [48.09, 81.01], "message": " like product-y the plan is amazing" } }
{ "user": { "id": 3, "alias": "Emory", "name": "EmoryUnk", "userSince": "2012-07-10T10:10:00.000Z", "friendIds": [ 1, 5, 8, 9 ], "employment": [ { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ] }, "msg": { "messageId": 9, "authorId": 3, "inResponseTo": 12, "senderLocation": [34.45, 96.48], "message": " love ccast its wireless is good" } }
{ "user": { "id": 3, "alias": "Emory", "name": "EmoryUnk", "userSince": "2012-07-10T10:10:00.000Z", "friendIds": [ 1, 5, 8, 9 ], "employment": [ { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ] }, "msg": { "messageId": 1, "authorId": 3, "inResponseTo": 2, "senderLocation": [47.16, 77.75], "message": " love product-b its shortcut-menu is awesome:)" } }
{ "user": { "id": 5, "alias": "Von", "name": "VonKemble", "userSince": "2010-01-05T10:10:00.000Z", "friendIds": [ 3, 6, 10 ], "employment": [ { "organizationName": "Kongreen", "startDate": "2010-11-27" } ] }, "msg": { "messageId": 7, "authorId": 5, "inResponseTo": 15, "senderLocation": [32.91, 85.05], "message": " dislike product-b the speed is horrible" } }
{ "user": { "id": 7, "alias": "Suzanna", "name": "SuzannaTillson", "userSince": "2012-08-07T10:10:00.000Z", "friendIds": [ 6 ], "employment": [ { "organizationName": "Labzatron", "startDate": "2011-04-19" } ] }, "msg": { "messageId": 15, "authorId": 7, "inResponseTo": 11, "senderLocation": [44.47, 67.11], "message": " like x-phone the voicemail-service is awesome" } }

ok,还有第三种查询形式,也可以看看

    SELECT VALUE {"user": user, "message": msg}
    FROM TinySocial.GleambookUsers user, TinySocial.GleambookMessages msg
    WHERE msg.authorId = user.id;

结果

{ "user": { "id": 6, "alias": "Willis", "name": "WillisWynne", "userSince": "2005-01-17T10:10:00.000Z", "friendIds": [ 1, 3, 7 ], "employment": [ { "organizationName": "jaydax", "startDate": "2009-05-15" } ] }, "message": { "messageId": 5, "authorId": 6, "inResponseTo": 2, "senderLocation": [34.7, 90.76], "message": " love product-b the customization is mind-blowing" } }
{ "user": { "id": 9, "alias": "Woodrow", "name": "WoodrowNehling", "userSince": "2005-09-20T10:10:00.000Z", "friendIds": [ 3, 10 ], "employment": [ { "organizationName": "Zuncan", "startDate": "2003-04-22", "endDate": "2009-12-13" } ], "nickname": "Woody" }, "message": { "messageId": 14, "authorId": 9, "inResponseTo": 12, "senderLocation": [41.33, 85.28], "message": " love acast its 3G is good:)" } }
{ "user": { "id": 10, "alias": "Bram", "name": "BramHatch", "userSince": "2010-10-16T10:10:00.000Z", "friendIds": [ 1, 5, 9 ], "employment": [ { "organizationName": "physcane", "startDate": "2007-06-05", "endDate": "2011-11-05" } ] }, "message": { "messageId": 13, "authorId": 10, "inResponseTo": 4, "senderLocation": [42.77, 78.92], "message": " dislike x-phone the voice-command is bad:(" } }
{ "user": { "id": 10, "alias": "Bram", "name": "BramHatch", "userSince": "2010-10-16T10:10:00.000Z", "friendIds": [ 1, 5, 9 ], "employment": [ { "organizationName": "physcane", "startDate": "2007-06-05", "endDate": "2011-11-05" } ] }, "message": { "messageId": 12, "authorId": 10, "inResponseTo": 6, "senderLocation": [42.26, 77.76], "message": " can't stand product-z its voicemail-service is OMG:(" } }
{ "user": { "id": 1, "alias": "Margarita", "name": "MargaritaStoddard", "userSince": "2012-08-20T10:10:00.000Z", "friendIds": [ 2, 3, 6, 10 ], "employment": [ { "organizationName": "Codetechno", "startDate": "2006-08-06" }, { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ], "nickname": "Mags", "gender": "F" }, "message": { "messageId": 2, "authorId": 1, "inResponseTo": 4, "senderLocation": [41.66, 80.87], "message": " dislike x-phone its touch-screen is horrible" } }
{ "user": { "id": 1, "alias": "Margarita", "name": "MargaritaStoddard", "userSince": "2012-08-20T10:10:00.000Z", "friendIds": [ 2, 3, 6, 10 ], "employment": [ { "organizationName": "Codetechno", "startDate": "2006-08-06" }, { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ], "nickname": "Mags", "gender": "F" }, "message": { "messageId": 4, "authorId": 1, "inResponseTo": 2, "senderLocation": [37.73, 97.04], "message": " can't stand acast the network is horrible:(" } }
{ "user": { "id": 1, "alias": "Margarita", "name": "MargaritaStoddard", "userSince": "2012-08-20T10:10:00.000Z", "friendIds": [ 2, 3, 6, 10 ], "employment": [ { "organizationName": "Codetechno", "startDate": "2006-08-06" }, { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ], "nickname": "Mags", "gender": "F" }, "message": { "messageId": 8, "authorId": 1, "inResponseTo": 11, "senderLocation": [40.33, 80.87], "message": " like ccast the 3G is awesome:)" } }
{ "user": { "id": 1, "alias": "Margarita", "name": "MargaritaStoddard", "userSince": "2012-08-20T10:10:00.000Z", "friendIds": [ 2, 3, 6, 10 ], "employment": [ { "organizationName": "Codetechno", "startDate": "2006-08-06" }, { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ], "nickname": "Mags", "gender": "F" }, "message": { "messageId": 10, "authorId": 1, "inResponseTo": 12, "senderLocation": [42.5, 70.01], "message": " can't stand product-w the touch-screen is terrible" } }
{ "user": { "id": 1, "alias": "Margarita", "name": "MargaritaStoddard", "userSince": "2012-08-20T10:10:00.000Z", "friendIds": [ 2, 3, 6, 10 ], "employment": [ { "organizationName": "Codetechno", "startDate": "2006-08-06" }, { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ], "nickname": "Mags", "gender": "F" }, "message": { "messageId": 11, "authorId": 1, "inResponseTo": 1, "senderLocation": [38.97, 77.49], "message": " can't stand acast its plan is terrible" } }
{ "user": { "id": 2, "alias": "Isbel", "name": "IsbelDull", "userSince": "2011-01-22T10:10:00.000Z", "friendIds": [ 1, 4 ], "employment": [ { "organizationName": "Hexviafind", "startDate": "2010-04-27" } ], "nickname": "Izzy" }, "message": { "messageId": 3, "authorId": 2, "inResponseTo": 4, "senderLocation": [48.09, 81.01], "message": " like product-y the plan is amazing" } }
{ "user": { "id": 2, "alias": "Isbel", "name": "IsbelDull", "userSince": "2011-01-22T10:10:00.000Z", "friendIds": [ 1, 4 ], "employment": [ { "organizationName": "Hexviafind", "startDate": "2010-04-27" } ], "nickname": "Izzy" }, "message": { "messageId": 6, "authorId": 2, "inResponseTo": 1, "senderLocation": [31.5, 75.56], "message": " like product-z its platform is mind-blowing" } }
{ "user": { "id": 3, "alias": "Emory", "name": "EmoryUnk", "userSince": "2012-07-10T10:10:00.000Z", "friendIds": [ 1, 5, 8, 9 ], "employment": [ { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ] }, "message": { "messageId": 1, "authorId": 3, "inResponseTo": 2, "senderLocation": [47.16, 77.75], "message": " love product-b its shortcut-menu is awesome:)" } }
{ "user": { "id": 3, "alias": "Emory", "name": "EmoryUnk", "userSince": "2012-07-10T10:10:00.000Z", "friendIds": [ 1, 5, 8, 9 ], "employment": [ { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ] }, "message": { "messageId": 9, "authorId": 3, "inResponseTo": 12, "senderLocation": [34.45, 96.48], "message": " love ccast its wireless is good" } }
{ "user": { "id": 5, "alias": "Von", "name": "VonKemble", "userSince": "2010-01-05T10:10:00.000Z", "friendIds": [ 3, 6, 10 ], "employment": [ { "organizationName": "Kongreen", "startDate": "2010-11-27" } ] }, "message": { "messageId": 7, "authorId": 5, "inResponseTo": 15, "senderLocation": [32.91, 85.05], "message": " dislike product-b the speed is horrible" } }
{ "user": { "id": 7, "alias": "Suzanna", "name": "SuzannaTillson", "userSince": "2012-08-07T10:10:00.000Z", "friendIds": [ 6 ], "employment": [ { "organizationName": "Labzatron", "startDate": "2011-04-19" } ] }, "message": { "messageId": 15, "authorId": 7, "inResponseTo": 11, "senderLocation": [44.47, 67.11], "message": " like x-phone the voicemail-service is awesome" } }

默认情况下,AsterixDB是通过Hash Join方法来实现的,在分布式环境中通过分治的方式实现并行计算。

由于AsterixDB还无法完全智能的来判断,用什么Join方式更好。因此AsterixDB还支持hint机制,来指定Join方法。

Index join(索引Join)

下面的语句实现了和上面同样的效果。但是其通过hint,来告诉AsterixDB通过使用索引来Join。

    SELECT user.name AS uname, msg.message AS message
    FROM TinySocial.GleambookUsers user, TinySocial.GleambookMessages msg
    WHERE msg.authorId /*+ indexnl */ = user.id;

结果如下:

{ "uname": "IsbelDull", "message": " like product-z its platform is mind-blowing" }
{ "uname": "MargaritaStoddard", "message": " like ccast the 3G is awesome:)" }
{ "uname": "EmoryUnk", "message": " love ccast its wireless is good" }
{ "uname": "MargaritaStoddard", "message": " can't stand product-w the touch-screen is terrible" }
{ "uname": "MargaritaStoddard", "message": " can't stand acast its plan is terrible" }
{ "uname": "BramHatch", "message": " can't stand product-z its voicemail-service is OMG:(" }
{ "uname": "WoodrowNehling", "message": " love acast its 3G is good:)" }
{ "uname": "EmoryUnk", "message": " love product-b its shortcut-menu is awesome:)" }
{ "uname": "MargaritaStoddard", "message": " dislike x-phone its touch-screen is horrible" }
{ "uname": "IsbelDull", "message": " like product-y the plan is amazing" }
{ "uname": "MargaritaStoddard", "message": " can't stand acast the network is horrible:(" }
{ "uname": "WillisWynne", "message": " love product-b the customization is mind-blowing" }
{ "uname": "VonKemble", "message": " dislike product-b the speed is horrible" }
{ "uname": "BramHatch", "message": " dislike x-phone the voice-command is bad:(" }
{ "uname": "SuzannaTillson", "message": " like x-phone the voicemail-service is awesome" }

Nested Outer Join

对于Join时出现了缺失(missing)或者悬挂(dangling)的元素时,该如何处理呢?

通过一个实例来说明吧。

如果想查看用户,以及每个用户发的消息集,可以直接使用下面的query

    SELECT user.name AS uname,
           (SELECT VALUE msg.message
            FROM TinySocial.GleambookMessages msg
            WHERE msg.authorId = user.id) AS messages
    FROM TinySocial.GleambookUsers user;

结果如下:

{ "uname": "WillisWynne", "messages": [ " love product-b the customization is mind-blowing" ] }
{ "uname": "NilaMilliron", "messages": [  ] }
{ "uname": "WoodrowNehling", "messages": [ " love acast its 3G is good:)" ] }
{ "uname": "BramHatch", "messages": [ " can't stand product-z its voicemail-service is OMG:(", " dislike x-phone the voice-command is bad:(" ] }
{ "uname": "MargaritaStoddard", "messages": [ " like ccast the 3G is awesome:)", " can't stand product-w the touch-screen is terrible", " can't stand acast its plan is terrible", " dislike x-phone its touch-screen is horrible", " can't stand acast the network is horrible:(" ] }
{ "uname": "IsbelDull", "messages": [ " like product-z its platform is mind-blowing", " like product-y the plan is amazing" ] }
{ "uname": "EmoryUnk", "messages": [ " love ccast its wireless is good", " love product-b its shortcut-menu is awesome:)" ] }
{ "uname": "NicholasStroh", "messages": [  ] }
{ "uname": "VonKemble", "messages": [ " dislike product-b the speed is horrible" ] }
{ "uname": "SuzannaTillson", "messages": [ " like x-phone the voicemail-service is awesome" ] }

ok,结果就是messages可能为空集合,用户可以自己去处理。

其实这个更像子查询,而不是Join。

Theta Join

其实除了相等关系外,Join还支持其它关系。如下面的场景:

找出每条消息的邻近的消息:

    SELECT cm1.messageText AS message,
           (SELECT VALUE cm2.messageText
            FROM TinySocial.ChirpMessages cm2
            WHERE `spatial-distance`(cm1.senderLocation, cm2.senderLocation) <= 1
              AND cm2.chirpId < cm1.chirpId) AS nearbyMessages
    FROM TinySocial.ChirpMessages cm1;

结果如下

{ "message": " can't stand x-phone its platform is terrible", "nearbyMessages": [  ] }
{ "message": " like ccast its shortcut-menu is awesome:)", "nearbyMessages": [  ] }
{ "message": " like product-w the speed is good:)", "nearbyMessages": [ " hate ccast its voice-clarity is OMG:(" ] }
{ "message": " like product-b the voice-command is mind-blowing:)", "nearbyMessages": [  ] }
{ "message": " like x-phone the voice-clarity is good:)", "nearbyMessages": [  ] }
{ "message": " like product-y the platform is good", "nearbyMessages": [  ] }
{ "message": " love ccast its voicemail-service is awesome", "nearbyMessages": [  ] }
{ "message": " love product-z its customization is good:)", "nearbyMessages": [  ] }
{ "message": " hate ccast its voice-clarity is OMG:(", "nearbyMessages": [  ] }
{ "message": " like product-y the voice-command is amazing:)", "nearbyMessages": [  ] }
{ "message": " can't stand product-w its speed is terrible:(", "nearbyMessages": [  ] }
{ "message": " like product-z the shortcut-menu is awesome:)", "nearbyMessages": [  ] }

模糊Join(Fuzzy Join)

继续举例子。如果我们想找,对于Gleambook的每个用户,Chirp中有哪些名字相似的用户?

这里就用到了模糊查询。AsterixDB内置了一些文本和集合的模糊函数。

具体SQL++如下:

    USE TinySocial;
    SET simfunction "edit-distance";
    SET simthreshold "3";

    SELECT gbu.id AS id, gbu.name AS name,
           (SELECT cm.user.screenName AS chirpScreenname,
                   cm.user.name AS chirpName
            FROM ChirpMessages cm
            WHERE cm.user.name ~= gbu.name) AS similarUsers
    FROM GleambookUsers gbu;

结果如下:

    { "id": 6, "name": "WillisWynne", "similarUsers": [  ] }
    { "id": 8, "name": "NilaMilliron", "similarUsers": [ { "chirpScreenname": "NilaMilliron_tw", "chirpName": "Nila Milliron" } ] }
    { "id": 9, "name": "WoodrowNehling", "similarUsers": [  ] }
    { "id": 10, "name": "BramHatch", "similarUsers": [  ] }
    { "id": 1, "name": "MargaritaStoddard", "similarUsers": [  ] }
    { "id": 2, "name": "IsbelDull", "similarUsers": [  ] }
    { "id": 3, "name": "EmoryUnk", "similarUsers": [  ] }
    { "id": 4, "name": "NicholasStroh", "similarUsers": [  ] }
    { "id": 5, "name": "VonKemble", "similarUsers": [  ] }
    { "id": 7, "name": "SuzannaTillson", "similarUsers": [  ] }

Existential Quantification

Query:

    USE TinySocial;

    SELECT VALUE gbu
    FROM GleambookUsers gbu
    WHERE (SOME e IN gbu.employment SATISFIES e.endDate IS UNKNOWN);

结果:

{ "id": 6, "alias": "Willis", "name": "WillisWynne", "userSince": "2005-01-17T10:10:00.000Z", "friendIds": [ 1, 3, 7 ], "employment": [ { "organizationName": "jaydax", "startDate": "2009-05-15" } ] }
{ "id": 8, "alias": "Nila", "name": "NilaMilliron", "userSince": "2008-01-01T10:10:00.000Z", "friendIds": [ 3 ], "employment": [ { "organizationName": "Plexlane", "startDate": "2010-02-28" } ] }
{ "id": 1, "alias": "Margarita", "name": "MargaritaStoddard", "userSince": "2012-08-20T10:10:00.000Z", "friendIds": [ 2, 3, 6, 10 ], "employment": [ { "organizationName": "Codetechno", "startDate": "2006-08-06" }, { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ], "nickname": "Mags", "gender": "F" }
{ "id": 2, "alias": "Isbel", "name": "IsbelDull", "userSince": "2011-01-22T10:10:00.000Z", "friendIds": [ 1, 4 ], "employment": [ { "organizationName": "Hexviafind", "startDate": "2010-04-27" } ], "nickname": "Izzy" }
{ "id": 4, "alias": "Nicholas", "name": "NicholasStroh", "userSince": "2010-12-27T10:10:00.000Z", "friendIds": [ 2 ], "employment": [ { "organizationName": "Zamcorporation", "startDate": "2010-06-08" } ] }
{ "id": 5, "alias": "Von", "name": "VonKemble", "userSince": "2010-01-05T10:10:00.000Z", "friendIds": [ 3, 6, 10 ], "employment": [ { "organizationName": "Kongreen", "startDate": "2010-11-27" } ] }
{ "id": 7, "alias": "Suzanna", "name": "SuzannaTillson", "userSince": "2012-08-07T10:10:00.000Z", "friendIds": [ 6 ], "employment": [ { "organizationName": "Labzatron", "startDate": "2011-04-19" } ] }

Universal Quantification

Query:

    USE TinySocial;

    SELECT VALUE gbu
    FROM GleambookUsers gbu
    WHERE (EVERY e IN gbu.employment SATISFIES e.endDate IS NOT UNKNOWN);

结果

{ "id": 9, "alias": "Woodrow", "name": "WoodrowNehling", "userSince": "2005-09-20T10:10:00.000Z", "friendIds": [ 3, 10 ], "employment": [ { "organizationName": "Zuncan", "startDate": "2003-04-22", "endDate": "2009-12-13" } ], "nickname": "Woody" }
{ "id": 10, "alias": "Bram", "name": "BramHatch", "userSince": "2010-10-16T10:10:00.000Z", "friendIds": [ 1, 5, 9 ], "employment": [ { "organizationName": "physcane", "startDate": "2007-06-05", "endDate": "2011-11-05" } ] }
{ "id": 3, "alias": "Emory", "name": "EmoryUnk", "userSince": "2012-07-10T10:10:00.000Z", "friendIds": [ 1, 5, 8, 9 ], "employment": [ { "organizationName": "geomedia", "startDate": "2010-06-17", "endDate": "2010-01-26" } ] }

Simple Aggregation

Query:

    USE TinySocial;

    SELECT COUNT(gbu) AS numUsers FROM GleambookUsers gbu;

结果

{ "numUsers": 10 }

也可以只输出其中的值

Query:

    USE TinySocial;

    SELECT VALUE COUNT(gbu) FROM GleambookUsers gbu;

结果

10

或者

    USE TinySocial;

    SELECT VALUE ARRAY_COUNT((SELECT gbu FROM GleambookUsers gbu));

结果

10

Grouping and Aggregation

Query:

    USE TinySocial;

    SELECT uid AS user, COUNT(cm) AS count
    FROM ChirpMessages cm
    GROUP BY cm.user.screenName AS uid;

结果

{ "user": "ChangEwing_573", "count": 1 }
{ "user": "OliJackson_512", "count": 1 }
{ "user": "ColineGeyer@63", "count": 3 }
{ "user": "NathanGiesen@211", "count": 6 }
{ "user": "NilaMilliron_tw", "count": 1 }

(Hash-Based) Grouping and Aggregation

Query:

    USE TinySocial;

    SELECT uid AS user, COUNT(cm) AS count
    FROM ChirpMessages cm
     /*+ hash */
    GROUP BY cm.user.screenName AS uid;

结果:

{ "user": "OliJackson_512", "count": 1 }
{ "user": "ChangEwing_573", "count": 1 }
{ "user": "NathanGiesen@211", "count": 6 }
{ "user": "NilaMilliron_tw", "count": 1 }
{ "user": "ColineGeyer@63", "count": 3 }

Grouping and Limits

Query:

    USE TinySocial;

    SELECT uid AS user, c AS count
    FROM ChirpMessages cm
    GROUP BY cm.user.screenName AS uid WITH c AS count(cm)
    ORDER BY c DESC
    LIMIT 3;

结果:

{ "user": "NathanGiesen@211", "count": 6 }
{ "user": "ColineGeyer@63", "count": 3 }
{ "user": "ChangEwing_573", "count": 1 }

Left Outer Fuzzy Join

Query:

    USE TinySocial;
    SET simfunction "jaccard";
    SET simthreshold "0.3";

    SELECT cm1 AS chirp,
           (SELECT VALUE cm2.chirpId
            FROM ChirpMessages cm2
            WHERE cm2.referredTopics ~= cm1.referredTopics
              AND cm2.chirpId > cm1.chirpId) AS similarChirps
    FROM ChirpMessages cm1;

结果

{ "chirp": { "chirpId": "11", "user": { "screenName": "NilaMilliron_tw", "lang": "en", "friendsCount": 445, "statusesCount": 164, "name": "Nila Milliron", "followersCount": 22649 }, "senderLocation": [37.59, 68.42], "sendTime": "2008-03-09T10:10:00.000Z", "referredTopics": [ "x-phone", "platform" ], "messageText": " can't stand x-phone its platform is terrible" }, "similarChirps": [ "6", "7" ] }
{ "chirp": { "chirpId": "2", "user": { "screenName": "ColineGeyer@63", "lang": "en", "friendsCount": 121, "statusesCount": 362, "name": "Coline Geyer", "followersCount": 17159 }, "senderLocation": [32.84, 67.14], "sendTime": "2010-05-13T10:10:00.000Z", "referredTopics": [ "ccast", "shortcut-menu" ], "messageText": " like ccast its shortcut-menu is awesome:)" }, "similarChirps": [ "9", "8" ] }
{ "chirp": { "chirpId": "3", "user": { "screenName": "NathanGiesen@211", "lang": "en", "friendsCount": 39339, "statusesCount": 473, "name": "Nathan Giesen", "followersCount": 49416 }, "senderLocation": [29.72, 75.8], "sendTime": "2006-11-04T10:10:00.000Z", "referredTopics": [ "product-w", "speed" ], "messageText": " like product-w the speed is good:)" }, "similarChirps": [ "5" ] }
{ "chirp": { "chirpId": "4", "user": { "screenName": "NathanGiesen@211", "lang": "en", "friendsCount": 39339, "statusesCount": 473, "name": "Nathan Giesen", "followersCount": 49416 }, "senderLocation": [39.28, 70.48], "sendTime": "2011-12-26T10:10:00.000Z", "referredTopics": [ "product-b", "voice-command" ], "messageText": " like product-b the voice-command is mind-blowing:)" }, "similarChirps": [  ] }
{ "chirp": { "chirpId": "6", "user": { "screenName": "ColineGeyer@63", "lang": "en", "friendsCount": 121, "statusesCount": 362, "name": "Coline Geyer", "followersCount": 17159 }, "senderLocation": [47.51, 83.99], "sendTime": "2010-05-07T10:10:00.000Z", "referredTopics": [ "x-phone", "voice-clarity" ], "messageText": " like x-phone the voice-clarity is good:)" }, "similarChirps": [  ] }
{ "chirp": { "chirpId": "7", "user": { "screenName": "ChangEwing_573", "lang": "en", "friendsCount": 182, "statusesCount": 394, "name": "Chang Ewing", "followersCount": 32136 }, "senderLocation": [36.21, 72.6], "sendTime": "2011-08-25T10:10:00.000Z", "referredTopics": [ "product-y", "platform" ], "messageText": " like product-y the platform is good" }, "similarChirps": [  ] }
{ "chirp": { "chirpId": "9", "user": { "screenName": "NathanGiesen@211", "lang": "en", "friendsCount": 39339, "statusesCount": 473, "name": "Nathan Giesen", "followersCount": 49416 }, "senderLocation": [36.86, 74.62], "sendTime": "2012-07-21T10:10:00.000Z", "referredTopics": [ "ccast", "voicemail-service" ], "messageText": " love ccast its voicemail-service is awesome" }, "similarChirps": [  ] }
{ "chirp": { "chirpId": "1", "user": { "screenName": "NathanGiesen@211", "lang": "en", "friendsCount": 39339, "statusesCount": 473, "name": "Nathan Giesen", "followersCount": 49416 }, "senderLocation": [47.44, 80.65], "sendTime": "2008-04-26T10:10:00.000Z", "referredTopics": [ "product-z", "customization" ], "messageText": " love product-z its customization is good:)" }, "similarChirps": [ "8" ] }
{ "chirp": { "chirpId": "10", "user": { "screenName": "ColineGeyer@63", "lang": "en", "friendsCount": 121, "statusesCount": 362, "name": "Coline Geyer", "followersCount": 17159 }, "senderLocation": [29.15, 76.53], "sendTime": "2008-01-26T10:10:00.000Z", "referredTopics": [ "ccast", "voice-clarity" ], "messageText": " hate ccast its voice-clarity is OMG:(" }, "similarChirps": [ "2", "6", "9" ] }
{ "chirp": { "chirpId": "12", "user": { "screenName": "OliJackson_512", "lang": "en", "friendsCount": 445, "statusesCount": 164, "name": "Oli Jackson", "followersCount": 22649 }, "senderLocation": [24.82, 94.63], "sendTime": "2010-02-13T10:10:00.000Z", "referredTopics": [ "product-y", "voice-command" ], "messageText": " like product-y the voice-command is amazing:)" }, "similarChirps": [ "4", "7" ] }
{ "chirp": { "chirpId": "5", "user": { "screenName": "NathanGiesen@211", "lang": "en", "friendsCount": 39339, "statusesCount": 473, "name": "Nathan Giesen", "followersCount": 49416 }, "senderLocation": [40.09, 92.69], "sendTime": "2006-08-04T10:10:00.000Z", "referredTopics": [ "product-w", "speed" ], "messageText": " can't stand product-w its speed is terrible:(" }, "similarChirps": [  ] }
{ "chirp": { "chirpId": "8", "user": { "screenName": "NathanGiesen@211", "lang": "en", "friendsCount": 39339, "statusesCount": 473, "name": "Nathan Giesen", "followersCount": 49416 }, "senderLocation": [46.05, 93.34], "sendTime": "2005-10-14T10:10:00.000Z", "referredTopics": [ "product-z", "shortcut-menu" ], "messageText": " like product-z the shortcut-menu is awesome:)" }, "similarChirps": [  ] }

删除数据

删除一条数据

    USE TinySocial;
    DELETE FROM ChirpMessages cm WHERE cm.chirpId = "13";

更新数据

Sorry,这个AsterixDB不支持。请先删除,再插入。

事务支持

AsterixDB支持单对象的ACID事务操作;不支持多statement的事务操作。

多statement的事务回滚操作,需要应用自身来保证。

批量导入数据

简单格式的数据文件,可以直接导入到AsterixDB中。

如:下面将gbu.adm文件导入数据库中。

    USE TinySocial;

    LOAD DATASET GleambookUsers USING localfs
        (("path"="nc1://home/user/gbu.adm"),("format"="adm"));

参考

评论

发表评论

validate