
Paginated queries over hundreds of millions of MongoDB records, implemented in Java

1. Prepare the environment

1.1 Download MongoDB

1.2 Start MongoDB

C:\mongodb\bin\mongod --dbpath D:\mongodb\data

1.3 Download Robo 3T, a GUI tool for MongoDB

2. Prepare the data

Add the MongoDB Java driver to the Maven pom.xml:

 <dependency>
     <groupId>org.mongodb</groupId>
     <artifactId>mongo-java-driver</artifactId>
     <version>3.6.1</version>
 </dependency>
 

Run the following Java code to insert 100 million test documents:

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.MongoException;

public class InsertTestData {

    public static void main(String[] args) {
        // Since driver 2.10.0 the entry point is MongoClient. With driver 3.x this
        // constructor does not declare UnknownHostException, so it is not caught here.
        MongoClient mongo = new MongoClient("localhost", 27017);
        try {
            // If the database "www" does not exist, MongoDB creates it on first write
            DB db = mongo.getDB("www");
            // Likewise, the collection "person" is created on first insert
            DBCollection table = db.getCollection("person");

            for (int i = 0; i < 100000000; i++) {
                // One document per iteration, with a unique name
                BasicDBObject document = new BasicDBObject();
                document.put("name", "mkyong" + i);
                document.put("age", 30);
                document.put("sex", "f");
                table.insert(document);
            }
            System.out.println("Done");
        } catch (MongoException e) {
            e.printStackTrace();
        } finally {
            mongo.close();
        }
    }
}
 

3. Paginated queries

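The traditional skip/limit approach becomes slow once the data set is large, because the server still has to walk past every skipped document, so it is not a good fit here. I looked for another way and borrowed the idea used by logstash-input-mongodb: remember the last _id that was read and fetch the next batch with _id greater than that value: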

public
def get_cursor_for_collection(mongodb, mongo_collection_name, last_id_object, batch_size)
  collection = mongodb.collection(mongo_collection_name)
  # Need to make this sort by date in object id then get the first of the series
  # db.events_20150320.find().limit(1).sort({ts:1})
  return collection.find({:_id => {:$gt => last_id_object}}).limit(batch_size)
end

collection_name = collection[:name]
@logger.debug("collection_data is: #{@collection_data}")
last_id = @collection_data[index][:last_id]
#@logger.debug("last_id is #{last_id}", :index => index, :collection => collection_name)
# get batch of events starting at the last_place if it is set
last_id_object = last_id
if since_type == 'id'
  last_id_object = BSON::ObjectId(last_id)
elsif since_type == 'time'
  if last_id != ''
    last_id_object = Time.at(last_id)
  end
end
cursor = get_cursor_for_collection(@mongodb, collection_name, last_id_object, batch_size)
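
Paging by _id relies on _id values sorting roughly in insertion order, which the comment in the excerpt above hints at ("sort by date in object id"). An ObjectId starts with a 4-byte timestamp in seconds since the Unix epoch, so that part can be decoded without the driver. The following is a minimal sketch using only the JDK. The class and method names (ObjectIdTimestampSketch, epochSeconds) are made up for illustration, and the sample value is the start _id that appears in this article's Java code.

```java
import java.time.Instant;

class ObjectIdTimestampSketch {

    /** Decodes the creation time stored in the first 4 bytes (8 hex characters) of an ObjectId. */
    static long epochSeconds(String objectIdHex) {
        if (objectIdHex == null || objectIdHex.length() != 24) {
            throw new IllegalArgumentException("expected 24 hex characters");
        }
        // The leading 8 hex characters are a big-endian count of seconds since the epoch
        return Long.parseLong(objectIdHex.substring(0, 8), 16);
    }

    public static void main(String[] args) {
        String id = "5bda8f66ef2ed979bab041aa"; // start _id used in this article
        System.out.println(Instant.ofEpochSecond(epochSeconds(id))); // prints 2018-11-01T05:30:14Z
    }
}
```

The timestamp has one-second resolution and the remaining bytes depend on the process that generated the id, so ids created by different clients within the same second are not guaranteed to sort in exact insertion order.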
 

Java implementation:

import java.util.List;

import org.bson.types.ObjectId;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.MongoException;

public class Test {

    public static void main(String[] args) {
        int pageSize = 50000;
        // With driver 3.x this constructor does not declare UnknownHostException
        MongoClient mongo = new MongoClient("localhost", 27017);
        try {
            // Database "www" and collection "person" are created on demand if missing
            DB db = mongo.getDB("www");
            DBCollection table = db.getCollection("person");
            long cnt = table.count();
            long pages = getPageCount(cnt, pageSize);
            // _id to resume after; this value comes from the author's data set.
            // Use null to start from the first document.
            ObjectId lastIdObject = new ObjectId("5bda8f66ef2ed979bab041aa");

            for (long i = 0; i < pages; i++) {
                long start = System.currentTimeMillis();
                DBCursor cursor = getCursorForCollection(table, lastIdObject, pageSize);
                // The cursor is lazy: the query only runs once results are fetched,
                // so materialize the page before reading the clock
                List<DBObject> objs = cursor.toArray();
                System.out.println("Query " + (i + 1) + " took "
                        + (System.currentTimeMillis() - start) / 1000 + " s");
                if (objs.isEmpty()) {
                    break; // nothing left after lastIdObject
                }
                lastIdObject = (ObjectId) objs.get(objs.size() - 1).get("_id");
            }
        } catch (MongoException e) {
            e.printStackTrace();
        } finally {
            mongo.close();
        }
    }

    /** Returns one page of documents with _id greater than lastIdObject, in ascending _id order. */
    public static DBCursor getCursorForCollection(DBCollection collection, ObjectId lastIdObject, int pageSize) {
        BasicDBObject query = new BasicDBObject();
        if (lastIdObject != null) {
            // Note the "$": without it the filter becomes an equality match and finds nothing
            query.append("_id", new BasicDBObject("$gt", lastIdObject));
        }
        // With no lastIdObject the filter stays empty, so the first page starts at the
        // smallest _id and the first document is not skipped
        BasicDBObject sort = new BasicDBObject("_id", 1);
        return collection.find(query).sort(sort).limit(pageSize);
    }

    /** Number of pages needed for cnt documents (ceiling division). */
    public static long getPageCount(long cnt, int pageSize) {
        return cnt % pageSize == 0 ? cnt / pageSize : cnt / pageSize + 1;
    }
}
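
The loop shape above can be tried without a MongoDB instance. The sketch below is an in-memory stand-in, not driver code: a TreeSet of Long plays the role of the _id index, and nextPage mimics a query for ids greater than lastId, sorted ascending and limited to pageSize. All names in it (KeysetPagingSketch, nextPage, allPages) are hypothetical. It also shows one possible way to end the loop, on an empty or short page, which does not depend on a page count computed up front.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

class KeysetPagingSketch {

    /** In-memory stand-in for: _id greater than lastId, sorted ascending, limited to pageSize. */
    static List<Long> nextPage(TreeSet<Long> ids, Long lastId, int pageSize) {
        // null means "start from the beginning"; otherwise keep only ids strictly greater than lastId
        NavigableSet<Long> remaining = (lastId == null) ? ids : ids.tailSet(lastId, false);
        return remaining.stream().limit(pageSize).collect(Collectors.toList());
    }

    /** Walks all pages, carrying the last id of each page into the next query. */
    static List<List<Long>> allPages(TreeSet<Long> ids, int pageSize) {
        List<List<Long>> pages = new ArrayList<>();
        Long lastId = null;
        while (true) {
            List<Long> page = nextPage(ids, lastId, pageSize);
            if (page.isEmpty()) {
                break; // nothing after lastId
            }
            pages.add(page);
            lastId = page.get(page.size() - 1);
            if (page.size() < pageSize) {
                break; // a short page is the final page
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        TreeSet<Long> ids = new TreeSet<>();
        for (long i = 1; i <= 10; i++) {
            ids.add(i);
        }
        System.out.println(allPages(ids, 4)); // prints [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
    }
}
```

Each call starts from the position right after lastId, so the cost of fetching a page does not grow with the page number, which is the property the _id-based query is after.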
 

4. Lessons learned

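1. I accidentally left out a $ sign, so the query returned no data, and I wasted some time tracking down the cause. Without the $, MongoDB treats {gt: ...} as an embedded document to compare for equality, so nothing matches and no error is reported. The correct form is:

query.append("_id",new BasicDBObject("$gt",lastIdObject));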
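2. Create indexes

  Create an ordinary single-field index: db.collection.createIndex({field: 1}) for ascending order, or db.collection.createIndex({field: -1}) for descending order. Older shells also accept ensureIndex, a deprecated alias of createIndex.
  Example: db.articles.createIndex({title: 1}) // quoting the field name failed for me; straight ASCII quotes are valid in the shell, so the likely culprit is curly quotes pasted from a web page
  Show the current indexes: db.collection.getIndexes()
  Example:
  db.articles.getIndexes()
  Drop a single index: db.collection.dropIndex({field: 1}), using the same key and direction the index was created with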
 

3. Execution plan

db.student.find({"name":"dd1"}).explain()


Source: 智云一二三科技

Article URL: https://www.zhihuclub.com/187272.shtml

About the author: 智云科技
